Python Playwright locator not returning the expected value

playwright-python playwright python

Jacob · 1 year ago

I'm not getting the expected return value from the code below.

from playwright.sync_api import sync_playwright
import time
import random

def main():
    with sync_playwright() as p:
        browser = p.firefox.launch(headless=False)
        page = browser.new_page()
        url = "https://www.useragentlist.net/"
        page.goto(url)
        time.sleep(random.uniform(2,4))

        test = page.locator('xpath=//span[1][@class="copy-the-code-wrap copy-the-code-style-button copy-the-code-inside-wrap"]/pre/code/strong').inner_text()
        print(test)

        count = page.locator('xpath=//span["copy-the-code-wrap copy-the-code-style-button copy-the-code-inside-wrap"]/pre/code/strong').count()
        print(count)


        browser.close()


if __name__ == '__main__':
    main()

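page.locator().count() returns 0. The line above it retrieves the text without any problem, but I need to access all of the elements. What is wrong with my locator and my use of count()?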

1 Answer

ggorlen Hoàng Huy Khánh · 1 year ago

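The XPath in your second locator is missing @class=, so it is not the same as the first one, which works. Store the string in a variable so you don't have to type it twice and don't run into copy-paste or stale-data errors.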
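
The "store it in a variable" advice could look roughly like the sketch below. The names WRAP_CLASS, AGENT_XPATH and first_and_count are hypothetical, and using .first in place of the span[1] index from the question is an assumption about what the first line was meant to do. The function expects a Playwright page that has already navigated to the site.

```python
# Hypothetical names for illustration; only the class string comes from the question.
WRAP_CLASS = "copy-the-code-wrap copy-the-code-style-button copy-the-code-inside-wrap"

# Built once, so both lookups are guaranteed to use the same @class= predicate.
AGENT_XPATH = f'xpath=//span[@class="{WRAP_CLASS}"]/pre/code/strong'


def first_and_count(page):
    """Return (text of the first match, number of matches) from one shared locator."""
    agents = page.locator(AGENT_XPATH)
    # inner_text() auto-waits for the first match to appear in the DOM.
    first_text = agents.first.inner_text()
    # count() does not wait, so it is called after the wait above.
    return first_text, agents.count()
```

Calling first_and_count(page) right after page.goto(url) would replace both locator lines in the question.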

In any case, your approach seems overcomplicated. Each user agent is in a <code> tag, so just scrape that:

from playwright.sync_api import sync_playwright # 1.44.0


def main():
    with sync_playwright() as p:
        browser = p.firefox.launch()
        page = browser.new_page()
        url = "https://www.useragentlist.net/"
        page.goto(url, wait_until="domcontentloaded")
        agents = page.locator("code").all_text_contents()
        print(agents)
        browser.close()


if __name__ == "__main__":
    main()

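Locators auto-wait, so there's no need to sleep. Avoid XPath 99% of the time: XPaths are brittle and hard to read and maintain. Just use CSS selectors or user-visible locators. The goal is to pick the simplest selector that disambiguates the elements you want, and nothing more. span/pre/code/strong is a rigid hierarchy; if any one of those levels changes, your code breaks unnecessarily.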
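
To illustrate the point about rigid hierarchies, here is a small sketch that uses the standard library's xml.etree.ElementTree instead of Playwright, so it runs without a browser. The two markup snippets are made up for the example and are not the site's real HTML; the only difference between them is the wrapper element around <pre>.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: same content, but the wrapper changes from <span> to <section>.
OLD_MARKUP = "<div><span><pre><code><strong>UA-1</strong></code></pre></span></div>"
NEW_MARKUP = "<div><section><pre><code><strong>UA-1</strong></code></pre></section></div>"

RIGID = ".//span/pre/code/strong"  # depends on every level of the hierarchy
SIMPLE = ".//code"                 # depends only on the element that matters


def texts(markup, path):
    """Return the full text of every element matched by an ElementTree path."""
    root = ET.fromstring(markup)
    return ["".join(el.itertext()) for el in root.findall(path)]


print(texts(OLD_MARKUP, RIGID))   # ['UA-1']
print(texts(NEW_MARKUP, RIGID))   # []
print(texts(OLD_MARKUP, SIMPLE))  # ['UA-1']
print(texts(NEW_MARKUP, SIMPLE))  # ['UA-1']
```

The rigid path stops matching as soon as one ancestor is renamed, while the simple path keeps working in both cases.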

By the way, the user agents are in the static HTML, so unless you're trying to get around a block, you can do this faster with Requests and Beautiful Soup:

from requests import get  # 2.31.0
from bs4 import BeautifulSoup  # 4.10.0

response = get("https://www.useragentlist.net")
response.raise_for_status()
print([x.text for x in BeautifulSoup(response.text, "lxml").select("code")])

Better still (probably), use a library like fake_useragent to generate random user agents.
