
Why can't Scrapy's XPath find what my browser's XPath finds?

  • Peyman  ·  6 years ago

    I want to find something on a page via XPath (this is my first Scrapy project), for example on the page https://github.com/rg3/youtube-dl/pull/11272 .

    In both Opera's inspector and the Firefox Try XPath add-on, this XPath expression gives the same result:

    //div[@class='file js-comment-container js-resolvable-timeline-thread-container has-inline-notes']


    But in Scrapy 1.6, when I evaluate the same XPath, it finds nothing and just returns an empty list.

    def parse(self, response):
        print(response.xpath('''//div[@class='file js-comment-container js-resolvable-timeline-thread-container has-inline-notes']'''))
    

    The result is [] .

    What do you think the problem is, and how can I fix it? Thanks in advance.

    Note: Yes, I know about robots.txt, and I have even set ROBOTSTXT_OBEY = False

    1 answer  |  6 years ago
  • stranac  ·  6 years ago

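    It seems that some of the classes are added by JavaScript, so they are missing from the HTML that Scrapy downloads.
    However, if you find a suitable selector, you can still select the div you are targeting without executing any JavaScript: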

    >>> fetch('https://github.com/rg3/youtube-dl/pull/11272')
    2019-02-09 14:50:19 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://github.com/rg3/youtube-dl/pull/11272> (referer: None)
    >>> response.css('div.file')
    [<Selector xpath="descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' file ')]" data='<div class="file js-comment-container js'>,
     <Selector xpath="descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' file ')]" data='<div class="file js-comment-container js'>,
     <Selector xpath="descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' file ')]" data='<div class="file js-comment-container js'>,
     <Selector xpath="descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' file ')]" data='<div class="file js-comment-container js'>,
     <Selector xpath="descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' file ')]" data='<div class="file js-comment-container js'>,
     <Selector xpath="descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' file ')]" data='<div class="file js-comment-container js'>,
     <Selector xpath="descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' file ')]" data='<div class="file js-comment-container js'>,
     <Selector xpath="descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' file ')]" data='<div class="file js-comment-container js'>,
     <Selector xpath="descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' file ')]" data='<div class="file js-comment-container js'>]
    >>> len(_)
    9
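
    To make the difference concrete, here is a minimal sketch using only Python's standard library rather than Scrapy itself. The HTML snippet and the helper names exact_class_match and class_token_match are invented for illustration, and which class the browser adds is an assumption. It shows why comparing the whole class attribute string is brittle, while matching a single class token (what div.file does) still works when the served HTML carries fewer classes than the browser shows.

```python
import xml.etree.ElementTree as ET

# Invented stand-in for the HTML the server sends (what a crawler sees).
# Assumption: one class that the browser shows is missing here because a
# script adds it later.
SERVED_HTML = """
<html><body>
  <div class="file js-comment-container js-resolvable-timeline-thread-container">first diff</div>
  <div class="file js-comment-container js-resolvable-timeline-thread-container">second diff</div>
  <div class="other">unrelated</div>
</body></html>
"""

# The full class string copied from the browser's inspector.
BROWSER_CLASS = (
    "file js-comment-container "
    "js-resolvable-timeline-thread-container has-inline-notes"
)


def exact_class_match(root, class_value):
    """Mimic //div[@class='...']: the whole attribute string must be equal."""
    return [el for el in root.iter("div") if el.get("class") == class_value]


def class_token_match(root, token):
    """Mimic the CSS selector div.<token>: one whitespace-separated class must match."""
    return [el for el in root.iter("div") if token in el.get("class", "").split()]


root = ET.fromstring(SERVED_HTML)

exact = exact_class_match(root, BROWSER_CLASS)
by_token = class_token_match(root, "file")

print(exact)                        # [] because no attribute equals the full string
print([el.text for el in by_token])  # ['first diff', 'second diff']
```

    In a spider, the same idea would be response.css('div.file'), or the equivalent XPath //div[contains(concat(' ', normalize-space(@class), ' '), ' file ')], which mirrors the expression shown in the shell output above.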