
Next page of an additional page in Scrapy

scrapy web-scraping python-3.x


gongarek · 7 years ago

Why can't I follow the next page in parse_next and merge the collected data into the item?

def parse(self, response):
    item = TItem()
    ...
    link_www = lekarz.xpath('whatever/@href').extract_first()
    request = scrapy.Request(link_www, callback=self.parse_next)
    request.meta['item'] = item
    yield request

    next_page = response.css('whenever::attr(href)').extract_first()
    if next_page is not None:
        yield response.follow(next_page, callback=self.parse)

attri = []

def parse_next(self, response):
    item = response.meta['item']
    self.attri.append(response.xpath("whatever").extract_first())

    next_pager = response.css('whatever_too_xd').extract_first()
    if next_pager is not None:
        yield response.follow(next_pager, callback=self.parse_next)
    else:
        item['hehe'] = self.attri
        yield item

Output:

KeyError: 'item'

Why?

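1 reply | 7 years ago

stasdeep · 7 years ago

You are not passing the item to the callback when you follow the next page from parse_next, so the new request starts with an empty meta. To fix this, add the meta argument to the response.follow call: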

yield response.follow(next_pager, callback=self.parse_next, meta={'item': item})
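
To make the cause of the KeyError easier to see, here is a small standard-library sketch of how meta travels from a request to its response. FakeRequest, FakeResponse, PAGES, crawl and the forward_meta flag are hypothetical stand-ins invented for this illustration. They are not part of Scrapy and only mimic the one behaviour that matters here: response.meta is the meta of the request that produced the response, and a request created without meta starts out empty.

```python
# Toy stand-ins for Scrapy's Request/Response; NOT the Scrapy API itself.
class FakeRequest:
    def __init__(self, url, callback, meta=None):
        self.url = url
        self.callback = callback
        # A request created without an explicit meta starts with an empty dict.
        self.meta = dict(meta or {})


class FakeResponse:
    def __init__(self, request, value, next_url):
        self.request = request
        self.value = value        # the data "scraped" from this page
        self.next_url = next_url  # link to the following page, or None

    @property
    def meta(self):
        # A response exposes the meta of the request that produced it.
        return self.request.meta

    def follow(self, url, callback, meta=None):
        return FakeRequest(url, callback, meta)


# Hypothetical site: url -> (value on the page, next page url or None)
PAGES = {
    "/doc/1?p=1": ("a", "/doc/1?p=2"),
    "/doc/1?p=2": ("b", "/doc/1?p=3"),
    "/doc/1?p=3": ("c", None),
}


def parse_next(response, forward_meta=True):
    item = response.meta["item"]  # KeyError if the request carried no item
    # Collect values on the item itself instead of a shared class-level list.
    item.setdefault("hehe", []).append(response.value)
    if response.next_url is not None:
        meta = {"item": item} if forward_meta else None
        yield response.follow(response.next_url, parse_next, meta=meta)
    else:
        yield item


def crawl(start_url, item, forward_meta=True):
    """Minimal scheduler: run callbacks until no requests remain."""
    queue = [FakeRequest(start_url, parse_next, meta={"item": item})]
    results = []
    while queue:
        request = queue.pop(0)
        value, next_url = PAGES[request.url]
        response = FakeResponse(request, value, next_url)
        for out in request.callback(response, forward_meta):
            (queue if isinstance(out, FakeRequest) else results).append(out)
    return results
```

Calling crawl("/doc/1?p=1", {}) walks all three pages and yields one item with every value collected, while crawl("/doc/1?p=1", {}, forward_meta=False) fails on the second page with KeyError: 'item', which matches the error in the question. Separately from that error, the class-level attri list in the question is shared by every item the spider processes, so values from different detail pages could end up mixed together. Keeping the list on the item, as in this sketch, is one way to avoid that.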
