
Scrapy: merging chained requests into one final request

  • Jaanus  ·  asked 7 years ago

    At the end I want to check out. The problem is that with Scrapy's request chain, it tries to check out as many times as there are items in my basket.

    def start_requests(self):
        params = getShopList()
        for param in params:
            yield scrapy.FormRequest('https://foo.bar/shop', callback=self.addToBasket,
                                     method='POST', formdata=param,
                                     meta={'param': param})  # pass param along the chain
    
    
    def addToBasket(self, response):
        # This yields once per response, so checkoutBasket also fires once per item.
        param = response.meta['param']
        yield scrapy.FormRequest('https://foo.bar/addToBasket', callback=self.checkoutBasket,
                                 method='POST', formdata=param, meta={'param': param})
    
    def checkoutBasket(self, response):
        yield scrapy.FormRequest('https://foo.bar/checkout', callback=self.final, method='POST',
                                 formdata=response.meta['param'])
    
    def final(self, response):  # callbacks receive the response
        print("Success, you have purchased 59 items")
    
    
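The shape of the problem can be seen without Scrapy at all: each item starts its own callback chain, and each chain ends in its own checkout. The following is a minimal, Scrapy-free sketch of that behavior; all names in it are illustrative stand-ins, not part of the spider above.

```python
# Each item gets its own callback chain, and every chain ends in a
# checkout, so checkout runs once per item instead of once overall.

def add_to_basket(item):
    pass  # stand-in for the POST to /addToBasket

def crawl(items):
    checkout_calls = 0
    for item in items:        # like start_requests: one chain per item
        add_to_basket(item)   # like the addToBasket callback
        checkout_calls += 1   # like checkoutBasket: fires per chain
    return checkout_calls

print(crawl(["a", "b", "c"]))  # 3 checkouts for 3 items, not 1
```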

    Edit:

      def closed(self, reason):
          if reason == "finished":
              print("spider finished")
              # NB: returning a Request here has no effect; the engine has already stopped.
              return scrapy.Request('https://www.google.com', callback=self.finalmethod)
          print("Spider closed but not finished.")
    
      def finalmethod(self, response):
          print("finalized")
    
    
    2 Answers  |  asked 7 years ago
    Answer 1
  •   Sraw    7 years ago

    I think you can do the checkout manually once the spider has finished:

    def closed(self, reason):
        if reason == "finished":
            # plain `requests`, not Scrapy; checkout_url and param are your own
            return requests.post(checkout_url, data=param)
        print("Spider closed but not finished.")
    

    A full example using the spider's closed hook:

    import requests  # the final POST bypasses Scrapy
    import scrapy
    
    class MySpider(scrapy.Spider):
        name = 'whatever'
    
        def start_requests(self):
            params = getShopList()
            for param in params:
                yield scrapy.FormRequest('https://foo.bar/shop', callback=self.addToBasket,
                                         method='POST', formdata=param,
                                         meta={'param': param})  # keep param for the next step
    
    
        def addToBasket(self, response):
            yield scrapy.FormRequest('https://foo.bar/addToBasket',
                                     method='POST', formdata=response.meta['param'])
    
        def closed(self, reason):
            if reason == "finished":
                # checkout_url and param are placeholders: fill in your own
                return requests.post(checkout_url, data=param)
            print("Spider closed but not finished.")
    
    
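The idea in this answer can be sketched without Scrapy: do all the per-item work first, then run a single finish hook exactly once when everything is done. `MiniSpider` and its method names below are illustrative stand-ins for the spider above, not real Scrapy APIs.

```python
# Per-item work happens in run(); the closed() hook fires exactly once
# at the end, which is where the single checkout belongs.

class MiniSpider:
    def __init__(self, items):
        self.items = items
        self.checkouts = 0

    def run(self):
        for item in self.items:
            self.add_to_basket(item)  # one request per item, no checkout here
        self.closed("finished")       # called once, like Scrapy's closed()

    def add_to_basket(self, item):
        pass  # stand-in for POSTing to /addToBasket

    def closed(self, reason):
        if reason == "finished":
            self.checkouts += 1       # the single manual checkout

spider = MiniSpider(["a", "b", "c"])
spider.run()
print(spider.checkouts)  # 1
```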
    Answer 2
  •   Jaanus    7 years ago

    I solved it with the Scrapy signal spider_idle

    From the docs: sent when the spider has gone idle, which means the spider has no further:

    • requests waiting to be downloaded
    • requests scheduled
    • items being processed in the item pipeline

    https://doc.scrapy.org/en/latest/topics/signals.html

    import scrapy
    from scrapy import signals
    
    class MySpider(scrapy.Spider):
        name = 'whatever'
    
        def start_requests(self):
            self.crawler.signals.connect(self.spider_idle, signals.spider_idle) ## notice this
            params = getShopList()
            for param in params:
                yield scrapy.FormRequest('https://foo.bar/shop', callback=self.addToBasket,
                                         method='POST', formdata=param,
                                         meta={'param': param})
    
    
        def addToBasket(self, response):
            yield scrapy.FormRequest('https://foo.bar/addToBasket',
                                     method='POST', formdata=response.meta['param'])
    
        def spider_idle(self, spider): ## when all requests are finished, this is called
            req = scrapy.Request('https://foo.bar/checkout', callback=self.checkoutFinished)
            self.crawler.engine.crawl(req, spider)
    
        def checkoutFinished(self, response):
            print("Checkout finished")
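The idle-signal pattern can be sketched in plain Python: a scheduler drains its queue, and an idle hook gets one chance to enqueue a final request before everything shuts down. The `Scheduler` class and handler below are illustrative stand-ins for Scrapy's engine and signal, not real Scrapy APIs.

```python
# A queue is drained; when it runs empty, the idle handler fires and may
# enqueue more work. The handler schedules the final checkout only once.

class Scheduler:
    def __init__(self, requests, idle_handler):
        self.queue = list(requests)
        self.idle_handler = idle_handler
        self.handled = []

    def run(self):
        while True:
            while self.queue:                    # process all pending requests
                self.handled.append(self.queue.pop(0))
            before = len(self.queue)
            self.idle_handler(self)              # fired when the queue is empty
            if len(self.queue) == before:
                break                            # handler added nothing: really done

done = []
def spider_idle(scheduler):
    if "checkout" not in done:                   # schedule the final request only once
        done.append("checkout")
        scheduler.queue.append("checkout")

s = Scheduler(["addToBasket:a", "addToBasket:b"], spider_idle)
s.run()
print(s.handled)  # ['addToBasket:a', 'addToBasket:b', 'checkout']
```

The checkout request is only scheduled after every basket request has been handled, which is exactly what the spider_idle signal provides in Scrapy.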