
Scrapy and pytest with AsyncioSelectorReactor

  Asked by Henry Dashwood · 1 year ago

    To reproduce my problem:

    • Python 3.12.1
    • Scrapy 2.11.2
    • pytest 8.2.1

    In bookspider.py I have:

    from typing import Iterable
    
    import scrapy
    from scrapy.http import Request
    
    
    class BookSpider(scrapy.Spider):
        name = None
    
        def start_requests(self) -> Iterable[Request]:
            yield scrapy.Request("https://books.toscrape.com/")
    
        def parse(self, response):
            books = response.css("article.product_pod")
            for book in books:
                yield {
                    "name": self.name,
                    "title": book.css("h3 a::text").get().strip(),
                }
    

    In test_bookspider.py I have:

    import json
    import os
    
    from pytest_twisted import inlineCallbacks
    from scrapy.crawler import CrawlerRunner
    from twisted.internet import defer
    
    from bookspider import BookSpider
    
    
    @inlineCallbacks
    def test_bookspider():
        runner = CrawlerRunner(
            settings={
                "REQUEST_FINGERPRINTER_IMPLEMENTATION": "2.7",
                "FEEDS": {"books.json": {"format": "json"}},
                "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
                # "TWISTED_REACTOR": "twisted.internet.selectreactor.SelectReactor",
            }
        )
        yield runner.crawl(BookSpider, name="books")
    
        with open("books.json", "r") as f:
            books = json.load(f)
        assert len(books) >= 1
        assert books[0]["name"] == "books"
        assert books[0]["title"] == "A Light in the ..."
    
        os.remove("books.json")
    
        defer.returnValue(None)
    

    With "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor" uncommented, I get the following error:

    Exception: The installed reactor (twisted.internet.selectreactor.SelectReactor) does not match the requested one (twisted.internet.asyncioreactor.AsyncioSelectorReactor)

    With "TWISTED_REACTOR": "twisted.internet.selectreactor.SelectReactor" uncommented instead, my test passes.

    Can someone explain this behaviour and, more broadly, how to test CrawlerRunner or CrawlerProcess with pytest?

  Answered by wRAR · 1 year ago

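    If you use pytest-twisted, you need to tell it to install the asyncio reactor by passing --reactor=asyncio to your pytest command (for example, pytest --reactor=asyncio test_bookspider.py). Otherwise it installs the default reactor. See https://github.com/pytest-dev/pytest-twisted#using-the-plugin

    CrawlerRunner does not install a reactor itself. When TWISTED_REACTOR is set, it only checks that the reactor already installed matches the requested one. That is why the asyncio setting fails against the reactor pytest-twisted installed by default. The SelectReactor setting passes because it matches that default.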
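    The following is a simplified model of the check that produces the error in the question. It shows why the two settings behave differently. This is a sketch and not Scrapy's actual code. The helper names reactor_path and verify_reactor are hypothetical, and the sketch compares dotted paths where Scrapy compares the reactor class itself. It illustrates one point: the check only reports a mismatch and never installs anything.

```python
# Simplified model of the reactor check; NOT Scrapy's source code.
# reactor_path and verify_reactor are hypothetical names for illustration.


def reactor_path(reactor: object) -> str:
    """Return the dotted import path of the reactor's class."""
    cls = type(reactor)
    return f"{cls.__module__}.{cls.__name__}"


def verify_reactor(installed: object, requested_path: str) -> None:
    """Raise if the already-installed reactor differs from the requested one.

    Nothing is installed here: under pytest-twisted the reactor was chosen
    before the test ran, so a mismatch can only be reported, not fixed.
    """
    installed_path = reactor_path(installed)
    if installed_path != requested_path:
        raise Exception(
            f"The installed reactor ({installed_path}) does not match "
            f"the requested one ({requested_path})"
        )


# Stand-ins for real Twisted reactor classes (only module and name matter here).
SelectReactor = type("SelectReactor", (), {"__module__": "twisted.internet.selectreactor"})
AsyncioSelectorReactor = type(
    "AsyncioSelectorReactor", (), {"__module__": "twisted.internet.asyncioreactor"}
)

# In the question's setup, pytest-twisted's default reactor was SelectReactor.
installed = SelectReactor()
verify_reactor(installed, "twisted.internet.selectreactor.SelectReactor")  # passes
try:
    verify_reactor(installed, "twisted.internet.asyncioreactor.AsyncioSelectorReactor")
except Exception as exc:
    print(exc)  # same message as the error in the question
```

    Passing --reactor=asyncio changes which reactor class is installed before any test runs. With that flag, the same check passes for the asyncio setting.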

    How do I test CrawlerRunner or CrawlerProcess with pytest?

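    You shouldn't use CrawlerProcess in pytest tests, because it starts and stops the reactor itself, and a stopped Twisted reactor cannot be restarted in the same process. If you really need to test CrawlerProcess, write each test so that it makes a single CrawlerProcess call in its own separate process.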