
Scrapy: trying to get the href of each business name in Python

  •  1
  • Kamikaze_goldfish  · Tech Community  · 7 years ago

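    I am trying to get the href for each business listed on Yellow Pages. I am very new to Scrapy and this is only my second day with it. I use requests to build the actual URL that the spider should search. What is wrong with my code? Eventually I want Scrapy to visit each business page and collect the address and other information.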

    # -*- coding: utf-8 -*-
    import scrapy
    import requests
    
    search = "Plumbers"
    location = "Hammond, LA"
    url = "https://www.yellowpages.com/search"
    q = {'search_terms': search, 'geo_location_terms': location}
    page = requests.get(url, params=q)
    page = page.url
    
    class YellowpagesSpider(scrapy.Spider):
        name = 'quotes'
        allowed_domains = ['yellowpages.com']
        start_urls = [page]
    
        def parse(self, response):
            self.log("I just visited: " + response.url)
            items = response.css('span.text::text')
            for items in items:
                print(items)
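
    A side note on the snippet above: `requests.get` is used only to obtain the final URL, which costs one extra HTTP request every time the module is imported. The same URL can be built offline with the standard library. This is a minimal sketch; `build_search_url` and `BASE_URL` are hypothetical names chosen for illustration and are not part of the original code.

```python
from urllib.parse import urlencode

# Search endpoint taken from the question's code.
BASE_URL = "https://www.yellowpages.com/search"


def build_search_url(search_terms, location):
    """Return the search URL without sending an HTTP request."""
    # urlencode escapes the comma and turns the space into '+'.
    query = urlencode({
        "search_terms": search_terms,
        "geo_location_terms": location,
    })
    return f"{BASE_URL}?{query}"


start_url = build_search_url("Plumbers", "Hammond, LA")
print(start_url)
```

    The resulting string could then be placed in `start_urls` in place of `page`.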
    
    1 answer  |  most recent 7 years ago
  •  2
  •   Thomas Strub    7 years ago

    To get the names, use:

    response.css('a[class=business-name]::text')
    

    To get the hrefs, use:

    response.css('a[class=business-name]::attr(href)')
    

    Put together inside parse(), this looks like:

        for bas in response.css('a[class=business-name]'):
            item = { 'name' : bas.css('a[class=business-name]::text').extract_first(),
                      'url' : bas.css('a[class=business-name]::attr(href)').extract_first() }
            yield item
    

    Result:

    2018-09-13 04:12:49 [quotes] DEBUG: I just visited: https://www.yellowpages.com/search?search_terms=Plumbers&geo_location_terms=Hammond%2C+LA
    2018-09-13 04:12:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.yellowpages.com/search?search_terms=Plumbers&geo_location_terms=Hammond%2C+LA>
    {'name': 'Roto-Rooter Plumbing & Water Cleanup', 'url': '/new-orleans-la/mip/roto-rooter-plumbing-water-cleanup-21804163?lid=149760174'}
    2018-09-13 04:12:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.yellowpages.com/search?search_terms=Plumbers&geo_location_terms=Hammond%2C+LA>
    {'name': "AJ's Plumbing And Heating Inc", 'url': '/new-orleans-la/mip/ajs-plumbing-and-heating-inc-16078566?lid=1001789407686'}
    ...
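
    The `url` values in the output above are relative paths, so they need to be joined with the page URL before each business page can be visited. The sketch below shows that step with the standard library only, using one href copied from the log output. Inside a spider, `response.urljoin(href)` performs the same join, and `response.follow(href, callback=...)` builds the follow-up request directly; the callback that would collect the address is not shown here and would be an assumption about the page layout.

```python
from urllib.parse import urljoin

# URL of the search results page, as printed in the log above.
page_url = (
    "https://www.yellowpages.com/search"
    "?search_terms=Plumbers&geo_location_terms=Hammond%2C+LA"
)

# Relative href scraped from one a.business-name link in the output above.
href = "/new-orleans-la/mip/roto-rooter-plumbing-water-cleanup-21804163?lid=149760174"

# A root-relative href replaces the path and query of the page URL.
detail_url = urljoin(page_url, href)
print(detail_url)
```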