
How to get hidden information (average review rating) from an Amazon product page? (web scraping)

  •  -2
  • ryy77  · Tech Community  · 7 years ago

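    How can I get the average star rating (for example, 4.3 out of 5 stars) for each product on this Amazon results page ( https://www.amazon.com/s/ref=sr_pg_1?fst=as%3Aoff&rh=n%3A1055398%2Cn%3A1063306%2Ck%3Aas&keywords=as&ie=UTF8&qid=1532070774 )? This is an Amazon product listing page. The problem occurs in the second try/except block. I have attached the code. I would appreciate your help. Thank you.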

    import csv
    from selenium import webdriver
    from bs4 import BeautifulSoup
    import requests
    from lxml import html
    import io
    
    links = [
        'https://www.amazon.com/s/ref=sr_pg_1?fst=as%3Aoff&rh=n%3A1055398%2Cn%3A1063306%2Ck%3Aas&keywords=as&ie=UTF8&qid=1532070774'
     ]
    proxies = {
        'http': 'http://218.50.2.102:8080',
        'https': 'http://185.93.3.123:8080'
    }
    
    chrome_options = webdriver.ChromeOptions()
    
    chrome_options.add_argument('--proxy-server="%s"' % ';'.join(['%s=%s' % (k, v) for k, v in proxies.items()]))
    
    driver = webdriver.Chrome(executable_path=r"C:\Users\Andrei-PC\Downloads\webdriver\chromedriver.exe",
                                  chrome_options=chrome_options)
    header = ['Product title', 'Product price', 'Review', 'ASIN']
    
    with open('csv/demo.csv', "w") as output:
        writer = csv.writer(output)
        writer.writerow(header)
    for i in range(len(links)):
    
        driver.get(links[i])
        for x in range(0,23):
            product_title = driver.find_elements_by_xpath('//li[@id="result_{}"]/div/div[3]/div/a'.format(x))
            title = [x.text for x in product_title]
    
            try:
                price = driver.find_element_by_xpath('//li[@id="result_{}"]/div/div[5]/div/a/span[2]'.format(x)).text
            except:
                price = 'No price v2'
                print('No price v2')
    
            try:
                review = driver.find_element_by_xpath('//li[@id="result_{}"]/div/div[6]/span'.format(x)).text()
    
            except:
                review = 'No review v1'
                print('No review v1')
    
            try:
                asin = driver.find_element_by_id('result_{}'.format(x)).get_attribute('data-asin')
    
            except:
                asin = 'No asin'
                print('No asin')
    
            try:
                data = [title[0], price, review, asin]
            except:
                print('no items v3 ')
            with io.open('csv/demo.csv', "a", newline="", encoding="utf-8") as output:
                writer = csv.writer(output)
                writer.writerow(data)
        print('I solved this link %s' % (links[i]))
        print('Number of product %s' % (i + 1))
    
    1 Answer  |  as of 7 years ago
  •  1
  •   Andersson    7 years ago

    You can get the rating with the following code:

    stars = driver.find_element_by_class_name('a-icon-alt').get_attribute('textContent')
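
    The line above returns the rating of the first matching element on the page only. To get one rating per product, the same lookup can be scoped to each result item. The sketch below is one possible way to do that, not a tested solution for the live page. It assumes the rating text looks like "4.3 out of 5 stars" and that each `result_N` list item contains an `a-icon-alt` element, as the answer suggests. The helper names `parse_star_rating` and `rating_for_result` are hypothetical, and `find_elements_by_class_name` is the Selenium 3 style call used elsewhere in this thread. Reading `textContent` matters here because, as far as I know, Selenium's `.text` returns only visible text, and this element appears to be visually hidden. Note also that `.text` is a property, so the `.text()` call in the question's second try block raises a `TypeError` that the bare `except` silently swallows.

```python
import re


def parse_star_rating(text):
    """Extract the numeric rating from text such as '4.3 out of 5 stars'.

    Returns a float, or None when the text does not look like a rating.
    The 'N out of M' wording is an assumption about the page content.
    """
    match = re.search(r'(\d+(?:\.\d+)?)\s+out of\s+\d+', text or '')
    return float(match.group(1)) if match else None


def rating_for_result(result_element):
    """Return the star rating found inside one search result, or None.

    `result_element` is expected to be the Selenium WebElement for a single
    li[@id="result_N"] item (hypothetical helper, Selenium 3 style API).
    """
    # Other icons in the same result may share this class name,
    # so take the first one whose text parses as a rating.
    for icon in result_element.find_elements_by_class_name('a-icon-alt'):
        # textContent is used because .text may be empty for hidden elements.
        rating = parse_star_rating(icon.get_attribute('textContent'))
        if rating is not None:
            return rating
    return None
```

    Inside the question's loop this could replace the second try block, for example `review = rating_for_result(driver.find_element_by_id('result_{}'.format(x)))`, with `None` standing in for 'No review v1'.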
    