
Web scraping an interactive chart in Python using Beautiful Soup with a loop

  •  0
  • EricA  ·  7 years ago

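    The code below prints the value of every number tag on the page. Can I apply a filter so that the values are extracted once per region?

    For example, at https://opensignal.com/reports/2019/04/uk/mobile-network-experience I am only interested in the numbers for each region under the "Regional analysis" tab.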

    import requests
    from bs4 import BeautifulSoup

    html = requests.get("https://opensignal.com/reports/2019/04/uk/mobile-network-experience").text
    soup = BeautifulSoup(html, 'html.parser')

    # Every bar on the page, regardless of tab or region
    items = soup.find_all('div', class_='c-ru-graph__rect')

    for item in items:
        provider = item.find('span', class_='c-ru-graph__label').text
        prodvalue = item.find_next_sibling('span').find('span', class_='c-ru-graph__number').text
        print(provider + " : " + prodvalue)
    

    I would like a table or DataFrame like the following, for example for the Eastern region:

                                O2    Vodafone      3      EE
    4G Availability             82        76.9   73.0    89.2
    Upload Speed Experience    5.6         5.9    6.8     9.5
    

    Any pointers that would help me get this result?

        1
  •  1
  •   QHarr    7 years ago

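    Here is how I would do it for all regions. It requires bs4 4.7.1, because the selector uses the :has() pseudo-class. As far as I can see, you have to assume the companies appear in the same order in every chart.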

    import requests
    from bs4 import BeautifulSoup
    import pandas as pd

    r = requests.get("https://opensignal.com/reports/2019/04/uk/mobile-network-experience")
    soup = BeautifulSoup(r.content, 'lxml')  # use 'html.parser' if lxml is not installed
    metrics = ['4g-availability', 'video-experience', 'download-speed', 'upload-speed', 'latency']
    headers = ['O2', 'Vodafone', '3', 'EE']  # assumed to match the order of the bars in every chart
    results = []

    for region in soup.select('.s-regional-analysis__region'):
        for metric in metrics:
            # Numbers inside the chart that contains an element tagged with this metric
            selector = '.c-ru-chart:has([data-metric="' + metric + '"]) .c-ru-graph__number'
            providers = [item.text for item in region.select(selector)]
            row = dict(zip(headers, providers))
            row['data-metric'] = metric
            row['region'] = region['id']
            results.append(row)

    df = pd.DataFrame(results, columns=['region', 'data-metric'] + headers)
    print(df)
    

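
    If only one region is wanted, in the wide layout from the question, the long-format frame built above can be filtered and re-indexed. This is a minimal sketch and not part of the original answer: the rows are typed in by hand from the Eastern values quoted in this thread, and it assumes the region's id attribute is "eastern".

```python
import pandas as pd

providers = ["O2", "Vodafone", "3", "EE"]

# Hand-typed stand-in for the scraped `results` list (values as strings,
# which is how they come out of item.text).
results = [
    {"region": "eastern", "data-metric": "4g-availability",
     "O2": "82.0", "Vodafone": "76.9", "3": "73.0", "EE": "89.2"},
    {"region": "eastern", "data-metric": "upload-speed",
     "O2": "5.6", "Vodafone": "5.9", "3": "6.8", "EE": "9.5"},
]
df = pd.DataFrame(results, columns=["region", "data-metric"] + providers)

# Convert the provider columns from text to numbers.
df[providers] = df[providers].apply(pd.to_numeric)

# Keep one region and use the metric name as the row label.
eastern = df[df["region"] == "eastern"].set_index("data-metric")[providers]
print(eastern)
```

    Converting with pd.to_numeric first matters if the values will be compared or plotted, since the scraped text is otherwise kept as strings.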

        2
  •  1
  •   sentence    7 years ago

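    Assuming the order of the companies is fixed (which it is), you can simply narrow the search down to the divs that contain only the information you need.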

    import requests
    import pandas as pd
    from bs4 import BeautifulSoup

    html = requests.get("https://opensignal.com/reports/2019/04/uk/mobile-network-experience").text
    soup = BeautifulSoup(html, 'html.parser')

    # Restrict the search to the Eastern region
    res = soup.find_all('div', {'id': 'eastern'})

    # Chart and display name for each metric of interest
    aval = res[0].find_all('div', {'data-chart-name': '4g-availability'})
    avalname = aval[0].find('span', {'class': 'js-metric-name'}).text

    upload = res[0].find_all('div', {'data-chart-name': 'upload-speed'})
    uploadname = upload[0].find('span', {'class': 'js-metric-name'}).text

    # Company labels, read once from the first chart
    companies = [i.text for i in aval[0].find_all('span', class_='c-ru-graph__label')]

    row1 = [i.text for i in aval[0].find_all('span', class_='c-ru-graph__number')]
    row2 = [i.text for i in upload[0].find_all('span', class_='c-ru-graph__number')]

    # One column per metric, indexed by company, then transposed so metrics become rows
    df = pd.DataFrame({avalname: row1, uploadname: row2}, index=companies).T
    print(df)
    

    Output

                              O2    Vodafone      3      EE
    4G Availability         82.0        76.9   73.0    89.2
    Upload Speed Experience  5.6         5.9    6.8     9.5
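
    If relying on a fixed company order is a concern, each label can be read together with its own number, using the sibling lookup from the question but scoped to one region and one chart. A minimal sketch on a hand-written HTML fragment: the markup is an assumption modeled on the class names and attributes used in this thread, not a copy of the real page, and read_chart is a hypothetical helper.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: structure assumed from the selectors in this thread.
html = """
<div id="eastern">
  <div data-chart-name="4g-availability">
    <div class="c-ru-graph__rect"><span class="c-ru-graph__label">O2</span></div>
    <span><span class="c-ru-graph__number">82.0</span></span>
    <div class="c-ru-graph__rect"><span class="c-ru-graph__label">EE</span></div>
    <span><span class="c-ru-graph__number">89.2</span></span>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")


def read_chart(soup, region_id, chart_name):
    """Return {company label: value} for one chart inside one region."""
    region = soup.find("div", id=region_id)
    chart = region.find("div", attrs={"data-chart-name": chart_name})
    values = {}
    for rect in chart.find_all("div", class_="c-ru-graph__rect"):
        label = rect.find("span", class_="c-ru-graph__label").get_text(strip=True)
        # The number sits in the span that follows the bar, as in the question's code.
        number = rect.find_next_sibling("span").find("span", class_="c-ru-graph__number")
        values[label] = float(number.get_text(strip=True))
    return values


print(read_chart(soup, "eastern", "4g-availability"))
# {'O2': 82.0, 'EE': 89.2}
```

    Because every value is keyed by the label found next to it, a chart that listed the companies in a different order would still produce correctly labeled results.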