
Getting href links from a table column in Python using BeautifulSoup

  •  0
  • Ehsan Akbar  ·  asked 3 years ago

    I have this data, as you can see:

    [<td><span></span></td>, <td><span></span></td>, <td><a class="cmc-link" href="/currencies/renbtc/"><span class="circle"></span><span>renBTC</span><span class="crypto-symbol">RENBTC</span></a></td>, <td><span>$<!-- -->61947.68</span></td>, <td><span></span></td>]
    

    I want to extract the href link; as you can see here, it is /currencies/renbtc/

    Here is my code:

    from bs4 import BeautifulSoup
    import requests
    try:
        r = requests.get('https://coinmarketcap.com/')
        soup = BeautifulSoup(r.text, 'lxml')
    
        
        table = soup.find('table', class_='cmc-table')
        for row in table.tbody.find_all('tr'):    
            # Find all data for each column
             columns = row.find_all('td')
             print(columns)
             
    except requests.exceptions.RequestException as e:
        print(e)
    

    2 answers  |  3 years ago
        1
  •  2
  •   Matiiss    3 years ago

    Iterate over the <td> cells, check whether each <td> contains an <a> (if td.a), and if so call .get('href') on td.a:

    from bs4 import BeautifulSoup
    import requests
    try:
        r = requests.get('https://coinmarketcap.com/')
        soup = BeautifulSoup(r.text, 'lxml')
    
        table = soup.find('table', class_='cmc-table')
    
        for row in table.tbody.find_all('tr'):
            # Find all data for each column
            columns = row.find_all('td')
            for td in columns:
                if td.a:
                    print(td.a.get('href'))
                    # theoretically, for performance, you could `break` here
                    # to stop the loop if you expect only one anchor link per `td`
    
    except requests.exceptions.RequestException as e:
        print(e)
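
    The same check works without a network call. A minimal sketch against the row data quoted in the question:

```python
from bs4 import BeautifulSoup

# The <td> cells quoted in the question, joined into one static row
# (no request needed for this demonstration)
html = ('<tr><td><span></span></td><td><span></span></td>'
        '<td><a class="cmc-link" href="/currencies/renbtc/">'
        '<span>renBTC</span></a></td>'
        '<td><span>$61947.68</span></td><td><span></span></td></tr>')
row = BeautifulSoup(html, 'html.parser')

hrefs = []
for td in row.find_all('td'):
    if td.a:  # only the third cell contains an anchor
        hrefs.append(td.a.get('href'))

print(hrefs)  # ['/currencies/renbtc/']
```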
    
        2
  •  1
  •   HedgeHog    3 years ago

    Since there is only one <a> per row, select it and get its href:

    link = columns[2].a['href']
    

    Example

    from bs4 import BeautifulSoup
    import requests
    try:
        r = requests.get('https://coinmarketcap.com/')
        soup = BeautifulSoup(r.text, 'lxml')
    
        
        table = soup.find('table', class_='cmc-table')
        for row in table.tbody.find_all('tr'):    
            # Find all data for each column
             columns = row.find_all('td')
             link = columns[2].a['href']
             print(link)
             
    except requests.exceptions.RequestException as e:
        print(e)
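
    Note that columns[2].a assumes the link always sits in the third column and that every row has one. A defensive variant (a sketch, using a CSS selector and static stand-in rows rather than the live site) avoids the hard-coded index and skips rows without a link:

```python
from bs4 import BeautifulSoup

# Minimal static rows standing in for the live table (hypothetical data);
# the second row deliberately has no anchor
html = """
<table class="cmc-table"><tbody>
<tr><td><a class="cmc-link" href="/currencies/renbtc/"><span>renBTC</span></a></td></tr>
<tr><td><span>no link here</span></td></tr>
</tbody></table>
"""
soup = BeautifulSoup(html, 'html.parser')

# Select only anchors that actually carry an href, wherever they sit in the row
links = [a['href'] for a in soup.select('table.cmc-table tbody tr a[href]')]
print(links)  # ['/currencies/renbtc/']
```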