Scraping data using span title and span class

  • Litmon  · 7 years ago

    I am using Python (Anaconda) to scrape data into an Excel sheet. I am running into trouble with two websites.

    Site 1

    <div id="ember3815" class="ember-view">
    <p class="org-top-card-module__company-descriptions Sans-15px-black-55%">
    <span class="company-industries org-top-card-module__dot-separated-list">
      Industry
    </span>
    <span class="org-top-card-module__location org-top-card-module__dot-separated-list">
      City, State
    </span>
    <span title="62,346 followers" class="org-top-card-module__followers-count org-top-card-module__dot-separated-list">
      62,346 followers
    </span>
    

    I am trying to pull the span title. Things I have tried:

    text = soup.find('span',{'class':"company-industries org-top-card-module__dot-separated-list"})
    
    text = soup.find('p',{'class':"org-top-card-module__company-descriptions Sans-15px-black-55%"})
    
    text = soup.body.find('span', attrs={'class': 'org-top-card-module__location org-top-card-module__dot-separated-list'})
    
    text = soup.find('span',{'class': 'org-top-card-module__location org-top-card-module__dot-separated-list'})
    

    Site 2

    I need to extract the value 8052 from the HTML below.

    <section class="zwlfE">
    <div class="nZSzR">...</div>
    <ul class="k9GMp ">
    <li class="Y8-fY ">...</li>
    <li class="Y8-fY ">
    <a class="g47SY " title="8,052">8,052</span>" followers"
    </a>
    </li>
    <li class="Y8-fY ">...</li>
    </ul>
    <div class="-vDIg">...</div>
    </section>
    

    I tried:

    • Similar approaches to the above, but with div and li tags

    None of the things I tried returned the value I was after.

    Can anyone help?

    1 answer  |  7 years ago

  • Rakesh  · 7 years ago

    To get the span title:

    from bs4 import BeautifulSoup
    html ="""<div id="ember3815" class="ember-view">
    <p class="org-top-card-module__company-descriptions Sans-15px-black-55%">
    <span class="company-industries org-top-card-module__dot-separated-list">
      Industry
    </span>
    <span class="org-top-card-module__location org-top-card-module__dot-separated-list">
      City, State
    </span>
    <span title="62,346 followers" class="org-top-card-module__followers-count org-top-card-module__dot-separated-list">
      62,346 followers
    </span>"""
    
    soup = BeautifulSoup(html, "html.parser")
    # Find the followers span by its class attribute, then read its title attribute
    print(soup.find("span", class_="org-top-card-module__followers-count org-top-card-module__dot-separated-list")["title"])
    

    Output:

    62,346 followers
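Passing the full two-class string to class_ only matches when the class attribute contains exactly that string in that order. As a sketch of a slightly more tolerant variant, you can search on a single class, which matches any element whose class list contains it. The HTML here is a shortened copy of the site 1 snippet, and the None checks are an addition in case the live page does not contain the element:

```python
from bs4 import BeautifulSoup

# Shortened copy of the site 1 HTML from the question
html = """<p class="org-top-card-module__company-descriptions">
<span class="company-industries org-top-card-module__dot-separated-list">
  Industry
</span>
<span title="62,346 followers" class="org-top-card-module__followers-count org-top-card-module__dot-separated-list">
  62,346 followers
</span>
</p>"""

soup = BeautifulSoup(html, "html.parser")

# A single class name matches any tag whose class list contains it
followers_span = soup.find("span", class_="org-top-card-module__followers-count")
industry_span = soup.find("span", class_="company-industries")

# find() returns None when nothing matches, so guard before indexing
followers_title = followers_span["title"] if followers_span is not None else None
industry_text = industry_span.get_text(strip=True) if industry_span is not None else None

print(followers_title)
print(industry_text)
```

If both values come back as None on the real page, the HTML you downloaded probably differs from what the browser inspector shows, so it is worth printing the fetched HTML and checking that the span is actually in it.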
    

    For site 2:

    # Here soup must be built from the site 2 HTML, not the site 1 snippet above
    print(soup.find("a", class_="g47SY")["title"])
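Since the question asks for the value 8052 rather than the string "8,052", here is a minimal self-contained sketch for site 2. The HTML is a simplified, well-formed version of the snippet in the question (an assumption about what the real markup looks like), and the variable names are only illustrative:

```python
from bs4 import BeautifulSoup

# Simplified version of the site 2 HTML from the question
html = """<ul class="k9GMp ">
<li class="Y8-fY ">
<a class="g47SY " title="8,052">8,052 followers</a>
</li>
</ul>"""

soup = BeautifulSoup(html, "html.parser")

# The trailing space inside class="g47SY " is ignored when bs4 splits class names
tag = soup.find("a", class_="g47SY")

# Strip the thousands separator before converting the title to an integer
followers = int(tag["title"].replace(",", "")) if tag is not None else None

print(followers)
```

The integer can then be written to the Excel sheet directly, without any further cleanup of the cell value.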