I am trying to parse the HTML of this website.
I want to get the text of the span elements with class = "post-subject".
Example:
    <span class="post-subject">Set of 20 moving boxes (20009 or 20011)</span>
    <span class="post-subject">Firestick/Old xbox games</span>
When I run the code below, soup.find() returns None. I'm not sure what is going on.
    import requests
    from bs4 import BeautifulSoup

    page = requests.get('https://trashnothing.com/washington-dc-freecycle?page=1')
    soup = BeautifulSoup(page.text, 'html.parser')
    soup.find('span', {'class': 'post-subject'})
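To help you get started, the following should load the page. You will need the correct geckodriver, and then you can do this with Selenium. I don't see a post-subject class on the page you linked, but you can automate clicking the login button like this: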
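    from selenium.webdriver.common.by import By

    # find_element_by_id was removed in Selenium 4.3; find_element(By.ID, ...) replaces it
    availbutton = driver.find_element(By.ID, 'buttonAvailability_1')
    availbutton.click()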
    from bs4 import BeautifulSoup
    from selenium import webdriver

    driver = webdriver.Firefox()
    driver.get('https://trashnothing.com/washington-dc-freecycle?page=1')
    html = driver.page_source
    soup = BeautifulSoup(html, 'lxml')
    print(soup.find('span', {'class': 'post-subject'}))
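
If the spans are only added to the page after its JavaScript has run (an assumption on my part, I have not confirmed it for this site), reading page_source straight after get() can still be too early. Below is a rough sketch of waiting for the first matching span and then collecting every subject instead of only the first. The names extract_post_subjects, fetch_rendered_html and sample_html are made up for illustration, and the 10-second timeout is arbitrary.

```python
from bs4 import BeautifulSoup


def extract_post_subjects(html):
    """Return the text of every <span class="post-subject"> in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches any span whose class list contains "post-subject"
    spans = soup.find_all("span", class_="post-subject")
    return [span.get_text(strip=True) for span in spans]


def fetch_rendered_html(url, timeout=10):
    """Hypothetical helper: load url in Firefox and return the DOM once a
    post-subject span is present."""
    # Imported here so the parsing helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Raises TimeoutException if no matching span appears within `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "span.post-subject"))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()


# Offline check using the two spans quoted in the question.
sample_html = (
    '<span class="post-subject">Set of 20 moving boxes (20009 or 20011)</span> '
    '<span class="post-subject">Firestick/Old xbox games</span>'
)
subjects = extract_post_subjects(sample_html)
print(subjects)
```

Against the live site you would call extract_post_subjects(fetch_rendered_html('https://trashnothing.com/washington-dc-freecycle?page=1')). If the wait times out, the spans are probably behind the login or use a different class name, so it is worth inspecting the rendered page in the browser first.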