
Scraping text from a span that also contains a nested span

beautifulsoup web-scraping python

Cats · 1 year ago

I want to get live weather data by web scraping, and I am thinking of using BeautifulSoup for this.

<span class="Column--precip--3JCDO">
  <span class="Accessibility--visuallyHidden--H7O4p">Chance of Rain</span>
  3%
</span>

I want to extract the 3% from this container. I have already managed to get another piece of data from the website with this snippet:

temp_value = soup.find("span", {"class":"CurrentConditions--tempValue--MHmYY"}).get_text(strip=True)

I tried the same thing for the rain forecast:

rain_forecast = soup.find("span", {"class": "Column--precip--3JCDO"}).get_text(strip=True)

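But the output my console gives for print(rain_forecast) is "--". The only difference I can see is that there is another span sitting inside the span I want the text from.

Another approach I came across on Stack Overflow is to use Selenium, on the theory that the data has not been loaded yet when the page is fetched, which would explain the "--" output.

But I don't know whether that is overkill for my application, or whether there is a simpler solution to this problem.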

2 answers | latest 1 year ago

Andrej Kesely · 1 year ago

If you want to get today's forecast table, you can use the following example:

import pandas as pd
import requests
from bs4 import BeautifulSoup

headers = {"User-Agent": "Mozilla/5.0"}

url = "https://weather.com/en-IN/weather/today/l/a0e0a5a98f7825e44d5b44b26d6f3c2e76a8d70e0426d099bff73e764af3087a"
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

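today_forecast = []
# Each <a> inside the table wrapper holds one time-of-day entry of the forecast card.
for a in soup.select(".TodayWeatherCard--TableWrapper--globn a"):
    # Collect the text of each direct child as one row (a list, so it can be inspected).
    today_forecast.append(
        [t.get_text(strip=True, separator=" ") for t in a.find_all(recursive=False)]
    )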

df = pd.DataFrame(
    today_forecast, columns=["Time of day", "Degrees", "Text", "Chance of rain"]
)

print(df)

Prints:

  Time of day Degrees                 Text          Chance of rain
0     Morning    11 °        Partly Cloudy                      --
1   Afternoon    20 °        Partly Cloudy                      --
2     Evening    14 °  Partly Cloudy Night  Rain Chance of Rain 3%
3   Overnight    10 °               Cloudy  Rain Chance of Rain 5%
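
The "Chance of rain" cells in that output still carry the icon and accessibility labels ("Rain Chance of Rain 3%"). If only the number is needed, one option is a small post-processing step. This is a minimal sketch, assuming the cells look like the strings printed above; parse_rain_chance is a hypothetical helper name, not part of the answer's code.

```python
import re

# Cell values copied from the "Chance of rain" column printed above.
cells = ["--", "--", "Rain Chance of Rain 3%", "Rain Chance of Rain 5%"]


def parse_rain_chance(cell):
    """Return the percentage as an int, or None when the cell shows no value."""
    match = re.search(r"(\d+)\s*%", cell)
    return int(match.group(1)) if match else None


chances = [parse_rain_chance(cell) for cell in cells]
print(chances)
```

With a DataFrame, the same helper could be applied to the column, for example df["Chance of rain"].map(parse_rain_chance).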


Chris · 1 year ago

from bs4 import BeautifulSoup

# Assuming you have your HTML content in 'html_content'
soup = BeautifulSoup(html_content, 'html.parser')

# Find the parent span and extract the text, excluding the nested span's text
rain_forecast = soup.find("span", {"class": "Column--precip--3JCDO"}).contents[-1].strip()

print(rain_forecast)
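
To try this without fetching the live page, here is a small self-contained sketch. The sample markup is an assumption modelled on the question's snippet plus a first row with no data (as the Morning row in the other answer suggests), and own_text is a hypothetical helper. It reads only the direct text nodes of each span, and it uses find_all() because find() returns just the first match, which may be a "--" row.

```python
from bs4 import BeautifulSoup

# Assumed sample markup: one span without data, one with a nested label span.
html_content = """
<span class="Column--precip--3JCDO">--</span>
<span class="Column--precip--3JCDO">
  <span class="Accessibility--visuallyHidden--H7O4p">Chance of Rain</span>
  3%
</span>
"""

soup = BeautifulSoup(html_content, "html.parser")


def own_text(tag):
    """Join only the tag's direct text nodes, skipping text inside nested tags."""
    return "".join(tag.find_all(string=True, recursive=False)).strip()


# Collect every matching span instead of only the first one.
spans = soup.find_all("span", class_="Column--precip--3JCDO")
rain_values = [own_text(span) for span in spans]
print(rain_values)
```

Compared with contents[-1], joining the direct text nodes does not depend on the text being the last child of the span.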
