
Extract numbers from a website using BeautifulSoup?

extract beautifulsoup python

wen tian · Tech Community · 7 years ago

The following Python code:

from bs4 import BeautifulSoup
div = '<div class="hm"><span class="xg1">查看:</span> 15660<span class="pipe">|</span><span class="xg1">回复:</span> 435</div>'
soup = BeautifulSoup(div, "lxml")
hm = soup.find("div", {"class": "hm"})
print(hm)  # prints the whole <div> element, tags included, not just the numbers

In this case, I need the two numbers as output:

15660
435

I'm trying to extract these numbers from a website using BeautifulSoup, but I don't know how to do it.

1 answer | last activity 7 years ago

cs95 abhishek58g 7 years ago

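Call soup.find_all with a regular expression that matches text nodes containing a standalone number (string= is the current name of this argument; before Beautiful Soup 4.4.0 it was called text=):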

>>> import re
>>> list(map(str.strip, soup.find_all(string=re.compile(r'\b\d+\b'))))

or

>>> [x.strip() for x in soup.find_all(string=re.compile(r'\b\d+\b'))]

['15660', '435']
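The REPL snippets reuse the soup object from the question. Below is a self-contained sketch of the same regex approach; it assumes the built-in "html.parser" backend instead of lxml (so only bs4 needs to be installed) and searches only inside the div.hm element rather than the whole document, which helps if the real page contains other numbers.

```python
import re
from bs4 import BeautifulSoup

html = ('<div class="hm"><span class="xg1">查看:</span> 15660'
        '<span class="pipe">|</span><span class="xg1">回复:</span> 435</div>')

# Assumption: html.parser instead of lxml; results are the same for this snippet
soup = BeautifulSoup(html, "html.parser")
hm = soup.find("div", {"class": "hm"})

# Search only the text nodes inside div.hm that contain a standalone number,
# then strip the leading space each of them carries
numbers = [s.strip() for s in hm.find_all(string=re.compile(r"\b\d+\b"))]
print(numbers)  # ['15660', '435']
```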

If you need integers instead of strings, call int inside the list comprehension:

>>> [int(x.strip()) for x in soup.find_all(string=re.compile(r'\b\d+\b'))]
[15660, 435]
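A regex-free alternative, not part of the original answer: iterate over hm.stripped_strings, which yields each text node with surrounding whitespace removed, and keep only the pieces made entirely of decimal digits. This sketch assumes each number sits in its own text node, as in the sample HTML; a text node such as "15660 views" would be skipped, and the regex approach handles that case better.

```python
from bs4 import BeautifulSoup

html = ('<div class="hm"><span class="xg1">查看:</span> 15660'
        '<span class="pipe">|</span><span class="xg1">回复:</span> 435</div>')

soup = BeautifulSoup(html, "html.parser")
hm = soup.find("div", class_="hm")

# stripped_strings yields "查看:", "15660", "|", "回复:", "435";
# isdecimal() keeps only the all-digit pieces, which int() can always parse
counts = [int(s) for s in hm.stripped_strings if s.isdecimal()]
print(counts)  # [15660, 435]
```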
