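With BeautifulSoup this is relatively straightforward: the idea is to locate the "milestone" elements by class and text, then read the text node that follows each one with .next_sibling: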
from bs4 import BeautifulSoup
data = """
<div>
<span class="milestone">Announcement:</span>
" 2 April 2000 "
<br>
<span class="milestone">Ground Breaking:</span>
" 23 February 2002 "
<br>
</div>"""
soup = BeautifulSoup(data, "html.parser")
# string= is the current name for the older text= argument (Beautiful Soup 4.4.0+)
print(soup.find(class_="milestone", string="Announcement:").next_sibling.strip())
print(soup.find(class_="milestone", string="Ground Breaking:").next_sibling.strip())
Prints:
" 2 April 2000 "
" 23 February 2002 "