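The tables you see on this site are stored inside HTML comments (<!-- ... -->), so BeautifulSoup normally doesn't see them as tags. To parse them, pull the comments out and parse their contents as a second document. Try this example: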
import requests
from bs4 import BeautifulSoup, Comment

url = "https://www.baseball-reference.com/register/league.cgi?id=c346199a"
soup = BeautifulSoup(requests.get(url).text, features="html.parser")

# Collect the HTML comments that hold the hidden tables
s = "".join(
    c
    for c in soup.find_all(string=lambda t: isinstance(t, Comment))
    if "table_container" in c
)

# Parse the commented-out markup as ordinary HTML
soup = BeautifulSoup(s, "html.parser")

# Print each team name with its link
for a in soup.select('[href*="/register/team.cgi?id="]'):
    print("{:<30} {}".format(a.text, a["href"]))
Prints:
Battle Creek Bombers           /register/team.cgi?id=f3c4b615
Kenosha Kingfish               /register/team.cgi?id=71fe19cd
Kokomo Jackrabbits             /register/team.cgi?id=8f1a41fc
Rockford Rivets                /register/team.cgi?id=9f4fe2ef
Traverse City Pit Spitters     /register/team.cgi?id=7bc8d111
Kalamazoo Growlers             /register/team.cgi?id=9995d2a1
Fond du Lac Dock Spiders       /register/team.cgi?id=02911efc
...and so on.
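
If you want to check the idea without hitting the site, here is a minimal offline sketch of the same technique. The sample HTML, the team names and ids in it, and the helper names `extract_commented_html` and `team_links` are all made up for illustration. Only the `table_container` marker and the link pattern come from the example above.

```python
from bs4 import BeautifulSoup, Comment

# Made-up sample page: the table is wrapped in an HTML comment,
# which is the situation described above.
SAMPLE_HTML = """
<div id="all_teams">
<!--
<div class="table_container">
<table>
<tr><td><a href="/register/team.cgi?id=aaaa1111">Example Team A</a></td></tr>
<tr><td><a href="/register/team.cgi?id=bbbb2222">Example Team B</a></td></tr>
</table>
</div>
-->
</div>
<!-- an unrelated comment -->
"""


def extract_commented_html(html, marker):
    """Join the text of every HTML comment that contains `marker` (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    comments = soup.find_all(string=lambda t: isinstance(t, Comment))
    return "".join(c for c in comments if marker in c)


def team_links(html):
    """Return (name, href) pairs for team links hidden inside comments (hypothetical helper)."""
    hidden = BeautifulSoup(
        extract_commented_html(html, "table_container"), "html.parser"
    )
    return [
        (a.get_text(strip=True), a["href"])
        for a in hidden.select('[href*="/register/team.cgi?id="]')
    ]


# A plain parse finds no links, because they sit inside a comment
visible_links = BeautifulSoup(SAMPLE_HTML, "html.parser").select("a")

links = team_links(SAMPLE_HTML)
for name, href in links:
    print("{:<30} {}".format(name, href))
```

Filtering on the marker string keeps unrelated comments out of the second parse. If the real page uses a different wrapper class, the marker would need to change accordingly.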