Web scraping: URLs for some pages are missing

python-requests beautifulsoup web-scraping

Sayeed · Tech Community · 7 years ago

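I am trying to scrape the first 9 pages of a website, but pages 5 and 7 seem to be missing. This makes Python raise an AttributeError. I think an if statement could solve this, but I can't figure out what condition to put in it. Here is my code: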

import requests
from bs4 import BeautifulSoup
base_url="http://cbcs.fastvturesults.com/student/1sp15me00"
for page in range(1,10,1):
    r=requests.get(base_url+str(page))
    c=r.content
    soup=BeautifulSoup(c,"html.parser")
    items=soup.find(class_="text-muted")
    if ??????????:
        pass
    else:
        print("{}\n{}".format(items.previous_sibling,items.text))

2 answers | latest 7 years ago

SIM · 7 years ago

You don't need an else block here. Checking that items is not None is enough. Try the following:

items = soup.find(class_="text-muted")
if items:
    print("{}\n{}".format(items.previous_sibling,items.text))
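To see why the check works, here is a minimal offline sketch, assuming only that soup.find returns None when nothing matches. The two HTML strings and the helper name describe are made up for illustration; they are not taken from the real site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a page that exists and one that does not.
PAGE_WITH_RESULT = '<p>Example Name<span class="text-muted">EXAMPLE-ID</span></p>'
PAGE_WITHOUT_RESULT = "<p>Page not found</p>"

def describe(html):
    """Return the formatted text for a page, or None if the element is absent."""
    soup = BeautifulSoup(html, "html.parser")
    items = soup.find(class_="text-muted")  # None when no element has this class
    if items is not None:
        return "{}\n{}".format(items.previous_sibling, items.text)
    return None

print(describe(PAGE_WITH_RESULT))
print(describe(PAGE_WITHOUT_RESULT))
```

The second call returns None instead of raising AttributeError, because items.previous_sibling is never touched when find comes back empty.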

nmog · 7 years ago

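You get the AttributeError when you try to access an attribute of items while items is None. That happens when BeautifulSoup cannot find an element matching class_="text-muted".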

Solution:

if not items:
    continue

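Note that pass (from your code) is a no-op: it does nothing, and execution simply carries on with the next statement in the loop body. continue ends the current iteration and moves on to the next one.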
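The difference between pass and continue can be sketched without any network access. The function name visit and the sample list are hypothetical; None stands in for a page where nothing was found.

```python
def visit(values, use_continue):
    """Collect the values that reach the end of the loop body."""
    seen = []
    for value in values:
        if value is None:
            if use_continue:
                continue  # skip the rest of this iteration
            else:
                pass      # do nothing; execution falls through to the append
        seen.append(value)
    return seen

print(visit([1, None, 3], use_continue=True))
print(visit([1, None, 3], use_continue=False))
```

With continue the None entry never reaches the append; with pass it does, which is why pass on its own would not prevent the AttributeError unless the print stays inside an else block.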
