While loop with Beautiful Soup and Python

  •  0
  • Kamikaze_goldfish  · 7 years ago

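    https://www.brightscope.com/ratings/a has ratings, and so do the other letters. Each letter after ratings/ (a, b, c, ...) has multiple pages. I am trying to create a while loop that goes to each page and, when a certain condition is present, collects all the hrefs (I have not gotten to that code yet). However, when I run the code, the while loop keeps running without stopping. How can I fix it so that it goes to each page, searches for the condition, and moves on to the next letter if the condition is not found? Before anyone asks, I have searched the page source and did not see any li tags.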

    https://www.brightscope.com/ratings/A/18 is the highest page number that exists for A, but the loop keeps running past it.

    import requests
    from bs4 import BeautifulSoup
    
    url = "https://www.brightscope.com/ratings/"
    page = requests.get(url)
    soup = BeautifulSoup(page.text, 'html.parser')
    hrefs = []
    ratings = []
    ks = []
    pages_scrape = []
    
    for href in soup.findAll('a'):
        if 'href' in href.attrs:
            hrefs.append(href.attrs['href'])
    for good_ratings in hrefs:
        if good_ratings.startswith('/ratings/'):
            ratings.append(url[:-9]+good_ratings)
    
    del ratings[0]
    del ratings[27:]
    count = 1
    # So it runs each letter a, b, c, ... 
    for each_rating in ratings:
        #Pulls the page
        page = requests.get(each_rating)
        #Does its soup thing
        soup = BeautifulSoup(page.text, 'html.parser')
        #Supposed to stay in A, B, C,... until it can't find the 'li' tag
        while soup.find('li'):
            page = requests.get(each_rating+str(count))
            print(page.url)
            count = count+1
            #Keeps running this and never breaks
        else:
            count = 1
            break
    
    2 answers  |  last activity 7 years ago
        1
  •  1
  •   leotrubach    7 years ago

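    BeautifulSoup's find() returns only the first matching <li> element. To go through all of them, you need to use the findAll() method and iterate over its results.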
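
As a small illustration of the difference, here is a sketch run against a made-up HTML fragment (the markup is an assumption for the example, not taken from the BrightScope pages). It uses find_all(), the current spelling of findAll() in Beautiful Soup 4.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment, invented for this illustration
html = "<ul><li>first</li><li>second</li><li>third</li></ul>"
soup = BeautifulSoup(html, "html.parser")

first_item = soup.find("li")      # a single Tag, or None if nothing matches
all_items = soup.find_all("li")   # a list-like ResultSet with every match

print(first_item.get_text())
for item in all_items:
    print(item.get_text())
```

Because find() returns None when there is no match, it can serve as a loop condition, but only if it is called on the page that was just fetched.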

        2
  •  0
  •   Deejpake    7 years ago

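    The result of soup.find('li') never changes, because the while loop only updates page and count. You need to build a new soup from each new page inside the loop, so that the condition is checked against the page that was just fetched: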

    while soup.find('li'):
        page = requests.get(each_rating+str(count))
        # Re-parse the newly fetched page so the loop condition sees it
        soup = BeautifulSoup(page.text, 'html.parser')
        print(page.url)
        count = count+1
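
The overall loop shape might be sketched as follows. This is only an illustration: FAKE_SITE, count_items and crawl are hypothetical names, and an in-memory dict stands in for requests.get() plus BeautifulSoup so that the sketch runs without network access. It also resets the page counter for every letter and uses break only to leave the inner loop. In the question's code, the break inside the while ... else clause exits the enclosing for loop, so it would stop after the first letter.

```python
# Hypothetical stand-in for the site: maps a URL to the number of <li>
# entries on that page. A real scraper would fetch and parse the page here.
FAKE_SITE = {
    "/ratings/A/1": 3,
    "/ratings/A/2": 1,
    "/ratings/B/1": 2,
}


def count_items(url):
    """Return how many list items the page at `url` has (0 if no such page)."""
    return FAKE_SITE.get(url, 0)


def crawl(letter_urls):
    """Visit page 1, 2, 3, ... of every letter until a page has no items."""
    visited = []
    for base in letter_urls:
        page_number = 1                  # reset for every letter
        while True:
            url = base + str(page_number)
            if count_items(url) == 0:    # checked against the page just fetched
                break                    # leaves only the while loop
            visited.append(url)
            page_number += 1
    return visited


print(crawl(["/ratings/A/", "/ratings/B/"]))
```

Whether a page past the last one really contains no li elements on the live site is an assumption here and should be checked against the actual page source.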
    

    Hope this helps.