
Getting the second srcset attribute with Beautiful Soup

  • buzz  · Tech Community  · 2 weeks ago

    I am trying to get the second srcset attribute with Beautiful Soup. The original HTML is:

    <picture class="card-picture ratio ratio-4x3">
    <source srcset="/shop/media/L004D000_picture.PNG?context=bWFzdGVyfGltYWdlc3wzMDE3NTN8aW1hZ2UvcG5nfGgwMS9oMjcvODg0ODIyMDYxODc4Mi9MMDA0RDAwMF9waWN0dXJlLlBOR3wyZjRiZWE1NDU2MWU1MjUzMzU5MjAwNGVlYmIzY2MwNGQzODExMDI3NjNkMDE3YjQ4NGMwNjFlMGVkNTU2OWIy&amp;rmode=pad&amp;width=640&amp;rmode=pad&amp;width=640&amp;format=webp" type="image/webp"/>
    <source srcset="/shop/media/L004D000_picture.PNG?context=bWFzdGVyfGltYWdlc3wzMDE3NTN8aW1hZ2UvcG5nfGgwMS9oMjcvODg0ODIyMDYxODc4Mi9MMDA0RDAwMF9waWN0dXJlLlBOR3wyZjRiZWE1NDU2MWU1MjUzMzU5MjAwNGVlYmIzY2MwNGQzODExMDI3NjNkMDE3YjQ4NGMwNjFlMGVkNTU2OWIy&amp;rmode=pad&amp;width=640&amp;rmode=pad&amp;width=640" type="image/jpeg"/>
    <img alt="" class="card-img object-fit-contain is-contain" loading="lazy" src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7">
    </img>
    </picture>
    

    My code:

    for result in results:
        imgel = result.find("source", attrs = {'srcset' : True})['srcset']
    

    This returns the first srcset value. I want the second value, the PNG URL.

    1 answer  |  last activity 2 weeks ago
  •   Andrej Kesely    2 weeks ago

    Just select all <source> tags and use normal indexing:

    from bs4 import BeautifulSoup
    
    html_source = """\
    <picture class="card-picture ratio ratio-4x3">
    <source srcset="/shop/media/L004D000_picture.PNG?context=bWFzdGVyfGltYWdlc3wzMDE3NTN8aW1hZ2UvcG5nfGgwMS9oMjcvODg0ODIyMDYxODc4Mi9MMDA0RDAwMF9waWN0dXJlLlBOR3wyZjRiZWE1NDU2MWU1MjUzMzU5MjAwNGVlYmIzY2MwNGQzODExMDI3NjNkMDE3YjQ4NGMwNjFlMGVkNTU2OWIy&amp;rmode=pad&amp;width=640&amp;rmode=pad&amp;width=640&amp;format=webp" type="image/webp"/>
    <source srcset="/shop/media/L004D000_picture.PNG?context=bWFzdGVyfGltYWdlc3wzMDE3NTN8aW1hZ2UvcG5nfGgwMS9oMjcvODg0ODIyMDYxODc4Mi9MMDA0RDAwMF9waWN0dXJlLlBOR3wyZjRiZWE1NDU2MWU1MjUzMzU5MjAwNGVlYmIzY2MwNGQzODExMDI3NjNkMDE3YjQ4NGMwNjFlMGVkNTU2OWIy&amp;rmode=pad&amp;width=640&amp;rmode=pad&amp;width=640" type="image/jpeg"/>
    <img alt="" class="card-img object-fit-contain is-contain" loading="lazy" src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7">
    </img>
    </picture>"""
    
    soup = BeautifulSoup(html_source, "html.parser")
    
    results = soup.select("picture")
    
    for result in results:
        # select() returns every <source> in document order; index 1 is the second one
        second_img = result.select("source")[1]
        print(second_img)
    

    Prints:

    <source srcset="/shop/media/L004D000_picture.PNG?context=bWFzdGVyfGltYWdlc3wzMDE3NTN8aW1hZ2UvcG5nfGgwMS9oMjcvODg0ODIyMDYxODc4Mi9MMDA0RDAwMF9waWN0dXJlLlBOR3wyZjRiZWE1NDU2MWU1MjUzMzU5MjAwNGVlYmIzY2MwNGQzODExMDI3NjNkMDE3YjQ4NGMwNjFlMGVkNTU2OWIy&amp;rmode=pad&amp;width=640&amp;rmode=pad&amp;width=640" type="image/jpeg"/>
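
The loop above prints the whole tag, while the question asks for the srcset value itself. A minimal sketch of reading that attribute follows. The shortened HTML and the `urls` list are illustrative assumptions, not part of the original answer, and the length check is added so a picture with fewer than two sources does not raise an IndexError.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the markup in the question (hypothetical URLs).
html_source = """
<picture>
<source srcset="/media/a.png?width=640&amp;format=webp" type="image/webp"/>
<source srcset="/media/a.png?width=640" type="image/jpeg"/>
</picture>"""

soup = BeautifulSoup(html_source, "html.parser")

urls = []
for picture in soup.select("picture"):
    # Only consider <source> tags that actually carry a srcset attribute.
    sources = picture.select("source[srcset]")
    if len(sources) > 1:
        # Indexing a tag by attribute name returns the attribute value.
        urls.append(sources[1]["srcset"])

print(urls)  # ['/media/a.png?width=640']
```

Note that the parser decodes `&amp;` to `&` in attribute values, so the returned string can be used as a URL path directly.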
    

    Or select the source with type image/jpeg:

    for result in results:
        jpeg_img = result.select_one('source[type="image/jpeg"]')
        print(jpeg_img)
    

    Or, if you want the first JPEG or PNG source:

    for result in results:
        img = result.select_one('source[type="image/jpeg"], source[type="image/png"]')
        print(img)
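
Because select_one returns None when nothing matches, a picture without a JPEG or PNG source would break any code that indexes the result directly. The sketch below wraps the selector in a small helper that returns the srcset string or None. The helper name `jpeg_or_png_srcset` and the sample HTML are assumptions made for illustration.

```python
from bs4 import BeautifulSoup


def jpeg_or_png_srcset(picture):
    """Return the srcset of the first JPEG or PNG <source>, or None if absent."""
    tag = picture.select_one('source[type="image/jpeg"], source[type="image/png"]')
    # Tag.get returns None instead of raising if the attribute is missing.
    return tag.get("srcset") if tag is not None else None


# Hypothetical markup: the second <picture> only offers a WebP source.
html_source = """
<picture>
<source srcset="/media/a.png?format=webp" type="image/webp"/>
<source srcset="/media/a.png" type="image/jpeg"/>
</picture>
<picture>
<source srcset="/media/b.png?format=webp" type="image/webp"/>
</picture>"""

soup = BeautifulSoup(html_source, "html.parser")
found = [jpeg_or_png_srcset(p) for p in soup.select("picture")]
print(found)  # ['/media/a.png', None]
```

Selecting by type rather than by position keeps the lookup stable if the site later reorders or adds <source> tags.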