How do I collect all hrefs using XPath? Selenium - Python

xpath selenium python

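johns7843 · 3 years ago

I am trying to collect all five social media links for the artist in this example. At the moment my output is only the last (fifth) social media link. I am using Selenium; I know it is not the best tool for collecting this data, but it is all I know at the moment. Note that I have only included the code relevant to my question. Thanks in advance for any help or insight.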

    from cgitb import text
    from os import link
    from selenium import webdriver
    from selenium.webdriver.support.wait import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.chrome.options import Options
    import time
    from random import randint
    import pandas as pd

    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('disable-infobars')
    chrome_options.add_argument('--disable-extensions')
    chrome_options.add_experimental_option('excludeSwitches', ['enable-automation'])
    driver = webdriver.Chrome(chrome_options=chrome_options)




    for url in urls:
        driver.get("https://soundcloud.com/flux-pavilion")

        time.sleep(randint(3,4))

        try:
            links = driver.find_elements_by_xpath('//*[@id="content"]/div/div[4]/div[2]/div/article[1]/div[2]/ul/li//a[@href]')
            for elem in links:
                socialmedia = (elem.get_attribute("href"))

        except:
            links = "none"

        artist = {
            'socialmedia': socialmedia,
            }

        print(artist)

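1 answer | as of 3 years ago

zx485 potemkin 3 years ago

The problem is not the XPath expression but the (non-existent) list handling in the code that produces your output.

Your code keeps only the last item of the list returned by the XPath query: every pass through the loop overwrites the previous value. That is why you receive a single link, and why it is the last one.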
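
The difference can be reproduced without a browser. The sketch below is only an illustration: the `hrefs` list holds made-up placeholder values standing in for what `elem.get_attribute("href")` would return, and the variable name `collected` is not from the original code.

```python
# Placeholder values standing in for elem.get_attribute("href") results
# (hypothetical, not real scraped data).
hrefs = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/c",
]

# Pattern from the question: each pass overwrites the previous value,
# so only the last href survives the loop.
for href in hrefs:
    socialmedia = href

# Pattern that keeps everything: append each value to a list.
collected = []
for href in hrefs:
    collected.append(href)

print(socialmedia)  # only the last placeholder
print(collected)    # all three placeholders, in order
```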

So change the output part of your code to

    [...]

    driver.get("https://soundcloud.com/flux-pavilion")
    time.sleep(randint(3,4))
    artist = []  # one entry per link found

    try:
        links = driver.find_elements_by_xpath('//*[@id="content"]/div/div[4]/div[2]/div/article[1]/div[2]/ul/li//a[@href]')
        for elem in links:
            # append instead of overwriting a single variable
            artist.append(elem.get_attribute("href"))

    except:
        links = "none"

    # print every collected link, not just the last one
    for link in artist:
        print(link)
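
A note for newer Selenium releases: the `find_elements_by_xpath` shortcut used above was removed in Selenium 4.3, where the equivalent call is `find_elements` with a locator strategy. The sketch below wraps that call in a helper. The name `collect_hrefs` is hypothetical, and the locator is written as the plain string `"xpath"` (the value of `By.XPATH`) only so that the helper can be exercised without Selenium installed.

```python
# Same value as selenium.webdriver.common.by.By.XPATH, written as a plain
# string so this helper has no import-time dependency on Selenium.
XPATH = "xpath"


def collect_hrefs(driver, xpath):
    """Return the href of every element matched by `xpath` (hypothetical helper)."""
    # find_elements returns an empty list when nothing matches, so no
    # try/except is needed just to handle the "no links found" case.
    elements = driver.find_elements(XPATH, xpath)
    return [elem.get_attribute("href") for elem in elements]


# Usage with a real browser (not executed here):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get("https://soundcloud.com/flux-pavilion")
#   for link in collect_hrefs(driver, xpath):  # xpath: the expression used above
#       print(link)
```

In real code, passing `By.XPATH` from `selenium.webdriver.common.by` is the more conventional spelling of the first argument.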

The output will contain all of the values (links) you want:

driver = webdriver.Chrome(chrome_options=chrome_options)
https://gate.sc/?url=https%3A%2F%2Ftwitter.com%2FFluxpavilion&token=da4a8d-1-1653430570528
https://gate.sc/?url=https%3A%2F%2Finstagram.com%2FFluxpavilion&token=277ea0-1-1653430570529
https://gate.sc/?url=https%3A%2F%2Ffacebook.com%2FFluxpavilion&token=4c773c-1-1653430570530
https://gate.sc/?url=https%3A%2F%2Fyoutube.com%2FFluxpavilion&token=1353f7-1-1653430570531
https://gate.sc/?url=https%3A%2F%2Fopen.spotify.com%2Fartist%2F7muzHifhMdnfN1xncRLOqk%3Fsi%3DbK9XeoW5RxyMlA-W9uVwPw&token=bc2936-1-1653430570532
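
Each link in this output is a `gate.sc` redirect that appears to carry the real destination, percent-encoded, in its `url` query parameter. If the underlying social media addresses are what you need, a minimal sketch along these lines could unwrap them. It assumes that parameter really holds the target, and `unwrap_redirect` is a hypothetical helper name, not part of Selenium.

```python
from urllib.parse import parse_qs, urlparse


def unwrap_redirect(link):
    """Return the target stored in the `url` query parameter, or the link itself."""
    # parse_qs percent-decodes the values and returns a list per key.
    query = parse_qs(urlparse(link).query)
    targets = query.get("url")
    return targets[0] if targets else link


wrapped = (
    "https://gate.sc/?url=https%3A%2F%2Ftwitter.com%2FFluxpavilion"
    "&token=da4a8d-1-1653430570528"
)
print(unwrap_redirect(wrapped))  # -> https://twitter.com/Fluxpavilion
```

Links without a `url` parameter are returned unchanged, so the helper can be applied to every collected href.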
