
Can I scrape Twitter post images directly from the page using a GET request?

get twitter web-scraping http python

Ferruccio Islam Bisceglia · 1 year ago

Can I scrape Twitter images directly from the HTML?

The answer seems to be no, or at least it did not work in my attempt:

import requests

# The URL for the GET request
url = '<twitter post link here>'
# Perform the GET request
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:

    # save the html content into a variable
    html = response.text
    print(html)

    with open("my_file.html", "w", encoding="utf-8") as f:
        # write the string into the file
        f.write(html)

else:
    print(f'Failed to get URL. Status code: {response.status_code}')

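The Python code I wrote has no dependencies other than requests. It uses Python's built-in open function to create the HTML file and save the response into it. The script makes a GET call to the Twitter post.

The problem is that it returns an empty Twitter page on which only the Twitter logo is visible.

Could this be happening because the image is shown in a modal?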


Eric M. · 1 year ago

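The problem is that Twitter (x.com) does not directly return a web page containing the images or the original post. The actual page content is loaded afterwards with JavaScript. Twitter tries to stop you from scraping pages with simple requests because they offer a paid API, which gives developers easy access to the content. Even if you did get the real page, it would not contain any images: an HTML document only points to the addresses of its images.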
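To illustrate that last point, here is a minimal sketch that pulls the image addresses out of an HTML string using only the standard library. The `sample_html` page, the `example.com` URLs, and the names `ImageSrcCollector` and `extract_image_urls` are made up for illustration; this only works on HTML that already contains `<img>` tags, which the JavaScript-driven Twitter page returned by a plain GET does not.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in an HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative addresses against the page URL.
            self.image_urls.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url):
    """Return the absolute URLs of all images referenced by the HTML."""
    parser = ImageSrcCollector(base_url)
    parser.feed(html)
    return parser.image_urls


# Hypothetical static page, used only for illustration.
sample_html = """
<html><body>
  <img src="/static/logo.png" alt="logo">
  <img src="https://example.com/media/photo.jpg">
  <img alt="no source">
</body></html>
"""

image_urls = extract_image_urls(sample_html, "https://example.com/post/1")
print(image_urls)
```

Each address found this way still has to be fetched with its own request before you have the image itself.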

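Lucas Moura Gomes · 1 year ago

Unless you use some kind of API provided by the site, you cannot use the requests module to download images from a site that is rendered dynamically with JavaScript. You can, however, use Selenium to open a browser and render the HTML.

If you have Selenium set up correctly, the code below downloads the pictures from a sample Twitter post.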

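import time
import urllib.request

from bs4 import BeautifulSoup
from selenium import webdriver

# base name for the saved files; an index is appended per image
picture_basename = 'my_picture'

# open firefox browser
driver = webdriver.Firefox()

# open url
driver.get('https://twitter.com/WagnerRM/status/1697373778114715742/photo/1')

# wait enough time for page to render
time.sleep(10)

# using beautiful soup to find the images we want
soup = BeautifulSoup(driver.page_source, 'html.parser')

# the rendered HTML is all we need, so close the browser
driver.quit()

# keep only <img> tags that have a src pointing at post media
images = [img['src'] for img in soup.find_all('img', src=True) if '/media/' in img['src']]

# save each image under its own name so later ones don't overwrite earlier ones
for i, image in enumerate(images):
    urllib.request.urlretrieve(image, f'{picture_basename}_{i}.png')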
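The code above saves with a hard-coded .png extension, whatever format the image actually has. Below is a minimal sketch of deriving a filename per image from its URL instead. The helper name `filename_for` is hypothetical, and the URL shape with a `format` query parameter is an assumption about how the media URLs look, not something the code above guarantees; the `example.com` URLs are placeholders.

```python
import posixpath
from urllib.parse import parse_qs, urlparse


def filename_for(image_url, index, default_ext="jpg"):
    """Build a distinct local filename for the index-th image URL."""
    parsed = urlparse(image_url)
    # Prefer an extension found in the path, e.g. ".../photo.png" -> "png".
    ext = posixpath.splitext(parsed.path)[1].lstrip(".")
    if not ext:
        # Assumed URL shape: ".../media/<id>?format=jpg&name=small".
        # Fall back to default_ext when no "format" parameter is present.
        ext = parse_qs(parsed.query).get("format", [default_ext])[0]
    return f"my_picture_{index}.{ext}"


# Placeholder URLs, used only for illustration.
print(filename_for("https://example.com/media/ABC123?format=png&name=small", 0))
print(filename_for("https://example.com/media/photo.jpeg", 2))
```

In the download loop, the result of `filename_for(image, i)` could then be passed to `urllib.request.urlretrieve` as the target filename.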

PS:

To open a browser, you first need to install its web driver. The code above uses Firefox. If you are on Windows, for example, you can use the following commands.

pip install webdrivermanager
webdrivermanager firefox --linkpath ./

This downloads a file named geckodriver.exe into the current path; Selenium uses it to open the browser.
