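Make sure you're sending a user-agent, because Google may block a request that arrives without a browser-like user-agent. For example, the default requests user-agent is python-requests, so the website knows the request comes from a script and may block it.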
Also, auth=('user', 'pass') isn't needed, because you don't have to log in anywhere to search Google.
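To see the difference between the default and a custom user-agent without sending anything to Google, a request can be prepared but not sent. This is only a minimal sketch: the names BROWSER_UA, default_req and custom_req are made up for illustration, and the exact default value depends on the installed requests version.

```python
import requests

# Example browser-like User-Agent string (any current browser string would do).
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/100.0.4896.79 Safari/537.36"
)

session = requests.Session()
url = "https://www.google.com/search"

# prepare_request() builds the request with the session defaults but does not send it.
default_req = session.prepare_request(requests.Request("GET", url))
custom_req = session.prepare_request(
    requests.Request("GET", url, headers={"User-Agent": BROWSER_UA})
)

default_ua = default_req.headers["User-Agent"]
custom_ua = custom_req.headers["User-Agent"]

print(default_ua)  # starts with "python-requests/", followed by the library version
print(custom_ua)   # the browser-like string passed in headers
```

The first value is what a website would see if no headers were passed, which is why passing headers explicitly matters here.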
Code and full example in the online IDE:
from bs4 import BeautifulSoup
import requests, json, lxml

# https://docs.python-requests.org/en/master/user/quickstart/#passing-parameters-in-urls
params = {
    "q": "minecraft redstone ideas",  # search query
    "gl": "us",                       # country of the search
    "hl": "en"                        # language
}

# https://docs.python-requests.org/en/master/user/quickstart/#custom-headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.79 Safari/537.36",
}

html = requests.get("https://www.google.com/search", params=params, headers=headers, timeout=30)
soup = BeautifulSoup(html.text, "lxml")

results = []

# each .tF2Cxc element holds one organic result
for index, result in enumerate(soup.select(".tF2Cxc"), start=1):
    title = result.select_one(".DKV0Md").text
    link = result.select_one(".yuRUbf a")["href"]
    displayed_link = result.select_one(".tjvcx").text
    try:
        snippet = result.select_one("#rso .lyLwlc").text
    except AttributeError:
        # select_one() returned None, so this result has no snippet
        snippet = None

    results.append({
        "position": index,
        "title": title,
        "link": link,
        "displayed_link": displayed_link,
        "snippet": snippet
    })

print(json.dumps(results, indent=2, ensure_ascii=False))
Part of the output:
[
  {
    "position": 1,
    "title": "15 Awesome Minecraft Redstone Ideas - WhatIfGaming",
    "link": "https://whatifgaming.com/awesome-minecraft-redstone-ideas/",
    "displayed_link": "https://whatifgaming.com › awesome-minecraft-redstone-i...",
    "snippet": null
  },
  {
    "position": 2,
    "title": "Minecraft: 20 Insanely Useful Redstone Contraptions ...",
    "link": "https://gamerant.com/minecraft-insanely-useful-redstone-contraptions/",
    "displayed_link": "https://gamerant.com › Lists",
    "snippet": "Nov 1, 2021 — Minecraft: 20 Insanely Useful Redstone Contraptions ; 20 Bubble Elevator ; 19 Kelp Farm ; 18 Xray Machine ; 17 Armor Wardrobe ; 16 Micro-Crop Farm."
  },
  {
    "position": 3,
    "title": "10 Minecraft Redstone Tricks for Survival Mode - dummies",
    "link": "https://www.dummies.com/article/home-auto-hobbies/games/online-games/minecraft/10-minecraft-redstone-tricks-for-survival-mode-147583",
    "displayed_link": "https://www.dummies.com › ... › Minecraft",
    "snippet": "Learn how to apply redstone programming in Minecraft Survival mode, including dungeon farms, fast transportation, elevators, and more."
  },
]
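The null snippet in the first result appears because select_one() returns None when a selector matches nothing. As an alternative to try/except, a small helper can make that case explicit. The sketch below runs on made-up markup that only mimics the class names used above (it is not real Google output), uses the built-in html.parser so it needs no lxml, and text_or_none is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup reusing the class names from the scraper; not real Google HTML.
SAMPLE_HTML = """
<div id="rso">
  <div class="tF2Cxc">
    <div class="yuRUbf"><a href="https://example.com/a"><h3 class="DKV0Md">First result</h3></a></div>
    <div class="lyLwlc">A short description.</div>
  </div>
  <div class="tF2Cxc">
    <div class="yuRUbf"><a href="https://example.com/b"><h3 class="DKV0Md">Second result</h3></a></div>
  </div>
</div>
"""


def text_or_none(parent, selector):
    """Return the stripped text of the first match, or None when nothing matches."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

parsed = [
    {
        "title": text_or_none(result, ".DKV0Md"),
        "snippet": text_or_none(result, ".lyLwlc"),  # None for the second result
    }
    for result in soup.select(".tF2Cxc")
]

print(parsed)
```

The same helper could be applied to the title and displayed link too, so that one missing element does not stop the whole loop.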
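Alternatively, you can use the Google Organic Results API from SerpApi.
It's a paid API with a free plan that handles blocks from Google and other search engines and scales, so the end user only has to think about what data to extract instead of building and maintaining a parser from scratch and figuring out how to bypass blocks from Google or other search engines.
Code to integrate: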
from serpapi import GoogleSearch
import json

params = {
    "api_key": "serpapi_key",        # placeholder: replace with your own SerpApi key
    "engine": "google",              # search engine
    "q": "minecraft redstone ideas", # search query
    "google_domain": "google.com",
    "gl": "us",                      # country of the search
    "hl": "en"                       # language
}

search = GoogleSearch(params)
results = search.get_dict()

data = []

for result in results["organic_results"]:
    data.append({
        "position": result.get("position"),
        "title": result.get("title"),
        "link": result.get("link"),
        "displayed_link": result.get("displayed_link"),
        "snippet": result.get("snippet")
    })

print(json.dumps(data, indent=2, ensure_ascii=False))
Part of the output:
[
  {
    "position": 1,
    "title": "Minecraft: 20 Insanely Useful Redstone Contraptions ...",
    "link": "https://gamerant.com/minecraft-insanely-useful-redstone-contraptions/",
    "displayed_link": "https://gamerant.com › Lists",
    "snippet": "Minecraft: 20 Insanely Useful Redstone Contraptions ; 20 Bubble Elevator ; 19 Kelp Farm ; 18 Xray Machine ; 17 Armor Wardrobe ; 16 Micro-Crop Farm."
  },
  {
    "position": 2,
    "title": "Build These in Your Minecraft House! - YouTube",
    "link": "https://www.youtube.com/watch?v=a3ggfzC0rLg",
    "displayed_link": "https://www.youtube.com › watch",
    "snippet": null
  },
  {
    "position": 3,
    "title": "10 Minecraft Redstone Tricks for Survival Mode - dummies",
    "link": "https://www.dummies.com/article/home-auto-hobbies/games/online-games/minecraft/10-minecraft-redstone-tricks-for-survival-mode-147583",
    "displayed_link": "https://www.dummies.com › ... › Minecraft",
    "snippet": "Learn how to apply redstone programming in Minecraft Survival mode, including dungeon farms, fast transportation, elevators, and more."
  }, ... other results
]
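The extraction loop can also be written so that it tolerates a response without an "organic_results" key. The sketch below works on a plain dictionary shaped like the output above and makes no API call. Both the assumption that the key can be absent (for example on an empty or failed search) and the names FIELDS and extract_organic are illustrative and not taken from the SerpApi documentation.

```python
# Fields kept from each organic result, matching the ones used in this answer.
FIELDS = ("position", "title", "link", "displayed_link", "snippet")


def extract_organic(results):
    """Keep only FIELDS from each result; a missing "organic_results" key gives []."""
    return [
        {field: item.get(field) for field in FIELDS}
        for item in results.get("organic_results", [])
    ]


# Hand-written sample shaped like the response used above (not real API output).
sample = {
    "organic_results": [
        {
            "position": 1,
            "title": "Example",
            "link": "https://example.com",
            "extra": "ignored",
        }
    ]
}

rows = extract_organic(sample)
empty = extract_organic({})

print(rows)   # one dict with all five fields; the absent ones are None
print(empty)  # []
```

Fields that are absent from a result come back as None, which json.dumps prints as null, as with the YouTube result above.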
Disclaimer: I work for SerpApi.