
How to scrape data using the Google API

  •  3
  • Muhammad Zeeshan  · Tech Community  · 10 years ago
    import requests
    
    def search(query, pages=4, rsz=8):
        url = 'https://ajax.googleapis.com/ajax/services/search/web'
        params = {
            'v': 1.0,     # Version
            'q': query,   # Query string
            'rsz': rsz,   # Result set size - max 8
        }
    
        for s in range(0, pages*rsz+1, rsz):
            params['start'] = s
            r = requests.get(url, params=params)
            for result in r.json()['responseData']['results']:
                yield result
    

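    On the first 2 or 3 attempts it retrieves all the requested pages, but after those 2 or 3 attempts it no longer gets any results: it returns None or []. Is Google blocking my IP after a few attempts? Is there a solution?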
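    The None in the symptom above suggests that 'responseData' comes back as null, in which case r.json()['responseData']['results'] raises a TypeError instead of yielding results. Below is a minimal sketch of a more defensive way to read the payload. The helper names are hypothetical, and the 'responseStatus' and 'responseDetails' fields are assumed to be present in the response; they are read with .get() so a missing field does not raise.

```python
def extract_results(payload):
    """Return the result list from a decoded response, or [] if there is none."""
    # 'responseData' may be null (None) when the service declines the request,
    # so fall back to an empty dict before looking up 'results'.
    data = payload.get('responseData') or {}
    return data.get('results') or []


def describe_failure(payload):
    """Summarize the (assumed) status fields so a refusal can be logged."""
    status = payload.get('responseStatus')
    details = payload.get('responseDetails')
    return f'status={status} details={details}'
```

    Inside the loop, calling extract_results(r.json()) and logging describe_failure(r.json()) when the list is empty would show whether the service is refusing the request or simply has no more results.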

    3 replies  |  latest 4 years ago
        1
  •  0
  •   Yogiraj Banerji    10 years ago

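    I'm not sure whether this will work, but the only way to avoid being blocked by sites that discourage scraping is to use proxies when retrieving web pages. Please check how to use proxies in your code.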
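    As a rough sketch of what using a proxy with requests could look like: the proxy address below is a placeholder, not a working proxy, and the helper name is hypothetical. Only the `proxies` mapping on a `requests.Session` is standard requests behaviour.

```python
import requests

# Placeholder address (assumption): replace with a proxy you are allowed to use.
PROXY_URL = 'http://10.10.1.10:3128'


def make_proxied_session(proxy_url=PROXY_URL):
    """Return a Session that routes HTTP and HTTPS traffic through proxy_url."""
    session = requests.Session()
    session.proxies.update({'http': proxy_url, 'https': proxy_url})
    return session
```

    A request would then be sent with something like make_proxied_session().get(url, params=params, timeout=30). Whether this avoids blocking depends on the proxy and on the site.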

        2
  •  0
  •   Muhammad Zeeshan    10 years ago

    I solved this problem using requests and BeautifulSoup.

    import requests
    from bs4 import BeautifulSoup
    
    # strToSearch, start and num are assumed to be defined by the caller
    url = 'http://www.google.com/search'
    payload = {'q': strToSearch, 'start': str(start), 'num': str(num)}
    r = requests.get(url, params=payload, auth=('user', 'pass'))
    soup = BeautifulSoup(r.text, 'html.parser')
    text = soup.get_text(separator=' ')
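    The snippet above does not show where the 'start' and 'num' values come from. One possible way to generate them per page is sketched below; the function name and the default of 10 results per page are assumptions, not part of the original answer.

```python
def page_params(query, pages=4, per_page=10):
    """Yield one query-parameter dict per result page."""
    # range(pages) gives exactly `pages` offsets: 0, per_page, 2 * per_page, ...
    for page in range(pages):
        yield {
            'q': query,
            'start': str(page * per_page),
            'num': str(per_page),
        }
```

    Each dict can be passed as `params=` to requests.get. Note that the question's range(0, pages*rsz+1, rsz) produces one offset more than `pages`, which range(pages) avoids.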
    
        3
  •  0
  •   Dmitriy Zub    3 years ago

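    Make sure you are sending a user-agent, because Google may block a request that is sent without one. For example, the default requests user-agent is python-requests, so the website knows that a script is sending the request and may block it.

    Also, auth=('user', 'pass') is not needed, because you don't have to log in anywhere to search Google.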
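    To see the difference without sending anything to Google, a request can be prepared and inspected locally. This is only an illustrative sketch: the variable names are made up here, and the browser string is just an example value.

```python
import requests

url = "https://www.google.com/search"
params = {"q": "minecraft redstone ideas"}

# Example browser User-Agent string (any current browser string could be used).
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/100.0.4896.79 Safari/537.36"
)

session = requests.Session()

# prepare_request() builds the request, including headers, without sending it.
default_req = session.prepare_request(requests.Request("GET", url, params=params))
custom_req = session.prepare_request(
    requests.Request("GET", url, params=params, headers={"User-Agent": BROWSER_UA})
)

print(default_req.headers["User-Agent"])  # starts with "python-requests/"
print(custom_req.headers["User-Agent"])   # the browser string above
```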


    Code and full example in the online IDE:

    from bs4 import BeautifulSoup
    import requests, json, lxml
    
    # https://docs.python-requests.org/en/master/user/quickstart/#passing-parameters-in-urls
    params = {
        "q": "minecraft redstone ideas",  # search query
        "gl": "us",                       # country of the search
        "hl": "en"                        # language                
    }
    
    # https://docs.python-requests.org/en/master/user/quickstart/#custom-headers
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.79 Safari/537.36",
    }
    
    html = requests.get("https://www.google.com/search", params=params, headers=headers, timeout=30)
    soup = BeautifulSoup(html.text, "lxml")
    
    results = []
    
    for index, result in enumerate(soup.select(".tF2Cxc"), start=1):
        title = result.select_one(".DKV0Md").text
        link = result.select_one(".yuRUbf a")["href"]
        displayed_link = result.select_one(".tjvcx").text
        try:
            snippet = result.select_one("#rso .lyLwlc").text
        except AttributeError:  # select_one() returned None: this result has no snippet
            snippet = None
        
        results.append({
            "position": index,
            "title": title,
            "link": link,
            "displayed_link": displayed_link,
            "snippet": snippet
        })
        
    print(json.dumps(results, indent=2, ensure_ascii=False))
    

    Partial output:

    [
      {
        "position": 1,
        "title": "15 Awesome Minecraft Redstone Ideas - WhatIfGaming",
        "link": "https://whatifgaming.com/awesome-minecraft-redstone-ideas/",
        "displayed_link": "https://whatifgaming.com › awesome-minecraft-redstone-i...",
        "snippet": null
      },
      {
        "position": 2,
        "title": "Minecraft: 20 Insanely Useful Redstone Contraptions ...",
        "link": "https://gamerant.com/minecraft-insanely-useful-redstone-contraptions/",
        "displayed_link": "https://gamerant.com › Lists",
        "snippet": "Nov 1, 2021 — Minecraft: 20 Insanely Useful Redstone Contraptions ; 20 Bubble Elevator ; 19 Kelp Farm ; 18 Xray Machine ; 17 Armor Wardrobe ; 16 Micro-Crop Farm."
      },
      {
        "position": 3,
        "title": "10 Minecraft Redstone Tricks for Survival Mode - dummies",
        "link": "https://www.dummies.com/article/home-auto-hobbies/games/online-games/minecraft/10-minecraft-redstone-tricks-for-survival-mode-147583",
        "displayed_link": "https://www.dummies.com › ... › Minecraft",
        "snippet": "Learn how to apply redstone programming in Minecraft Survival mode, including dungeon farms, fast transportation, elevators, and more."
      },
    ]
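    One caveat about the parser above: select_one() returns None when a selector matches nothing, so a line such as result.select_one(".DKV0Md").text raises AttributeError if Google's markup differs from what the class names expect. A small helper like the one below (a hypothetical name, shown on made-up sample HTML rather than a real Google page) is one way to guard every field, not only the snippet.

```python
from bs4 import BeautifulSoup


def text_or_none(node, selector):
    """Return the stripped text of the first match for selector, or None."""
    found = node.select_one(selector)
    return found.get_text(strip=True) if found is not None else None


# Made-up sample markup that reuses two of the class names from the answer.
sample_html = '<div class="tF2Cxc"><h3 class="DKV0Md">Example title</h3></div>'
result = BeautifulSoup(sample_html, "html.parser").select_one(".tF2Cxc")

print(text_or_none(result, ".DKV0Md"))  # Example title
print(text_or_none(result, ".lyLwlc"))  # None
```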
    

    Alternatively, you can use the Google Organic Results API from SerpApi.

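    It's a paid API with a free plan. It handles blocks from Google and other search engines and can scale, so the end user can think about what data to extract instead of creating a parser from scratch, maintaining it, and figuring out how to bypass blocks from Google or other search engines.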

    Code to integrate:

    from serpapi import GoogleSearch
    import json
    
    params = {
        "api_key": "serpapi_key",
        "engine": "google",
        "q": "minecraft redstone ideas",
        "google_domain": "google.com",
        "gl": "us",
        "hl": "en"
    }
    search = GoogleSearch(params)
    results = search.get_dict()
    
    data = []
    
    for result in results["organic_results"]:
        data.append({
            "position": result.get("position"),
            "title": result.get("title"),
            "link": result.get("link"),
            "displayed_link": result.get("displayed_link"),
            "snippet": result.get("snippet")
        })
        
    print(json.dumps(data, indent=2, ensure_ascii=False))
    

    Partial output:

    [
      {
        "position": 1,
        "title": "Minecraft: 20 Insanely Useful Redstone Contraptions ...",
        "link": "https://gamerant.com/minecraft-insanely-useful-redstone-contraptions/",
        "displayed_link": "https://gamerant.com › Lists",
        "snippet": "Minecraft: 20 Insanely Useful Redstone Contraptions ; 20 Bubble Elevator ; 19 Kelp Farm ; 18 Xray Machine ; 17 Armor Wardrobe ; 16 Micro-Crop Farm."
      },
      {
        "position": 2,
        "title": "Build These in Your Minecraft House! - YouTube",
        "link": "https://www.youtube.com/watch?v=a3ggfzC0rLg",
        "displayed_link": "https://www.youtube.com › watch",
        "snippet": null
      },
      {
        "position": 3,
        "title": "10 Minecraft Redstone Tricks for Survival Mode - dummies",
        "link": "https://www.dummies.com/article/home-auto-hobbies/games/online-games/minecraft/10-minecraft-redstone-tricks-for-survival-mode-147583",
        "displayed_link": "https://www.dummies.com › ... › Minecraft",
        "snippet": "Learn how to apply redstone programming in Minecraft Survival mode, including dungeon farms, fast transportation, elevators, and more."
      }, ... other results
    ]
    

    Disclaimer: I work for SerpApi.