
How do I automatically scrape the following CSV?

  •  1
  • Sparkles  ·  1 year ago

    Page

    On the page above, clicking "Download CSV" downloads the CSV file to your computer. I want to set up a process that downloads this CSV every night. I'd be just as happy to scrape the data directly, but the CSV seems easier. I haven't been able to find anything on this. Help!

    3 replies  |  last activity 1 year ago
        1
  •  1
  •   Noah    1 year ago
    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin
    import os
    
    # URL of the webpage
    url = "https://baseballsavant.mlb.com/leaderboard/custom?year=2024&type=batter&filter=&min=q&selections=pa%2Ck_percent%2Cbb_percent%2Cwoba%2Cxwoba%2Csweet_spot_percent%2Cbarrel_batted_rate%2Chard_hit_percent%2Cavg_best_speed%2Cavg_hyper_speed%2Cwhiff_percent%2Cswing_percent&chart=false&x=pa&y=pa&r=no&chartType=beeswarm&sort=xwoba&sortDir=desc"
    
    # Send a GET request to the webpage
    response = requests.get(url)
    
    # Check if the request was successful
    if response.status_code == 200:
        # Parse the HTML content
        soup = BeautifulSoup(response.text, 'html.parser')
        
        # Find the link to the CSV file ('text=' is deprecated in newer
        # BeautifulSoup versions; use 'string=' instead). Note: if the page
        # renders the download button with JavaScript, requests/BeautifulSoup
        # will not see it, and this lookup will come back empty.
        csv_link_tag = soup.find('a', string='Download CSV')
        if csv_link_tag is None:
            raise SystemExit("Could not find a 'Download CSV' link on the page.")
        
        # Resolve a possibly relative href against the page URL
        csv_link = urljoin(url, csv_link_tag['href'])
        
        # Download the CSV file
        csv_response = requests.get(csv_link)
        
        # Check if the request was successful
        if csv_response.status_code == 200:
            # Specify the directory to save the CSV file
            save_dir = "/path/to/save/directory"
            
            # Create the directory if it doesn't exist
            if not os.path.exists(save_dir):
                os.makedirs(save_dir)
            
            # Save the CSV file
            with open(os.path.join(save_dir, 'data.csv'), 'wb') as f:
                f.write(csv_response.content)
            
            print("CSV file downloaded successfully.")
        else:
            print("Failed to download CSV file.")
    else:
        print("Failed to retrieve webpage.")
    
        2
  •  0
  •   n1c9    1 year ago
    import requests
    
    def get_daily_stats(url):
        response = requests.get(url, headers={
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
            'Referer': 'https://baseballsavant.mlb.com/leaderboard/custom?year=2024&type=batter&filter=&min=q&selections=pa%2Ck_percent%2Cbb_percent%2Cwoba%2Cxwoba%2Csweet_spot_percent%2Cbarrel_batted_rate%2Chard_hit_percent%2Cavg_best_speed%2Cavg_hyper_speed%2Cwhiff_percent%2Cswing_percent&chart=false&x=pa&y=pa&r=no&chartType=beeswarm&sort=xwoba&sortDir=desc'
        })
        # Fail loudly if the request was rejected
        response.raise_for_status()
        with open('daily_stats.csv', 'wb') as f:
            f.write(response.content)
    
    def main():
        url = 'https://baseballsavant.mlb.com/leaderboard/custom?year=2024&type=batter&filter=&min=q&selections=pa%2Ck_percent%2Cbb_percent%2Cwoba%2Cxwoba%2Csweet_spot_percent%2Cbarrel_batted_rate%2Chard_hit_percent%2Cavg_best_speed%2Cavg_hyper_speed%2Cwhiff_percent%2Cswing_percent&chart=false&x=pa&y=pa&r=no&chartType=beeswarm&sort=xwoba&sortDir=desc&csv=true'
        get_daily_stats(url)
    
    if __name__ == '__main__':
        main()
    

    This downloads the CSV for you and saves it as daily_stats.csv in the folder the script lives in. You'll need requests installed: python -m pip install requests. As for doing it at night, that depends on what works best for you: do you want to run it yourself each evening, or set up a process on your computer that runs it automatically?
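    For the "automatically every night" option, on Linux or macOS a cron entry is a common choice (the Python path, script path, and time below are placeholders for your own setup):

    ```shell
    # Open your crontab for editing:
    #   crontab -e
    # Then add a line like this to run the script every night at 02:00,
    # appending output to a log file:
    0 2 * * * /usr/bin/python3 /path/to/daily_stats.py >> /path/to/daily_stats.log 2>&1
    ```

    On Windows, Task Scheduler can run the same script on a daily trigger.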

    I imagine this will stop working in 2025, but you can change the year in the URL when that happens.
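    Since the year is baked into the URL, it can also be derived at run time so the script keeps working next season. A sketch: the csv=true behavior is taken from the answer above, and the query string is shortened here for readability, so keep your full selections list in practice.

    ```python
    import datetime

    # Shortened URL template (keep the full 'selections=' query string from the
    # answer above); per that answer, csv=true is what triggers the CSV download.
    BASE = ("https://baseballsavant.mlb.com/leaderboard/custom"
            "?year={year}&type=batter&filter=&min=q&csv=true")

    def build_url(year=None):
        # Default to the current calendar year
        if year is None:
            year = datetime.date.today().year
        return BASE.format(year=year)

    print(build_url(2024))
    # https://baseballsavant.mlb.com/leaderboard/custom?year=2024&type=batter&filter=&min=q&csv=true
    ```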

        3
  •  0
  •   user24131350    1 year ago
    import requests
    import datetime
    
    def download_csv(url, filename):
        response = requests.get(url)
        if response.status_code == 200:
            with open(filename, 'wb') as f:
                f.write(response.content)
            print(f"CSV file downloaded successfully as {filename}")
        else:
            print("Failed to download CSV file")
    
    if __name__ == "__main__":
        # URL of the webpage where the CSV file is located
        csv_url = "https://example.com/download/csv"
    
        # Filename to save the CSV file as (customize as needed)
        timestamp = datetime.datetime.now().strftime("%Y-%m-%d")
        csv_filename = f"data_{timestamp}.csv"
    
        # Download the CSV file
        download_csv(csv_url, csv_filename)
    

    This defines a function (download_csv) that takes a URL and a filename as input. It uses the requests library to fetch the contents of the page and save them to the specified filename on your computer.
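    Once a file like data_2024-06-01.csv is on disk, a quick sanity check with the standard library's csv module can confirm it parsed. The column names below are illustrative, not the real leaderboard headers:

    ```python
    import csv
    import io

    # Stand-in for the contents of a downloaded file; a real file would be
    # opened with open(csv_filename, newline="") instead of io.StringIO.
    sample = "player,pa,xwoba\nDoe,300,0.400\n"

    rows = list(csv.DictReader(io.StringIO(sample)))
    print(len(rows), rows[0]["xwoba"])  # 1 0.400
    ```

    An empty or HTML error page saved as .csv will show up immediately here as zero rows or nonsense headers, which is worth checking in a nightly job.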