
How to avoid overwriting values in specific columns during a Python Google Places API loop

iteration google-api loops python

HSJ · Tech Community · 2 years ago

I am searching for places with the Google Places API (Nearby Search) using the script below.

import googlemaps
import pandas
import time
import datetime
import csv
import pprint

# Set API key
akey = 'APIKEY'
client = googlemaps.Client(akey)

# Create an empty list to collect the search results
result_list = []

# Search keyword
skey = 'restaurant'
    
# Radius to search (meters)
rad = 7080

# Read csv
fname = 'Centroid10000m.csv'
csv = pandas.read_csv(fname, header=0, encoding='utf8')

## Start iteration by base coordinates
for (lat, lon, pid) in zip(csv['lat'], csv['lon'], csv['id']):
    loc = (lat, lon)
    print(loc)
    print(pid)

    # Search keyword from loc within radius
    result = client.places_nearby(
        location = loc,
        radius = rad,
        keyword = skey,
        language = 'en'
        )
    
    # Add this page of results to result_list
    result_list.extend(result.get('results'))

    # Obtain next page token
    next_page_token = result.get('next_page_token')

    # Request further pages while the response contains a next_page_token
    while next_page_token:
        time.sleep(2)
        result = client.places_nearby(
            location = loc,
            radius = rad,
            keyword = skey,
            language = 'en',
            page_token = next_page_token
            )
        result_list.extend(result.get('results'))
        next_page_token = result.get('next_page_token')
        pprint.pprint(result)

    # Convert JSON into data frame
    df = pandas.DataFrame(result_list)

    # Extract necessary fields
    print(df.columns.tolist())
    df = df[['place_id','name','types', 'geometry', 'vicinity',
              'business_status', 'opening_hours',
              'rating','user_ratings_total',
              'plus_code']]
    
    # Add identifiers
    df['csv'] = fname
    df['id'] = pid
    df['time'] = datetime.datetime.now()
    
# Print result
print(df)
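A side note on the column selection in the script: `df[[...]]` raises a `KeyError` when none of the results in the frame carry one of the listed keys (for example, when no place in a batch has `plus_code` or `opening_hours`, or when a search returns no results at all). This is more likely with small batches, such as a single centroid's results. A minimal sketch, assuming the results are plain dicts as found in `result['results']`, uses `DataFrame.reindex(columns=...)`, which adds NaN-filled columns instead of failing. The helper name `select_fields` is hypothetical.

```python
import pandas as pd

# Fields the original script keeps from each Nearby Search result.
FIELDS = ['place_id', 'name', 'types', 'geometry', 'vicinity',
          'business_status', 'opening_hours',
          'rating', 'user_ratings_total', 'plus_code']


def select_fields(results):
    """Hypothetical helper: build a frame from a list of result dicts.

    reindex(columns=...) keeps the requested column order and inserts
    NaN-filled columns for keys that no result carries, so an empty
    batch or one without e.g. 'plus_code' does not raise KeyError.
    """
    return pd.DataFrame(results).reindex(columns=FIELDS)


# Made-up example: neither result has 'opening_hours' or 'plus_code'.
sample = [
    {'place_id': 'a1', 'name': 'Cafe A', 'rating': 4.2},
    {'place_id': 'b2', 'name': 'Diner B'},
]
frame = select_fields(sample)
print(frame.columns.tolist() == FIELDS)  # True
print(select_fields([]).shape)           # (0, 10)
```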

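In this script I use one file, Centroid10000m.csv, as input. It contains multiple rows of lat, lon and id. The data structure is shown in the image below. In my script, places_nearby() takes lat and lon from each row and calls the API for one row after another, stepping through the ids.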
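The paging part of the loop (one request, then further requests while `next_page_token` is present) can be factored into a generator, so that each centroid's pages are gathered on their own. This is a sketch, not a helper from the googlemaps library: `iter_nearby_pages` and the `fetch` callable are hypothetical names, where `fetch` stands in for `client.places_nearby` bound to one location. The default 2-second pause mirrors the `time.sleep(2)` in the script, since a freshly returned token may not be usable right away.

```python
import time


def iter_nearby_pages(fetch, delay=2.0):
    """Hypothetical helper: yield the 'results' list of every page of one search.

    fetch(page_token=None) must return a dict shaped like a Nearby Search
    response: {'results': [...], 'next_page_token': '...'}, with the token
    absent on the last page.
    """
    response = fetch(page_token=None)
    yield response.get('results', [])
    token = response.get('next_page_token')
    while token:
        time.sleep(delay)  # wait before using the new token
        response = fetch(page_token=token)
        yield response.get('results', [])
        token = response.get('next_page_token')


# Offline check with a made-up fetch that serves two pages.
pages = {None: {'results': [{'name': 'A'}], 'next_page_token': 't1'},
         't1': {'results': [{'name': 'B'}]}}
collected = [r['name']
             for page in iter_nearby_pages(lambda page_token=None: pages[page_token], delay=0)
             for r in page]
print(collected)  # ['A', 'B']
```

With the real client, `fetch` could be `functools.partial(client.places_nearby, location=loc, radius=rad, keyword=skey, language='en')`, which reuses the same keyword arguments the script already passes.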

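However, after running the script I realized that the csv, time and id columns of all retrieved records are overwritten with the values from the last iteration.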

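For example, id should be 38 in the first iteration and should take the next value from Centroid10000m.csv in the following iteration. Likewise, time should store the timestamp of each iteration.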

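Unfortunately, the printed output shows the same timestamp in time for every record, and every id is 30. I know this is caused by the following lines, because they overwrite all of the values: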

df['id'] = pid
df['time'] = datetime.datetime.now()
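The effect can be reproduced without calling the API. Assigning a scalar, as in `df['id'] = pid`, broadcasts that one value to every row of `df`, and because `df` is rebuilt on each pass from the cumulative `result_list`, the frame left after the loop carries only the last `pid` (and the last timestamp) on every row. A minimal reproduction with made-up ids and place names, not data from Centroid10000m.csv:

```python
import pandas as pd

# Made-up stand-ins: each centroid id returns one fake place.
fake_results = {38: [{'name': 'first'}], 30: [{'name': 'second'}]}

result_list = []
for pid in [38, 30]:
    result_list.extend(fake_results[pid])  # list keeps growing across iterations
    df = pd.DataFrame(result_list)         # rebuilt from all rows collected so far
    df['id'] = pid                         # scalar broadcast: overwrites every row

print(df['id'].tolist())  # [30, 30]
```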

I found a post dealing with a similar problem, but it uses a different data structure, and since I am still new to Python I could not apply its idea to my script.

Overwriting values in column created with Python for loop

I would appreciate it if someone could modify my script. What I need is:

Load id from Centroid10000m.csv and store it in the id column, so that its value changes with each iteration.
Store the timestamp of each iteration in the time field.
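One way to meet both requirements, sketched under assumptions: build a separate frame from each centroid's results only, stamp it with that row's id and the time of that iteration, and concatenate all frames once after the loop. `collect_by_centroid`, the `search` callable and `fake_search` are hypothetical; with the real API, `search(lat, lon)` would wrap `client.places_nearby` plus the paging loop from the script and return the list of result dicts for that one location. The made-up search below lets the sketch run offline.

```python
import datetime

import pandas as pd


def collect_by_centroid(centroids, search, source_name):
    """Hypothetical helper: one frame per centroid, concatenated at the end.

    centroids: DataFrame with 'id', 'lat' and 'lon' columns.
    search(lat, lon): returns the list of result dicts for one centroid.
    """
    frames = []
    for row in centroids.itertuples(index=False):
        df = pd.DataFrame(search(row.lat, row.lon))  # this centroid's results only
        df['csv'] = source_name
        df['id'] = row.id                     # broadcast only over this centroid's rows
        df['time'] = datetime.datetime.now()  # timestamp taken in this iteration
        frames.append(df)
    if not frames:
        return pd.DataFrame(columns=['csv', 'id', 'time'])
    # Concatenate once after the loop so earlier rows are never reassigned.
    return pd.concat(frames, ignore_index=True)


def fake_search(lat, lon):
    # Made-up stand-in for places_nearby + paging: two places per centroid.
    return [{'name': f'A@{lat}'}, {'name': f'B@{lat}'}]


# Made-up centroids; the real ones would come from pandas.read_csv(fname).
centroids = pd.DataFrame({'id': [38, 30], 'lat': [35.0, 35.1], 'lon': [139.0, 139.1]})
out = collect_by_centroid(centroids, fake_search, 'Centroid10000m.csv')
print(out['id'].tolist())  # [38, 38, 30, 30]
```

The key design choice is that no frame is ever re-tagged: each `df` is discarded into `frames` before the next centroid's id is assigned, so the id and time of earlier rows stay as they were.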

Thank you for your support.
