
App Engine bulkloader performance

bulkloader bulk-load google-app-engine performance python


Rahul · Tech Community · 14 years ago

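I am using the App Engine bulkloader (Python runtime) to bulk-upload entities to the datastore. The data I am uploading is stored in a proprietary format, so I implemented my own connector (registered in bulkload_config.py) to convert it into an intermediate Python dictionary.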

from google.appengine.ext.bulkload import connector_interface

class MyCustomConnector(connector_interface.ConnectorInterface):
   ....
   # Overridden method: yields one intermediate dict per record in the file
   def generate_import_record(self, filename, bulkload_state=None):
      ....
      yield my_custom_dict

# Post-import hook: returns the list of entities to put
def feature_post_import(input_dict, entity_instance, bulkload_state):
    ....
    return [all_entities_to_put]

Note: I am not using entity_instance or bulkload_state in my feature_post_import function. I am only creating new datastore entities (based on my input_dict).

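Now, everything works fine. However, the bulk-load process seems to take far too long. For example, one GB of data (about 1,000,000 entities) takes about 20 hours. How can I improve the performance of the bulk-load process? Am I missing something?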
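As a rough sanity check on the figures above, the average upload rate can be worked out directly. This is only back-of-the-envelope arithmetic on the numbers quoted in the question; the helper name entities_per_second is made up for illustration.

```python
def entities_per_second(entity_count, hours):
    """Average upload rate implied by a total count and a wall-clock duration."""
    return entity_count / (hours * 3600.0)


# Figures from the question: about 1,000,000 entities in about 20 hours.
rate = entities_per_second(1000000, 20)
print("%.1f entities/s" % rate)
print("%.0f s per 1000 entities" % (1000.0 / rate))
```

That works out to roughly 14 entities per second, or about 72 seconds per 1000 entities, which suggests the upload is being throttled rather than limited by the size of each entity.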

http://groups.google.com/group/google-appengine-python/browse_thread/thread/4c8def071a86c840

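To test the performance of the bulk-load process, I loaded entities of a single Kind. Even though each entity has just one very simple FloatProperty, it still took the same amount of time to bulk-load these entities.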

I am going to try varying the bulkloader parameters rps_limit, bandwidth_limit and http_limit to see if I can get any more throughput.

1 answer | as of 14 years ago


Rahul 14 years ago

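There is a parameter called rps_limit that determines the number of entities uploaded per second. This was the main bottleneck. The default value is 20.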

Also increase bandwidth_limit to something reasonable.
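For anyone scripting the upload, here is a minimal sketch of how these throttling flags might be assembled for an appcfg.py upload_data call. The helper throttle_flags is hypothetical, and the config file name, data file name, kind, app URL and the bandwidth_limit value of 1000000 are all placeholder assumptions, not values from my setup; only the flag names rps_limit, bandwidth_limit and http_limit come from the discussion here.

```python
def throttle_flags(rps_limit, bandwidth_limit=None, http_limit=None):
    """Build the throttling flags for appcfg.py upload_data (hypothetical helper)."""
    flags = ["--rps_limit=%d" % rps_limit]
    if bandwidth_limit is not None:
        flags.append("--bandwidth_limit=%d" % bandwidth_limit)
    if http_limit is not None:
        flags.append("--http_limit=%s" % http_limit)
    return flags


# Placeholder values: substitute your own config file, data file, kind and app URL.
base = [
    "appcfg.py", "upload_data",
    "--config_file=bulkloader.yaml",
    "--filename=data.bin",
    "--kind=Feature",
    "--url=http://your-app-id.appspot.com/_ah/remote_api",
]

# bandwidth_limit=1000000 is an assumed example, not a recommendation.
command = base + throttle_flags(rps_limit=500, bandwidth_limit=1000000)
print(" ".join(command))
```

Keeping the throttling flags in one helper makes it easy to rerun the same upload with different limits and compare timings.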

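I increased rps_limit to 500 and everything improved. I now get 5.5-6 seconds per 1000 entities, which is a major improvement over 50 seconds per 1000 entities.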
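To put those timings in the same units as rps_limit, here is a small arithmetic sketch. The function names are made up for illustration, and the 1,000,000-entity projection simply extrapolates the per-1000 timings linearly, which is an assumption.

```python
def rate_from_seconds_per_thousand(seconds):
    """Entities per second implied by a time per 1000 entities."""
    return 1000.0 / seconds


def hours_for(entity_count, seconds_per_thousand):
    """Projected wall-clock hours, assuming the per-1000 timing holds throughout."""
    return entity_count / 1000.0 * seconds_per_thousand / 3600.0


before = rate_from_seconds_per_thousand(50)      # 20.0 entities/s
after_slow = rate_from_seconds_per_thousand(6)
after_fast = rate_from_seconds_per_thousand(5.5)

print("before: %.1f entities/s" % before)
print("after:  %.1f-%.1f entities/s" % (after_slow, after_fast))
print("1,000,000 entities: %.1f h before, %.1f-%.1f h after" % (
    hours_for(1000000, 50), hours_for(1000000, 5.5), hours_for(1000000, 6)))
```

The old figure of 50 seconds per 1000 entities is exactly 20 entities per second, which matches the default rps_limit of 20. The new rate is roughly 167-182 entities per second, so it is still well below the 500 limit, meaning something else is now the limiting factor.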