The Python SDK's `Client.insert_rows` is documented as inserting rows into a table via the streaming API:
https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/insertAll
But when I tried to use it to stream a large amount of data, I got the following error:
Traceback (most recent call last):
  File "demo.py", line 15, in <module>
    exit(main())
  File "demo.py", line 12, in main
    client.insert_rows(table, rows)
  File "google/cloud/bigquery/bigquery_future/client.py", line 1213, in insert_rows
    return self.insert_rows_json(table, json_rows, **kwargs)
  File "google/cloud/bigquery/bigquery_future/client.py", line 1293, in insert_rows_json
    data=data)
  File "google/cloud/bigquery/bigquery_future/client.py", line 301, in _call_api
    return call()
  File "google/api_core/retry.py", line 246, in retry_wrapped_func
    on_error=on_error,
  File "google/api_core/retry.py", line 163, in retry_target
    return target()
  File "google/cloud/core_future/_http.py", line 279, in api_request
    raise exceptions.from_http_response(response)
google.api_core.exceptions.BadRequest: 400 POST https://www.googleapis.com/bigquery/v2/projects/myproject/datasets/mydataset/tables/mytable/insertAll: Request payload size exceeds the limit: 10915700 bytes.
Digging into the code, it definitely makes two passes over the data (which I had carefully produced from a generator) before sending the POST HTTP request to the REST API mentioned in the docs. That API specifies a single JSON object as the request body, which is not a streamable format, and I don't see any allowance for streaming in that endpoint.
What am I missing? Do the SDK developers mean something entirely different by "streaming" than I do? What are the size limits of the streaming API?
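For now I assume the only way forward is to batch the rows myself so that each `insert_rows` call stays under the payload limit. A minimal sketch of what I have in mind (the batch size of 500 is my own guess at a safe value, and `client`, `table`, and `rows` are as in my code above):

    from itertools import islice

    def chunked(rows, size):
        """Yield lists of at most `size` rows from any iterable or generator."""
        it = iter(rows)
        while True:
            batch = list(islice(it, size))
            if not batch:
                return
            yield batch

    # Hypothetical usage -- one insertAll request per batch:
    # for batch in chunked(rows, 500):
    #     errors = client.insert_rows(table, batch)
    #     if errors:
    #         raise RuntimeError(errors)

This keeps the generator lazy (only one batch is materialized at a time), but it still feels like something the SDK should handle if it calls this "streaming".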