
Streaming inserts in bigquery python

  • bukzor  · asked 7 years ago

    The Python SDK's Client.insert_rows is documented as:

    Insert rows into a table via the streaming API.

    https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/insertAll
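
    For context, a minimal sketch of what my demo.py boils down to (the field names, values, and row count are illustrative guesses; the project/dataset/table path comes from the traceback below):

    from google.cloud import bigquery

    client = bigquery.Client()

    # Fetching the table first gives insert_rows the schema it needs to
    # serialize dict rows. (String table IDs work on recent SDK versions;
    # older ones need a TableReference.)
    table = client.get_table("myproject.mydataset.mytable")

    # Rows produced lazily from a generator, as described below.
    rows = ({"name": "row-%d" % i, "value": i} for i in range(1000000))

    # insert_rows materializes the whole iterable and issues a single
    # insertAll POST, hence the oversized payload.
    errors = client.insert_rows(table, rows)
    assert errors == [], errors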

    But when I try to use it to stream a large amount of data, I run into the following error:

    Traceback (most recent call last):
      File "demo.py", line 15, in <module>
        exit(main())
      File "demo.py", line 12, in main
        client.insert_rows(table, rows)
      File "google/cloud/bigquery/bigquery_future/client.py", line 1213, in insert_rows
        return self.insert_rows_json(table, json_rows, **kwargs)
      File "google/cloud/bigquery/bigquery_future/client.py", line 1293, in insert_rows_json
        data=data)
      File "google/cloud/bigquery/bigquery_future/client.py", line 301, in _call_api
        return call()
      File "google/api_core/retry.py", line 246, in retry_wrapped_func
        on_error=on_error,
      File "google/api_core/retry.py", line 163, in retry_target
        return target()
      File "google/cloud/core_future/_http.py", line 279, in api_request
        raise exceptions.from_http_response(response)
    google.api_core.exceptions.BadRequest: 400 POST https://www.googleapis.com/bigquery/v2/projects/myproject/datasets/mydataset/tables/mytable/insertAll:
    Request payload size exceeds the limit: 10915700 bytes.
    

    Digging into the code, it definitely makes two passes over the data (which I had carefully produced from a generator) before sending the POST HTTP request to the REST API mentioned in the docs. That API specifies a single JSON object as the request body, which is not a streamable format, and I don't see any allowance for streaming in that endpoint.
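
    For what it's worth, the obvious workaround given the 10 MB per-request limit in the error above is to consume the generator in chunks and issue one insertAll request per chunk. A minimal sketch, assuming client and table as above; insert_in_batches is a hypothetical helper, not part of the SDK, and the batch size is an assumption that has to keep each payload under the limit:

    import itertools

    def insert_in_batches(client, table, rows, batch_size=500):
        """Send `rows` in chunks so each insertAll payload stays small.

        500 rows per request is the commonly recommended size for
        insertAll; tune batch_size down if your rows are large.
        """
        rows = iter(rows)
        while True:
            batch = list(itertools.islice(rows, batch_size))
            if not batch:
                return
            errors = client.insert_rows(table, batch)
            if errors:
                raise RuntimeError("insertAll errors: %r" % errors)

    That keeps memory bounded and every payload under the limit, but each batch is still a blocking request/response cycle rather than a true stream.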

    What am I missing? Do the SDK developers have an entirely different definition of streaming than I do? What is the size limit of this streaming API?

    0 replies  |  as of 7 years ago