Dask: How to avoid task timeouts?

  • Stuart Berg · 7 years ago

In a Dask-based application (using `distributed`) […]

    tornado.application - ERROR - Exception in Future <Future cancelled> after timeout
    Traceback (most recent call last):
      File "/miniconda/envs/flyem/lib/python3.6/site-packages/tornado/gen.py", line 970, in error_callback
        future.result()
    concurrent.futures._base.CancelledError
    

[…] `distributed` […] I'm not sure, possibly via a signal?)

Here is the Dask portion of the second traceback:

      ... my code...
    
      File "/groups/flyem/proj/cluster/miniforge/envs/flyem/lib/python3.6/site-packages/dask/base.py", line 156, in compute
        (result,) = compute(self, traverse=False, **kwargs)
      File "/groups/flyem/proj/cluster/miniforge/envs/flyem/lib/python3.6/site-packages/dask/base.py", line 397, in compute
        results = schedule(dsk, keys, **kwargs)
      File "/groups/flyem/proj/cluster/miniforge/envs/flyem/lib/python3.6/site-packages/distributed/client.py", line 2308, in get
        direct=direct)
      File "/groups/flyem/proj/cluster/miniforge/envs/flyem/lib/python3.6/site-packages/distributed/client.py", line 1647, in gather
        asynchronous=asynchronous)
      File "/groups/flyem/proj/cluster/miniforge/envs/flyem/lib/python3.6/site-packages/distributed/client.py", line 665, in sync
        return sync(self.loop, func, *args, **kwargs)
      File "/groups/flyem/proj/cluster/miniforge/envs/flyem/lib/python3.6/site-packages/distributed/utils.py", line 277, in sync
        six.reraise(*error[0])
      File "/groups/flyem/proj/cluster/miniforge/envs/flyem/lib/python3.6/site-packages/six.py", line 693, in reraise
        raise value
      File "/groups/flyem/proj/cluster/miniforge/envs/flyem/lib/python3.6/site-packages/distributed/utils.py", line 262, in f
        result[0] = yield future
      File "/groups/flyem/proj/cluster/miniforge/envs/flyem/lib/python3.6/site-packages/tornado/gen.py", line 1133, in run
        value = future.result()
      File "/groups/flyem/proj/cluster/miniforge/envs/flyem/lib/python3.6/site-packages/tornado/gen.py", line 1141, in run
        yielded = self.gen.throw(*exc_info)
      File "/groups/flyem/proj/cluster/miniforge/envs/flyem/lib/python3.6/site-packages/distributed/client.py", line 1492, in _gather
        traceback)
      File "/groups/flyem/proj/cluster/miniforge/envs/flyem/lib/python3.6/site-packages/six.py", line 692, in reraise
        raise value.with_traceback(tb)
      File "/groups/flyem/proj/cluster/miniforge/envs/flyem/lib/python3.6/site-packages/dask/bag/core.py", line 1562, in reify
        seq = list(seq)
      File "/groups/flyem/proj/cluster/miniforge/envs/flyem/lib/python3.6/site-packages/dask/bag/core.py", line 1722, in map_chunk
        yield f(*a)
    
      ... my code ....
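
For orientation: a traceback of this shape, with client-side frames (`compute`, `gather`) followed by worker-side frames (`reify`, `map_chunk`, then user code), is what the caller of `compute()` sees when a function mapped over a bag raises on a worker. A minimal sketch, assuming an in-process cluster and a hypothetical `my_code` function in place of the elided frames. Exact frame names vary between Dask versions.

```python
import traceback

import dask.bag as db
from distributed import Client, LocalCluster


def my_code(item):
    # Hypothetical stand-in for the elided "... my code ..." frames.
    raise RuntimeError(f"failed on item {item}")


def remote_failure_frames():
    # Returns the function names in the traceback that reaches the caller
    # of compute() after the task fails on a worker.
    with LocalCluster(n_workers=1, threads_per_worker=1, processes=False) as cluster:
        with Client(cluster):  # becomes the default scheduler for compute()
            bag = db.from_sequence([1, 2, 3], npartitions=1).map(my_code)
            try:
                bag.compute()
            except RuntimeError as exc:
                return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    return []


if __name__ == "__main__":
    print(remote_failure_frames())
```

The last frame in the list is the user function. That is the frame to read first when the same pattern shows up in a real traceback.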
    
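1. Does `after timeout` indicate that the task took too long, or is some other timeout triggering the cancellation, such as a nanny or heartbeat timeout? (As far as I know, Dask has no explicit timeout on task length, but I may be confused.)

2. Dask (`distributed`) is cancelling my task; why?

3. These tasks can take a long time: they upload large buffers to cloud storage. How can I increase the timeout for specific tasks in Dask?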

1 answer
  • MRocklin · 7 years ago

By default, Dask does not impose a timeout on tasks.
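
A minimal sketch of what that looks like from the client side, assuming a local in-process cluster and a hypothetical `slow_upload` function standing in for the real upload task. Nothing in `submit` or `result` sets a time limit here. `Future.result` accepts an optional `timeout`, which is left at its default. The short sleep only keeps the example quick and does not by itself demonstrate behaviour for hours-long tasks.

```python
import time

from distributed import Client, LocalCluster


def slow_upload(n_seconds):
    # Hypothetical stand-in for a task that uploads a large buffer.
    time.sleep(n_seconds)
    return n_seconds


def run_slow_task(n_seconds=5):
    # In-process cluster so the sketch runs on a single machine.
    with LocalCluster(n_workers=1, threads_per_worker=1, processes=False) as cluster:
        with Client(cluster) as client:
            future = client.submit(slow_upload, n_seconds)
            # No timeout is passed: result() blocks until the task finishes.
            return future.result()


if __name__ == "__main__":
    print(run_slow_task())
```

A time limit on the waiting side is opt-in, through the `timeout` argument of `future.result`. It limits how long the caller waits.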

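The cancelled future you are seeing is not a Dask future but a Tornado future (Tornado is the library Dask uses for network communication). So unfortunately, all this tells you is that something failed.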
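
`distributed` does have timeouts on its network connections, and they are set in its configuration, not per task. A sketch for inspecting them, assuming a version that reads its settings through `dask.config` and uses the `distributed.comm.timeouts` key (key names and defaults can differ between versions). This shows where such settings live. It is not a claim that changing them resolves the error above.

```python
import dask
import distributed  # noqa: F401  (importing registers distributed's config defaults)


def comm_timeouts():
    # Connection-level timeouts as a dict with "connect" and "tcp" entries.
    # Assumption: the "distributed.comm.timeouts" key exists in this version.
    return dask.config.get("distributed.comm.timeouts")


if __name__ == "__main__":
    print(comm_timeouts())
    # Override the values for code that runs inside this block.
    with dask.config.set({"distributed.comm.timeouts.connect": "60s",
                          "distributed.comm.timeouts.tcp": "60s"}):
        print(comm_timeouts())
```

Values set this way apply to the process that sets them. Workers started elsewhere read their own configuration, for example from YAML config files or `DASK_`-prefixed environment variables.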

In general, when debugging code run through Dask, we recommend the following steps: http://docs.dask.org/en/latest/debugging.html
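
Two of the approaches described on that page, sketched for a bag computation like the one in the traceback. The first runs the graph on the single-threaded scheduler, so the exception is raised in the local process with an ordinary traceback and a debugger can be attached. The second pulls worker logs through the client. `upload` and the input list are hypothetical stand-ins for the real code, and the sketch assumes a Dask version that accepts the `scheduler=` keyword.

```python
import dask.bag as db
from distributed import Client, LocalCluster


def upload(item):
    # Hypothetical stand-in for the real task; fails on one input.
    if item == 2:
        raise ValueError(f"upload failed for item {item}")
    return item


def run_locally(items):
    # Single-threaded scheduler: no workers and no network layer, so the
    # original exception is raised directly in this process.
    bag = db.from_sequence(items, npartitions=len(items)).map(upload)
    return bag.compute(scheduler="synchronous")


def worker_logs(client):
    # Recent log records from each worker, keyed by worker address.
    return client.get_worker_logs()


if __name__ == "__main__":
    try:
        run_locally([1, 2, 3])
    except ValueError as exc:
        print("local run raised:", exc)

    with LocalCluster(n_workers=1, threads_per_worker=1, processes=False) as cluster:
        with Client(cluster) as client:
            for address, records in worker_logs(client).items():
                print(address, len(records))
```

Running a small subset of the inputs through `run_locally` is usually enough to see the underlying exception. The worker logs help when the failure only happens on the cluster.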