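Below is a complete toy example (tested on JupyterLab with a SLURM cluster).
The example uses Cython to compile a trivial function that sums two integers, but the same technique can of course be applied to complex (and more useful) code.
The key trick is that the workers must be set up so that they can find and import the Cython-generated library.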
This requires running import pyximport; pyximport.install() and then importing the Cython-generated module on each worker. This is done with the client's register_worker_callbacks() method.
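Note that the Cython-generated module is placed in the <IPYTHONCACHEDIR>/cython directory (IPYTHONCACHEDIR can be found by calling IPython.paths.get_ipython_cache_dir()). That directory must be added to the path where Python looks for modules, so that the Cython-generated module can be loaded.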
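As a minimal sketch of the path handling described above: the helper names cython_cache_dir and prepend_to_sys_path are hypothetical (they are not part of IPython, Cython, or Dask). Only IPython.paths.get_ipython_cache_dir() and the cython subdirectory come from the text above.

```python
import os
import sys


def cython_cache_dir(ipython_cache_dir=None):
    """Return the directory where %%cython stores its compiled modules.

    Hypothetical helper. If no cache directory is passed in, ask IPython
    for it (this branch requires IPython to be installed).
    """
    if ipython_cache_dir is None:
        from IPython.paths import get_ipython_cache_dir
        ipython_cache_dir = get_ipython_cache_dir()
    return os.path.join(ipython_cache_dir, "cython")


def prepend_to_sys_path(directory, path_list=None):
    """Put `directory` at the front of the module search path, only once.

    Hypothetical helper. `path_list` defaults to sys.path; passing another
    list makes the function easy to try out without side effects.
    """
    if path_list is None:
        path_list = sys.path
    if directory not in path_list:
        path_list.insert(0, directory)
    return path_list
```

In a notebook, prepend_to_sys_path(cython_cache_dir()) would perform the lookup and the sys.path update in one step. The resulting directory is only useful on the workers if they see the same filesystem as the notebook, which is assumed here.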
This example assumes SLURM, but only for convenience. The distributed "network" can be set up by any other method (see http://distributed.dask.org/en/latest/setup.html).
from dask import delayed

%load_ext cython

# Create a toy Cython function and put it into a module named remoteCython.
# In a notebook, the %%cython line must be the first line of its own cell.
%%cython -n remoteCython
def cython_sum(int a, int b):
    return a+b

# Set up a distributed cluster (minimal, just for illustration)
# I use SLURM.
from dask_jobqueue import SLURMCluster
from distributed import Client

cluster = SLURMCluster(memory="1GB",
                       processes=1,
                       cores=1,
                       walltime="00:10:00")

cluster.scale(1)   # Start as many workers as needed (older dask_jobqueue releases used cluster.start_workers(1)).
client = Client(cluster)

def init_pyx(dask_worker):
    import pyximport
    pyximport.install()

    import sys
    sys.path.insert(0,'<IPYTHONCACHEDIR>/cython/')   # <<< replace <IPYTHONCACHEDIR> as appropriate

    import remoteCython

client.register_worker_callbacks(init_pyx)  # This runs init_pyx() on any Worker at init

import remoteCython

# ASIDE: you can find the full path of the Cython-generated library by
# looking at remoteCython.__file__
# The following creates a task and submits to the scheduler.
# The task computes the sum of 123 and 321 via the Cython function defined above
future = client.compute(delayed(remoteCython.cython_sum)(123,321))
# The task is executed on the remote worker
# We fetch the result from the remote worker
print(future.result()) # This prints 444
# We're done. Let's release the SLURM jobs.
cluster.close()
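
The init_pyx callback above hard-codes the cache path. As a hedged variation, the callback can be built from a path and module name passed in as arguments. The factory make_init_pyx and its use_pyximport switch are hypothetical names introduced for illustration, not part of Dask or Cython. The sketch assumes the workers see the same filesystem path as the notebook.

```python
import importlib
import sys


def make_init_pyx(module_dir, module_name, use_pyximport=True):
    """Build a worker callback that makes `module_name` importable.

    Hypothetical factory: `module_dir` is the directory holding the
    compiled module, e.g. the <IPYTHONCACHEDIR>/cython directory.
    """
    def init_pyx(dask_worker=None):
        if use_pyximport:
            # Same step as in the original callback; requires Cython.
            import pyximport
            pyximport.install()
        # Make the directory visible to Python's import machinery, once.
        if module_dir not in sys.path:
            sys.path.insert(0, module_dir)
        # Import by name so the module is loaded on this worker.
        importlib.import_module(module_name)
    return init_pyx
```

With a running client, the result would be registered the same way as before, e.g. client.register_worker_callbacks(make_init_pyx(cython_dir, "remoteCython")), where cython_dir is the cache directory discussed earlier.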