
@distributed seems to work, but the function's return value is wonky

  • natemcintosh  ·  7 years ago

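    I am using @sync @distributed at the start of a 3x nested for loop. From the line println(errCmp[row, col]) I can watch the elements of errCmp being printed out. For example: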

    From worker 3:    2.351134946074191e9
    From worker 4:    2.3500830193505473e9
    From worker 5:    2.3502416529551845e9
    From worker 2:    2.3509105625656652e9
    From worker 3:    2.3508352842971106e9
    From worker 4:    2.3497049296121807e9
    From worker 5:    2.35048428351797e9
    From worker 2:    2.350742582031195e9
    From worker 3:    2.350616273660934e9
    From worker 4:    2.349709546599313e9
    

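    However, the errCmp that the function returns is still the array of zeros that was pre-allocated at the start.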

    Am I missing some closing term that collects the results?

    function optimizeDragCalc(df::DataFrame)
        paramGrid = [cd*AoM for cd = range(1e-3, stop = 0.01, length = 50), AoM = range(2e-4, stop = 0.0015, length = 50)]
        errCmp    = zeros(size(paramGrid))
        # totalSize = size(paramGrid, 1) * size(paramGrid, 2) * size(df.time, 1)
        @sync @distributed for row = 1:size(paramGrid, 1)
            for col = 1:size(paramGrid, 2)
                # Run the propagation here
                BC = 1/paramGrid[row, col]
                slns, _ = propWholeTraj(df, BC)
                for time = 1:size(df.time, 1)
                    errDF = propError(slns[time], df, time)
                    errCmp[row, col] += sum(errDF.totalErr)
                end # time
                # println("row: ", row, " of ",size(paramGrid, 1),"   col: ", col, " of ", size(paramGrid, 2))
                println(errCmp[row, col])
            end # col
        end # row
        # plot(heatmap(z = errCmp))
        return errCmp, paramGrid
    end
    errCmp, paramGrid = @time optimizeDragCalc(df)
    
    1 answer
  •   Przemyslaw Szufel    7 years ago

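    You did not provide a minimal working example, but I guess that might be hard, so here is mine. Suppose we want to use Distributed to compute the mean of each column of an Array: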

    using Distributed
    addprocs(2)
    @everywhere using StatsBase
    data = rand(1000,2000)
    res = zeros(2000)
    @sync @distributed for col = 1:size(data)[2]
        res[col] = StatsBase.mean(data[:,col])
        # does not work!
        # ... because each worker writes into its own local copy of res, which is never sent back!
    end
    

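    To correct the code above, you need to provide an aggregator function (I have deliberately kept the example simple; further optimization is possible).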

    using Distributed
    addprocs(2)
    @everywhere using Distributed,StatsBase
    data = rand(1000,2000)    
    @everywhere function t2(d1,d2)
        append!(d1,d2)
        d1
    end
    res = @sync @distributed (t2) for col = 1:size(data)[2]
        [(myid(),col, StatsBase.mean(data[:,col]))]
    end
    

    Looking at the output, some values were computed on worker 2 while others were computed on worker 3:

    julia> res
    2000-element Array{Tuple{Int64,Int64,Float64},1}:
     (2, 1, 0.49703681326230276)
     (2, 2, 0.5035341367791002)
     (2, 3, 0.5050607022354537)
     ⋮
     (3, 1998, 0.4975699181976122)
     (3, 1999, 0.5009498778934444)
     (3, 2000, 0.499671315490524)
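
    For a 2-D grid like the errCmp in the question, the same reducer pattern can return (row, col, value) triples that the master process then writes into a pre-allocated matrix. The following is only a sketch under assumptions: cellerror is a hypothetical stand-in for the per-cell work (the question's propWholeTraj and propError are not shown), fillgrid is an illustrative name, and vcat is used as the reducer in place of t2.

```julia
using Distributed
addprocs(2)

# Hypothetical stand-in for the expensive per-cell computation in the question.
@everywhere cellerror(row, col) = 10.0 * row + col

function fillgrid(nrows, ncols)
    # Each row iteration returns a vector of (row, col, value) triples;
    # vcat concatenates those vectors and the combined vector ends up on the master.
    triples = @distributed (vcat) for row = 1:nrows
        [(row, col, cellerror(row, col)) for col = 1:ncols]
    end

    # Fill the matrix on the master, where the assignment is visible to the caller.
    errCmp = zeros(nrows, ncols)
    for (row, col, val) in triples
        errCmp[row, col] = val
    end
    return errCmp
end

errCmp = fillgrid(4, 3)
```

    Because @distributed with a reducer already waits for the reduced result, no @sync is needed in this form.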
    

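    Further possible improvements/modifications:

    • @spawnat: generate the values on the remote processes (rather than generating them on the master process and sending them)
    • SharedArray: this allows data to be distributed among workers automatically. In my experience it requires very careful programming.
    • Use ParallelDataTransfer.jl
    • Always consider Julia's threading mechanism (in some cases it makes life easier; again, it depends on the problem)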
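
    To illustrate the SharedArray option, here is a minimal sketch of the column-mean example in which the workers write directly into shared memory. It assumes all workers run on the same machine as the master (SharedArray does not span hosts); colmeans_shared is an illustrative name, and the mean is computed with sum and a division so that no extra package is needed.

```julia
using Distributed
addprocs(2)
@everywhere using SharedArrays

function colmeans_shared(data)
    ncols = size(data, 2)
    # Shared-memory vector that the master and all local workers can read and write.
    res = SharedArray{Float64}(ncols)
    # No reducer here, so @sync is needed to wait until every worker has finished.
    @sync @distributed for col = 1:ncols
        res[col] = sum(view(data, :, col)) / size(data, 1)
    end
    return res
end

data = rand(1000, 200)
res = colmeans_shared(data)
```

    Note that data is still copied to each worker in this sketch; only res is shared.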