
How do I manage the results of Python threads?

arrays multithreading python

jahmax · 14 years ago

I am using this code:

def startThreads(arrayofkeywords):
    global i
    i = 0
    while len(arrayofkeywords):
        try:
            if i<maxThreads:
                keyword = arrayofkeywords.pop(0)
                i = i+1
                thread = doStuffWith(keyword)
                thread.start()
        except KeyboardInterrupt:
            sys.exit()
    thread.join()

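I have almost everything working with threads in Python, but I don't know how to manage the results of each thread. Each thread ends up with an array of strings; how can I safely join all of those arrays into one? If I try to write to a global array, two threads could be writing to it at the same time.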

6 answers | last activity 10 years ago

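Community CDub 8 years ago

First, you actually need to save all of those thread objects so that you can call join() on them. As written, you only keep the last one, and then only if no exception occurs.

A simple way to do multithreaded programming is to give each thread all the data it needs to run, and then have it write to nothing outside that working set. If all threads follow that guideline, their writes will not interfere with each other. Then, once a thread has finished, have only the main thread aggregate the results into a global array. This is known as "fork/join parallelism".

If you subclass the Thread object, you can give it space to store its return value without interfering with other threads. Then you can do something like this: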

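import threading

class MyThread(threading.Thread):
    def __init__(self, ...):
        # Initialise the base Thread before adding our own attributes
        threading.Thread.__init__(self)
        self.result = []
        ...

def main():
    # doStuffWith() returns a MyThread instance.
    # start() returns None, so build the list first and start afterwards.
    threads = [ doStuffWith(k) for k in arrayofkeywords[:maxThreads] ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
        ret = t.result
        # process return value here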

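Edit:

Looking around a bit, it seems the approach above isn't the preferred way to do threads in Python. The above is more of a Java-esque pattern for threads. Instead, you could do something like: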

from threading import Thread

def handler(outList):
    ...
    # Modify existing object (important!)
    outList.append(1)
    ...

def doStuffWith(keyword):
    ...
    result = []
    thread = Thread(target=handler, args=(result,))
    return (thread, result)

def main():
    threads = [ doStuffWith(k) for k in arrayofkeywords[:maxThreads] ]
    for t in threads:
        t[0].start()
    for t in threads:
        t[0].join()
        ret = t[1]
        # process return value here

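Alex Martelli 14 years ago

Use a Queue.Queue instance, which is intrinsically thread-safe. Each thread can .put its results on that global instance when it is done, and the main thread (once it knows all worker threads are done, for example by calling .join on them as in @unholysampler's answer) can loop, using .get to pull each result from the queue and .extend to add it to the "overall result" list, until the queue is empty.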
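
As a minimal sketch of that idea (not the answerer's own code): each worker puts its list of strings on a shared queue, and the main thread drains the queue once every worker has been joined. The `worker` body and the names `collect`, `found` and `combined` are hypothetical, and the example assumes Python 3, where the module is called `queue` rather than `Queue`.

```python
import queue      # named `Queue` in Python 2
import threading


def worker(keyword, results):
    # Hypothetical stand-in for the real per-keyword work.
    found = [keyword + "-a", keyword + "-b"]
    # Queue.put is thread-safe, so no extra locking is needed here.
    results.put(found)


def collect(keywords):
    results = queue.Queue()
    threads = [threading.Thread(target=worker, args=(k, results))
               for k in keywords]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # after this loop no thread is still producing

    # Safe to drain with empty() because all producers have finished.
    combined = []
    while not results.empty():
        combined.extend(results.get())
    return combined
```

The order of the combined list depends on which thread finishes first, so sort it or tag each result with its keyword if order matters.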

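Edit: your code has other big problems. If the maximum number of threads is less than the number of keywords, it will never terminate: you are trying to start one thread per keyword, never fewer, but once you have started the maximum number, i is never decremented, so nothing more is popped and you loop forever to no further purpose.

Consider using a thread pool instead, somewhat like this recipe, except that instead of queuing callables you would queue the keywords, since the callable you want to run is the same in every thread and only the argument varies. Of course, that callable would be changed to pull an item from the incoming task queue (with .get) and to .put its list of results on the outgoing results queue when done.

To terminate the N threads you could, after all the keywords, .put N "sentinels" (e.g. None, assuming no keyword can be None): a thread's callable exits if the "keyword" it has just pulled is None.

More often than not, Queue.Queue offers the best way to organize threading (and multiprocessing!) architectures in Python, whether they are generic like the recipe I pointed you to, or more specialized like the one I suggest for your use case in the last two paragraphs.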
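
The two-queue pool with sentinels described above could look roughly like the following. This is a sketch under stated assumptions, not the linked recipe: `process` is a hypothetical placeholder for the real work, `run_pool` and `max_threads` are illustrative names, and the code uses Python 3's `queue` module.

```python
import queue      # named `Queue` in Python 2
import threading


def process(keyword):
    # Hypothetical placeholder for the real per-keyword work.
    return [keyword.upper()]


def worker(tasks, results):
    while True:
        keyword = tasks.get()
        if keyword is None:        # sentinel: no more work for this thread
            break
        results.put(process(keyword))


def run_pool(keywords, max_threads=4):
    tasks = queue.Queue()
    results = queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(max_threads)]
    for t in threads:
        t.start()

    for k in keywords:
        tasks.put(k)
    for _ in threads:
        tasks.put(None)            # one sentinel per worker thread

    for t in threads:
        t.join()

    # All workers have exited, so the results queue can be drained safely.
    combined = []
    while not results.empty():
        combined.extend(results.get())
    return combined
```

Because the number of threads is fixed up front, this never starts more than `max_threads` workers no matter how many keywords are queued.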

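unholysampler 14 years ago

You need to keep a reference to every thread you create. As written, your code only ensures that the last thread created has finished. That does not mean that all the threads you started before it have finished too.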

def startThreads(arrayofkeywords):
    global i
    i = 0
    threads = []
    while len(arrayofkeywords):
        try:
            if i<maxThreads:
                keyword = arrayofkeywords.pop(0)
                i = i+1
                thread = doStuffWith(keyword)
                thread.start()
                threads.append(thread)
        except KeyboardInterrupt:
            sys.exit()
    for t in threads:
        t.join()
    # process results stored in each thread

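This also solves the write-access problem, because each thread stores its data locally. Then, once all of them have finished, you can do the work of combining each thread's local data.

Gaskoin kekekeks 10 years ago

I know this question is a little old, but the best way to do this is not to hurt yourself as much as the other colleagues proposed :)

Please read the reference on Pool. This way you fork-join your work: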

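# Assumed import: the answer does not show one. multiprocessing.Pool uses
# worker processes; multiprocessing.pool.ThreadPool is the thread-based variant.
from multiprocessing import Pool

def doStuffWith(keyword):
    return keyword + ' processed in thread'

def startThreads(arrayofkeywords):
    pool = Pool(processes=maxThreads)
    result = pool.map(doStuffWith, arrayofkeywords)
    print(result)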
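
If actual threads rather than processes are wanted, a thread-backed pool with the same `map` interface is available in the standard library as `multiprocessing.pool.ThreadPool`. The sketch below is one possible variant, not the answerer's code; the names `do_stuff_with`, `start_threads` and `max_threads` are illustrative, and each call is assumed to return a list of strings, as in the question.

```python
from multiprocessing.pool import ThreadPool


def do_stuff_with(keyword):
    # Hypothetical work: each call returns its own list of strings.
    return [keyword + " processed in thread"]


def start_threads(keywords, max_threads=4):
    pool = ThreadPool(processes=max_threads)
    try:
        # map() blocks until every call is done and keeps the input order.
        per_keyword = pool.map(do_stuff_with, keywords)
    finally:
        pool.close()
        pool.join()
    # Flatten the per-keyword lists into a single list in the main thread.
    return [item for sublist in per_keyword for item in sublist]
```

No shared array is written from the workers here; each call only returns a value, and the merge happens in the calling thread.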

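Donald Miner 14 years ago

You can write to a global array if you use a semaphore to protect the critical section. You "acquire" the lock when you want to append to the global array, and "release" it when you are done. This way, only one thread at a time is ever appending to the array.

Check out http://docs.python.org/library/threading.html and search for semaphore for more information.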

sem = threading.Semaphore()
...
sem.acquire()
# do dangerous stuff
sem.release()
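
A slightly fuller sketch of the same idea follows. Note the substitution: it uses `threading.Lock` instead of a `Semaphore` (for plain mutual exclusion the two behave the same way here), and a `with` block so the lock is released even if the protected code raises. The names `worker`, `run`, `shared` and `lock` are hypothetical.

```python
import threading


def worker(keyword, shared, lock):
    # Build the result locally first; no lock is needed for private data.
    local = [keyword + "-1", keyword + "-2"]
    # Hold the lock only for the short write to the shared list.
    with lock:
        shared.extend(local)


def run(keywords):
    shared = []                 # the list every thread appends to
    lock = threading.Lock()     # protects writes to `shared`
    threads = [threading.Thread(target=worker, args=(k, shared, lock))
               for k in keywords]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(shared)       # sorted only to make the order deterministic
```

Doing the expensive work outside the `with` block keeps the critical section short, so threads spend little time waiting on each other.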

Fabiano 14 years ago

Try some of the semaphore methods, such as acquire and release. http://docs.python.org/library/threading.html