
TensorFlow: generate integers according to a probability distribution

choice random tensorflow python

ZHANG Juenjie · Tech Community · 7 years ago

The corresponding Python code is:

import numpy as np
from collections import Counter

frequency = np.random.choice([1, 2, 3, 4], 20, p=[0.02, 0.5, 0.3, 0.18])
np.fromiter(Counter(frequency).values(), dtype=np.float32)

# Out[86]:
# array([8., 8., 4.], dtype=float32)

However, I have more than 1e8 values to choose from, and the number of draws is not 20 but something like 1e10.

frequency=np.random.choice([i for i in range (10**7)],16**10,p=[0.0000001 for i in range(10**7)])
from collections import Counter
r=np.fromiter(Counter(frequency).values(), dtype=np.float32)

Now it simply raises a MemoryError.

I think tensorflow-gpu should be able to solve this, since the output is only of size 10**7. Does anyone know how to do it?

1 answer | as of 7 years ago

Vedang Waradpande 7 years ago

There are a few issues to consider here.

Overcoming the MemoryError on the CPU:

The line that produces the MemoryError is the first one:

In [1]: frequency = np.random.choice([i for i in range (10**7)],16**10,p=[0.0000001 for i in range(10**7)])
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)

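The reason is that the output of line 1 is not of size 10**7 but of size 16**10. Since that is what causes the MemoryError, the goal should be to never create an array that large.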
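As a rough check of the sizes involved, the arithmetic can be sketched as follows. The 8 bytes per sampled element is an assumption (int64, NumPy's default integer on most 64-bit platforms), as is the 4 bytes per float32 count.

```python
# Rough memory estimate for the raw sample versus the frequency array.
# Assumption: 8 bytes per sampled element (int64), 4 bytes per float32 count.
sample_size = 16**10        # number of draws requested in the question
total_elements = 10**7      # number of distinct values

sample_bytes = sample_size * 8
output_bytes = total_elements * 4

print(sample_size)            # 1099511627776 draws (2**40)
print(sample_bytes / 2**40)   # 8.0 (TiB) for the raw sample
print(output_bytes / 10**6)   # 40.0 (MB) for the frequency array
```

Under these assumptions the sample itself needs about 8 TiB, while the result the question actually wants fits in about 40 MB.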

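To achieve this, I suggest reducing the sample size by a factor and repeating the sampling step factor times, so that each chunk can be stored. On my machine a factor of 1000000 does the trick. Once a sample is created, we use Counter to convert it into a frequency dictionary. The advantage is that the frequency dictionary, once converted to a list or NumPy array, never exceeds size 10**7.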

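Since some elements may be missing from the sampled array in any given iteration, we do not convert each Counter dictionary to a list directly. Instead, we merge it into the dictionary carried over from the previous iterations, so that the frequency of every element accumulates across iterations.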
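One detail worth checking when merging per-chunk counts is which update method is used: dict.update replaces the value stored under a key, while Counter.update adds to it. A minimal illustration with two made-up chunks (the variable names are hypothetical):

```python
from collections import Counter

# Two small chunks standing in for two iterations of the sampling loop
chunk_a = [1, 2, 2, 3]
chunk_b = [2, 3, 3]

# Plain dict.update overwrites: only the latest chunk's count survives per key
overwritten = {}
overwritten.update(Counter(chunk_a))
overwritten.update(Counter(chunk_b))

# Counter.update adds counts, so frequencies accumulate across chunks
accumulated = Counter()
accumulated.update(chunk_a)
accumulated.update(chunk_b)

print(overwritten)        # {1: 1, 2: 1, 3: 2}
print(dict(accumulated))  # {1: 1, 2: 3, 3: 3}
```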

The function below samples uniformly; for any other distribution, pass the p argument to np.random.choice().

import numpy as np
import tensorflow as tf

from collections import Counter

def large_uniform_sample_frequencies(factor=1000000, total_elements=10**7, sample_size=16**10):
    # Split the sample into `factor` chunks; sizes must be integers,
    # so use divmod rather than true division
    chunk_size, remainder = divmod(sample_size, factor)

    # Initialise an empty Counter which
    # will be updated in each iteration
    counter_dict = Counter()

    for iteration in range(factor):
        # The first `remainder` chunks take one extra draw,
        # so the chunk sizes add up to exactly sample_size
        size = chunk_size + (1 if iteration < remainder else 0)

        # Generate a uniform random sample from 0 .. total_elements - 1
        frequency = np.random.choice(total_elements, size)

        # Counter.update adds the new counts to the running totals
        # (dict.update would overwrite them instead)
        counter_dict.update(frequency.tolist())

    return np.fromiter(counter_dict.values(), dtype=np.float32)
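For a non-uniform distribution, the same chunking idea can be sketched with np.bincount in place of Counter. This is a variant, not the code from the answer: the function name chunked_frequencies and its parameters are hypothetical, and it assumes NumPy's Generator API (np.random.default_rng) is available.

```python
import numpy as np

def chunked_frequencies(p, sample_size, chunk_size=1_000_000, seed=None):
    """Count draws per category without materialising the full sample."""
    rng = np.random.default_rng(seed)
    p = np.asarray(p, dtype=np.float64)
    # One counter slot per category, including categories never drawn
    counts = np.zeros(p.size, dtype=np.int64)
    remaining = sample_size
    while remaining > 0:
        size = min(chunk_size, remaining)
        # Draw category indices 0 .. p.size - 1 with probabilities p
        draws = rng.choice(p.size, size=size, p=p)
        # Add this chunk's per-category counts to the running totals
        counts += np.bincount(draws, minlength=p.size)
        remaining -= size
    return counts

# Small usage example with the distribution from the question
counts = chunked_frequencies([0.02, 0.5, 0.3, 0.18], sample_size=2_500,
                             chunk_size=1_000, seed=0)
print(counts.sum())  # 2500
```

Unlike the Counter version, the result is indexed by category, so counts[i] is the frequency of category i and categories that were never drawn appear as zeros. At the question's scale the loop is still long-running; chunk_size trades memory for fewer passes.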

Since you mention tensorflow-gpu, I assume you either want to get rid of the MemoryError with it, or want to run this sampling alongside tensorflow-gpu.

To address the MemoryError, you can try tf.multinomial() in place of np.random.choice(), as shown here

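If, for example, you want to run this while training some model, you can use distributed TensorFlow to place this part of the computation graph on the CPU as a PS task, using the code given above. Here is the final code: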

# Mention the devices for PS and worker tasks
ps_dev = '/cpu:0'
worker_dev = '/gpu:0'

# Toggle True to place computation on CPU 
# and False to place it on the least loaded GPU
is_ps_task = True

# Set the device for a PS task (TensorFlow 1.x API),
# otherwise fall back to the worker device
if is_ps_task:
    device_setter = tf.train.replica_device_setter(worker_device=worker_dev,
        ps_device=ps_dev,
        ps_tasks=1)
else:
    device_setter = worker_dev

# Run the computation under the chosen device scope
with tf.device(device_setter):
    freqs = large_uniform_sample_frequencies()
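If only the per-category counts are needed, as in the question, note that the counts of n independent draws from p follow a multinomial distribution, so they can be drawn in one call without generating the individual samples. This is a different technique from the chunked loop above, sketched with NumPy rather than TensorFlow. It assumes NumPy's Generator API, and whether it copes with the question's full scale (10**7 categories, 16**10 draws) on a given machine and NumPy version is untested here.

```python
import numpy as np

# Assumption: NumPy's Generator API (np.random.default_rng) is available
rng = np.random.default_rng(0)

# Small stand-in for the question's distribution
p = np.array([0.02, 0.5, 0.3, 0.18])
n_draws = 10**6

# One multinomial draw yields the count of each category directly
counts = rng.multinomial(n_draws, p)

print(counts.shape)  # (4,)
print(counts.sum())  # 1000000
```

Memory use is then proportional to the number of categories rather than the number of draws.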