This is Fisher-Yates-Knuth in-place sampling (https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle).
Memory usage stayed stable at about 4 GB (yes, I was using 100000000 elements).
# Fisher-Yates-Knuth sampling, in-place Durstenfeld version
import numpy as np

def swap(data, posA, posB):
    # Exchange two elements in place; skip the no-op case
    if posA != posB:
        data[posB], data[posA] = data[posA], data[posB]

def get_random_element(data, datalen):
    # Lazily yield the elements of data in random order, without repeats.
    # The not-yet-sampled elements always occupy data[0:pos].
    pos = datalen
    while pos > 0:
        idx = np.random.randint(low=0, high=pos)  # sample in the [0...pos) range
        pos -= 1
        swap(data, idx, pos)  # move the chosen element to the end of the unsampled part
        yield data[pos]

length = 100000000
some_long_list = list(range(0, length))

gen = get_random_element(some_long_list, length)
for k in range(0, length):
    print(next(gen))
UPDATE
For more speed, you may also want to inline swap().
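
As a rough sketch of what inlining could look like: the swap becomes a single tuple assignment inside the loop, and the posA != posB check is dropped because swapping an element with itself is harmless. The name get_random_element_inlined is made up for this illustration, and random.randrange from the standard library stands in for np.random.randint only to keep the sketch dependency-free; that substitution is an assumption, not part of the code above. No speedup was measured here.

```python
import itertools
import random

def get_random_element_inlined(data, datalen):
    # Same Durstenfeld scheme as above, with swap() written inline.
    pos = datalen
    while pos > 0:
        idx = random.randrange(pos)  # uniform index in [0, pos)
        pos -= 1
        # Inlined swap; a self-swap (idx == pos) leaves the list unchanged
        data[pos], data[idx] = data[idx], data[pos]
        yield data[pos]

# The generator is lazy, so only as many elements as requested are drawn.
small_list = list(range(10))
first_three = list(itertools.islice(
    get_random_element_inlined(small_list, len(small_list)), 3))
print(first_three)
```

Because the list is shuffled in place, the sampled elements end up at the tail of small_list; pass a copy if the original order still matters.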