As you might expect, this problem has a purely numpy solution. The trick is a careful combination of np.searchsorted, which places your regular grid onto the nearest bins of the original, and np.add.reduceat, which computes the sums within each bin.
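In case those two building blocks are unfamiliar, here is a minimal sketch of what each one does (toy numbers, not part of the original answer):

import numpy as np

edges = np.array([0.0, 1.1, 2.2, 2.3, 2.8, 4.0])  # original segment endpoints
grid = np.array([0.0, 1.0, 2.0, 3.0, 4.0])        # regular grid

# For each grid point, the insertion index into the original edges
print(np.searchsorted(edges, grid))                # [0 1 2 5 5]

# reduceat sums runs of elements starting at each given index
vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(np.add.reduceat(vals, [0, 2, 4]))            # [3. 7. 9.]

With those in hand, the full implementation and a small example: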
import numpy as np

def distribute(x, y, n):
    """
    Down-samples/interpolates the y-values of each segment across a
    domain with `n` points. `x` represents segment endpoints, so should
    have one more element than `y`.
    """
    y = np.asanyarray(y)
    x = np.asanyarray(x)
    new_x = np.linspace(x[0], x[-1], n + 1)
    # Find the insertion indices
    locs = np.searchsorted(x, new_x)[1:]
    # Create a flat array of reduceat start indices
    indices = np.zeros(2 * n, dtype=int)
    # Fill it in
    dloc = locs[:-1] - 1
    indices[2::2] = dloc
    indices[1::2] = locs
    # This is the sum of every original segment a new segment touches
    weighted = np.append(y * np.diff(x), 0)
    sums = np.add.reduceat(weighted, indices)[::2]
    # Now subtract the adjusted portions from the right end of the sums
    sums[:-1] -= (x[dloc + 1] - new_x[1:-1]) * y[dloc]
    # Now do the same for the left of each interval
    sums[1:] -= (new_x[1:-1] - x[dloc]) * y[dloc]

    return new_x, sums / np.diff(new_x)

seg = [0, 1.1, 2.2, 2.3, 2.8, 4]
color = [0, 0.88, 0.55, 0.11, 0.44]

seg, color = distribute(seg, color, 4)
print(seg, color)
The result is
[0. 1. 2. 3. 4.] [0. 0.792 0.374 0.44 ]
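As a quick sanity check on that output: the second new bin, [1, 2], covers 0.1 of the first segment (value 0) and 0.9 of the second segment (value 0.88), so its average is (0.1 * 0 + 0.9 * 0.88) / 1 = 0.792, matching the second value above.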
Benchmark

EE_'s solution and mine agree on the results, and I have compared their timings. I modified the other solution slightly so that it has the same interface as mine:
from scipy.interpolate import interp1d

def EE_(x, y, n):
    # Cumulative integral of the piecewise-constant y over the segments
    I = np.zeros_like(x)
    I[1:] = np.cumsum(np.diff(x) * y)
    # Linear interpolant of the antiderivative
    f = interp1d(x, I, bounds_error=False, fill_value=(0, I[-1]))
    # Average over each new bin = difference of the antiderivative / bin width
    pix_x = np.linspace(x[0], x[-1], n + 1)
    pix_y = (f(pix_x[1:]) - f(pix_x[:-1])) / (pix_x[1:] - pix_x[:-1])
    return pix_x, pix_y
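As an illustrative spot check (my addition, not part of the original answer), running EE_ on the same toy data, copied to fresh names since seg and color were overwritten above, should reproduce the distribute output:

# Assumes np, distribute and EE_ are defined as above
seg0 = [0, 1.1, 2.2, 2.3, 2.8, 4]
color0 = [0, 0.88, 0.55, 0.11, 0.44]

print(EE_(np.asarray(seg0, dtype=float), np.asarray(color0, dtype=float), 4)[1])
# expected: [0.    0.792 0.374 0.44 ]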
Here is the test bench (the function MadPhysicist is just distribute from above, renamed), along with the random x and y inputs used:
np.random.seed(0x1234ABCD)

x = np.cumsum(np.random.gamma(3.0, 0.2, size=1001))
y = np.random.uniform(0.0, 1.0, size=1000)

tests = (
    MadPhysicist,
    EE_,
)

for n in (5, 10, 100, 1000, 10000):
    print(f'N = {n}')
    results = {test.__name__: test(x, y, n) for test in tests}
    for name, (x_out, y_out) in results.items():
        print(f'{name}:\n\tx = {x_out}\n\ty = {y_out}')
    allsame = np.array([[np.allclose(x1, x2) and np.allclose(y1, y2)
                         for x2, y2 in results.values()]
                        for x1, y1 in results.values()])
    print()
    print(f'Result Match:\n{allsame}')
    from IPython import get_ipython
    magic = get_ipython().magic
    for test in tests:
        print(f'{test.__name__}({n}):\n\t', end='')
        magic(f'timeit {test.__name__}(x, y, n)')
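The timing loop above relies on IPython's timeit magic. If you want to run the benchmark from a plain Python script, a minimal sketch using the standard-library timeit module (my adaptation, not part of the original bench) would be:

import timeit

# Assumes x, y and tests are defined as in the test bench above
for n in (5, 10, 100, 1000, 10000):
    for test in tests:
        per_call = timeit.timeit(lambda: test(x, y, n), number=1000) / 1000
        print(f'{test.__name__}({n}): {per_call * 1e6:.1f} µs per call')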
I will skip the data and agreement printouts (the results are identical) and show just the timings:
N = 5
MadPhysicist: 50.6 µs ± 349 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
EE_: 110 µs ± 568 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
N = 10
MadPhysicist: 50.5 µs ± 732 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
EE_: 111 µs ± 635 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
N = 100
MadPhysicist: 54.5 µs ± 284 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
EE_: 114 µs ± 215 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
N = 1000
MadPhysicist: 107 µs ± 5.73 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
EE_: 148 µs ± 5.11 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
N = 10000
MadPhysicist: 458 µs ± 2.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
EE_: 301 µs ± 4.57 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)