Python: Split a sound file by transients

librosa audio python

Sam · Tech Community · 4 years ago

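I've written some functions that analyze a sound file and are supposed to split it at its transients (every point where the sound in the file changes abruptly). You can play the sound file in slow motion here to get a better idea of what I mean.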

import librosa
import numpy as np


def transients_from_onsets(onset_samples):
    """Takes a list of onset positions for an audio file and returns the list of start and stop positions for each segment

    Args:
        onset_samples ([int]): Sample indices at which onsets were detected, in ascending order
            (as returned by librosa.frames_to_samples)

    Returns:
        [(int, int)]: A (start, stop) pair for each sound change; each segment runs from one onset to the next
    """
    # Pair each onset with the one that follows it.
    return list(zip(onset_samples[:-1], onset_samples[1:]))


def transients_from_sound_file(fileName, sr=44100):
    """Takes the path to an audio file
    and returns the list of start and stop positions for each sound change,
    as sample indices

    Args:
        fileName (string): The path to an audio file
        sr (int, optional): The sample rate to load (and, if needed, resample) the audio at. Defaults to 44100.

    Returns:
        [(int, int)]: A list of (start, stop) sample indices for each sound change
    """
    y, sr = librosa.load(fileName, sr=sr)
    # Constant-Q magnitude spectrogram in dB, used as input to the onset strength envelope.
    C = np.abs(librosa.cqt(y=y, sr=sr))
    o_env = librosa.onset.onset_strength(sr=sr, S=librosa.amplitude_to_db(C, ref=np.max))
    onset_frames = librosa.onset.onset_detect(onset_envelope=o_env, sr=sr)

    # Convert onset frame indices to sample indices, then append the end of the
    # signal so that the last segment also has a stop position.
    onset_samples = librosa.frames_to_samples(onset_frames)
    onset_samples = np.append(onset_samples, len(y))
    transientTimes = transients_from_onsets(onset_samples)
    return transientTimes
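To actually cut the signal into separate sound files, the (start, stop) pairs can be used to slice `y`. Below is a minimal sketch that uses only the standard-library `wave` module. It assumes mono float samples in [-1.0, 1.0], which is what `librosa.load` returns by default. The helper `write_segments` and its `prefix` naming scheme are hypothetical and not part of librosa.

```python
import struct
import wave


def write_segments(y, sr, transient_times, prefix="segment"):
    """Write each (start, stop) slice of y to its own 16-bit mono WAV file.

    Hypothetical helper. y: sequence of float samples in [-1.0, 1.0];
    transient_times: list of (start, stop) sample indices.
    Returns the list of file names written.
    """
    names = []
    for i, (start, stop) in enumerate(transient_times):
        segment = y[int(start):int(stop)]
        # Clip to [-1, 1] and scale to signed 16-bit integers.
        ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in segment]
        name = f"{prefix}_{i:03d}.wav"
        with wave.open(name, "wb") as wf:
            wf.setnchannels(1)   # mono
            wf.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
            wf.setframerate(sr)
            wf.writeframes(struct.pack(f"<{len(ints)}h", *ints))
        names.append(name)
    return names
```

Take `y, sr = librosa.load(path, sr=44100)` and the pairs from `transients_from_sound_file(path)`. Calling `write_segments(y, sr, pairs)` then writes `segment_000.wav`, `segment_001.wav`, and so on, one file per detected segment.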

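What I've written works reasonably well, but it's somewhat off from what it should be: some sounds get split in two when they should be a single sound. Below are my program's output and the expected result. How can I make my results look more like the expected ones?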
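A common way to reduce this kind of over-segmentation is to ignore any onset that arrives too soon after the previous one. librosa's peak picking has parameters for this: `wait` sets a minimum spacing and `delta` sets the peak threshold. `librosa.onset.onset_detect` forwards them to `librosa.util.peak_pick`, and `wait` is counted in frames. The sketch below applies the same idea as a post-processing step on sample indices. The helper name `merge_close_onsets` and the 50 ms default gap are assumptions, not values tuned for this file.

```python
def merge_close_onsets(onset_samples, sr, min_gap_s=0.05):
    """Drop any onset closer than min_gap_s seconds to the last kept onset.

    Hypothetical helper. onset_samples: ascending sample indices of detected
    onsets. Returns a new list containing only the kept indices.
    """
    min_gap = int(round(min_gap_s * sr))  # gap converted from seconds to samples
    kept = []
    for s in onset_samples:
        # Keep the first onset, and any onset far enough from the previous kept one.
        if not kept or s - kept[-1] >= min_gap:
            kept.append(s)
    return kept


# Usage: with a 0.1 s gap at 10 kHz, the onset at 5500 (0.05 s after 5000) is dropped.
print(merge_close_onsets([0, 1000, 5000, 5500, 9000], sr=10000, min_gap_s=0.1))
```

The sketch keeps the earlier onset of each close pair, on the assumption that the first one marks the real attack. Apply it before `len(y)` is appended, so that the end-of-signal marker is never dropped. Whether it removes the right splits for this recording depends on how far apart the spurious onsets actually are.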

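These are the start and stop times of each sound (transient) as determined by my program, followed by a link to the sound files created by splitting the original sound file at these start and stop times: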

in frames: [(1536, 6144), (6144, 11264), (11264, 15360), (15360, 20992), (20992, 26624), (26624, 31744), (31744, 36352), (36352, 41984), (41984, 47104), (47104, 51712), (51712, 56832), (56832, 61440), (61440, 62976), (62976, 66560), (66560, 71680), (71680, 76800), (76800, 82944), (82944, 89088), (89088, 92672), (92672, 96768), (96768, 98304), (98304, 103936), (103936, 107008), (107008, 113664), (113664, 117760), (117760, 123904), (123904, 128512), (128512, 139264), (139264, 147968), (147968, 150016), (150016, 153088), (153088, 154624), (154624, 159232), (159232, 164864), (164864, 169472), (169472, 175616), (175616, 179200), (179200, 180736), (180736, 189440), (189440, 196096)]

in time: [(0.03566585034013605, 0.1426634013605442), (0.1426634013605442, 0.26154956916099775), (0.26154956916099775, 0.3566585034013605), (0.3566585034013605, 0.4874332879818594), (0.4874332879818594, 0.6182080725623583), (0.6182080725623583, 0.7370942403628118), (0.7370942403628118, 0.84409179138322), (0.84409179138322, 0.9748665759637188), (0.9748665759637188, 1.0937527437641723), (1.0937527437641723, 1.2007502947845805), (1.2007502947845805, 1.319636462585034), (1.319636462585034, 1.426634013605442), (1.426634013605442, 1.4622998639455782), (1.4622998639455782, 1.5455201814058959), (1.5455201814058959, 1.6644063492063492), (1.6644063492063492, 1.7832925170068026), (1.7832925170068026, 1.925955918367347), (1.925955918367347, 2.0686193197278913), (2.0686193197278913, 2.1518396371882087), (2.1518396371882087, 2.2469485714285717), (2.2469485714285717, 2.282614421768707), (2.282614421768707, 2.413389206349206), (2.413389206349206, 2.4847209070294785), (2.4847209070294785, 2.639272925170068), (2.639272925170068, 2.7343818594104308), (2.7343818594104308, 2.877045260770975), (2.877045260770975, 2.9840428117913835), (2.9840428117913835, 3.2337037641723354), (3.2337037641723354, 3.4358102494331066), (3.4358102494331066, 3.483364716553288), (3.483364716553288, 3.55469641723356), (3.55469641723356, 3.590362267573696), (3.590362267573696, 3.697359818594104), (3.697359818594104, 3.8281346031746035), (3.8281346031746035, 3.9351321541950113), (3.9351321541950113, 4.077795555555555), (4.077795555555555, 4.161015873015873), (4.161015873015873, 4.196681723356009), (4.196681723356009, 4.39878820861678), (4.39878820861678, 4.553340226757369)]

sound files

And these are what the start and stop times should be:

in frames: [(2067, 6431), (6431, 10795), (10795, 15389), (15389, 25495), (25495, 28940), (28940, 33534), (33534, 38587), (38587, 43640), (43640, 47085), (47085, 51679), (51679, 55814), (55814, 60867), (60867, 65231), (65231, 69595), (69595, 75337), (75337, 79931), (79931, 83606), (83606, 87740), (87740, 96928), (96928, 101521), (101521, 105885), (105885, 110479), (110479, 115073), (115073, 124031), (124031, 133218), (133218, 137353), (137353, 142176), (142176, 147229), (147229, 152282), (152282, 155728), (155728, 160321), (160321, 169739)]
in time: [(0.0479956462585034, 0.1493275283446712), (0.1493275283446712, 0.250659410430839), (0.250659410430839, 0.3573318820861678), (0.3573318820861678, 0.5919927437641723), (0.5919927437641723, 0.6719854875283447), (0.6719854875283447, 0.7786579591836735), (0.7786579591836735, 0.8959883900226757), (0.8959883900226757, 1.013318820861678), (1.013318820861678, 1.0933115646258504), (1.0933115646258504, 1.1999840362811791), (1.1999840362811791, 1.2959985487528345), (1.2959985487528345, 1.4133289795918367), (1.4133289795918367, 1.5146608616780044), (1.5146608616780044, 1.6159927437641723), (1.6159927437641723, 1.749321723356009), (1.749321723356009, 1.8559941950113379), (1.8559941950113379, 1.9413275283446711), (1.9413275283446711, 2.037318820861678), (2.037318820861678, 2.2506637641723355), (2.2506637641723355, 2.357313015873016), (2.357313015873016, 2.4586448979591835), (2.4586448979591835, 2.565317369614512), (2.565317369614512, 2.6719898412698413), (2.6719898412698413, 2.879994195011338), (2.879994195011338, 3.093315918367347), (3.093315918367347, 3.1893304308390027), (3.1893304308390027, 3.3013202721088435), (3.3013202721088435, 3.4186507029478457), (3.4186507029478457, 3.5359811337868483), (3.5359811337868483, 3.615997097505669), (3.615997097505669, 3.722646349206349), (3.722646349206349, 3.9413318820861676)]

sound files
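Comparing the two lists by eye is hard. A more systematic approach matches each detected boundary to the nearest expected boundary within a tolerance and then counts what is left over. Unmatched detected boundaries are extra splits, and unmatched expected boundaries are missed ones. The sketch below does this greedily. `match_boundaries` is a hypothetical helper, and greedy nearest-neighbour matching is a simplification: a later detected boundary can lose its best match to an earlier one. The `mir_eval` library does this kind of onset evaluation properly in `mir_eval.onset.f_measure`, which uses a maximum matching and a default 50 ms window.

```python
def match_boundaries(detected, expected, tolerance):
    """Greedily match detected boundary positions to expected ones.

    Hypothetical helper. A detected boundary matches the closest still-unused
    expected boundary if they differ by at most `tolerance` (same unit as the
    inputs). Returns (hits, extra, missed).
    """
    remaining = sorted(expected)
    hits, extra = [], []
    for d in sorted(detected):
        # Closest expected boundary that has not been matched yet, if any.
        best = min(remaining, key=lambda e: abs(e - d), default=None)
        if best is not None and abs(best - d) <= tolerance:
            hits.append((d, best))
            remaining.remove(best)
        else:
            extra.append(d)  # no expected boundary nearby: an extra split
    return hits, extra, remaining  # remaining = expected boundaries never found


hits, extra, missed = match_boundaries([1000, 2000, 2100, 5000], [1000, 2000, 4000], tolerance=200)
print(extra, missed)
```

To compare the two lists above, pass the start positions of both, for example `[start for start, _ in pairs]`. `extra` then lists the places where the program split a sound that should have stayed whole. That gives a concrete number to track while tuning the onset detection parameters.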
