
How do I convert spectrogram data to a tensor (or multidimensional numpy array)?

  • Shamoon  ·  5 years ago

    I'm using Keras and have:

            corrupted_samples, corrupted_sample_rate = sf.read(
                self.corrupted_audio_file_paths[index])
    
            frequencies, times, spectrogram = scipy.signal.spectrogram(
                corrupted_samples, corrupted_sample_rate)
    

    According to the docs, this returns:

    f (ndarray) - Array of sample frequencies.
    t (ndarray) - Array of segment times.
    Sxx (ndarray) - Spectrogram of x. By default, the last axis of Sxx corresponds to the segment times.
    

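    I'm assuming that all the times will line up, so I don't think I care about the values in times. The same goes for frequencies. So what I really need is the value of each frequency at each time, which is given by Sxx (spectrogram in my code). I'm not quite sure how to do this, though it seems like it should be simple.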

  •   wz 98  ·  5 years ago

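    In https://towardsdatascience.com/speech-recognition-analysis-f03ff9ce78e9 the author describes a spectrogram as a spectro-temporal representation of sound and walks through the steps for converting a WAV file into a spectrogram.

    One example could look like this: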

    # NOTE: tf.contrib exists only in TensorFlow 1.x. In TensorFlow 2.x the
    # equivalents are tf.io.read_file, tf.audio.decode_wav, tf.signal.frame
    # and tf.signal.stft.
    import wave

    import tensorflow as tf

    tf.enable_eager_execution()  # TF 1.x: required for the .numpy() calls below

    audio_file = './siren_mfcc_demo.wav'

    # Check the sampling rate of the WAV file.
    with wave.open(audio_file, "rb") as wave_file:
        sr = wave_file.getframerate()
    print(sr)

    audio_binary = tf.read_file(audio_file)

    # tf.contrib.ffmpeg not supported on Windows, refer to issue
    # https://github.com/tensorflow/tensorflow/issues/8271
    waveform = tf.contrib.ffmpeg.decode_audio(
        audio_binary, file_format='wav', samples_per_second=sr, channel_count=1)
    print(waveform.numpy().shape)

    # Flatten the decoded audio into a single batch entry: [1, num_samples].
    signals = tf.reshape(waveform, [1, -1])
    print(signals.get_shape())

    # Compute a [batch_size, ?, 128] tensor of fixed length, overlapping windows
    # where each window overlaps the previous by 75% (frame_length - frame_step
    # samples of overlap).
    frames = tf.contrib.signal.frame(signals, frame_length=128, frame_step=32)
    print(frames.numpy().shape)

    # `magnitude_spectrograms` is a [batch_size, ?, 129] tensor of spectrograms
    # (fft_length // 2 + 1 = 129 frequency bins per frame).
    magnitude_spectrograms = tf.abs(tf.contrib.signal.stft(
        signals, frame_length=256, frame_step=64, fft_length=256))

    print(magnitude_spectrograms.numpy().shape)
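
To make the shapes in the comments above concrete, here is a minimal NumPy-only sketch of the same framing and magnitude-STFT steps. It is not the TensorFlow implementation: the helper names `frame_signal` and `magnitude_spectrogram`, the synthetic 440 Hz test tone, and the symmetric Hann window from `np.hanning` are all assumptions made for illustration, so the values will not match `tf.contrib.signal.stft` exactly.

```python
import numpy as np


def frame_signal(signal, frame_length, frame_step):
    """Split a 1-D signal into overlapping frames (the incomplete tail is dropped)."""
    num_frames = 1 + (len(signal) - frame_length) // frame_step
    # Row i holds the sample indices i*frame_step ... i*frame_step + frame_length - 1.
    idx = (np.arange(frame_length)[None, :]
           + frame_step * np.arange(num_frames)[:, None])
    return signal[idx]


def magnitude_spectrogram(signal, frame_length, frame_step, fft_length):
    """Magnitude of a windowed FFT per frame; shape (num_frames, fft_length // 2 + 1)."""
    frames = frame_signal(signal, frame_length, frame_step)
    window = np.hanning(frame_length)  # assumed window, chosen for illustration
    return np.abs(np.fft.rfft(frames * window, n=fft_length, axis=-1))


# Hypothetical test signal: one second of a 440 Hz sine at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440 * t).astype(np.float32)

frames = frame_signal(signal, frame_length=128, frame_step=32)
spec = magnitude_spectrogram(signal, frame_length=256, frame_step=64, fft_length=256)

print(frames.shape)  # (num_frames, 128)
print(spec.shape)    # (num_frames, 129)
```

The result is an ordinary 2-D array with one row per frame and one column per frequency bin, which is the "value of each frequency at each time" layout the question asks for.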
    

    The approach above is taken from https://colab.research.google.com/drive/1Adcy25HYC4c9uSBDK9q5_glR246m-TSx#scrollTo=QTa1BVSOw1Oe
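
For the SciPy call in the question, note that `Sxx` is already a NumPy array of shape `(len(f), len(t))`, so no conversion step is strictly needed. The sketch below only rearranges the axes. The synthetic sine input, the log scaling, and the `(batch, time, frequency, channel)` layout are assumptions for illustration; the layout your Keras model expects may differ.

```python
import numpy as np
import scipy.signal

# Hypothetical input: one second of a 1 kHz sine at 16 kHz, standing in for
# the samples that sf.read() returns in the question.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
samples = np.sin(2 * np.pi * 1000 * t)

frequencies, times, spectrogram = scipy.signal.spectrogram(samples, sample_rate)

# spectrogram is a 2-D ndarray: rows are frequency bins, columns are time segments.
print(spectrogram.shape, len(frequencies), len(times))

# Assumed preprocessing: log scaling with a small offset to avoid log(0).
features = np.log(spectrogram + 1e-10).astype(np.float32)

# Assumed layout: time-major with a trailing channel axis, plus a batch axis,
# giving (1, num_times, num_frequencies, 1).
features = features.T[..., np.newaxis]
batch = features[np.newaxis, ...]
print(batch.shape)
```

Keras accepts NumPy arrays such as `batch` directly as model input. If an explicit tensor is needed, `tf.convert_to_tensor(batch)` produces one.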

    Hope this helps.