The TensorFlow tutorial on audio data preparation (https://www.tensorflow.org/io/tutorials/audio) provides the following example:
import tensorflow as tf
import tensorflow_io as tfio
audio = tfio.audio.AudioIOTensor('gs://cloud-samples-tests/speech/brooklyn.flac')
print(audio)
...and then states: "The content of the audio clip will only be read as needed, either by converting AudioIOTensor to Tensor through to_tensor(), or through slicing (emphasis added). Slicing is especially useful when only a small portion of a large audio clip is needed:"
audio_slice = audio[100:]
# remove last dimension
audio_tensor = tf.squeeze(audio_slice, axis=[-1])
print(audio_tensor)
This works as advertised, and it prints:
<AudioIOTensor: shape=[28979 1], dtype=<dtype: 'int16'>, rate=16000>
tf.Tensor([16 39 66 ... 56 81 83], shape=(28879,), dtype=int16)
So far, so good. Now I try this with a stereo FLAC:
audio = tfio.audio.AudioIOTensor('./audio/stereo_file.flac')
print(audio.shape)
...which prints:
tf.Tensor([12371520 2], shape=(2,), dtype=int64)
I see here there are two channels as expected. Again, so far, so good.
Now I'd like to extract one channel only and take some number of samples, say 512. So I try:
audio_slice = audio[0:512, 0:1]
...but this fails and crashes Python with:
Check failed: 1 == NumElements() (1 vs. 2)Must have a one element tensor
...
Process finished with exit code 134 (interrupted by signal 6: SIGABRT)
However, if I first make a copy with empty slice notation, everything works as I'd hoped:
audio_slice = audio[:]
print(audio_slice.shape)
audio_slice = audio_slice[0:512, 0:1]
print(audio_slice.shape)
audio_tensor = tf.squeeze(audio_slice, axis=[-1])
print(audio_tensor.shape)
...which prints:
tf.Tensor([12371520 2], shape=(2,), dtype=int64)
(12371520, 2)
(512, 1)
(512,)
I assume the first attempt fails because the audio is lazy-loaded, but I'm not sure why slicing works in the tutorial yet fails with my stereo file. Shouldn't slicing load the data as needed?
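A possible workaround, as a minimal sketch rather than a confirmed fix: the "Must have a one element tensor" check suggests the lazy read accepts only a single start/stop pair, that is, a slice along the first (sample) axis. Assuming that is the case, the sample range can be sliced on the lazy object and the channel picked afterwards from the in-memory result. The helper name `read_channel` is hypothetical and not part of tensorflow-io.

```python
def read_channel(audio, start, stop, channel=0):
    """Return samples [start:stop) of one channel.

    Assumption: `audio` is a lazy object (e.g. a tfio AudioIOTensor) that
    only supports slicing along its first axis, and that such a slice
    returns an ordinary in-memory tensor of shape (frames, channels).
    """
    # One slice on the sample axis only, so only this range is read.
    frames = audio[start:stop]
    # Channel selection happens on the in-memory result, not the lazy object.
    return frames[:, channel]


# Intended use (not executed here, needs tensorflow-io and the file):
# audio = tfio.audio.AudioIOTensor('./audio/stereo_file.flac')
# left = read_channel(audio, 0, 512)
```

With the stereo file above, `read_channel(audio, 0, 512)` would be expected to return 512 samples of the first channel while reading only that range, unlike `audio[:]`, which loads the whole clip before slicing.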
tensorflow>=2.8.0
tensorflow-io>=0.25.0
Source: https://stackoverflow.com/questions/71991390/why-does-slicing-lazy-loaded-audioiotensor-fail-with-stereo-flac