I am trying to use tf.data.Dataset.rejection_resample to balance my dataset, but the method modifies the element_spec of my dataset, which makes it incompatible with my models.
The original element spec of my dataset is:
({'input_A': TensorSpec(shape=(None, 900, 1), dtype=tf.float64, name=None),
  'input_B': TensorSpec(shape=(None, 900, 1), dtype=tf.float64, name=None)},
 TensorSpec(shape=(None, 1, 1), dtype=tf.int64, name=None))
This is the element spec after batching.
However, if I run rejection_resample (before batching), the element spec at the end becomes:
(TensorSpec(shape=(None,), dtype=tf.int64, name=None),
 ({'input_A': TensorSpec(shape=(None, 900, 1), dtype=tf.float64, name=None),
   'input_B': TensorSpec(shape=(None, 900, 1), dtype=tf.float64, name=None)},
  TensorSpec(shape=(None, 1, 1), dtype=tf.int64, name=None)))
So rejection_resample adds another tf.int64 tensor at the beginning of each element, and I can't find out what it is for. This breaks compatibility between the input data and my model, which depends on the original input tuple.
It also causes an inconsistency between the training and validation data. I intended to apply rejection_resample only to the training data, but if I do that, the training dataset will have the added tensor while the validation dataset won't.
So my question is: what is this tensor that gets added to the element spec, and is there any way to drop a component from the dataset's elements after building it? Thank you.
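For reference, the TensorFlow documentation describes the output of rejection_resample as (class_label, data) pairs, so the extra leading tensor is most likely the value returned by the class_func passed to the method. Below is a minimal sketch of dropping that component with map before batching. It assumes a TensorFlow 2.x release in which rejection_resample is available as a Dataset method. The toy data, the class_func helper, and the target_dist and initial_dist values are hypothetical stand-ins, not the original pipeline.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data standing in for the real dataset: two float64 inputs
# of shape (900, 1) and one int64 label of shape (1, 1) per example.
num_examples = 200
features = {
    "input_A": np.random.rand(num_examples, 900, 1),
    "input_B": np.random.rand(num_examples, 900, 1),
}
# Roughly 10% of the examples belong to class 1, the rest to class 0.
labels = (np.arange(num_examples) % 10 == 0).astype(np.int64).reshape(-1, 1, 1)

train_ds = tf.data.Dataset.from_tensor_slices((features, labels))


def class_func(inputs, label):
    # Hypothetical helper: turn the (1, 1) label into a scalar int32 class id.
    return tf.cast(tf.reshape(label, []), tf.int32)


# Assumed distributions for this toy example only.
resampled = train_ds.rejection_resample(
    class_func, target_dist=[0.5, 0.5], initial_dist=[0.9, 0.1], seed=0
)

# Each element is now (class_id, original_element). Keep only the original
# element so the structure matches the non-resampled validation dataset.
balanced = resampled.map(lambda class_id, data: data).batch(32)

print(balanced.element_spec)
```

With the extra component mapped away, the training dataset should have the same (inputs, label) structure as a validation dataset that never went through rejection_resample, so both can be passed to the same model.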
source https://stackoverflow.com/questions/75356723/tensorflow-data-dataset-rejection-resample-modifies-my-datasets-element-spec