I have always written my own code to format my data (3D shaping, normalization, ...) for my LSTM models. Now I have to work with bigger datasets and need to ingest many CSV files. What is the best way to make all of this fast and memory-efficient?
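For the many-CSV part, one common memory-efficient pattern is to stream the files with tf.data instead of loading them into RAM. Below is a minimal sketch; the `csv_demo` directory, the two generated files, and the 5-column layout (4 features plus a `target` column) are all assumptions made just so the example is runnable end to end:

```python
import os
import numpy as np
import tensorflow as tf

# Hypothetical setup: write two small CSV files so the sketch runs as-is.
os.makedirs("csv_demo", exist_ok=True)
for i in range(2):
    rows = np.random.rand(10, 5)
    np.savetxt(f"csv_demo/part{i}.csv", rows, delimiter=",",
               header="a,b,c,d,target", comments="")

files = tf.data.Dataset.list_files("csv_demo/*.csv", shuffle=False)

def parse_file(path):
    # CsvDataset streams rows lazily, so a file is never fully held in memory.
    return tf.data.experimental.CsvDataset(
        path,
        record_defaults=[tf.float32] * 5,  # one dtype/default per column
        header=True,
    ).map(lambda *cols: tf.stack(cols))    # tuple of 5 scalars -> (5,) vector

# interleave reads several files concurrently instead of one after another.
rows = files.interleave(parse_file, cycle_length=2,
                        num_parallel_calls=tf.data.AUTOTUNE)
```

The resulting `rows` dataset yields one feature vector per CSV row and can be fed into the windowing pipeline below.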
TensorFlow suggests using a data generator and finally converting the data to a tf.data.Dataset, and I found someone doing something like this:
import tensorflow as tf

WINDOW_SIZE = 72
BATCH_SIZE = 32

dataset = (
    tf.data.Dataset.from_tensor_slices(dataset_train)
    # Sliding windows of WINDOW_SIZE rows; drop_remainder avoids short
    # trailing windows that would break the fixed LSTM input shape.
    .window(WINDOW_SIZE, shift=1, drop_remainder=True)
    # Each window is itself a dataset; batch it back into a single tensor.
    .flat_map(lambda window: window.batch(WINDOW_SIZE))
    # Features: all columns but the last; label: last column of the last row.
    .map(lambda window: (window[:, :-1], window[-1:, -1]))
    .batch(BATCH_SIZE)
)
I really want to learn the best way; my goal is to use this code in production and, down the road, to learn more about MLOps. Thanks for your help, and if you have a well-explained example of setting up a 3D LSTM tf.data.Dataset, I'll take any suggestion.
Source: https://stackoverflow.com/questions/77084220/what-the-state-of-the-art-way-to-build-lstm-data-data-generator-or-tf-data-data