I'm training a neural network with MXNet, and it turns out that some of the network's parameters become NaN after a few training iterations.
Here are the main parts of my code:
Data preparation
import mxnet as mx
from mxnet import gluon, autograd, nd
from mxnet.gluon import nn, rnn
ctx = mx.cpu()
X_train = nd.array(X_train, dtype='float32', ctx=ctx) # nd.array of shape (14184, 30, 24)
Y_train = nd.array(Y_train, dtype='float32', ctx=ctx) # nd.array of shape (14184, 1)
batch_size = 128
train_dataset = gluon.data.ArrayDataset(X_train, Y_train)
train_loader = gluon.data.DataLoader(
    train_dataset, batch_size=batch_size, shuffle=True,
)
NN's implementation
net = nn.HybridSequential()
net.add( rnn.RNN(hidden_size=64, layout='NTC') )
net.add( rnn.RNN(hidden_size=64, layout='NTC') )
net.add( nn.Dropout(rate=0.1) )
net.add( rnn.RNN(hidden_size=32, layout='NTC') )
net.add( rnn.RNN(hidden_size=32, layout='NTC') )
net.add( nn.Flatten() )
net.add( nn.Dropout(rate=0.2) )
net.add( nn.Dense(units=96, activation='relu') )
net.add( nn.Dense(units=96, activation='relu') )
net.add( nn.Dense(units=64, activation='relu') )
net.add( nn.Dense(units=64, activation='relu') )
net.add( nn.Dense(units=1, activation='relu') )
net.initialize(ctx=ctx)
net.hybridize()
Training
# Define the trainer for the model
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.001})
# Define the loss function
loss_fn = gluon.loss.L2Loss()
# training loop
epochs = 5
for epoch in range(epochs):
    for data, labels in train_loader:
        with autograd.record():
            outputs = net(data)
            loss = loss_fn(outputs, labels)
        loss.backward()
        trainer.step(batch_size)
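To find out at which iteration the loss first stops being finite, a check like the following could be applied to the per-batch loss values. This is only a sketch: the helper `first_non_finite` and the `history` list are hypothetical names that are not part of the code above, and the numbers are made up for illustration.

```python
import math

def first_non_finite(loss_values):
    """Return the index of the first NaN/inf value, or None if all are finite."""
    for step, value in enumerate(loss_values):
        if not math.isfinite(value):
            return step
    return None

# Made-up per-batch mean losses (assumption). In the loop above they could be
# collected with something like loss.mean().asscalar() after each step.
history = [0.92, 0.85, 3.1e4, float("inf"), float("nan")]
print(first_non_finite(history))  # prints 3
```

Knowing the first bad step makes it possible to look at the batch and the loss values just before it, which usually says more than a NaN printed at the end of an epoch.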
I omitted several lines of code, but I think these are the most important ones. I noticed something was wrong because the training loss printed after every epoch was NaN. After some inspection, when I ran:
net.collect_params()['rnn0_l0_i2h_weight'].data() #first layer's weights
The output was an array with NaNs in some rows.
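Rather than inspecting one parameter at a time, all of them can be scanned at once. The sketch below uses plain NumPy and assumes the parameters have already been converted to NumPy arrays. The helper `non_finite_report` is a hypothetical name, and the `example` dict holds stand-in values, not my real weights.

```python
import numpy as np

def non_finite_report(arrays):
    """Map each array name to its count of NaN/inf entries, skipping clean arrays."""
    report = {}
    for name, values in arrays.items():
        bad = int(np.count_nonzero(~np.isfinite(values)))
        if bad:
            report[name] = bad
    return report

# Stand-in arrays (assumption). With the network above, the dict could be built as
# {name: p.data().asnumpy() for name, p in net.collect_params().items()}
example = {
    "rnn0_l0_i2h_weight": np.array([[0.1, np.nan], [np.nan, 0.3]], dtype="float32"),
    "dense0_bias": np.zeros(4, dtype="float32"),
}
print(non_finite_report(example))  # prints {'rnn0_l0_i2h_weight': 2}
```

Running the same report on the gradients (`p.grad()` instead of `p.data()`) right after `loss.backward()` might show whether the NaNs appear in the gradients before they reach the weights.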
I'm fairly sure I correctly followed the step-by-step guide here, in the MXNet documentation, but I may be making a mistake somewhere. If someone could help me figure this out, I would be very grateful.
source https://stackoverflow.com/questions/74465972/neural-networks-parameters-turning-nan-with-mxnet