Training Neural Networks Part 1

In [1]:
%pylab inline
from ipypublish import nb_setup

import keras
keras.__version__
from keras import models
from keras import layers
Populating the interactive namespace from numpy and matplotlib

Backprop recap

The Backprop algorithm was known by the mid-1980s, but it took almost two more decades before the field of Deep Learning entered the mainstream. There were several reasons for this delay, including the fact that the processing power was not yet there, but the main reason was that Backprop simply did not work for the large models that arise in practical problems. In these cases it was observed that the gradients died away before the training was complete, thus limiting the accuracy of the model. In this chapter and the next, our objective is to go through the individual elements of the Gradient Descent algorithm and make improvements so that it is able to work for large models. We start by examining the parameter update equation and present several modifications that improve upon it in Sections LearningRateSelection and ParameterUpdate. We then investigate the role of the Activation Function in Section ActivationLossFunctions, and also come up with a number of alternative functions that improve performance. The correct initialization of weights at the start of the algorithm is also a major issue, and is discussed in Section InitializingWeights. In Section DataPreprocessing we show that pre-conditioning the training data before it is fed into the model leads to several benefits in the training process. We end the chapter with a discussion of Batch Normalization, which is a recently discovered technique for improving the training process that has already had a big impact on the field.

In [2]:
#TNN1
nb_setup.images_hconcat(["DL_images/TNN1.png"], width=600)
Out[2]:

Issues with Gradient Descent

Recall that the Gradient Descent based parameter update equation in one dimension is given by:

$$ w\leftarrow w - \eta\frac{\partial {\mathcal L}}{\partial w} $$

As shown in Figure TNN1, $\frac{\partial {\mathcal L}}{\partial w} > 0$ for points on the curve to the right of the minimum. This causes the value of $w$ to decrease, until it converges to the minimum point at which the gradient is zero. Conversely $\frac{\partial {\mathcal L}}{\partial w} < 0$ for points to the left of the minimum, which causes $w$ to increase with each iteration. There are a number of cases in which this simple iteration does not work very well, and we will describe these next:

In [3]:
#TNN2
nb_setup.images_hconcat(["DL_images/TNN2.png"], width=600)
Out[3]:
  1. Even in the simple one-dimensional case, it is easy to see that the learning rate parameter $\eta$ exerts a powerful influence on the convergence process (see Figure TNN2). If $\eta$ is too small, then convergence happens very slowly, as shown in the left hand side of the figure. On the other hand, if $\eta$ is too large, then the algorithm starts to oscillate and may even diverge.
In [4]:
#TNN4
nb_setup.images_hconcat(["DL_images/TNN4.png"], width=600)
Out[4]:
  2. If Gradient Descent is run in multiple dimensions, then other problems can arise. One such problem is illustrated in Figure TNN4. The figure illustrates a two-dimensional scenario in which the Loss Function $\mathcal L$ has a very steep slope along one dimension and a shallow slope along the other, i.e., it has the shape of a Narrow Steep Valley. If we run Gradient Descent in this system, we get the behavior shown in the diagram on the LHS (this is a 2-D analog of the behavior shown in the RHS of Figure TNN2). The parameter that lies along the steep part of the Loss Function oscillates back and forth between the valley slopes, while the parameter that lies along the shallow part of the Loss Function moves slowly down the valley. The net effect is that convergence happens very slowly. The right hand side of the figure shows a more ideal convergence behavior, which we will show how to achieve in Section ParameterUpdate.
In [5]:
#TNN3
nb_setup.images_hconcat(["DL_images/TNN3.png"], width=600)
Out[5]:
In [6]:
#TNN8
nb_setup.images_hconcat(["DL_images/TNN8.png"], width=600)
Out[6]:
  3. Another issue that arises when Gradient Descent is run in multiple dimensions is that of Saddle Points. These are defined as points on the surface of the Loss Function that are a minimum when observed along one of the dimensions and simultaneously a maximum when observed along another dimension. This is illustrated for the two-dimensional case in Figure TNN3. When the iteration approaches a saddle point from above, then even though it is not at a minimum, the slope hits zero. As a result the iteration comes to a halt and the algorithm gets stuck at a non-optimal point. This behavior is further illustrated in Figure TNN8.
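The valley and saddle point behaviors described above can be reproduced with a few lines of NumPy. The following is a minimal sketch rather than a reference implementation: the two quadratic Loss Functions, the starting points and the Learning Rates are illustrative assumptions chosen to make the effects easy to see.

```python
import numpy as np

def gradient_descent(grad, w0, lr, steps):
    """Plain gradient descent; returns every visited point as one array row."""
    w = np.asarray(w0, dtype=float)
    path = [w.copy()]
    for _ in range(steps):
        w = w - lr * grad(w)
        path.append(w.copy())
    return np.array(path)

def valley_grad(w):
    # Gradient of the assumed loss L(w) = 0.5 * (50 * w[0]**2 + w[1]**2):
    # steep along w[0], shallow along w[1] (a narrow steep valley).
    return np.array([50.0 * w[0], w[1]])

def saddle_grad(w):
    # Gradient of the assumed loss L(w) = w[0]**2 - w[1]**2:
    # a minimum along w[0] and a maximum along w[1] at the origin.
    return np.array([2.0 * w[0], -2.0 * w[1]])

# Narrow valley: w[0] flips sign on every step while w[1] creeps toward 0.
valley_path = gradient_descent(valley_grad, [1.0, 1.0], lr=0.039, steps=20)

# Saddle: starting exactly on the w[1] = 0 line, the iteration slides into
# the saddle point at the origin, where the gradient vanishes.
saddle_path = gradient_descent(saddle_grad, [1.0, 0.0], lr=0.1, steps=100)

print("valley, first steps along steep axis:", valley_path[:4, 0])
print("valley, shallow axis after 20 steps: ", valley_path[-1, 1])
print("saddle, final point:                 ", saddle_path[-1])
```

In the valley run the steep coordinate keeps bouncing between the two slopes, while the shallow coordinate has covered only about half the distance to the minimum after 20 steps. In the saddle run the gradient becomes vanishingly small at the origin even though the loss could still be lowered by moving along the second coordinate.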

All the issues with Gradient Descent that were raised in this section are addressed by techniques described in the next two sections.

Learning Rate Annealing

In [7]:
#TNN6
nb_setup.images_hconcat(["DL_images/TNN6.png"], width=600)
Out[7]:

As we mentioned in the previous section, the Learning Rate parameter has a big influence on the effectiveness of the Gradient Descent algorithm. If it is set to a large value, then the algorithm moves quickly at the start of the iteration, but the large step size can cause the parameter to overshoot as the system approaches the minimum, which can lead to oscillations. If it is set too small, then the algorithm converges with high likelihood, but it can take a very long time to do so (see Fig. TNN2). Hence ideally $\eta$ should be set adaptively, such that it is large in the initial stages of the optimization and becomes smaller as the system gets closer to the minimum.
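The effect of the step size can be checked on a toy problem. The sketch below applies the update $w\leftarrow w - \eta\frac{\partial {\mathcal L}}{\partial w}$ to the assumed Loss Function ${\mathcal L}(w) = w^2$ (gradient $2w$); the function name `run_gd` and the four Learning Rate values are illustrative choices, not part of the original example.

```python
def run_gd(lr, w0=1.0, steps=25):
    """Gradient descent on L(w) = w**2, whose gradient is 2*w."""
    w = w0
    history = [w]
    for _ in range(steps):
        w = w - lr * 2.0 * w   # each step multiplies w by (1 - 2*lr)
        history.append(w)
    return history

small = run_gd(lr=0.01)        # factor  0.98: converges, but very slowly
good = run_gd(lr=0.3)          # factor  0.40: converges quickly
oscillating = run_gd(lr=0.9)   # factor -0.80: overshoots and flips sign
diverging = run_gd(lr=1.1)     # factor -1.20: every step moves further away

for name, hist in [("small", small), ("good", good),
                   ("oscillating", oscillating), ("diverging", diverging)]:
    print(f"{name:12s} w after 25 steps = {hist[-1]: .3e}")
```

For this particular loss, each update multiplies $w$ by $(1 - 2\eta)$, so the iteration converges only when $|1 - 2\eta| < 1$ and changes sign on every step whenever $\eta > 0.5$.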

Figure TNN6 illustrates the effect of the Learning Rate on the Loss Function during training, and can be used as a quick check on the suitability of the rate being used. A very high Learning Rate can cause the Loss Function to start increasing after a few iterations, while a moderately high rate causes the Loss to plateau at a high value after an initial rapid decrease. A very low Learning Rate can be identified by a slow decrease in the Loss Function over the training epochs. A good Learning Rate combines a quick decrease during the initial epochs with a lower steady state value.

In [8]:
#TNN7
nb_setup.images_hconcat(["DL_images/TNN7.png"], width=400)
Out[8]:

A well known technique for achieving the best Learning Rate behavior is called Learning Rate Annealing. This is the strategy of reducing the Learning Rate as the system approaches the minimum (see Figure TNN7), such that the rate is high at the start of the training and gradually falls as the training progresses. This reduction can be done in several ways. Popular approaches are:

  • Track the validation accuracy and decrease the Learning Rate when it appears to plateau.

  • Automatically anneal the Learning Rate based on the number of epochs that the Gradient Descent algorithm has been through.
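The epoch-based approach in the second bullet can be written as a function that maps the epoch number to a Learning Rate. The two schedules below are a sketch under assumed settings: the names `step_decay` and `exponential_decay`, the drop factor, the drop interval and the decay constant `k` are all hypothetical choices made for illustration.

```python
import math

def step_decay(epoch, initial_lr=0.01, drop=0.5, epochs_per_drop=10):
    """Halve the rate every `epochs_per_drop` epochs (epoch is 0-indexed)."""
    return initial_lr * drop ** (epoch // epochs_per_drop)

def exponential_decay(epoch, initial_lr=0.01, k=0.05):
    """Shrink the rate smoothly by a factor of exp(-k) per epoch."""
    return initial_lr * math.exp(-k * epoch)

for epoch in (0, 9, 10, 25):
    print(epoch, step_decay(epoch), round(exponential_decay(epoch), 6))
```

Keras offers a `LearningRateScheduler` callback that accepts a schedule function of this kind. Since that callback may also pass the current Learning Rate as a second positional argument, a thin wrapper such as `lambda epoch, lr: step_decay(epoch)` is probably the safer way to plug these functions in.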

Instead of using the same Learning Rate for every parameter, in the next Section we will learn about techniques that tailor the rate to the parameter. Thus parameters possessing a steep gradient get lower rates compared to parameters with a smaller gradient.

Keras provides a feature called callbacks which can be used to implement Learning Rate Annealing. A callback is an object that is passed to the fit routine, and then gets called by the model while the training is still going on. Some of the uses for this feature include:

  • Interrupt training
  • Save the model parameters
  • Load a different set of parameters
  • etc..

Callbacks can be used to do Learning Rate Annealing, as illustrated in the example below. We use one of the built-in callbacks called ReduceLROnPlateau, for which we set three parameters: (1) the performance measure to be monitored, (2) the factor by which the Learning Rate is multiplied every time the callback is triggered, and (3) the number of epochs for which the performance measure must fail to improve before the callback is triggered.
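Before running the Keras example, the triggering rule behind these three parameters can be sketched in plain Python. This is a simplified model of the behavior, not the Keras implementation: the function name `simulate_reduce_on_plateau` is hypothetical, the small `min_delta` improvement threshold is an assumption of the sketch, and the validation losses are the first 17 `val_loss` values from the training log shown later in this section.

```python
def simulate_reduce_on_plateau(val_losses, initial_lr, factor=0.1,
                               patience=5, min_delta=1e-4):
    """Return the final rate and the 1-indexed epochs at which it was cut."""
    lr = initial_lr
    best = float("inf")
    wait = 0                    # consecutive epochs without improvement
    reduced_at = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss         # new best value: reset the counter
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr *= factor    # plateau detected: shrink the rate
                wait = 0
                reduced_at.append(epoch)
    return lr, reduced_at

# val_loss for epochs 1-17, copied from the training output in this section
val_losses = [1.8635, 1.8617, 1.7539, 1.7565, 1.7178, 1.7238, 1.7840,
              1.7091, 1.8199, 1.7347, 1.7961, 1.6660, 1.7423, 1.7761,
              1.7345, 1.6833, 1.7222]

final_lr, reduced_at = simulate_reduce_on_plateau(val_losses, initial_lr=0.01)
print("rate reduced after epochs:", reduced_at)
print("final rate:", final_lr)
```

Under these assumptions the best validation loss occurs at epoch 12 and the following five epochs fail to improve on it, so the rate is cut after epoch 17. This appears consistent with the sharper drop in training loss that the log shows at epoch 18.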

In [5]:
from keras import models
from keras import layers
from keras import callbacks
from tensorflow.keras import optimizers
from tensorflow.keras.utils import to_categorical

from keras.datasets import cifar10

(train_images, train_labels), (test_images, test_labels) = cifar10.load_data()

# Flatten each 32x32x3 image into a vector and scale the pixel values to [0, 1]
train_images = train_images.reshape((50000, 32 * 32 * 3))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 32 * 32 * 3))
test_images = test_images.astype('float32') / 255

# One-hot encode the 10 class labels
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

network = models.Sequential()
network.add(layers.Dense(20, activation='relu', input_shape=(32 * 32 * 3,)))
network.add(layers.Dense(15, activation='relu'))
network.add(layers.Dense(10, activation='softmax'))

# Multiply the Learning Rate by 0.1 whenever val_loss fails to improve for 5 epochs
callbacks_list = [
    callbacks.ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.1,
        patience=5,
    )
]

# Note: `decay` is accepted by the Keras version used for this run;
# newer optimizer implementations may reject this argument.
sgd = optimizers.SGD(learning_rate=0.01, decay=1e-6, momentum=0.9, nesterov=True)

network.compile(optimizer=sgd,
                loss='categorical_crossentropy',
                metrics=['accuracy'])

# Hold out 20% of the training data so that val_loss is available to the callback
history = network.fit(train_images, train_labels, epochs=500, batch_size=128,
                      callbacks=callbacks_list, validation_split=0.2)

history_dict = history.history
history_dict.keys()
Epoch 1/500
313/313 [==============================] - 2s 4ms/step - loss: 1.9762 - accuracy: 0.2794 - val_loss: 1.8635 - val_accuracy: 0.3287
Epoch 2/500
313/313 [==============================] - 1s 3ms/step - loss: 1.8028 - accuracy: 0.3539 - val_loss: 1.8617 - val_accuracy: 0.3289
Epoch 3/500
313/313 [==============================] - 1s 3ms/step - loss: 1.7598 - accuracy: 0.3658 - val_loss: 1.7539 - val_accuracy: 0.3736
Epoch 4/500
313/313 [==============================] - 1s 3ms/step - loss: 1.7268 - accuracy: 0.3805 - val_loss: 1.7565 - val_accuracy: 0.3702
Epoch 5/500
313/313 [==============================] - 1s 3ms/step - loss: 1.7037 - accuracy: 0.3916 - val_loss: 1.7178 - val_accuracy: 0.3897
Epoch 6/500
313/313 [==============================] - 1s 3ms/step - loss: 1.6834 - accuracy: 0.3975 - val_loss: 1.7238 - val_accuracy: 0.3849
Epoch 7/500
313/313 [==============================] - 1s 3ms/step - loss: 1.6762 - accuracy: 0.4012 - val_loss: 1.7840 - val_accuracy: 0.3620
Epoch 8/500
313/313 [==============================] - 1s 4ms/step - loss: 1.6614 - accuracy: 0.4055 - val_loss: 1.7091 - val_accuracy: 0.3938
Epoch 9/500
313/313 [==============================] - 1s 3ms/step - loss: 1.6542 - accuracy: 0.4069 - val_loss: 1.8199 - val_accuracy: 0.3479
Epoch 10/500
313/313 [==============================] - 1s 3ms/step - loss: 1.6475 - accuracy: 0.4086 - val_loss: 1.7347 - val_accuracy: 0.3772
Epoch 11/500
313/313 [==============================] - 1s 3ms/step - loss: 1.6416 - accuracy: 0.4111 - val_loss: 1.7961 - val_accuracy: 0.3486
Epoch 12/500
313/313 [==============================] - 1s 4ms/step - loss: 1.6337 - accuracy: 0.4142 - val_loss: 1.6660 - val_accuracy: 0.4056
Epoch 13/500
313/313 [==============================] - 1s 3ms/step - loss: 1.6304 - accuracy: 0.4132 - val_loss: 1.7423 - val_accuracy: 0.3767
Epoch 14/500
313/313 [==============================] - 1s 3ms/step - loss: 1.6241 - accuracy: 0.4151 - val_loss: 1.7761 - val_accuracy: 0.3568
Epoch 15/500
313/313 [==============================] - 1s 3ms/step - loss: 1.6191 - accuracy: 0.4180 - val_loss: 1.7345 - val_accuracy: 0.3729
Epoch 16/500
313/313 [==============================] - 1s 3ms/step - loss: 1.6162 - accuracy: 0.4194 - val_loss: 1.6833 - val_accuracy: 0.3978
Epoch 17/500
313/313 [==============================] - 1s 3ms/step - loss: 1.6105 - accuracy: 0.4230 - val_loss: 1.7222 - val_accuracy: 0.3859
Epoch 18/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5772 - accuracy: 0.4338 - val_loss: 1.6482 - val_accuracy: 0.4117
Epoch 19/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5680 - accuracy: 0.4391 - val_loss: 1.6503 - val_accuracy: 0.4117
Epoch 20/500
313/313 [==============================] - 1s 4ms/step - loss: 1.5677 - accuracy: 0.4361 - val_loss: 1.6445 - val_accuracy: 0.4115
Epoch 21/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5665 - accuracy: 0.4372 - val_loss: 1.6397 - val_accuracy: 0.4163
Epoch 22/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5653 - accuracy: 0.4381 - val_loss: 1.6427 - val_accuracy: 0.4149
Epoch 23/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5646 - accuracy: 0.4389 - val_loss: 1.6495 - val_accuracy: 0.4108
Epoch 24/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5644 - accuracy: 0.4384 - val_loss: 1.6430 - val_accuracy: 0.4153
Epoch 25/500
313/313 [==============================] - 1s 4ms/step - loss: 1.5631 - accuracy: 0.4413 - val_loss: 1.6417 - val_accuracy: 0.4143
Epoch 26/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5628 - accuracy: 0.4396 - val_loss: 1.6430 - val_accuracy: 0.4132
Epoch 27/500
313/313 [==============================] - 1s 4ms/step - loss: 1.5575 - accuracy: 0.4413 - val_loss: 1.6390 - val_accuracy: 0.4154
Epoch 28/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5565 - accuracy: 0.4419 - val_loss: 1.6391 - val_accuracy: 0.4154
Epoch 29/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5561 - accuracy: 0.4413 - val_loss: 1.6389 - val_accuracy: 0.4161
Epoch 30/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5561 - accuracy: 0.4414 - val_loss: 1.6381 - val_accuracy: 0.4157
Epoch 31/500
313/313 [==============================] - 1s 4ms/step - loss: 1.5560 - accuracy: 0.4424 - val_loss: 1.6387 - val_accuracy: 0.4164
Epoch 32/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5557 - accuracy: 0.4426 - val_loss: 1.6384 - val_accuracy: 0.4175
Epoch 33/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5559 - accuracy: 0.4415 - val_loss: 1.6384 - val_accuracy: 0.4150
Epoch 34/500
313/313 [==============================] - 1s 4ms/step - loss: 1.5556 - accuracy: 0.4420 - val_loss: 1.6388 - val_accuracy: 0.4156
Epoch 35/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5555 - accuracy: 0.4418 - val_loss: 1.6393 - val_accuracy: 0.4170
Epoch 36/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5552 - accuracy: 0.4421 - val_loss: 1.6386 - val_accuracy: 0.4163
Epoch 37/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5548 - accuracy: 0.4421 - val_loss: 1.6384 - val_accuracy: 0.4166
Epoch 38/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5548 - accuracy: 0.4421 - val_loss: 1.6384 - val_accuracy: 0.4168
Epoch 39/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5547 - accuracy: 0.4422 - val_loss: 1.6383 - val_accuracy: 0.4162
Epoch 40/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5547 - accuracy: 0.4425 - val_loss: 1.6384 - val_accuracy: 0.4169
Epoch 41/500
313/313 [==============================] - 1s 4ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6384 - val_accuracy: 0.4163
Epoch 42/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4425 - val_loss: 1.6384 - val_accuracy: 0.4164
Epoch 43/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6384 - val_accuracy: 0.4161
Epoch 44/500
313/313 [==============================] - 1s 4ms/step - loss: 1.5546 - accuracy: 0.4423 - val_loss: 1.6384 - val_accuracy: 0.4158
Epoch 45/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4423 - val_loss: 1.6383 - val_accuracy: 0.4159
Epoch 46/500
313/313 [==============================] - 1s 4ms/step - loss: 1.5546 - accuracy: 0.4423 - val_loss: 1.6383 - val_accuracy: 0.4159
Epoch 47/500
313/313 [==============================] - 1s 4ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4159
Epoch 48/500
313/313 [==============================] - 1s 4ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4159
Epoch 49/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epochs 50 to 168 of 500: output unchanged from epoch 49 (loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158)
Epoch 169/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 170/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 171/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 172/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 173/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 174/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 175/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 176/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 177/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 178/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 179/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 180/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 181/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 182/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 183/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 184/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 185/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 186/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 187/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 188/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 189/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 190/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 191/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 192/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 193/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 194/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 195/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 196/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 197/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 198/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 199/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 200/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 201/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 202/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 203/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 204/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 205/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 206/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 207/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 208/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 209/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 210/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 211/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 212/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 213/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 214/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 215/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 216/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 217/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 218/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 219/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 220/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 221/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 222/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 223/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 224/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 225/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 226/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 227/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 228/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 229/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 230/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 231/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 232/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 233/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 234/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 235/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 236/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 237/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 238/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 239/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 240/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 241/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 242/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 243/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 244/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 245/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 246/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 247/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 248/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 249/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 250/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 251/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 252/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 253/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 254/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 255/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 256/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 257/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 258/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 259/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 260/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 261/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 262/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 263/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 264/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 265/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 266/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 267/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 268/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 269/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 270/500
313/313 [==============================] - 1s 4ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 271/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 272/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 273/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 274/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 275/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 276/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 277/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 278/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 279/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 280/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 281/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 282/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 283/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 284/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 285/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 286/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 287/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 288/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 289/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 290/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 291/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 292/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 293/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 294/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 295/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 296/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 297/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 298/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 299/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 300/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 301/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 302/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 303/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 304/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 305/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 306/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 307/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 308/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 309/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 310/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 311/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 312/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 313/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 314/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 315/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 316/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 317/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 318/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 319/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 320/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 321/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 322/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 323/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 324/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 325/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 326/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 327/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 328/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 329/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 330/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 331/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 332/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 333/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 334/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 335/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 336/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 337/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 338/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 339/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 340/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 341/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 342/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 343/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 344/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 345/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 346/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 347/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 348/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 349/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 350/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 351/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 352/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 353/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 354/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 355/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 356/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 357/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 358/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 359/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 360/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 361/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 362/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 363/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 364/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 365/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 366/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 367/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 368/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 369/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 370/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 371/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 372/500
313/313 [==============================] - 1s 4ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 373/500
313/313 [==============================] - 1s 4ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 374/500
313/313 [==============================] - 1s 4ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 375/500
313/313 [==============================] - 1s 4ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 376/500
313/313 [==============================] - 1s 4ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 377/500
313/313 [==============================] - 1s 4ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 378/500
313/313 [==============================] - 1s 4ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 379/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 380/500
313/313 [==============================] - 1s 4ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 381/500
313/313 [==============================] - 1s 4ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 382/500
313/313 [==============================] - 1s 4ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 383/500
313/313 [==============================] - 1s 4ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 384/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 385/500
313/313 [==============================] - 1s 4ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 386/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 387/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 388/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 389/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 390/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 391/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 392/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 393/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 394/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 395/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 396/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 397/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 398/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 399/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 400/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 401/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 402/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 403/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 404/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 405/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 406/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 407/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 408/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 409/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 410/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 411/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 412/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 413/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 414/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 415/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 416/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 417/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 418/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 419/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 420/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 421/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 422/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 423/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 424/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 425/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 426/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 427/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 428/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 429/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 430/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 431/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 432/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 433/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 434/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 435/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 436/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 437/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 438/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 439/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 440/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 441/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 442/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 443/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 444/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 445/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 446/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 447/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 448/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 449/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 450/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 451/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 452/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 453/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 454/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 455/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 456/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 457/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 458/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 459/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 460/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 461/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 462/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 463/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 464/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 465/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 466/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 467/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 468/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 469/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 470/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 471/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 472/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 473/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 474/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 475/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 476/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 477/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 478/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 479/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 480/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 481/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 482/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 483/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 484/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 485/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 486/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 487/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 488/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 489/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 490/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 491/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 492/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 493/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 494/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 495/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 496/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 497/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 498/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 499/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Epoch 500/500
313/313 [==============================] - 1s 3ms/step - loss: 1.5546 - accuracy: 0.4424 - val_loss: 1.6383 - val_accuracy: 0.4158
Out[5]:
dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy', 'lr'])
In [6]:
import matplotlib.pyplot as plt

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)
#epochs = range(1, len(loss) + 1)

# "bo" is for "blue dot"
plt.plot(epochs, loss, 'bo', label='Training loss')
# b is for "solid blue line"
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

plt.show()
In [7]:
plt.clf()   # clear figure
acc_values = history_dict['accuracy']
val_acc_values = history_dict['val_accuracy']

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

plt.show()

Improvements to the Parameter Update Equation

In the next few sections we present a number of modifications to the base parameter update equation $w\leftarrow w - \eta\frac{\partial {\mathcal L}}{\partial w}$, which help to improve the performance of the Gradient Descent algorithm. Some of these algorithms automatically adapt the effective Learning Rate as the training progresses (for example the ADAGRAD, RMSPROP and Adam algorithms), while others improve the speed of convergence (for example the Momentum, Nesterov Momentum and Adam algorithms).

Momentum

Momentum is one of the most popular techniques used to improve the speed of convergence of the Gradient Descent algorithm. The basic idea behind Momentum is the following: Some Loss Functions are characterized by the figure shown on the LHS of Figure TNN4. In this case the gradient along one of the dimensions is very large, while along the other dimension it is small. If we run the Gradient Descent iteration on this system, then the parameter on the steep side fluctuates from one side of the "canyon" to the other, while the parameter on the shallow side progresses very slowly down the canyon. This behavior slows down convergence considerably. A simple but ingenious technique that counteracts this behavior is to replace the Gradient Descent iteration by the following:

At the end of the $n^{th}$ iteration of the Backprop algorithm, define a sequence $v(n)$ by

$$ v(n) = \rho\; v(n-1) - \eta \; g(n) $$

with

$$ v(0) = -\eta \; g(0) $$

where $\rho$ is a new hyper-parameter called the "momentum" parameter, and $g(n)$ is the gradient evaluated at the parameter value $w(n)$, defined by

$$ g(n) = \frac{\partial {\mathcal L(n)}}{\partial w} $$

for Stochastic Gradient Descent and

$$ g(n) = {1\over B}\sum_{m=nB}^{(n+1)B-1}\frac{\partial {\mathcal L(m)}}{\partial w} $$

for Batch Stochastic Gradient Descent (note that in this case $n$ is an index into the batch number). The change in parameter values on each iteration is now defined as

\begin{equation} w(n+1) = w(n) + v(n) \quad \quad (**Wn1**) \end{equation}

It can be shown from these equations that $v(n)$ can be written as

\begin{equation} v(n) = - \eta\sum_{i=0}^n \rho^{n-i} g(i) \quad \quad (**Wn2**) \end{equation}

so that

\begin{equation} w(n+1) = w(n) - \eta\sum_{i=0}^n \rho^{n-i} g(i) \quad \quad (**Wn3**) \end{equation}
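The equivalence between the recursive definition of $v(n)$ and the closed form (Wn2) can be checked numerically. The following is a minimal sketch in plain Python; the gradient values and the function names are illustrative assumptions and are not taken from the notebook.

```python
def momentum_velocity(grads, eta, rho):
    """Run v(n) = rho * v(n-1) - eta * g(n), which gives v(0) = -eta * g(0)."""
    v = 0.0
    history = []
    for g in grads:
        v = rho * v - eta * g
        history.append(v)
    return history


def closed_form_velocity(grads, eta, rho, n):
    """Evaluate v(n) = -eta * sum_{i=0}^{n} rho^(n-i) * g(i)."""
    return -eta * sum(rho ** (n - i) * grads[i] for i in range(n + 1))


grads = [0.5, -1.0, 2.0, 0.25, -0.75]   # arbitrary illustrative gradients
eta, rho = 0.1, 0.9

recursive = momentum_velocity(grads, eta, rho)
closed = [closed_form_velocity(grads, eta, rho, n) for n in range(len(grads))]

for n, (r, c) in enumerate(zip(recursive, closed)):
    print(n, r, c)
```

Both columns should agree up to floating point rounding, since unrolling the recursion one step at a time produces exactly the geometric weights $\rho^{n-i}$.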

When the momentum parameter $\rho = 0$, then this equation reduces to the usual Stochastic Gradient Descent iteration. On the other hand, when $\rho > 0$, then we get some interesting behaviors:

  • If the gradients $g(i)$ are such that they change sign frequently (as in the steep side of Figure TNN4), then the stepsize $\sum_{i=0}^n \rho^{n-i}g(i)$ will be small, because successive terms partially cancel. Thus the change in these parameters with the number of iterations will be limited.

  • If the gradients $g(i)$ are such that they maintain their sign (as in the shallow portion of Figure TNN4), then the stepsize $\sum_{i=0}^n \rho^{n-i}g(i)$ will be large. This means that if the gradients maintain their sign then the corresponding parameters will take bigger and bigger steps as the algorithm progresses, even though the individual gradients may be small.
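The two behaviors listed above can be reproduced on a toy "canyon". The sketch below assumes a quadratic loss $\mathcal L(w) = \frac{1}{2}(50w_1^2 + w_2^2)$, so that $w_1$ is the steep direction and $w_2$ the shallow one; the curvatures, learning rate and step count are illustrative choices, not values from the notebook.

```python
def descend(eta, rho, steps, curvatures=(50.0, 1.0), start=(1.0, 1.0)):
    """Minimize L(w) = 0.5 * sum(c_k * w_k^2); rho = 0 gives plain Gradient Descent."""
    w = list(start)
    v = [0.0 for _ in w]
    steep_path = [w[0]]
    for _ in range(steps):
        grad = [c * wk for c, wk in zip(curvatures, w)]        # dL/dw_k = c_k * w_k
        v = [rho * vk - eta * gk for vk, gk in zip(v, grad)]   # velocity update
        w = [wk + vk for wk, vk in zip(w, v)]                  # parameter update (Wn1)
        steep_path.append(w[0])
    return w, steep_path


w_plain, steep_plain = descend(eta=0.03, rho=0.0, steps=100)
w_mom, _ = descend(eta=0.03, rho=0.9, steps=100)

print("steep coordinate, first plain steps:", steep_plain[:4])
print("shallow coordinate, plain descent  :", abs(w_plain[1]))
print("shallow coordinate, momentum       :", abs(w_mom[1]))
```

In this setup the steep coordinate flips sign on every plain Gradient Descent step, while the shallow coordinate ends up much closer to the minimum when momentum is switched on. This holds for this particular quadratic and these hyper-parameters; it is not a guarantee for arbitrary loss surfaces.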

In [8]:
#TNN5
nb_setup.images_hconcat(["DL_images/TNN5.png"], width=2000)
Out[8]:

The Momentum algorithm thus accelerates parameter convergence for parameters whose gradients consistently point in the same direction, and slows parameter change for parameters whose gradient changes sign frequently, thus resulting in faster convergence (this is shown on the RHS of Figure TNN4). The variable $v(n)$ is analogous to velocity in a dynamical system, while the parameter $1-\rho$ plays the role of the co-efficient of friction. The value of $\rho$ determines the degree of momentum, with the momentum becoming stronger as $\rho$ approaches $1$. Note that

$$ \sum_{i=0}^{n} \rho^{n-i}g(i) \le {g_{max}\over 1-\rho} $$

$\rho$ is usually set in the neighborhood of $0.9$, and from the above inequality it follows that $\sum_{i=0}^n \rho^{n-i}g(i)\approx 10g$ for large $n$, assuming all the $g(i)$ are approximately equal to $g$. Hence the effective gradient in Equation (Wn3) is ten times the value of the actual gradient. This results in an "overshoot" where the value of the parameter shoots past the minimum point to the other side of the bowl, and then reverses itself. This is a desirable behavior since it prevents the algorithm from getting stuck at a saddle point or a local minimum, since the momentum carries it out of these areas (see Figure TNN5).
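The factor of ten comes from the geometric series $\sum_{k\ge 0}\rho^k = 1/(1-\rho)$. As a quick arithmetic check, the sketch below accumulates the weighted sum for a constant gradient; the constant value $g = 1$ and the step count are assumptions made for illustration.

```python
rho, g = 0.9, 1.0

weighted_sum = 0.0
for n in range(200):
    # Running value of sum_{i=0}^{n} rho^(n-i) * g(i) when every g(i) equals g
    weighted_sum = rho * weighted_sum + g

limit = g / (1.0 - rho)   # geometric series limit, about 10 * g for rho = 0.9
print(weighted_sum, limit)
```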

Nesterov Momentum

Nesterov Momentum is a variation on the plain Momentum method described above. Note that the Momentum parameter update equations can be written as:

$$ v(n) = \rho\; v(n-1) - \eta \; g(w(n)) $$

$$ w(n+1) = w(n) + v(n) $$

In the first equation we have explicitly written out the fact that the gradient $g$ is computed for parameter value $w(n)$. These equations can be improved by evaluation of the gradient at parameter value $w(n+1)$ instead. This may seem like circular reasoning since in order to compute $w(n+1)$ we first need to compute $g(w(n))$. However note that $w(n+1)\approx w(n) + \rho v(n-1)$. This leads to the velocity update equation for Nesterov Momentum

$$ v(n) = \rho\; v(n-1) - \eta \; g(w(n)+\rho v(n-1)) $$

where $g(w(n)+\rho v(n-1))$ denotes the gradient computed at parameter values $w(n) + \rho v(n-1)$. By using a slightly more accurate estimate of the gradient in each step, it has been observed in practice that the Gradient Descent process speeds up considerably when compared to the plain Momentum method.
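The only difference between the two methods is the point at which the gradient is evaluated. The sketch below places the two update steps side by side for a scalar parameter; the toy loss $\mathcal L(w) = w^2$, the helper names and the hyper-parameter values are assumptions made for illustration.

```python
def momentum_step(w, v, grad_fn, eta, rho):
    """Plain Momentum: gradient evaluated at the current parameter w(n)."""
    v_new = rho * v - eta * grad_fn(w)
    return w + v_new, v_new


def nesterov_step(w, v, grad_fn, eta, rho):
    """Nesterov Momentum: gradient evaluated at the look-ahead point w(n) + rho * v(n-1)."""
    v_new = rho * v - eta * grad_fn(w + rho * v)
    return w + v_new, v_new


def run(step_fn, grad_fn, w0, eta, rho, steps):
    """Iterate a step function from w0 with zero initial velocity and record the path."""
    w, v = w0, 0.0
    path = [w]
    for _ in range(steps):
        w, v = step_fn(w, v, grad_fn, eta, rho)
        path.append(w)
    return path


def toy_gradient(w):
    return 2.0 * w   # gradient of the toy loss L(w) = w^2


plain_path = run(momentum_step, toy_gradient, 1.0, 0.1, 0.9, 50)
nesterov_path = run(nesterov_step, toy_gradient, 1.0, 0.1, 0.9, 50)

print("plain momentum, last steps   :", plain_path[-3:])
print("Nesterov momentum, last steps:", nesterov_path[-3:])
```

On this quadratic the Nesterov iterates are damped noticeably faster than the plain Momentum iterates. The size of the gain depends on the loss surface and the hyper-parameters, so this should be read as an illustration rather than a benchmark.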

The ADAGRAD Algorithm

The Momentum and Nesterov Momentum algorithms help to improve the speed of convergence; however, we still have the issue of optimally varying the Learning Rate parameter (see Section LearningRateSelection). It would be nice if this could be done automatically as part of the parameter update equation, and this is precisely what the ADAGRAD algorithm does. This algorithm replaces the parameter update rule with the following equation:

\begin{equation} w(n+1) = w(n) - \frac{\eta}{\sqrt{\sum_{i=1}^n g(i)^2+\epsilon}}\; g(n) \tag 1 \end{equation}

The constant $\epsilon$ has been added to better condition the denominator and is usually set to a small number such as $10^{-7}$.

Equation (1) leads to the following benefits: Each parameter gets its own adaptive Learning Rate, such that parameters with large accumulated gradients get smaller effective learning rates and parameters with small accumulated gradients get larger ones (the base rate $\eta$ is usually defaulted to $0.01$). As a result the progress along each dimension evens out over time, which helps the training process. This is a type of Learning Rate annealing, but it is more powerful since:

  • Each parameter gets its own customized rate,
  • The change in rates happens automatically as part of the parameter update equation.

Also, the accumulation of gradients in the denominator leads to smaller Learning Rates over time, which has the same effect as annealing. This is a double edged sword, since the continuous decrease in Learning Rates can lead to a halt of training in large networks that require a greater number of iterations. This problem is addressed by the RMSPROP algorithm, which is described next.
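A minimal sketch of the ADAGRAD rule in Equation (1) is given below, applied to an assumed toy loss $\mathcal L(w) = \frac{1}{2}(50w_1^2 + w_2^2)$ whose two coordinates have very different gradient scales. The function name, the toy loss and the step count are illustrative assumptions.

```python
import math


def adagrad(grad_fn, w0, eta=0.01, eps=1e-7, steps=100):
    """Apply Equation (1) independently to every coordinate of w."""
    w = list(w0)
    accum = [0.0] * len(w)      # running sum of squared gradients, one per parameter
    rates = []                  # effective learning rate per parameter, per step
    for _ in range(steps):
        g = grad_fn(w)
        accum = [a + gk * gk for a, gk in zip(accum, g)]
        step_sizes = [eta / math.sqrt(a + eps) for a in accum]
        w = [wk - s * gk for wk, s, gk in zip(w, step_sizes, g)]
        rates.append(step_sizes)
    return w, rates


def toy_gradient(w):
    return [50.0 * w[0], 1.0 * w[1]]   # steep first coordinate, shallow second


w, rates = adagrad(toy_gradient, [1.0, 1.0])

print("effective rates at step 1  :", rates[0])
print("effective rates at step 100:", rates[-1])
print("parameters after 100 steps :", w)
```

The steep coordinate starts with a much smaller effective rate than the shallow one, both rates shrink as the squared gradients accumulate, and the two parameters end up making almost identical progress despite the fifty-fold difference in gradient scale.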

The RMSPROP Algorithm

The RMSPROP algorithm accumulates the sum of squared gradients over a sliding window, using the following formula:

$$ \Delta_n = \rho \Delta_{n-1} + (1-\rho) g(n)^2 $$

where $\rho$ is a decay constant (usually set to $0.9$). This operation (called a Low Pass Filter) has a windowing effect, since it forgets gradients that are far back in time. The quantity $RMS_n$ defined by

$$ RMS_n = \sqrt{\Delta_n + \epsilon} $$

is used in the denominator of equation (1), resulting in the following parameter update equation:

\begin{equation} w(n+1) = w(n) - \frac{\eta}{RMS_n}\; g(n) \tag 2 \end{equation}

Note that $$ \Delta_n = (1-\rho)\sum_{i=0}^n \rho^{n-i} g(i)^2 \le g_{max}^2 $$ where $g_{max}$ is an upper bound on $|g(i)|$. This shows that the exponential weighting prevents the sum from blowing up, and a large value of $\rho$ is equivalent to using a larger window of previous gradients in computing the sum. Hence RMSPROP retains the benefits of ADAGRAD while avoiding the decay of the Learning Rate to zero.
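The contrast with ADAGRAD can be seen by tracking the effective Learning Rate $\eta/\sqrt{\cdot}$ of both methods on the same gradient sequence. The sketch below assumes a constant gradient of $2$ over $1000$ steps; the function names and all numeric values are illustrative assumptions.

```python
import math


def rmsprop_effective_rates(grads, eta=0.001, rho=0.9, eps=1e-7):
    """Effective rate eta / RMS_n, with Delta_n an exponentially weighted average."""
    delta = 0.0
    rates = []
    for g in grads:
        delta = rho * delta + (1.0 - rho) * g * g
        rates.append(eta / math.sqrt(delta + eps))
    return rates


def adagrad_effective_rates(grads, eta=0.001, eps=1e-7):
    """Effective rate from Equation (1), with an ever-growing sum of squares."""
    accum = 0.0
    rates = []
    for g in grads:
        accum += g * g
        rates.append(eta / math.sqrt(accum + eps))
    return rates


grads = [2.0] * 1000   # constant gradient, chosen only for illustration

rms_rates = rmsprop_effective_rates(grads)
ada_rates = adagrad_effective_rates(grads)

print("RMSPROP effective rate after 1000 steps:", rms_rates[-1])
print("ADAGRAD effective rate after 1000 steps:", ada_rates[-1])
```

The RMSPROP rate settles near $\eta/|g|$, whereas the ADAGRAD rate keeps shrinking roughly like $1/\sqrt{n}$.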

The Adam Algorithm

The Adaptive Moment Estimation (Adam) algorithm combines the best of algorithms such as Momentum that speed up the training process, with algorithms such as RMSPROP that adaptively vary the effective Learning Rate. The update equations for Adam are as follows:

$$ \Lambda_n = \beta\Lambda_{n-1} +(1-\beta)g(n),\ \ \ {\hat\Lambda}_n = \frac{\Lambda_n}{1-\beta^n} $$

$$ \Delta_n = \alpha\Delta_{n-1} + (1-\alpha) g(n)^2,\ \ \ {\hat\Delta}_n = \frac{\Delta_n}{1-\alpha^n} $$

$$ w(n+1) = w(n) - \eta\frac{{\hat\Lambda}_n}{\sqrt{{\hat\Delta}_n + \epsilon}} $$

The definition of the sequence $\Delta_n$ is identical to that of $\Delta_n$ in RMSPROP (with $\alpha$ playing the role of the decay constant), and it serves an identical purpose, i.e., it is used to customize the effective Learning Rate on a per parameter basis, so that the rates for parameters with larger gradients are equalized with those for parameters with smaller gradients.

The sequence $\Lambda_n$ is used to provide "Momentum" to the updates, and works in a fashion similar to the velocity sequence $v(n)$ in the Momentum algorithm. It is easy to show that

$$ \Lambda_n = (1-\beta)\sum_{i=1}^n \beta^{n-i} g(i) \le g_{max} $$

which shows that $\Lambda_n$ is a weighted sum of the previous $n$ gradients (compare this with the expression for $v(n)$ in the Momentum section). Since $\Lambda_n$ and $\Delta_n$ are initialized as vectors of 0s, they are biased towards 0 at the start of the iteration. These biases are counteracted by computing the estimates $\hat\Lambda_n$ and $\hat\Delta_n$. The parameters $\beta$ and $\alpha$ are usually defaulted to $0.9$ and $0.999$ respectively, while $\epsilon$ is set to a small constant such as $10^{-8}$.

Adam serves as the default choice for the parameter update rule, since it combines the best features of the other update algorithms.
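The Adam update equations translate almost line by line into code. The following is a minimal scalar sketch that follows the notation of this section ($\beta$ for the gradient average $\Lambda_n$, $\alpha$ for the squared-gradient average $\Delta_n$); the toy loss $\mathcal L(w) = w^2$ and the default values used here are assumptions made for illustration, and the code is not the Keras implementation.

```python
import math


def adam(grad_fn, w0, eta=0.001, beta=0.9, alpha=0.999, eps=1e-8, steps=100):
    """Scalar Adam following the update equations of this section."""
    w = w0
    lam = 0.0     # Lambda_n: exponentially weighted average of gradients
    delta = 0.0   # Delta_n: exponentially weighted average of squared gradients
    for n in range(1, steps + 1):
        g = grad_fn(w)
        lam = beta * lam + (1.0 - beta) * g
        delta = alpha * delta + (1.0 - alpha) * g * g
        lam_hat = lam / (1.0 - beta ** n)       # bias-corrected Lambda_n
        delta_hat = delta / (1.0 - alpha ** n)  # bias-corrected Delta_n
        w = w - eta * lam_hat / math.sqrt(delta_hat + eps)
    return w


def toy_gradient(w):
    return 2.0 * w   # gradient of the toy loss L(w) = w^2


w_one = adam(toy_gradient, 1.0, steps=1)
w_hundred = adam(toy_gradient, 1.0, steps=100)

print("after 1 step   :", w_one)
print("after 100 steps:", w_hundred)
```

Thanks to the bias correction, the very first step has magnitude close to $\eta$ regardless of the gradient scale, since $\hat\Lambda_1 = g(1)$ and $\hat\Delta_1 = g(1)^2$.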

Specifying Optimizers in Keras

Optimizers can either be instantiated before being invoked as in:

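from keras import optimizers
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential()
model.add(Dense(64, kernel_initializer='uniform', input_shape=(10,)))
model.add(Activation('softmax'))

# SGD with Nesterov momentum. `learning_rate` replaces the deprecated `lr` argument;
# the older `decay` argument has been superseded by learning rate schedules.
sgd = optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer=sgd)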

Or they can be called by name, in which case default parameters are used, as in:

model.compile(loss='mean_squared_error', optimizer='sgd')

The following optimizers are available in Keras:

# Stochastic Gradient Descent
keras.optimizers.SGD(learning_rate=0.01, momentum=0.0, nesterov=False)

# RMSProp
keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9)

#AdaGrad
keras.optimizers.Adagrad(learning_rate=0.01)

#Adadelta
keras.optimizers.Adadelta(learning_rate=1.0, rho=0.95)

#Adam
keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, amsgrad=False)

If any of the optimizers are invoked by name, then Keras supplies default values for all the relevant parameters, which usually work quite well in practice.

Keras has added a new feature called the KerasTuner, which as the name implies, can be used to automatically tune and find the best parameters. More information about the KerasTuner, along with a short tutorial, can be found at the following webpages: https://www.tensorflow.org/tutorials/keras/keras_tuner, https://keras.io/keras_tuner/.

Choice of Activation Functions

In [16]:
#AF1
nb_setup.images_hconcat(["DL_images/AF1.png"], width=600)
Out[16]:

The choice of Activation Function has a major influence on the training and operation of DLN systems. When DLNs were first proposed, the first choice for the activation was the sigmoid, probably as a result of what was then known about biological neurons. This turned out to be an unfortunate choice, as illustrated in Figure AF1. This picture shows a single neuron with a sigmoid activation. Using the Gradient Flow rules from Chapter TrainingNNsBackprop, we can see that the backpropagation of the gradient $\frac{\partial\mathcal L}{\partial z}$ results in

$$ \frac{\partial\mathcal L}{\partial a} = z(1-z)\frac{\partial\mathcal L}{\partial z} $$

If the neuron is saturated, i.e., $a$ lies significantly away from the origin, then from the shape of the sigmoid it follows that either $z$ or $(1-z)$ is close to zero, which implies that $\frac{\partial\mathcal L}{\partial a} \approx 0$. As a result the gradients flowing back to the next layer of nodes will also be close to zero. The neuron in this state is said to be "dead". Once a neuron is dead, it stays dead, since in order to get it back into the active state, the input weights (shown on the LHS of Figure AF1) need to change. However the weights cannot change since the gradient with respect to the weights is given by $\frac{\partial\mathcal L}{\partial w} = z'\delta$ and $\delta \approx 0$.
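To get a feel for how quickly the factor $z(1-z)$ collapses, the following sketch evaluates it at a few values of the net input $a$. The sample points are arbitrary choices made for illustration.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def sigmoid_local_gradient(a):
    """Factor z * (1 - z) that multiplies the incoming gradient during backprop."""
    z = sigmoid(a)
    return z * (1.0 - z)


for a in (0.0, 2.0, 5.0, 10.0, -10.0):
    print(f"a = {a:6.1f}   z(1-z) = {sigmoid_local_gradient(a):.6f}")
```

The factor peaks at $0.25$ for $a = 0$, so even an unsaturated sigmoid attenuates the gradient by at least a factor of four, and it falls below $10^{-4}$ once $|a|$ reaches $10$.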

Thus the choice of the sigmoid function contributed to the Vanishing Gradient problem that plagued the first DLN systems. Interestingly enough, a suitable replacement for the sigmoid (the ReLU function) was not proposed until 2010. But once it was in place, it contributed to the rapid advances in the field since then. Our objective in this section is to survey some of the many Activation Functions that have been proposed in the last few years.

The tanh Function

In [17]:
#AF2
nb_setup.images_hconcat(["DL_images/AF2.png"], width=600)
Out[17]:

Historically the $\tanh$ function was proposed next after the sigmoid. As Figure AF2 shows, its shape is very similar to the sigmoid, so that unless the input is in the neighborhood of zero, the function enters its saturated regime. It is superior to the sigmoid in one respect, i.e., its output is zero centered. This is a desirable property in Activation Functions since, as we show later in Section DataPreprocessing, zero-centered inputs speed up the training process. The $\tanh$ function is rarely used in modern DLNs, the exception being a type of DLN called the LSTM (see Chapter RNNs). Some of the variables in LSTMs function as memory units, in which $\tanh$ outputs are used to increment or decrement the memory state by numbers that are close to $+1$ or $-1$.
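As a small numerical illustration of these two properties, the sketch below compares the average output of the sigmoid and $\tanh$ functions over a symmetric grid of inputs, and evaluates the local slope of $\tanh$, which is $1-\tanh^2(a)$. The grid and variable names are chosen for illustration only.

```python
import numpy as np

a = np.linspace(-5.0, 5.0, 101)           # symmetric grid of net inputs

z_sigmoid = 1.0 / (1.0 + np.exp(-a))
z_tanh = np.tanh(a)

mean_sigmoid = z_sigmoid.mean()           # outputs are centered around 0.5
mean_tanh = z_tanh.mean()                 # outputs are centered around 0

slope_tanh = 1.0 - z_tanh ** 2            # local gradient dz/da of tanh

print("mean sigmoid output:", mean_sigmoid)
print("mean tanh output:   ", mean_tanh)
print("tanh slope at a = -5, 0, 5:", slope_tanh[[0, 50, 100]])
```

The slope is close to one near the origin and close to zero at the ends of the grid, so $\tanh$ saturates in the same way as the sigmoid even though its output is zero centered.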

The ReLU Function

The most common Activation Function in use today is the Rectified Linear Unit or ReLU (see Figure ReLU). This function was proposed by Nair and Hinton (2010) as a replacement for the sigmoid. The function is given by:

$$ z = \max(0,a) $$

In [9]:
a = linspace(-5,5,100)
plot(a,maximum(0,a))
grid()
xlabel('Net Input (a)')
title("ReLU")
Out[9]:
Text(0.5,1,'ReLU')
In [19]:
#AF3
nb_setup.images_hconcat(["DL_images/AF3.png"], width=600)
Out[19]:

This is an extremely simple Activation Function which is linear with slope one for positive values of the argument, and zero otherwise. Hence this function does not suffer from the saturation problem as long as the input exceeds zero. Furthermore, since the slope is one for $a>0$ and zero for $a<0$, it follows that it functions as a gate controlled by its input during the backprop process (see Figure AF3). It follows that the gradients $\frac{\partial\mathcal L}{\partial w}$ propagate undiminished through the network, provided all the nodes are active. This makes DLNs using ReLU less susceptible to Vanishing Gradients.
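The gating behavior can be sketched in a few lines of NumPy. The functions `relu` and `relu_backward` below are illustrative helpers, and the incoming gradient values are made up for the example.

```python
import numpy as np

def relu(a):
    """Forward pass: z = max(0, a)."""
    return np.maximum(0.0, a)

def relu_backward(a, grad_z):
    """Backward pass: let grad_z = dL/dz through where a > 0, block it elsewhere."""
    return grad_z * (a > 0)

a = np.array([-3.0, -0.5, 0.5, 3.0, 100.0])
grad_z = np.array([0.7, -1.2, 0.7, -1.2, 0.7])   # made-up incoming gradients

z = relu(a)
grad_a = relu_backward(a, grad_z)

print("z      =", z)
print("dL/da  =", grad_a)
```

Note that the gradient for $a=100$ passes through unchanged, whereas a sigmoid would have reduced it to almost zero at that input.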

In [20]:
#AF4
nb_setup.images_hconcat(["DL_images/AF4.png"], width=600)
Out[20]:

Even though ReLU based DLNs are much better at propagating gradients compared to what came before, they do suffer from the "Dead ReLU Problem". This is illustrated in Figure AF4. The dotted line in this figure shows a case in which the weight parameters $w_i$ are such that the hyperplane $\sum w_i z_i = 0$ does not intersect the "data cloud" of possible input activations, with the entire cloud lying on its negative side. This implies that there are no possible input values that can lead to $\sum w_i z_i > 0$. Hence the neuron's output activation will always be zero, and it will kill all gradients backpropagating down from higher layers. This is referred to as the "Dead ReLU" problem. When training large networks with millions of neurons, it is not uncommon to run into a situation in which a section of the network becomes dead and takes no further part in the training process. In order to avoid this, the DLN designer should be on the alert to spot this situation and take steps to correct it. The problem is most often caused by bad initializations, and later in this chapter we will show how to properly initialize a network to avoid this. The hyperplane for a well functioning neuron is shown as the solid line in Figure AF4. Note that it intersects with the input data cloud so that there are input values that put the neuron in the active state.
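One simple way to spot a dead neuron is to measure how often it is active over a set of inputs. The sketch below does this for a hypothetical data cloud of non-negative input activations, using hand-picked weights for one dead and one healthy neuron; all names and values are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical "data cloud": 1000 non-negative input activations in 3 dimensions
z_in = rng.uniform(0.0, 1.0, size=(1000, 3))

def active_fraction(w, b):
    """Fraction of inputs for which the ReLU neuron is active (a > 0)."""
    a = z_in @ w + b
    return np.mean(a > 0)

def weight_gradient(w, b, grad_out):
    """Batch-averaged dL/dw for the neuron z_out = max(0, z_in @ w + b)."""
    a = z_in @ w + b
    delta = grad_out * (a > 0)            # the ReLU gate
    return z_in.T @ delta / len(z_in)

grad_out = np.ones(1000)                  # pretend dL/dz_out = 1 for every sample

w_dead, b_dead = np.array([-1.0, -2.0, -0.5]), -0.1   # hyperplane misses the cloud
w_ok, b_ok = np.array([1.0, -2.0, 0.5]), 0.1          # hyperplane cuts the cloud

dead_frac = active_fraction(w_dead, b_dead)
ok_frac = active_fraction(w_ok, b_ok)
dead_grad = weight_gradient(w_dead, b_dead, grad_out)
ok_grad = weight_gradient(w_ok, b_ok, grad_out)

print("dead neuron:    active fraction", dead_frac, " dL/dw", dead_grad)
print("healthy neuron: active fraction", ok_frac, " dL/dw", ok_grad)
```

For the dead neuron the active fraction and the weight gradient are both exactly zero, so Gradient Descent has no way of moving its hyperplane back into the data cloud.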

Over the last few years, several other Activation Functions have been proposed, but none of them offer a significant performance benefit over the ReLU function, which remains the most widely used. A couple of popular functions are described next.

The Leaky ReLU and the PReLU Functions

In [21]:
#AF5
nb_setup.images_hconcat(["DL_images/AF5.png"], width=600)
Out[21]:

The Leaky ReLU function is shown in Figure AF5 and is defined by:

$$ z = \max(ca,a),\ \ 0\le c<1 $$

where $c$ is a hyper-parameter representing the slope of the function for $a<0$. The idea behind this function is quite straightforward: Given an incoming gradient of $\frac{\partial\mathcal L}{\partial z}$, it backpropagates a gradient of $c\frac{\partial\mathcal L}{\partial z}$ if the input $a<0$, thus avoiding the Dead ReLU problem.
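A minimal NumPy sketch of the forward and backward passes of the Leaky ReLU is given below. The helper names and the value $c=0.1$ are illustrative choices, not a reference implementation.

```python
import numpy as np

def leaky_relu(a, c=0.01):
    """Forward pass: z = max(c * a, a), assuming 0 <= c < 1."""
    return np.maximum(c * a, a)

def leaky_relu_backward(a, grad_z, c=0.01):
    """Backward pass: pass grad_z for a > 0, scale it by c otherwise."""
    return np.where(a > 0, grad_z, c * grad_z)

a = np.array([-2.0, 3.0])
z = leaky_relu(a, c=0.1)
grad_a = leaky_relu_backward(a, grad_z=np.ones_like(a), c=0.1)

print("z     =", z)
print("dL/da =", grad_a)
```

Since the gradient for negative inputs is scaled by $c$ rather than set to zero, a neuron whose inputs are all negative still receives a (small) learning signal.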

Instead of deciding on the value of $c$ through experimentation, why not determine it using Backpropagation as well? This is the idea behind the Parametric ReLU or PReLU function shown in Figure AF6.

In [22]:
#AF6
nb_setup.images_hconcat(["DL_images/AF6.png"], width=1000)
Out[22]:

This function is defined as

$$ z_i = \max(\beta_i a_i,a_i), \quad 1 \le i \le S $$

Note that each neuron $i$ now has its own parameter $\beta_i, 1\le i\le S$, where $S$ is the number of nodes in the network. These parameters are iteratively estimated using Backprop. In order to do this we use the Gradient Flow rules to obtain an expression for the gradient $\frac{\partial\mathcal L}{\partial\beta_i}$ as follows:

$$ \frac{\partial\mathcal L}{\partial\beta_i} = \frac{\partial\mathcal L}{\partial z_i}\frac{\partial z_i}{\partial\beta_i},\ \ 1\le i\le S $$

Substituting the value for $\frac{\partial z_i}{\partial\beta_i}$ we obtain

$$ \frac{\partial\mathcal L}{\partial\beta_i} = a_i\frac{\partial\mathcal L}{\partial z_i}\ \ \mbox{if}\ \ a_i < 0\ \ \mbox{and} \ \ 0 \ \ \mbox{otherwise} $$

which is then used to update $\beta_i$ using $\beta_i\leftarrow\beta_i - \eta\frac{\partial\mathcal L}{\partial\beta_i}$.

Once training is complete, the PReLU based DLN ends up with a different value of $\beta_i$ at each neuron, which increases the flexibility of the network at the cost of an increase in the number of parameters.
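The gradient with respect to $\beta_i$ can be verified with a finite-difference check. In the sketch below we assume $\beta_i<1$, so that $z_i=\beta_i a_i$ on the negative branch ($a_i<0$) and $z_i=a_i$ otherwise; the derivative of $z_i$ with respect to $\beta_i$ is then $a_i$ on the negative branch and zero elsewhere. The loss $\mathcal L=\sum_i g_i z_i$ with made-up coefficients $g_i$ is used only so that $\frac{\partial\mathcal L}{\partial z_i}=g_i$ is known exactly.

```python
import numpy as np

def prelu(a, beta):
    """PReLU forward pass, equal to max(beta * a, a) when beta < 1."""
    return np.where(a > 0, a, beta * a)

def prelu_grad_beta(a, grad_z):
    """dL/dbeta_i = a_i * dL/dz_i on the negative branch, 0 otherwise."""
    return np.where(a < 0, a * grad_z, 0.0)

a = np.array([-2.0, -0.5, 1.0, 3.0])
beta = np.full(4, 0.25)
grad_z = np.array([1.0, -2.0, 0.5, 1.5])     # made-up values of dL/dz_i

grad_beta = prelu_grad_beta(a, grad_z)

# Central finite differences on L = sum(grad_z * z) as an independent check
eps = 1e-6
numeric = np.array([
    (np.sum(grad_z * prelu(a, beta + eps * np.eye(4)[i]))
     - np.sum(grad_z * prelu(a, beta - eps * np.eye(4)[i]))) / (2 * eps)
    for i in range(4)
])

eta = 0.1
beta_new = beta - eta * grad_beta            # one Gradient Descent step

print("analytic dL/dbeta:", grad_beta)
print("numeric  dL/dbeta:", numeric)
print("updated beta:     ", beta_new)
```

Only the neurons with negative net input have their $\beta_i$ changed by the update, since the positive branch does not depend on $\beta_i$.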

The MaxOut Function

In [23]:
#AF7
nb_setup.images_hconcat(["DL_images/AF7.png"], width=600)
Out[23]:

In order to motivate the design of the Maxout function, consider the Leaky ReLU function:

$$ z = \max(ca,a), $$

This can also be written as (using $z_i$ for the input activations and $z'_i$ for the output activations):

$$ z'_i = \max(c\big[\sum_j w_{ij}z_j +b_i\big],\sum_j w_{ij}z_j +b_i), $$

Hence the output activation is the max of two hyperplanes, one of which is a multiple of the other. A straightforward generalization of this is shown in Figure AF7, in which we allow the two hyperplanes to be independent with their own set of parameters, i.e.,

$$ z'_i = \max(\sum_j w_{ij}(1)z_j +b_i(1),\sum_j w_{ij}(2)z_j +b_i(2)) $$

and we have arrived at the Maxout function.

In [24]:
#maxoutActivation
nb_setup.images_hconcat(["DL_images/maxoutActivation.png"], width=600)
Out[24]:

As shown in Figure maxoutActivation, this idea can be generalized so that each activation is the max of $k$ hyperplanes. In the absence of Maxout, the layer on the left with $d$ nodes (marked $x$) would be fully connected to the layer on the right with $m$ nodes (marked $h$). Maxout introduces $k$ additional layers in-between, each of them with $m$ nodes. The nodes in these “hidden layers” compute an affine function $a_{ij}$ (at the $i$-th node of the $j$-th hidden layer), without a corresponding Activation:

$$ a_{ij} = z^\top W_{\cdot ij} + b_{ij}, \quad 1 \leq i \leq m, 1 \leq j \leq k $$

Note that $W$ is now a Tensor of dimension $d \times m \times k$, while $b$ is now a matrix of dimension $m \times k$.

The final output at the $i$-th output node, which is the Activation Function that we are after, is then computed as the maximum of the outputs of the $k$ nodes at the $i$-th position in the “hidden layers”.

$$ z'_i = \max_{1 \leq j \leq k} a_{ij}, \quad 1 \leq i \leq m $$
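The Maxout computation can be written compactly with a tensor contraction. The following NumPy sketch (the function name `maxout` and the sizes $d=4$, $m=3$, $k=5$ are illustrative assumptions) computes the $m \times k$ affine values and takes the maximum over the $k$ pieces. It also checks the special case from the start of this section, in which $k=2$ and the second hyperplane is $c$ times the first, so that the Leaky ReLU is recovered.

```python
import numpy as np

def maxout(z, W, b):
    """Maxout layer. z: (d,), W: (d, m, k), b: (m, k). Returns (m,) activations."""
    a = np.einsum("d,dmk->mk", z, W) + b     # a[i, j] = z^T W[:, i, j] + b[i, j]
    return a.max(axis=1)                     # max over the k affine pieces

rng = np.random.default_rng(0)
d, m, k = 4, 3, 5
z = rng.normal(size=d)
W = rng.normal(size=(d, m, k))
b = rng.normal(size=(m, k))
z_out = maxout(z, W, b)
print("maxout output:", z_out)

# Special case: one output node, k = 2, second hyperplane = c * first hyperplane
c = 0.1
w1, b1 = rng.normal(size=d), 0.3
W_leaky = np.stack([w1, c * w1], axis=-1)[:, None, :]   # shape (d, 1, 2)
b_leaky = np.array([[b1, c * b1]])                      # shape (1, 2)
a_lin = z @ w1 + b1
leaky = maxout(z, W_leaky, b_leaky)
print("maxout:", leaky[0], "  leaky ReLU:", max(a_lin, c * a_lin))
```

The parameter count of the layer grows from $d \times m$ weights to $d \times m \times k$, which is the cost referred to in the text.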

In [25]:
#AF8
nb_setup.images_hconcat(["DL_images/AF8.png"], width=600)
Out[25]:

Figure AF8 shows examples of Activation Functions that have been synthesized out of linear segments by following the Maxout algorithm. Hence a single maxout node can be interpreted as making a piecewise linear approximation to an arbitrary convex activation function, whose shape is learnt as part of the training process. The Maxout function works quite well in practice, but at the expense of a large increase in the number of parameters.

Specifying Activation Functions in Keras

Activations can be invoked through an Activation Layer, as in:

from keras.layers import Activation, Dense

model.add(Dense(64))
model.add(Activation('tanh'))

Or through an Activation argument, as in:

model.add(Dense(64, activation='tanh'))

The following Activation Functions are available in Keras:

#ReLU: Leaky ReLU can be specified by choosing alpha > 0
    keras.activations.relu(x, alpha=0.0, max_value=None, threshold=0.0)
#tanh
    keras.activations.tanh(x)
#sigmoid
    keras.activations.sigmoid(x)
#Hard sigmoid
    keras.activations.hard_sigmoid(x)
#Exponential
    keras.activations.exponential(x)
#elu
    keras.activations.elu(x, alpha=1.0)
#softmax
    keras.activations.softmax(x, axis=-1)

Initializing the Weight Parameters

The choice of the initialization for DLN weight parameters is an extremely important decision, and can determine whether the Stochastic Gradient algorithm converges successfully or not. A bad initialization can lead to a premature end of the training process with all the activations and gradients going towards zero, or numerical instabilities and failure to converge. Even when the algorithm does converge successfully, the initial point can determine the speed of convergence, the end value of the Loss Function, the classification accuracy on the test set, etc.

Unfortunately, due to the lack of theoretical understanding of the DLN optimization process, we rely on a few simple heuristics for doing initialization that have been discovered over the years. One of the few rules in this area which is known with certainty is that the initial parameters should be set in such a way as to “break symmetry” between nodes. This means that if two nodes have the same set of inputs, then their weights need to be initialized to different values, or else the stochastic gradient iteration, being deterministic, will cause their values to change in lock step.
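The lock step behavior is easy to reproduce on a toy network. The sketch below assumes a network with two $\tanh$ hidden nodes and a single linear output trained with a squared loss; the function name, the input and the weight values are all made up for illustration (in practice the distinct starting weights would be drawn at random).

```python
import numpy as np

def hidden_weight_grads(W1, w2, x, t):
    """Gradient of L = 0.5 * (y - t)^2 w.r.t. W1, for y = w2 . tanh(W1 x)."""
    h = np.tanh(W1 @ x)
    y = w2 @ h
    delta_h = (y - t) * w2 * (1.0 - h ** 2)   # backprop through the output and tanh
    return np.outer(delta_h, x)               # row i is the gradient for hidden node i

x, t = np.array([0.5, -1.0, 2.0]), 1.0
w2 = np.array([0.2, 0.2])

# Symmetric start: both hidden nodes share the same weights
W1_same = np.full((2, 3), 0.1)
g_same = hidden_weight_grads(W1_same, w2, x, t)

# Distinct starting weights break the symmetry
W1_diff = np.array([[0.1, -0.2, 0.3],
                    [0.4,  0.0, -0.1]])
g_diff = hidden_weight_grads(W1_diff, w2, x, t)

print("symmetric start, gradients per node:\n", g_same)
print("distinct start, gradients per node:\n", g_diff)
```

With the symmetric start both rows of the gradient are identical, so after any number of updates the two nodes still hold the same weights and compute the same function.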

In practice, the DLN weight parameters are initialized with random values drawn from Gaussian or Uniform distributions and the following rules are used:

  • Gaussian Initialization: If the weights lie between layers with $n_{in}$ input neurons and $n_{out}$ output neurons, then they are initialized using a Gaussian random distribution with mean zero and standard deviation $\sqrt{2\over n_{in}+n_{out}}$.

  • Uniform Initialization: In the same configuration as above, the weights should be initialized using a Uniform distribution between $-r$ and $r$, where $r = \sqrt{6\over n_{in}+n_{out}}$.

When using the ReLU or its variants, these rules have to be modified slightly (see He, Zhang, Ren, Sun (2015)):

  • Gaussian Initialization: If the weights lie between layers with $n_{in}$ input neurons and $n_{out}$ output neurons, then they are initialized using a Gaussian random distribution with mean zero and standard deviation $\sqrt{4\over n_{in}+n_{out}}$.

  • Uniform Initialization: In the same configuration as above, the weights should be initialized using a Uniform distribution between $-r$ and $r$, where $r = \sqrt{12\over n_{in}+n_{out}}$.

The reasoning behind scaling down the initialization values as the number of incident weights increases is to prevent saturation of the node activations during the forward pass of the Backprop algorithm, as well as large values of the gradients during the backward pass.
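The four rules listed above can be collected into a single sampling routine. The following is a NumPy sketch under the formulas given in this section; the function `init_weights` and its arguments are hypothetical and are not the Keras initializer API.

```python
import numpy as np

def init_weights(n_in, n_out, distribution="gaussian", relu=False, rng=None):
    """Sample an (n_in, n_out) weight matrix using the rules of this section.

    relu=True doubles the variance, which gives the ReLU variants of the rules.
    """
    rng = np.random.default_rng() if rng is None else rng
    gain = 2.0 if relu else 1.0
    if distribution == "gaussian":
        std = np.sqrt(gain * 2.0 / (n_in + n_out))
        return rng.normal(0.0, std, size=(n_in, n_out))
    if distribution == "uniform":
        r = np.sqrt(gain * 6.0 / (n_in + n_out))
        return rng.uniform(-r, r, size=(n_in, n_out))
    raise ValueError("distribution must be 'gaussian' or 'uniform'")

rng = np.random.default_rng(0)
W_uniform = init_weights(300, 200, "uniform", rng=rng)
W_relu = init_weights(300, 200, "gaussian", relu=True, rng=rng)

print("uniform rule:       std =", W_uniform.std(), " target =", np.sqrt(2 / 500))
print("gaussian ReLU rule: std =", W_relu.std(), " target =", np.sqrt(4 / 500))
```

A Uniform distribution on $[-r,r]$ has variance $r^2/3$, so the Uniform and Gaussian versions of each rule produce weights with the same variance.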

Data Preprocessing

In [27]:
#INI2
nb_setup.images_hconcat(["DL_images/INI2.png"], width=600)
Out[27]:

Data Preprocessing is an important step in all ML systems, including DLNs. Data Preprocessing steps include cleaning, normalization, transformation, feature extraction and selection, etc. S.B. Kotsiantis et al. (2006) is a well known reference on this topic, and has descriptions of these steps. In this section we describe only the Normalization step, which is the process of Data Centering + Scaling. Normalization is applied most commonly in DLN systems used to process image data. These operations are illustrated in Figure INI2.

Data Normalization proceeds in the following steps:

Centering: This is also sometimes called Mean Subtraction, and is the most common form of preprocessing. Given an input dataset consisting of $M$ vectors $X(m) = (x_1(m),...,x_N(m)), m = 1,...,M$, it consists of subtracting the mean across each individual input component $x_i, 1\leq i\leq N$ such that

$$ x_i(m) \leftarrow x_i(m) - \frac{\sum_{s=1}^{M}x_i(s)}{M},\ \ 1\leq i\leq N, 1\le m\le M $$

This has the geometric interpretation of centering the data around the origin for each dimension as shown in the center illustration in Figure INI2. A color image consists of three channels of RGB pixels $(X^R(m),X^G(m),X^B(m))$. The following two techniques have been used to center images of this type:

  • Subtract the Mean Image: For CH = R, G, B,

$$ x^{CH}_{ij}(m) \leftarrow x^{CH}_{ij}(m) - \frac{\sum_{s=1}^{M}x_{ij}^{CH}(s)}{M},\ \ 1\leq i,j\leq N, 1\le m\le M $$

This equation assumes that each image consists of three channels of NxN pixels. The mean value of each pixel is computed across the entire training set. This results in a "Mean Image" of size NxNx3 which is then subtracted from the individual pixel values.

  • Subtract the Per-Channel Mean:

$$ x^{CH}_{ij}(m) \leftarrow x^{CH}_{ij}(m) - \frac{\sum_{s=1}^M\sum_{p=1}^N\sum_{q=1}^N x_{pq}^{CH}(s)}{MN^2},\ \ 1\leq i,j\leq N, 1\le m\le M $$

This is a simpler normalization technique in which a single mean value is computed for each channel, and then subtracted from the data in that channel.
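Both centering techniques reduce to a mean over different axes of the image array. The sketch below uses a randomly generated stand-in for a training set of $M=100$ RGB images of $8\times 8$ pixels, stored with shape (M, N, N, 3); the data and variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical training set: M = 100 RGB images of N x N = 8 x 8 pixels
images = rng.uniform(0.0, 255.0, size=(100, 8, 8, 3))

# Technique 1: subtract the Mean Image (one mean per pixel position and channel)
mean_image = images.mean(axis=0)                  # shape (8, 8, 3)
centered_by_image = images - mean_image

# Technique 2: subtract the Per-Channel Mean (one scalar per channel)
channel_mean = images.mean(axis=(0, 1, 2))        # shape (3,)
centered_by_channel = images - channel_mean

print("mean image shape:  ", mean_image.shape)
print("per-channel means: ", channel_mean)
```

In both cases the means are computed on the training set only, and the same values are then subtracted from any validation or test images.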

Scaling: After the data has been centered, it can be scaled in one of two ways:

  • By dividing by the standard deviation, once again along each dimension, so that the overall transform is

$$ x_i(m) \leftarrow \frac{x_i(m) - \frac{\sum_{s=1}^{M}x_i(s)}{M}}{\sigma_i},\ \ 1\leq i\leq N, 1\le m\le M $$

  • By Normalizing each dimension so that the min and max along each axis are -1 and +1 respectively.

In general Scaling helps optimization because it balances out the rate at which the weights connected to the input nodes learn. For image processing applications Scaling shows limited benefits, hence image pre-processing is limited to Centering.
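The two scaling options can be sketched as follows, using a made-up dataset whose three features live on very different scales. The variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical dataset: M = 500 samples, N = 3 features on very different scales
X = rng.normal(loc=[10.0, -3.0, 200.0], scale=[1.0, 0.1, 50.0], size=(500, 3))

# Option 1: center, then divide by the standard deviation of each feature
mu = X.mean(axis=0)
sigma = X.std(axis=0)
X_std = (X - mu) / sigma

# Option 2: map each feature linearly so that its min is -1 and its max is +1
# (the result is the same whether or not the data was centered first)
x_min, x_max = X.min(axis=0), X.max(axis=0)
X_minmax = 2.0 * (X - x_min) / (x_max - x_min) - 1.0

print("standardized: mean", X_std.mean(axis=0), " std", X_std.std(axis=0))
print("min-max:      min ", X_minmax.min(axis=0), " max", X_minmax.max(axis=0))
```

After either transformation all three features span a comparable range, so no single input dominates the weight updates merely because of its units.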

In [28]:
#INI1
nb_setup.images_hconcat(["DL_images/INI1.png"], width=600)
Out[28]:

We end this section by giving an intuitive explanation of why the Centering operation helps to speed up convergence. Recall that for a K-ary Linear Classifier, the parameter update equation is given by:

$$ w_{kj} \leftarrow w_{kj} - \eta x_j(y_k-t_k),\ \ 1\le k\le K,\ \ 1\le j\le N $$

If the training sample is such that $t_q = 1$ and $t_k = 0, k\ne q$, then the update becomes:

$$ w_{qj} \leftarrow w_{qj} - \eta x_j(y_q-1) $$

and

$$ w_{kj} \leftarrow w_{kj} - \eta x_j(y_k),\ \ k\ne q $$

Let us assume that the input data is not centered so that $x_j\ge 0, j=1,...,N$. Since $0\le y_k\le 1$ it follows that $\Delta w_{kj} = -\eta x_jy_k \le 0, k\ne q$ and $\Delta w_{qj} = -\eta x_j(y_q - 1) \ge 0$, i.e., the update results in all the weights moving in the same direction, except for those connected to output node $q$. This is shown graphically in Figure INI1, in which the system is trying to move in the direction of the blue arrow, which is the quickest path to the minimum. However, if the input data is not centered, then it is forced to move in a zig-zag fashion as shown by the red curve. The zig-zag motion is caused by the fact that all the parameters move in the same direction at each step due to the lack of zero-centering in the input data.
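The sign pattern of the updates can be inspected directly. The sketch below assumes a softmax output for the K-ary Linear Classifier and uses a made-up input vector; for the centered case we subtract an assumed per-feature mean of 0.5.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def weight_update(W, x, t, eta=0.1):
    """Delta w_kj = -eta * x_j * (y_k - t_k) for a K-ary linear classifier."""
    y = softmax(W @ x)
    return -eta * np.outer(y - t, x)

rng = np.random.default_rng(0)
K, N = 4, 5
W = rng.normal(scale=0.1, size=(K, N))
t = np.array([0.0, 1.0, 0.0, 0.0])                 # correct class is q = 1

x_raw = np.array([0.9, 0.1, 0.6, 0.3, 0.8])        # uncentered: every x_j >= 0
x_centered = x_raw - 0.5                           # assumed per-feature mean of 0.5

dW_raw = weight_update(W, x_raw, t)
dW_centered = weight_update(W, x_centered, t)

print("signs of the update, uncentered input:\n", np.sign(dW_raw).astype(int))
print("signs of the update, centered input:\n", np.sign(dW_centered).astype(int))
```

With the uncentered input every row of the update has a single sign, whereas with the centered input the signs within a row are mixed, which lets the weights of a node move in different directions in the same step.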

Batch Normalization

In [29]:
#BatchNormalization
nb_setup.images_hconcat(["DL_images/BatchNormalization.png"], width=600)
Out[29]:

Data Normalization at the input layer was described in the previous section as a way of transforming the data in order to speed up the optimization process. Since Normalization is so beneficial, why not extend it to the interior of the network and normalize all activations? This is precisely what is done in an algorithm called Batch Normalization (Ioffe and Szegedy (2015)). This technique has led to a significant improvement in the performance of DLN models; indeed the ResNet model that won the 2015 ILSVRC competition made extensive use of Batch Normalization (He, Zhang, Ren, Sun (2015)).

It was a straightforward exercise to apply the Normalization operations to the input data, since the entire training data set is available at the start of the training process. This is not the case with the hidden layer activations, since these values change over the course of the training due to the algorithm driven updates of system parameters. Ioffe and Szegedy (2015) solved this problem by doing the normalization in batches (hence the name), such that during each batch the parameters remain fixed.

Batch Normalization proceeds as follows: As shown in Figure BatchNormalization, we introduce a new layer between the Fully Connected and the Activation layers. This layer is responsible for centering and scaling the pre-activation sums $a_i^{(k)}$ before they are passed through the non-linearity.

In [30]:
#BN2
nb_setup.images_hconcat(["DL_images/BN2.png"], width=600)
Out[30]:
  1. Let $a(m),1\leq m\leq B$ denote the $m^{th}$ pre-activation value in a training batch of size B, where we have omitted the subscripts $i$ and $k$ for clarity. Then the algorithm computes the following values (see Figure BN2):

$$ \mu_B = \frac{1}{B}\sum_{m=1}^{B}a(m) $$

$$ \sigma_B^2 = \frac{1}{B}\sum_{m=1}^B (a(m)-\mu_B)^2 $$

$$ \hat{a}(m) = \frac{a(m)-\mu_B}{\sqrt{\sigma_B^2+\epsilon}} $$

$$c(m) = \gamma\hat{a}(m) + \beta $$

This is done on a layer-by-layer basis so that the normalized outputs of layer $r$ are fed into layer $r+1$, which is then normalized.

  2. Run Backprop for each of the samples in the batch using the normalized activations. In addition to the weights, compute the gradients for the parameters $(\gamma,\beta)$ at each node.

  3. Average the gradients across the batch and use them to update the weight and batch normalization parameters.

  4. Go back to step 1 and repeat for the next batch.

  5. During the test phase, once the network has been trained, the pre-activations are normalized using the mean and variance for the entire training set, rather than just the batch.
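The normalization equations of the first step, together with the test phase rule, can be sketched in NumPy as follows. The function names, batch size and data are assumptions for illustration, and the training-set statistics are simply passed in as arguments; this is not the Keras implementation.

```python
import numpy as np

def batchnorm_train(a, gamma, beta, eps=1e-5):
    """Normalize a batch of pre-activations a with shape (B, n_nodes)."""
    mu = a.mean(axis=0)                        # batch mean per node
    var = a.var(axis=0)                        # (1/B) * sum (a - mu)^2 per node
    a_hat = (a - mu) / np.sqrt(var + eps)
    return gamma * a_hat + beta, mu, var

def batchnorm_test(a, gamma, beta, mu_train, var_train, eps=1e-5):
    """Test phase: normalize with statistics collected over the training set."""
    return gamma * (a - mu_train) / np.sqrt(var_train + eps) + beta

rng = np.random.default_rng(0)
B, n_nodes = 128, 4
a = rng.normal(loc=3.0, scale=2.0, size=(B, n_nodes))   # made-up pre-activations
gamma, beta = np.ones(n_nodes), np.zeros(n_nodes)       # common starting values

c, mu_B, var_B = batchnorm_train(a, gamma, beta)
print("batch mean before:", a.mean(axis=0))
print("batch mean after: ", c.mean(axis=0))
print("batch std after:  ", c.std(axis=0))

# A single test sample is normalized with fixed statistics, not its own
c_test = batchnorm_test(a[:1], gamma, beta, mu_B, var_B)
```

Using fixed statistics at test time means that the output for a given input no longer depends on which other samples happen to be processed along with it.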

The algorithm introduces new parameters $\gamma$ and $\beta$ whose values are estimated using the backprop iteration. Note that at each layer, the algorithm processes all B vectors in the batch in parallel, so that the resulting pre-activation $c(m)$ is influenced not just by its own input $x(m)$, but also by the other inputs in the batch. This is the same as introducing noise into the system, since we are letting each classification decision be influenced by more than one piece of input data. In the next chapter we will see that this functions as a kind of Regularizer which improves the generalization performance of the system on the Test dataset.

Power of Batch Normalization

Much of the power of the Batch Normalization technique arises from the following: We first normalize the pre-activations so that they have zero mean and unit variance, but then in the very next step we re-introduce a mean and variance through the parameters $(\gamma,\beta)$. But note that these re-introduced values are actually learnt as part of the Gradient Descent algorithm, so they assume values that are more suited to the task of minimizing the loss function.
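To see how $(\gamma,\beta)$ re-introduce a mean and variance, consider the two extreme settings in the sketch below (the data is made up). With $\gamma=1,\beta=0$ the output stays normalized, while with $\gamma=\sqrt{\sigma_B^2+\epsilon}$ and $\beta=\mu_B$ the layer reproduces its input exactly. Gradient Descent is free to settle anywhere between or beyond these settings.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(loc=-2.0, scale=3.0, size=64)   # pre-activations of one node over a batch
eps = 1e-5

mu, var = a.mean(), a.var()
a_hat = (a - mu) / np.sqrt(var + eps)

# gamma = 1, beta = 0: the output keeps zero mean and (almost) unit variance
c_default = 1.0 * a_hat + 0.0

# gamma = sqrt(var + eps), beta = mu: the normalization is undone
c_identity = np.sqrt(var + eps) * a_hat + mu

print("default:  mean", c_default.mean(), " std", c_default.std())
print("identity: max |c - a| =", np.abs(c_identity - a).max())
```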

Batch Normalization has been shown to lead to several benefits, including:

  • It enables higher learning rates: In a non-normalized network, a large learning rate can lead to oscillations and cause the loss function to increase rather than decrease. Batch Normalization helps prevent these problems by preventing small changes in the parameters from amplifying into larger and sub-optimal changes in activations and gradients. Higher learning rates in turn speed up the training process considerably.

  • It enables better Gradient Propagation through the network, thus enabling DLNs with more hidden layers.

  • It helps to reduce strong dependencies on the parameter initialization values.

  • It helps to regularize the model. Regularization is a process that we discuss in the next chapter and has to do with the ability of the model to generalize beyond its training set. Indeed experiments show that Batch Normalization reduces the need to use other regularization techniques such as Dropout (see Chapter ImprovingModelGeneralization).

Batch Normalization can be added to a Keras model in a straightforward manner, as illustrated in the model below. In this case we added Batch Normalization after the Activation Function. We can alternatively add it before the Activation by adding a separate Activation Layer and positioning it after the Batch Normalization Layer. Both of these choices work equally well.

In [10]:
import keras
keras.__version__
from keras import models
from keras import layers

from keras.datasets import cifar10

# Load CIFAR-10: 50,000 training and 10,000 test images of size 32x32x3
(train_images, train_labels), (test_images, test_labels) = cifar10.load_data()

# Flatten each image into a vector and rescale the pixel values to [0, 1]
train_images = train_images.reshape((50000, 32 * 32 * 3))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 32 * 32 * 3))
test_images = test_images.astype('float32') / 255

from tensorflow.keras.utils import to_categorical

# One-hot encode the class labels
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

# Dense network with a Batch Normalization layer after each hidden layer
network = models.Sequential()
network.add(layers.Dense(20, activation='relu', input_shape=(32 * 32 * 3,)))
network.add(layers.BatchNormalization())
network.add(layers.Dense(15, activation='relu'))
network.add(layers.BatchNormalization())
network.add(layers.Dense(10, activation='softmax'))

network.compile(optimizer='sgd',
                loss='categorical_crossentropy',
                metrics=['accuracy'])

# Train with SGD, holding out 20% of the training data for validation
history = network.fit(train_images, train_labels, epochs=100, batch_size=128, validation_split=0.2)
Epoch 1/100
313/313 [==============================] - 2s 4ms/step - loss: 2.0454 - accuracy: 0.2574 - val_loss: 1.9718 - val_accuracy: 0.2729
Epoch 2/100
313/313 [==============================] - 1s 4ms/step - loss: 1.8770 - accuracy: 0.3261 - val_loss: 1.9238 - val_accuracy: 0.3127
Epoch 3/100
313/313 [==============================] - 1s 4ms/step - loss: 1.8176 - accuracy: 0.3535 - val_loss: 1.8546 - val_accuracy: 0.3422
Epoch 4/100
313/313 [==============================] - 1s 4ms/step - loss: 1.7829 - accuracy: 0.3654 - val_loss: 1.8335 - val_accuracy: 0.3443
Epoch 5/100
313/313 [==============================] - 1s 4ms/step - loss: 1.7565 - accuracy: 0.3751 - val_loss: 1.8010 - val_accuracy: 0.3578
Epoch 6/100
313/313 [==============================] - 1s 4ms/step - loss: 1.7341 - accuracy: 0.3850 - val_loss: 1.7895 - val_accuracy: 0.3656
Epoch 7/100
313/313 [==============================] - 1s 4ms/step - loss: 1.7149 - accuracy: 0.3937 - val_loss: 1.7603 - val_accuracy: 0.3775
Epoch 8/100
313/313 [==============================] - 1s 3ms/step - loss: 1.7002 - accuracy: 0.3989 - val_loss: 1.7942 - val_accuracy: 0.3682
Epoch 9/100
313/313 [==============================] - 1s 3ms/step - loss: 1.6880 - accuracy: 0.4019 - val_loss: 1.9262 - val_accuracy: 0.3396
Epoch 10/100
313/313 [==============================] - 1s 3ms/step - loss: 1.6795 - accuracy: 0.4033 - val_loss: 1.8280 - val_accuracy: 0.3561
Epoch 11/100
313/313 [==============================] - 1s 4ms/step - loss: 1.6683 - accuracy: 0.4107 - val_loss: 1.7505 - val_accuracy: 0.3777
Epoch 12/100
313/313 [==============================] - 1s 4ms/step - loss: 1.6593 - accuracy: 0.4139 - val_loss: 1.7184 - val_accuracy: 0.3999
Epoch 13/100
313/313 [==============================] - 1s 4ms/step - loss: 1.6505 - accuracy: 0.4166 - val_loss: 1.7643 - val_accuracy: 0.3754
Epoch 14/100
313/313 [==============================] - 1s 4ms/step - loss: 1.6457 - accuracy: 0.4180 - val_loss: 1.8116 - val_accuracy: 0.3517
Epoch 15/100
313/313 [==============================] - 1s 4ms/step - loss: 1.6400 - accuracy: 0.4209 - val_loss: 2.0111 - val_accuracy: 0.3259
Epoch 16/100
313/313 [==============================] - 1s 4ms/step - loss: 1.6329 - accuracy: 0.4210 - val_loss: 1.9907 - val_accuracy: 0.3190
Epoch 17/100
313/313 [==============================] - 1s 3ms/step - loss: 1.6275 - accuracy: 0.4254 - val_loss: 1.7388 - val_accuracy: 0.3819
Epoch 18/100
313/313 [==============================] - 1s 4ms/step - loss: 1.6231 - accuracy: 0.4277 - val_loss: 1.6983 - val_accuracy: 0.4043
Epoch 19/100
313/313 [==============================] - 1s 4ms/step - loss: 1.6190 - accuracy: 0.4267 - val_loss: 1.8123 - val_accuracy: 0.3542
Epoch 20/100
313/313 [==============================] - 1s 3ms/step - loss: 1.6133 - accuracy: 0.4291 - val_loss: 1.8137 - val_accuracy: 0.3578
Epoch 21/100
313/313 [==============================] - 1s 4ms/step - loss: 1.6110 - accuracy: 0.4296 - val_loss: 1.7636 - val_accuracy: 0.3700
Epoch 22/100
313/313 [==============================] - 1s 4ms/step - loss: 1.6057 - accuracy: 0.4300 - val_loss: 1.7790 - val_accuracy: 0.3721
Epoch 23/100
313/313 [==============================] - 1s 3ms/step - loss: 1.6006 - accuracy: 0.4319 - val_loss: 1.7484 - val_accuracy: 0.3799
Epoch 24/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5997 - accuracy: 0.4325 - val_loss: 1.7328 - val_accuracy: 0.3922
Epoch 25/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5951 - accuracy: 0.4342 - val_loss: 1.7913 - val_accuracy: 0.3571
Epoch 26/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5916 - accuracy: 0.4343 - val_loss: 1.7165 - val_accuracy: 0.3930
Epoch 27/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5890 - accuracy: 0.4367 - val_loss: 1.7627 - val_accuracy: 0.3753
Epoch 28/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5884 - accuracy: 0.4364 - val_loss: 1.7135 - val_accuracy: 0.3969
Epoch 29/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5836 - accuracy: 0.4382 - val_loss: 1.8348 - val_accuracy: 0.3506
Epoch 30/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5812 - accuracy: 0.4394 - val_loss: 1.7350 - val_accuracy: 0.3776
Epoch 31/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5774 - accuracy: 0.4424 - val_loss: 1.7484 - val_accuracy: 0.3915
Epoch 32/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5778 - accuracy: 0.4394 - val_loss: 1.7454 - val_accuracy: 0.3839
Epoch 33/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5717 - accuracy: 0.4422 - val_loss: 1.8048 - val_accuracy: 0.3702
Epoch 34/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5707 - accuracy: 0.4443 - val_loss: 1.7770 - val_accuracy: 0.3782
Epoch 35/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5680 - accuracy: 0.4440 - val_loss: 1.7772 - val_accuracy: 0.3759
Epoch 36/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5651 - accuracy: 0.4441 - val_loss: 1.6759 - val_accuracy: 0.4102
Epoch 37/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5628 - accuracy: 0.4448 - val_loss: 1.7599 - val_accuracy: 0.3711
Epoch 38/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5627 - accuracy: 0.4453 - val_loss: 1.8165 - val_accuracy: 0.3633
Epoch 39/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5581 - accuracy: 0.4474 - val_loss: 1.9760 - val_accuracy: 0.3470
Epoch 40/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5551 - accuracy: 0.4468 - val_loss: 1.7694 - val_accuracy: 0.3703
Epoch 41/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5540 - accuracy: 0.4489 - val_loss: 1.9135 - val_accuracy: 0.3668
Epoch 42/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5531 - accuracy: 0.4498 - val_loss: 1.7214 - val_accuracy: 0.3894
Epoch 43/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5488 - accuracy: 0.4486 - val_loss: 1.7750 - val_accuracy: 0.3717
Epoch 44/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5452 - accuracy: 0.4523 - val_loss: 1.7677 - val_accuracy: 0.3785
Epoch 45/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5446 - accuracy: 0.4504 - val_loss: 1.7706 - val_accuracy: 0.3814
Epoch 46/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5425 - accuracy: 0.4505 - val_loss: 1.8473 - val_accuracy: 0.3483
Epoch 47/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5409 - accuracy: 0.4525 - val_loss: 1.7036 - val_accuracy: 0.4026
Epoch 48/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5393 - accuracy: 0.4524 - val_loss: 1.7251 - val_accuracy: 0.3926
Epoch 49/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5398 - accuracy: 0.4516 - val_loss: 1.7921 - val_accuracy: 0.3752
Epoch 50/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5366 - accuracy: 0.4548 - val_loss: 1.8633 - val_accuracy: 0.3531
Epoch 51/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5348 - accuracy: 0.4532 - val_loss: 1.8452 - val_accuracy: 0.3697
Epoch 52/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5311 - accuracy: 0.4579 - val_loss: 1.7058 - val_accuracy: 0.3991
Epoch 53/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5301 - accuracy: 0.4550 - val_loss: 1.7616 - val_accuracy: 0.3774
Epoch 54/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5283 - accuracy: 0.4563 - val_loss: 1.8156 - val_accuracy: 0.3608
Epoch 55/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5259 - accuracy: 0.4575 - val_loss: 1.7593 - val_accuracy: 0.3869
Epoch 56/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5246 - accuracy: 0.4557 - val_loss: 1.7336 - val_accuracy: 0.3894
Epoch 57/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5248 - accuracy: 0.4574 - val_loss: 1.7164 - val_accuracy: 0.3967
Epoch 58/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5235 - accuracy: 0.4581 - val_loss: 1.7852 - val_accuracy: 0.3741
Epoch 59/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5191 - accuracy: 0.4592 - val_loss: 1.7967 - val_accuracy: 0.3838
Epoch 60/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5198 - accuracy: 0.4613 - val_loss: 1.7796 - val_accuracy: 0.3774
Epoch 61/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5162 - accuracy: 0.4597 - val_loss: 1.7670 - val_accuracy: 0.3762
Epoch 62/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5149 - accuracy: 0.4606 - val_loss: 1.7057 - val_accuracy: 0.4054
Epoch 63/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5153 - accuracy: 0.4633 - val_loss: 1.7186 - val_accuracy: 0.3981
Epoch 64/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5161 - accuracy: 0.4607 - val_loss: 1.9725 - val_accuracy: 0.3359
Epoch 65/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5139 - accuracy: 0.4604 - val_loss: 1.7640 - val_accuracy: 0.3780
Epoch 66/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5105 - accuracy: 0.4645 - val_loss: 1.6931 - val_accuracy: 0.4051
Epoch 67/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5085 - accuracy: 0.4630 - val_loss: 1.6776 - val_accuracy: 0.4089
Epoch 68/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5078 - accuracy: 0.4633 - val_loss: 1.8668 - val_accuracy: 0.3601
Epoch 69/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5081 - accuracy: 0.4658 - val_loss: 1.8240 - val_accuracy: 0.3688
Epoch 70/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5076 - accuracy: 0.4616 - val_loss: 1.9045 - val_accuracy: 0.3500
Epoch 71/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5050 - accuracy: 0.4647 - val_loss: 1.7008 - val_accuracy: 0.4092
Epoch 72/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5026 - accuracy: 0.4651 - val_loss: 1.9768 - val_accuracy: 0.3190
Epoch 73/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5044 - accuracy: 0.4635 - val_loss: 1.9276 - val_accuracy: 0.3519
Epoch 74/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5020 - accuracy: 0.4665 - val_loss: 1.6845 - val_accuracy: 0.3986
Epoch 75/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5011 - accuracy: 0.4656 - val_loss: 1.7275 - val_accuracy: 0.3913
Epoch 76/100
313/313 [==============================] - 1s 4ms/step - loss: 1.5011 - accuracy: 0.4643 - val_loss: 1.6934 - val_accuracy: 0.4008
Epoch 77/100
313/313 [==============================] - 1s 4ms/step - loss: 1.4957 - accuracy: 0.4695 - val_loss: 1.7335 - val_accuracy: 0.3902
Epoch 78/100
313/313 [==============================] - 1s 4ms/step - loss: 1.4969 - accuracy: 0.4672 - val_loss: 1.9356 - val_accuracy: 0.3511
Epoch 79/100
313/313 [==============================] - 1s 4ms/step - loss: 1.4972 - accuracy: 0.4681 - val_loss: 1.6760 - val_accuracy: 0.4110
Epoch 80/100
313/313 [==============================] - 1s 4ms/step - loss: 1.4938 - accuracy: 0.4670 - val_loss: 1.8073 - val_accuracy: 0.3904
Epoch 81/100
313/313 [==============================] - 1s 4ms/step - loss: 1.4928 - accuracy: 0.4682 - val_loss: 1.8726 - val_accuracy: 0.3560
Epoch 82/100
313/313 [==============================] - 1s 4ms/step - loss: 1.4934 - accuracy: 0.4681 - val_loss: 1.7440 - val_accuracy: 0.3909
Epoch 83/100
313/313 [==============================] - 1s 4ms/step - loss: 1.4920 - accuracy: 0.4702 - val_loss: 1.6982 - val_accuracy: 0.4038
Epoch 84/100
313/313 [==============================] - 1s 4ms/step - loss: 1.4884 - accuracy: 0.4689 - val_loss: 1.7666 - val_accuracy: 0.3886
Epoch 85/100
313/313 [==============================] - 1s 4ms/step - loss: 1.4881 - accuracy: 0.4700 - val_loss: 1.8335 - val_accuracy: 0.3716
Epoch 86/100
313/313 [==============================] - 1s 4ms/step - loss: 1.4892 - accuracy: 0.4706 - val_loss: 1.7214 - val_accuracy: 0.4012
Epoch 87/100
313/313 [==============================] - 1s 4ms/step - loss: 1.4864 - accuracy: 0.4705 - val_loss: 1.8549 - val_accuracy: 0.3719
Epoch 88/100
313/313 [==============================] - 1s 4ms/step - loss: 1.4879 - accuracy: 0.4702 - val_loss: 1.8209 - val_accuracy: 0.3651
Epoch 89/100
313/313 [==============================] - 1s 4ms/step - loss: 1.4853 - accuracy: 0.4714 - val_loss: 2.0016 - val_accuracy: 0.3473
Epoch 90/100
313/313 [==============================] - 1s 4ms/step - loss: 1.4865 - accuracy: 0.4703 - val_loss: 1.9332 - val_accuracy: 0.3662
Epoch 91/100
313/313 [==============================] - 1s 4ms/step - loss: 1.4838 - accuracy: 0.4706 - val_loss: 1.6676 - val_accuracy: 0.4213
Epoch 92/100
313/313 [==============================] - 1s 4ms/step - loss: 1.4865 - accuracy: 0.4697 - val_loss: 1.7121 - val_accuracy: 0.4051
Epoch 93/100
313/313 [==============================] - 1s 4ms/step - loss: 1.4822 - accuracy: 0.4716 - val_loss: 2.0031 - val_accuracy: 0.3270
Epoch 94/100
313/313 [==============================] - 1s 4ms/step - loss: 1.4820 - accuracy: 0.4711 - val_loss: 1.8442 - val_accuracy: 0.3707
Epoch 95/100
313/313 [==============================] - 1s 4ms/step - loss: 1.4812 - accuracy: 0.4728 - val_loss: 1.8203 - val_accuracy: 0.3711
Epoch 96/100
313/313 [==============================] - 1s 4ms/step - loss: 1.4821 - accuracy: 0.4706 - val_loss: 1.8450 - val_accuracy: 0.3650
Epoch 97/100
313/313 [==============================] - 1s 4ms/step - loss: 1.4779 - accuracy: 0.4735 - val_loss: 1.8016 - val_accuracy: 0.3794
Epoch 98/100
313/313 [==============================] - 1s 4ms/step - loss: 1.4794 - accuracy: 0.4726 - val_loss: 1.9037 - val_accuracy: 0.3542
Epoch 99/100
313/313 [==============================] - 1s 4ms/step - loss: 1.4763 - accuracy: 0.4745 - val_loss: 1.8292 - val_accuracy: 0.3685
Epoch 100/100
313/313 [==============================] - 1s 4ms/step - loss: 1.4763 - accuracy: 0.4735 - val_loss: 1.8494 - val_accuracy: 0.3561

Layer Normalization

Layer Normalization was introduced by [Ba, Kiros, Hinton] (2016) (https://arxiv.org/pdf/1607.06450.pdf) as an alternative to Batch Normalization. In contrast to Batch Normalization, which averages over the samples in a batch, Layer Normalization averages over the nodes of a Hidden Layer for a single training sample. This means that Layer Normalization works even when the Batch Size is one. The equations for Layer Normalization are as follows:

$$ \mu_H = \frac{1}{H}\sum_{m=1}^{H}a(m) $$

$$ \sigma_H^2 = \frac{1}{H}\sum_{m=1}^H (a(m)-\mu_H)^2 $$

$$ \hat{a}(m) = \frac{a(m)-\mu_H}{\sqrt{\sigma_H^2+\epsilon}} $$

$$c(m) = \gamma\hat{a}(m) + \beta $$

Note that in these equations $H$ is the number of nodes in the hidden layer and $a(m)$ is the pre-activation of the $m$-th node. The authors showed that Layer Normalization works better than Batch Normalization for Dense Feed Forward and Recurrent Networks. More recently, Layer Normalization has found widespread use as the normalization method of choice for Transformer Networks.
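The four equations above can be sketched directly in NumPy. This is a minimal illustration rather than the Keras implementation: the function name `layer_norm`, the value of `eps` and the test input are assumptions made for the example, and `gamma` and `beta` are simply set to ones and zeros instead of being learned.

```python
import numpy as np

def layer_norm(a, gamma, beta, eps=1e-5):
    """Layer Normalization over the last axis (the H nodes of one layer).

    a has shape (batch, H); the statistics are computed separately for
    each row, so the result does not depend on the batch size.
    """
    mu = a.mean(axis=-1, keepdims=True)        # mu_H
    var = a.var(axis=-1, keepdims=True)        # sigma_H^2 (divides by H)
    a_hat = (a - mu) / np.sqrt(var + eps)      # normalized pre-activations
    return gamma * a_hat + beta                # c(m) = gamma * a_hat(m) + beta

# Hypothetical example: a single sample (batch size one) with H = 8 nodes.
rng = np.random.default_rng(0)
H = 8
a = rng.normal(loc=3.0, scale=2.0, size=(1, H))
c = layer_norm(a, gamma=np.ones(H), beta=np.zeros(H))
print("mean:", c.mean(), "std:", c.std())
```

With `gamma` equal to one and `beta` equal to zero, the output for each sample has approximately zero mean and unit standard deviation, even though the batch contains a single row.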

The Vanishing Gradient Problem

In [32]:
#VG1
nb_setup.images_hconcat(["DL_images/VG1.png"], width=600)
Out[32]:

We mentioned at the start of this chapter that there was a twenty year gap between the time the Backprop algorithm was discovered, and when DLNs entered the mainstream. Most of this delay can be attributed to a problem called Vanishing Gradients. The issues that caused this problem were gradually recognized and addressed by the techniques and algorithms described in this chapter.

The main reason for the prevalence of Vanishing Gradients in early DLNs was a combination of bad initializations and the use of sigmoid functions for activations.

The importance of proper parameter initialization was not appreciated in the early DLNs. As a result, the weights were frequently initialized using a Normal distribution with mean 0 and small variance, such as $N(0,0.01)$. Figure VG1 plots the mean, variance and distribution of the activations at each layer of such a DLN. The first layer shows a healthy distribution; however, as we progress deeper into the network, the variance rapidly falls to zero, as does the mean. This is mainly because the weights are very small, and successive layers keep multiplying the activations by them over and over again. Since the gradient with respect to the weights is given by the product of the activations $z$ and $\delta$ (= $\frac{\partial L}{\partial a}$)

$$ \frac{\partial L}{\partial w} = z\delta $$

it follows that the gradients will also go to zero. Initializing the weights using a large variance, such as $N(0,1)$, does not solve the problem either, since it drives the sigmoid activations into saturation, thus killing the gradients $\delta$ (as explained in Section ActivationLossFunctions) and causing $\frac{\partial L}{\partial w}$ to once again go to zero.
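Both failure modes can be reproduced with a small forward-pass simulation. The sketch below is not the code that produced Figure VG1; it rests on several assumptions made only for illustration: a 10-layer network with 500 nodes per layer, standard Normal inputs, tanh as the zero-centred sigmoid-type activation, and the second argument of $N(0,0.01)$ read as a standard deviation. The helper name `activation_stats` is hypothetical.

```python
import numpy as np

def activation_stats(weight_std, n_layers=10, width=500, n_samples=1000, seed=0):
    """Push random inputs through a deep tanh network and record, per layer,
    the mean and standard deviation of the activations and the fraction of
    activations that are saturated (|z| > 0.99)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n_samples, width))
    stats = []
    for _ in range(n_layers):
        W = rng.normal(scale=weight_std, size=(width, width))
        z = np.tanh(z @ W)
        stats.append((z.mean(), z.std(), np.mean(np.abs(z) > 0.99)))
    return stats

small = activation_stats(weight_std=0.01)   # small weights: activations shrink
large = activation_stats(weight_std=1.0)    # large weights: activations saturate

for layer, (s, l) in enumerate(zip(small, large), start=1):
    print(f"layer {layer:2d}  small-init std={s[1]:.2e}  "
          f"large-init saturated fraction={l[2]:.2f}")
```

With the small initialization the standard deviation of the activations shrinks by a roughly constant factor at every layer, so $z$ (and therefore $z\delta$) collapses towards zero. With the large initialization most activations sit in the flat regions of tanh, where the derivative of the activation, and hence $\delta$, is close to zero.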

In [33]:
#VG2
nb_setup.images_hconcat(["DL_images/VG2.png"], width=600)
Out[33]:

Figure VG2 shows the activation statistics for a well-functioning DLN that incorporates the techniques described in this chapter. It can be seen that the activation distributions have a good spread even in the deeper layers, which shows that the network is continuing to learn from new training data.

The discovery of the initialization rules described in Section InitializingWeights, combined with the replacement of sigmoids by ReLU (as well as the Data Preprocessing described in Section DataPreprocessing), all helped to put the Vanishing Gradient Problem to rest, at least for Dense Feed Forward Networks with approximately 20-30 layers or fewer. In order to build networks with hundreds of layers, a newer architectural advance called Residual Connections had to be put in place. This is described in the chapter on Convolutional Neural Networks.
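As a rough illustration of why a scaled initialization combined with ReLU keeps the activations alive, the sketch below repeats a forward pass through a deep network using weights drawn with standard deviation $\sqrt{2/n_{in}}$. This He-style scaling is assumed here for the example and may differ in detail from the rules given in Section InitializingWeights; the network size, the inputs and the helper name `relu_scaled_init_stds` are likewise hypothetical.

```python
import numpy as np

def relu_scaled_init_stds(n_layers=10, width=500, n_samples=1000, seed=0):
    """Record the standard deviation of the ReLU activations at each layer
    when the weights are drawn from N(0, 2 / fan_in) (assumed He-style rule)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n_samples, width))
    stds = []
    for _ in range(n_layers):
        W = rng.normal(scale=np.sqrt(2.0 / width), size=(width, width))
        z = np.maximum(0.0, z @ W)   # ReLU activation
        stds.append(z.std())
    return stds

stds = relu_scaled_init_stds()
for layer, s in enumerate(stds, start=1):
    print(f"layer {layer:2d}  activation std={s:.3f}")
```

Under these assumptions the spread of the activations stays of the same order of magnitude from the first layer to the last, instead of collapsing or saturating.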

References and Slides