In the previous article, we looked at how to construct a very simple artificial neural network to model our chicken-or-beef meal decision. However, this is overly simplistic for most 'real-world' problems. In this final article, we'll look at how to extend this basic network, and how to deal with that ever-present bugbear in machine learning: over-fitting.
The multilayer perceptron (MLP) model we constructed in the previous article was very simple, with a small number of layers. Each layer focuses on applying a certain function on a set of features and can be considered as a very abstract filter. Each successive layer in a feed-forward network will look at approximations (representations) of features with higher and higher levels of abstraction. This ability to increase the abstraction level of features is the key to the amazing power and results seen with machine learning models based on ANNs.
Once we add layers between the input and the output, those intermediate layers no longer have direct access to the outside world, and are hence 'hidden'. We may add a hidden layer by simply repeating most of the initial part:
model1 = Sequential()

# 1st (Input) layer
model1.add(Dense(16, activation='relu', input_shape=(4,)))
Add the 2nd (hidden) layer. This layer forms an internal representation of features that is not seen outside the model:
model1.add(Dense(16, activation='relu'))
And retain the remainder (the output layer):
# 3rd (Output) layer
model1.add(Dense(num_classes, activation='softmax'))

model1.summary()

model1.compile(loss='categorical_crossentropy',
               optimizer=RMSprop(),
               metrics=['accuracy'])

history1 = model1.fit(x_train, y_train,
                      batch_size=batch_size,
                      epochs=epochs,
                      verbose=1,
                      validation_data=(x_test, y_test))

score1 = model1.evaluate(x_test, y_test, verbose=0)
print('Test loss: {0}, Test accuracy: {1}'.format(score1[0], score1[1]))
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_2 (Dense) (None, 16) 80
_________________________________________________________________
dense_3 (Dense) (None, 16) 272
_________________________________________________________________
dense_4 (Dense) (None, 2) 34
=================================================================
Total params: 386
Trainable params: 386
Non-trainable params: 0
_________________________________________________________________
Train on 7500 samples, validate on 7500 samples
Epoch 1/7
7500/7500 [==============================] - 0s 61us/step - loss: 0.4846 - acc: 0.7465 - val_loss: 0.3742 - val_acc: 0.8861
Epoch 2/7
7500/7500 [==============================] - 0s 28us/step - loss: 0.3175 - acc: 0.9235 - val_loss: 0.2656 - val_acc: 0.9111
Epoch 3/7
7500/7500 [==============================] - 0s 30us/step - loss: 0.2274 - acc: 0.9244 - val_loss: 0.1892 - val_acc: 0.9479
Epoch 4/7
7500/7500 [==============================] - 0s 35us/step - loss: 0.1570 - acc: 0.9612 - val_loss: 0.1258 - val_acc: 0.9743
Epoch 5/7
7500/7500 [==============================] - 0s 25us/step - loss: 0.1054 - acc: 0.9807 - val_loss: 0.0846 - val_acc: 0.9976
Epoch 6/7
7500/7500 [==============================] - 0s 33us/step - loss: 0.0734 - acc: 0.9911 - val_loss: 0.0601 - val_acc: 0.9953
Epoch 7/7
7500/7500 [==============================] - 0s 32us/step - loss: 0.0551 - acc: 0.9929 - val_loss: 0.0464 - val_acc: 0.9972
Test loss: 0.04642097859084606, Test accuracy: 0.9972
Adding a layer usually decreases the validation (test) loss dramatically and improves the accuracy. Adding this layer increased the network's depth. As the number of hidden layers increases, the network becomes deeper; this is what is referred to as Deep Learning.
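As a sketch of what 'going deeper' looks like in code (this exact model is not part of the article's example; the layer count and width here are arbitrary), hidden layers can simply be stacked one after another:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

deeper_model = Sequential()
deeper_model.add(Dense(16, activation='relu', input_shape=(4,)))  # 1st (Input) layer
for _ in range(3):  # stack three hidden layers to increase the depth
    deeper_model.add(Dense(16, activation='relu'))
deeper_model.add(Dense(2, activation='softmax'))  # output layer (num_classes = 2 here)
deeper_model.summary()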
Over-fitting is an ever-present issue with machine learning models. One means of reducing over-fitting is to introduce dropout. This involves randomly setting a fraction of a layer's units to zero during each training update, so the network cannot rely too heavily on any single unit. This is simple to do in Keras by setting the rate parameter:
keras.layers.Dropout(0.2) # Induces a 20% drop-out rate
<tensorflow.python.keras.layers.core.Dropout at 0x7f89f158de80>
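To see roughly what this does, here is a minimal NumPy sketch of 'inverted' dropout applied to a layer's activations during training (this is an illustration, not the actual Keras implementation, and the activation values are made up):

import numpy as np

rate = 0.2  # fraction of units to drop
activations = np.array([0.5, 1.2, 0.3, 0.8, 0.1, 0.9])  # hypothetical layer outputs

# Keep each unit with probability (1 - rate), then rescale the survivors so the
# expected total activation is unchanged ('inverted' dropout).
keep_mask = np.random.rand(activations.shape[0]) > rate
print(activations * keep_mask / (1.0 - rate))  # roughly 20% of units are zeroed on each call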
We can add drop-out to the first two layers of our MLP:
model2 = Sequential()
# 1st (Input) layer
model2.add(Dense(16, activation='relu', input_shape=(4,)))
model2.add(Dropout(0.1))

# 2nd (Hidden) layer
model2.add(Dense(16, activation='relu'))
model2.add(Dropout(0.1))

# 3rd (Output) layer
model2.add(Dense(num_classes, activation='softmax'))

model2.summary()

model2.compile(loss='categorical_crossentropy',
               optimizer=RMSprop(),
               metrics=['accuracy'])

history2 = model2.fit(x_train, y_train,
                      batch_size=batch_size,
                      epochs=epochs,
                      verbose=1,
                      validation_data=(x_test, y_test))

score2 = model2.evaluate(x_test, y_test, verbose=0)
print('Test loss: {0}, Test accuracy: {1}'.format(score2[0], score2[1]))
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_5 (Dense) (None, 16) 80
_________________________________________________________________
dropout (Dropout) (None, 16) 0
_________________________________________________________________
dense_6 (Dense) (None, 16) 272
_________________________________________________________________
dropout_1 (Dropout) (None, 16) 0
_________________________________________________________________
dense_7 (Dense) (None, 2) 34
=================================================================
Total params: 386
Trainable params: 386
Non-trainable params: 0
_________________________________________________________________
Train on 7500 samples, validate on 7500 samples
Epoch 1/7
7500/7500 [==============================] - 1s 131us/step - loss: 0.4367 - acc: 0.8077 - val_loss: 0.3428 - val_acc: 0.9029
Epoch 2/7
7500/7500 [==============================] - 0s 47us/step - loss: 0.3088 - acc: 0.9161 - val_loss: 0.2523 - val_acc: 0.9011
Epoch 3/7
7500/7500 [==============================] - 0s 32us/step - loss: 0.2225 - acc: 0.9349 - val_loss: 0.1703 - val_acc: 0.9468
Epoch 4/7
7500/7500 [==============================] - 0s 34us/step - loss: 0.1569 - acc: 0.9559 - val_loss: 0.1156 - val_acc: 0.9693
Epoch 5/7
7500/7500 [==============================] - 0s 32us/step - loss: 0.1165 - acc: 0.9619 - val_loss: 0.0806 - val_acc: 0.9852
Epoch 6/7
7500/7500 [==============================] - 0s 36us/step - loss: 0.0941 - acc: 0.9693 - val_loss: 0.0607 - val_acc: 0.9964
Epoch 7/7
7500/7500 [==============================] - 0s 31us/step - loss: 0.0761 - acc: 0.9728 - val_loss: 0.0496 - val_acc: 0.9968
Test loss: 0.049639763354261714, Test accuracy: 0.9968
Note that the width of the above models is only 16 neurons per layer. This is not many! Let's increase this to 512 for the non-output layers:
# Base model
model3 = Sequential()
# 1st (Input) layer
model3.add(Dense(512, activation='relu', input_shape=(4,)))
model3.add(Dropout(0.2)) # Increased the drop-out rate
# 2nd (Hidden) layer
model3.add(Dense(512, activation='relu'))
model3.add(Dropout(0.2))

# 3rd (Output) layer
model3.add(Dense(num_classes, activation='softmax'))

print("Model 3 summary: {0}".format(model3.summary()))

model3.compile(loss='categorical_crossentropy',
               optimizer=RMSprop(),
               metrics=['accuracy'])

history3 = model3.fit(x_train, y_train,
                      batch_size=batch_size,
                      epochs=epochs,
                      verbose=1,
                      validation_data=(x_test, y_test))

score3 = model3.evaluate(x_test, y_test, verbose=0)
print('Test loss: {0}, Test accuracy: {1}'.format(score3[0], score3[1]))
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_8 (Dense) (None, 512) 2560
_________________________________________________________________
dropout_2 (Dropout) (None, 512) 0
_________________________________________________________________
dense_9 (Dense) (None, 512) 262656
_________________________________________________________________
dropout_3 (Dropout) (None, 512) 0
_________________________________________________________________
dense_10 (Dense) (None, 2) 1026
=================================================================
Total params: 266,242
Trainable params: 266,242
Non-trainable params: 0
_________________________________________________________________
Model 3 summary: None
Train on 7500 samples, validate on 7500 samples
Epoch 1/7
7500/7500 [==============================] - 3s 359us/step - loss: 0.1441 - acc: 0.9433 - val_loss: 0.0370 - val_acc: 0.9949
Epoch 2/7
7500/7500 [==============================] - 2s 325us/step - loss: 0.0422 - acc: 0.9824 - val_loss: 0.0272 - val_acc: 0.9900
Epoch 3/7
7500/7500 [==============================] - 3s 341us/step - loss: 0.0352 - acc: 0.9852 - val_loss: 0.0171 - val_acc: 0.9969
Epoch 4/7
7500/7500 [==============================] - 2s 292us/step - loss: 0.0277 - acc: 0.9883 - val_loss: 0.0145 - val_acc: 0.9960
Epoch 5/7
7500/7500 [==============================] - 2s 269us/step - loss: 0.0262 - acc: 0.9893 - val_loss: 0.0129 - val_acc: 0.9967
Epoch 6/7
7500/7500 [==============================] - 2s 289us/step - loss: 0.0243 - acc: 0.9889 - val_loss: 0.0117 - val_acc: 0.9979
Epoch 7/7
7500/7500 [==============================] - 2s 283us/step - loss: 0.0206 - acc: 0.9917 - val_loss: 0.0101 - val_acc: 0.9971
Test loss: 0.010115630360713597, Test accuracy: 0.9970666666666667
So, increasing the model's width further reduced the validation loss, while the accuracy remained comparably high.
It is interesting to see how the training and validation errors of each of these models improve with each epoch.
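One way to visualize this (a minimal matplotlib sketch, assuming the history1, history2 and history3 objects from the code above are still in scope) is to plot the loss curves that Keras records during training:

import matplotlib.pyplot as plt

histories = {'hidden layer': history1,
             'dropout': history2,
             'wider (512) + dropout': history3}

plt.figure(figsize=(8, 5))
for label, history in histories.items():
    # Keras stores one value per epoch in history.history
    plt.plot(history.history['loss'], linestyle='--', label='{0} (train)'.format(label))
    plt.plot(history.history['val_loss'], label='{0} (validation)'.format(label))
plt.xlabel('Epoch')
plt.ylabel('Loss (categorical cross-entropy)')
plt.legend()
plt.show()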
A great pictorial summary of the various architectures may be found in Fjodor van Veen's article, "The Neural Network Zoo": http://www.asimovinstitute.org/neural-network-zoo/
Because ANNs are 'universal approximators', and are built from a large number of small, simple units, they are great where:
They are not so great because:
This was a very brief introduction to the field of artificial neural networks (ANNs) and Deep Learning!
We have examined the theoretical justification for ANNs, demonstrating that they are great 'universal approximators'. We also covered their use-cases and some of their pitfalls.
We also had a brief introduction to TensorFlow and Keras. We built a feed-forward network (a Multi-Layer Perceptron; MLP) to approximate complex functions. We 'tweaked' this model, improving its output, and evaluated its performance. Along the way, we also covered the concepts of appropriate activation functions, optimizers and cost functions.
Perhaps most importantly: we have figured out how to be satisfied with a simple meal at an altitude of 30,000 feet. People have been trying to achieve this for years!
If this piques your interest in the subject, this article is a modification of a teaching module we use in our Machine Learning and Deep Learning classes here at Accelebrate.
There are plenty of great resources available to learn about artificial neural networks, from the deeply theoretical to the immensely practical. Here is a brief selection of some suggested resources to learn more on this popular and useful topic.
Websites:
YouTube channels:
Platforms:
Written by Ra Inta