The Chicken or the Beef? Why Everyone Loves Neural Networks (Part III, Improving the Output)

April 29, 2019 in Python Articles

Written by Ra Inta


Constructing a Conceptual Artificial Neural Network (continued)

In the previous article, we looked at how to construct a very simple artificial neural network to model our chicken-or-beef meal decision. However, this is overly simplistic for most 'real-world' problems. In this final article, we'll look at how to extend this basic network, and how to deal with that ever-present bugbear in machine learning: over-fitting.

Adding Hidden Layers

The multilayer perceptron (MLP) model we constructed in the previous article was very simple, with a small number of layers. Each layer applies a certain function to a set of features and can be considered a very abstract filter. Each successive layer in a feed-forward network looks at approximations (representations) of features with higher and higher levels of abstraction. This ability to increase the abstraction level of features is the key to the amazing power and results seen with machine learning models based on ANNs.

Once we increase the number of layers beyond the input and output, the intermediate layers no longer have direct access to the outside world, and are hence hidden. We may add a hidden layer by simply repeating most of the initial part:

model1 = Sequential()

# 1st (Input) layer 
model1.add(Dense(16, activation='relu', input_shape=(4,)))

Add the 2nd (hidden) layer. This layer forms an internal representation of features that is not seen outside the model:

model1.add(Dense(16, activation='relu')) 

And retain the remainder (the output layer):

# 3rd (Output) layer
model1.add(Dense(num_classes, activation='softmax'))

model1.summary()

model1.compile(loss='categorical_crossentropy',
               optimizer=RMSprop(),
               metrics=['accuracy'])

history1 = model1.fit(x_train, y_train,
                      batch_size=batch_size,
                      epochs=epochs,
                      verbose=1,
                      validation_data=(x_test, y_test))

score1 = model1.evaluate(x_test, y_test, verbose=0)
print('Test loss: {0},      Test accuracy: {1}'.format(score1[0], score1[1]))
_________________________________________________________________
Layer (type)                 Output Shape              Param #  
=================================================================
dense_2 (Dense)              (None, 16)                80       
_________________________________________________________________
dense_3 (Dense)              (None, 16)                272      
_________________________________________________________________
dense_4 (Dense)              (None, 2)                 34       
=================================================================
Total params: 386
Trainable params: 386
Non-trainable params: 0
_________________________________________________________________
Train on 7500 samples, validate on 7500 samples
Epoch 1/7
7500/7500 [==============================] - 0s 61us/step - loss: 0.4846 - acc: 0.7465 - val_loss: 0.3742 - val_acc: 0.8861
Epoch 2/7
7500/7500 [==============================] - 0s 28us/step - loss: 0.3175 - acc: 0.9235 - val_loss: 0.2656 - val_acc: 0.9111
Epoch 3/7
7500/7500 [==============================] - 0s 30us/step - loss: 0.2274 - acc: 0.9244 - val_loss: 0.1892 - val_acc: 0.9479
Epoch 4/7
7500/7500 [==============================] - 0s 35us/step - loss: 0.1570 - acc: 0.9612 - val_loss: 0.1258 - val_acc: 0.9743
Epoch 5/7
7500/7500 [==============================] - 0s 25us/step - loss: 0.1054 - acc: 0.9807 - val_loss: 0.0846 - val_acc: 0.9976
Epoch 6/7
7500/7500 [==============================] - 0s 33us/step - loss: 0.0734 - acc: 0.9911 - val_loss: 0.0601 - val_acc: 0.9953
Epoch 7/7
7500/7500 [==============================] - 0s 32us/step - loss: 0.0551 - acc: 0.9929 - val_loss: 0.0464 - val_acc: 0.9972
Test loss: 0.04642097859084606,      Test accuracy: 0.9972
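
As an aside, the Param # column in the summary above is simply the number of weights plus biases in each Dense layer: a layer with n_in inputs and n_out units has n_in × n_out weights and n_out biases. Here is a quick sanity check of the table (a small sketch, not part of the original listing):

def dense_params(n_inputs, n_units):
    """Number of trainable parameters in a fully-connected (Dense) layer."""
    return n_inputs * n_units + n_units  # weights + biases

print(dense_params(4, 16))   # 80:  input layer (4 features -> 16 units)
print(dense_params(16, 16))  # 272: hidden layer
print(dense_params(16, 2))   # 34:  output layer (num_classes = 2)

These add up to the 386 total parameters reported by Keras.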

Adding a layer usually decreases the validation (test) loss dramatically and improves the accuracy. Adding this layer also increased the network's depth: as the number of hidden layers increases, the network becomes deeper, which is what is referred to as Deep Learning.
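
Deeper networks are built in exactly the same way, by stacking more hidden layers. As a rough sketch (not part of the original notebook, and assuming the same tf.keras API used above), a loop makes this explicit:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

num_classes = 2  # chicken or beef, as in the previous article

deep_model = Sequential()
deep_model.add(Dense(16, activation='relu', input_shape=(4,)))

# Each pass through the loop adds another hidden layer, making the network deeper
for _ in range(3):
    deep_model.add(Dense(16, activation='relu'))

deep_model.add(Dense(num_classes, activation='softmax'))
deep_model.summary()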

Network Dropout

Over-fitting is an ever-present issue with machine learning models. One means of reducing over-fitting is to apply dropout. This involves randomly ignoring ('dropping') a fraction of a layer's units during each training update, so the network cannot rely too heavily on any individual neuron. In Keras this is done with the Dropout layer, whose rate parameter sets the fraction of units to drop:
keras.layers.Dropout(0.2)  # Induces a 20% drop-out rate

<tensorflow.python.keras.layers.core.Dropout at 0x7f89f158de80>
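
Note that dropout is only applied while the network is training; at prediction time all units are used. A minimal sketch of this behaviour (assuming TensorFlow 2.x with eager execution; this is not part of the original listing):

import tensorflow as tf

drop = tf.keras.layers.Dropout(0.2)
x = tf.ones((1, 10))

print(drop(x, training=True))   # roughly 20% of entries zeroed; survivors scaled by 1/(1 - 0.2)
print(drop(x, training=False))  # unchanged: dropout is a no-op at inference time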

We can add drop-out to the first two layers of our MLP:

model2 = Sequential()
# 1st (Input) layer
model2.add(Dense(16, activation='relu', input_shape=(4,)))
model2.add(Dropout(0.1))

# 2nd (Hidden) layer
model2.add(Dense(16, activation='relu'))
model2.add(Dropout(0.1))

# 3rd (Output) layer
model2.add(Dense(num_classes, activation='softmax'))

model2.summary()

model2.compile(loss='categorical_crossentropy',
               optimizer=RMSprop(),
               metrics=['accuracy'])

history2 = model2.fit(x_train, y_train,
                      batch_size=batch_size,
                      epochs=epochs,
                      verbose=1,
                      validation_data=(x_test, y_test))

score2 = model2.evaluate(x_test, y_test, verbose=0)
print('Test loss: {0},      Test accuracy: {1}'.format(score2[0], score2[1]))
_________________________________________________________________
Layer (type)                 Output Shape              Param #  
=================================================================
dense_5 (Dense)              (None, 16)                80       
_________________________________________________________________
dropout (Dropout)            (None, 16)                0        
_________________________________________________________________
dense_6 (Dense)              (None, 16)                272      
_________________________________________________________________
dropout_1 (Dropout)          (None, 16)                0        
_________________________________________________________________
dense_7 (Dense)              (None, 2)                 34       
=================================================================
Total params: 386
Trainable params: 386
Non-trainable params: 0
_________________________________________________________________
Train on 7500 samples, validate on 7500 samples
Epoch 1/7
7500/7500 [==============================] - 1s 131us/step - loss: 0.4367 - acc: 0.8077 - val_loss: 0.3428 - val_acc: 0.9029
Epoch 2/7
7500/7500 [==============================] - 0s 47us/step - loss: 0.3088 - acc: 0.9161 - val_loss: 0.2523 - val_acc: 0.9011
Epoch 3/7
7500/7500 [==============================] - 0s 32us/step - loss: 0.2225 - acc: 0.9349 - val_loss: 0.1703 - val_acc: 0.9468
Epoch 4/7
7500/7500 [==============================] - 0s 34us/step - loss: 0.1569 - acc: 0.9559 - val_loss: 0.1156 - val_acc: 0.9693
Epoch 5/7
7500/7500 [==============================] - 0s 32us/step - loss: 0.1165 - acc: 0.9619 - val_loss: 0.0806 - val_acc: 0.9852
Epoch 6/7
7500/7500 [==============================] - 0s 36us/step - loss: 0.0941 - acc: 0.9693 - val_loss: 0.0607 - val_acc: 0.9964
Epoch 7/7
7500/7500 [==============================] - 0s 31us/step - loss: 0.0761 - acc: 0.9728 - val_loss: 0.0496 - val_acc: 0.9968
Test loss: 0.049639763354261714,      Test accuracy: 0.9968

Increasing the Width of the Model

Note the width of the above models is 16 neurons. This is not many! Let's increase this to 512 for the non-output layers:

# Base model
model3 = Sequential()
# 1st (Input) layer
model3.add(Dense(512, activation='relu', input_shape=(4,)))
model3.add(Dropout(0.2))  # Increased the drop-out rate
# 2nd (Hidden) layer
model3.add(Dense(512, activation='relu'))
model3.add(Dropout(0.2))

# 3rd (Output) layer
model3.add(Dense(num_classes, activation='softmax'))

print("Model 3 summary: {0}".format(model3.summary()))

model3.compile(loss='categorical_crossentropy',
               optimizer=RMSprop(),
               metrics=['accuracy'])

history3 = model3.fit(x_train, y_train,
                      batch_size=batch_size,
                      epochs=epochs,
                      verbose=1,
                      validation_data=(x_test, y_test))

score3 = model3.evaluate(x_test, y_test, verbose=0)
print('Test loss: {0},      Test accuracy: {1}'.format(score3[0], score3[1]))
_________________________________________________________________
Layer (type)                 Output Shape              Param #  
=================================================================
dense_8 (Dense)              (None, 512)               2560     
_________________________________________________________________
dropout_2 (Dropout)          (None, 512)               0        
_________________________________________________________________
dense_9 (Dense)              (None, 512)               262656   
_________________________________________________________________
dropout_3 (Dropout)          (None, 512)               0        
_________________________________________________________________
dense_10 (Dense)             (None, 2)                 1026     
=================================================================
Total params: 266,242
Trainable params: 266,242
Non-trainable params: 0
_________________________________________________________________
Model 3 summary: None
Train on 7500 samples, validate on 7500 samples
Epoch 1/7
7500/7500 [==============================] - 3s 359us/step - loss: 0.1441 - acc: 0.9433 - val_loss: 0.0370 - val_acc: 0.9949
Epoch 2/7
7500/7500 [==============================] - 2s 325us/step - loss: 0.0422 - acc: 0.9824 - val_loss: 0.0272 - val_acc: 0.9900
Epoch 3/7
7500/7500 [==============================] - 3s 341us/step - loss: 0.0352 - acc: 0.9852 - val_loss: 0.0171 - val_acc: 0.9969
Epoch 4/7
7500/7500 [==============================] - 2s 292us/step - loss: 0.0277 - acc: 0.9883 - val_loss: 0.0145 - val_acc: 0.9960
Epoch 5/7
7500/7500 [==============================] - 2s 269us/step - loss: 0.0262 - acc: 0.9893 - val_loss: 0.0129 - val_acc: 0.9967
Epoch 6/7
7500/7500 [==============================] - 2s 289us/step - loss: 0.0243 - acc: 0.9889 - val_loss: 0.0117 - val_acc: 0.9979
Epoch 7/7
7500/7500 [==============================] - 2s 283us/step - loss: 0.0206 - acc: 0.9917 - val_loss: 0.0101 - val_acc: 0.9971
Test loss: 0.010115630360713597,      Test accuracy: 0.9970666666666667

So, increasing the model's width improved the validation accuracy and reduced the error.

It is interesting to see how the training and validation errors of each of these models improve with each epoch.
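
One way to make this comparison is to plot the loss curves stored in the History objects returned by fit(). Here is a minimal plotting sketch, assuming matplotlib is installed and the history1, history2 and history3 objects from above are available (the labels are ours):

import matplotlib.pyplot as plt

# Compare training and validation loss per epoch for the three models trained above
histories = {'16-unit MLP': history1,
             '16-unit MLP + dropout': history2,
             '512-unit MLP + dropout': history3}

for label, history in histories.items():
    plt.plot(history.history['loss'], label=label + ' (train)')
    plt.plot(history.history['val_loss'], '--', label=label + ' (validation)')

plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()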

An Overview of Network Architectures

A great pictorial summary of the various architectures may be found in Fjodor van Veen's article, "The Neural Network Zoo": http://www.asimovinstitute.org/neural-network-zoo/

Appropriate Applications of Artificial Neural Networks

Because ANNs are 'universal approximators', and because they are built from a large number of small, simple units, they are great where:

  • The relationships between variables are poorly understood or analytically complex
  • There is a lot of data

They are not so great because:

  • Principal features are not explicitly apparent; decisions are opaque for deep networks
  • They can be slow to train, requiring several epochs

Conclusion

This was a very brief introduction to the field of artificial neural networks (ANNs) and Deep Learning!

We have examined the theoretical justification for ANNs, demonstrating that they are great 'universal approximators'. We also covered their use-cases and some of their pitfalls.

We also had a brief introduction to TensorFlow and Keras. We built a feed-forward network (a Multi-Layer Perceptron, or MLP) to approximate complex functions, 'tweaked' this model to improve its output, and evaluated its performance. Along the way, we covered the concepts of appropriate activation functions, optimizers and cost functions.

Perhaps most importantly: we have figured out how to be satisfied with a simple meal at an altitude of 30,000 feet. People have been trying to achieve this for years!

If this piques your interest in the subject, note that this article is adapted from a teaching module we use in our Machine Learning and Deep Learning classes here at Accelebrate.

Additional Resources

There are plenty of great resources available to learn about artificial neural networks, from the deeply theoretical to the immensely practical.




Ra Inta

Ra is originally from New Zealand, has a PhD in physics, is a data scientist, and has taught for Accelebrate in the US and in Africa. His specialties are R Programming and Python.
  


