The Deep Learning with Keras Workshop

2. Machine Learning versus Deep Learning

Overview

In this chapter, we will begin creating Artificial Neural Networks (ANNs) using the Keras library. Before using the library for modeling, we will introduce the mathematics that underlies ANNs, including linear transformations and how they can be applied in Python. By the end of this chapter, we will have applied that knowledge by building a logistic regression model with Keras.

Introduction

In the previous chapter, we discussed some applications of machine learning and even built models with the scikit-learn Python package. The previous chapter covered how to preprocess real-world datasets so that they can be used for modeling. To do this, we converted all the variables into numerical data types and converted categorical variables into dummy variables. We used the logistic regression algorithm to classify users of a website by their purchase intention from the online shoppers purchasing intention dataset. We advanced our model-building skills by adding regularization to the model to improve its performance.

In this chapter, we will continue learning how to build machine learning models and extend our knowledge so that we can build an Artificial Neural Network (ANN) with the Keras package. (Remember that ANNs represent a large class of machine learning algorithms that are so-called because their architecture resembles the neurons in the human brain.)

Keras is a machine learning library designed specifically for building neural networks. While scikit-learn's functionality spans a broader area of machine learning algorithms, the functionality of scikit-learn for neural networks is minimal.

ANNs can be used for the same machine learning tasks as other algorithms: classification (for which we have used logistic regression), regression (linear regression), and clustering (k-means). At the start of any machine learning problem, we need to determine what kind of task it is (regression, classification, or clustering) by asking the following questions:

What outcomes matter the most to me or my business? For example, if you are predicting the value of stock market indices, you could predict whether the price is higher or lower than the previous time point (which would be a classification task) or you could predict the value itself (which would be a regression problem). Each may lead to a different subsequent action or trading strategy.
The following plot shows a candlestick chart, which describes price movements in financial data; here, it depicts a stock price. The colors represent whether the stock price increased (green) or decreased (red) in value over each period, and each candlestick shows the open, close, high, and low values of the data, which are important pieces of information for stock prices.
Note
You can find the high-quality color images for this chapter at: https://packt.live/38nenXS.
One goal of modeling this data would be to predict what happens the following day. A classification task might predict a positive or negative change in the stock price and since there are only two possible values, this would be a binary classification task. Another option would be to predict the value of the stock the following day. Since the predicted value would be a continuous variable, this would be a regression task:

Figure 2.1: A candlestick chart indicating the movement of a stock index over the span of a month

Do we have the appropriately labeled data to train a model? For a supervised learning task, we must have at least some labeled data in order to train a model. For example, if we want to build a model to classify images into dog images and cat images, we would need training data, the images themselves, and labels for the data indicating whether they are dog images or cat images. ANNs often need a lot of data. For image classification, this can be millions of images to develop accurate, robust models. This may be a determining factor when deciding which algorithm is appropriate for a given task.

ANNs are a type of machine learning algorithm that can be used to solve a task. They excel in certain aspects and have drawbacks in others, and these pros and cons should be considered before choosing this type of algorithm. Deep learning networks are distinguished from single-layer ANNs by their depth—the total number of hidden layers within the network.

So, deep learning is really just a specific subgroup of machine learning that relies on ANNs with multiple layers. We encounter the results of deep learning on a regular basis, whether it's in image classification models such as the friend recognition models that help tag friends in your Facebook photos, or the recommendation algorithms that help suggest your next favorite songs on Spotify. Deep learning models are becoming more prevalent over traditional machine learning models for a variety of reasons, including the growing sizes of unstructured data that deep learning models excel at and lower computational costs.

Choosing whether to use ANNs or traditional machine learning algorithms such as linear regression and decision trees for a particular task is a matter of experience and an understanding of the inner workings of the algorithm itself. As such, the benefits of using traditional machine learning algorithms or ANNs will be mentioned in the next section.

Advantages of ANNs over Traditional Machine Learning Algorithms

The best performance: For many supervised learning tasks, the best-performing models have been ANNs that are trained on a lot of data. For example, in classification tasks such as the ImageNet challenge (a large-scale visual recognition challenge for classifying images into 1000 classes), ANNs have attained greater accuracy than humans.
Scale effectively with data: Traditional machine learning algorithms, such as logistic regression and decision trees, plateau in performance, whereas the ANN architecture is able to learn higher-level features—nonlinear combinations of the input features that may be important for classification or regression tasks. This allows ANNs to perform better when provided with large amounts of data - especially those ANNs with a deep architecture. For example, ANNs that perform well in the ImageNet challenge are provided with 14 million images for training. The following figure shows the performance scaling with the amount of data for both deep learning algorithms and traditional machine learning algorithms:

Figure 2.2: Performance scaling with the amount of data for both deep learning algorithms and traditional machine learning algorithms

No need for feature engineering: ANNs are able to identify which features are important in modeling, so they are able to model directly from raw data. For example, in the binary classification of dog and cat images into their respective classes, there is no need to define features such as the color, size, or weight of the animal. The images themselves are sufficient for the ANN to successfully determine classification. In traditional machine learning algorithms, these features must be engineered in an iterative process that is manual and can be time-consuming.
Adaptable and transferable: Weights and features that are learned from ANNs can be applied to similar tasks. In computer vision tasks, pre-trained classification models can be used as the starting points for building models for other classification tasks. For example, VGG-16 is a 16-layer deep learning model that was trained on the ImageNet dataset to classify images into 1000 object classes. The weights that are learned in the model can be transferred to classify other objects in significantly less time.

However, there are some advantages of using traditional machine learning algorithms over ANNs, as explained in the following section.

Advantages of Traditional Machine Learning Algorithms over ANNs

Relatively good performance when the available training data is small: In order to attain high performance, ANNs require a lot of data, and the deeper the network, the more data is required. With the increase in layers, the number of parameters that need to be learned also increases. This results in more time to train on the training data to reach the optimal parameter values. For example, VGG-16 has over 138 million parameters and required 14 million hand-labeled images to train and learn all the parameters.
Cost-effective: Both financially and computationally, deep networks can take a lot of computing power and time to train. This demands a lot of resources that may not be available to all. Moreover, these models are time-consuming to tune effectively and require a domain expert who's familiar with the inner workings of the model to achieve optimal performance.
Easy to interpret: Many traditional machine learning models are easy to interpret. So, identifying which feature had the most predictive power in the model is straightforward. This can be incredibly useful when working with non-technical team members who wish to understand and interpret the results of the model. ANNs are considered more of a black box, in that while they are successful in classifying images and other tasks, the understanding behind how the predictions are made is unintuitive and buried in layers of computations. As such, interpreting the results requires more effort.

Hierarchical Data Representation

One reason that ANNs are able to perform so well is that a large number of layers allows the network to learn representations of the data at many different levels. This is illustrated in the following diagram, in which the representation of an ANN being used to identify faces is shown. At lower levels of the model, simple features are learned, such as edges and gradients, as can be seen by looking at the features that were learned in the initial layers. As the model progresses, combinations of lower-level features activate to form face parts, and at later layers of the model, generic faces are learned. This is known as feature hierarchy and illustrates the power that this layered representation has for model building and interpretation.

Many real-world applications of deep neural networks take images, video, or natural language text as input. The feature hierarchy that is learned by deep neural networks allows them to discover latent structures within this kind of unlabeled, unstructured data, which makes them useful for processing real-world data that is most often raw and unprocessed.

The following diagram shows an example of the learned representation of a deep learning model—lower features such as the edges and gradients activate together to form generic face shapes, which can be seen in the deeper layers:

Figure 2.3: Learned representation at various parts of a deep learning model

Since deep neural networks have become more accessible, various companies have started exploiting their applications. The following are some examples of companies that use ANNs:

Yelp: Yelp uses deep neural networks to process, classify, and label their images more efficiently. Since photos are one important aspect of Yelp reviews, the company has placed an emphasis on classifying and categorizing them. This is achieved more efficiently with deep neural networks.
Clarifai: This cloud-based company is able to classify images and videos using deep neural network-based models.
Enlitic: This company uses deep neural networks to analyze medical image data such as X-rays or MRIs. The use of such networks in this application increases diagnostic accuracy and decreases diagnostic time and cost.

Now that we understand the potential applications of using ANNs, we can understand the mathematics behind how they work. While they may seem intimidating and complex, they can be broken down into a series of linear and nonlinear transformations, which themselves are simple to understand. An ANN is created by sequentially combining a series of linear and nonlinear transformations. The next section discusses the basic components and operations involved in linear transformations that comprise the mathematics of ANNs.

Linear Transformations

In this section, we will introduce linear transformations. Linear transformations are the backbone of modeling with ANNs. In fact, most of the computation in an ANN can be thought of as a series of linear transformations, interleaved with simple nonlinear functions. The working components of linear transformations are scalars, vectors, matrices, and tensors. Operations such as addition, transposition, and multiplication are performed on these components.

Scalars, Vectors, Matrices, and Tensors

Scalars, vectors, matrices, and tensors are the actual components of any deep learning model. Having a fundamental understanding of how to utilize these components, as well as the operations that can be performed on them, is key to understanding how ANNs operate. Scalars, vectors, and matrices are examples of the general entity known as a tensor, so the term tensors may be used throughout this chapter but may refer to any component. Scalars, vectors, and matrices refer to tensors with a specific number of dimensions.

The rank of a tensor is an attribute that determines the number of dimensions the tensor spans. The definitions of each are listed here:

Scalar: Scalars are single numbers and are an example of 0-order tensors. For instance, the temperature at a given point is a scalar.
Vector: Vectors are one-dimensional arrays of single numbers and are an example of first-order tensors. The velocity of a given object is an example of a vector, since it has a component in each of the two (x,y) or three (x,y,z) dimensions.
Matrix: Matrices are rectangular arrays of single numbers that span two dimensions. They are an example of second-order tensors. An example of where matrices might be used is to store the velocity of a given object over time. One dimension of the matrix holds the velocity components in the given directions, while the other dimension holds each given time point.
Tensor: Tensors are the general entities that encapsulate scalars, vectors, and matrices. In general, the name is reserved for tensors of order 3 or more. An example of where tensors might be used is to store the velocity of many objects over time. One dimension of the tensor holds the velocity components in the given directions, another dimension holds each given time point, and a third dimension indexes the various objects.

The following diagram shows some examples of a scalar, a vector, a matrix, and a three-dimensional tensor:

Figure 2.4: A visual representation of scalars, vectors, matrices, and tensors
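The definitions above can be checked in NumPy, where the rank of an array is available as its ndim attribute. The following is a minimal sketch; the variable names and all the numbers are illustrative assumptions and do not come from the figure:

```python
import numpy as np

# Illustrative values only; none of these numbers come from the text.
temperature = np.array(21.5)                      # scalar: rank 0
velocity = np.array([3.0, 4.0])                   # vector: rank 1, (x, y) components
velocity_over_time = np.array([[3.0, 4.0],
                               [2.5, 4.5],
                               [2.0, 5.0]])       # matrix: rank 2, (time, direction)
# Tensor: rank 3, indexed as (object, time, direction) for two objects
velocities = np.stack([velocity_over_time, velocity_over_time * 2])

for name, t in [("scalar", temperature), ("vector", velocity),
                ("matrix", velocity_over_time), ("tensor", velocities)]:
    print(name, "rank:", t.ndim, "shape:", t.shape)
```

Each step up in rank adds one entry to the shape tuple, which is why a scalar has the empty shape ().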

Tensor Addition

Tensors can be added together to create new tensors. We will use the example of matrices in this chapter, but this concept can be extended to tensors with any rank. Matrices may be added to scalars, vectors, and other matrices under certain conditions.

Two matrices may be added (or subtracted) together if they have the same shape. For such matrix-matrix addition, the resultant matrix is determined by the element-wise addition of the input matrices. The resultant matrix will, therefore, have the same shape as the two input matrices. We can define the matrix $C = [c_{ij}]$ as the matrix sum $C = A + B$, where $c_{ij} = a_{ij} + b_{ij}$, so each element in $C$ is the sum of the corresponding elements in $A$ and $B$. Matrix addition is commutative, which means that the order of $A$ and $B$ does not matter: $A + B = B + A$. Matrix addition is also associative, which means that the result does not depend on how the additions are grouped: $A + (B + C) = (A + B) + C$.

The same matrix addition principles apply for scalars, vectors, and tensors. An example of this is as follows:

Figure 2.5: An example of matrix-matrix addition
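As a quick check of the addition rules described above, the following sketch adds small matrices with NumPy and confirms the commutative and associative properties. The matrices A, B, and C here are made-up examples, not the ones shown in the figure:

```python
import numpy as np

# Hypothetical 2x2 matrices chosen only for illustration
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
C = np.array([[9, 10], [11, 12]])

# Element-wise sum: each entry is a_ij + b_ij
total = A + B
print(total)

# Commutative: the order of the operands does not matter
commutative = np.array_equal(A + B, B + A)
# Associative: the grouping of the additions does not matter
associative = np.array_equal(A + (B + C), (A + B) + C)
print(commutative, associative)
```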

Scalars can also be added to matrices. Here, each element of the matrix is added to the scalar individually, as is shown in the below figure:

Figure 2.6: An example of matrix-scalar addition

It is also possible to add a vector to a matrix if the length of the vector matches the number of columns of the matrix. The vector is then added to every row of the matrix. This is known as broadcasting.
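Broadcasting can be sketched in NumPy as follows. The array names and values are assumptions made for this illustration; the second half shows what happens when the vector length does not match the number of columns:

```python
import numpy as np

mat = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9],
                [10, 11, 12]])       # shape (4, 3)
row = np.array([10, 20, 30])         # shape (3,): one value per column

# The vector is broadcast across all four rows of the matrix
broadcast_sum = mat + row
print(broadcast_sum)

# A vector whose length does not match the number of columns cannot be broadcast
bad = np.array([1, 2, 3, 4])         # shape (4,)
try:
    mat + bad
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("Shapes do not broadcast:", err)
```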

Exercise 2.01: Performing Various Operations with Vectors, Matrices, and Tensors

Note

For the exercises and activities within this chapter, you will need to have Python 3.7, Jupyter, and NumPy installed on your system. All the exercises and activities will be primarily developed in Jupyter notebooks. It is recommended to keep a separate notebook for different assignments unless advised not to. Use the following link to download them from this book's GitHub repository: https://packt.live/2vpc9rO.

In this exercise, we are going to demonstrate how to create and work with vectors, matrices, and tensors within Python. We will assume that you have some familiarity with scalars. This can all be achieved with the NumPy library using the array and matrix functions. Tensors of any rank can be created with the NumPy array function.

Before you begin, you should set up the files and folders for this chapter in your working directory using a similar structure and naming convention as you did in the previous chapter. You can verify your folder structure by comparing it to the GitHub repository, linked above.

Follow these steps to perform this exercise:

Open Jupyter Notebook to implement this exercise. Import the necessary dependency. Create a one-dimensional array, or a vector, as follows:
```
import numpy as np
vec1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
vec1
```
The preceding code produces the following output:
```
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
```

Create a two-dimensional array, or matrix, with the array function:
```
mat1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
mat1
```
The preceding code produces the following output:
```
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])
```
Use the matrix function to create matrices, which will show a similar output:
```
mat2 = np.matrix([[1, 2, 3], [4, 5, 6],
                  [7, 8, 9], [10, 11, 12]])
```
Create a three-dimensional array, or tensor, using the array function:
```
ten1 = np.array([[[1, 2, 3], [4, 5, 6]],
                 [[7, 8, 9], [10, 11, 12]]])
ten1
```
The preceding code produces the following output:
```
array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])
```

Determining the shape of a given vector, matrix, or tensor is important since certain operations, such as addition and multiplication, can only be applied to components of certain shapes. The shape of an n-dimensional array can be determined using the shape attribute. Write the following code to determine the shape of vec1:
```
vec1.shape
```
The preceding code produces the following output:
```
(10,)
```
Write the following code to determine the shape of mat1:
```
mat1.shape
```
The preceding code produces the following output:
```
(4, 3)
```
Write the following code to determine the shape of ten1:
```
ten1.shape
```
The preceding code produces the following output:
```
(2, 2, 3)
```

Create a matrix with four rows and three columns with whichever numbers you like. Print the resulting matrix to verify its shape:
```
mat1 = np.matrix([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
mat1
```
The preceding code produces the following output:
```
matrix([[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9],
        [10, 11, 12]])
```

Create another matrix with four rows and three columns with whichever numbers you like. Print the resulting matrix to verify its shape:
```
mat2 = np.matrix([[2, 1, 4], [4, 1, 7], [4, 2, 9], [5, 21, 1]])
mat2
```
The preceding code produces the following output:
```
matrix([[ 2, 1, 4],
        [ 4, 1, 7],
        [ 4, 2, 9],
        [ 5, 21, 1]])
```

Add matrix 1 and matrix 2:
```
mat3 = mat1 + mat2
mat3
```
The preceding code produces the following output:
```
matrix([[ 3,  3,  7],
        [ 8,  6, 13],
        [11, 10, 18],
        [15, 32, 13]])
```
Add a scalar to a matrix with the following code:
```
mat1 + 4
```
The preceding code produces the following output:
```
matrix([[ 5,  6,  7],
        [ 8,  9, 10],
        [11, 12, 13],
        [14, 15, 16]])
```

In this exercise, we learned how to perform various operations with vectors, matrices, and tensors. We also learned how to determine the shape of the matrix.

Note

To access the source code for this specific section, please refer to https://packt.live/2NNQ7VA.

You can also run this example online at https://packt.live/3eUDtQA.

Reshaping

A tensor of any size can be reshaped as long as the number of total elements remains the same. For example, a (4x3) matrix can be reshaped into a (6x2) matrix since they both have a total of 12 elements. The rank, or number of dimensions, can also be changed in the reshaping process. For example, a (4x3) matrix can be reshaped into a (3x2x2) tensor. Here, the rank has changed from 2 to 3. The (4x3) matrix can also be reshaped into a vector of 12 elements, in which case the rank has changed from 2 to 1.

The following diagram illustrates tensor reshaping—on the left is a tensor with shape (4x1x3), which can be reshaped to a tensor of shape (4x3). Here, the number of elements (12) has remained constant, though the shape and rank of the tensor have changed:

Figure 2.7: Visual representation of reshaping a (4x1x3) tensor into a (4x3) tensor
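The rule that reshaping must preserve the total number of elements can be tried out directly in NumPy. This is a minimal sketch with made-up values; the use of -1 to let NumPy infer one dimension is a NumPy convenience not discussed in the text:

```python
import numpy as np

ten = np.arange(1, 13).reshape(4, 1, 3)   # 12 elements, rank 3
mat = ten.reshape(4, 3)                   # the same 12 elements, rank 2
print(ten.shape, "->", mat.shape)

# -1 asks NumPy to infer that dimension from the total element count
inferred = mat.reshape(-1, 2)
print(inferred.shape)

# Asking for a shape with a different element count raises an error
try:
    mat.reshape(5, 3)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("Cannot reshape:", err)
```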

Matrix Transposition

The transpose of a matrix is an operator that flips the matrix over its diagonal. When this occurs, the rows become the columns and vice versa. The transpose operation is usually denoted as a T superscript upon the matrix. Tensors of any rank can also be transposed:

Figure 2.8: A visual representation of matrix transposition

The following figure shows the matrix transposition properties of matrices A and B:

Figure 2.9: Matrix transposition properties where A and B are matrices

A square matrix (that is, a matrix with an equal number of rows and columns) is said to be symmetric if its transpose is equal to the original matrix.
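The following sketch checks some standard transposition identities numerically, together with the symmetry test just described. It assumes the properties in question are the usual ones, namely that transposing twice returns the original matrix, that the transpose of a sum is the sum of the transposes, and that the transpose of a product reverses the order of the factors. All matrices are hypothetical examples:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])      # shape (2, 3)
B = np.array([[7, 8, 9], [1, 2, 3]])      # shape (2, 3)
M = np.array([[1, 0], [2, 1], [0, 3]])    # shape (3, 2)

# Transposing twice gives back the original matrix
double_transpose_ok = np.array_equal(A.T.T, A)
# The transpose of a sum is the sum of the transposes
sum_ok = np.array_equal((A + B).T, A.T + B.T)
# The transpose of a product reverses the order of the factors
product_ok = np.array_equal((A @ M).T, M.T @ A.T)

# A symmetric matrix is equal to its own transpose
S = np.array([[1, 7, 3], [7, 4, 5], [3, 5, 6]])
is_symmetric = np.array_equal(S, S.T)

print(double_transpose_ok, sum_ok, product_ok, is_symmetric)
```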

Exercise 2.02: Matrix Reshaping and Transposition

In this exercise, we are going to demonstrate how to reshape and transpose matrices. This will become important since some operations can only be applied to components if certain tensor dimensions match. For example, tensor multiplication can only be applied if the inner dimensions of the two tensors match. Reshaping or transposing tensors is one way to modify the dimensions of the tensor to ensure that certain operations can be applied. Follow these steps to complete this exercise:

Open a Jupyter notebook from the start menu to implement this exercise. Create a two-dimensional array with four rows and three columns, as follows:
```
import numpy as np
mat1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
mat1
```
This gives the following output:
```
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])
```
We can confirm its shape by looking at the shape of the matrix:
```
mat1.shape
```
The output is as follows:
```
(4, 3)
```
Reshape the array so that it has three rows and four columns instead, as follows:
```
mat2 = np.reshape(mat1, [3,4])
mat2
```
The preceding code produces the following output:
```
array([[ 1, 2, 3, 4],
       [ 5, 6, 7, 8],
       [ 9, 10, 11, 12]])
```
Confirm this by printing the shape of the array:
```
mat2.shape
```
The preceding code produces the following output:
```
(3, 4)
```

Reshape the matrix into a three-dimensional array, as follows:
```
mat3 = np.reshape(mat1, [3,2,2])
mat3
```
The preceding code produces the following output:
```
array([[[ 1,  2],
        [ 3,  4]],

       [[ 5,  6],
        [ 7,  8]],

       [[ 9, 10],
        [11, 12]]])
```

Print the shape of the array to confirm its dimensions:
```
mat3.shape
```
The preceding code produces the following output:
```
(3, 2, 2)
```
Reshape the matrix into a one-dimensional array, as follows:
```
mat4 = np.reshape(mat1, [12])
mat4
```
The preceding code produces the following output:
```
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
```
Confirm this by printing the shape of the array:
```
mat4.shape
```
The preceding code produces the following output:
```
(12,)
```
Taking the transpose of an array will flip it across its diagonal. For a two-dimensional array or matrix, each row becomes a column and vice versa, so a row vector stored as a (1xn) matrix becomes an (nx1) column vector. Note that NumPy leaves a one-dimensional array unchanged when it is transposed. Get the transpose of an array using the T attribute:
```
mat = np.matrix([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
mat.T
```
The following figure shows the output of the preceding code:
Figure 2.10: Visual demonstration of the transpose function
Check the shape of the matrix and its transpose to verify that the dimensions have changed:
```
mat.shape
```
The preceding code produces the following output:
```
(4, 3)
```
Check the shape of the transposed matrix:
```
mat.T.shape
```
The preceding code produces the following output:
```
(3, 4)
```
Verify that reshaping a matrix and transposing it do not produce the same arrangement of elements, even though the resulting shapes are the same:
```
np.reshape(mat1, [3,4]) == mat1.T
```
The preceding code produces the following output:
```
array([[ True, False, False, False],
       [False, False, False, False],
       [False, False, False, True]], dtype = bool)
```
Here, we can see that only the first and last elements match.

In this section, we introduced some of the basic components of linear algebra, including scalars, vectors, matrices, and tensors. We also covered some basic manipulation of linear algebra components, such as addition, transposition, and reshaping. By doing so, we learned how to put these concepts into action by using functions in the NumPy library to perform these operations.

Note

To access the source code for this specific section, please refer to https://packt.live/3gqBlR0.

You can also run this example online at https://packt.live/3eYCChD.

In the next section, we will extend our understanding of linear transformations by covering one of the most important transformations related to ANNs—matrix multiplication.

Matrix Multiplication

Matrix multiplication is fundamental to neural network operations. While the rules for addition are simple and intuitive, the rules for multiplication for matrices and tensors are more complex. Matrix multiplication involves more than simple element-wise multiplication of the elements. Instead, a more complicated procedure is implemented that involves the entire row of one matrix and an entire column of the other. In this section, we will explain how multiplication works for two-dimensional tensors or matrices; however, tensors of higher orders can also be multiplied.

Given a matrix $A = [a_{ij}]_{m \times n}$ and another matrix $B = [b_{ij}]_{n \times p}$, the product of the two matrices is $C = AB = [c_{ij}]_{m \times p}$, where each element $c_{ij}$ is defined as $c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$. Note that the shape of the resultant matrix is given by the outer dimensions of the matrix product, that is, the number of rows of the first matrix and the number of columns of the second matrix. For the multiplication to work, the inner dimensions of the matrix product must match, that is, the number of columns of the first matrix must equal the number of rows of the second matrix.

The concept of inner and outer dimensions of matrix multiplication can be seen in the following figure:

Figure 2.11: A visual representation of the inner and outer dimensions in matrix multiplication
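To make the definition concrete, the sketch below implements the row-by-column rule with explicit loops and compares the result with NumPy's built-in @ operator. The function name matmul_loops and the example matrices are assumptions made for this illustration; in practice, the built-in operator should be used because it is far faster:

```python
import numpy as np

def matmul_loops(A, B):
    """Multiply two 2-D arrays using the element-wise definition of the product."""
    m, n = A.shape
    n_b, p = B.shape
    if n != n_b:
        raise ValueError(f"inner dimensions do not match: {n} != {n_b}")
    C = np.zeros((m, p), dtype=np.result_type(A, B))
    for i in range(m):              # rows of A
        for j in range(p):          # columns of B
            for k in range(n):      # shared inner dimension
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[1, 2, 3], [4, 5, 6]])     # shape (2, 3)
B = np.array([[1, 0], [0, 1], [2, 2]])   # shape (3, 2)

C = matmul_loops(A, B)                   # outer dimensions give shape (2, 2)
print(C)
print(np.array_equal(C, A @ B))
```

The inner loop over k is the sum over the shared dimension, which is why the number of columns of A must equal the number of rows of B.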

Unlike matrix addition, matrix multiplication is not commutative, which means that the order of the matrices in the product matters:

Figure 2.12: Matrix multiplication is non-commutative

For example, let's say we have the following two matrices:

Figure 2.13: Two matrices, A and B

One way to construct the product is to have matrix A first, multiplied by B:

Figure 2.14: Visual representation of matrix A multiplied by B

This results in a 2x2 matrix. Another way to construct the product is to have B first, multiplied by A:

Figure 2.15: Visual representation of matrix B multiplied by A

Here, we can see that the matrix that was formed from the product BA is a 3x3 matrix and is very different from the matrix that was formed from the product AB.
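The same effect can be reproduced in NumPy. The matrices below are hypothetical stand-ins with shapes (2x3) and (3x2), chosen to match the shapes of the products described above rather than the values in the figures. The second part shows that even square matrices of the same shape generally give different products in each order:

```python
import numpy as np

# Hypothetical matrices: A is (2x3) and B is (3x2)
A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[7, 8], [9, 10], [11, 12]])

AB = A @ B    # shape (2, 2)
BA = B @ A    # shape (3, 3)
print(AB.shape, BA.shape)

# Even for square matrices, the order of the product matters
P = np.array([[1, 2], [3, 4]])
Q = np.array([[0, 1], [1, 0]])
same_product = np.array_equal(P @ Q, Q @ P)
print(same_product)
```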

Scalar-matrix multiplication is much more straightforward: every element in the matrix is multiplied by the scalar, so that $\lambda A = [\lambda a_{ij}]_{m \times n}$, where $\lambda$ is a scalar and $A$ is a matrix.
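In NumPy, the * operator performs scalar and element-wise multiplication, while the @ operator (or the dot method) performs the matrix product. The short sketch below, using made-up values, shows that the two are not interchangeable:

```python
import numpy as np

lam = 3                              # a scalar
A = np.array([[1, 2], [3, 4]])

scaled = lam * A                     # every element multiplied by the scalar
elementwise = A * A                  # element-wise product, not the matrix product
matrix_product = A @ A               # row-by-column matrix product

print(scaled)
print(elementwise)
print(matrix_product)
```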

In the following exercise, we will put our understanding into practice by performing matrix multiplication in Python utilizing the NumPy library.

Exercise 2.03: Matrix Multiplication

In this exercise, we are going to demonstrate how to multiply matrices together. Follow these steps to complete this exercise:

Open a Jupyter notebook from the start menu to implement this exercise.

To demonstrate the fundamentals of matrix multiplication, begin with two matrices of the same shape:
```
import numpy as np
mat1 = np.array([[1, 2, 3], [4, 5, 6],
                 [7, 8, 9], [10, 11, 12]])
mat2 = np.array([[2, 1, 4], [4, 1, 7],
                 [4, 2, 9], [5, 21, 1]])
```

Since both matrices have the same shape and are not square, they cannot be multiplied as they are, because the inner dimensions of the product would not match. One way we could resolve this is to take the transpose of one of the matrices; then, we would be able to perform the multiplication. Take the transpose of the second matrix, which would mean that a (4x3) matrix is multiplied by a (3x4) matrix. The result would be a (4x4) matrix. Perform the multiplication using the dot method:
```
mat1.dot(mat2.T)
```
The preceding code produces the following output:
```
array([[ 16, 27, 35, 50],
       [ 37, 63, 80, 131],
       [ 58, 99, 125, 212],
       [ 79, 135, 170, 293]])
```
Take the transpose of the first matrix, which would mean that a (3x4) matrix is multiplied by a (4x3) matrix. The result would be a (3x3) matrix:
```
mat1.T.dot(mat2)
```
The preceding code produces the following output:
```
array([[ 96, 229, 105],
       [ 111, 254, 126],
       [ 126, 279, 147]])
```
Reshape one of the arrays to make sure the inner dimension of the matrix multiplication matches. For example, we can reshape the first array to make it a (3x4) matrix instead of transposing. Note that the result is not the same as it is when transposing:
```
np.reshape(mat1, [3,4]).dot(mat2)
```
The preceding code produces the following output:
```
array([[ 42, 93, 49],
       [ 102, 193, 133],
       [ 162, 293, 217]])
```

In this exercise, we learned how to multiply two matrices together. The same concept can be applied to tensors of all ranks, not just second-order tensors. Tensors of different ranks can even be multiplied together if their inner dimensions match.

Note

To access the source code for this specific section, please refer to https://packt.live/38p0RD7.

You can also run this example online at https://packt.live/2VYI1xZ.

The next exercise demonstrates how to multiply three-dimensional tensors together.

Exercise 2.04: Tensor Multiplication

In this exercise, we are going to apply our knowledge of matrix multiplication to higher-order tensors. Follow these steps to complete this exercise:

Open a Jupyter notebook from the start menu to implement this exercise. Begin by creating a three-dimensional tensor using the NumPy library and the array function. Import all the necessary dependencies:
```
import numpy as np
mat1 = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
mat1
```
The preceding code produces the following output:
```
array([[[1, 2, 3],
        [4, 5, 6]],

       [[1, 2, 3],
        [4, 5, 6]]])
```
Confirm the shape using the shape attribute:
```
mat1.shape
```
This tensor has the shape (2x2x3).
Create a new three-dimensional tensor that we will be able to multiply the first tensor by. Take the transpose of the original tensor, which for a three-dimensional array reverses the order of its axes:
```
mat2 = mat1.T
mat2
```
The preceding code produces the following output:
```
array([[[ 1, 1],
        [ 4, 4]],
       [[ 2, 2],
        [ 5, 5]],
       [[ 3, 3],
        [ 6, 6]]])
```
Confirm the shape using the shape attribute:
```
mat2.shape
```
This tensor has the shape (3x2x2).

Take the dot product of the two tensors, as follows:
```
mat3 = mat2.dot(mat1)
mat3
```
The preceding code produces the following output:
```
array([[[[ 5, 7, 9],
         [ 5, 7, 9]],
        [[ 20, 28, 36],
         [ 20, 28, 36]]],
       [[[ 10, 14, 18],
         [ 10, 14, 18]],
        [[ 25, 35, 45],
         [ 25, 35, 45]]],
       [[[ 15, 21, 27],
         [ 15, 21, 27]],
        [[ 30, 42, 54],
         [ 30, 42, 54]]]])
```

Look at the shape of this resultant tensor:
```
mat3.shape
```
The preceding code produces the following output:
```
(3, 2, 2, 3)
```
Now, we have a four-dimensional tensor.
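
One way to read this result is that, for arrays with more than two dimensions, NumPy's dot sums over the last axis of the first array and the second-to-last axis of the second array, and keeps all the remaining axes. The following sketch re-creates the tensors from this exercise and checks that reading by writing the same contraction explicitly with np.einsum:

```python
import numpy as np

mat1 = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
mat2 = mat1.T

mat3 = mat2.dot(mat1)

# Same contraction written explicitly: sum over the shared index l
explicit = np.einsum('ijl,klm->ijkm', mat2, mat1)

print(mat3.shape)
print(np.array_equal(mat3, explicit))
```

The axes of size 2 that match are summed away, and the remaining axes (3, 2) from mat2 and (2, 3) from mat1 are concatenated to give the four-dimensional shape.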

In this exercise, we learned how to multiply higher-order tensors using the NumPy library in Python. While we do not have to perform matrix multiplication directly when we create ANNs with Keras, it is still useful to understand the underlying mathematics.

Note

To access the source code for this specific section, please refer to https://packt.live/31G1rLn.

You can also run this example online at https://packt.live/2AriZjn.

Introduction to Keras

Building ANNs involves creating layers of nodes. Each node can be thought of as a tensor of weights that are learned in the training process. Once the ANN has been fitted to the data, a prediction is made by multiplying the input data by the weight matrices layer by layer, applying any other transformations when needed, such as activation functions (which are generally nonlinear), until the final output layer is reached. The size of each weight tensor is determined by the shape of the input nodes and the shape of the output nodes. For example, in a single-layer ANN, the size of our single hidden layer can be thought of as follows:

Figure 2.16: Solving the dimensions of the hidden layer of a single-layer ANN

If the input matrix of features has n rows, or observations, and m columns, or features, and we want our predicted target to have n rows (one for each observation) and one column (the predicted value), we can determine the size of our hidden layer by what is needed to make the matrix multiplication valid. Here is the representation of a single-layer ANN:

Figure 2.17: Representation of a single-layer ANN

Here, we can determine that the weight matrix will be of size (mx1) to ensure the matrix multiplication is valid.

If we have more than one hidden layer in an ANN, then we have much more freedom with the size of these weight matrices. In fact, the possibilities are endless, depending on how many layers there are and how many nodes we want in each layer. In practice, however, certain architecture designs work better than others, as we will be learning throughout this book.
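
The shape bookkeeping described above can be sketched with NumPy. The sizes `n`, `m`, and `h` below are arbitrary values chosen for illustration, the weights are random rather than learned, and activation functions are left out to keep the focus on the dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, h = 5, 4, 3          # observations, features, hidden nodes (illustrative)

X = rng.normal(size=(n, m))

# Single layer: an (m x 1) weight matrix maps (n x m) inputs to (n x 1) outputs
W = rng.normal(size=(m, 1))
single = X.dot(W)

# Two layers: the hidden size h is a free choice, but inner dimensions must match
W1 = rng.normal(size=(m, h))
W2 = rng.normal(size=(h, 1))
two_layer = X.dot(W1).dot(W2)

print(single.shape, two_layer.shape)
```

In both cases, the output has one row per observation and one column, and only the hidden size `h` is left for us to choose.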

In general, Keras abstracts much of the linear algebra out of building neural networks so that users can focus on designing the architecture. For most networks, only the input size, output size, and the number of nodes in each hidden layer are needed to create networks in Keras.

The simplest model structure in Keras is the Sequential model, which can be imported from keras.models. The model of the Sequential class describes an ANN that consists of a linear stack of layers. A Sequential model can be instantiated as follows:

```
from keras.models import Sequential
model = Sequential()
```

Layers can be added to this model instance to create the structure of the model.

Note

Before initializing your model, it is helpful to set a seed using the seed function in NumPy's random library and the set_seed function from TensorFlow's random library.
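
The effect of seeding can be sketched with NumPy alone. In the example below, the helper `random_weights` is hypothetical and only stands in for the random weight initialization that a model performs; with TensorFlow installed, the same seed would also be passed to the set_seed function mentioned in the preceding note:

```python
import numpy as np

def random_weights(seed, shape=(3, 2)):
    """Hypothetical stand-in for a model's random weight initialization."""
    np.random.seed(seed)            # fix NumPy's global random state
    return np.random.normal(size=shape)

first = random_weights(seed=1)
second = random_weights(seed=1)
different = random_weights(seed=2)

print(np.array_equal(first, second))     # same seed, same weights
print(np.array_equal(first, different))  # different seed, different weights
```

Fixing the seed in this way makes repeated runs start from the same random values, which makes results easier to compare.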

Layer Types

The notion of layers is part of the Keras core API. A layer can be thought of as a composition of nodes, and at each node, a set of computations happens. In Keras, all the nodes of a layer can be initialized by simply initializing the layer itself. The individual operation of a generalized layer node can be seen in the following diagram. At each node, the input data is multiplied by a set of weights using matrix multiplication, as we learned earlier in this chapter. The weighted sum of the inputs is computed, optionally with a bias term added, as shown by the input node equal to 1 in the following diagram. Further functions may be applied to the output of this matrix multiplication, such as activation functions:

Figure 2.18: A depiction of a layer node
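
The computation at a single node can be written out in a few lines of NumPy. This is a sketch with made-up inputs, weights, and bias; the sigmoid is used here only as one example of an activation function:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])      # inputs to the node (illustrative)
w = np.array([0.5, -1.0, 0.5])     # one weight per input (illustrative)
b = 0.25                           # bias, equivalent to a weight on a constant input of 1

z = np.dot(x, w) + b               # weighted sum of the inputs plus the bias
output = 1.0 / (1.0 + np.exp(-z))  # sigmoid activation applied to the sum

print(z, output)
```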

Some common layer types in Keras are as follows:

Dense: This is a fully connected layer in which all the nodes of the layer are directly connected to all the inputs and all the outputs. ANNs for classification or regression tasks on tabular data usually have a large percentage of their layers with this type in the architecture.
Convolutional: This layer type creates a convolutional kernel that is convolved with the input layer to produce a tensor of outputs. This convolution can occur in one or multiple dimensions. ANNs for the classification of images usually feature one or more convolutional layers in their architecture.
Pooling: This type of layer is used to reduce the dimensionality of an input layer. Common types of pooling include max pooling, in which the maximum value of a given window is passed through to the output, or average pooling, in which the average value of a window is passed through. These layers are often used in conjunction with a convolutional layer, and their purpose is to reduce the dimensions of the subsequent layers, allowing for fewer training parameters to be learned with little information loss.
Recurrent: Recurrent layers learn patterns from sequences, so each output is dependent on the results from the previous step. ANNs that model sequential data such as natural language or time-series data often feature one or more recurrent layer types.

There are other layer types in Keras; however, these are the most common types when it comes to building models using Keras.
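
To make the pooling description concrete, here is a small NumPy sketch of max and average pooling over non-overlapping 2x2 windows. It is not how Keras implements pooling; the reshape is just a compact way to group the windows, and the input `x` is an arbitrary example:

```python
import numpy as np

x = np.arange(16).reshape(4, 4)   # 4x4 input with values 0..15

# Split rows and columns into 2x2 windows. The axes are
# (row block, row within block, column block, column within block)
windows = x.reshape(2, 2, 2, 2)

max_pooled = windows.max(axis=(1, 3))    # largest value in each window
avg_pooled = windows.mean(axis=(1, 3))   # mean value in each window

print(max_pooled)
print(avg_pooled)
```

The (4x4) input is reduced to a (2x2) output in both cases, which is the dimensionality reduction that pooling layers provide.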

Let's demonstrate how to add layers to a model by instantiating a model of the Sequential class and adding a Dense layer to the model. Successive layers can be added to the model in the order in which we wish the computation to be performed and can be imported from keras.layers. The number of units, or nodes, needs to be specified. This value will also determine the shape of the result from the layer. A Dense layer can be added to a Sequential model in the following way:

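```
from keras.layers import Dense
from keras.models import Sequential

input_shape = 20  # number of input features
units = 1         # number of nodes in the layer
model = Sequential()  # instantiate the model before adding layers
model.add(Dense(units, input_dim=input_shape))
```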

Note

After the first layer, the input dimension does not need to be specified since it is determined from the previous layer.
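
How each layer's input size follows from the previous layer can be sketched without Keras. The helper `dense_shapes` below is hypothetical; it assumes that every layer is fully connected with one bias per unit, and the layer sizes are chosen only for illustration:

```python
def dense_shapes(input_dim, units_per_layer):
    """Return (weight shape, parameter count) for a stack of fully connected layers."""
    shapes = []
    current = input_dim
    for units in units_per_layer:
        weight_shape = (current, units)
        n_params = current * units + units   # weights plus one bias per unit
        shapes.append((weight_shape, n_params))
        current = units                      # next layer's input is this layer's output
    return shapes

# 20 input features, a hidden layer of 8 units, then a single output unit
for weight_shape, n_params in dense_shapes(20, [8, 1]):
    print(weight_shape, n_params)
```

Only the first layer needs the input dimension; every later weight shape is fixed by the number of units in the layer before it.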

Activation Functions

An activation function is generally applied to the output of a node to limit or bound its value. The value from each node is unbounded and may have any value, from negative to positive infinity. This can be troublesome within neural networks, where the values of the weights and losses can head toward infinity and produce unusable results. Activation functions can help in this regard by bounding the value. Often, these activation functions push the value toward one of two limits. Activation functions are also useful for deciding whether the node should be "fired" or not. Common activation functions are as follows:

The Step function: The value is nonzero if it is above a certain threshold; otherwise, it is zero.
The Linear function: $f(x) = ax$, which is a scalar multiplication of the input value.
The Sigmoid function: $f(x) = \frac{1}{1 + e^{-x}}$, which resembles a smoothed-out step function with smooth gradients. This activation function is useful for classification since the values are bound from zero to one.
The Tanh function: $f(x) = \tanh(x) = \frac{2}{1 + e^{-2x}} - 1$, which is a scaled version of the sigmoid with steeper gradients around x=0.
The ReLU function: $f(x) = x$ if $x > 0$, otherwise 0.
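
The common activation functions listed above can be sketched in NumPy as follows. These are the standard textbook definitions rather than the Keras implementations; the step threshold of 0, the step output of 1, and the slope `a` of the linear function are assumptions chosen for illustration:

```python
import numpy as np

def step(x, threshold=0.0):
    return np.where(x > threshold, 1.0, 0.0)

def linear(x, a=1.0):
    return a * x

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return 2.0 * sigmoid(2.0 * x) - 1.0   # scaled and shifted sigmoid

def relu(x):
    return np.maximum(x, 0.0)

x = np.array([-2.0, 0.0, 2.0])
for fn in (step, linear, sigmoid, tanh, relu):
    print(fn.__name__, fn(x))
```

Writing tanh in terms of the sigmoid makes the "scaled version of the sigmoid" relationship explicit; it agrees with NumPy's built-in np.tanh.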

Now that we have looked at some of the main components, we can begin to see how we might create useful neural networks out of these components. In fact, we can create a logistic regression model with all the concepts we have learned about in this chapter. A logistic regression model operates by taking the sum of the product of an input and a set of learned weights, followed by the output being passed through a logistic function. This can be achieved with a single-layer neural network with a sigmoid activation function.

Activation functions can be added to models in the same manner that layers are added to models. The activation function will be applied to the output of the previous step in the model. A tanh activation function can be added to a Sequential model as follows:

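```
from keras.layers import Dense, Activation
from keras.models import Sequential

input_shape = 20  # number of input features
units = 1         # number of nodes in the layer
model = Sequential()  # instantiate the model before adding layers
model.add(Dense(units, input_dim=input_shape))
model.add(Activation('tanh'))  # applied to the output of the Dense layer
```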

Note

Activation functions can also be added to a model by including them as an argument when defining the layers.

Model Fitting

Once a model's architecture has been created, the model must be compiled. The compilation process configures all the learning parameters, including which optimizer to use, the loss function to minimize, as well as optional metrics, such as accuracy, to calculate at various stages of the model training. Models are compiled using the compile method, as follows:

```
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
```
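
To give a feel for what the loss and metric named in the compile call measure, the following sketch computes binary cross-entropy and accuracy by hand in NumPy. These are the standard definitions and may differ in detail from Keras' own implementation; the clipping constant `eps`, the 0.5 threshold, and the example values are assumptions:

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    y_prob = np.clip(y_prob, eps, 1.0 - eps)   # avoid taking log(0)
    return -np.mean(y_true * np.log(y_prob)
                    + (1.0 - y_true) * np.log(1.0 - y_prob))

def accuracy(y_true, y_prob, threshold=0.5):
    # A prediction counts as correct when the thresholded probability matches the label
    return np.mean((y_prob > threshold) == (y_true == 1))

y_true = np.array([1, 0, 1, 1])
y_prob = np.array([0.9, 0.1, 0.8, 0.4])

print(binary_crossentropy(y_true, y_prob))
print(accuracy(y_true, y_prob))
```

The loss penalizes confident wrong predictions heavily, whereas accuracy only counts whether each thresholded prediction matches its label.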

After the model has been compiled, it is ready to be fit to the training data. This is achieved with an instantiated model using the fit method. Useful arguments when using the fit method are as follows:

x: The array of the training feature data to fit the model to.
y: The array of the training target data.
epochs: The number of epochs to run the model for. An epoch is an iteration over the entire training dataset.
batch_size: The number of training data samples to use per gradient update.
validation_split: The proportion of the training data to be used for validation that is evaluated after each epoch.
shuffle: Indicates whether to shuffle the training data before each epoch.
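
How these arguments interact can be sketched with some simple arithmetic. The dataset size below is hypothetical, and the calculation assumes that the validation samples are held out before the remaining samples are divided into batches:

```python
import math

n_samples = 1000          # hypothetical size of the training data passed to fit
validation_split = 0.2
batch_size = 32
epochs = 10

# Assumption: the validation samples are held out first, the rest are used for training
n_val = int(n_samples * validation_split)
n_train = n_samples - n_val

updates_per_epoch = math.ceil(n_train / batch_size)  # the last batch may be smaller
total_updates = updates_per_epoch * epochs

print(n_train, n_val, updates_per_epoch, total_updates)
```

A smaller batch size therefore means more gradient updates per epoch, while more epochs means more passes over the same training samples.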

The fit method can be used on a model in the following way:

```
history = model.fit(x=X_train, y=y_train['y'],
                    epochs=10, batch_size=32,
                    validation_split=0.2, shuffle=False)
```

It is beneficial to save the output of calling the fit method of the model since it contains information on the model's performance throughout training, including the loss, which is evaluated after each epoch. If a validation split is defined, the loss is evaluated after each epoch on the validation split. Likewise, if any metrics are defined in training, they are also calculated after each epoch. It is useful to plot such loss and evaluation metrics to determine model performance as a function of the epoch. The model's loss as a function of the epoch can be visualized as follows:

```
import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(history.history['loss'])
plt.show()
```

Keras models can be evaluated by utilizing the evaluate method of the model instance. This method returns the loss and any metrics that were passed to the model for training. The method can be called as follows when evaluating an out-of-sample test dataset:

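```
# With metrics=['accuracy'] set at compile time, evaluate returns [loss, accuracy]
test_loss, test_acc = model.evaluate(X_test, y_test['y'])
```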

These model-fitting steps represent the basic steps that need to be followed to build, train, and evaluate models using the Keras package. From here, there are an infinite number of ways to build and evaluate a model, depending on the task you wish to accomplish. In the following activity, we will create an ANN to perform the same task that we completed in Chapter 1, Introduction to Machine Learning with Keras. In fact, we will recreate the logistic regression algorithm with ANNs. As such, we expect there to be similar performance between the two models.
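
To make the link between logistic regression and a single-node network with a sigmoid activation concrete, the following NumPy sketch trains such a node by hand. It is only an illustration under stated assumptions: the data is synthetic, and plain full-batch gradient descent is used in place of the adam optimizer, so it is not what Keras does internally:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, linearly separable data (illustrative only)
X = rng.normal(size=(200, 2))
y = (X.dot(np.array([2.0, -3.0])) + 0.5 > 0).astype(float)

w = np.zeros(2)   # weights of the single node
b = 0.0           # bias of the single node
learning_rate = 0.5
losses = []

for epoch in range(200):
    p = 1.0 / (1.0 + np.exp(-(X.dot(w) + b)))      # sigmoid of the weighted sum
    p_safe = np.clip(p, 1e-7, 1.0 - 1e-7)          # avoid taking log(0)
    losses.append(-np.mean(y * np.log(p_safe) + (1 - y) * np.log(1 - p_safe)))
    w -= learning_rate * X.T.dot(p - y) / len(y)   # gradient of the loss w.r.t. w
    b -= learning_rate * np.mean(p - y)            # gradient of the loss w.r.t. b

predictions = 1.0 / (1.0 + np.exp(-(X.dot(w) + b))) > 0.5
train_accuracy = np.mean(predictions == (y == 1))
print(losses[0], losses[-1], train_accuracy)
```

The loop plays the role of the fit method, the recorded `losses` correspond to the loss history, and the final comparison against a threshold of 0.5 mirrors the accuracy metric.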

Activity 2.01: Creating a Logistic Regression Model Using Keras

In this activity, we are going to create a basic model using the Keras library. We will perform the same classification task that we did in Chapter 1, Introduction to Machine Learning with Keras. We will use the same online shoppers purchasing intention dataset and attempt to predict the same variable.

In the previous chapter, we used a logistic regression model to predict whether a user would purchase a product from a website when given various attributes about the online session's behavior and the attributes of the web page. In this activity, we will introduce the Keras library, though we'll continue to utilize the libraries we introduced previously, such as pandas, for easily loading in the data, and sklearn, for any data preprocessing and model evaluation metrics.

Note

Preprocessed datasets have been provided for you to use for this activity. You can download them from https://packt.live/2ApIBwT.

The steps to complete this activity are as follows:

Load in the processed feature and target datasets.
Split the feature and target data into training and test datasets. The model will be fit to the training dataset and the test dataset will be used to evaluate the model.
Instantiate a model of the Sequential class from the keras.models library.
Add a single layer of the Dense class from the keras.layers package to the model instance. Because we are recreating logistic regression, the layer should have a single node, and its input dimension should be equal to the number of features in the feature dataset.
Add a sigmoid activation function to the model.
Compile the model instance by specifying the optimizer to use, the loss metric to evaluate, and any other metrics to evaluate after each epoch.
Fit the model to the training data, specifying the number of epochs to run for and the validation split to use.
Plot the loss and other evaluation metrics, evaluated on the training and validation datasets, as a function of the epoch.
Evaluate the loss and other evaluation metrics on the test dataset.

After implementing these steps, you should get the following expected output:

```
2466/2466 [==============================] - 0s 15us/step
The loss on the test set is 0.3632 and the accuracy is 86.902%
```

Note

The solution for this activity can be found via this link.

In this activity, we looked at some of the fundamental concepts of creating ANNs in Keras, including various layer types and activation functions. We used these components to create a simple logistic regression model that gives us similar results to the logistic regression model we used in Chapter 1, Introduction to Machine Learning with Keras. We learned how to build the model with the Keras library, train the model with a real-world dataset, and evaluate the performance of the model on a test dataset to provide an unbiased evaluation of the performance of the model.

