How Does Deep Learning Work?


Most deep learning methods use neural network architectures, so deep learning models are often called deep neural networks.

The term "deep" usually refers to the number of hidden layers in the neural network. Traditional neural networks contain only two or three hidden layers, while deep networks can have as many as 150.

Deep learning models are trained using large labeled datasets and neural network architectures that learn features directly from data without the need for manual feature extraction.

One of the most popular types of deep neural networks is the convolutional neural network (CNN or ConvNet). A CNN convolves learned features with input data, and its use of 2D convolutional layers makes the architecture well suited to processing 2D data such as images.

CNNs eliminate the need for manual feature extraction, so you do not need to define the features used to classify images. The CNN extracts features directly from images. The relevant features are not designed in advance; they are learned while the network trains on a collection of images. This automatic feature extraction makes deep learning models highly accurate for computer vision tasks such as object classification.
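The convolution operation these layers perform can be sketched in a few lines of NumPy. The 3×3 vertical-edge kernel below is hand-picked for illustration; in a real CNN, the kernel values are exactly what the network learns during training.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation (the 'convolution' used in CNNs)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A tiny image with a vertical edge: dark left half, bright right half.
image = np.zeros((5, 5))
image[:, 2:] = 1.0

# Hand-picked vertical-edge kernel; a CNN would learn such values itself.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

response = conv2d(image, kernel)
print(response)  # strong response where the window covers the edge
```

The output is largest wherever the kernel's window straddles the dark-to-bright transition, which is how a convolutional layer localizes a feature in the image.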

CNNs learn to detect different properties of an image using tens or hundreds of hidden layers. Each hidden layer increases the complexity of the learned image features. For example, the first hidden layer might learn to detect edges, while the last layers learn to detect more complex shapes tailored to the object we are trying to recognize.

What is the Difference Between Machine Learning and Deep Learning?

Deep learning is a specialized form of machine learning. A machine learning workflow begins with the manual extraction of relevant features from images. These features are then used to build a model that categorizes the objects in the image. In a deep learning workflow, relevant features are extracted from the images automatically. In addition, deep learning performs "end-to-end learning": the network is given raw data and a task to perform, such as classification, and it learns how to do this automatically.
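As a toy illustration of the two workflows (all data here is synthetic, and the "deep" model is a single logistic layer standing in for a full network), compare a hand-crafted feature plus a simple classifier against a model that learns its own weighting of the raw pixels:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: flattened 4x4 "images", class 1 = bright, class 0 = dark.
X = np.concatenate([rng.uniform(0.6, 1.0, (50, 16)),
                    rng.uniform(0.0, 0.4, (50, 16))])
y = np.concatenate([np.ones(50), np.zeros(50)])

# Machine learning workflow: hand-crafted feature + simple classifier.
mean_brightness = X.mean(axis=1)            # manually chosen feature
ml_pred = (mean_brightness > 0.5).astype(float)

# End-to-end workflow (minimal sketch): raw pixels in, labels out;
# the model learns its own weighting of the pixels by gradient descent.
w, b = np.zeros(16), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid output
    grad = p - y                            # cross-entropy gradient
    w -= 0.1 * (X.T @ grad) / len(y)
    b -= 0.1 * grad.mean()
dl_pred = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(float)

print((ml_pred == y).mean(), (dl_pred == y).mean())
```

Both approaches solve this trivial task; the difference is that in the second, no one had to decide that "mean brightness" was the right feature.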

Another important difference is that deep learning algorithms scale with data, whereas shallow learning converges. Shallow learning refers to machine learning methods that plateau at a certain level of performance as you add more examples and training data to the network.

An important advantage of deep learning networks is that as your data grows, they often continue to improve.

Choosing Between Machine Learning and Deep Learning

Machine learning offers a variety of techniques and models to choose from, based on your application, the size of the data you process, and the type of problem you want to solve. A successful deep learning application requires a huge amount of data (thousands of images) to train the model, as well as a GPU (graphics processing unit) to process that data quickly.

When choosing between machine learning and deep learning, consider whether you have a high-performance GPU and a lot of labeled data. If you lack either of these, it may make more sense to use machine learning instead of deep learning. Deep learning is generally more complex, so you will need at least a few thousand images to get reliable results. Having a high-performance GPU means the model will take less time to analyze all those images.

The three most common ways people use deep learning to perform object classification are:

Training from scratch

To train a deep network from scratch, you collect a very large labeled dataset and design a network architecture that learns the features and the model. This is good for new applications, or for applications with a large number of output categories. It is a less common approach because, with the large amount of data involved and the slow rate of learning, these networks typically take days or weeks to train.
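A minimal sketch of training from scratch, assuming a synthetic dataset and a tiny two-layer NumPy network: every weight starts from random initialization, and all features are learned directly from the raw inputs.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic labeled dataset: 2D points, class 1 if inside the unit circle.
X = rng.uniform(-2, 2, (200, 2))
y = (np.sum(X ** 2, axis=1) < 1.0).astype(float)

# Everything starts random -- no pre-trained weights anywhere.
W1 = rng.normal(0, 0.5, (2, 32)); b1 = np.zeros(32)
W2 = rng.normal(0, 0.5, (32, 1)); b2 = np.zeros(1)

def forward(X):
    h = np.tanh(X @ W1 + b1)                    # hidden layer
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))    # output probability
    return h, p.ravel()

lr = 1.0
for _ in range(5000):
    h, p = forward(X)
    dz2 = (p - y)[:, None] / len(y)             # cross-entropy gradient
    dh = (dz2 @ W2.T) * (1 - h ** 2)            # backprop through tanh
    W2 -= lr * h.T @ dz2; b2 -= lr * dz2.sum(axis=0)
    W1 -= lr * X.T @ dh;  b1 -= lr * dh.sum(axis=0)

_, p = forward(X)
accuracy = ((p > 0.5) == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

Real from-scratch training follows the same loop, just with millions of weights and images instead of a few dozen, which is why it takes days or weeks rather than seconds.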

Transfer Learning

Most deep learning applications use the transfer learning approach, a process that involves fine-tuning a pre-trained model. You start with an existing network, such as AlexNet or GoogLeNet, and feed it new data containing previously unseen classes. After making some adjustments to the network, you can perform a new task, such as categorizing only dogs or cats instead of 1,000 different objects. This also has the advantage of needing much less data (processing thousands of images instead of millions), so computation time drops to minutes or hours.

Transfer learning requires an interface into the pre-existing network, so it can be surgically modified and enhanced for the new task. MATLAB® has tools and functions designed to help you with transfer learning.
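The mechanics can be sketched in NumPy. The frozen "pre-trained" layer below is just a fixed random projection standing in for a real network such as AlexNet (an assumption made purely for illustration); only the replacement output layer is trained on the new classes.

```python
import numpy as np

rng = np.random.default_rng(7)

# Stand-in for a pre-trained network: a frozen feature extractor.
# (In practice this would be e.g. AlexNet's convolutional layers.)
W_pre = rng.normal(0, 1.0, (16, 32))    # frozen: never updated below

def extract_features(X):
    return np.tanh(X @ W_pre)           # pre-trained layers, frozen

# New task with two previously unseen classes.
X_new = np.concatenate([rng.normal(1.0, 0.5, (40, 16)),
                        rng.normal(-1.0, 0.5, (40, 16))])
y_new = np.concatenate([np.ones(40), np.zeros(40)])
F = extract_features(X_new)

# Fine-tune: train only the replacement output layer.
w, b = np.zeros(32), 0.0
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))
    grad = p - y_new
    w -= 0.2 * (F.T @ grad) / len(y_new)
    b -= 0.2 * grad.mean()

accuracy = ((1.0 / (1.0 + np.exp(-(F @ w + b))) > 0.5) == y_new).mean()
print(f"accuracy on the new classes: {accuracy:.2f}")
```

Because only the small output layer is trained, far fewer examples and far less computation are needed than when training every layer from scratch.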

Feature extraction

A slightly less common, more specialized approach to deep learning is to use the network as a feature extractor. Since all the layers are tasked with learning specific features from images, we can pull these features out of the network at any time during training. These features can then be used as input to a machine learning model such as a support vector machine (SVM).
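A sketch of the idea, with a fixed random hidden layer standing in for the trained network and a nearest-centroid classifier as a dependency-free stand-in for the SVM (in practice you might feed the extracted features to scikit-learn's SVC instead):

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative stand-in for a trained network's hidden layer.
W_hidden = rng.normal(0, 1.0, (16, 24))

def network_features(X):
    """Activations pulled from inside the network, used as features."""
    return np.tanh(X @ W_hidden)

# Two classes of raw inputs (synthetic).
X = np.concatenate([rng.normal(0.8, 0.4, (30, 16)),
                    rng.normal(-0.8, 0.4, (30, 16))])
y = np.concatenate([np.ones(30), np.zeros(30)])

F = network_features(X)

# Feed the extracted features to a separate classical model:
# here, classify by distance to each class centroid in feature space.
c1, c0 = F[y == 1].mean(axis=0), F[y == 0].mean(axis=0)
pred = (np.linalg.norm(F - c1, axis=1) <
        np.linalg.norm(F - c0, axis=1)).astype(float)

accuracy = (pred == y).mean()
print(f"accuracy: {accuracy:.2f}")
```

The deep network does the representation work; the downstream model only has to separate well-behaved feature vectors, which is exactly where classical methods like SVMs excel.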

Accelerating Deep Learning Models with GPUs

Training a deep learning model can take anywhere from days to weeks. Using GPU acceleration can significantly speed up the process. Using MATLAB with a GPU reduces the time it takes to train a network, and can cut training time for an image classification problem from days to hours. When training deep learning models, MATLAB uses GPUs (if available) without requiring you to program them explicitly.