Authors: Francesco Pugliese & Matteo Testi
During the last years, a buzz word arose in the field of Artificial Intelligence: “Deep Learning”. Recently, there is a great interest in this kind of research, especially amongst business companies which are truly performing a “scavenger hunt” in order to find experts from Machine Learning and Deep Learning areas. These roles are increasingly being associated with the figure of the Data Scientist.
We can simply have a look at the development of “Deep Learning” by Google Trends, in time-slot ranging from 2011 to 2017.
Since the time DeepMind’s “Alpha-Go” software defeated South Korean Master Lee Se-dol in the board game Go earlier last year, the term “Artificial Intelligence” (AI) has become exponentially popular. The way the Deep Learning Engine within Alpha-Go worked consisted in combining the traditional Tree Navigation algorithm called “Monte-Carlo Tree Search” (MTS) with very deep “Convolutional Neural Networks” (CNN). Until then, MTS was a de-facto-standard for building record-breaking programs able to play Go game. However, the value-functions was still based on heuristic hand-crafted heuristic methods. The novelty introduced by DeepMind was the value-function inferred by a CNN trained on a Supervised Training Set before of million moves. Successively, a Deterministic Policy Gradients System based on an Actor-Critic Model was addressed to play against different versions of him-self for a long time. The result was a still unbeaten artificial GO player. In the popular Nature article named “Mastering the game of Go with deep neural networks and tree search,” DeepMind carefully describe all the Alpha-Go system.
The following diagram explains the difference in the Artificial Intelligence, Machine Learning and Deep Learning.
Machine Learning (ML) is essentially a form of applied statistics meant to use computers to statistically estimate complex function. Mitchell in 1997 provides the following definition of Machine Learning: “A computer is said to learn from experience E concerning some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” Basically, ML is a set of techniques able to “learn” from data, and then make a decision or prediction about them. A Machine Learning system can be applied to “knowledge” from multiple sources to solve diverse tasks: facial classification, speech recognition, object recognition, etc. Unlike hard-coded algorithms, namely algorithms with specific instructions to solve a problem, Machine Learning enables a computer to learn how to recognize patterns on its own and make predictions.
The Machine Learning can be adapted for three different types of tasks:
One of the most popular applications of machine learning has been Computer Vision, for many years. Most machine learning algorithms can be divided into the categories of supervised learning and unsupervised learning according to the training set comes supervised (with trainer’s external information or labels) or not supervised.
In Supervised Learning, labels are made by human to enable the machine to find out the relationship between input and labels
In Unsupervised learning, there are no labels available. In this situation, we are asking the computer to find clusters within the data.
Typical Machine learning algorithms are Random Forest, Linear / Logistic Regression, Decision Tree, Support Vector Machines, PCA, K means, ICA, Naive Bayes etc.
The Deep Learning is a subarea of the Machine Learning that makes use of Deep Neural Networks (with many layers) and specific novel algorithms for the pre-processing of data and regularization of the model: word embeddings, dropout, data-augmentation. Deep Learning takes inspiration from Neuroscience since Neural Networks are a model of the neuronal network within the brain. Unlike the biological brain, where any neuron can connect to any other under some physical constraints, Artificial Neural Networks (ANNs) have a finite number of layers, connections, and fixed direction of the data propagation. So far, ANNs have been ignored by the research and business community. The problem is their computational cost.
Between 2006 and 2012 the research group led by Geoffrey Hinton at the University of Toronto finally parallelized the ANNs algorithms to parallel architectures. The main breakthroughs are the increased number of layers, neurons, and model parameters in general (even over than 10 million) allowing to compute massive amounts of data through the system to train it.
Therefore, the first requirement for training a Deep learning Model is having available a massive train-set. This makes Deep Learning a good fit for the Big Data age.
The reasons behind the popularity of Deep Learning are Big Data and Graphic Processing Unit (GPUs). Using a massive amount of data the algorithm and network learn how to accomplish goals and improve upon the process.
A deep learning algorithm could be instructed to “learn” what a dog looks like. It would take a massive data set of dog’s images to understand “features” that identify a dog and distinguish it from a wolf. We should keep in mind that, the Deep learning is also highly susceptible to bias. For example, in a supervised model, if the labels are made wrong, the model is going to learn from the mistakes.
When Google’s facial recognition system was initially rolled out, for instance, it tagged many black faces as gorillas.
“That’s an example of what happens if you have no African American faces in your training set”
Said Anu Tewary, chief data officer for Mint at Intuit.
Deep Learning for Business
Deep learning affected business applications as never happened in Machine Learning before. It can deal with a huge amount of data—millions of images, for example—and recognise certain discriminative characteristics. Text-based searches, fraud detection, spam detection, handwriting recognition, image search, speech recognition, Street View detection, recommendation systems and translation are only some of the tasks that can be tackled by Deep Learning. At Google, Deep Networks have already replaced tens of “handcrafted rule-based systems”. Today, Deep Learning for Computer Vision already displays super-human capabilities, and that ranges from recognising common pictures like dog & cats to identifying cancer nodules in lung ct-scans.
Machine Learning theory claims that an ML algorithm can generalise very well from a finite training set of examples. This contradicts the basic of logic: inferring general rules from a limited set of examples is not logically valid. In other words, to infer a rule describing every member of a set, we should have information about every member of that set. In part, ML addresses this problem with probabilistic rules rather than certain rules of logical reasoning. Unfortunately, this does not resolve the problem. According to the “No Free Lunch Theorem” (David Wolpert and William Macready, 1997): averaged over all possible data generating distributions, every classification algorithm shows the same error rate on unobserved data (test set). This means that the best ML algorithm cannot exist: so our goal must be to understand what kinds of distributions are relevant for the “real world” and what kind of ML algorithms perform well on the data drawn from distributions we focus on. In other words, Deep Learning is not universally better than Machine Learning, but it depends on the task domain. However, it seems that, in the future, Deep Learning will be likely solving many of our everyday computer, business, AI, marketing, etc.
As Andrew Yan-Tak Ng, former chief scientist at Baidu, where he led the company’s Artificial Intelligence Team says,
“AI is the new electricity”.
We add: Deep Learning is the new light bulb.
We will deepen Deep Learning in the next tutorials, stay tuned…
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., … & Dieleman, S. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484-489.