
Long-term memory neural network 1 – Introduction

Author: Daniele D’armiento

Cognitive skills, such as prediction, reckoning, ability to answer questions and to undertake actions, all involve retrieval of previously stored information.

The current challenges in developing an Artificial Intelligence lie in being able both to store big data in memory and to retrieve it quickly.

But it is no news that computers can store a huge amount of data (today we estimate that all the world's data amounts to far more than 1 zettabyte, i.e. more than 10²¹ bytes), and there is also nothing surprising about the existence of large databases and SQL queries for every need.

Moreover, the human brain's memory is not as stable as silicon memory, but thanks to this very deficiency, our lack of stability and reliability, we can intuitively process big data and retrieve information. In this way we overcome the so-called "curse of dimensionality".

No research has yet unveiled the mysteries of the human brain, but the recent Deep Learning breakthrough brings us closer to a finer description of what intelligence is.

It has produced a model (inspired by biological neural networks) that is able to learn the different signals encoded in images and sounds, to classify them, and to build inner representations, in order to organize large amounts of data and quickly recover information distributed among all the nodes of the network (as opposed to the old style of storing data at precise memory addresses). All of this is completely automatic, with no need for sequential instructions or hand-written algorithms.


In the late 80s, well before Deep Learning arrived, computers were very slow compared to today's, which meant very long processing times. This is obvious but not trivial: nobody would start experiments and simulations that could take such a long time. Yet the key to unveiling that world already existed.

The learning algorithm, today as then, is Backpropagation which, together with Gradient Descent, allows us to find a better approximation of the network weights, reaching a minimum of the error relative to the training data. It requires many update steps and many sample data points to learn from. All this amounts to massive computation and a large investment of time.

This is an example of the error hypersurface in the space of training parameters: the SGD algorithm searches for the parameters that minimize the error.
Source: https://thoughtsahead.com/2017/01/27/machine-learning-series-introduction-to-machine-learning-linear-regression-and-gradient-descent/
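The descent along such an error surface can be sketched in a few lines. The following is a minimal illustration on a linear model with a mean-squared-error loss; the data, learning rate, and step count are illustrative choices, not values from the article.

```python
import numpy as np

# Synthetic training data (illustrative): 100 samples, 3 features,
# generated from known weights plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)

w = np.zeros(3)        # initial weights
lr = 0.1               # learning rate (step size along the surface)
for step in range(500):
    error = X @ w - y                  # prediction error on training data
    grad = 2 * X.T @ error / len(y)    # gradient of the mean squared error
    w -= lr * grad                     # move downhill on the error surface

print(np.round(w, 1))  # approaches true_w as the error reaches a minimum
```

Each iteration is one "update step" of the kind mentioned above; a real network repeats this over millions of parameters, which is where the massive computation comes from.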

For this reason, older models had to be lightweight and could not exploit the big data needed to obtain better performance, so it was the custom of the period to build shallow architectures with fewer parameters and only one or two layers of neurons.

However, a more complex model is necessary to achieve a higher level of abstraction. It is also necessary to store data in complex structures that can preserve a huge number of observed characteristics and, at the same time, generalize, that is, recognize those features in objects never observed before.

So we need to store more data not as simple repetition, but rather as the "Eureka!" light bulb that goes on when we grasp a new unifying scheme describing many sides of the same entity, sides that previously seemed uncorrelated.

Left: a "shallow" neural network, with one hidden layer.  Right: a "deep" neural network, with many hidden layers.  Source: https://www.quora.com/What-is-the-difference-between-deep-and-shallow-neural-networks


If a Deep Learning model is able to recognize objects in images, to understand words spoken by humans, or to answer written questions in a way that makes sense, then the model can grasp meaning without retrieving it from a database. It synthesizes a concept on its own, the same way we do.

This is possible thanks to the "deep" structure, which can store more information than the earlier, pre-2010 models.

GoogLeNet: a neural network used by Google for image recognition.
Source: https://research.googleblog.com/2017/05/using-machine-learning-to-explore.html

Regarding linguistic models for translation, NLP, and NLU, as well as conversational ones, a big step forward was made possible by Deep Learning.

Performance can be measured, but it is also easy to tell by intuition alone when a certain model gives us a good translation, or a decent chatbot assisting with a service.

When it performs well, the cause of this intelligent behaviour is something other than a lookup table or a simple, quick algorithm; it needs a long-term memory to link words and phrases that are distant in time and in semantic space. N-gram statistics will certainly fail to do that.
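Why n-gram statistics fail here can be seen with a toy example. The corpus below is hypothetical; it shows that a bigram model, which conditions only on the single previous word, cannot reach back to a distant fact.

```python
from collections import Counter, defaultdict

# Toy corpus (hypothetical): the answer to the final question depends on
# a sentence far back in the text.
corpus = ("mary went to the kitchen . mary picked up the milk . "
          "where is the milk ?").split()

# Bigram statistics: next-word frequencies given one word of context.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

# Given only "the" as context, the model predicts by raw frequency and
# has no way to use the distant fact that Mary went to the kitchen.
print(bigrams["the"].most_common())
```

A window of n words only pushes the problem back n steps; linking "milk" to "kitchen" requires a memory that spans the whole passage, which is exactly what the models below aim for.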

A neural network with Episodic memory: Dynamic Memory Network (DMN)

Source: https://yerevann.github.io/2016/02/05/implementing-dynamic-memory-networks/


There are many types of neural networks for language models, ranging from Recurrent Neural Networks (RNN) to Convolutional Neural Networks (CNN).

We are going to show a promising new model: the Dynamic Memory Network (DMN).

This model is trained on examples that each consist of several input sentences, a question, and an answer. Its strength lies in the so-called Episodic Memory, which processes the sentences in multiple steps, building a context from which it extracts the required information.
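The multi-pass idea behind the Episodic Memory can be sketched numerically. The vectors below are illustrative toy encodings, not the trained GRU-based modules of the actual DMN: each pass attends to the facts most relevant to the memory built so far, then folds them into the memory before the next pass.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy 2-dimensional encodings of the input facts (illustrative values).
facts = np.array([[1.0, 0.0],    # "Mary went to the kitchen."
                  [0.0, 1.0],    # "John grabbed the football."
                  [1.0, 1.0]])   # "Mary picked up the milk."
question = np.array([1.0, 0.0])  # "Where is the milk?" (toy encoding)

memory = question.copy()
for episode in range(2):             # two passes over the facts
    scores = facts @ memory          # relevance of each fact to current memory
    attention = softmax(scores)
    episode_vec = attention @ facts  # attention-weighted summary of the facts
    memory = memory + episode_vec    # refine the memory for the next pass

print(np.round(memory, 2))
```

The second pass can attend to facts that only became relevant after the first pass, which is how the network chains together sentences that are distant in the input, the transitive reasoning that single-shot attention misses.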