Author: Davide Coppola
Although sometimes overlooked, math is a fundamental part of machine learning (ML) and deep learning (DL). Indeed, it is the basis on which both disciplines stand: without notions of algebra or calculus they could not exist. A key factor in ML, coming from calculus, is the notion of derivative. But you should not be scared by this concept; it is much easier than you may think!
First of all, let us define a function. It can be thought of as a black box (Fig. 1): a number n of input values, or independent variables, enter the box; they are processed in a specific way determined by the equation(s) describing the function; and finally m new output values, or dependent variables, exit the box.
Fig. 1: Any function can be seen as a black box: independent variables enter the box and come out as new values.
For the rest of this tutorial, we will focus on unidimensional functions, i.e. functions that have only one input and one output. Common examples of this kind of function are:
$$y = mx + q \tag{1}$$
$$y = ax^2 + bx + c \tag{2}$$
$$y = \ln(x) \tag{3}$$
Here m, q, a, b and c are just numerical coefficients; think of them as any fixed number. Equation 1 is the equation of a straight line, 2 describes a parabola and 3 is the natural logarithm function. As you can see, they all have an independent variable (x) and a dependent variable (y): a function describes the relation between the two variables and thus determines the "shape" of its curve in the plane.
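To make the black-box picture concrete, here is a minimal Python sketch of the three example functions. The coefficient values are made up purely for illustration; any fixed numbers would do.

```python
import math

# Hypothetical coefficients, chosen only for illustration.
m, q = 2.0, 1.0
a, b, c = 1.0, -3.0, 2.0

def line(x):
    """Straight line: y = m*x + q."""
    return m * x + q

def parabola(x):
    """Parabola: y = a*x^2 + b*x + c."""
    return a * x**2 + b * x + c

def natural_log(x):
    """Natural logarithm: y = ln(x), defined only for x > 0."""
    return math.log(x)

# One input goes in, one output comes out.
for x in (1.0, 2.0, 3.0):
    print(x, line(x), parabola(x), natural_log(x))
```

Each function takes a single number and returns a single number, which is exactly what "unidimensional" means here.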
If a function already describes our curve, then why do we need the derivative?
Generally speaking, functions are not as straightforward as the examples given above, and it might be impossible or impractical to try out all the possible values of the independent variable to understand their behavior. The derivative of a function gives additional information on the curve we are studying.
What is a derivative then? A derivative of a function f(x) is another function f'(x), deriving from the original, that describes the variability of f(x), i.e. the rate at which the function changes with respect to the independent variable. The derivative, evaluated at a point x0, describes how the function is changing in a neighborhood of x0. For example, if the derivative is positive, we can expect the points just after x0 to have higher values of f(x). This means that the function is growing as x increases. Likewise, if the derivative at x0 is negative, the value of the function decreases as x increases. Thus, the derivative at a given point indicates the slope of the line tangent to the curve at that point.
The slope is the ratio between the height and the horizontal length of, for example, an inclined plane or a right triangle. You have surely encountered this concept on road signs (Fig. 3). In general, the slope is given by the equation
$$m = \frac{\Delta y}{\Delta x}$$
where Δy is the height and Δx is the horizontal length.
Fig. 3: Road sign indicating the slope of the road.
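As a small illustration of the ratio between height and horizontal length, the sketch below computes the slope between two points. The function name and the road numbers are hypothetical, picked only to mimic the kind of value shown on a road sign.

```python
def slope(x1, y1, x2, y2):
    """Ratio of the height (rise) to the horizontal length (run)."""
    return (y2 - y1) / (x2 - x1)

# A made-up road that climbs 10 m over 100 m of horizontal distance.
grade = slope(0.0, 0.0, 100.0, 10.0)
print(f"{grade:.0%}")  # formatted as a percentage, as on a road sign
```

A negative result would simply mean that the road goes downhill as you move forward.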
The rigorous definition of a derivative is, in fact, the limit of the incremental ratio:
$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$
This ratio describes the slope of a secant to the curve passing through the points (x, f(x)) and (x+h, f(x+h)). In fact, the numerator can be seen as the height of an inclined plane whose horizontal length is h. The limit tells us that h should be a number infinitely close to zero, meaning that the distance between the two points becomes practically non-existent. As a result, what was once a secant becomes a tangent to the curve, as can be seen in the animation in Fig. 4.
Fig. 4: As the distance between the two points becomes zero, the points overlap and the secant line becomes a tangent to the curve.
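The same idea can be tried numerically. The sketch below is only an illustration: it assumes a hypothetical function f(x) = x², calls the horizontal distance between the two points h, and computes the slope of the secant for smaller and smaller values of h.

```python
def secant_slope(f, x, h):
    """Slope of the secant through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h

def square(x):
    """Hypothetical example function, f(x) = x^2."""
    return x**2

x0 = 1.0
for h in (1.0, 0.1, 0.01, 0.001):
    # As h shrinks, the secant slope approaches the tangent slope at x0.
    print(h, secant_slope(square, x0, h))
```

For this function the printed slopes get closer and closer to 2, which is the slope of the tangent at x0 = 1.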
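Bear in mind that there are particular cases where the derivative cannot be defined at one or more points of a function. For the derivative to exist at a point, the function has to be continuous at that point, although continuity alone is not sufficient. The absolute value function, for instance, is continuous at x=0 but has no derivative there, because its graph has a sharp corner.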
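A quick way to see why continuity is not enough is to look at a function with a sharp corner. The sketch below uses the absolute value function as an assumed example and compares the incremental ratio computed just to the right and just to the left of x=0.

```python
def difference_quotient(f, x, h):
    """Incremental ratio of f between x and x + h."""
    return (f(x + h) - f(x)) / h

# abs() is continuous at 0, but its graph has a corner there.
right = difference_quotient(abs, 0.0, 1e-6)   # approaching from the right
left = difference_quotient(abs, 0.0, -1e-6)   # approaching from the left
print(right, left)
```

The two slopes disagree no matter how small the step is, so no single tangent line exists at that point.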
Before looking at a simple example, let us review the key concepts; the derivative…
- … represents the variability of the original function with respect to the independent variable;
- … of a function is a function;
- … evaluated at any given point, represents the slope of the tangent to the curve at that point.
Fig. 5: A parabola and its derivative. The green and the blue straight lines are the tangents to the curve at the points x=-2 and x=+2, respectively.
In the example (Fig. 5) we have the graphs of a function (f) and its derivative (f’): the former is a parabola, whereas the latter is a straight line. Functions and their derivatives are usually represented with their graphs one above the other; this is because the independent variable is the same and this arrangement makes it easier to understand their relation.
Looking at x<0, you can see that the derivative is positive, which means that the original function is growing with x, i.e. the slope of any tangent line to f for x<0 is positive. However, the value of the derivative is decreasing at a constant rate, meaning that f grows more and more slowly. Consequently, the tangent lines get closer and closer to a horizontal line.
This limiting situation is reached at x=0, which corresponds to the vertex of the parabola and to the point where the derivative is 0. Points where the derivative is equal to 0 are called critical points or stationary points. They play a crucial role in calculus, and in machine learning as well, because they include the maxima, the minima and the saddle points of a function. Many machine learning algorithms revolve around the search for the minima of a function, which is why it is important to have at least a basic understanding of derivatives and their meaning.
For x>0, the derivative is negative and its absolute value keeps growing. This means that the original function decreases in value as x increases and that its rate of decrease grows with each step. This is exactly what happens to the parabola.
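The behavior described above can be checked with a few lines of Python. The exact function plotted in Fig. 5 is not stated, so the sketch assumes a hypothetical downward-opening parabola f(x) = -x², whose derivative is the straight line f'(x) = -2x.

```python
def f(x):
    """Hypothetical parabola with its vertex at x = 0."""
    return -x**2

def f_prime(x):
    """Derivative of f: a straight line through the origin."""
    return -2 * x

def trend(x):
    """Describe what f is doing at x, based on the sign of f'(x)."""
    d = f_prime(x)
    if d > 0:
        return "increasing"
    if d < 0:
        return "decreasing"
    return "stationary"

for x in (-2, -1, 0, 1, 2):
    print(f"x={x:+d}  f(x)={f(x):+d}  f'(x)={f_prime(x):+d}  {trend(x)}")
```

The sign of f' flips from positive to negative at x=0, the stationary point, which for this parabola is a maximum.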
The aim of this intuition tutorial was to give you a general understanding of how a derivative works and what it means without using too many equations. Of course, a more in-depth and rigorous analysis of this topic is necessary if you want to fully understand more complex matters that arise in machine learning. But don’t be scared, it is not that complicated after all!
Fig. 2, 3 and 4 are taken from Wikipedia.