Author: Paolo Caressa
In machine learning and deep learning, as in any other interdisciplinary area, concepts and formalisms coming from different sources and fields are used, and often they require different mindsets to be actually understood and properly managed.
For this reason, one says usually that a data scientist should be a hybrid between a computer scientist, a statistician and a mathematician, but few people are actually well-versed in all of those three subjects, and it is unlikely that all those fields will be familiar, easy and intuitive for the same mind.
In particular, among the different mathematical notions which are truly needed to a data scientist, as for anyone seriously interested in understanding how a machine learning system works (or rather why it actually works), we find linear algebra.
This is quite surprising for many mathematicians, which consider linear algebra so easy to be userful only at a first approximation. But nevertheless, as I.M. Herstein wrote, linear algebra is just easy, useful and beautiful!
The aim of this tutorial is to explain, in a simple but precise way, the basic notions of linear algebra: thus, if you reader already know how to compute eigenvalues, or how to perform a singular value decomposition or how to compute a principal component analysis, this tutorial is not for you. Otherwise, enjoy reading!
The Cartesian plane and space
Let us start from the very beginning, and dig into our high school remembrances about Cartesian (or analytic) geometry.
This approach to geometry was devised in the XVIIth century by Descartes and Fermat, and the basic idea is to provide a numerical model both for the plane and space. Essentially, their discovery consists in identifying the points in the plane and in the space with respectively pairs and triples of numbers, so that the plane is one-to-one mapped into the set of pairs (x,y) of numbers where x and y vary independently and in all possible ways.
To begin with, let us recall that the points of the line correspond to the set of real numbers (thus rational and irrational numbers) totally ordered by the relation <.
Indeed, let us fix a point O on the line and let it correspond to 0, next let us fix a point U (different from O) on the line and let it correspond to 1. All other points are determined uniquely by their positions with respect to those two points:
In particular, it is clear what we mean by the distance between two points P and Q on the line: if x and y are the real numbers corresponding to P and Q, the distance PQ is just |x – y| (absolute value of the difference):
Therefore, on a line, once a point corresponding to 0 and a point corresponding to 1 are fixed we get also a direction for positive numbers: indeed O divides the line in twintosides, and positive numbers are the ones on the side of U, and negative ones are on the opposite side.
The line of Euclidean geometry corresponds one-to-one to the set R of real numbers
To each point it corresponds one and only one number, were we may think to a real number as either a decimal number with a finite number of digits after the decimal point, possibly repeating indefinitely, or a decimal number with infinite non periodic digits after the decimal dot (to be precise we should exclude a periodic repetition of the 9 digit).
Next, let us come to the plane.
Fix a point O in the plane and take the line x passing through it, and also a second line y, perpendicular to x and passing through O. On both x and y we choose the same measure unit (thus a point U different from O and state that the distance OU is 1).
Therefore we may measure the distance between two points on the same line x, as we did before.
The correspondence between a point P and a pair (a,b) of numbers is defined as follows:
- We project P on the x line, getting a point P’, and we put a = distance between P’ and O.
- We project P on the y line, getting a point P’’, and we put b = distance between P’’ and O.
In this way we uniquely associate P to the pair (a,b): to come back from coordinates (a,b) to points it suffices to find out points P’ and P’’ on lines x and y which have distances a and b from the origin O, to that
In this way we uniquely associate P to the pair (a,b): to come back from coordinates (a,b) to points it suffices to find out points P’ and P’’ on lines x and y which have distances a and b from the origin O, to that we draw the perpendicular to the axes x and y through P’ and P’’ and get the point P as intersection point of these two perpendiculars
The plane of Euclidean corresponds one-to-one to the set R2 of pairs of real numbers: to each pair, it corresponds one and only one number.
The notation R2 denotes the ordered pairs of elements of R. In general, if X is a set, Xn is the set of n-tuples of elements of X.
Which advantages does it bring to identify the plane with the pairs of numbers (a,b)?
- In the first place, we know what they are, we need not axioms anymore to describe what points, lines, etc. are.
- Next, we transform geometric problems, involving complex constructions, into algebraic problems, which may be solved by computing.
Thus, Cartesian geometry is the ideal geometry for a computer, since we may reduce logical deduction and cumbersome constructions to numbers and computations. It is not by chance that 2D libraries which are used to draw on a computer screen, or on the window of an application, use coordinates to represent points (plus come attributes for colours, etc.).
For example let us ask: how is a curve represented in the Cartesian plane?
There are two ways:
- By means of a Cartesian equation, thus a single algebraic relations between the coordinates od any point belonging to the curve.
- By means of a parametric equation, which describes the algebraic expression of a generic point in the curve, depending on a real parameter.
For example consider the equation x2+y2 = 1 and the equation 2x+y = 0. The former represents a circumference centred in the origin and with radius 1, the latter a line passing through the origin.
Suppose we want to compute their intersections: on using Euclidean geometry we should find out that intersection by means of a geometric construction somehow, while according to the Cartesian geometry approach we are actually asking: which are the points whose coordinates (x,y) simultaneously satisfy the following relations: x2+y2 = 1 e 2x+y = 0?
The latter equation implies y = –2x, which substituted in the former equation provides x2+(–2x)2 = 1, thus x2+4x2 = 1 hence x2 = 1/5, which has two solutions x = 1/√5 and x = –1/√5. Therefore we’ve found the two required intersection points: (1/√5, –2/√5) and (–1/√5, 2/√5).
One could proceed in this way toward higher dimensions: thus, Cartesian space is the set of triples (x,y,z) of real numbers, thus the set R3.
Therefore, a point in the space is uniquely determined by its three Cartesian coordinates, which we may plug into equations to solve geometric problems after we translated them into algebraic ones.
The distance between two points
We have said that along with a line distance are computed by the absolute value of the difference. How are distances in the plane computed?
Interestingly there are different ways to do that, which are all generalizations of the distance on a line: let us the three such distances, which are all particular cases of the so-called Minkowski distances:
- Euclidean distance
- Manhattan (or taxi-cab) distance
- Uniform distance.
Euclidean distance between two points P = (x1,y1) and Q = (x2,y2) in the plane is computed by the following formula:
Notice that, if both P and Q belong to the x-axis, this distance reduces to the absolute difference of coordinates: indeed, on supposing that P = (x1,0) and Q = (x2,0) the previous formula becomes (as usual when we take the square root of a number we consider the positive one).
(as usual when we take the square root of a number we consider the positive one).
Euclidean distance should be actually called “Pythagorean distance”, since it may be inferred after Pythagoras’ theorem: indeed, let us consider a right triangle OPQ with a vertex in the origin, one cathetus on the x-axis and the other one parallel to the y-axis.
Therefore the coordinates of P will be (x,0) and the coordinates of Q will be (x,y): the distance between O and P is x and the distance between Q and P is y, so that, by Pythagoras’ theorem, d(O,Q)2 = d(O,P)2 + d(P,Q)2 = x2 + y2. Hence in this case, on taking the square root:
and the formula is proven. In the general case the reasoning is similar, but coordinates of O (which in our case are 0) have also to be taken into account.
Thus: Euclidean distance is just the distance between the end points of a linear segment, and it has a simple geometric description.
Distance among points in space is measured in the same way, just by adding one more coordinate to points: if P = (x1,y1,z1) and Q = (x2,y2,z2) are points in the Cartesian space then their Euclidean distance is:
For the sake of completeness we explain two more ways to compute distances in the plane, whose outcomes are different from Euclidean distance but which nevertheless are useful to measure when two points are near (a nice theorem which goes beyond the scope of this tutorial claims that as far as the “nearness” relation is concerned, all those distances are equivalent).
The first one is a common distance in Machine Learning, namely the “Manhattan” distance or “taxi-cab distance”:
Notice that, if both P and Q are on the x-line then this distance coincides with the absolute value of their difference. The “taxi-cab” name stems from the fact that this distance does not involve Pythagoras’ theorem, which uses the diagonal shortcut, but starting from P describes the shortest path toward Q set up by horizontal and vertical segments.
This is like the path followed by a taxi to reach the opposite point of a building block: the car cannot pass through the block but has to follow its border.
Another distance which may prove to be useful is the “uniform distance”, defined as
Thus this distance is computed by taking the maximum length of the segment given by the projection of the PQ segment on the x- and y-axes: again, if both points lie on the x-axis this distance equals the absolute value of the difference between point’s coordinates.
Given a distance, a (open) ball is the set of points whose distances are less or equal to a fixed one (its ray) from a fixed point O, the center: thus, if O is the center and r the ray then the ball contains all points such that d(P,O) < r. This terminology is self-apparent in the case of the Euclidean distance: as an exercise, try to figure out which subset of the Cartesian space is balls in the taxi-cab distance.