
A Tutorial on Linear Algebra and Geometry (Part 2)

 Author: Paolo Caressa

 

Points, vectors and their algebra

So far we have dealt with points identified with pairs (or triples, for space) of real numbers: as far as Machine Learning is concerned, we are interested in finite sets of points, which represent particular objects to classify or phenomena to correlate as points in a Cartesian space. However, to find regular behaviour or patterns among those points, we are often interested in lines, planes and also more complex geometric figures, such as conics or quadrics.

We have already seen, by an example, that some geometric figures may be represented by single equations or systems of equations. Let us restrict ourselves to lines and planes, to understand in general how to represent them, since those representations are often used in Machine Learning.

 

A line is, intuitively speaking, a set of collinear points: one of Euclid’s axioms states that given two distinct points there is exactly one line passing through them, so that two of its points suffice to identify a line uniquely.

To understand how to do that in the Cartesian plane, let us consider two points P ≠ Q, whose coordinates are P = (x_1, y_1) and Q = (x_2, y_2). Let us define the vector from P to Q as the pair of real numbers

 

\overline{PQ} = (x_2 - x_1, y_2 - y_1)

 

 

Thus a vector is by definition the difference between two points, where by “difference” we mean the coordinate-wise difference.

 

From the IT point of view, we may imagine a point as an array (or list) with two elements (which are numbers). Usually an array A = [x, y] is indexed starting from 0, thus A[0] = x and A[1] = y. Then, if A and B are arrays, the vector \overline{AB} may be represented by the array [B[0] – A[0], B[1] – A[1]].
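For instance, a minimal Python sketch along these lines (the helper vector_from is our own name, chosen just for the illustration):

    # A point is a list of coordinates; the vector from A to B is the
    # coordinate-wise difference of the two lists.
    def vector_from(A, B):
        return [b - a for a, b in zip(A, B)]

    P = [0, 1]
    Q = [2, -1]
    print(vector_from(P, Q))   # [2, -2]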

 

 

Why do we call a difference of points a “vector”? Because it is an object which has direction, sense and magnitude, exactly like the vectors used in Physics.

Its direction is just the line passing through P and Q; its sense is from P toward Q (the opposite vector would be \overline{QP}, which runs from Q toward P: what are its coordinates?).

The magnitude is the distance between the points P and Q. Therefore a vector determines a line, and a sense on it: of course there are infinitely many vectors sharing the same direction (the same line).

 

For example, the line passing through P = (0,1) and Q = (2,-1) is determined by the vector \overline{PQ} = (2,-2), but also, for instance, by the vector (1,-1), which has the same direction but a different magnitude.

In general, for each number a ≠ 0, the vector (-a, a) determines this line. We say that two vectors are parallel if one is a multiple of the other by a non-zero scale factor.

Indeed, another way to determine a line is to pick one of its points P and to provide a vector \overline{V}, so as to express all the line’s points as translations of P by vectors parallel to \overline{V}.

 

X = P + t \overline{V}, \qquad t \in \mathbb{R}

 

This is called the parametric equation of the line, since it expresses the generic point X on the line via a parameter t which varies through all real numbers.

Notice that we have used the operation of sum between a point and a vector, defined as follows:

 

P + \overline{V} = (x + v_1, y + v_2), \qquad \text{where } P = (x, y) \text{ and } \overline{V} = (v_1, v_2)
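As a quick illustration, here is a minimal Python sketch that samples points of a line from its parametric equation (the helper point_on_line is our own name):

    # P + t*V: the point P translated by the vector V scaled by t.
    def point_on_line(P, V, t):
        return [p + t * v for p, v in zip(P, V)]

    P = [0, 1]
    V = [2, -2]                      # the vector PQ of the earlier example
    for t in (0.0, 0.5, 1.0):
        print(t, point_on_line(P, V, t))
    # t = 0 gives P itself, t = 1 gives Q = (2, -1)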

 

 

Although vectors are useful to describe lines (but also planes and, in higher dimensions, “hyperplanes”), they are interesting in themselves because of their algebraic properties, such as:

  1. Vectors may be added to get new vectors according to the rule

(v_1, v_2) + (w_1, w_2) = (v_1 + w_1, v_2 + w_2)

 

 

  2. Vectors may be multiplied by a number to get new vectors according to the rule

a (v_1, v_2) = (a v_1, a v_2)

By a vector space, we mean a set of elements equipped with two operations, sum between elements and multiplication by a number, which satisfy the previous properties shared by the vectors of the Cartesian plane, defined as differences of points.

Actually, in Machine Learning, one always uses finite-dimensional vector spaces whose vectors are expressed in coordinates, thus vectors are identified with n-tuples of numbers (their coordinates). This is a major element of confusion for beginners, since both points and vectors in the Cartesian space are represented by the same kind of object, a tuple of numbers.

 

(x_1, \ldots, x_n) + (y_1, \ldots, y_n) = (x_1 + y_1, \ldots, x_n + y_n), \qquad a (x_1, \ldots, x_n) = (a x_1, \ldots, a x_n)

 


However, even if they are represented in the same way, points and vectors are conceptually different objects. A point identifies a single location in space, while a vector identifies a displacement along a certain direction, with a given sense and magnitude. Points and vectors may be put in one-to-one correspondence as follows: to a point P we associate the vector \overline{OP}, which starts from the origin and ends in P; to a vector \overline{V}, instead, we associate the point O + \overline{V}.

 

 

This distinction between points and vectors is often overlooked but it is nevertheless important, also because it helps in understanding applications of the theory: for example, in the following picture some words of a document corpus are represented, mapped by means of a Machine Learning algorithm to points in the plane:

 

[Figure: words of a document corpus mapped to points in the plane, e.g. “pizza”, “Italy”, “sushi”, “Japan”]

 

These are points representing words. What about vectors in this case?

Let us take the vector which displaces “pizza” to “Italy” and let us apply it to the point “sushi”: we get the point “Japan” (up to a certain approximation). We infer that this vector represents the answer to a question: “given a national dish, which is the corresponding nation?”.

In this case, it is clear that points and vectors represent distinct concepts.
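A minimal Python sketch of this idea, with made-up two-dimensional coordinates chosen only to make the arithmetic visible (real word embeddings live in much higher-dimensional spaces):

    import numpy as np

    # Hypothetical coordinates: each word is a point in the plane.
    words = {
        "pizza": np.array([1.0, 1.0]),
        "Italy": np.array([1.0, 3.0]),
        "sushi": np.array([4.0, 1.0]),
        "Japan": np.array([4.0, 3.0]),
    }

    # The vector displacing "pizza" to "Italy": a difference of points.
    dish_to_nation = words["Italy"] - words["pizza"]

    # Applying the same displacement to the point "sushi".
    print(words["sushi"] + dish_to_nation)   # [4. 3.], i.e. the point "Japan"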

 

 

The scalar product (dot product) of two vectors \overline{v} = (v_1, v_2) and \overline{w} = (w_1, w_2) is the number \overline{v} \cdot \overline{w} = v_1 w_1 + v_2 w_2, and the angle \theta between the two vectors satisfies

\cos\theta = \frac{\overline{v} \cdot \overline{w}}{\|\overline{v}\| \, \|\overline{w}\|}

 

By applying the “arccosine” function, it is possible to compute this angle starting from the scalar product of the two vectors and from their lengths: the geometrical meaning of this quantity as an angle comes from the trigonometric interpretation of the cosine function.

 

 

Recall that \cos\theta is a number greater than or equal to -1 and less than or equal to 1, such that:

  • If the two vectors have the same direction and the same sense then the cosine is equal to 1.
  • If the two vectors have the same direction but opposite sense, then the cosine is equal to -1.
  • If the cosine is equal to zero, the two vectors are said to be orthogonal.

 

[Figure: the angle between two vectors and its cosine]

Therefore, while the distance measures the nearness of two points (the nearer they are, the closer their distance is to zero), the cosine measures the similarity between the directions and senses of the two vectors: the more the two vectors are aligned along the same line, the closer the absolute value of their cosine is to one.

This cosine similarity is often employed in Machine Learning to classify objects in vector spaces.
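A short Python sketch of this computation with numpy (cosine_similarity is our own helper, not a library function):

    import numpy as np

    def cosine_similarity(v, w):
        # Cosine of the angle between v and w: dot product over the two lengths.
        return np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))

    v = np.array([1.0, 0.0])
    w = np.array([1.0, 1.0])
    c = cosine_similarity(v, w)
    print(c)                           # about 0.707
    print(np.degrees(np.arccos(c)))    # about 45 degrees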

 

The N-dimensional Cartesian space

So far, in this crash course on Cartesian geometry and linear algebra, we restricted ourselves to the case of dimensions N = 2 and N = 3, so as to be able to draw pictures and to develop the concepts in a somewhat familiar environment: but in real-life applications, the dimension of vector spaces may be quite high.

In those cases, we have to give up geometric intuition, although all the formulas and concepts developed so far generalize trivially to higher dimensions.

Indeed such formulas depend on sums and differences of point coordinates and vector components: whether those coordinates are 2 or 20,000 makes no real difference. Better still, all the theory is easily and efficiently implemented on a computer, which may deal with points and vectors in spaces of high dimension without problems, since they are represented as arrays of numbers.
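As an illustration, a small sketch of a Euclidean distance that works unchanged in dimension 2 or 20,000 (the high-dimensional points below are random and serve only the illustration):

    import numpy as np

    def distance(P, Q):
        # Euclidean distance between two points, whatever their dimension.
        return np.linalg.norm(np.asarray(Q) - np.asarray(P))

    print(distance([0, 1], [2, -1]))   # the 2-dimensional example above
    P = np.random.rand(20_000)         # a point in dimension 20,000
    Q = np.random.rand(20_000)
    print(distance(P, Q))              # computed just as easily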

For example, let us consider the concept of a hyperplane: in the plane this coincides with the concept of a line, and in space with the concept of a plane.

Let us fix a dimension N and let us consider the Cartesian space \mathbb{R}^N.

A line in this space is, as in dimension 2, determined by a pair of points, or by a point and a vector: the parametric equation is just the same as in the dimension 2 case.

If N > 2 we may also consider parametric equations of the following form:

 

X = P + a \overline{v} + b \overline{w}, \qquad a, b \in \mathbb{R}

In this case, we have two parameters which vary independently, so that, intuitively, the set of points X which satisfy this equation, as a and b vary over all real numbers, corresponds to the set of pairs (a, b), hence to the plane. In particular, it is a two-dimensional object.

 

Actually this is not always true: for example if  \overline{v}= \overline{w} then the parametric equation becomes

X = P + (a + b) \overline{v}

which actually describes a line.

Therefore, if we write a parametric equation with many parameters, the dimension of the set of points described by this equation depends on the relations between the vectors which appear in the parametric equation itself. In the previous case, if vectors   \overline{v}  and \overline{w}  are parallel then the equation represents a line and not a plane.

If in general, in the N-dimensional space, we write a parametric equation with N – 1 parameters, we get:

 

X = P + a_1 \overline{v}_1 + a_2 \overline{v}_2 + \cdots + a_{N-1} \overline{v}_{N-1}, \qquad a_1, \ldots, a_{N-1} \in \mathbb{R}

 

If the vectors \overline{v}_1, \ldots, \overline{v}_{N-1} satisfy no relation of linear dependence among them (none is a combination of the others), the set described by this equation is a hyperplane, an (N – 1)-dimensional subset of the N-dimensional space.

Matrices and their algebra

A major feature of linear algebra is the efficiency and universality of its numerical methods. Actually, it suffices to implement a single algorithm (or one of its variants), namely Gaussian elimination, to be able to do practically everything in an effective way (solving equations, checking linear independence, etc.). These algorithms are available in every standard numerical computation library, such as Python’s numpy.linalg.
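For example, a minimal sketch of how numpy.linalg solves a system of linear equations (the system itself is made up for the illustration):

    import numpy as np

    # Solve the system  x + 2y = 5,  3x - y = 1  written as A x = b.
    A = np.array([[1.0, 2.0],
                  [3.0, -1.0]])
    b = np.array([5.0, 1.0])

    x = np.linalg.solve(A, b)   # internally an LU (Gaussian elimination) factorization
    print(x)                    # [1. 2.]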

To close this tutorial (already far too long), it is worth introducing the key notion which is involved in each of those algorithms, and which is also crucial in the conceptual development of linear algebra: the concept of a matrix.

A matrix is just a table of numbers, and it may be considered as a bidimensional array. More formally, an n×m matrix is a table of numbers which are singly addressed by means of two indexes, i and j, where the first one addresses the row and the second one addresses the column. At the crossing of row i and column j there is the number a_{ij} pointed to by those indexes (notice that in mathematics indexes usually run from 1 on, not from 0 on as in computer science).

The tabular representation of a matrix is as follows:

A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{pmatrix}

A matrix such that n = m is said to be a square matrix.

In practice, a matrix is just a vector of length nm whose elements are displayed by rows rather than as a single sequence. However, this change of notation is fundamental to the way these objects are used.

In particular, matrices enrich vector algebra with a new operation, a multiplication. In the first place let us notice that we may add matrices and multiply them by a single number to get again matrices of the same type:

(A + B)_{ij} = a_{ij} + b_{ij}, \qquad (c A)_{ij} = c \, a_{ij}

 

Therefore, n×m matrices do form a vector space of dimension nm.
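A quick numpy sketch of these two operations (the matrices are arbitrary examples):

    import numpy as np

    A = np.array([[1, 2, 0],
                  [0, 1, 3]])
    B = np.array([[4, 0, 1],
                  [2, 2, 2]])

    print(A + B)    # entry-wise sum: still a 2x3 matrix
    print(3 * A)    # multiplication by a number: still a 2x3 matrix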

Recall that a vector may be multiplied by a number to get a new vector, and that two vectors may be multiplied to get a number (via the dot product). But we do not know how to multiply, say, a vector by itself to get a new vector.

If we write vectors as matrices we can actually multiply them: indeed, we may distinguish two kinds of vectors when written in matrix form, row vectors and column vectors. A row vector is a sequence of numbers written side by side; we have already met them, for example (1, 2, 0).

 

A column vector is a vector written from top to bottom as in

 

\begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}

 

At first sight this is just a matter of notation but, if we interpret a vector as a particular kind of matrix, a row vector is a 1×N matrix, while a column vector is an N×1 matrix.

Now, given an n×m matrix A and an m×r matrix B, we may multiply A by B to get a new n×r matrix. The entry of the matrix AB addressed by the indexes i and j is defined as:

 

(AB)_{ij} = \sum_{k=1}^{m} a_{ik} b_{kj}

 

Notice that this is the dot product of the row vector given by the i-th row in A by the column vector given by the j-th column in B.

Example: let us multiply a 2×3 matrix times a 3×2 matrix:

 

 

\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix} \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{pmatrix} = \begin{pmatrix} a_{11}b_{11} + a_{12}b_{21} + a_{13}b_{31} & a_{11}b_{12} + a_{12}b_{22} + a_{13}b_{32} \\ a_{21}b_{11} + a_{22}b_{21} + a_{23}b_{31} & a_{21}b_{12} + a_{22}b_{22} + a_{23}b_{32} \end{pmatrix}

 

Now we come back to vectors: we may multiply a row vector times a column vector to get a 1×1 matrix (which is just a number) and this is the dot product. But we can also multiply an N×1 column vector times a 1×N row vector as matrices and get an N×N matrix, as in:

 

\begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} \begin{pmatrix} w_1 & w_2 & w_3 \end{pmatrix} = \begin{pmatrix} v_1 w_1 & v_1 w_2 & v_1 w_3 \\ v_2 w_1 & v_2 w_2 & v_2 w_3 \\ v_3 w_1 & v_3 w_2 & v_3 w_3 \end{pmatrix}

 

Note, however, that in this way we multiply two vectors belonging to N-dimensional spaces and we get a vector in a vector space of a different dimension, either 1 or N².
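A short numpy sketch contrasting the two products (the vectors are arbitrary examples):

    import numpy as np

    v = np.array([1, 2, 0])
    w = np.array([3, 1, 2])

    print(np.dot(v, w))     # row times column: a single number (here 5)
    print(np.outer(v, w))   # column times row: a 3x3 matrix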

The identity matrix is the square matrix whose entries are all zero except for the ones on the diagonal, which are 1 (the diagonal elements of a matrix are the ones whose row index equals the column index: a_{ii}). For example, the 3×3 identity matrix is

 

I = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}

 

As the name suggests, multiplying a matrix A by the identity matrix still gives A. Moreover, the matrix product is both associative and distributive with respect to the matrix sum.

However, matrix algebra displays a particular and interesting feature: the matrix product is in general not commutative. Thus AB is generally different from BA (indeed BA may well be meaningless: with A of type n×m and B of type m×r, the product BA makes sense only if r = n).

For example:

\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}
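This can also be checked in a couple of lines of numpy (the example matrices are arbitrary):

    import numpy as np

    A = np.array([[0, 1],
                  [0, 0]])
    B = np.array([[0, 0],
                  [1, 0]])

    print(A @ B)    # [[1 0], [0 0]]
    print(B @ A)    # [[0 0], [0 1]] -- different from A @ B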

 

 

Another typical operation is the multiplication of an n×m matrix times a column vector with m components: the result is a column vector with n components.
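A final numpy sketch of this operation (matrix and vector chosen arbitrarily):

    import numpy as np

    A = np.array([[1, 2, 0],
                  [0, 1, 3]])   # a 2x3 matrix
    v = np.array([1, 1, 2])     # a column vector with 3 components

    print(A @ v)                # [3 7]: a column vector with 2 components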