Author: Matteo Alberti
The aim of this first tutorial is to introduce the basic notions of dimensionality reduction from a mathematical point of view (spaces, subspaces, linear maps) and to recall the elements of linear algebra (norms, isometry, isomorphism) needed by machine learning algorithms.
Therefore, we are going to give a gentle introduction to what a linear reduction is. First of all, we need to identify the concept of an “optimal” subspace.
We have a set of n k-dimensional vectors x_1, …, x_n ∈ ℝ^k (n statistical units on k covariates) and we want to identify a p-dimensional subspace V_p ⊂ ℝ^k (with p < k) and a linear map π from ℝ^k to V_p such that the images π(x_i) sufficiently approximate our k-dimensional input vectors (with respect to some metric we will define).
We rewrite this as:
x_i ≈ π(x_i),  i = 1, …, n
Elements of algebra:
We define a vector space over ℝ as a set V, whose elements are called vectors, with the following properties:
There exists in V a vector addition operation that associates with each pair x, y ∈ V the vector x + y ∈ V, such that:
- Vector addition is commutative and associative
- There is a vector in V, denoted 0 and called the origin, such that x + 0 = x for every x ∈ V
- Each vector x has an opposite, denoted −x, such that x + (−x) = 0
There exists in V a multiplication by scalars that associates with each λ ∈ ℝ and each x ∈ V the vector λx ∈ V, such that:
- Multiplication between scalars is associative: λ(μx) = (λμ)x, and 1x = x
It also holds that:
- Scalar multiplication is distributive with respect to vector addition: λ(x + y) = λx + λy and (λ + μ)x = λx + μx
A non-empty subset S of V is a subspace of V if, for every x, y ∈ S and every λ, μ ∈ ℝ, the linear combination λx + μy also belongs to S.
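As a quick numerical illustration (with hypothetical example vectors, not from the original text), we can check that a linear combination of two vectors lying in a subspace of ℝ^3 — here, the span of two columns — still lies in that subspace:

```python
import numpy as np

# S = span{s1, s2}, a 2-dimensional subspace of R^3 (example vectors)
S = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 2.0]]).T        # columns span S

x = S @ np.array([2.0, -1.0])            # two vectors of S
y = S @ np.array([0.5, 3.0])
z = 4.0 * x - 2.0 * y                    # a linear combination of them

# z lies in S iff the least-squares system S c = z is solved exactly
coef, *_ = np.linalg.lstsq(S, z, rcond=None)
print(np.allclose(S @ coef, z))          # True: S is closed under linear combinations
```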
Kernel and Image
Let V and W be two vector spaces and L: V → W a linear map.
The kernel of L is the set of vectors of V whose image is the null vector of W. This set is denoted ker L:
ker L = {v ∈ V : L(v) = 0}
The image of L is the set of vectors of W which are images of some vector belonging to the domain V, that is:
Im L = {w ∈ W : w = L(v) for some v ∈ V}
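A small NumPy sketch (hypothetical matrix, chosen to have rank 1) showing how a basis of the kernel can be extracted from the SVD, and that every kernel vector is mapped to the null vector of W:

```python
import numpy as np

# L: R^3 -> R^2 represented by a matrix (example values; second row = 2 * first)
L = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])

U, s, Vt = np.linalg.svd(L)
rank = int(np.sum(s > 1e-10))
kernel = Vt[rank:]                    # rows: a basis of ker L

print(np.allclose(L @ kernel.T, 0))   # True: kernel vectors map to the null vector
print(kernel.shape[0])                # 2 = dim ker L = 3 - rank (rank-nullity)
```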
A linear map f: V → W between real vector spaces is a function with the following properties:
f(x + y) = f(x) + f(y) and f(λx) = λf(x)
for every x, y ∈ V and every λ ∈ ℝ.
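Any map of the form f(x) = Ax, with A a matrix, satisfies both properties; a quick numerical check (random example vectors, not from the original text):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 3))          # f(x) = A x is a linear map R^3 -> R^2
x, y = rng.normal(size=3), rng.normal(size=3)
lam = 2.5

# additivity and homogeneity of f
print(np.allclose(A @ (x + y), A @ x + A @ y))    # True
print(np.allclose(A @ (lam * x), lam * (A @ x)))  # True
```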
We are going to define only the cases of main relevance:
Let f: V → W be a linear map; then f is said to be:
- monomorphism if f is one-to-one
- epimorphism if f is onto
- isomorphism if f is bijective (injective + surjective)
- endomorphism if V = W
- automorphism if V = W and is bijective.
- rank equal to r if r = dim f (V)
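The rank in the last point is exactly what `np.linalg.matrix_rank` computes for a matrix representation of f. A small sketch (hypothetical matrix whose third column is the sum of the first two):

```python
import numpy as np

# f: R^3 -> R^3 represented by a matrix (example values)
F = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])      # third column = first + second

r = np.linalg.matrix_rank(F)         # r = dim f(V)
print(r)                             # 2: f is an endomorphism, but not an automorphism
```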
Suppose at this point that we have identified a subspace V_p (to be specified later) that approximates the data sufficiently well, and let {v_1, …, v_p} be a basis of V_p (note: the basis is formed by k-dimensional vectors, since V_p is a subspace of ℝ^k).
The linear map π associates to the input x_i the element of V_p having the following form:
π(x_i) = a_{i1} v_1 + … + a_{ip} v_p
The approximation error in passing from x_i to π(x_i) will be measured by an appropriate vector norm:
‖x_i − π(x_i)‖
Once our basis is fixed, our vectors π(x_i) are identified directly by their coordinates (a_{i1}, …, a_{ip}).
Given the map c: V_p → ℝ^p that associates with a generic vector of V_p its vector of coordinates, and composing it with π, we obtain a map t = c ∘ π: ℝ^k → ℝ^p that associates with each x_i the coordinates of its image π(x_i):
t(x_i) = (a_{i1}, …, a_{ip})
Through our map t, the k-dimensional input vectors are represented as p-dimensional vectors, i.e. as elements of ℝ^p.
At this point, it is immediate that we can proceed with any further analysis in ℝ^p.
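The whole pipeline can be sketched in a few lines (hypothetical data and subspace: B holds an orthonormal basis of a plane in ℝ^3 as its rows, so B @ x gives the coordinates t(x) and B.T @ (B @ x) gives the projection π(x)):

```python
import numpy as np

# orthonormal basis of a 2-dimensional subspace of R^3 (example)
B = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])      # rows: basis vectors, p = 2, k = 3

x = np.array([3.0, 4.0, 5.0])        # a k-dimensional input vector

t = B @ x                            # coordinates in R^p: the reduced representation
pi_x = B.T @ t                       # pi(x): the projection back into the subspace

print(t)                             # [3. 4.]
print(np.linalg.norm(x - pi_x))      # approximation error: 5.0
```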
So let us investigate how much of the original geometry this dimensionality reduction keeps (and how much it loses).
Going from k to p dimensions with p < k involves a loss of information and a deformation of the original geometry of the data. We also have to remember that no such linear map can be an isomorphism or an isometry.
This means that norms, scalar products and distances are not, in general, preserved.
Let V_p^⊥ be the orthogonal complement of V_p.
Each vector x ∈ ℝ^k can be decomposed into a sum of vectors x = x′ + x″, with x′ ∈ V_p and x″ ∈ V_p^⊥.
So, given two vectors x, y ∈ ℝ^k, we have:
⟨x, y⟩ = ⟨x′ + x″, y′ + y″⟩ = ⟨x′, y′⟩ + ⟨x″, y″⟩
since the cross terms vanish by orthogonality.
As can be seen, the scalar product is not preserved.
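A quick numerical check (hypothetical vectors and subspace): projecting onto a plane in ℝ^3 keeps only the ⟨x′, y′⟩ term of the scalar product, so the value changes.

```python
import numpy as np

B = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])      # orthonormal basis of the subspace (example)

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# scalar product of the original vectors vs. of their projections
print(x @ y)                         # 32.0
print((B @ x) @ (B @ y))             # 14.0: the component along the complement is lost
```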
Elements of algebra:
Let S be a subspace of V; we define the orthogonal complement of S in V, denoted S^⊥, as the subset of V defined by:
S^⊥ = {v ∈ V : ⟨v, s⟩ = 0 for every s ∈ S}
That is, the subset of all vectors of V orthogonal to all the vectors of S.
The coordinate map c: V_p → ℝ^p is an isomorphism between the vector spaces V_p and ℝ^p; this means that in this step there is no loss of information.
We consider the Euclidean product and suppose that the basis is orthonormal. Take two vectors h, w of V_p with relative coordinate vectors a, b ∈ ℝ^p:
h = a_1 v_1 + … + a_p v_p,  w = b_1 v_1 + … + b_p v_p
Then the following condition actually holds:
⟨h, w⟩ = ⟨a, b⟩
It means that the scalar product between h and w is identical to the scalar product of the respective coordinate vectors (note that the two scalar products are defined in different spaces, although they have been written in the same way).
If the basis were not orthonormal, the cross terms ⟨v_i, v_j⟩ with i ≠ j would not vanish.
In the first case, the scalar product among the coordinate vectors is given by aᵀb, while in the second case it would be given by aᵀGb, with Gram matrix G = (⟨v_i, v_j⟩). It follows that matrices built in the second way and associated with different bases differ from one another (one has to readjust the computation of scalar products, norms, induced distances, …).
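A sketch of the Gram-matrix correction (hypothetical non-orthonormal basis): the true scalar product in ℝ^k equals aᵀGb computed purely from the coordinates.

```python
import numpy as np

# non-orthonormal basis of a subspace of R^3 (example vectors)
V = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])      # rows: basis vectors

G = V @ V.T                          # Gram matrix, G_ij = <v_i, v_j>

a = np.array([2.0, 1.0])             # coordinate vectors on this basis
b = np.array([1.0, 3.0])

h = V.T @ a                          # the vectors the coordinates represent
w = V.T @ b

print(h @ w)                         # 17.0: true scalar product in R^k
print(a @ G @ b)                     # 17.0: same value, from coordinates alone
```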
Elements of algebra:
A basis is defined as orthonormal when it is composed of mutually orthogonal vectors of unitary norm.
We now want to give a second view of dimensionality reduction, based on matrix approximation (which is what is used in practice in most programming languages).
Collecting the n input vectors as the rows of a data matrix X (n × k), we can rewrite the approximation as:
X ≈ θ = AB
The rows of θ are given by linear combinations of the rows of B, that is, of our basis, with the coefficients given by the rows of A, the coordinates on the chosen basis.
Our dimensionality-reduction problem therefore corresponds to identifying a vector subspace of dimension p (p < k), a basis for it (the matrix B) and the relative coordinates (the matrix A).
In data analysis, where our starting point is the data matrix, the different reduction techniques differ according to the type of approximation, decomposition and choice among the many possible bases.
Basic case: Singular Value Decomposition (SVD)
Let’s implement in Python a simple singular value decomposition (SVD); that is, let’s split our starting matrix X into the two matrices A and B seen previously:
import numpy as np

X = np.array([[3, 2, 4], [3, 6, 7], [2, 1, 4]])
U, s, Vt = np.linalg.svd(X)   # X = U @ np.diag(s) @ Vt
A = U * s                     # coordinates on the chosen basis
B = Vt                        # rows of B: the (orthonormal) basis vectors
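As a self-contained sanity check (same example matrix), we can confirm that the factors returned by the SVD reconstruct X exactly, and that truncating them to the first p components gives the rank-p approximation used for dimensionality reduction:

```python
import numpy as np

X = np.array([[3.0, 2.0, 4.0], [3.0, 6.0, 7.0], [2.0, 1.0, 4.0]])
U, s, Vt = np.linalg.svd(X)

A = U * s        # coordinates of each row of X on the SVD basis
B = Vt           # rows: the orthonormal basis chosen by the SVD

print(np.allclose(X, A @ B))     # True: X = A B exactly (here p = k)

# a rank-p approximation keeps only the first p columns of A and rows of B
p = 2
X_p = A[:, :p] @ B[:p]
print(np.round(X_p, 2))
```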