
Introduction to Dimensionality Reduction and Linear Algebra basics (Part 1)

Author: Matteo Alberti




Dimensionality Reduction in a linear space

Through the identification of a subspace

Reduction through matrix approximations

Basic case: Decomposition in Singular Values (SVD)

Matrix Norms

Vector Norms

Induced Norms

Schatten Norms

Frobenius Norm

Basic case: Cluster Analysis

Definition of a Metric

Minkowski Distance (Manhattan, Euclidean, Lagrange)



Dimensionality Reduction in a linear space


The aim of this first tutorial is to introduce the basic notions of dimensionality reduction from the mathematical point of view (spaces, subspaces, linear maps) and to recall the elements of linear algebra (norms, isometry, isomorphism) needed by every machine learning algorithm.


Therefore, we are going to give a gentle introduction to what a linear reduction is. First of all, we need to identify the concept of “optimal” subspaces.


We have a set of k-dimensional vectors x_1, \dots, x_n (n statistical units on k covariates) and we want to identify a p-dimensional subspace V_p \subseteq R^k and a linear map \varphi(\cdot) from R^k to V_p such that the images \varphi(x_i) approximate our k-dimensional input vectors sufficiently well (with respect to some metric we will define).

We rewrite as:


\varphi(x_i) \approx x_i, \qquad i = 1, \dots, n

Elements of algebra:

Vector space


We define a vector space over R as a set V, whose elements are called vectors, with the following properties:

There exists in V a vector addition operation that associates with each pair x, y \in V the vector x + y \in V, such that:

  • Vector addition is commutative and associative
  • There is a vector in V, denoted 0 and called the origin, such that x + 0 = x
  • Each vector x \in V has its opposite, indicated with -x, such that x + (-x) = 0

There exists in V a multiplication by scalars that associates with each x \in V and each a \in R the vector ax \in V, so that:

  • Multiplication between scalars is associative
  • 1x = x, \forall x \in V

It also holds:

  • Scalar multiplication is distributive with respect to vector addition
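These properties can be checked numerically; a minimal sketch with NumPy, using hypothetical vectors of R^2:

```python
import numpy as np

# Two hypothetical vectors of R^2 and a scalar
x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
a = 2.5

# Commutativity of vector addition
print(np.array_equal(x + y, y + x))                # True

# Distributivity of scalar multiplication over vector addition
print(np.array_equal(a * (x + y), a * x + a * y))  # True
```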


Vector subspace

S is a non-empty subset of V that is itself a subspace: 0 = x - x \in S for each x \in S, and every linear combination ax + \beta y \in S for x, y \in S and a, \beta \in R.



Kernel and Image

Let V and W be two vector spaces and let L: V \rightarrow W be a linear map.

The kernel of L is the set of V vectors whose image is the null vector of W.

This set is indicated with ker L.


\ker L = \left \{ v \in V : L(v) = 0_W \right \}

The image of L is the set of the vectors of W which are images of some vector belonging to the domain V, that is:

\mathrm{Im}\, L = \left \{ w \in W : w = L(v) \text{ for some } v \in V \right \}
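As a numerical sketch of the kernel, it can be recovered from the singular value decomposition: the rows of V^t associated with (numerically) zero singular values span ker L. The matrix below is a hypothetical rank-2 example:

```python
import numpy as np

# Hypothetical 3x3 matrix of rank 2, so its kernel has dimension 1
L = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])

U, s, Vt = np.linalg.svd(L)

# Rows of Vt whose singular value is numerically zero span ker L
kernel = Vt[s < 1e-10]

# Vectors of the kernel are mapped to the null vector of W
print(kernel.shape)                  # (1, 3)
print(np.allclose(L @ kernel.T, 0))  # True
```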

Linear Map

A linear map f:V\mapsto W between real vector spaces is a function with the following properties:

f(\overline{u} + \overline{x}) = f(\overline{u}) + f(\overline{x}), \qquad f(\lambda \overline{u}) = \lambda f(\overline{u})

for every \overline{u}, \overline{x} \in V and \lambda \in R

We are going to define only the cases of main relevance:

Let f: V \mapsto W be a linear map; then f is said to be:

  • monomorphism if f is one-to-one
  • epimorphism if f is onto
  • isomorphism if f is bijective (injective + surjective)
  • endomorphism if V = W
  • automorphism if V = W and f is bijective
  • of rank r if r = dim f(V)

Suppose at this point that we have identified a subspace V_p (to be specified later) that approximates sufficiently well, and let b_1, \dots, b_p be a base of V_p (note: the base is formed by k-dimensional vectors, since V_p is a subspace of R^k).

The linear map \varphi(\cdot) associates to the input x_i the element \varphi(x_i) of V_p, having the following form:

\varphi(x_i) = \sum_{j=1}^{p} a_{ij} b_j

with the coefficients a_{ij} chosen appropriately.

The approximation error in passing from x_i to \varphi(x_i) will be measured by an appropriate vector norm:

\left \| x_i - \varphi(x_i) \right \|

Once our base b_1, \dots, b_p is fixed, our vectors \varphi(x_i) are determined directly by the coordinates a_{ij}.

Given the map that associates with the generic vector \varphi(x_i) its vector of coordinates a_i = (a_{i1}, \dots, a_{ip})^t, composing it with the map \varphi(\cdot) we obtain a map that associates each input vector with the coordinates of its image:


\varphi(x_i) \;\mapsto\; a_i = (a_{i1}, \dots, a_{ip})^t


x_i \in R^k \;\mapsto\; a_i \in R^p

Through this composed map the k-dimensional input vectors are represented as p-dimensional vectors, i.e. as elements of R^p.

At this point, it is immediate that we can proceed with further analysis in R^p .
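A minimal sketch of this representation, assuming a hypothetical orthonormal base of a p = 2 subspace of R^3 stored as the rows of B:

```python
import numpy as np

# Hypothetical orthonormal base b_1, b_2 of a p = 2 subspace of R^3
B = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

# A k-dimensional input vector
x = np.array([3.0, 5.0, 2.0])

# Coordinates of the image phi(x) on the base: the p-dimensional representation
a = B @ x
print(a)      # [3. 5.]

# The image phi(x) itself, back in R^k
phi_x = B.T @ a
print(phi_x)  # [3. 5. 0.]
```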


So let’s investigate what this reduction in dimensionality keeps of the original geometry (and what it loses):

x_i \in R^k \;\mapsto\; \varphi(x_i) \in V_p \;\mapsto\; a_i \in R^p

\left \| x_i \right \| \neq \left \| a_i \right \|, \qquad \left \langle x_i, x_j \right \rangle \neq \left \langle a_i, a_j \right \rangle

Going from k to p dimensions involves a loss of information and a deformation of the original geometry of the data. Also, we have to remember that such a linear map cannot be an isomorphism or an isometry.

This means that, with p < k, norms, scalar products and distances are not preserved.




The p-dimensional subspace V_p


R^k = V_p \oplus V_p^\bot

with V_p^\bot the orthogonal complement of V_p in R^k.



Each vector x \in R^k can be decomposed into the sum of a vector of V_p and a vector of V_p^\bot:

x = x_{V_p} + x_{V_p^\bot}, \qquad x_{V_p} \in V_p, \; x_{V_p^\bot} \in V_p^\bot

So, given two vectors u, v \in R^k we have:

\left \langle u, v \right \rangle = \left \langle u_{V_p}, v_{V_p} \right \rangle + \left \langle u_{V_p^\bot}, v_{V_p^\bot} \right \rangle

As can be seen, the scalar product is not preserved.
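A numerical sketch of this decomposition, with a hypothetical one-dimensional V_p spanned by the unit vector b:

```python
import numpy as np

# Hypothetical V_p: the line in R^3 spanned by the unit vector b
b = np.array([1.0, 0.0, 0.0])

u = np.array([2.0, 1.0, 0.0])
v = np.array([1.0, 3.0, 4.0])

# Decompose each vector into its V_p component and its orthogonal one
u_par = (u @ b) * b
u_ort = u - u_par
v_par = (v @ b) * b
v_ort = v - v_par

# <u, v> = <u_par, v_par> + <u_ort, v_ort>: projecting onto V_p
# keeps only the first term, so the scalar product is not preserved
print(u @ v)                          # 5.0
print(u_par @ v_par)                  # 2.0
print(u_par @ v_par + u_ort @ v_ort)  # 5.0
```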



Elements of algebra:

orthogonal complement

Let S \subseteq V be a subspace of V. We define the orthogonal complement of S in V, indicating it with S^\bot, as the subset of V defined by:


S^\bot = \left \{ v \in V : \left \langle v, s \right \rangle = 0 \;\; \forall s \in S \right \}

That is, the subset of all the vectors of V orthogonal to all the vectors of S.

V_p \mapsto R^p


The linear map \varphi\left ( . \right ) is an isomorphism between the vector spaces V_p and R^p; this means that there will be no loss of information.

We consider the Euclidean scalar product and suppose that the base b_1, \dots, b_p is orthonormal. Let h, w be two vectors of V_p with the relative coordinate vectors:


h = \sum_{i=1}^{p} a_i^h b_i, \qquad w = \sum_{i=1}^{p} a_i^w b_i

then the following condition holds:


\left \langle h, w \right \rangle = \sum_{i=1}^{p} a_i^h a_i^w = \left [ a^h \right ]^t a^w


This means that the scalar product between h and w is identical to the scalar product of the respective coordinate vectors (note that the two scalar products are defined in different spaces, although they have been written in the same way).


If the base were not orthonormal, the factors \left \langle b_i, b_j \right \rangle with i \neq j would not reduce to 0.

In the first case the scalar product among the coordinate vectors is given by \left [ a^h \right ]^t I a^w, while in the second case it would be given by \left [ a^h \right ]^t G a^w, where G is the Gram matrix of the base. It follows that coordinate representations associated with different bases differ from one another (requiring us to readjust the calculation of scalar products, norms, induced distances, ...).
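A sketch of the two cases, with a hypothetical non-orthonormal base stored as the rows of B: the plain product of the coordinate vectors no longer matches the scalar product of h and w, while inserting the Gram matrix G restores it.

```python
import numpy as np

# Hypothetical non-orthonormal base of a p = 2 subspace of R^3 (rows of B)
B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])

# Coordinate vectors of two vectors h, w of V_p on that base
a_h = np.array([1.0, 2.0])
a_w = np.array([3.0, 1.0])
h = B.T @ a_h
w = B.T @ a_w

# Gram matrix of the base: G[i, j] = <b_i, b_j>
G = B @ B.T

print(h @ w)          # 17.0
print(a_h @ G @ a_w)  # 17.0, matches <h, w>
print(a_h @ a_w)      # 5.0, does not
```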





Elements of algebra:



A base is defined as orthonormal when it is composed of pairwise orthogonal vectors of unitary norm.


Reduction through matrix approximations


We want to give a second view of dimensionality reduction, based on matrix approximation (which is what is used in practice in most programming languages).



Consider our data matrix X, with n rows (the statistical units) and k columns (the variables).

Then we can rewrite it as:


X \approx \theta = AB, \qquad A \in R^{n \times p}, \quad B \in R^{p \times k}

The rows of θ are given by linear combinations of the rows of B, that is, of our base, with the coefficients given by the rows of A, the coordinates on the chosen base.

Our dimensionality reduction problem therefore corresponds to identifying a vector subspace of dimension p (p < k), a chosen base for it (B) and the relative coordinates, given by the matrix A.


In data analysis, where our starting point is the data matrix, the different reduction techniques differ according to the type of approximation, decomposition and choice among the many possible bases.





Basic case: Decomposition in Singular Values (SVD)


Let’s implement in Python a simple singular value decomposition (SVD), that is, let’s split our starting matrix X into the two matrices A and B seen previously:

import numpy as np

X = np.array([[3, 2, 4], [3, 6, 7], [2, 1, 4]])

# SVD: X = U @ np.diag(s) @ Vt
U, s, Vt = np.linalg.svd(X)

# A (coordinates) and B (base) such that X = A @ B
A = U @ np.diag(s)
B = Vt
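Keeping only the first p singular values yields the best rank-p approximation of X; a self-contained sketch, with a hypothetical p = 2:

```python
import numpy as np

X = np.array([[3.0, 2.0, 4.0],
              [3.0, 6.0, 7.0],
              [2.0, 1.0, 4.0]])

U, s, Vt = np.linalg.svd(X)

# Truncate to p = 2: A holds the coordinates, B the chosen base
p = 2
A = U[:, :p] * s[:p]   # n x p matrix of coordinates
B = Vt[:p, :]          # p x k matrix whose rows form the base
X_p = A @ B            # best rank-p approximation of X

# The spectral-norm error equals the first discarded singular value
print(np.linalg.norm(X - X_p, 2))
```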