# Introduction to Dimensionality Reduction and Linear Algebra basics (Part 1)

Author: Matteo Alberti

Contents

- Dimensionality Reduction in a linear space
- Through the identification of a subspace
- Reduction through matrix approximations
- Basic case: Decomposition in Singular Values (SVD)
- Matrix Norms
- Vector Norms
- Induced Norms
- Schatten Norms
- Frobenius Norm
- Basic Cases: Cluster Analysis
- Definition of a Metric
- Minkowski Distance (Manhattan, Euclidean, Lagrange)

# Dimensionality Reduction in a linear space

The aim of this first tutorial is to introduce the basic notions of dimensionality reduction from the mathematical point of view (spaces, subspaces, linear maps) and to review the elements of linear algebra (norms, isometry, isomorphism) needed by machine learning algorithms.

Therefore, we are going to give a gentle introduction to what a linear reduction is. First of all, we need to identify the concept of “optimal” subspaces.

We have a set of n k-dimensional vectors $x_1, \dots, x_n \in \mathbb{R}^k$ (n statistical units on k covariates), and we want to identify a p-dimensional subspace $V_p$ of $\mathbb{R}^k$, with $p < k$, and a linear map $r_p : \mathbb{R}^k \to V_p$ such that the images $r_p(x_i)$ sufficiently approximate our k-dimensional input vectors (with reference to some metric we will define).

We rewrite this as:

$x_i \approx r_p(x_i), \quad i = 1, \dots, n$

Elements of algebra:

Vector space

We define a vector space over $\mathbb{R}$ as a set V, whose elements are called vectors, with the following properties:

There exists in V a vector addition operation that associates with each pair $x, y \in V$ the vector $x + y \in V$, such that:

• Vector addition is commutative and associative
• There is a vector in V, denoted 0 and called the origin, such that $x + 0 = x$ for every $x \in V$
• Each vector $x$ has an opposite, denoted $-x$, such that $x + (-x) = 0$

There exists in V a multiplication by scalars that associates with each $\lambda \in \mathbb{R}$ and each $x \in V$ the vector $\lambda x \in V$, such that:

• Multiplication between scalars is associative: $\lambda(\mu x) = (\lambda \mu) x$

It also holds:

• Scalar multiplication is distributive with respect to vector addition: $\lambda(x + y) = \lambda x + \lambda y$
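These axioms can be checked numerically for vectors of $\mathbb{R}^3$ with NumPy (a quick sanity check with arbitrary example values, not a proof):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
lam, mu = 2.0, -1.5

assert np.allclose(x + y, y + x)                      # commutativity of addition
assert np.allclose(lam * (mu * x), (lam * mu) * x)    # associativity of scalar multiplication
assert np.allclose(lam * (x + y), lam * x + lam * y)  # distributivity over vector addition
assert np.allclose(x + (-x), np.zeros(3))             # opposite vector
```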

Vector subspace

S is a non-empty subset of V that is closed under linear combinations: for each $x, y \in S$ and every $\lambda, \mu \in \mathbb{R}$, the linear combination $\lambda x + \mu y$ belongs to S.

Kernel and Image

Let V and W be two vector spaces and let $L : V \to W$ be a linear application.

The kernel of L is the set of vectors of V whose image is the null vector of W:

$\ker L = \{ v \in V : L(v) = 0_W \}$

The image of L is the set of the vectors of W which are images of some vector belonging to the domain V, that is:

$\mathrm{Im}\, L = \{ w \in W : w = L(v) \text{ for some } v \in V \}$
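In coordinates, the kernel and the image of a linear map can be computed from its matrix; a small sketch (the matrix `L` below is an illustrative choice, its third row being the sum of the first two):

```python
import numpy as np

# A linear map L: R^3 -> R^3 represented by a rank-deficient matrix
L = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])

rank = np.linalg.matrix_rank(L)    # dim Im L
nullity = L.shape[1] - rank        # dim ker L (rank-nullity theorem)

# A basis for ker L: right-singular vectors with (numerically) zero singular value
_, s, Vt = np.linalg.svd(L)
kernel_basis = Vt[rank:]

print(rank, nullity)               # 2 1
print(L @ kernel_basis.T)          # numerically the zero vector
```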

Linear Map

A linear map $f : V \to W$ between real vector spaces is a function with the following properties:

$f(x + y) = f(x) + f(y) \quad \text{and} \quad f(\lambda x) = \lambda f(x)$

for every $x, y \in V$ and every $\lambda \in \mathbb{R}$.

We are going to define only the cases of main relevance:

Let $f : V \to W$ be a linear map; then f is said to be:

• a monomorphism if f is one-to-one (injective)
• an epimorphism if f is onto (surjective)
• an isomorphism if f is bijective (injective + surjective)
• an endomorphism if V = W
• an automorphism if V = W and f is bijective
• of rank r if r = dim f(V)
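For a map $f(x) = Ax$ between coordinate spaces, these cases can be checked from the rank of A; a quick illustration (the matrix is a made-up example):

```python
import numpy as np

# f: R^2 -> R^3, f(x) = A @ x
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

r = np.linalg.matrix_rank(A)   # rank = dim f(V)
injective = (r == A.shape[1])  # monomorphism: rank equals dim of the domain
surjective = (r == A.shape[0]) # epimorphism: rank equals dim of the codomain

print(r, injective, surjective)  # 2 True False
```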

Suppose at this point that we have identified a sufficiently approximating subspace $V_p$ (how to choose it will be specified later) and let $b_1, \dots, b_p$ be a base of $V_p$ (Note: the base is formed by k-dimensional vectors, since $V_p$ is a subspace of $\mathbb{R}^k$).

The linear map $r_p$ associates to the input $x_i$ the element $r_p(x_i)$ of $V_p$, having the following form:

$r_p(x_i) = \alpha_{i1} b_1 + \dots + \alpha_{ip} b_p$

with the coefficients $\alpha_{ij}$ chosen appropriately.

The approximation error in going from $x_i$ to $r_p(x_i)$ will be measured by an appropriate vector norm:

$\| x_i - r_p(x_i) \|$

Once our base $b_1, \dots, b_p$ is fixed, our vectors $r_p(x_i)$ can be recovered directly from the coordinates $(\alpha_{i1}, \dots, \alpha_{ip})$.

Given the map $c : V_p \to \mathbb{R}^p$ that associates with a generic vector of $V_p$ its vector of coordinates, composing it with $r_p$ we obtain a map $c \circ r_p : \mathbb{R}^k \to \mathbb{R}^p$ that associates with each input vector the coordinates of its image:

$x_i \mapsto (\alpha_{i1}, \dots, \alpha_{ip})$

Through our map the k-dimensional input vectors are represented as p-dimensional vectors, i.e. as elements of $\mathbb{R}^p$.

At this point, we can immediately proceed with further analysis in $\mathbb{R}^p$.

So let us investigate what this reduction in dimensionality keeps of the original geometry (and what it loses).

Going from k to p dimensions with $p < k$ involves a loss of information and a deformation of the original geometry of the data. Also, we have to remember that such a linear map cannot be an isomorphism or an isometry.

This means that norms, scalar products and distances are not, in general, preserved.

Example:

Let $V_p$ be a p-dimensional subspace of $\mathbb{R}^k$ and let $V_p^{\perp}$ be the orthogonal complement of $V_p$.

Each vector $x \in \mathbb{R}^k$ can be decomposed into a sum of a vector $x_V \in V_p$ and a vector $x^{\perp} \in V_p^{\perp}$:

$x = x_V + x^{\perp}$

So, given two vectors $x, y \in \mathbb{R}^k$ we have:

$\langle x, y \rangle = \langle x_V, y_V \rangle + \langle x^{\perp}, y^{\perp} \rangle$

(the cross terms vanish because $x_V \perp y^{\perp}$ and $x^{\perp} \perp y_V$). As can be seen, projecting onto $V_p$ keeps only the first term: the scalar product is not preserved.
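A small numeric illustration of this loss, projecting $\mathbb{R}^3$ onto the (x, y) plane (the vectors are arbitrary examples):

```python
import numpy as np

# Orthogonal projection of R^3 onto the (x, y) plane, taken as V_p
P = np.diag([1.0, 1.0, 0.0])

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

full = x @ y                      # <x, y> in R^3
projected = (P @ x) @ (P @ y)     # <x_V, y_V> after projection
lost = (x - P @ x) @ (y - P @ y)  # <x_perp, y_perp>, the discarded term

print(full, projected, lost)      # 32.0 14.0 18.0
assert np.isclose(full, projected + lost)
```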

Elements of algebra:

Orthogonal complement

Let S be a subspace of V; we define the orthogonal complement of S in V, denoted $S^{\perp}$, as the subset of V defined by:

$S^{\perp} = \{ v \in V : \langle v, s \rangle = 0 \ \forall s \in S \}$

That is, the subset of all the vectors of V orthogonal to all the vectors of S.

The linear map $c : V_p \to \mathbb{R}^p$ is an isomorphism between the vector spaces $V_p$ and $\mathbb{R}^p$; this means that in this step there is no loss of information.

We consider the Euclidean product and suppose that the base $b_1, \dots, b_p$ is orthonormal. Let $h, w$ be two vectors of $V_p$ with relative coordinate vectors $\alpha, \beta \in \mathbb{R}^p$:

$h = \sum_{j=1}^{p} \alpha_j b_j, \qquad w = \sum_{j=1}^{p} \beta_j b_j$

Then the following holds:

$\langle h, w \rangle = \sum_{i,j} \alpha_i \beta_j \langle b_i, b_j \rangle = \sum_{j=1}^{p} \alpha_j \beta_j = \langle \alpha, \beta \rangle$

It means that the scalar product between h and w is identical to the scalar product of the respective coordinate vectors (note that the two scalar products are defined in different spaces, although they are written in the same way).

If the base were not orthonormal, the factors $\langle b_i, b_j \rangle$ with $i \neq j$ would not reduce to 0.

In the first case the scalar product among the coordinate vectors is given by $\alpha^{\top} \beta$, while in the second case it would be given by $\alpha^{\top} G \beta$, with Gram matrix $G = (\langle b_i, b_j \rangle)_{ij}$. It follows that coordinate representations associated with different bases differ from one another (forcing us to readjust the calculation of scalar products, norms, induced distances, ...).
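A quick numeric check of the two formulas, with a made-up non-orthonormal base of a plane in $\mathbb{R}^3$:

```python
import numpy as np

# A non-orthonormal base of a 2-dimensional subspace of R^3 (rows = base vectors)
B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])

G = B @ B.T                   # Gram matrix, G_ij = <b_i, b_j>

alpha = np.array([2.0, 1.0])  # coordinates of h on the base
beta = np.array([1.0, 3.0])   # coordinates of w on the base

h = alpha @ B                 # h as a vector of R^3
w = beta @ B

# <h, w> equals alpha' G beta, not alpha' beta, because the base is not orthonormal
print(h @ w, alpha @ G @ beta, alpha @ beta)
assert np.isclose(h @ w, alpha @ G @ beta)
```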

Elements of algebra:

Orthonormal base

A base is defined as orthonormal when it is composed of pairwise orthogonal vectors of unitary norm.

## Reduction through matrix approximations

We want to give a second view of dimensionality reduction, based on matrix approximation (which is what is used in practice in most programming languages).

Given the data matrix $X \in \mathbb{R}^{n \times k}$, whose rows are the n statistical units, we can rewrite its approximation as:

$\hat{X} = A B, \qquad A \in \mathbb{R}^{n \times p}, \quad B \in \mathbb{R}^{p \times k}$

The rows of $\hat{X}$ are given by linear combinations of the rows of B, that is of our base, with the coefficients given by the rows of A, the coordinates on the chosen base.

Our dimensionality reduction problem therefore corresponds to identifying a vector subspace of dimension p (p < k), our chosen base (the matrix B) and the relative coordinates (given by the matrix A).

In data analysis, where our starting point is the data matrix, the different reduction techniques differ according to the type of approximation, decomposition and choice among the many possible bases.

## Basic case: Decomposition in Singular Values (SVD)

Let’s implement in Python a simple decomposition in singular values (SVD), that is, let’s factor our starting matrix X into the two matrices A and B seen previously:

```python
import numpy as np

X = np.array([[3, 2, 4], [3, 6, 7], [2, 1, 4]])

# Singular value decomposition: X = U @ np.diag(s) @ Vt
U, s, Vt = np.linalg.svd(X)
```
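From the decomposition, a rank-p approximation and the matrices A and B can be recovered by keeping only the p largest singular values; a minimal sketch (the split of $\mathrm{diag}(s)$ into A is one conventional choice):

```python
import numpy as np

X = np.array([[3.0, 2.0, 4.0],
              [3.0, 6.0, 7.0],
              [2.0, 1.0, 4.0]])

U, s, Vt = np.linalg.svd(X)

p = 2                        # target dimension, p < k
A = U[:, :p] * s[:p]         # n x p: the coordinates on the chosen base
B = Vt[:p, :]                # p x k: the base, stored as rows
X_hat = A @ B                # best rank-p approximation of X in Frobenius norm

# the residual error equals the first discarded singular value
print(np.linalg.norm(X - X_hat))
```

By the Eckart–Young theorem this truncated SVD is the rank-p matrix closest to X in Frobenius norm, which is why it is the workhorse behind techniques such as PCA.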