
Introduction to Dimensionality Reduction and Linear Algebra basics (part 2)

Author: Matteo Alberti

Contents

Dimensionality Reduction in a linear space

Through the identification of a subspace

Reduction through matrix approximations

Basic case: Singular Value Decomposition (SVD)

Matrix Norms

Vector Norms

Induced Norms

Schatten Norms

Frobenius Norms

Base Cases: Cluster Analysis

Definition of a Metric

Minkowski Distance (Manhattan, Euclidean, Lagrange)

Matrix Norms

 

Having set up dimensionality reduction as a problem of approximating one matrix with another, we must now measure the distance between the original data matrix and its approximation. To do so, we study the different matrix norms.

There are three main families of norms:

  • Vector norms
  • Induced norms
  • Schatten norms

In practice, with few exceptions, we will refer to the Frobenius norm (the Euclidean distance between matrices).
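To make the idea concrete, here is a minimal sketch, assuming the approximation is the rank-r truncated SVD from part 1 and using a random matrix as stand-in data (the helper `truncated_svd` is hypothetical, written for this illustration). It measures the Frobenius distance between X and its approximation; for the truncated SVD this error equals the square root of the sum of the squared discarded singular values.

```python
import numpy as np

def truncated_svd(X, r):
    """Hypothetical helper: rank-r approximation of X from its r largest singular values."""
    U, w, Vt = np.linalg.svd(X, full_matrices=False)  # w is sorted in descending order
    return U[:, :r] @ np.diag(w[:r]) @ Vt[:r, :], w

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))  # illustrative data: n = 6 units, k = 4 variables

X_r, w = truncated_svd(X, r=2)

# Distance between original and approximate data, measured with the Frobenius norm
error = np.linalg.norm(X - X_r, "fro")

# For the truncated SVD this equals the root of the sum of the discarded w_i^2
expected = np.sqrt(np.sum(w[2:] ** 2))
print(error, expected)
```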

 

Elements of algebra:

Norm

A norm (commonly denoted by ‖ ∙ ‖) is a function from the vector space of matrices to ℝ that, for all matrices X and Y and every scalar α, satisfies:

 

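$$\|X\| \ge 0, \qquad \|X\| = 0 \iff X = 0$$

$$\|\alpha X\| = |\alpha| \, \|X\| \qquad \forall \alpha \in \mathbb{R}$$

$$\|X + Y\| \le \|X\| + \|Y\|$$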

Vector Norms

 

The vector norms treat the n × k matrix X as a vector of nk components, to which any of the following norms can be applied:

 

$$\|X\|_p = \Big(\sum_{i=1}^{n} \sum_{j=1}^{k} |x_{ij}|^p\Big)^{1/p}, \qquad p \ge 1$$

Note:

Setting p = 2 gives the Euclidean norm.
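A short NumPy sketch of this family, assuming an arbitrary illustrative 3 × 2 matrix; the function name `entrywise_norm` is hypothetical. It checks that the entrywise p-norm of X is just the ordinary vector p-norm of X flattened into nk components.

```python
import numpy as np

def entrywise_norm(X, p):
    """Vector p-norm of X, treating the n x k matrix as a vector of n*k components."""
    return np.sum(np.abs(X) ** p) ** (1.0 / p)

X = np.array([[1.0, -2.0],
              [3.0,  4.0],
              [0.0, -1.0]])  # illustrative 3 x 2 matrix

for p in (1, 2, 3):
    # Flattening X and applying the ordinary vector norm gives the same value
    print(p, entrywise_norm(X, p), np.linalg.norm(X.ravel(), ord=p))
```

The array is flattened on purpose: called on the 2-D array itself, `np.linalg.norm(X, ord=2)` would compute an induced (spectral) norm rather than the entrywise one.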

 

Induced Norms

 

An $n \times k$ matrix $X$ can be seen as a linear operator $X: \mathbb{R}^k \to \mathbb{R}^n$.

If we measure lengths in $\mathbb{R}^k$ with a fixed norm, and lengths in $\mathbb{R}^n$ with a possibly different norm, we can measure how much $X$ lengthens or shortens a vector $v \in \mathbb{R}^k$ by comparing the norm of $v$ with the norm of its image $Xv$.

The induced norm $\|X\|_{k,n}$ is defined as:

 

$$\|X\|_{k,n} = \sup_{v \in \mathbb{R}^k,\ v \neq 0} \frac{\|Xv\|_n}{\|v\|_k}$$
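A rough numerical sketch of the definition, assuming the Euclidean norm is used in both $\mathbb{R}^k$ and $\mathbb{R}^n$ and a random 5 × 3 matrix as the operator. Sampling many random directions v gives a lower estimate of the supremum, while `np.linalg.norm(X, 2)` computes it exactly (for this choice of norms it equals the largest singular value).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))  # illustrative operator from R^3 to R^5

# Ratios ||Xv||_2 / ||v||_2 for many random (almost surely non-zero) vectors v in R^k
V = rng.normal(size=(3, 10_000))
ratios = np.linalg.norm(X @ V, axis=0) / np.linalg.norm(V, axis=0)
estimate = ratios.max()  # sampling can only approach the supremum from below

exact = np.linalg.norm(X, 2)                       # induced 2-norm (spectral norm)
largest_w = np.linalg.svd(X, compute_uv=False)[0]  # largest singular value

print(estimate, exact, largest_w)
```

With other norm choices the value changes: for instance, `np.linalg.norm(X, 1)` is the norm induced by the 1-norm, equal to the maximum absolute column sum.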

 

Schatten Norms

 

The Schatten norm of order p of a matrix X is given by:

 

$$\|X\|_{S_p} = \Big(\sum_{i=1}^{\min(n,k)} w_i^p\Big)^{1/p}, \qquad p \ge 1$$

 

where $w_i$ are the singular values of $X$.
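A minimal sketch, assuming a small illustrative 2 × 2 matrix and a hypothetical helper `schatten_norm`: the Schatten norm is the vector p-norm of the singular values, and the cases p = 1, 2 and p → ∞ match NumPy's nuclear, Frobenius and spectral norms respectively.

```python
import numpy as np

def schatten_norm(X, p):
    """Schatten p-norm: the vector p-norm of the singular values w_i of X."""
    w = np.linalg.svd(X, compute_uv=False)
    if np.isinf(p):
        return w.max()  # limit p -> infinity: the largest singular value
    return np.sum(w ** p) ** (1.0 / p)

X = np.array([[3.0, 0.0],
              [4.0, 5.0]])  # illustrative matrix, singular values sqrt(45) and sqrt(5)

print(schatten_norm(X, 1), np.linalg.norm(X, "nuc"))   # p = 1: nuclear norm
print(schatten_norm(X, 2), np.linalg.norm(X, "fro"))   # p = 2: Frobenius norm
print(schatten_norm(X, np.inf), np.linalg.norm(X, 2))  # p -> inf: spectral norm
```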

 

Frobenius Norms

 

The Frobenius norm of our $n \times k$ matrix $X$ is given by:

 

$$\|X\|_F = \sqrt{\operatorname{Tr}(X^\top X)}$$

 

Writing out the matrix product, we obtain:

 

$$\|X\|_F = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{k} x_{ij}^2}$$

 

The Frobenius norm is therefore the square root of the sum of the squares of all the entries of X.

It coincides with the vector norm of order 2 (the Euclidean norm) of X seen as a vector of nk components.
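A quick check of these equivalences in NumPy, assuming an arbitrary illustrative 2 × 3 matrix: the trace formula, the sum of squared entries, the Euclidean norm of the flattened matrix and NumPy's built-in Frobenius norm all give the same value.

```python
import numpy as np

X = np.array([[1.0, 2.0, 0.0],
              [-1.0, 3.0, 2.0]])  # illustrative 2 x 3 matrix

via_trace = np.sqrt(np.trace(X.T @ X))         # sqrt(Tr(X^T X))
via_entries = np.sqrt(np.sum(X ** 2))          # sqrt of the sum of squared entries
via_vector = np.linalg.norm(X.ravel(), ord=2)  # Euclidean norm of X as an nk-vector
via_numpy = np.linalg.norm(X, "fro")           # NumPy's built-in Frobenius norm

print(via_trace, via_entries, via_vector, via_numpy)
```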

 

Elements of algebra:

Trace

The trace operator, denoted by Tr(∙), is defined as the sum of the diagonal elements of its argument, which must be a square matrix.
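As a small illustration, assuming an arbitrary 3 × 3 matrix, summing the diagonal by hand gives the same result as `np.trace`.

```python
import numpy as np

A = np.array([[2.0, 7.0, 1.0],
              [0.0, -3.0, 4.0],
              [5.0, 6.0, 9.0]])  # illustrative square matrix

# Sum of the diagonal elements a_11 + a_22 + a_33
manual = sum(A[i, i] for i in range(A.shape[0]))
print(manual, np.trace(A))
```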

 

 

Base Cases: Cluster Analysis

 

Cluster analysis is a multivariate technique for grouping statistical units so as to minimize the "logical distance" within each group and maximize it between groups.

It is one of the unsupervised learning techniques.

It is therefore natural to ask what we mean by logical distance, and which metric it is based on.

Definition of a Metric

 

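A function $d$ that assigns a real number to every pair of points is a metric if, for all points $x, y, z$:

$$d(x, y) \ge 0 \qquad \text{(1)}$$

$$d(x, y) = 0 \iff x = y \qquad \text{(2)}$$

$$d(x, y) = d(y, x) \qquad \text{(3)}$$

$$d(x, z) \le d(x, y) + d(y, z) \qquad \text{(4)}$$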

If instead it satisfies only the first three properties, it is called a distance index.
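A small numerical sketch of the difference, using two functions chosen here for illustration (they are not from the original): the Euclidean distance satisfies all four properties, while the squared Euclidean distance satisfies the first three but can violate the triangle inequality, so it is only a distance index. Checks on a few sample points can expose a violation but cannot prove that a property holds in general.

```python
import numpy as np

def euclidean(x, y):
    return np.sqrt(np.sum((x - y) ** 2))

def squared_euclidean(x, y):
    return np.sum((x - y) ** 2)

def triangle_holds(d, x, y, z):
    """Property 4 on three given points: d(x, z) <= d(x, y) + d(y, z)."""
    return d(x, z) <= d(x, y) + d(y, z)

x, y, z = np.array([0.0]), np.array([1.0]), np.array([2.0])

for d in (euclidean, squared_euclidean):
    print(
        d.__name__,
        d(x, y) >= 0,                # 1. non-negativity
        d(x, x) == 0,                # 2. (one direction only) zero distance from itself
        d(x, y) == d(y, x),          # 3. symmetry
        triangle_holds(d, x, y, z),  # 4. triangle inequality
    )
```

For the squared distance, d(x, z) = 4 while d(x, y) + d(y, z) = 2, so property 4 fails.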

 

Minkowski Distance (Manhattan, Euclidean, Lagrange)

 

We now analyze the main distances belonging to the Minkowski family, defined for two points $x, y \in \mathbb{R}^m$ as:

$$d_k(x, y) = \Big(\sum_{j=1}^{m} |x_j - y_j|^k\Big)^{1/k}, \qquad k \ge 1$$

 

We highlight the following cases:

 

  • $k = 1$: Manhattan distance
  • $k = 2$: Euclidean distance
  • $k \to \infty$: Lagrangian distance (Čebyšëv)

As we can see:

 

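$$d_1(x, y) = \sum_{j=1}^{m} |x_j - y_j|, \qquad d_2(x, y) = \sqrt{\sum_{j=1}^{m} (x_j - y_j)^2}, \qquad d_\infty(x, y) = \lim_{k \to \infty} d_k(x, y) = \max_{j} |x_j - y_j|$$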
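A minimal implementation of the Minkowski family, assuming two illustrative points in the plane; the function name `minkowski` is our own. It reproduces the three highlighted cases and shows that a large k already gives a value close to the Chebyshev limit.

```python
import numpy as np

def minkowski(x, y, k):
    """Minkowski distance d_k between two points; k = np.inf gives the Chebyshev case."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    if np.isinf(k):
        return diff.max()
    return np.sum(diff ** k) ** (1.0 / k)

x, y = [0.0, 0.0], [3.0, 4.0]

print(minkowski(x, y, 1))       # Manhattan: 3 + 4 → 7.0
print(minkowski(x, y, 2))       # Euclidean: sqrt(9 + 16) → 5.0
print(minkowski(x, y, np.inf))  # Lagrangian / Chebyshev: max(3, 4) → 4.0
print(minkowski(x, y, 50))      # large k: already very close to 4
```

SciPy offers ready-made versions of these distances in `scipy.spatial.distance` (`cityblock`, `euclidean`, `minkowski`, `chebyshev`).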

 

 

Therefore, taking cluster analysis as an example, it is essential to choose the type of distance on which our analysis will be based.

The packages already available mainly implement these three variants of the Minkowski distance (for quantitative variables).

 

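Importing from sklearn (since scikit-learn 1.2 the distance is set with the metric parameter, which replaced the older affinity parameter, removed in 1.4):

from sklearn.cluster import AgglomerativeClustering
AgglomerativeClustering(n_clusters=2, metric='euclidean', memory=None, connectivity=None, compute_full_tree='auto', linkage='ward')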

 
