# Linear Dimensionality Reduction: Principal Component Analysis

Author: Matteo Alberti

Among the tools for linear dimensionality reduction, Principal Component Analysis (PCA) is certainly one of the main tools of statistical machine learning.

Although we very often focus on non-linearity, principal component analysis is the starting point for many analyses (and a core preprocessing step), and knowing it becomes imperative whenever the linearity conditions are satisfied.

In this tutorial we introduce the mathematics behind the extraction of the principal components (PCs), their implementation in Python and, above all, their interpretation.

PCA works by splitting the total variance among as many components as there are starting variables. The number of components can then be reduced according to the contribution that each principal component makes to the total variance.

Keep in mind that PCA is useful when the starting variables are not independent, in particular when they are linearly correlated.

Let's introduce the correct mathematical formalism.

Given a set of p quantitative variables $X_1, X_2, \dots, X_p$ (centred or standardised), we want to determine a new set of k variables, with $k \le p$, denoted $Y_s$ ($s = 1, 2, \dots, k$), that have the following properties:

- they are uncorrelated;
- each one reproduces the largest possible portion of the variance remaining after the construction of the first $s-1$ components (the components are built in increasing order of $s$);
- they have mean equal to zero.

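As a result, each component is a linear combination of the starting variables:

$$Y_s = v_{s1}X_1 + v_{s2}X_2 + \dots + v_{sp}X_p = \mathbf{v}_s^\top \mathbf{X}$$

We must, therefore, find the coefficients $\mathbf{v}_s$ that satisfy these constraints. This is a constrained maximization problem, where the first constraint is called Normalization:

$$\mathbf{v}_s^\top \mathbf{v}_s = \sum_{j=1}^{p} v_{sj}^2 = 1$$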

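For the first component, our system becomes:

$$\max_{\mathbf{v}_1} \operatorname{Var}(Y_1) \quad \text{subject to} \quad \mathbf{v}_1^\top \mathbf{v}_1 = 1$$

where the variance can be written as:

$$\operatorname{Var}(Y_1) = \operatorname{Var}(\mathbf{v}_1^\top \mathbf{X}) = \mathbf{v}_1^\top \Sigma\, \mathbf{v}_1$$

with $\Sigma$ the covariance matrix of the starting variables.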

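We can solve it with a Lagrange multiplier:

$$L_1 = \mathbf{v}_1^\top \Sigma\, \mathbf{v}_1 - \lambda_1 \left( \mathbf{v}_1^\top \mathbf{v}_1 - 1 \right)$$

We compute the gradient of $L_1$ and set it to zero:

$$\frac{\partial L_1}{\partial \mathbf{v}_1} = 2\Sigma\, \mathbf{v}_1 - 2\lambda_1 \mathbf{v}_1 = 0$$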

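The system $(\Sigma - \lambda_1 I)\,\mathbf{v}_1 = 0$ admits infinitely many non-zero solutions (among which we pick the one that respects the constraint) only if the rank of its coefficient matrix is lowered, that is, when

$$|\Sigma - \lambda I| = 0$$

The roots of this equation are the values $\lambda_s$, called eigenvalues of $\Sigma$. Since $\mathbf{v}_1^\top \Sigma\, \mathbf{v}_1 = \lambda_1$ at any such solution, the first component is obtained from the largest eigenvalue.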
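The derivation above can be checked numerically. The following is a minimal NumPy sketch, not the author's code: the synthetic data, the mixing matrix `A` and all variable names are assumptions made for illustration. It extracts the first PC as the eigenvector of the largest eigenvalue and compares its variance with that of random unit-norm directions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic correlated data (an assumption for illustration): 500 samples, 3 variables
A = np.array([[2.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.5, 0.3, 0.2]])
X = rng.standard_normal((500, 3)) @ A.T
X = X - X.mean(axis=0)             # centre the variables

Sigma = np.cov(X, rowvar=False)    # sample covariance matrix (p x p)

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(Sigma)
lambda_1 = eigenvalues[-1]         # largest eigenvalue
v_1 = eigenvectors[:, -1]          # corresponding unit-norm eigenvector

Y_1 = X @ v_1                      # scores of the first principal component
var_Y1 = Y_1.var(ddof=1)           # same n-1 convention as np.cov

# Variance v' Sigma v along 1000 random unit-norm directions, for comparison
directions = rng.standard_normal((1000, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
random_variances = np.einsum("ij,jk,ik->i", directions, Sigma, directions)

print("lambda_1:", lambda_1, "Var(Y_1):", var_Y1)
print("largest variance among random directions:", random_variances.max())
```

The variance of `Y_1` should match `lambda_1` up to floating-point error, and no random direction should exceed it.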

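Similarly, for the construction of the second PC (and likewise for all the others), the Orthogonality Constraint is added to our system. It follows from our request that the PCs be uncorrelated, and is expressed as follows:

$$\mathbf{v}_2^\top \mathbf{v}_1 = 0$$

Then, we set up the Lagrange function in p + 2 variables:

$$L_2 = \mathbf{v}_2^\top \Sigma\, \mathbf{v}_2 - \lambda_2 \left( \mathbf{v}_2^\top \mathbf{v}_2 - 1 \right) - \mu\, \mathbf{v}_2^\top \mathbf{v}_1$$

From this we obtain the second eigenvalue $\lambda_2$ (associated with $Y_2$), keeping in mind the properties of the principal components listed further on.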
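As a sketch of why the second component is again an eigenvector of $\Sigma$, assume the Lagrange function uses two multipliers: $\lambda_2$ for normalization and $\mu$ for orthogonality (the symbol $\mu$ is a notational assumption, not taken from the original). Setting the gradient to zero and premultiplying by $\mathbf{v}_1^\top$ shows that $\mu$ vanishes:

$$
\begin{aligned}
L_2 &= \mathbf{v}_2^\top \Sigma\, \mathbf{v}_2 - \lambda_2 \left( \mathbf{v}_2^\top \mathbf{v}_2 - 1 \right) - \mu\, \mathbf{v}_2^\top \mathbf{v}_1 \\
\frac{\partial L_2}{\partial \mathbf{v}_2} &= 2\Sigma\, \mathbf{v}_2 - 2\lambda_2 \mathbf{v}_2 - \mu\, \mathbf{v}_1 = 0 \\
\mathbf{v}_1^\top \frac{\partial L_2}{\partial \mathbf{v}_2} &= 2\,\mathbf{v}_1^\top \Sigma\, \mathbf{v}_2 - 2\lambda_2\, \mathbf{v}_1^\top \mathbf{v}_2 - \mu\, \mathbf{v}_1^\top \mathbf{v}_1 = 2\lambda_1 \cdot 0 - 0 - \mu = 0 \\
&\Rightarrow \mu = 0 \quad \Rightarrow \quad \Sigma\, \mathbf{v}_2 = \lambda_2 \mathbf{v}_2
\end{aligned}
$$

The third line uses $\mathbf{v}_1^\top \Sigma = \lambda_1 \mathbf{v}_1^\top$ (because $\Sigma$ is symmetric and $\Sigma\, \mathbf{v}_1 = \lambda_1 \mathbf{v}_1$), together with $\mathbf{v}_1^\top \mathbf{v}_2 = 0$ and $\mathbf{v}_1^\top \mathbf{v}_1 = 1$.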

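Principal Component properties:

- Each eigenvalue of $\Sigma$ is the variance of the respective PC: $\operatorname{Var}(Y_s) = \lambda_s$.
- $\Sigma$ is positive semidefinite, so $\lambda_s \ge 0$.
- Total Variance: $\operatorname{tr}(\Sigma) = \sum_{s=1}^{p} \lambda_s$.
- Generalized Variance: $|\Sigma| = \prod_{s=1}^{p} \lambda_s$.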
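These properties can be verified numerically. The sketch below is an illustration on randomly generated data (the data and all names are assumptions): it builds all p components and checks that their covariance matrix is diagonal with the eigenvalues on the diagonal, and that trace and determinant match the sum and product of the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic correlated data (an assumption for illustration): 300 samples, 4 variables
X = rng.standard_normal((300, 4)) @ rng.standard_normal((4, 4))
X = X - X.mean(axis=0)

Sigma = np.cov(X, rowvar=False)
eigenvalues, V = np.linalg.eigh(Sigma)
eigenvalues, V = eigenvalues[::-1], V[:, ::-1]   # sort in decreasing order

Y = X @ V                          # all p principal components
cov_Y = np.cov(Y, rowvar=False)    # expected to be diag(lambda_1, ..., lambda_p)

total_variance = np.trace(Sigma)
generalized_variance = np.linalg.det(Sigma)

print("PCs uncorrelated, Var(Y_s) = lambda_s:", np.allclose(cov_Y, np.diag(eigenvalues)))
print("trace equals sum of eigenvalues:", np.isclose(total_variance, eigenvalues.sum()))
print("determinant equals product of eigenvalues:", np.isclose(generalized_variance, eigenvalues.prod()))
```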

SELECTION CRITERIA:

There is no universally accepted and valid criterion for choosing the number k (with k < p) of PCs to keep in the analysis. It is therefore good practice to use several criteria jointly and always keep the needs of the analysis in mind. The main ones are:

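(1) Cumulative percentage of total variance reproduced

Starting from the absolute measure of the variance reproduced by each PC, $\lambda_s$, we normalize it by the total variance and accumulate it over the first k components:

$$\frac{\sum_{s=1}^{k} \lambda_s}{\sum_{s=1}^{p} \lambda_s} \times 100$$

We then set a threshold T on the cumulative percentage and keep the first k PCs that guarantee the threshold is exceeded.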
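A minimal sketch of this criterion follows. The helper name `components_for_threshold` and the eigenvalues are made up for illustration, and the threshold is expressed as a fraction rather than a percentage.

```python
import numpy as np

def components_for_threshold(eigenvalues, threshold):
    """Return the smallest k whose cumulative share of total variance reaches `threshold`."""
    eigenvalues = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
    # Index of the first cumulative share >= threshold, converted to a count
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, eigenvalues.size), cumulative

eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.5]    # made-up eigenvalues, total variance 8
k, cumulative = components_for_threshold(eigenvalues, threshold=0.80)

print(cumulative)   # [0.5  0.75  0.875  0.9375  1.]
print(k)            # 3
```

With a threshold of 80%, two components reproduce only 75% of the total variance, so the third one is needed.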

(2) Scree plot

It plots the eigenvalues against the order number of the component.

The first k PCs are selected based on where the slope flattens. In this specific case, the PCs to keep in the analysis would be the first two.
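A scree plot can be drawn with a few lines of Matplotlib. This is a sketch under assumptions: the helper `scree_plot` is hypothetical and the eigenvalues are made up so that the curve flattens from the third component on.

```python
import numpy as np
import matplotlib.pyplot as plt

def scree_plot(eigenvalues):
    """Plot eigenvalues (sorted in decreasing order) against the component number."""
    eigenvalues = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    component_numbers = np.arange(1, eigenvalues.size + 1)
    fig, ax = plt.subplots(figsize=(5, 4))
    ax.plot(component_numbers, eigenvalues, marker="o")
    ax.set_xticks(component_numbers)
    ax.set_xlabel("Component number")
    ax.set_ylabel("Eigenvalue")
    ax.set_title("Scree plot")
    return ax

# Made-up eigenvalues: the slope flattens from the third component on
ax = scree_plot([4.0, 2.0, 0.6, 0.5, 0.4])
# Call plt.show() to display the figure in an interactive session.
```

With real data, the eigenvalues would typically come from the `explained_variance_` attribute of a fitted scikit-learn `PCA` object.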

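(3) Kaiser criterion

Keep only the PCs whose eigenvalue is greater than 1 (valid only for standardized variables). Since each standardized variable has variance 1, a component with an eigenvalue below 1 reproduces less variance than a single starting variable.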
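As an illustration, the criterion can be applied with scikit-learn as sketched below. The use of `StandardScaler` and of the Wine dataset here is an assumption made for the example.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = load_wine().data
X_std = StandardScaler().fit_transform(X)   # zero mean, unit variance per variable

pca = PCA().fit(X_std)
# explained_variance_ holds the eigenvalues of the sample covariance matrix of X_std.
# It uses an n-1 denominator while StandardScaler uses n, so the average
# eigenvalue is n / (n - 1), marginally above 1.
eigenvalues = pca.explained_variance_

k_kaiser = int(np.sum(eigenvalues > 1))     # number of components with eigenvalue > 1
print("Components kept by the Kaiser criterion:", k_kaiser)
```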

Let's now implement this in Python.

We first import the necessary packages, NumPy and the PCA class from scikit-learn:

    import numpy as np
    from sklearn.decomposition import PCA

The class has the following signature:

    sklearn.decomposition.PCA(n_components=None, copy=True, whiten=False, svd_solver='auto', tol=0.0, iterated_power='auto', random_state=None)

We want to comment on the main parameters:

- n_components = number of components to keep.
- svd_solver = selects the algorithm used to compute the decomposition, among a few main alternatives. Note that, in the version described here, PCA does not support sparse data (for which you will need TruncatedSVD).
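The following sketch shows how these parameters might be used. The synthetic data and variable names are assumptions; the calls themselves are standard scikit-learn and SciPy.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import PCA, TruncatedSVD

rng = np.random.default_rng(0)
# Synthetic dense data (an assumption for illustration): 200 samples, 10 variables
X = rng.standard_normal((200, 10)) @ rng.standard_normal((10, 10))

# Integer n_components: keep exactly that many components
pca_int = PCA(n_components=3, svd_solver="full").fit(X)

# Float in (0, 1) with svd_solver="full": keep as many components as needed
# to explain at least that fraction of the total variance
pca_frac = PCA(n_components=0.90, svd_solver="full").fit(X)

# Sparse input: TruncatedSVD accepts SciPy sparse matrices (it does not centre the data)
X_sparse = sparse.random(200, 50, density=0.05, format="csr", random_state=0)
svd = TruncatedSVD(n_components=5, random_state=0).fit(X_sparse)
X_reduced = svd.transform(X_sparse)

print(pca_int.n_components_, pca_frac.n_components_, X_reduced.shape)
```

Passing a fraction to `n_components` is a direct way to apply the cumulative-variance criterion described earlier.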

We now try it on real data, for example the Wine dataset, which is available after the following imports:

    from sklearn import datasets
    import matplotlib.pyplot as plt
    from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older Matplotlib versions
    from sklearn import decomposition

Then we fit the PCA and plot the samples on the first three components:

    np.random.seed(123)
    wine = datasets.load_wine()
    X = wine.data
    y = wine.target

    fig = plt.figure(1, figsize=(5, 4))
    plt.clf()
    # Create the 3D axes through the figure (works on current Matplotlib versions)
    ax = fig.add_subplot(111, projection="3d", elev=30, azim=222)
    plt.cla()

    pca = decomposition.PCA(n_components=None)
    pca.fit(X)
    X = pca.transform(X)

    # Write each class name at the centroid of its points in PC space
    for label, name in enumerate(wine.target_names):
        ax.text3D(X[y == label, 0].mean(),
                  X[y == label, 1].mean() + 1.5,
                  X[y == label, 2].mean(), name,
                  bbox=dict(alpha=.5, edgecolor='b', facecolor='w'))

    # Reorder the labels to have colors matching the cluster results
    y = np.choose(y, [1, 2, 0]).astype(float)
    ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=y, cmap=plt.cm.nipy_spectral, edgecolor='r')

    # Hide the tick labels on all three axes
    ax.xaxis.set_ticklabels([])
    ax.yaxis.set_ticklabels([])
    ax.zaxis.set_ticklabels([])
    plt.show()

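This will be our result: a 3D scatter plot of the wine samples on the first three principal components, coloured by class.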
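Beyond the plot, the fitted object can also be inspected numerically, which is what the interpretation of the components ultimately relies on. A minimal sketch, assuming the same unstandardised Wine data as above:

```python
import numpy as np
from sklearn import datasets, decomposition

wine = datasets.load_wine()
X = wine.data                      # 178 samples, 13 variables, not standardised

pca = decomposition.PCA(n_components=None).fit(X)

ratios = pca.explained_variance_ratio_   # share of total variance per component
cumulative = np.cumsum(ratios)

for s, (r, c) in enumerate(zip(ratios[:3], cumulative[:3]), start=1):
    print(f"PC{s}: {r:.4f} of total variance, cumulative {c:.4f}")
```

Because the variables are not standardised here, those measured on larger scales weigh more heavily in the first components. Standardising first (for example with `StandardScaler`) is a common alternative when the units are not comparable.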