Understanding Principal Component Analysis and its Application in Data Science — Part 1

Learn the mathematical intuition behind PCA

Reza Bagheri
56 min read · Feb 7, 2021

Principal Component Analysis (PCA) is a method used to reduce the dimensionality of large datasets. We will also study the covariance matrix and the multivariate normal distribution in detail, since understanding them leads to a better understanding of PCA. The Python scripts in this article show how PCA can be implemented from scratch and with the scikit-learn library.
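To make the idea concrete before the math, here is a minimal sketch of PCA from scratch using NumPy. It is not the article's own script: the function name `pca_from_scratch`, the toy dataset, and the choice of an eigendecomposition of the sample covariance matrix are assumptions made for illustration.

```python
import numpy as np


def pca_from_scratch(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components.

    Hypothetical helper for illustration only.
    """
    # Center each feature so that it has zero mean
    X_centered = X - X.mean(axis=0)
    # Sample covariance matrix of the features (n_features x n_features)
    cov = np.cov(X_centered, rowvar=False)
    # eigh is meant for symmetric matrices; eigenvalues are returned in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the indices of the n_components largest eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]      # principal directions, one per column
    explained_variance = eigvals[order]  # variance captured along each direction
    # Coordinates of the centered data along the principal directions
    scores = X_centered @ components
    return scores, components, explained_variance


# Toy data (an assumption): 200 points drawn from a correlated 2-D normal distribution
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0, 0], cov=[[3, 2], [2, 2]], size=200)

scores, components, explained_variance = pca_from_scratch(X, n_components=1)
print(scores.shape)  # (200, 1)
```

Here the 2-dimensional data is reduced to a single coordinate per point. With scikit-learn, `sklearn.decomposition.PCA(n_components=1).fit_transform(X)` should give the same projection up to the sign of the component.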

Notation

Currently, Medium supports subscripts and superscripts only for some characters. So, to write the names of the variables, I use this notation: every character after ^ is a superscript character, and every character after _ (and before ^, if it is present) is a subscript character. For example, P with subscript B and superscript T is written as P_B^T in this notation.

In this article, bold-face italic lower-case letters (like a) refer to vectors. Bold-face italic capital letters (like A) refer to matrices, and italic lower-case letters (like a) refer to scalars. Capital letters (like X) refer to random variables, and bold-face capital letters (like X) refer to random vectors. The numbers above an equality sign refer to the numbers of the equations that were used to derive the expression on the…
