Understanding Principal Component Analysis and its Application in Data Science — Part 1
Learn the mathematical intuition behind PCA
Principal Component Analysis (PCA) is a method used to reduce the dimensionality of large datasets. In this article, we will also study the covariance matrix and the multivariate normal distribution in detail, since understanding them leads to a better understanding of PCA. The Python scripts in this article show how PCA can be implemented both from scratch and with the scikit-learn library.
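To give a rough preview of the "from scratch versus library" comparison, here is a minimal sketch of one common from-scratch approach. It centers the data, eigendecomposes the sample covariance matrix, and projects onto the eigenvectors with the largest eigenvalues. It then checks the result against scikit-learn's PCA class. The toy dataset and the function name pca_from_scratch are hypothetical and made up for illustration. This is not the article's own script, and it assumes NumPy and scikit-learn are installed.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical toy data: 200 samples from a correlated 3-D normal distribution
rng = np.random.default_rng(0)
cov = np.array([[3.0, 1.5, 0.5],
                [1.5, 2.0, 0.3],
                [0.5, 0.3, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0, 0.0], cov=cov, size=200)


def pca_from_scratch(X, n_components):
    """Hypothetical helper: project X onto its top principal components."""
    X_centered = X - X.mean(axis=0)
    # Sample covariance matrix (columns are features, n - 1 denominator)
    S = np.cov(X_centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]          # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]     # top eigenvectors as columns
    return X_centered @ components, eigvals[:n_components]


Z_scratch, var_scratch = pca_from_scratch(X, n_components=2)

pca = PCA(n_components=2)
Z_sklearn = pca.fit_transform(X)

# Eigenvector signs are arbitrary, so compare each projected column up to a sign flip
for j in range(2):
    same = np.allclose(Z_scratch[:, j], Z_sklearn[:, j], atol=1e-8)
    flipped = np.allclose(Z_scratch[:, j], -Z_sklearn[:, j], atol=1e-8)
    print(f"component {j} matches (up to sign):", same or flipped)

# Eigenvalues of the covariance matrix vs. scikit-learn's explained variances
print("variances match:", np.allclose(var_scratch, pca.explained_variance_))
```

The eigenvalues can be compared directly with `explained_variance_` because both `np.cov` (by default) and scikit-learn's variance estimate use the n - 1 denominator. The projections can only be compared up to a sign, since an eigenvector and its negative are equally valid.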
Notation
Currently, Medium supports subscripts and superscripts for only some characters. So, to write the names of variables, I use the following notation: every character after ^ is a superscript character, and every character after _ (and before ^, if it is present) is a subscript character. For example, P with subscript B and superscript T is written as P_B^T in this notation.
In this article, bold-face italic lower-case letters (like a) refer to vectors, bold-face italic capital letters (like A) refer to matrices, and italic lower-case letters (like a) refer to scalars. Capital letters (like X) refer to random variables, and bold-face capital letters (like X) refer to random vectors. A number written above an equality sign refers to the number of the equation that was used to derive the expression on the…