Understanding Principal Component Analysis and its Application in Data Science — Part 2

Learn the mathematical intuition behind PCA

import numpy as np
from sklearn.decomposition import PCA

np.random.seed(0)
mu = [2, 2]
Sigma = [[6, 4],
         [4, 6]]
points = np.random.multivariate_normal(mu, Sigma, 150)
pca = PCA(n_components=2)
pca.fit(points)
pca.explained_variance_
array([10.24723443, 1.96099192])
# Listing 18
sigma = pca.singular_values_
sigma
array([39.07477357, 17.09350158])
m = 150
sigma ** 2 / (m - 1)
array([10.24723443, 1.96099192])
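The match between Listings 18 and the one above is no coincidence: scikit-learn computes explained_variance_ as the squared singular values of the centered data matrix divided by m - 1. A minimal sketch verifying this on the same data:

```python
import numpy as np
from sklearn.decomposition import PCA

np.random.seed(0)
points = np.random.multivariate_normal([2, 2], [[6, 4], [4, 6]], 150)

pca = PCA(n_components=2).fit(points)

# explained_variance_ should equal sigma_i^2 / (m - 1), where sigma_i
# are the singular values of the centered data matrix.
m = points.shape[0]
variances = pca.singular_values_ ** 2 / (m - 1)
print(np.allclose(variances, pca.explained_variance_))  # True
```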
# Listing 20
from sklearn.datasets import fetch_openml
Xhat, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
print(Xhat.shape)
print(y.shape)
(70000, 784)
(70000,)
import matplotlib.pyplot as plt

plt.imshow(Xhat[0].reshape(28, 28), cmap='gray')
plt.show()
Figure 21
Figure 22
y[0]
'5'
Xhat /= 255
pca = PCA().fit(Xhat)
len(pca.explained_variance_[pca.explained_variance_ <= 1e-15])
71
Figure 23
Figure 24
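The 71 components with numerically zero variance come from the border pixels that are black in every image: a constant feature contributes no variance at all. A toy sketch (not the MNIST data itself) showing the effect with three constant columns:

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy stand-in for MNIST: the last 3 columns are constant, mimicking
# border pixels that are black in every image.
rng = np.random.default_rng(0)
X = np.hstack([rng.random((100, 7)), np.zeros((100, 3))])

pca = PCA().fit(X)
# Each constant column yields a direction of numerically zero variance.
print((pca.explained_variance_ <= 1e-15).sum())  # 3
```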
coordinates = pca.transform(Xhat)
Figure 25
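transform projects each image onto the principal components; inverse_transform maps those coordinates back to pixel space, which is how a compressed image can be reconstructed from a subset of components. A sketch on a small random stand-in for the MNIST matrix:

```python
import numpy as np
from sklearn.decomposition import PCA

# Random stand-in with MNIST's 784 features (the real data would come
# from fetch_openml as above).
rng = np.random.default_rng(0)
X = rng.random((100, 784))

pca = PCA(n_components=50).fit(X)
coords = pca.transform(X)              # shape (100, 50)
X_rec = pca.inverse_transform(coords)  # back to shape (100, 784)
print(coords.shape, X_rec.shape)
```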
# Listing 23
plt.plot(range(1, 785), np.cumsum(pca.explained_variance_ratio_), marker="o")
plt.xlabel('Number of components', fontsize=14)
plt.ylabel('Explained variance ratio', fontsize=14)
plt.show()
Figures 26–34
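Instead of reading the cutoff from the cumulative plot, PCA can pick the number of components itself: passing a float in (0, 1) as n_components keeps just enough components to explain that fraction of the total variance. A sketch on random data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.random((200, 50))

# A float in (0, 1) tells PCA to keep the smallest number of
# components whose cumulative explained variance reaches that level.
pca = PCA(n_components=0.95).fit(X)
print(pca.n_components_)
print(pca.explained_variance_ratio_.sum() >= 0.95)  # True
```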
np.random.seed(0)
mu = [2, 2]
Sigma = [[6, 4],
         [4, 6]]
points = np.random.multivariate_normal(mu, Sigma, 150)
np.round(np.cov(points.T), 2)
array([[5.82, 4.13],
       [4.13, 6.39]])
points = np.random.multivariate_normal(mu, Sigma, 30000)
np.round(np.cov(points.T), 2)
array([[6.03, 4.02],
       [4.02, 5.96]])
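np.cov uses the unbiased sample covariance estimator, dividing by m - 1 after centering, which is why the estimate approaches the true Sigma as the sample grows. This can be checked by computing the estimator by hand:

```python
import numpy as np

np.random.seed(0)
points = np.random.multivariate_normal([2, 2], [[6, 4], [4, 6]], 30000)

# Unbiased sample covariance: S = (X - mean)^T (X - mean) / (m - 1)
m = points.shape[0]
centered = points - points.mean(axis=0)
S_manual = centered.T @ centered / (m - 1)
print(np.allclose(S_manual, np.cov(points.T)))  # True
```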
Figures 35–45

Data Scientist and Researcher. LinkedIn: https://www.linkedin.com/in/reza-bagheri-71882a76/
