Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.


Prerequisites

ConceptsImportanceNotes
Linear algebraNecessary
  • Time to learn: 10


Imports

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

Generate data

sns.set_theme(style="whitegrid", context="notebook", font_scale=1.2)
rng = np.random.default_rng(7)

center = np.array([2.0, 1.0])
scales = np.array([2.0, 0.6])
angle = np.pi / 4

rotation = np.array(
    [[np.cos(angle), -np.sin(angle)],
     [np.sin(angle), np.cos(angle)]]
)

n_samples = 10_000
cloud = rotation @ np.diag(scales) @ rng.standard_normal((2, n_samples)) + center[:, None]


plt.figure(figsize=(5, 5))
plt.plot(cloud[0], cloud[1], "o", markersize=2, alpha=0.5, color='k')
plt.plot(center[0], center[1], "o", color="red", markersize=3, label="Center")
plt.legend()
plt.axis("equal")
plt.title("2D Gaussian Cloud");
<Figure size 500x500 with 1 Axes>

Analysis

Remove mean:

mean = cloud.mean(axis=1, keepdims=True)
detrended_cloud = cloud - mean

Construct covariance matrices

cov_matrix_1 = (detrended_cloud @ detrended_cloud.T) / n_samples
cov_matrix_2 = (detrended_cloud.T @ detrended_cloud) / n_samples

print("Covariance matrix 1 shape:", cov_matrix_1.shape)
print("Covariance matrix 2 shape:", cov_matrix_2.shape)
Covariance matrix 1 shape: (2, 2)
Covariance matrix 2 shape: (10000, 10000)

Since matrix 1 is much smaller, we will find its eigenvalues.

eigen_val, eigen_vect = np.linalg.eig(cov_matrix_1)
eigen_val
array([3.95439111, 0.35316582])

Visualize results

eigen_vect
array([[ 0.70990989, -0.70429252], [ 0.70429252, 0.70990989]])
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4),
                                        sharex=True, sharey=True)

ax_left.scatter(cloud[0], cloud[1], s=2, c="black", alpha=0.35)
ax_left.set_title("Generated cloud")

ax_right.scatter(cloud[0], cloud[1], s=2, c="black", alpha=0.35)
ax_right.set_title("PCA axes")

for direction, length in zip(eigen_vect.T, eigen_val):
    endpoint = mean[:, 0] + direction * np.sqrt(length)
    ax_right.arrow(
        mean[0, 0],
        mean[1, 0],
        endpoint[0] - mean[0, 0],
        endpoint[1] - mean[1, 0],
        color="magenta",
        linewidth=1.5,
        length_includes_head=True,
        head_width=0.15,
        head_length=0.2,
        zorder=5
    )

for ax in (ax_left, ax_right):
    ax.plot(center[0], center[1], "o", color="red", markersize=3, label="Center")
    ax.set_aspect("equal")
    ax.set_xlim(-6, 8)
    ax.set_ylim(-6, 8)
<Figure size 1000x400 with 2 Axes>

Summary

TBD

What’s next?

TBD

Resources and references

TBD