We will use well-established Python libraries for support vector machines (SVMs) to illustrate parts 1-2.
from sklearn import svm
Support vector machines (SVMs) are a widely used class of machine learning algorithms. Named after the support vectors that determine their decision boundary, SVMs are binary classifiers: they are designed for scenarios with exactly two classes, typically labeled 0 and 1. In practice, however, many problems require multi-class classification.
For example, consider the iris dataset, which includes three species: setosa, versicolor, and virginica, along with measurements related to these species, such as the length and width of the sepal.
from sklearn import datasets
iris = datasets.load_iris()
import matplotlib.pyplot as plt
_, ax = plt.subplots()
scatter = ax.scatter(iris.data[:, 0], iris.data[:, 1], c=iris.target)
ax.set(xlabel=iris.feature_names[0], ylabel=iris.feature_names[1])
_ = ax.legend(
scatter.legend_elements()[0], iris.target_names, loc="lower right", title="Classes"
)
We would like to use the features of the sepal (e.g., its width and length) to classify the species. How should we modify the binary SVM to handle more than two classes?
Basic idea: break down the problem into multiple binary classification problems.
Suppose there are $M$ classes in total, denoted as $\{1, 2, \ldots, M\}$. The training procedure is:
For any two classes $i, j \in \{1, 2, \ldots, M\}$ with $i < j$, train a binary SVM.
Therefore, there are $\frac{1}{2}(M-1)M$ binary SVMs in total (for the iris dataset with $M = 3$, this gives 3 binary classifiers).
Prediction procedure:
For a new input $\mathbf{x}$, we apply all $\frac{1}{2}(M-1)M$ binary SVMs, each of which casts a vote for one of its two classes. The class with the most votes is the predicted label.
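Before turning to the built-in implementation, here is a minimal sketch of this training-and-voting scheme on the iris data (illustrative only; scikit-learn's SVC performs this pairwise voting internally for multi-class problems, and the helper ovo_predict below is our own):
import numpy as np
from itertools import combinations
from sklearn import datasets, svm
iris_ovo = datasets.load_iris()
X_ovo, y_ovo = iris_ovo.data[:, :2], iris_ovo.target
classes = np.unique(y_ovo)  # M = 3 classes for iris
# Training: one binary SVM for each pair of classes, i.e. M(M-1)/2 = 3 models here
pairwise_models = {}
for i, j in combinations(classes, 2):
    mask = np.isin(y_ovo, [i, j])
    pairwise_models[(i, j)] = svm.SVC(kernel="linear", C=1.0).fit(X_ovo[mask], y_ovo[mask])
# Prediction: every pairwise model casts a vote; the class with the most votes wins
def ovo_predict(x_new):
    votes = {c: 0 for c in classes}
    for clf_ij in pairwise_models.values():
        votes[clf_ij.predict(x_new.reshape(1, -1))[0]] += 1
    return max(votes, key=votes.get)
print("predicted:", ovo_predict(X_ovo[0]), " true:", y_ovo[0])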
Here is an example of employing the one-versus-one approach with a multi-class support vector machine in Python. For simplicity, the regularization parameter $C$ in the binary SVMs is set to 1.0.
# Take the first two features. We could avoid this by using a two-dim dataset
X = iris.data[:, :2]
y = iris.target
# SVM regularization parameter
C = 1.0
# Initialize the one-versus-one SVM with a linear kernel and regularization parameter C
clf = svm.SVC(kernel="linear", C=C)
# Fit the training data
clf.fit(X, y)
Then, we can visualize the fitted SVM model:
from sklearn.inspection import DecisionBoundaryDisplay
fig, ax = plt.subplots(1, 1)
disp = DecisionBoundaryDisplay.from_estimator(
clf,
X,
response_method="predict",
cmap=plt.cm.coolwarm,
alpha=0.8,
ax=ax,
xlabel=iris.feature_names[0],
ylabel=iris.feature_names[1],
)
ax.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm, s=20, edgecolors="k")
ax.set_xticks(())
ax.set_yticks(())
ax.set_title("One-versus-one SVM")
plt.show()
Basic idea: distinguish each label from all the others; the class with the highest score wins.
Training procedure:
For $i \in \{1, \ldots, M\}$, train a binary SVM to classify label $i$ and the other labels.
Consequently, there are $M$ binary SVMs.
Prediction procedure for a new input $\mathbf{x}$:
For each binary SVM with estimated parameters $\mathbf{w}^{(i)}, b^{(i)}$, compute the classification score $\langle \mathbf{w}^{(i)}, \mathbf{x} \rangle + b^{(i)}$.
$\arg\max_{i} \left( \langle \mathbf{w}^{(i)}, \mathbf{x} \rangle + b^{(i)} \right)$ is the predicted label.
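As a minimal sketch of this scoring rule (illustrative only, reusing X and y from the cells above; LinearSVC below implements the one-vs-rest strategy internally):
import numpy as np
labels = np.unique(y)
ovr_scores = []
for i in labels:
    clf_i = svm.SVC(kernel="linear", C=1.0).fit(X, (y == i).astype(int))  # class i vs. the rest
    ovr_scores.append(clf_i.decision_function(X))  # <w^(i), x> + b^(i)
y_pred = labels[np.argmax(np.vstack(ovr_scores), axis=0)]
print("training accuracy of the manual one-vs-rest SVM:", np.mean(y_pred == y))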
How do we use the one-vs-rest multi-class SVM in Python? The scikit-learn library already implements the one-vs-rest approach in LinearSVC, which has a usage similar to SVC. We illustrate it on the iris dataset.
# Initialize the one-vs-rest SVM and set the regularization parameter
clf = svm.LinearSVC(C=C, max_iter=10000)
# Fit the training data
clf.fit(X, y)
fig, ax = plt.subplots(1, 1)
disp = DecisionBoundaryDisplay.from_estimator(
clf,
X,
response_method="predict",
cmap=plt.cm.coolwarm,
alpha=0.8,
ax=ax,
xlabel=iris.feature_names[0],
ylabel=iris.feature_names[1],
)
ax.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm, s=20, edgecolors="k")
ax.set_xticks(())
ax.set_yticks(())
ax.set_title("One-versus-rest SVM")
plt.show()
Instead of decomposing the multi-class problem into binary tasks, the Crammer-Singer (CS) approach proposes a direct method for training multi-class predictors by optimizing a joint objective over all classes.
Suggested reference: Crammer, Koby, and Yoram Singer. "On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines." Journal of Machine Learning Research 2 (2001): 265-292.
Recall the soft-margin (soft classification) formulation of the SVM:
$$\min_{\mathbf{w}, b, \boldsymbol{\xi}} \; \frac{1}{2} \left\|\mathbf{w}\right\|_{2}^2 + C \sum_{i=1}^n \xi_i \quad \text{s.t. } y_i(\langle \mathbf{w}, \mathbf{x}_i \rangle + b) \geq 1 - \xi_i, \;\; \xi_i \geq 0 \;\; \forall i.$$
The $C$ parameter trades off correct classification of training examples against maximization of the decision function's margin. For larger values of $C$, a smaller margin will be accepted if the decision function is better at classifying all training points correctly. A lower $C$ encourages a larger margin, and therefore a simpler decision function, at the cost of training accuracy. In other words, $C$ acts as a regularization parameter in the SVM. The basic guidelines are summarized below:
If the underlying data are very noisy, you should decrease $C$: decreasing $C$ corresponds to more regularization.
Otherwise, you can increase $C$, since the hard-margin assumption is more likely to hold and less regularization is needed, as illustrated by the short sketch below.
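As a small illustration of this trade-off (a sketch reusing the two iris features X, y and the svm import from above; the exact counts depend on the data), a smaller $C$ typically leaves more points inside the margin and hence more support vectors:
# Effect of C (sketch): smaller C -> more regularization -> typically a wider,
# softer margin with more support vectors
for C_val in (0.01, 1.0, 100.0):
    clf_C = svm.SVC(kernel="linear", C=C_val).fit(X, y)
    print("C =", C_val, "-> support vectors per class:", clf_C.n_support_)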
For SVC and LinearSVC, $C=1$ by default, which is a reasonable starting point. Given a dataset, a more principled way to select $C$ is cross-validation.
The simplest way to use cross-validation is to call the cross_val_score helper function in scikit-learn on the estimator and the dataset.
The following example demonstrates how to estimate the accuracy of a linear-kernel support vector machine on the iris dataset by splitting the data, fitting a model, and computing the score 5 consecutive times, with a different split each time:
from sklearn.model_selection import cross_val_score
import numpy as np
from sklearn import datasets
X, y = datasets.load_iris(return_X_y=True)
C_list = np.logspace(start=-3, stop=3, num=16)
print(np.round(C_list, 4))
[1.000000e-03 2.500000e-03 6.300000e-03 1.580000e-02 3.980000e-02 1.000000e-01 2.512000e-01 6.310000e-01 1.584900e+00 3.981100e+00 1.000000e+01 2.511890e+01 6.309570e+01 1.584893e+02 3.981072e+02 1.000000e+03]
Above are 16 candidate values of $C$, spanning the interval $[10^{-3}, 10^3]$ on a logarithmic grid. We will choose the value that leads to the highest cross-validation score. Below is the implementation.
clf = svm.SVC(kernel='linear', random_state=42)
score_list = []
for C in C_list:
clf.set_params(C = C)
scores = cross_val_score(clf, X, y, cv=5)
score_list.append(scores.mean())
plt.scatter(np.log10(np.array(C_list)), np.array(score_list))
plt.xlabel("C")
plt.ylabel("Five-fold cross validation score")
plt.show()
We can see that a moderate value of $C$ can lead to the highest cross-validation score for the iris dataset.
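To read off the selected value programmatically (a small addition to the loop above):
# Pick the C value with the highest mean 5-fold cross-validation score
best_C = C_list[int(np.argmax(score_list))]
print("selected C =", np.round(best_C, 4))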
Other important tuning parameters for SVMs are the parameters of the kernel function. The most common kernel functions are listed below (a short snippet after the list shows the corresponding SVC arguments):
Polynomial kernel: $(\gamma\langle \mathbf{x}, \mathbf{x}'\rangle + r)^d$
Radial basis function (RBF) kernel: $\exp(-\gamma\|\mathbf{x} - \mathbf{x}'\|^2_2)$, where $\gamma > 0$
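In scikit-learn, these kernels are selected via the kernel argument of SVC, with $\gamma$, $r$, and $d$ mapping to the gamma, coef0, and degree parameters, respectively. For example:
# Polynomial kernel: (gamma * <x, x'> + coef0) ** degree
poly_svm = svm.SVC(kernel="poly", degree=3, gamma=1.0, coef0=1.0, C=1.0)
# RBF kernel: exp(-gamma * ||x - x'||^2)
rbf_svm = svm.SVC(kernel="rbf", gamma=0.5, C=1.0)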
Next, we will illustrate the effect of the parameter $\gamma$ of the RBF kernel.
Intuitively, the $\gamma$ parameter defines how far the influence of a single training example reaches, with low values meaning 'far' and high values meaning 'close'. The $\gamma$ parameter can be seen as the inverse of the radius of influence of the samples selected by the model as support vectors. When $\gamma$ is very small, the model is too constrained and cannot capture the complexity or 'shape' of the data: the region of influence of any selected support vector includes the whole training set, and the resulting model behaves similarly to a linear model with a set of hyperplanes separating the high-density centers of any pair of classes. To illustrate this, we plot the decision function of the RBF SVM for several combinations of $C$ and $\gamma$.
X_2d = X[:, :2]
X_2d = X_2d[y > 0]
y_2d = y[y > 0]
y_2d -= 1
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X = scaler.fit_transform(X)
X_2d = scaler.fit_transform(X_2d)
from sklearn.model_selection import GridSearchCV
C_2d_range = [1e-2, 1, 1e2]
gamma_2d_range = [1e-1, 1, 1e1]
classifiers = []
for C in C_2d_range:
for gamma in gamma_2d_range:
clf = svm.SVC(C=C, gamma=gamma)
clf.fit(X_2d, y_2d)
classifiers.append((C, gamma, clf))
plt.figure(figsize=(8, 6))
xx, yy = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
for k, (C, gamma, clf) in enumerate(classifiers):
# evaluate decision function in a grid
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
# visualize decision function for these parameters
plt.subplot(len(C_2d_range), len(gamma_2d_range), k + 1)
plt.title("gamma=10^%d, C=10^%d" % (np.log10(gamma), np.log10(C)), size="medium")
# visualize parameter's effect on decision function
plt.pcolormesh(xx, yy, -Z, cmap=plt.cm.RdBu)
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y_2d, cmap=plt.cm.RdBu_r, edgecolors="k")
plt.xticks(())
plt.yticks(())
plt.axis("tight")
Thus, we search for $C$ and $\gamma$ simultaneously and visualize the cross-validation score in a heatmap.
# select C and gamma
C_range = np.logspace(-2, 10, 13)
gamma_range = np.logspace(-9, 3, 13)
param_grid = dict(gamma=gamma_range, C=C_range)
grid = GridSearchCV(svm.SVC(), param_grid=param_grid, cv=5)
grid.fit(X, y)
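After fitting, the best parameter combination and its cross-validation score can be read off directly (the exact values depend on the run):
# Best (C, gamma) combination found by the 5-fold grid search
print("best parameters:", grid.best_params_)
print("best cross-validation score:", np.round(grid.best_score_, 3))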
# visualize cross validation score
import pandas as pd
def make_heatmap(ax, gs, is_sh=False, make_cbar=False):
"""Helper to make a heatmap."""
results = pd.DataFrame(gs.cv_results_)
results[["param_C", "param_gamma"]] = results[["param_C", "param_gamma"]].astype(
np.float64
)
if is_sh:
scores_matrix = results.sort_values("iter").pivot_table(
index="param_gamma",
columns="param_C",
values="mean_test_score",
aggfunc="last",
)
else:
scores_matrix = results.pivot(
index="param_gamma", columns="param_C", values="mean_test_score"
)
im = ax.imshow(scores_matrix)
ax.set_xticks(np.arange(len(C_range)))
ax.set_xticklabels(["{:.0E}".format(x) for x in C_range])
ax.set_xlabel("C", fontsize=15)
ax.set_yticks(np.arange(len(gamma_range)))
ax.set_yticklabels(["{:.0E}".format(x) for x in gamma_range])
ax.set_ylabel("gamma", fontsize=15)
# Rotate the tick labels and set their alignment.
plt.setp(ax.get_xticklabels(), rotation=45, ha="right", rotation_mode="anchor")
if is_sh:
iterations = results.pivot_table(
index="param_gamma", columns="param_C", values="iter", aggfunc="max"
).values
for i in range(len(gamma_range)):
for j in range(len(C_range)):
ax.text(
j,
i,
iterations[i, j],
ha="center",
va="center",
color="w",
fontsize=20,
)
if make_cbar:
fig.subplots_adjust(right=0.8)
cbar_ax = fig.add_axes([0.85, 0.15, 0.05, 0.7])
fig.colorbar(im, cax=cbar_ax)
cbar_ax.set_ylabel("mean_test_score", rotation=-90, va="bottom", fontsize=15)
fig, ax = plt.subplots(ncols=1, sharey=True)
make_heatmap(ax, grid, make_cbar=True)
ax.set_title("GridSearch", fontsize=15)
plt.show()
In this part, we will introduce kernel principal component analysis (KPCA), which is an extension of principal component analysis (PCA) that utilizes the techniques of kernel methods. By using a kernel, the originally linear operations of PCA are performed in a reproducing kernel Hilbert space, showcasing distinctive advantages not shared by PCA. To illustrate this, we create a dataset comprised of two nested circles.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
X, y = make_circles(n_samples=1000, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
import matplotlib.pyplot as plt
_, ax = plt.subplots(ncols=1, sharex=True, sharey=True, figsize=(4, 4))
ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train)
ax.set_ylabel("Feature #1")
ax.set_xlabel("Feature #0")
ax.set_title("Training data")
Now, we will use PCA both with and without a kernel to observe the effect of the kernel. The kernel used here is an RBF kernel. We use the PCA and KernelPCA implementations in scikit-learn.
from sklearn.decomposition import PCA, KernelPCA
pca = PCA(n_components=2)
kernel_pca = KernelPCA(
n_components=2, kernel="rbf", gamma=3, fit_inverse_transform=True, alpha=0.1
)
X_test_pca = pca.fit(X_train).transform(X_test)
X_test_kernel_pca = kernel_pca.fit(X_train).transform(X_test)
fig, (orig_data_ax, pca_proj_ax, kernel_pca_proj_ax) = plt.subplots(
ncols=3, figsize=(14, 4)
)
orig_data_ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test)
orig_data_ax.set_ylabel("Feature #1")
orig_data_ax.set_xlabel("Feature #0")
orig_data_ax.set_title("Testing data")
pca_proj_ax.scatter(X_test_pca[:, 0], X_test_pca[:, 1], c=y_test)
pca_proj_ax.set_ylabel("Principal component #1")
pca_proj_ax.set_xlabel("Principal component #0")
pca_proj_ax.set_title("Projection of testing data\n using PCA")
kernel_pca_proj_ax.scatter(X_test_kernel_pca[:, 0], X_test_kernel_pca[:, 1], c=y_test)
kernel_pca_proj_ax.set_ylabel("Principal component #1")
kernel_pca_proj_ax.set_xlabel("Principal component #0")
_ = kernel_pca_proj_ax.set_title("Projection of testing data\n using KernelPCA")
Looking at the PCA projection in the middle panel, we see that the data are only rotated, not rescaled: the dataset, composed of two concentric circles centered at zero, retains its original isotropy. Consequently, a linear classifier applied to this projection would still be unable to separate the two classes.
Employing a kernel facilitates a non-linear projection. Specifically, through the application of an RBF kernel, we anticipate the dataset to expand in such a manner that it largely maintains the relative proximities of data point pairs that are nearby in the initial space. This effect is visible in the right-hand figure, where data points belonging to the same class are more closely grouped than those of the opposing class, effectively disentangling the two sets of samples. This arrangement now permits the application of a linear classifier to segregate the samples into their respective classes.
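As a quick sanity check (a sketch reusing the fitted pca and kernel_pca objects and the train/test split from above), we can fit a linear SVM on each projection and compare test accuracies; only the kernel PCA projection is expected to be (nearly) linearly separable.
# Sketch: a linear classifier on the plain PCA projection vs. the kernel PCA projection
linear_on_pca = svm.LinearSVC(max_iter=10000).fit(pca.transform(X_train), y_train)
linear_on_kpca = svm.LinearSVC(max_iter=10000).fit(kernel_pca.transform(X_train), y_train)
print("test accuracy on PCA projection:      ", linear_on_pca.score(X_test_pca, y_test))
print("test accuracy on KernelPCA projection:", linear_on_kpca.score(X_test_kernel_pca, y_test))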
Before we introduce kernel canonical correlation analysis, we first review canonical correlation analysis (CCA). In CCA, we are given a pair of random variables $X \in \mathbb{R}^p$ and $Y \in \mathbb{R}^q$. In practice, we observe paired samples $(X_i, Y_i)$, $i \in \{1, \ldots, N\}$; for example, $(X_i, Y_i)$ could be two different sets of measurements of the same person.
We want to find a pair of vectors $A \in \mathbb{R}^p$ and $B \in \mathbb{R}^q$, for $X$ and $Y$ respectively, so that $X$ and $Y$ are maximally correlated after the mapping: $$\text{CCA}(X,Y) = \underset{A \in \mathbb{R}^p, B \in \mathbb{R}^q}{\operatorname{\max}}\; \mathrm{corr}(A^\top X, B^\top Y).$$
With dimension reduction as a key step, CCA can be used as a rough 'independence' test: given two samples, how dependent is one on the other? This has a wide range of applications, for example checking whether two images are similar.
We will illustrate it using the original cat image and the transformed ones.
Image rotation is very easy with the PIL module:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import cv2
img_rgb = cv2.imread("cat.jpg")[
..., ::-1
] # cv2 reads in image in BGR order; we need to switch it back to RGB
imgplot = plt.imshow(img_rgb)
plt.show()
from PIL import Image, ImageOps
# use a smaller image because we will need to stack the pixels vertically, 3072x3072 rows would be A LOT
img_rgb_small = Image.fromarray(np.uint8(img_rgb)).resize((64, 64))
img_rgb_small = np.array(img_rgb_small)
img_rotated_45 = Image.fromarray(np.uint8(img_rgb_small)).rotate(
45
) # convert numpy array (img) to PIL image, then rotate
img_rotated_45 = np.array(img_rotated_45) # convert PIL image back to numpy array
plt.rcParams["figure.figsize"] = [5, 5]
plt.subplot(1, 2, 1)
plt.imshow(img_rgb_small)
plt.title("Original (downsized)")
plt.subplot(1, 2, 2)
plt.imshow(img_rotated_45)
plt.title("Rotated")
plt.show()
CCA assumes that the data points in the two matrices are paired. The resized original image is of size (64, 64, 3). We flatten the spatial dimensions so that each row is one pixel with its 3 channel values, giving a (4096, 3) matrix for the resized original image and for each transformed one. This can be done using np.reshape().
Example pixel value before reshaping:
print(img_rgb_small.shape)
img_rgb_small[-3:, 0, :]
(64, 64, 3)
array([[92, 76, 62], [83, 68, 55], [78, 65, 52]], dtype=uint8)
The corresponding rows that represent the example pixels:
img_rgb_flattened = np.reshape(img_rgb_small, [-1, 3])
print(img_rgb_flattened.shape)
img_rgb_flattened[64 * 61, :], img_rgb_flattened[64 * 62, :], img_rgb_flattened[
64 * 63, :
]
(4096, 3)
(array([92, 76, 62], dtype=uint8), array([83, 68, 55], dtype=uint8), array([78, 65, 52], dtype=uint8))
Reshape the transformed image as well.
img_rotated_45_flattened = np.reshape(img_rotated_45, [-1, 3])
img_rotated_45_flattened.shape
(4096, 3)
Now we can run CCA.
from sklearn.cross_decomposition import CCA
cca = CCA(n_components=1, scale=False)
cca.fit(img_rgb_flattened, img_rotated_45_flattened)
Recall that we are looking for a vector $A \in \mathbb{R}^{3}$ and a vector $B \in \mathbb{R}^{3}$. They can be retrieved via the fitted attributes x_rotations_ and y_rotations_.
cca.x_rotations_.shape, cca.y_rotations_.shape
((3, 1), (3, 1))
cca.x_rotations_, cca.y_rotations_
(array([[-0.32409948], [ 0.84857033], [-0.41819604]]), array([[ 0.53220846], [ 0.26005736], [-0.80568252]]))
To get the reduced data, we can compute it manually according to the formula. Note that although we turned off scale, the implementation still centers the data, so for the manual calculation we need to center our data first.
img_reduced = np.dot(img_rgb_flattened - cca._x_mean, cca.x_rotations_)
img_rotated_45_reduced = np.dot(
img_rotated_45_flattened - cca._y_mean, cca.y_rotations_
)
img_reduced.shape, img_rotated_45_reduced.shape
((4096, 1), (4096, 1))
img_reduced[:5, :], img_rotated_45_reduced[:5, :]
(array([[-6.23217452], [-6.33844932], [-6.33844932], [-6.33844932], [-6.33844932]]), array([[-9.43415382], [-9.43415382], [-9.43415382], [-9.43415382], [-9.43415382]]))
Or, we can use the transform() method implemented in sklearn.
img_reduced, img_rotated_45_reduced = cca.transform(
img_rgb_flattened, img_rotated_45_flattened
)
img_reduced.shape, img_rotated_45_reduced.shape
((4096, 1), (4096, 1))
The results should be the same as our manual calculation.
img_reduced[:5, :], img_rotated_45_reduced[:5, :]
(array([[-6.23217452], [-6.33844932], [-6.33844932], [-6.33844932], [-6.33844932]]), array([[-9.43415382], [-9.43415382], [-9.43415382], [-9.43415382], [-9.43415382]]))
We can check the correlation coefficient on the two reduced vectors.
from scipy.stats import pearsonr
pearsonr(img_reduced[:, 0], img_rotated_45_reduced[:, 0])[0]
0.5678747482419151
The 1-dimensional embeddings of the original image and the rotated image are correlated but far from identical, with a Pearson correlation coefficient of about 0.57. Even though the rotation changes the matrix representation of the image dramatically, CCA is still able to pick up the similarity, i.e. the lack of independence, between the two.
What if we rotate the image to different angles? Will we see a difference in the Pearson correlation score?
degrees = [0, 10, 20, 30, 40, 50]
d_img_cca = {}
for degree in degrees:
d_img_cca[degree] = {}
img_rotated = Image.fromarray(np.uint8(img_rgb_small)).rotate(
degree
) # convert numpy array (img) to PIL image, then rotate
img_rotated = np.array(img_rotated) # convert PIL image back to numpy array
d_img_cca[degree]["img_rotated"] = img_rotated
img_rotated_flattened = np.reshape(img_rotated, [-1, 3])
cca = CCA(n_components=1, scale=False)
cca.fit(img_rgb_flattened, img_rotated_flattened)
img_reduced, img_rotated_reduced = cca.transform(
img_rgb_flattened, img_rotated_flattened
)
d_img_cca[degree]["r"] = pearsonr(img_reduced[:, 0], img_rotated_reduced[:, 0])
plt.rcParams["figure.figsize"] = [20, 3]
for j, (degree, v) in enumerate(d_img_cca.items()):
img_rotated = v["img_rotated"]
plt.subplot(1, len(degrees), j + 1)
plt.imshow(img_rotated)
plt.title(
f"rotated {degree} degree\nPearson R:\n{v['r'][0]:.2f} (p={v['r'][1]:.2f})"
)
plt.show()
Canonical Correlation Analysis by nature applies a linear transformation: $$\text{CCA}(X,Y) = \underset{A \in \mathbb{R}^p, B \in \mathbb{R}^q}{\operatorname{max}} \mathrm{corr}(A^\top X, B^\top Y)$$
Linear transformation is not always sufficient. You may want to turn to non-linear transformations for help.
This is addressed in Kernel Canonical Correlation Analysis (KCCA). $$\text{KCCA}(X,Y) = \underset{f,g}{\operatorname{max}} \mathrm{corr}(f(X), g(Y))$$ $$\mathrm{corr}(f(X),g(Y)) = \frac{\text{cov}_{XY}(f(X),g(Y))}{\sqrt{\text{var}_X f(X) \text{var}_Y g(Y)}}$$
We will use the ITE Toolbox (Bitbucket repository) to illustrate KCCA.
!git clone https://bitbucket.org/szzoli/ite-in-python.git
!mv ite-in-python/ite .
from ite.cost import BIKCCA
%%time
plt.rcParams["figure.figsize"] = [20, 3]
for j, (degree, v) in enumerate(d_img_cca.items()):
img_rotated = v["img_rotated"]
img_rotated_flattened = np.reshape(img_rotated, [-1, 3])
co = BIKCCA(
eta=0.2
) # the smaller the eta, the more accurate the approximation is & the longer the time it takes
kcca = co.estimation(np.hstack((img_rgb_flattened, img_rotated_flattened)), [3, 3])
plt.subplot(1, len(degrees), j + 1)
plt.imshow(img_rotated)
# plt.imshow(img_rotated, cmap='gray', vmin=0, vmax=255)
plt.title(f"rotated {degree} degree\nKCCA score:\n{kcca:.2f}")
plt.show()
CPU times: user 14min 48s, sys: 3min 8s, total: 17min 57s Wall time: 3min 35s
Now, let's say we have two images of the same cat: one is nice and clear with the cat facing the front, while the other is less standard. For the latter, this time we will both rotate the image and distort it (by applying a Sobel edge filter, which acts as a kind of noise).
We will rotate the second image to different angles, compare several similarity measures (the L2 norm between the CCA embeddings, the Pearson correlation of the CCA embeddings, and the KCCA score), and see how the scores change as the second image is rotated.
# 1. rotate 30 degree
img_noisy = Image.fromarray(np.uint8(img_rgb_small)).rotate(30)
img_noisy = np.array(img_noisy)
# 2. use Sobel to extract the edges
img_noisy = cv2.Sobel(
src=img_noisy, ddepth=cv2.CV_64F, dx=1, dy=1, ksize=1, scale=0.5, delta=128
) # scale & delta transforms the resulting values into [0,255] range
img_noisy = img_noisy.astype(int) # convert float values to int
The transformed noisy image has the same 3 RGB channels. The Sobel filter is applied on each of the channels to produce filtered values for each channel.
img_noisy.shape
(64, 64, 3)
img_noisy.min(), img_noisy.max()
(20, 238)
plt.rcParams["figure.figsize"] = [5, 6]
plt.subplot(1, 2, 1)
plt.imshow(img_rgb_small)
plt.title("Image 1\nto be compared against")
plt.subplot(1, 2, 2)
plt.imshow(img_noisy)
plt.title("Image 2\nto be rotated")
plt.show()
# CAREFUL: this block might take a while to run...
degrees = list(range(-60, 60, 5))
rs = []
l2s = []
kccas = []
for degree in degrees:
img_noisy_rotated = Image.fromarray(np.uint8(img_noisy)).rotate(
degree
) # convert numpy array (img) to PIL image, then rotate
img_noisy_rotated = np.array(
img_noisy_rotated
) # convert PIL image back to numpy array
img_noisy_rotated_flattened = np.reshape(img_noisy_rotated, [-1, 3])
cca = CCA(n_components=1, scale=False)
cca.fit(img_rgb_flattened, img_noisy_rotated_flattened)
img_reduced, img_noisy_reduced = cca.transform(
img_rgb_flattened, img_noisy_rotated_flattened
)
rs.append(pearsonr(img_reduced[:, 0], img_noisy_reduced[:, 0])[0])
l2s.append(np.linalg.norm(img_reduced[:, 0] - img_noisy_reduced[:, 0]))
co = BIKCCA(eta=0.1)
kcca = co.estimation(
np.hstack((img_rgb_flattened, img_noisy_rotated_flattened)), [3, 3]
)
kccas.append(kcca)
plt.rcParams["figure.figsize"] = [5, 5]
plt.imshow(img_noisy_rotated)
plt.title("Example of rotated Image 2")
plt.show()
plt.rcParams["figure.figsize"] = [12, 8]
plt.subplot(3, 1, 1)
plt.plot(degrees, l2s)
plt.vlines(x=-30, ymin=0, ymax=max(l2s), ls="--", color="grey")
plt.xticks(
np.arange(min(degrees), max(degrees) + 1, 10)
) # set custom frequency of xticks
plt.gca().invert_yaxis() # invert y axis so that visually higher in the graph means higher similarity, consistent with CCA/KCCA
plt.ylabel("L2 norm")
plt.subplot(3, 1, 2)
plt.plot(degrees, rs)
plt.vlines(x=-30, ymin=0, ymax=max(rs), ls="--", color="grey")
plt.xticks(np.arange(min(degrees), max(degrees) + 1, 10))
plt.ylabel("Pearson R by CCA")
plt.subplot(3, 1, 3)
plt.plot(degrees, kccas)
plt.vlines(x=-30, ymin=0, ymax=max(kccas), ls="--", color="grey")
plt.xticks(np.arange(min(degrees), max(degrees) + 1, 10))
plt.yticks(np.arange(0, max(kccas) + 0.1, 0.1))
plt.ylabel("Score by KCCA")
plt.xlabel("degree rotated")
plt.show()