from kernel_logreg import KernelLogisticRegression
import numpy as np
from matplotlib import pyplot as plt
from mlxtend.plotting import plot_decision_regions
from scipy.optimize import minimize
from sklearn.datasets import make_blobs, make_circles, make_moons
from sklearn.metrics.pairwise import rbf_kernel
Kernel Logistic Regression
In this blog post I have implemented kernel logistic regression. Kernel logistic regression is a lot like regular linear logistic regression, except we transform the data with a kernel beforehand so that a linear classifier can still separate data that is not linearly separable.
For those that are curious, my loss function remains largely unchanged: I simply use the kernel matrix in place of the regular n x p data matrix, and I use vectorized numpy operations that act on all of the rows at once.
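To make that concrete, here is a minimal sketch of the kernelized loss (not my exact implementation): the ordinary logistic cross-entropy loss, with the n x n kernel matrix standing in for the n x p feature matrix.

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def kernel_logistic_loss(v, K, y):
    # v: one weight per training point (length n)
    # K: kernel matrix, K[i, j] = kernel(x_i, x_j)
    # y: 0/1 labels
    y_hat = sigmoid(K @ v)   # predicted probabilities for every row at once
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))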
# silence numpy floating point warnings (e.g. overflow inside exp)
np.seterr(all='ignore')
def draw_line(w, x_min, x_max, color="black", ax=None, alpha=1):
    x = np.linspace(x_min, x_max, 101)
    y = -(w[0]*x + w[2])/w[1]
    if ax is None:
        plt.plot(x, y, color=color, alpha=alpha)
    else:
        ax.plot(x, y, color=color, alpha=alpha)
def construct_plot(X, y, clf, ax=None, title="Accuracy = "):
    # plot the decision regions of the fitted classifier and report its accuracy
    plot_decision_regions(X, y, clf=clf, ax=ax)
    if ax is None:
        ax = plt.gca()
    ax.set(title=f"{title}{(clf.predict(X) == y).mean()}",
           xlabel="Feature 1",
           ylabel="Feature 2")
Experiments
Experiment 1: Basic Check
Here I am just confirming that all of my code actually works.
Running a test on moon-shaped data with gamma=0.1, I get a model that fits both the train and test data well.
# fit on one draw of the moons data and evaluate on a fresh draw
X_train, y_train = make_moons(200, shuffle = True, noise = 0.2)
KLR = KernelLogisticRegression(rbf_kernel, gamma = .1)
KLR.fit(X_train, y_train)

X_test, y_test = make_moons(100, shuffle = True, noise = 0.2)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))
construct_plot(X_train, y_train, KLR, ax1, "Training Accuracy = ")
construct_plot(X_test, y_test, KLR, ax2, "Test Accuracy = ")
Experiment 2: Choosing Gamma
In this experiment I will demonstrate the importance of choosing a good gamma value. A gamma that is too low will underfit the data, while a gamma that is too high will overfit it. As we can see below, values of gamma above 1 begin to lose accuracy on the test set.
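To see why gamma has this effect, remember that the RBF kernel scores the similarity of two points as exp(-gamma * ||x - x'||^2). A quick sanity check against sklearn's rbf_kernel, on a pair of made-up points:

x1 = np.array([[0.0, 0.0]])
x2 = np.array([[1.0, 1.0]])
for g in [.01, 1, 100]:
    by_hand = np.exp(-g * np.sum((x1 - x2) ** 2))
    print(g, rbf_kernel(x1, x2, gamma=g)[0, 0], by_hand)

With a huge gamma the similarity is essentially zero for anything but a point's immediate neighbors, so the decision boundary can wrap around individual training points (overfitting); with a tiny gamma every pair of points looks about equally similar, so the boundary can barely bend (underfitting).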
X_train, y_train = make_moons(80, shuffle = True, noise = 0.2)
X_test, y_test = make_moons(80, shuffle = True, noise = 0.2)

gammas = [.01, .1, 1, 10, 100, 1000, 10000]
fig, axarr = plt.subplots(len(gammas), 2, figsize=(12, 6*len(gammas)))

for i, gamma in enumerate(gammas):
    KLR = KernelLogisticRegression(rbf_kernel, gamma=gamma)
    KLR.fit(X_train, y_train)
    construct_plot(X_train, y_train, KLR, axarr[i][0], f"gamma={gamma} Training Accuracy = ")
    construct_plot(X_test, y_test, KLR, axarr[i][1], f"gamma={gamma} Test Accuracy = ")
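As a side note, a less eyeball-based way to pick gamma is to hold out part of the training data and keep the gamma with the best held-out accuracy. A minimal sketch, assuming only the fit/predict interface used above:

X_tr, y_tr = X_train[:60], y_train[:60]    # make_moons already shuffles, so a plain split is fine
X_val, y_val = X_train[60:], y_train[60:]

best_gamma, best_acc = None, 0
for gamma in gammas:
    model = KernelLogisticRegression(rbf_kernel, gamma=gamma)
    model.fit(X_tr, y_tr)
    acc = (model.predict(X_val) == y_val).mean()
    if acc > best_acc:
        best_gamma, best_acc = gamma, acc
print(best_gamma, best_acc)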
Experiment 3: Varied Noise
In this experiment I vary the noise parameter of the make_moons function and test which values of gamma work best as the noise increases. In the last experiment we tested with noise=.2 and saw that a gamma value of around 1 was sufficient.
Testing with Noise=.05
From this test we can gather that when there is very little noise (and very little difference between the train and test data), we just need a high enough gamma to follow the shape.
X_train, y_train = make_moons(80, shuffle = True, noise = 0.05)
X_test, y_test = make_moons(80, shuffle = True, noise = 0.05)

gammas = [.01, .1, 1, 10]
fig, axarr = plt.subplots(len(gammas), 2, figsize=(12, 6*len(gammas)))

for i, gamma in enumerate(gammas):
    KLR = KernelLogisticRegression(rbf_kernel, gamma=gamma)
    KLR.fit(X_train, y_train)
    construct_plot(X_train, y_train, KLR, axarr[i][0], f"gamma={gamma} Training Accuracy = ")
    construct_plot(X_test, y_test, KLR, axarr[i][1], f"gamma={gamma} Test Accuracy = ")
Testing with Noise=.4
In this test we see that a gamma of .1 does the best at matching the shape we want. Anything higher than .1 overfits, and anything below .1 underfits.
X_train, y_train = make_moons(80, shuffle = True, noise = 0.4)
X_test, y_test = make_moons(80, shuffle = True, noise = 0.4)

gammas = [.01, .1, 1, 10]
fig, axarr = plt.subplots(len(gammas), 2, figsize=(12, 6*len(gammas)))

for i, gamma in enumerate(gammas):
    KLR = KernelLogisticRegression(rbf_kernel, gamma=gamma)
    KLR.fit(X_train, y_train)
    construct_plot(X_train, y_train, KLR, axarr[i][0], f"gamma={gamma} Training Accuracy = ")
    construct_plot(X_test, y_test, KLR, axarr[i][1], f"gamma={gamma} Test Accuracy = ")
Conclusions from Experiment 3
Overall, I don’t have enough evidence to suggest that the best value for gamma depends on the noise level. This generally matches what I have seen in the documentation, which sets gamma from properties of the training data rather than from the noise.
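For reference, scikit-learn's SVC defaults to gamma="scale", which is computed from the training data itself rather than tuned by hand. Applied to the moons data above, it would be:

# scikit-learn's "scale" heuristic: gamma = 1 / (n_features * X.var())
print(1 / (X_train.shape[1] * X_train.var()))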
Experiment 4: Other Shapes
In this experiment I repeat the noise sweep on circle-shaped data from make_circles, trying three different values of gamma.
Testing with gamma=.1
noises = [0, .1, .2]
fig, axarr = plt.subplots(len(noises), 2, figsize=(12, 6*len(noises)))
gamma = .1

for i, noise in enumerate(noises):
    X_train, y_train = make_circles(50, shuffle = True, noise = noise)
    X_test, y_test = make_circles(50, shuffle = True, noise = noise)
    KLR = KernelLogisticRegression(rbf_kernel, gamma=gamma)
    KLR.fit(X_train, y_train)
    construct_plot(X_train, y_train, KLR, axarr[i][0], f"noise={noise} Training Accuracy = ")
    construct_plot(X_test, y_test, KLR, axarr[i][1], f"noise={noise} Test Accuracy = ")
Testing with gamma=1
noises = [0, .1, .2]
fig, axarr = plt.subplots(len(noises), 2, figsize=(12, 6*len(noises)))
gamma = 1

for i, noise in enumerate(noises):
    X_train, y_train = make_circles(50, shuffle = True, noise = noise)
    X_test, y_test = make_circles(50, shuffle = True, noise = noise)
    KLR = KernelLogisticRegression(rbf_kernel, gamma=gamma)
    KLR.fit(X_train, y_train)
    construct_plot(X_train, y_train, KLR, axarr[i][0], f"noise={noise} Training Accuracy = ")
    construct_plot(X_test, y_test, KLR, axarr[i][1], f"noise={noise} Test Accuracy = ")
Testing with gamma=10
A gamma value of 10 is too high once the circles have any noticeable noise.
noises = [0, .1, .2]
fig, axarr = plt.subplots(len(noises), 2, figsize=(12, 6*len(noises)))
gamma = 10

for i, noise in enumerate(noises):
    X_train, y_train = make_circles(50, shuffle = True, noise = noise)
    X_test, y_test = make_circles(50, shuffle = True, noise = noise)
    KLR = KernelLogisticRegression(rbf_kernel, gamma=gamma)
    KLR.fit(X_train, y_train)
    construct_plot(X_train, y_train, KLR, axarr[i][0], f"noise={noise} Training Accuracy = ")
    construct_plot(X_test, y_test, KLR, axarr[i][1], f"noise={noise} Test Accuracy = ")
Conclusions from Experiment 4
When there is no noise, pretty much any reasonable gamma will do, but once there starts to be significant noise, the lower gamma values do better on this data set.