401 Project 10 - Regression: Perceptrons

Project Objectives

In this project, we will learn about perceptrons and how they can be used for regression. We will use the Boston Housing dataset, as it offers many potential features and target variables to choose from.

Learning Objectives
  • Understand the basic concepts behind a perceptron

  • Implement activation functions and their derivatives

  • Implement a perceptron class for regression

Supplemental Reading and Resources

Dataset

  • /anvil/projects/tdm/data/boston_housing/boston.csv

Questions

Question 1 (2 points)

A perceptron is a simple model that can be used for regression. These perceptrons can be combined together to create neural networks. In this project, we will be creating a perceptron from scratch.

To start, let’s load in the Boston Housing dataset with the below code:

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
df = pd.read_csv('/anvil/projects/tdm/data/boston_housing/boston.csv')

X = df.drop(columns=['MEDV'])
y = df[['MEDV']]

scaler = StandardScaler()
X = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=7)

y_train = y_train.to_numpy()
y_test = y_test.to_numpy()

Now, we can begin discussing what a perceptron is. A perceptron is a simple model that takes in a set of inputs and produces an output. The perceptron is defined by a set of weights and a bias term (similar to our linear regression model having coefficients and a y-intercept term). The perceptron takes the dot product of the input features and the weights and adds the bias term.

Then, the perceptron applies an activation function before outputting the final value. This activation function is a non-linear function that allows the perceptron to learn complex patterns in the data, instead of behaving as a purely linear model.
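In symbols, for an input vector x, weights w, bias b, and activation function f, the perceptron computes y_hat = f(w . x + b). A minimal NumPy sketch of this forward computation (the values here are made up purely for illustration):

import numpy as np

w = np.array([0.5, -0.2, 0.1])   # one weight per input feature (illustrative values)
b = 0.3                          # bias term
x = np.array([1.0, 2.0, 3.0])    # a single input datapoint

z = np.dot(x, w) + b             # dot product of inputs and weights, plus the bias
y_hat = np.maximum(0, z)         # apply an activation function (ReLU here) to get the output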

There are many different activation functions, some of the most common are listed below:

| Activation Function | Formula | Derivative | Usage |
| --- | --- | --- | --- |
| Linear | x | 1 | Final layer of regression to output continuous values |
| ReLU | max(0, x) | 1 if x > 0, 0 otherwise | Hidden layers of neural networks |
| Sigmoid | 1 / (1 + exp(-x)) | sigmoid(x) * (1 - sigmoid(x)) | Final layer of binary classification, or hidden layers of neural networks |
| Tanh | (exp(x) - exp(-x)) / (exp(x) + exp(-x)) | 1 - tanh(x)^2 | Hidden layers of neural networks |
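As an illustration of how one row of this table maps onto a vectorized NumPy function pair, a sigmoid implementation might look like the sketch below (the names are deliberately different from the ones you will write in Question 1, and the other three pairs follow the same pattern):

import numpy as np

def sigmoid_example(x):
    # 1 / (1 + exp(-x)), applied elementwise to a numpy array
    return 1 / (1 + np.exp(-x))

def sigmoid_example_d(x):
    # sigmoid(x) * (1 - sigmoid(x)), the derivative from the table above
    s = sigmoid_example(x)
    return s * (1 - s)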

For this project, we will be creating a perceptron class that can be used for regression. There are many different parameters that can be set when creating a perceptron, such as the learning rate, number of epochs, and activation function.

For this question, please implement functions for the Linear, ReLU, Sigmoid, and Tanh activation functions. Additionally, implement the derivative of each of these functions. These functions should be able to take in a numpy array and return the transformed array.

import numpy as np
def linear(x):
    pass
def linear_d(x):
    pass
def relu(x):
    pass
def relu_d(x):
    pass
def sigmoid(x):
    pass
def sigmoid_d(x):
    pass
def tanh(x):
    pass
def tanh_d(x):
    pass

To test your functions, you can use the below code:

x = np.array([-1, 0, 1])
print(linear(x)) # should return [-1, 0, 1]
print(linear_d(x)) # should return [1, 1, 1]
print(relu(x)) # should return [0, 0, 1]
print(relu_d(x)) # should return [0, 0, 1]
print(sigmoid(x)) # should return [0.26894142, 0.5, 0.73105858]
print(sigmoid_d(x)) # should return [0.19661193, 0.25, 0.19661193]
print(tanh(x)) # should return [-0.76159416, 0, 0.76159416]
print(tanh_d(x)) # should return [0.41997434, 1, 0.41997434]
Deliverables
  • Completed activation and derivative functions

  • Test the functions with the provided code

Question 2 (2 points)

Now that we have our activation functions, let’s start working on our Perceptron class. This class will create a perceptron that can be used for regression problems. Below is a skeleton of our Perceptron class:

class Perceptron:
    def __init__(self, learning_rate=0.01, n_epochs=1000, activation='relu'):
        # this will initialize the perceptron with the given parameters
        pass

    def activate(self, x):
        # this will apply the activation function to the input
        pass

    def activate_derivative(self, x):
        # this will apply the derivative of the activation function to the input
        pass

    def compute_linear(self, X):
        # this will calculate the linear combination of the input and weights
        pass

    def error(self, y_pred, y_true):
        # this will calculate the error between the predicted and true values
        pass

    def backward_gradient(self, error, linear):
        # this will update the weights and bias of the perceptron
        pass

    def predict(self, X):
        # this will predict the output of the perceptron given the input
        pass

    def train(self, X, y, reset_weights = True):
        # this will train the perceptron on the given input and target values
        pass

    def test(self, X, y):
        # this will test the perceptron on the given input and target values
        pass

Now, it may seem daunting to implement all of these functions. However, most of these functions are as simple as one mathematical operation.

For this question, please implement the __init__, activate, and activate_derivative functions. The __init__ function should initialize the perceptron with the given parameters, and should also set the weights and bias to None.

The activate function should apply the activation function to the input, and the activate_derivative function should apply the derivative of the activation function to the input. It is important that these functions choose the proper function based on the value of self.activation. Additionally, if self.activation is not set to one of the four functions we implemented earlier, the default should be the ReLU function.
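One common way to select the proper pair of functions is a dictionary keyed on the activation name, with ReLU as the fallback. Here is a minimal sketch of the activate method (assuming the standalone functions from Question 1 are defined; activate_derivative would follow the same pattern with the _d functions):

def activate(self, x):
    # look up the activation function by name, defaulting to ReLU for unrecognized values
    functions = {'linear': linear, 'relu': relu, 'sigmoid': sigmoid, 'tanh': tanh}
    return functions.get(self.activation, relu)(x)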

To test your functions, you can use the below code:

test_x = np.array([-2, 0, 1.5])
p = Perceptron(learning_rate=0.01, n_epochs=1000, activation='linear')
print(p.activate(test_x)) # should return [-2, 0, 1.5]
print(p.activate_derivative(test_x)) # should return [1, 1, 1]
p.activation = 'relu'
print(p.activate(test_x)) # should return [0, 0, 1.5]
print(p.activate_derivative(test_x)) # should return [0, 0, 1]
p.activation = 'sigmoid'
print(p.activate(test_x)) # should return [0.11920292, 0.5, 0.81757448]
print(p.activate_derivative(test_x)) # should return [0.10499359, 0.25, 0.14914645]
p.activation = 'tanh'
print(p.activate(test_x)) # should return [-0.96402758, 0, 0.90514825]
print(p.activate_derivative(test_x)) # should return [0.07065082, 1, 0.18070664]
p.activation = 'invalid'
print(p.activate(test_x)) # should return [0, 0, 1.5]
print(p.activate_derivative(test_x)) # should return [0, 0, 1]
Deliverables
  • Implement the __init__, activate, and activate_derivative functions

  • Test the functions with the provided code

Question 3 (2 points)

Now, let’s move onto the harder topics. The basic concept behind how this perceptron works is that it takes in an input, calculates the predicted value, finds the error between the predicted and true values, and then updates the weights and bias based on this error and its learning rate. This process is repeated for the set number of epochs.

In this sense, there are two main portions of the perceptron that need to be implemented: the forward and backward passes. The forward pass is the process of calculating the predicted value, and the backward pass is the process of updating the weights and bias based on the calculated error.

For this question, we will implement the compute_linear, predict, error, and backward_gradient functions.

The compute_linear function should calculate the linear combination of the input, weights, and bias, by computing the dot product of the input and weights and adding the bias term.

The predict function should compute the linear combination of the input and then apply the activation function to the result.

The error function should calculate the error between the predicted (y_pred) and true (y_true) values, i.e., true - predicted.

The backward_gradient function should calculate the gradient of the error, which is simply the negative of the error multiplied by the activation derivative of the linear combination.
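As a sanity check with the values used in the test below (weights [1, 2, 3], bias 4, and input [1, 2, 3]), the linear combination is 1*1 + 2*2 + 3*3 + 4 = 18, and with a true value of 20 the error is 20 - 18 = 2.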

To test your functions, you can use the below code:

p = Perceptron(learning_rate=0.01, n_epochs=1000, activation='sigmoid')
p.weights = np.array([1, 2, 3])
p.bias = 4

test_X = np.array([1,2,3])
test_y = np.array([20])

l = p.compute_linear(test_X)
print(l) # should return 18
error = p.error(l, test_y)
print(error) # should return 2
gradient = p.backward_gradient(error, l)
print(gradient) # should return -36
pred = p.predict(test_X)
print(pred) # should return 0.9999999847700205
Deliverables
  • Implement the compute_linear, predict, error, and backward_gradient functions

  • Test the functions with the provided code

Question 4 (2 points)

Now that we have implemented all of our helper functions, we can implement our train function.

Firstly, if the argument reset_weights is True, or if reset_weights is False but the weights and bias have not yet been set (i.e., they are still None), we will initialize the weights to a NumPy array of zeros with the same length as the number of features in the input data, and initialize the bias to 0. In any other case, we will not modify the weights and bias.

Then, this function will train the perceptron on the given training data. For each datapoint in the training data, we compute the linear combination of the input and pass it through the activation function to get the predicted value. Next, we compute the error and the backward gradient. From the backward gradient we calculate the gradient for the weights (simply the input times the backward gradient) and the gradient for the bias (simply the backward gradient itself). Finally, we update the weights and bias by multiplying the gradients by the learning rate and subtracting the results from the current weights and bias. This process is repeated for the set number of epochs.

In this case, we are updating the weights and bias after every datapoint. This is commonly known as Stochastic Gradient Descent (SGD). Another common method is to calculate the error for every datapoint in the epoch, and then update the weights and bias based on the average error at the end of each epoch; this method is known as Batch Gradient Descent (BGD). A third method, called Mini-Batch Gradient Descent (MBGD), is a combination of the two philosophies: we group our data into small batches and update the weights and bias after each batch. This results in more weight/bias updates per epoch than BGD, but fewer than SGD.
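As a hedged sketch of the per-datapoint (SGD) update described above (the variable and attribute names are illustrative, not required), the body of one epoch might look like:

for x_i, y_i in zip(X, y):
    linear = self.compute_linear(x_i)             # w . x + b
    y_pred = self.activate(linear)                # predicted value
    error = self.error(y_pred, y_i)               # true - predicted
    grad = self.backward_gradient(error, linear)  # -error * f'(linear)
    weight_gradient = x_i * grad                  # gradient for the weights
    bias_gradient = grad                          # gradient for the bias
    self.weights -= self.learning_rate * weight_gradient
    self.bias -= self.learning_rate * bias_gradient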

In order to test your function, we will create a perceptron and train it on the Boston Housing dataset. We will then print the weights and bias of the perceptron.

np.random.seed(3)
p = Perceptron(learning_rate=0.01, n_epochs=1000, activation='linear')
p.train(X_train, y_train)
print(p.weights)
print(p.bias)

If you implemented the functions correctly, you should see the following output:

[-1.08035188  0.47131981  0.09222406  0.46998928 -1.90914324  3.14497775
 -0.01770744 -3.04430895  2.62947786 -1.84244828 -2.03654589  0.79672007
 -2.79553875]
[22.44124231]
Deliverables
  • Implement the train function

  • Test the function with the provided code

Question 5 (2 points)

Finally, let’s implement the test function. This function will test the perceptron on the given test data. This function should return our summary statistics from the previous project (mean squared error, root mean squared error, mean absolute error, and r squared) in a dictionary.
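For reference, the four statistics can be computed directly from the residuals, as in the sketch below (it assumes y_true and y_pred are NumPy arrays of the same shape; the dictionary keys should match whatever you used in the previous project):

residuals = y_true - y_pred
mse = np.mean(residuals ** 2)         # mean squared error
rmse = np.sqrt(mse)                   # root mean squared error
mae = np.mean(np.abs(residuals))      # mean absolute error
r_squared = 1 - np.sum(residuals ** 2) / np.sum((y_true - np.mean(y_true)) ** 2)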

To test your function, you can use the below code:

p.test(X_test, y_test)

If you implemented the function correctly, you should see the following output:

{'mean_squared_error': 23.013, 'root_mean_squared_error': 4.797, 'mean_absolute_error': 3.394, 'r_squared': 0.719}
Deliverables
  • Implement the test function

  • Test the function with the provided code

Question 6 (2 points)

As mentioned in question 4, there are multiple different methods for updating the weights and bias of our class. In this question, please add the following outline to your class:

  • Rename the train function to train_sgd

  • Add the following function signatures:

def train_bgd(self, X, y):
    pass

def train_mbgd(self, X, y, n_batches=16):
    pass

def train(self, X, y, method='sgd', n_batches=16):
    pass

After you have added these signatures to your class, please implement the train_bgd function, which will train the perceptron using Batch Gradient Descent as described in question 4. This function should calculate the weight/bias gradients for every point in the dataset, and then update the weights and bias based on the average gradients at the end of each epoch.

Additionally, please implement the train function to act as a selector for the different training methods. If method is set to 'sgd', the function should call the train_sgd function. If method is set to 'bgd', the function should call the train_bgd function. If method is set to 'mbgd', the function should call the train_mbgd function. If method is set to anything else, the function should raise a ValueError.
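A minimal sketch of the selector, following the description above:

def train(self, X, y, method='sgd', n_batches=16):
    # dispatch to the requested training method, raising on anything unrecognized
    if method == 'sgd':
        return self.train_sgd(X, y)
    elif method == 'bgd':
        return self.train_bgd(X, y)
    elif method == 'mbgd':
        return self.train_mbgd(X, y, n_batches=n_batches)
    else:
        raise ValueError(f'Unknown training method: {method}')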

To test your functions, you can use the below code:

np.random.seed(3)
p = Perceptron(learning_rate=0.1, n_epochs=1000, activation='linear')
p.train(X_train, y_train, method='bgd')
print(p.weights)
print(p.bias)
p.test(X_test, y_test)

If you implemented the function correctly, you should see the following output:

[-1.01203489  0.86314322  0.12818681  0.80290412 -2.02780693  3.08686583
  0.04321048 -3.00595432  2.64831884 -1.92232099 -2.03927489  0.8549853
 -3.67072291]
[22.68586412]
{'mse': 19.14677015365128,
 'rmse': 4.375702246914349,
 'mae': 3.336506171166659,
 'r_squared': 0.6898903916034843}

Question 7 (2 points)

Finally, please implement the train_mbgd function. This function will train the perceptron using Mini-Batch Gradient Descent as described in question 4. This function should split our data into n_batches number of batches, and then update the weights and bias based on the average gradients at the end of each batch.

You should use the np.array_split function to split the data into batches. This function will return a list of numpy arrays, where each array is a batch of data. You can then loop through this list to update the weights and bias for each batch.
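For example, np.array_split can produce matching batches of X and y (if n_batches does not divide the data evenly, some batches are simply one element larger):

X_batches = np.array_split(X, n_batches)
y_batches = np.array_split(y, n_batches)
for X_batch, y_batch in zip(X_batches, y_batches):
    # accumulate the gradients over this batch, then update the weights and bias once
    ...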

To test your functions, you can use the below code:

np.random.seed(3)
p = Perceptron(learning_rate=0.1, n_epochs=1000, activation='linear')
p.train(X_train, y_train, method='mbgd')
print(p.weights)
print(p.bias)
p.test(X_test, y_test)

If you implemented the function correctly, you should see the following output:

[-0.97274486  0.67793429  0.08464404  0.72503617 -1.91926787  3.18789867
  0.01581749 -2.97858639  2.61498091 -1.97518827 -2.00677852  0.89807989
 -3.26179108]
[22.52676272]
{'mse': 19.022979683470613,
 'rmse': 4.361534097478846,
 'mae': 3.2954565543935757,
 'r_squared': 0.6918953571367247}

Now that we have implemented SGD, BGD, and MBGD, let’s compare the mean squared error of each method at each epoch. To do this, we will create a new function called train_mse that tests the perceptron on the test data at the end of each epoch and stores the mean squared error in a list. We will then plot these lists to compare the performance of the three methods.

A common mistake is to create the perceptron object with n_epochs=n_epochs and then also loop over the epochs yourself; doing so trains the perceptron for n_epochs epochs, n_epochs times. Instead, you should create the perceptron object with n_epochs=1, and then call the train function (with reset_weights=False) n_epochs times.

Here is the outline of the function:

def train_mse(X_train, y_train, X_test, y_test, learning_rate=0.01, n_epochs=1000, method='sgd'):
    pass
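One possible reading of that description, as a hedged sketch (it assumes your train methods accept and forward a reset_weights argument, and that your test dictionary uses an 'mse' key; adjust both to match your own class):

def train_mse(X_train, y_train, X_test, y_test, learning_rate=0.01, n_epochs=1000, method='sgd'):
    # train one epoch at a time so the test MSE can be recorded after each epoch
    p = Perceptron(learning_rate=learning_rate, n_epochs=1, activation='linear')  # activation mirrors the earlier questions
    mse_history = []
    for _ in range(n_epochs):
        p.train(X_train, y_train, method=method, reset_weights=False)
        mse_history.append(p.test(X_test, y_test)['mse'])
    return mse_history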

To test your functions, you can use the below code:

np.random.seed(3)
sgd_data = train_mse(X_train, y_train, X_test, y_test, learning_rate=0.1, n_epochs=500, method='sgd')
bgd_data = train_mse(X_train, y_train, X_test, y_test, learning_rate=0.1, n_epochs=500, method='bgd')
mbgd_data = train_mse(X_train, y_train, X_test, y_test, learning_rate=0.1, n_epochs=500, method='mbgd')

import matplotlib.pyplot as plt
plt.plot(sgd_data, label='SGD')
plt.plot(bgd_data, label='BGD')
plt.plot(mbgd_data, label='MBGD')

plt.xlabel('Epoch')
plt.ylabel('Mean Squared Error')
plt.legend()
plt.show()
Deliverables
  • Implement the train_mbgd function

  • Implement the train_mse function

  • Test the functions with the provided code

  • How do the graphs of the mean squared error compare between the three methods? Which method do you think is the best?

Submitting your Work

Items to submit
  • firstname_lastname_project10.ipynb

You must double check your .ipynb after submitting it in gradescope. A very common mistake is to assume that your .ipynb file has been rendered properly and contains your code, comments (in markdown or with hashtags), and code output, even though it may not. Please take the time to double check your work. See the instructions on how to double check your submission.

You will not receive full credit if your .ipynb file submitted in Gradescope does not show all of the information you expect it to, including the output for each question result (i.e., the results of running your code), and also comments about your work on each question. Please ask a TA if you need help with this. Please do not wait until Friday afternoon or evening to complete and submit your work.