During my learning time, I came across one awesome channel which helped me understand and implement gradient descent. I believe watching helps one retain better than just reading, so I am going to leave that link here.
The implementation will be done in Python, as the title suggests, and the dataset is a teeny tiny one used in the codebasics video given above; you can get it here.
So let's start by importing the libraries we need. Pandas, NumPy, and Matplotlib are all we need. After importing them, load the data.
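A minimal setup looks like this (the CSV filename is a placeholder for wherever you saved the dataset; I am assuming columns named area, bedrooms, and price):

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load the dataset (adjust the filename/path to your copy)
df = pd.read_csv("homeprices.csv")  # expected columns: area, bedrooms, price
print(df.head())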
We will implement all three gradient descent variants. The first implementation will be Batch Gradient Descent. Before we define the GD function, we have to scale the features.
Why scaling?
You can see that area, bedrooms, and price are on very different scales. To bring them all into the same range we have to scale them; scaling maps the values to between 0 and 1. We will use scikit-learn's MinMaxScaler() to scale the data.
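A minimal sketch of the scaling step (assuming the dataframe is called df with columns area, bedrooms, and price; sx and sy are the scalers we reuse later in the predict function):

from sklearn.preprocessing import MinMaxScaler

sx = MinMaxScaler()  # scaler for the input features
sy = MinMaxScaler()  # scaler for the target (price)

scaled_X = sx.fit_transform(df[["area", "bedrooms"]].values)
scaled_y = sy.fit_transform(df["price"].values.reshape(-1, 1))
scaled_y = scaled_y.reshape(scaled_y.shape[0],)  # flatten back to a 1D array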
After scaling the data, we will define the gradient descent function:
def gradient_descent(x, y, epochs, learning_rate=0.01):
    number_of_features = x.shape[1]
    w = np.ones(shape=(number_of_features))  # weights
    b = 0  # bias
    total_samples = x.shape[0]  # number of rows in X
The number of features will be 2, so that is the size of the weights vector w we initialise; b is the bias. The formula we are using for the prediction is price = w1 * area + w2 * bedrooms + bias. total_samples is the number of records we have in our dataset.
We will also declare two lists that will capture our cost and epoch values so we can plot a graph to see what's happening:

    cost_list = []
    epoch_list = []
    for i in range(epochs):
        # This is the same as y = mx + b. x.T is the transpose of x
        y_pred = np.dot(w, x.T) + b

        # Next step is the derivative. This formula can be taken as it is,
        # or you can go ahead and indulge yourself in the deeper maths.
        w_grad = -(2/total_samples) * (x.T.dot(y - y_pred))
        b_grad = -(2/total_samples) * np.sum(y - y_pred)

        # Second step is to adjust the weights and bias using the derivatives
        w = w - learning_rate * w_grad
        b = b - learning_rate * b_grad

        # Cost is the mean squared error
        cost = np.mean(np.square(y - y_pred))

        # Save the cost and epoch of every 10th iteration so we can plot the graph.
        # The values get added to the cost and epoch lists we made.
        if i % 10 == 0:
            cost_list.append(cost)
            epoch_list.append(i)

    # Return the weights, bias, final cost, cost list, and epoch list
    return w, b, cost, cost_list, epoch_list
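Now we train on the scaled data. A minimal call looks like this (the 500 epochs is an assumption; any reasonably large number works on this tiny dataset):

w, b, cost, cost_list, epoch_list = gradient_descent(scaled_X, scaled_y, 500)
print(w, b)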
That is, w1 is 0.7071, w2 is 0.6745, and the bias is -0.2303.
Now we need to define a predict function for our BGD:
def predict(area, bedrooms, w, b):
    scaled_X = sx.transform([[area, bedrooms]])[0]
    # Here w1 = w[0], w2 = w[1], and the bias is b
    # The equation for price is w1*area + w2*bedrooms + bias
    # scaled_X[0] is area, scaled_X[1] is bedrooms
    scaled_price = w[0] * scaled_X[0] + w[1] * scaled_X[1] + b
    # Once we get the price prediction we need to rescale it back to the original value.
    # Since inverse_transform returns a 2D array, we index with [0][0] to get a single value;
    # if you don't add it, your price will come wrapped in two sets of brackets.
    return sy.inverse_transform([[scaled_price]])[0][0]
# We will predict the price for a 2600 sq ft, 4-bedroom house:
predict(2600,4,w,b)
128.45484403267596
Close enough! I think you would like to see the graph for which we saved the cost and epoch values. To plot the graph:
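A minimal plotting sketch, using the lists the function returned:

plt.plot(epoch_list, cost_list)
plt.xlabel("epoch")
plt.ylabel("cost (MSE)")
plt.title("Batch Gradient Descent: cost vs. epoch")
plt.show()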
The functions for stochastic and mini-batch gradient descent are almost the same, with just a little adjustment. For stochastic, we will use the random module to select one random sample from the dataset on each iteration.
We will import random and start the SGD function the same way:

import random

def SGD(X, y_true, epochs, learning_rate=0.01):
    number_of_features = X.shape[1]
    # numpy array with 1 row and columns equal to the number of features.
    # In our case number_of_features = 2 (area, bedrooms)
    w = np.ones(shape=(number_of_features))
    b = 0
    total_samples = X.shape[0]

    cost_list = []
    epoch_list = []

    for i in range(epochs):
        # Pick a random index from the total samples
        random_index = random.randint(0, total_samples - 1)
        sample_x = X[random_index]
        sample_y = y_true[random_index]
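The rest of the loop mirrors the batch version, just computed on that single random sample. A minimal sketch of how it continues (logging every 100th epoch is an arbitrary choice):

        # Same prediction and gradient formulas as before, applied to the single sample.
        # Keeping the 2/total_samples factor from the batch version; it only scales the step size.
        y_pred = np.dot(w, sample_x.T) + b

        w_grad = -(2/total_samples) * sample_x * (sample_y - y_pred)
        b_grad = -(2/total_samples) * (sample_y - y_pred)

        w = w - learning_rate * w_grad
        b = b - learning_rate * b_grad

        # Cost for a single sample is just its squared error
        cost = np.square(sample_y - y_pred)

        if i % 100 == 0:
            cost_list.append(cost)
            epoch_list.append(i)

    return w, b, cost, cost_list, epoch_list

Mini-batch gradient descent is the last one. Instead of a single sample, each update uses a small batch of rows, so the function takes an extra batch_size argument (the exact function name and defaults below are up to you):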
def mini_batch_gradient_descent(X, y_true, epochs, batch_size=5, learning_rate=0.01):
    number_of_features = X.shape[1]
    w = np.ones(shape=(number_of_features))
    b = 0
    total_samples = X.shape[0]

    if batch_size > total_samples:
        # In this case mini-batch becomes the same as batch gradient descent
        batch_size = total_samples

    cost_list = []
    epoch_list = []

    num_batches = int(total_samples / batch_size)

    for i in range(epochs):
        # Shuffle the data every epoch before slicing it into batches
        random_indices = np.random.permutation(total_samples)
        X_tmp = X[random_indices]
        y_tmp = y_true[random_indices]

        for j in range(0, total_samples, batch_size):
            Xj = X_tmp[j:j + batch_size]
            yj = y_tmp[j:j + batch_size]
            y_predicted = np.dot(w, Xj.T) + b
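From here the inner loop is the batch update applied to just Xj and yj; a minimal sketch of the rest (dividing by the batch length instead of total_samples):

            # Gradients over the current batch
            w_grad = -(2/len(Xj)) * (Xj.T.dot(yj - y_predicted))
            b_grad = -(2/len(Xj)) * np.sum(yj - y_predicted)

            w = w - learning_rate * w_grad
            b = b - learning_rate * b_grad

        # Track the cost over the full dataset once per epoch
        cost = np.mean(np.square(y_true - (np.dot(w, X.T) + b)))
        if i % 10 == 0:
            cost_list.append(cost)
            epoch_list.append(i)

    return w, b, cost, cost_list, epoch_list

You can train and plot SGD and mini-batch exactly the same way as the batch version: call the function on scaled_X and scaled_y, then plot epoch_list against cost_list.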
So with this, we have completed the implementation of all three gradient descent algorithms along with their plots. I hope I made it understandable for you. Like this post and help spread the love.