Sklearn Feature Scaling with StandardScaler, MinMaxScaler, RobustScaler and MaxAbsScaler

In Sklearn, standard scaling is applied using the StandardScaler() class of the sklearn.preprocessing module.
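As a quick illustration of what standard scaling does, the minimal sketch below (on a toy feature column, not the tutorial's dataset) transforms the values to zero mean and unit variance using (x - mean) / std:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature column; StandardScaler transforms x to (x - mean) / std
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(scaler.mean_)      # column mean learned during fit: [3.]
print(X_scaled.mean())   # ~0 after scaling
print(X_scaled.std())    # ~1 after scaling
```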

Min-Max Normalization

In Min-Max Normalization, the minimum value of a feature is transformed to 0, the maximum value is transformed to 1, and all other values are scaled linearly between 0 and 1. A drawback of this method is that it is sensitive to outliers.
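The mapping can be verified on a toy column (not the tutorial's data): the minimum becomes 0, the maximum becomes 1, and values in between are scaled as (x - min) / (max - min):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy feature column: min is 10, max is 40
X = np.array([[10.0], [20.0], [25.0], [40.0]])

mm = MinMaxScaler()
X_scaled = mm.fit_transform(X)

# (x - min) / (max - min): 10 -> 0.0, 25 -> 0.5, 40 -> 1.0
print(X_scaled.ravel())
```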

In Sklearn, MaxAbs scaling is applied using the MaxAbsScaler() class of the sklearn.preprocessing module.
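MaxAbs scaling divides each feature by its maximum absolute value, mapping the data into the range [-1, 1]. A minimal sketch on a toy column:

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Toy feature column; the maximum absolute value is 8
X = np.array([[-4.0], [2.0], [8.0]])

mab = MaxAbsScaler()
X_scaled = mab.fit_transform(X)

# Each value divided by 8: -4 -> -0.5, 2 -> 0.25, 8 -> 1.0
print(X_scaled.ravel())
```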

Robust-Scaler

Robust scaling removes the median and scales the data according to the interquartile range (IQR), which makes it much less sensitive to outliers than Min-Max Normalization. In Sklearn, it is applied using the RobustScaler() class of the sklearn.preprocessing module.
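Robust scaling centers each feature on its median and divides by the interquartile range (IQR), so a single extreme outlier barely affects how the remaining values are scaled. A minimal sketch on a toy column with one outlier:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Toy feature column with one extreme outlier (100)
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

rob = RobustScaler()
X_scaled = rob.fit_transform(X)

# (x - median) / IQR: the median value 3 maps to 0, the non-outlier
# values stay in a narrow range, and the outlier remains far away
print(X_scaled.ravel())
```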

Loading Dataset

Next, we load the dataset into a data frame and drop the non-numerical feature ocean_proximity. The first 10 rows of the dataset are then inspected.
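A sketch of this step is below. The tiny DataFrame is only a stand-in for the article's housing data (the actual CSV is not shown here, and the numeric column names are illustrative assumptions); the point is dropping the non-numerical column before scaling:

```python
import pandas as pd

# Toy stand-in for the article's dataset; in the article, df is read from a CSV.
# The numeric column names below are illustrative, except ocean_proximity.
df = pd.DataFrame({
    "median_income": [8.3, 7.2, 5.6],
    "median_house_value": [452600.0, 358500.0, 352100.0],
    "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND"],
})

# Drop the non-numerical feature, then inspect the top rows
df = df.drop(columns=["ocean_proximity"])
print(df.head(10))
```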
In [3]:
# Imports
import pandas as pd
from sklearn import preprocessing
from sklearn.preprocessing import MaxAbsScaler, RobustScaler
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Train Test Split
X = df.iloc[:, :-1]
y = df.iloc[:, [7]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# Creating Regression Model
clf = KNeighborsRegressor()
clf.fit(X_train, y_train)

# Accuracy on Testing Data
y_test_hat = clf.predict(X_test)
score = clf.score(X_test, y_test)
print("Accuracy for our testing dataset without Feature scaling is : {:.3f}%".format(score*100))
First, the dataset is split into train and test sets. Then a StandardScaler object is created; the training dataset is fit and transformed with it, and the test dataset is transformed with the same object, so the scaler's statistics come from the training data alone.
# Train Test Split
X = df.iloc[:, :-1]
y = df.iloc[:, [7]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# Creating StandardScaler Object
scaler = preprocessing.StandardScaler()

# Fit on the training data only, then transform both splits
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Seeing the scaled values of X_train (fit_transform returns a NumPy array,
# so wrap it in a DataFrame before calling head())
pd.DataFrame(X_train, columns=X.columns).head()
Just like earlier, a MinMaxScaler object is created; the training dataset is fit and transformed with it, and the test dataset is transformed with the same object.
In [6]:
# Train Test Split
X = df.iloc[:, :-1]
y = df.iloc[:, [7]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# Creating MinMaxScaler Object
mm = preprocessing.MinMaxScaler()

X_train = mm.fit_transform(X_train)
X_test = mm.transform(X_test)

# Seeing the scaled values of X_train (fit_transform returns a NumPy array,
# so wrap it in a DataFrame before calling head())
pd.DataFrame(X_train, columns=X.columns).head()
# Creating Regression Model
model = KNeighborsRegressor()
model.fit(X_train, y_train)

# Accuracy on Testing Data
y_test_hat = model.predict(X_test)
score = model.score(X_test, y_test)
print("Accuracy for our testing dataset using MinMax Scaler is : {:.3f}%".format(score*100))

In [8]:

# Train Test Split
X=df.iloc[:,:-1]
y=df.iloc[:,[7]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# Creating MaxAbsScaler Object
mab = MaxAbsScaler()

X_train = mab.fit_transform(X_train)
X_test = mab.transform(X_test)

Next, we create the KNN regression model using the scaled data, and it can be seen that the test accuracy is 99.38%.

# Creating Regression Model
model = KNeighborsRegressor()
model.fit(X_train, y_train)

# Accuracy on Testing Data
y_test_hat = model.predict(X_test)
score = model.score(X_test, y_test)
print("Accuracy for our testing dataset using MaxAbs Scaler is : {:.3f}%".format(score*100))
In [10]:
# Train Test Split
X=df.iloc[:,:-1]
y=df.iloc[:,[7]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# Creating RobustScaler Object
rob = RobustScaler()

X_train = rob.fit_transform(X_train)
X_test = rob.transform(X_test)

In [11]:

# Creating Regression Model
model = KNeighborsRegressor()
model.fit(X_train, y_train)

# Accuracy on Testing Data
y_test_hat = model.predict(X_test)
score = model.score(X_test, y_test)
print("Accuracy for our testing dataset using Robust Scaler is : {:.3f}%".format(score*100))

Summary

In this tutorial, we applied four feature scaling techniques of the sklearn.preprocessing module (StandardScaler, MinMaxScaler, MaxAbsScaler, and RobustScaler) to the same train-test split and compared the test accuracy of a KNN regression model with and without feature scaling. In every case, the scaler is fit on the training data only and then used to transform both the training and test sets.

  • Veer Kumar

    I am passionate about Analytics and I am looking for opportunities to hone my current skills to gain prominence in the field of Data Science.
