Sklearn Feature Scaling with StandardScaler, MinMaxScaler, RobustScaler and MaxAbsScaler

In scikit-learn, standard scaling is applied with the StandardScaler class from the sklearn.preprocessing module.
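
As a minimal sketch of what StandardScaler computes, the example below uses a small made-up array (not the housing dataset) and compares the scaler's output with the z-score formula (x - mean) / std applied per column. The variable names are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data (made up for illustration): two features on very different scales.
X_toy = np.array([[1.0, 100.0],
                  [2.0, 200.0],
                  [3.0, 300.0]])

std_demo = StandardScaler()
X_std = std_demo.fit_transform(X_toy)

# Manual z-score per column; StandardScaler uses the population
# standard deviation (ddof=0), which is also NumPy's default.
X_std_manual = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

print(X_std)
print("Matches manual formula:", np.allclose(X_std, X_std_manual))
```

After scaling, every column has mean 0 and unit variance, so neither feature dominates a distance computation simply because of its units.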

Min-Max Normalization

In Min-Max Normalization, the minimum value of each feature is transformed to 0, the maximum value is transformed to 1, and all other values are mapped to the range between 0 and 1. The drawback of this method is that it is sensitive to outliers, because a single extreme value sets the minimum or maximum and compresses the remaining values.
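
The outlier sensitivity can be illustrated with a small sketch. The single feature below is made up for illustration, and the last value is a deliberate outlier; the manual formula (x - min) / (max - min) is shown next to MinMaxScaler with its default range of 0 to 1.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up single feature; the last value is a deliberate outlier.
x_mm = np.array([[10.0], [20.0], [30.0], [40.0], [1000.0]])

mm_demo = MinMaxScaler()  # default feature_range=(0, 1)
x_mm_scaled = mm_demo.fit_transform(x_mm)

# Equivalent manual formula: (x - min) / (max - min)
x_mm_manual = (x_mm - x_mm.min(axis=0)) / (x_mm.max(axis=0) - x_mm.min(axis=0))

print(x_mm_scaled.ravel())
print("Matches manual formula:", np.allclose(x_mm_scaled, x_mm_manual))
```

The minimum maps to 0 and the outlier maps to 1, while the four ordinary values are squeezed into a narrow band close to 0.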

In scikit-learn, max-abs scaling is applied with the MaxAbsScaler class from the sklearn.preprocessing module.
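
A minimal sketch of MaxAbsScaler on made-up data: each feature is divided by its maximum absolute value, so the result lies in the range -1 to 1 and the data is not shifted or centered. The array and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Toy data (made up for illustration), including negative values.
X_ma = np.array([[ 1.0, -50.0],
                 [ 2.0,  25.0],
                 [-4.0, 100.0]])

mab_demo = MaxAbsScaler()
X_ma_scaled = mab_demo.fit_transform(X_ma)

# Equivalent manual formula: divide each column by its largest absolute value.
X_ma_manual = X_ma / np.abs(X_ma).max(axis=0)

print(X_ma_scaled)
print("Matches manual formula:", np.allclose(X_ma_scaled, X_ma_manual))
```

Because zeros stay zeros, this scaler is commonly used with sparse data.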

Robust-Scaler
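
As a minimal sketch, assuming RobustScaler's default settings (centering on the median and scaling by the range between the 25th and 75th percentiles), the example below applies it to a made-up feature with one outlier and compares the result with the manual formula. The data and variable names are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Made-up single feature; the last value is a deliberate outlier.
x_rob = np.array([[10.0], [20.0], [30.0], [40.0], [1000.0]])

rob_demo = RobustScaler()  # defaults: median centering, quantile_range=(25.0, 75.0)
x_rob_scaled = rob_demo.fit_transform(x_rob)

# Manual equivalent: (x - median) / (75th percentile - 25th percentile)
median = np.median(x_rob, axis=0)
q25, q75 = np.percentile(x_rob, [25, 75], axis=0)
x_rob_manual = (x_rob - median) / (q75 - q25)

print(x_rob_scaled.ravel())
print("Matches manual formula:", np.allclose(x_rob_scaled, x_rob_manual))
```

The median and the interquartile range are barely affected by the outlier, so the ordinary values keep a usable spread instead of being compressed.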

Loading Dataset

Next, we load the dataset in a data frame and drop the non-numerical feature ocean_proximity. The top 10 rows of the dataset are then observed.
In [3]:
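# Imports used by the modeling cells
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Train Test Split
X = df.iloc[:, :-1]
y = df.iloc[:, [7]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# Creating Regression Model
clf = KNeighborsRegressor()
clf.fit(X_train, y_train)

# Score on Testing Data
# (for a regressor, score() returns the R^2 coefficient of determination)
y_test_hat = clf.predict(X_test)
score = clf.score(X_test, y_test)
print("Accuracy for our testing dataset without Feature scaling is : {:.3f}%".format(score*100))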
Out[3]:
First, the dataset is split into training and test sets. A StandardScaler object is then created and fit on the training set, and the same fitted object is used to transform both the training set and the test set.
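from sklearn import preprocessing

# Train Test Split
X = df.iloc[:, :-1]
y = df.iloc[:, [7]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# Creating StandardScaler Object
scaler = preprocessing.StandardScaler()

# Fit on the training data only, then apply the same transformation to the test data
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Seeing the scaled values of X_train
# (fit_transform returns a NumPy array by default, which has no .head(), so slice instead)
X_train[:5]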
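
The fit-on-train, transform-on-test pattern can also be expressed with a scikit-learn Pipeline, which applies the same rule automatically. The sketch below is an alternative formulation, not the approach used in this tutorial, and it runs on synthetic stand-in data rather than the housing dataset; all names ending in _demo are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; not the housing dataset).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3)) * np.array([1.0, 100.0, 10000.0])
y_demo = X_demo[:, 0] + X_demo[:, 1] / 100.0 + rng.normal(scale=0.1, size=200)

Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.33, random_state=0)

# The pipeline fits the scaler on the training split only, then reuses
# those statistics whenever it transforms new data.
pipe = make_pipeline(StandardScaler(), KNeighborsRegressor())
pipe.fit(Xtr, ytr)

r2 = pipe.score(Xte, yte)  # R^2 on the held-out split
print(f"R^2 on test split: {r2:.3f}")
```

Keeping the scaler inside the pipeline also makes it safe to pass to cross-validation utilities, since the scaler is refit on each training fold.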
As before, a MinMaxScaler object is created and fit on the training set, and the same fitted object is used to transform both the training set and the test set.
In [6]:
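from sklearn import preprocessing

# Train Test Split
X = df.iloc[:, :-1]
y = df.iloc[:, [7]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# Creating MinMax Object
mm = preprocessing.MinMaxScaler()

# Fit on the training data only, then apply the same transformation to the test data
X_train = mm.fit_transform(X_train)
X_test = mm.transform(X_test)

# Seeing the scaled values of X_train
# (fit_transform returns a NumPy array by default, which has no .head(), so slice instead)
X_train[:5]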
In [7]:
# Creating Regression Model
model = KNeighborsRegressor()
model.fit(X_train, y_train)

# Score on Testing Data (score() returns R^2 for a regressor)
y_test_hat = model.predict(X_test)
score = model.score(X_test, y_test)
print("Accuracy for our testing dataset using MinMax Scaler is : {:.3f}%".format(score*100))
Out[7]:

In [8]:

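from sklearn.preprocessing import MaxAbsScaler

# Train Test Split
X = df.iloc[:, :-1]
y = df.iloc[:, [7]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# Creating MaxAbsScaler Object
mab = MaxAbsScaler()

# Fit on the training data only, then apply the same transformation to the test data
X_train = mab.fit_transform(X_train)
X_test = mab.transform(X_test)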

 

Next, we create the KNN regression model using the scaled data. The test accuracy, which here is the R² score returned by score(), is 99.38%.

# Creating Regression Model
model = KNeighborsRegressor()
model.fit(X_train, y_train)

# Score on Testing Data (score() returns R^2 for a regressor)
y_test_hat = model.predict(X_test)
score = model.score(X_test, y_test)
print("Accuracy for our testing dataset using MaxAbs Scaler is : {:.3f}%".format(score*100))
Out[9]:
In [10]:
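from sklearn.preprocessing import RobustScaler

# Train Test Split
X = df.iloc[:, :-1]
y = df.iloc[:, [7]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# Creating RobustScaler Object
rob = RobustScaler()

# Fit on the training data only, then apply the same transformation to the test data
X_train = rob.fit_transform(X_train)
X_test = rob.transform(X_test)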

In [11]:

# Creating Regression Model
model = KNeighborsRegressor()
model.fit(X_train, y_train)

# Score on Testing Data (score() returns R^2 for a regressor)
y_test_hat = model.predict(X_test)
score = model.score(X_test, y_test)
print("Accuracy for our testing dataset using Robust Scaler is : {:.3f}%".format(score*100))
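
The four experiments above repeat the same steps with a different scaler each time. As a compact sketch of that comparison, the loop below runs the same procedure on synthetic stand-in data (an assumption, not the housing dataset), so the printed scores are illustrative and do not reproduce the figures in this tutorial. All names ending in _demo are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import (MaxAbsScaler, MinMaxScaler,
                                   RobustScaler, StandardScaler)

# Synthetic stand-in data: the target depends on the small-scale feature,
# while the second feature is large-scale noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 2)) * np.array([1.0, 1000.0])
y_demo = 3.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=300)

Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.33, random_state=0)

scalers = {
    "No scaling": None,
    "StandardScaler": StandardScaler(),
    "MinMaxScaler": MinMaxScaler(),
    "MaxAbsScaler": MaxAbsScaler(),
    "RobustScaler": RobustScaler(),
}

results = {}
for name, sc in scalers.items():
    if sc is None:
        Xtr_s, Xte_s = Xtr, Xte
    else:
        Xtr_s = sc.fit_transform(Xtr)  # fit on the training split only
        Xte_s = sc.transform(Xte)      # reuse the training statistics
    knn = KNeighborsRegressor().fit(Xtr_s, ytr)
    results[name] = knn.score(Xte_s, yte)  # R^2 on the test split

for name, r2 in results.items():
    print(f"{name:>15}: R^2 = {r2:.3f}")
```

Collecting the scores in one dictionary makes it easier to compare the scalers side by side than reading separate notebook outputs.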

Summary

2 Responses

    1. The scaler objects have been created by fitting on the training dataset only. So there is no possibility of test data leaking into the training process.
