Contents
- 1 Introduction
- 2 What is Microsoft Hummingbird?
- 3 Hands-on Example of Hummingbird Library
- 3.1 Installing Microsoft hummingbird library
- 3.2 Importing the libraries
- 3.3 Function for calculating Runtime
- 3.4 Specifying the parameters for the Model
- 3.5 Generation of Dataset
- 3.6 Splitting the dataset into training and testing subsets
- 3.7 Building a Random Forest Classifier Model using Sklearn library
- 3.8 Function for loading the model to a GPU
- 3.9 Converting sklearn model to PyTorch model
- 3.10 Performing prediction through Sklearn Model
- 3.11 Prediction with PyTorch Model built using Hummingbird Library
- 3.12 Reviewing the Results
- 4 Conclusion
Introduction
Deep learning has advanced rapidly, in part because computationally heavy neural network models can be trained on powerful hardware such as GPUs. At the moment, GPU computation is harnessed mainly by deep learning frameworks like TensorFlow, PyTorch, and Keras, while traditional machine learning models built with scikit-learn generally cannot leverage GPUs for faster processing. Microsoft has recently released an open-source library called Hummingbird that tries to address this gap.
What is Microsoft Hummingbird?
Microsoft Hummingbird is an open-source library that can be used for converting already trained traditional ML Models (that are not neural networks) into tensor-based computational models.
Tensors, which are multi-dimensional arrays, are widely used to build neural network models in deep learning frameworks like PyTorch, TensorFlow, and Keras because operations on them can be computed quickly. Microsoft Hummingbird aims to use tensors to speed up inference for pre-trained traditional ML models.
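To make the idea of a "tensor-based" tree more concrete, here is a minimal sketch of how a single decision tree can be evaluated with matrix operations only, in the spirit of the GEMM strategy described by the Hummingbird authors. The toy tree, the matrix names A to E, and the predict helper are illustrative assumptions built by hand with NumPy; this is not Hummingbird's actual code.

```python
import numpy as np

# Hypothetical toy tree, hand-built for illustration:
#   node0: x[0] < 0.5 ? leaf0 (class 0) : node1
#   node1: x[1] < 0.5 ? leaf1 (class 1) : leaf2 (class 0)

# A[f, i] = 1 if internal node i tests feature f
A = np.array([[1, 0],
              [0, 1]], dtype=np.float32)
# B[i] = threshold of internal node i
B = np.array([0.5, 0.5], dtype=np.float32)
# C[i, l] = 1 if leaf l is in the left subtree of node i, -1 if in the right, 0 otherwise
C = np.array([[1, -1, -1],
              [0,  1, -1]], dtype=np.float32)
# D[l] = number of ancestors of leaf l that have it in their left subtree
D = np.array([1, 1, 0], dtype=np.float32)
# E[l, c] = 1 if leaf l predicts class c
E = np.array([[1, 0],
              [0, 1],
              [1, 0]], dtype=np.float32)


def predict(X):
    """Evaluate the toy tree for a batch of rows using only tensor operations."""
    went_left = (X @ A < B).astype(np.float32)            # which node tests are true
    leaf_hit = (went_left @ C == D).astype(np.float32)    # one-hot of the reached leaf
    return (leaf_hit @ E).argmax(axis=1)                  # class of that leaf


X = np.array([[0.2, 0.9],
              [0.8, 0.3],
              [0.8, 0.8],
              [0.2, 0.2]], dtype=np.float32)
preds = predict(X)
print(preds)
```

Because every step is a matrix product or an element-wise comparison, the same computation can run on any hardware that a tensor framework supports, which is the property Hummingbird relies on.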
Capabilities and Features of Hummingbird
Hummingbird offers some distinctive features. Let's look at them and understand how we can benefit from this Microsoft library.
- Optimizing the model through neural network frameworks.
- Accelerating models with advanced hardware such as GPUs.
- Supporting both traditional and neural network models.
- Re-Scaling and Re-Engineering of the models have been made easier.
Traditional and Neural Network Models supported by Hummingbird
Since Hummingbird is at an early stage, it currently supports only PyTorch as the backend, so traditional models are converted to PyTorch-based models. Other deep learning frameworks may be added in the future.
Currently, this open-source library can convert only tree-based classifiers and regressors, which are as follows:
- Decision Trees
- Extra Trees
- Gradient Boosting
- HistGradient Boosting
- Random Forest
- LightGBM
- XGBoost
The developers of Hummingbird are also looking to add linear models such as Linear Regression and Logistic Regression. Along with this, Feature Selectors, Matrix Decomposition Methods, Feature Pre-Processing, and Text Featurizers are also planned for this library.
Syntax of Hummingbird library
This library doesn't have many functions that we need to interact with. We only have to deal with the convert function, which is found in the hummingbird.ml.convert module.
Now let's look at the convert function.
def convert(model,backend,test_input=None, extra_config={})
Through this function, an input traditional ML model can be converted to a tensor model. Currently, the convert function works with Sklearn, LightGBM, and XGBoost models.
Arguments
- model: The input model that has to be converted
- backend: The name of the target backend that the model is converted to, e.g. 'pytorch'.
- test_input: Sample input data, mostly used when the model execution is traced.
- extra_config: Extra configurations used by the individual operator converters. Generally, settings such as the number of features and the tree implementation strategy are specified here.
Hands-on Example of Hummingbird Library
Let us now analyze the performance of the Hummingbird library. For this, we will build a Random Forest Classifier (a traditional ML model) using the sklearn library. We'll perform binary classification with this model and then review its time and memory consumption. Next, we'll convert this model to a PyTorch-based model (a model that runs on a DNN framework) and analyze its time and memory usage as well. Finally, we'll compare the results.
Installing Microsoft hummingbird library
!pip install hummingbird-ml
Importing the libraries
import matplotlib.pyplot as plt
import numpy as np
import time
import warnings
from hummingbird.ml import convert
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
%matplotlib inline
warnings.filterwarnings("ignore")
Function for calculating Runtime
Since we are reviewing the performance of the models by analyzing their time and memory consumption, we build a decorator function that measures runtime.
def timeit(method):
    # Wrap `method` so that each call reports its wall-clock runtime in ms
    def timed(*args,**kw):
        ts = time.time()
        result = method(*args,**kw)
        te = time.time()
        if 'log_time' in kw:
            # Store the runtime in the caller-supplied dict instead of printing
            name = kw.get('log_name',method.__name__.upper())
            kw['log_time'][name] = int((te - ts) * 1000)
        else:
            print('%r %2.2f ms' % \
                  (method.__name__,(te - ts) * 1000))
        return result
    return timed
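The decorator has two modes: it prints the runtime, or it stores it in a dictionary passed as log_time. The sketch below shows both modes on a hypothetical slow_sum function. It is a variant of the decorator, not the article's exact code: time.perf_counter and functools.wraps are swapped in here as assumptions. Note that log_time and log_name are forwarded to the wrapped function, so that function has to accept them (here through **kwargs).

```python
import functools
import time


def timeit(method):
    """Variant of the timing decorator; perf_counter and wraps are assumptions."""
    @functools.wraps(method)
    def timed(*args, **kw):
        ts = time.perf_counter()
        result = method(*args, **kw)
        elapsed_ms = (time.perf_counter() - ts) * 1000
        if 'log_time' in kw:
            # Store the runtime under log_name (or the upper-cased function name)
            name = kw.get('log_name', method.__name__.upper())
            kw['log_time'][name] = int(elapsed_ms)
        else:
            print('%r %2.2f ms' % (method.__name__, elapsed_ms))
        return result
    return timed


@timeit
def slow_sum(n, **kwargs):
    # **kwargs absorbs log_time / log_name, which the decorator forwards
    return sum(range(n))


timings = {}
total = slow_sum(100_000, log_time=timings, log_name='sum')  # stored, not printed
slow_sum(100_000)                                            # printed
print(timings)
```

Collecting timings in a dictionary makes it easier to compare several runs afterwards than reading individual printed lines.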
Specifying the parameters for the Model
Here we are specifying the variables that will be used for building and training of the models.
num_classes = 2
num_of_samples = 10000000
num_of_features = 50
Generation of Dataset
With the help of numpy, we generate a random dataset using the parameters specified above: a float32 feature matrix X and an integer label vector y.
X = np.array(np.random.rand(num_of_samples,num_of_features),dtype=np.float32)
y = np.random.randint(num_classes,size=num_of_samples)
X[0:2]
y[0:2]
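Since memory usage is part of the comparison, it helps to estimate how large this dataset is before generating it. The following back-of-the-envelope sketch uses only the parameters above; the variable names are illustrative. It relies on the fact that np.random.rand returns float64 values, so a temporary float64 array exists before the float32 copy is made.

```python
num_of_samples = 10_000_000
num_of_features = 50

BYTES_FLOAT32 = 4
BYTES_FLOAT64 = 8
GIB = 2 ** 30

# Final feature matrix X, stored as float32
x_bytes = num_of_samples * num_of_features * BYTES_FLOAT32
# Temporary float64 array produced by np.random.rand before the cast
tmp_bytes = num_of_samples * num_of_features * BYTES_FLOAT64

x_gib = x_bytes / GIB
tmp_gib = tmp_bytes / GIB
print(f"X: {x_gib:.2f} GiB, temporary float64 array: {tmp_gib:.2f} GiB")
```

On a machine with limited RAM, reducing num_of_samples is the simplest way to keep the experiment runnable.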
Splitting the dataset into training and testing subsets
We will be using 25% of the dataset for testing purposes.
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.25,random_state=42)
Building a Random Forest Classifier Model using Sklearn library
sk_model = RandomForestClassifier(n_estimators=10,max_depth=9)
sk_model.fit(X_train,y_train)
Function for loading the model to a GPU
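This custom-built function runs the prediction and reports its runtime. When is_humming is 1, it first moves the Hummingbird model to the GPU.
@timeit
def output_prediction(inp_model,test_x,test_y,is_humming):
    # Move the converted (Hummingbird) model to the GPU before predicting
    if is_humming == 1:
        inp_model.to('cuda')
    return inp_model.predict(test_x)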
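The function above assumes that a CUDA-capable GPU is present. A minimal sketch of a more defensive approach is to choose the device at runtime. The pick_device helper is hypothetical, not part of Hummingbird; it relies only on torch.cuda.is_available() and falls back to the CPU when PyTorch or a GPU is missing.

```python
def pick_device():
    """Return 'cuda' when PyTorch reports a usable GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so only the CPU path is possible
        return 'cpu'
    return 'cuda' if torch.cuda.is_available() else 'cpu'


device = pick_device()
print(device)
```

The result could then be passed to the converted model, for example inp_model.to(device), instead of the hard-coded 'cuda' string.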
Converting sklearn model to PyTorch model
model = convert(sk_model,'pytorch')
Performing prediction through Sklearn Model
We can see that the time taken to produce the result using the Random Forest Classifier built with the sklearn library is 1965.80 ms.
y_pred_sklearn = output_prediction(sk_model,X_test,y_test,0)
Prediction with PyTorch Model built using Hummingbird Library
When we look at the PyTorch model built using the Hummingbird library, the time taken is significantly less: 326.11 ms, to be precise.
y_pred_humming = output_prediction(model,X_test,y_test,1)
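A speed comparison is only meaningful if both models produce the same predictions. A minimal sketch of such a check is shown below. The agreement helper and the small stand-in arrays are hypothetical; in the notebook the inputs would be y_pred_sklearn and y_pred_humming.

```python
import numpy as np


def agreement(pred_a, pred_b):
    """Fraction of positions where two prediction arrays are identical."""
    pred_a = np.asarray(pred_a)
    pred_b = np.asarray(pred_b)
    if pred_a.shape != pred_b.shape:
        raise ValueError("prediction arrays must have the same shape")
    return float(np.mean(pred_a == pred_b))


# Stand-in predictions for illustration only, not real model output
sk_preds = np.array([0, 1, 1, 0, 1])
hb_preds = np.array([0, 1, 1, 0, 0])

rate = agreement(sk_preds, hb_preds)
print(f"agreement: {rate:.0%}")
```

If the converted model is a faithful translation, the agreement on the real predictions should be at or very close to 100%.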
Reviewing the Results
Memory Usage
As we can see, the Random Forest Classifier has used up 5.90 GB of RAM (on CPU).
When we look at the memory usage of the PyTorch-based model, it has used 6.15 GB of RAM and 1.91 GB of GPU memory.

Runtime
We can see that sklearn's random forest classifier model completes its execution in 1965.80 ms, whereas the PyTorch-based model built through the Hummingbird library takes only 326.11 ms.
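The two runtimes reported above can be turned into a speedup factor with simple arithmetic. The sketch below uses only the figures from this article; the variable names are illustrative.

```python
sklearn_ms = 1965.80      # sklearn Random Forest runtime reported above
hummingbird_ms = 326.11   # Hummingbird PyTorch model runtime reported above

speedup = sklearn_ms / hummingbird_ms
saved_pct = (1 - hummingbird_ms / sklearn_ms) * 100

print(f"speedup: {speedup:.2f}x, time saved: {saved_pct:.1f}%")
```

These figures come from a single run each, so they are best read as a rough indication rather than a precise benchmark.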
Conclusion
We hope this article gave you a good insight into Hummingbird, the new open-source library released by Microsoft. We looked at the features of this library and its main functionality, and covered an example where we converted an ML model into a PyTorch model using the Hummingbird library and compared their performance.
You can find more details and the latest updates on the official GitHub page of the Microsoft Hummingbird library.