Ensemble Learning in Data Science

Dhruv Khanna
3 min read · May 15, 2021

Introduction

Ensemble Learning is a machine learning technique that combines several base models to produce one optimal predictive model. Its main goal is to improve overall predictive accuracy and robustness beyond what a single model achieves.

Types of Ensemble Methods

Bagging or Bootstrap Aggregation

In this method, we combine bootstrapping and aggregation to form one ensemble model. Multiple subsets of the dataset are bootstrapped (sampled with replacement), and a decision tree is trained on each of them. The predictions of these trees are then aggregated, by majority vote for classification or by averaging for regression, to form the final predictor. The most common use of bagging is the random forest algorithm, as sketched below.
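As an illustration only, here is a minimal scikit-learn sketch comparing plain bagging of decision trees with a random forest on a synthetic dataset; the dataset and hyperparameters are arbitrary assumptions rather than recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data, purely for illustration.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Bagging: each base learner (a decision tree by default) is fit on a
# bootstrap sample, and predictions are aggregated by majority vote.
bagging = BaggingClassifier(n_estimators=100, random_state=42)
bagging.fit(X_train, y_train)
print("Bagging accuracy:", bagging.score(X_test, y_test))

# Random forest: bagging of trees plus random feature selection at each split.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)
print("Random forest accuracy:", forest.score(X_test, y_test))
```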

Boosting

It is similar to bagging in that it also creates an ensemble by resampling the data, whose predictions are then combined by majority voting. One of the main differences is that the resampling is done strategically, so that each consecutive classifier receives the most informative training set. In simple terms, we train multiple weak learners sequentially and combine their predictions by voting.

In the original boosting scheme, each iteration trains three weak classifiers:

  • The first classifier, C1, is trained on a random subset of the dataset.
  • The second classifier, C2, is trained on the most informative subset for C1: half of its training points are examples that C1 classified correctly, and the other half are examples that C1 misclassified.
  • The third classifier, C3, is trained on the instances on which C1 and C2 disagree.

The three classifiers produce the final result by majority vote, as sketched below.
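The following is a rough, illustrative sketch of this three-classifier scheme using shallow decision trees as weak learners on a synthetic binary dataset; the subset sizes, tree depth, and fallback logic are assumptions, not part of the original description.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data, purely for illustration.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
rng = np.random.default_rng(0)

# C1: trained on a random subset of the training data.
idx1 = rng.choice(len(X_train), size=len(X_train) // 2, replace=False)
c1 = DecisionTreeClassifier(max_depth=3).fit(X_train[idx1], y_train[idx1])

# C2: trained on a set that is half correctly and half incorrectly
# classified by C1 (the "most informative" subset).
pred1 = c1.predict(X_train)
correct = np.flatnonzero(pred1 == y_train)
wrong = np.flatnonzero(pred1 != y_train)
n = min(len(correct), len(wrong))
idx2 = np.concatenate([rng.choice(correct, n, replace=False),
                       rng.choice(wrong, n, replace=False)])
c2 = DecisionTreeClassifier(max_depth=3).fit(X_train[idx2], y_train[idx2])

# C3: trained on the instances where C1 and C2 disagree.
disagree = np.flatnonzero(c1.predict(X_train) != c2.predict(X_train))
if len(disagree) == 0:  # fallback in case C1 and C2 agree everywhere
    disagree = np.arange(len(X_train))
c3 = DecisionTreeClassifier(max_depth=3).fit(X_train[disagree], y_train[disagree])

# Final prediction: majority vote of the three classifiers (labels are 0/1).
votes = np.stack([m.predict(X_test) for m in (c1, c2, c3)])
majority = (votes.sum(axis=0) >= 2).astype(int)
print("Three-classifier ensemble accuracy:", (majority == y_test).mean())
```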

Three widely used boosting algorithms are listed below; a short code sketch follows the list.

  1. AdaBoost: A boosting algorithm developed specifically for classification problems. In each iteration, AdaBoost identifies misclassified data points and increases their weights so that the next classifier pays extra attention to getting them right.
  2. Gradient Boosting: It approaches the problem somewhat differently. Instead of adjusting sample weights as AdaBoost does, each new learner is fit to the difference between the current prediction and the ground truth. Gradient boosting uses a differentiable loss function and works for both regression and classification.
  3. XGBoost: A higher-level, optimized implementation of gradient boosting. The main reason behind XGBoost's popularity is scalability: it drives fast learning through parallel and distributed computing and offers efficient memory usage.
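The sketch below, again with an illustrative synthetic dataset and untuned hyperparameters, shows how AdaBoost and gradient boosting are typically invoked in scikit-learn; XGBoost lives in a separate library but exposes a very similar interface.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic data and untuned hyperparameters, purely for illustration.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# AdaBoost: each round reweights misclassified samples so the next
# weak learner focuses on the hard cases.
ada = AdaBoostClassifier(n_estimators=100, random_state=0)
ada.fit(X_train, y_train)
print("AdaBoost accuracy:", ada.score(X_test, y_test))

# Gradient boosting: each new tree is fit to the gradient of a
# differentiable loss with respect to the current ensemble's predictions.
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=0)
gb.fit(X_train, y_train)
print("Gradient boosting accuracy:", gb.score(X_test, y_test))

# XGBoost comes from a separate package (`pip install xgboost`) but follows
# a similar fit/predict interface:
# from xgboost import XGBClassifier
# xgb = XGBClassifier(n_estimators=100).fit(X_train, y_train)
```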

Stacking

It is a technique that combines multiple classification and regression models via a meta-classifier or meta-regressor. The base-level models are trained on the complete training set, and the meta-level model is then trained on the outputs of the base-level models as features. It is similar to stacking books: starting from heterogeneous models at the base level, we stack a meta-model on top of them to obtain better accuracy, as sketched below.
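A minimal stacking sketch with scikit-learn's StackingClassifier is shown below; the choice of base learners and meta-learner is an illustrative assumption, not a recommendation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data, purely for illustration.
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Heterogeneous base-level models...
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=1)),
    ("svc", SVC(probability=True, random_state=1)),
]

# ...whose cross-validated predictions become the features on which the
# meta-classifier (here a logistic regression) is trained.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_train, y_train)
print("Stacking accuracy:", stack.score(X_test, y_test))
```

Training the meta-learner on cross-validated (out-of-fold) predictions rather than in-sample outputs helps keep it from simply echoing whichever base model overfits the training set.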
