Ensemble Learning in Data Science

Introduction

Types of Ensemble Methods

Bagging or Bootstrap Aggregation
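
Bagging, or bootstrap aggregation, trains the same base learner on many bootstrap samples of the training data (random samples drawn with replacement) and combines their predictions by majority vote for classification or by averaging for regression. Because each model sees a slightly different view of the data, the aggregated prediction has lower variance than any single model. A minimal sketch using scikit-learn's BaggingClassifier follows; the synthetic dataset and hyperparameters are illustrative assumptions.

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic dataset; any tabular dataset works the same way
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 100 decision trees, each fit on an independent bootstrap sample of the training set
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                            bootstrap=True, random_state=42)
bagging.fit(X_train, y_train)
print("Bagging accuracy:", bagging.score(X_test, y_test))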

Boosting

The original boosting procedure creates three weak classifiers:

  • The first classifier C1 is trained on a random subset of the dataset.
  • The second classifier C2 is trained on the most informative subset for C1: half of its training instances are points that C1 classified correctly, and the other half are points that C1 misclassified.
  • The third classifier C3 is trained on the instances where C1 and C2 disagree.

The final prediction is obtained by a majority vote of the three classifiers, as sketched below.
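
A minimal sketch of this three-classifier scheme, using shallow decision trees as the weak learners. The synthetic dataset, tree depth, and subset sizes are illustrative assumptions, and a production implementation would also guard against any of the subsets coming out empty.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
rng = np.random.default_rng(0)

# C1: trained on a random subset of the training data
idx1 = rng.choice(len(X_train), size=len(X_train) // 2, replace=False)
c1 = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train[idx1], y_train[idx1])

# C2: trained on a set that is half correctly classified by C1 and half misclassified
pred1 = c1.predict(X_train)
correct = np.where(pred1 == y_train)[0]
wrong = np.where(pred1 != y_train)[0]
n = min(len(correct), len(wrong))
idx2 = np.concatenate([rng.choice(correct, n, replace=False),
                       rng.choice(wrong, n, replace=False)])
c2 = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train[idx2], y_train[idx2])

# C3: trained on the instances where C1 and C2 disagree
disagree = np.where(c1.predict(X_train) != c2.predict(X_train))[0]
c3 = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train[disagree], y_train[disagree])

# Final prediction: majority vote of the three classifiers (assumes 0/1 labels)
votes = np.stack([c.predict(X_test) for c in (c1, c2, c3)])
majority = (votes.mean(axis=0) >= 0.5).astype(int)
print("Boosting-by-filtering accuracy:", (majority == y_test).mean())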

There are three types of boosting algorithms:

  1. AdaBoost (Adaptive Boosting): It trains weak learners sequentially, increasing the weights of the training examples that earlier learners misclassified so that later learners focus on the hard cases, and combines all learners through a weighted vote.
  2. Gradient Boosting: It approaches the problem differently. Instead of adjusting example weights as AdaBoost does, each new learner is fit to the difference between the current prediction and the ground truth. Gradient boosting uses a differentiable loss function and works for both regression and classification (see the sketch after this list).
  3. XGBoost: It is an optimized, scalable implementation of gradient boosting. The main reason behind its popularity is scalability: it learns quickly through parallel and distributed computing and uses memory efficiently.
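
A minimal sketch contrasting scikit-learn's GradientBoostingClassifier with the scikit-learn-style wrapper from the separate xgboost package. The dataset and hyperparameters are illustrative assumptions; the XGBoost lines are commented out because they require xgboost to be installed.

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Each new tree is fit to the gradient of the loss with respect to the current
# prediction (for squared error, simply the residual between prediction and truth)
gb = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, max_depth=3)
gb.fit(X_train, y_train)
print("Gradient boosting accuracy:", gb.score(X_test, y_test))

# XGBoost exposes a similar interface; requires the separate xgboost package
# from xgboost import XGBClassifier
# xgb = XGBClassifier(n_estimators=200, learning_rate=0.1, max_depth=3, n_jobs=-1)
# xgb.fit(X_train, y_train)
# print("XGBoost accuracy:", xgb.score(X_test, y_test))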

Stacking
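
Stacking combines several different base models by training a meta-model on their predictions: the base learners are fit on the training data, and the meta-learner learns how to weigh their outputs into a final prediction. A minimal sketch using scikit-learn's StackingClassifier follows; the choice of base learners and meta-learner here is an illustrative assumption.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

# Heterogeneous base learners; a logistic regression meta-learner combines their
# cross-validated predictions into the final decision
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100, random_state=7)),
                ("svm", SVC(probability=True, random_state=7))],
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_train, y_train)
print("Stacking accuracy:", stack.score(X_test, y_test))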
