Random Forest#
1. What is Random Forest?#
Random Forest (RF) is an ensemble learning method for regression and classification.
It builds multiple Decision Trees and combines their predictions.
Essentially, it’s “a forest of decision trees” where each tree votes (for classification) or averages (for regression).
Key Idea: Combining many diverse trees (each prone to overfitting on its own) reduces variance, which curbs overfitting and improves generalization.
2. How Random Forest Works (Intuition)#
1. Bootstrap Sampling (Bagging):
   - Randomly sample the data with replacement for each tree.
   - Each tree gets a slightly different dataset → introduces diversity.
2. Random Feature Selection:
   - At each split in a tree, only a random subset of features is considered.
   - Prevents trees from being too correlated (e.g., one strong feature dominating every tree).
3. Tree Training:
   - Each tree is trained independently on its bootstrapped dataset and random feature subsets.
4. Prediction Aggregation:
   - Regression: average the predictions of all trees.
   - Classification: take a majority vote among trees.
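The four steps above can be sketched directly with scikit-learn's `DecisionTreeClassifier` and NumPy. This is a minimal illustration, not a production implementation; the dataset, tree count, and seeds are my own assumptions, not from the text:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic dataset (not from the original text)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

rng = np.random.default_rng(0)
trees = []
for _ in range(25):
    # Step 1: bootstrap sample — draw n indices with replacement
    idx = rng.integers(0, len(X), len(X))
    # Step 2: random feature subset at each split (max_features="sqrt")
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    # Step 3: train each tree independently on its bootstrap sample
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Step 4: aggregate — majority vote across the 25 trees
votes = np.stack([t.predict(X) for t in trees])
ensemble_pred = (votes.mean(axis=0) > 0.5).astype(int)
```

In practice `RandomForestClassifier` wraps all four steps, but spelling them out shows where the diversity comes from.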
3. Why Random Forest Works#
- Reduces overfitting: individual trees may overfit, but averaging their predictions smooths out errors.
- Handles high-dimensional data: random feature selection prevents a single feature from dominating splits.
- Robust to noise: noise in the training data affects individual trees, but not the ensemble as a whole.
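The variance-reduction argument can be made precise with a standard result (not stated in the original text): if each of $B$ trees has prediction variance $\sigma^2$ and pairwise correlation $\rho$, the variance of their average is

$$
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right) = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
$$

Growing more trees (larger $B$) shrinks the second term, and random feature selection lowers $\rho$, shrinking the first — which is exactly why bagging alone is not enough and feature randomness matters.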
4. Key Hyperparameters in Random Forest#
| Hyperparameter | Description | Effect |
|---|---|---|
| `n_estimators` | Number of trees in the forest | More trees → better performance, slower training |
| `max_depth` | Maximum depth of each tree | Controls overfitting |
| `min_samples_split` | Min samples required to split a node | Higher → simpler trees |
| `min_samples_leaf` | Min samples required at a leaf | Prevents leaves with very few samples |
| `max_features` | Number of features to consider at each split | Lower → more diversity, higher bias |
| `bootstrap` | Whether to use bootstrap sampling | Usually `True` (bagging) |
| `criterion` | Split quality measure (e.g., `gini`, `entropy`) | How splits are chosen |
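These hyperparameters map directly onto scikit-learn's `RandomForestClassifier`. A minimal sketch with illustrative values (the dataset and the specific settings are assumptions for demonstration, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative synthetic dataset
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

rf = RandomForestClassifier(
    n_estimators=100,      # number of trees in the forest
    max_depth=6,           # cap tree depth to control overfitting
    min_samples_split=4,   # min samples to split an internal node
    min_samples_leaf=2,    # min samples required at a leaf
    max_features="sqrt",   # features considered at each split
    bootstrap=True,        # bagging on (the default)
    criterion="gini",      # split quality measure
    random_state=42,
)
rf.fit(X, y)
train_acc = rf.score(X, y)  # accuracy on the training data
```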
5. Advantages of Random Forest#
- Handles both regression and classification.
- Works well on nonlinear data without much feature engineering.
- Less prone to overfitting than a single Decision Tree.
- Can compute feature importance.
- Robust to outliers and noise.
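The feature-importance point above is exposed in scikit-learn as the fitted `feature_importances_` attribute. A short sketch on a synthetic dataset (the data setup is an illustrative assumption):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# 5 features, only 2 of which actually drive the target (synthetic, for illustration)
X, y = make_regression(n_samples=200, n_features=5, n_informative=2, random_state=0)

rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
importances = rf.feature_importances_  # one score per feature; scores sum to 1
```

Higher scores indicate features the trees relied on more heavily when splitting.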
6. Disadvantages / Considerations#
- Slower than a single Decision Tree (more trees → more computation).
- Less interpretable than a single Decision Tree (the ensemble is a “black box”).
- Needs careful tuning of `n_estimators`, `max_depth`, and `max_features` for optimal performance.
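One common way to tune those three hyperparameters is a cross-validated grid search. A minimal sketch — the grid values and dataset are illustrative assumptions, and real grids are usually wider:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Illustrative synthetic dataset
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Try every combination of these values with 3-fold cross-validation
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={
        "n_estimators": [50, 100],
        "max_depth": [3, None],
        "max_features": ["sqrt", None],
    },
    cv=3,
)
grid.fit(X, y)
best = grid.best_params_  # the combination with the best CV score
```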
7. Random Forest Intuition (Visual)#
Imagine predicting house prices:
- Each tree learns different patterns from random subsets of houses and features.
- One tree might overpredict in some areas while another underpredicts in others.
- Averaging all the trees gives a more accurate and stable prediction.
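This intuition is easy to check on noisy synthetic data standing in for house prices (no real prices here, and the dataset parameters are illustrative): an averaged forest usually generalizes better than one deep tree.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Noisy synthetic regression data (a stand-in for house prices)
X, y = make_regression(n_samples=400, n_features=6, n_informative=6,
                       noise=20.0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

tree = DecisionTreeRegressor(random_state=1).fit(X_tr, y_tr)          # one deep tree
forest = RandomForestRegressor(n_estimators=100, random_state=1).fit(X_tr, y_tr)

tree_r2 = tree.score(X_te, y_te)      # R^2 of the single tree on held-out data
forest_r2 = forest.score(X_te, y_te)  # R^2 of the averaged forest
```

The single tree fits the training noise; averaging 100 trees smooths it out, so `forest_r2` should come out higher.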