Assumptions#
1. Random Forest is Non-Parametric#
Non-parametric means it doesn’t assume a specific form of the relationship between features and target (no linearity assumption).
It can model complex nonlinear relationships naturally.
Implication: You don’t need to transform features to fit a line or polynomial; trees handle splits automatically.
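As a minimal sketch (assuming scikit-learn and NumPy are available), a random forest can fit a clearly nonlinear target with no manual feature transformation:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)  # nonlinear relationship

# No polynomial expansion, log transform, or scaling needed:
# the trees discover the sine shape through recursive splits.
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
r2 = model.score(X, y)  # training R^2 is high despite the nonlinearity
```

A linear regression on the same raw feature would need an explicit basis expansion to achieve a comparable fit.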
2. Key Implicit Assumptions#
While RF is flexible, it still makes a few practical assumptions:
A. Observations are Independent#
Random Forest assumes that training samples are independent.
Correlated or time-dependent samples (like time series) may require special handling.
Example:
In stock price prediction, consecutive days are correlated, so a standard RF (and a randomly shuffled train/test split) ignores the temporal dependency.
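One common way to handle this at evaluation time (a sketch, assuming scikit-learn) is forward-chaining cross-validation: the model is always trained on the past and tested on the future, rather than on a random shuffle of correlated samples.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(1)
series = np.sin(np.arange(310) / 20.0) + 0.1 * rng.normal(size=310)

# Lag features: predict each value from the previous 5 observations.
X = np.column_stack([series[i:i + 300] for i in range(5)])
y = series[5:305]

# TimeSeriesSplit never tests on data that precedes the training window,
# unlike ordinary shuffled K-fold.
scores = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))
```

This does not make RF itself time-aware; it only prevents the evaluation from leaking future information into training.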
B. Features Should Have Some Predictive Power#
Random Forest works best if at least some features are informative.
Including completely irrelevant features usually won’t hurt much, because each split considers only a random subset of features, but a large number of noisy features can still dilute the informative ones and reduce performance.
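A hedged illustration (assuming scikit-learn): when a few informative features are mixed with many pure-noise features, the forest’s impurity-based importances still concentrate on the informative ones.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
n = 1000
informative = rng.normal(size=(n, 2))
noise = rng.normal(size=(n, 8))            # 8 purely irrelevant features
X = np.hstack([informative, noise])
y = (informative[:, 0] + informative[:, 1] > 0).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
imp = clf.feature_importances_
informative_share = imp[:2].sum()  # bulk of importance on the two real signals
```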
C. Data Representativeness#
Training data should be representative of the population you want to predict.
Bagging (bootstrap sampling) assumes each sample is drawn from the same underlying distribution.
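The bootstrap step that bagging relies on can be sketched in a few lines (NumPy only): each tree sees a resample drawn with replacement from the same training set, which implicitly assumes all rows come from one underlying distribution.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
# Sample n row indices with replacement: one tree's bootstrap sample.
indices = rng.integers(0, n, size=n)
in_bag = np.unique(indices)
# Roughly 1/e ~ 36.8% of rows are left "out of bag" for that tree.
oob_fraction = 1 - in_bag.size / n
```

If the training rows were drawn from shifting distributions, these out-of-bag rows would no longer be a fair stand-in for unseen data.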
D. Decision Trees Assume Split Criteria are Meaningful#
RF splits nodes using measures like:
- Gini Impurity or Entropy (classification)
- Variance reduction / MSE (regression)
This assumes that splitting features can actually reduce impurity.
If all features are weak or unrelated, RF won’t perform well.
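A worked example of the Gini criterion in plain Python: a split is only useful if it lowers the weighted impurity relative to the parent node.

```python
def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

parent = ["a"] * 5 + ["b"] * 5           # 50/50 mix -> impurity 0.5
left, right = ["a"] * 5, ["b"] * 5       # perfect split -> impurity 0 each

weighted_child = 0.5 * gini(left) + 0.5 * gini(right)
impurity_drop = gini(parent) - weighted_child  # 0.5, a maximally useful split
```

When every candidate feature yields an impurity drop near zero, the trees degenerate into near-random partitions, which is exactly the failure mode described above.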
3. What Random Forest Does NOT Assume#
- No linearity: can capture nonlinear patterns
- No normality: features or target do not need to be normally distributed
- No homoscedasticity: variance of errors can vary
- No feature scaling required: trees are scale-invariant
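The scale-invariance point can be demonstrated directly (a sketch, assuming scikit-learn): multiplying the features by a constant changes the split thresholds but not which samples fall on each side, so predictions are unchanged.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 3))
y = (X[:, 0] > 0).astype(int)

# Same random_state -> same bootstraps and feature subsets in both fits;
# only the feature scale differs between the two models.
clf_raw = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
clf_scaled = RandomForestClassifier(n_estimators=50, random_state=0).fit(X * 1000.0, y)

same = np.array_equal(clf_raw.predict(X), clf_scaled.predict(X * 1000.0))
```

Contrast this with distance-based models (k-NN, SVM with RBF kernel), where rescaling one feature can change every prediction.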
4. Practical Notes#
RF is robust to outliers, missing values (some implementations), and feature correlations.
Correlated features reduce the diversity among trees, slightly decreasing the benefit of ensembling.
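A hedged illustration of the correlation caveat (assuming scikit-learn): duplicating a feature splits its importance between the two copies, so importances of correlated features should be read jointly, not individually.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(5)
x = rng.normal(size=(500, 1))
# Columns 0 and 1 are identical copies of the signal; column 2 is noise.
X = np.hstack([x, x, rng.normal(size=(500, 1))])
y = (x[:, 0] > 0).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
imp = clf.feature_importances_
shared = imp[0] + imp[1]   # the signal's importance is split across the copies
```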
✅ Summary#
| Aspect | Assumption? | Notes |
|---|---|---|
| Feature-target relationship | No (non-parametric) | Can capture nonlinear patterns |
| Observation independence | Yes | Samples should be independent |
| Feature informativeness | Yes (some features must help) | Random feature selection mitigates irrelevant features |
| Data representativeness | Yes | Training data should reflect population |
| Scaling / normality | No | RF is scale-invariant and distribution-free |