Assumptions#

1. Random Forest is Non-Parametric#

  • Non-parametric means it doesn’t assume a specific functional form for the relationship between the features and the target (no linearity assumption).

  • It can model complex nonlinear relationships naturally.

Implication: You don’t need to transform features to fit a line or polynomial; the trees find suitable split points on the raw features automatically.
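A minimal sketch of this point, using scikit-learn (the text does not name a library, and the sine-wave data is made up for illustration): a forest fits a nonlinear target on the raw feature, while a straight-line fit cannot.

```python
# Hedged sketch: RF captures a nonlinear target with no feature engineering.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 20.0, size=(500, 1))               # single raw feature
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=500)   # nonlinear target

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
line = LinearRegression().fit(X, y)

print("forest R^2:", round(forest.score(X, y), 3))  # close to 1
print("linear R^2:", round(line.score(X, y), 3))    # near 0: no linear trend
```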


2. Key Implicit Assumptions#

While RF is flexible, it still makes a few practical assumptions:

A. Observations are Independent#

  • Random Forest assumes that training samples are independent.

  • Correlated or time-dependent samples (like time series) may require special handling.

Example:

  • In stock price prediction, consecutive days are correlated → a standard RF treats rows as exchangeable and ignores the temporal dependency.
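As a hedged illustration of why this matters (scikit-learn assumed; a synthetic random walk stands in for real prices): randomly shuffled cross-validation leaks correlated neighbors into the test fold and looks optimistic, while TimeSeriesSplit validates strictly on the future.

```python
# Sketch: on autocorrelated data, shuffled CV overstates performance.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, TimeSeriesSplit, cross_val_score

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=600))       # random walk: days are correlated
X, y = series[:-1].reshape(-1, 1), series[1:]  # predict tomorrow from today

model = RandomForestRegressor(n_estimators=100, random_state=0)
shuffled = cross_val_score(model, X, y, cv=KFold(5, shuffle=True, random_state=0))
temporal = cross_val_score(model, X, y, cv=TimeSeriesSplit(n_splits=5))

print("shuffled CV R^2:", round(shuffled.mean(), 3))  # optimistic (leakage)
print("temporal CV R^2:", round(temporal.mean(), 3))  # honest, typically far lower
```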


B. Features Should Have Some Predictive Power#

  • Random Forest works best if at least some features are informative.

  • Including a few completely irrelevant features usually won’t hurt much, because each split considers only a random subset of features, but a large number of noisy features dilutes those subsets and can reduce performance.
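A small experiment sketching this effect (scikit-learn assumed; the dataset and feature counts are illustrative): cross-validated accuracy with five informative features, then with 200 pure-noise columns appended.

```python
# Sketch: many noise features dilute the random subsets used at each split.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=5, n_informative=5,
                           n_redundant=0, random_state=0)
rng = np.random.default_rng(0)
X_noisy = np.hstack([X, rng.normal(size=(500, 200))])  # append 200 noise columns

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("informative only:   ", round(cross_val_score(clf, X, y, cv=5).mean(), 3))
print("plus 200 noise cols:", round(cross_val_score(clf, X_noisy, y, cv=5).mean(), 3))
# Expect a modest drop: most splits now see mostly noise in their feature subset.
```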


C. Data Representativeness#

  • Training data should be representative of the population you want to predict.

  • Bagging (bootstrap sampling) assumes each sample is drawn from the same underlying distribution.
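To make the bagging mechanics concrete, here is a tiny sketch (plain NumPy; the sample size is arbitrary) of one bootstrap draw: sampling n rows with replacement means each tree sees roughly 63% of the distinct rows.

```python
# Sketch: a bootstrap sample contains about 1 - 1/e ≈ 63.2% of distinct rows;
# the left-out ("out-of-bag") rows can serve as a built-in validation set.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
sample = rng.integers(0, n, size=n)   # draw n row indices with replacement
print(len(np.unique(sample)) / n)     # ≈ 0.632
```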


D. Decision Trees Assume Split Criteria are Meaningful#

  • RF splits nodes using measures like:

    • Gini Impurity or Entropy (classification)

    • Variance reduction / MSE (regression)

  • This assumes that splitting on the features can actually reduce impurity.

  • If all features are weak or unrelated to the target, RF won’t perform well.
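A worked example of the classification criterion (plain Python; the labels are made up): Gini impurity of a mixed parent node, and the weighted impurity after a candidate split. The split is kept because it lowers impurity.

```python
# Worked Gini example: impurity before vs. after a candidate split.
def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

parent = ["A"] * 5 + ["B"] * 5                       # perfectly mixed node
left, right = ["A"] * 4 + ["B"], ["A"] + ["B"] * 4   # candidate split

weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(parent)
print(gini(parent))  # 0.5
print(weighted)      # ≈ 0.32 -> impurity drops, so the split is worth making
```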


3. What Random Forest Does NOT Assume#

  • No linearity: Can capture nonlinear patterns

  • No normality: Features or target do not need to be normally distributed

  • No homoscedasticity: Variance of errors can vary

  • No feature scaling required: Trees are scale-invariant
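A quick check of the scale-invariance claim (scikit-learn assumed): fitting the same forest on raw and standardized features should give identical predictions, because splits depend only on the ordering of feature values.

```python
# Sketch: monotone rescaling preserves value order, so tree splits are unchanged.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, random_state=0)
X_std = StandardScaler().fit_transform(X)

raw = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
std = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_std, y)

print(np.array_equal(raw.predict(X), std.predict(X_std)))  # expected: True
```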


4. Practical Notes#

  • RF is robust to outliers and to correlated features; some implementations also handle missing values natively.

  • Correlated features reduce the diversity among trees, slightly decreasing the benefit of ensembling, and they spread importance scores across the correlated group.
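A sketch of the correlated-feature effect (scikit-learn assumed; an exact duplicate column is an extreme stand-in for correlation): duplicating one informative feature splits its importance score across the two copies.

```python
# Sketch: an exact duplicate column shares importance with the original.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=4, n_informative=4,
                           n_redundant=0, random_state=0)
X_dup = np.hstack([X, X[:, [0]]])  # column 4 is an exact copy of column 0

clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_dup, y)
print(np.round(clf.feature_importances_, 3))
# Columns 0 and 4 roughly share the importance column 0 would have had alone.
```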


Summary#

| Aspect | Assumption? | Notes |
| --- | --- | --- |
| Feature-target relationship | No (non-parametric) | Can capture nonlinear patterns |
| Observation independence | Yes | Samples should be independent |
| Feature informativeness | Yes (some features must help) | Random feature selection mitigates irrelevant features |
| Data representativeness | Yes | Training data should reflect population |
| Scaling / normality | No | RF is scale-invariant and distribution-free |