Assumptions#

1. Recursive Partitioning Can Capture the True Pattern#

  • The algorithm assumes the data can be separated into subgroups that are relatively homogeneous in terms of the target class.

  • Example: splitting on “Weather = Sunny” should meaningfully separate classes like “Play Tennis” vs “Don’t Play”.


2. Features Have Predictive Power#

  • At least some features must carry information about the target.

  • Otherwise, splits won’t reduce impurity, and the tree won’t learn meaningful patterns.
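To make "splits reduce impurity" concrete, here is a small Gini-based sketch (the rows and feature names are made up): an informative feature produces a large impurity drop, while a pure-noise feature produces none.

```python
def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# Invented toy rows: (color, coin_flip, target).
# `color` fully determines the target; `coin_flip` is pure noise.
rows = [("red", "H", "A"), ("red", "T", "A"),
        ("blue", "H", "B"), ("blue", "T", "B")]

def impurity_drop(rows, feat, value):
    """Parent impurity minus the weighted impurity of the two children."""
    labels = [r[2] for r in rows]
    left = [r[2] for r in rows if r[feat] == value]
    right = [r[2] for r in rows if r[feat] != value]
    n = len(rows)
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
    return gini(labels) - weighted

print(impurity_drop(rows, 0, "red"))  # 0.5: informative feature, children pure
print(impurity_drop(rows, 1, "H"))    # 0.0: noise feature, no reduction
```

If every feature behaved like `coin_flip`, no split would reduce impurity and the tree would have nothing to learn.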


3. No Linear/Distributional Assumptions#

  • Unlike regression models, decision trees don’t assume linearity between features and target.

  • They don’t assume normality of features or equal variance across classes.

  • ✅ This makes them non-parametric and flexible.
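This also means feature scaling is irrelevant: splits depend only on the *ordering* of a feature's values, so any monotone rescaling (multiplying by 1000, taking logs) leaves the chosen split unchanged. A small sketch with invented data:

```python
import math

def gini(labels):
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split_indices(xs, ys):
    """Greedy best binary threshold split; returns the indices sent left."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_score, best_left = None, None
    for k in range(1, len(xs)):
        left = [ys[i] for i in order[:k]]
        right = [ys[i] for i in order[k:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(xs)
        if best_score is None or score < best_score:
            best_score, best_left = score, set(order[:k])
    return best_left

xs = [1, 2, 3, 10, 11, 12]   # invented feature values
ys = [0, 0, 0, 1, 1, 1]      # invented labels

raw = best_split_indices(xs, ys)
logged = best_split_indices([math.log(x) for x in xs], ys)
scaled = best_split_indices([1000 * x for x in xs], ys)
print(raw == logged == scaled)  # True: the same samples go left either way
```

A linear model would need the features standardized or transformed; the tree is indifferent because only rank order matters.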


4. Features Are Independent for Splitting#

  • At each split, the algorithm greedily evaluates features one at a time and chooses the “best” one.

  • It does not assume feature independence globally, but each split considers a single feature in isolation, so feature interactions are captured only if they emerge through deeper splits.
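The XOR pattern is the classic illustration of this point: neither feature reduces impurity on its own, yet the interaction becomes separable one level deeper. A sketch in plain Python:

```python
def gini(labels):
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# XOR-style data: the label depends on the *interaction* of x1 and x2.
# Rows are (x1, x2, label).
data = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
labels = [row[2] for row in data]

def split_gini(rows, feat):
    """Weighted child impurity after splitting on one feature alone."""
    left = [r[2] for r in rows if r[feat] == 0]
    right = [r[2] for r in rows if r[feat] == 1]
    n = len(rows)
    return (len(left) * gini(left) + len(right) * gini(right)) / n

print(gini(labels))          # 0.5 before any split
print(split_gini(data, 0))   # 0.5 -- splitting on x1 alone gains nothing
print(split_gini(data, 1))   # 0.5 -- same for x2

# But split on x1 first, then on x2 inside a child: the leaves become pure.
left_child = [r for r in data if r[0] == 0]
print(split_gini(left_child, 1))  # 0.0 -- the interaction surfaces deeper
```

This is also why greedy trees can miss interaction effects entirely when no single split shows any gain; in practice, ties are broken arbitrarily and deeper growth often recovers them.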


5. Sufficient Data for Each Split#

  • Assumes there’s enough data in each node to compute reliable impurity measures (Gini/Entropy).

  • Small datasets can make trees unstable (high variance).
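A quick simulation (pure Python, invented parameters) shows why: the empirical Gini of a node holding only a handful of samples is a far noisier estimate than that of a well-populated node.

```python
import random

def gini(labels):
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

random.seed(0)  # fixed seed so the simulation is repeatable

def gini_estimate_variance(n, p=0.5, trials=500):
    """Variance of the empirical Gini of a node holding n samples,
    drawn from a population with true class probability p."""
    vals = []
    for _ in range(trials):
        labels = [1 if random.random() < p else 0 for _ in range(n)]
        vals.append(gini(labels))
    mean = sum(vals) / trials
    return sum((v - mean) ** 2 for v in vals) / trials

small_node = gini_estimate_variance(5)    # tiny node
large_node = gini_estimate_variance(500)  # well-populated node
print(small_node > large_node)  # True: small nodes give noisy impurity
```

This is the intuition behind stopping criteria such as a minimum number of samples per leaf: they prevent the tree from trusting impurity estimates computed on a few points.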


6. Target Variable is Well-Defined#

  • Assumes that the target classes are mutually exclusive and exhaustive.

  • Example: A loan application is either “Approved” or “Rejected”, not both.


Summary#

  • What Trees DON’T assume: linearity, normality, equal variance, feature scaling.

  • What Trees DO assume:

    • Recursive partitioning can separate data meaningfully.

    • Some features are predictive.

    • Enough samples exist per node to make good splits.


👉 This small set of assumptions is why Decision Trees work well in practice, especially when extended into ensembles (Random Forests, Gradient Boosting).