Assumptions
1. Recursive Partitioning Can Capture the True Pattern
The algorithm assumes the data can be separated into subgroups that are relatively homogeneous in terms of the target class.
Example: splitting on “Weather = Sunny” should meaningfully separate classes like “Play Tennis” vs “Don’t Play”.
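Below is a minimal sketch of this idea using scikit-learn; the numeric encoding of Weather and the eight toy rows are made up here purely to illustrate how one split can already produce fairly homogeneous subgroups.

```python
# Toy sketch (hypothetical Play-Tennis-style data): a single split on Weather
# already separates "Play" from "Don't Play" reasonably well.
from sklearn.tree import DecisionTreeClassifier, export_text

# Encode Weather: 0 = Sunny, 1 = Overcast, 2 = Rainy (made-up toy data)
X = [[0], [0], [1], [2], [2], [1], [0], [2]]
y = ["Don't Play", "Don't Play", "Play", "Play", "Don't Play", "Play", "Don't Play", "Play"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# The printed rules show the recursive partition the tree found on Weather.
print(export_text(tree, feature_names=["Weather"]))
```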
2. Features Have Predictive Power
At least some features must carry information about the target.
Otherwise, splits won’t reduce impurity, and the tree won’t learn meaningful patterns.
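A quick way to see this is to compare an informative feature against pure noise; the sketch below (synthetic data, my own setup) uses impurity-based feature importances, which should concentrate almost entirely on the informative column.

```python
# Sketch: an informative feature vs. pure noise.  Only the informative column
# produces impurity reduction, so it receives nearly all the importance.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
informative = rng.integers(0, 2, size=500)      # drives the label
noise = rng.normal(size=500)                    # carries no signal
X = np.column_stack([informative, noise])
y = informative ^ (rng.random(500) < 0.05)      # label = informative feature with 5% flips

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(dict(zip(["informative", "noise"], tree.feature_importances_)))
```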
3. No Linear/Distributional Assumptions
Unlike regression models, decision trees don’t assume linearity between features and target.
They don’t assume normality of features or equal variance across classes.
✅ This makes them non-parametric and flexible.
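As a rough illustration (XOR-style toy data of my own choosing), a linear model cannot separate the classes at all, while a shallow tree carves out the non-linear regions without any transformation or scaling of the features.

```python
# Sketch: the classic XOR pattern is not linearly separable, yet a depth-2
# tree captures it exactly, with no feature scaling or distributional tuning.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]] * 50)
y = np.logical_xor(X[:, 0], X[:, 1]).astype(int)

print("logistic regression (linear):", LogisticRegression().fit(X, y).score(X, y))
print("decision tree (depth 2):     ", DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y).score(X, y))
```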
4. Features Are Independent for Splitting
At each split, the algorithm treats features independently and chooses the “best” one.
It does not assume global feature independence, but at any single split it ignores feature interactions; interactions are only captured if they emerge through deeper splits.
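The sketch below makes the "one feature at a time" idea concrete with a hand-rolled Gini-gain calculation (my own simplified version of what a CART-style splitter evaluates at a node): each candidate feature is scored in isolation, and the one with the largest impurity reduction wins.

```python
# Sketch: greedy split search scores each feature independently via its
# Gini impurity reduction, then keeps the best one.
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_gain(feature, labels):
    # Parent impurity minus the weighted impurity of the two children
    # produced by splitting on a binary feature.
    left, right = labels[feature == 0], labels[feature == 1]
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
    return gini(labels) - weighted

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
f_good = y.copy()                    # perfectly aligned with the target
f_bad = rng.integers(0, 2, 200)      # unrelated to the target

print("gain(f_good):", gini_gain(f_good, y))   # ~0.5, the maximum for a balanced binary target
print("gain(f_bad): ", gini_gain(f_bad, y))    # ~0.0
```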
5. Sufficient Data for Each Split
Assumes there’s enough data in each node to compute reliable impurity measures (Gini/Entropy).
Small datasets can make trees unstable (high variance).
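One way to see this variance (using a synthetic dataset and resampling scheme of my own choosing) is to grow fully unconstrained trees on tiny resamples of the same problem and watch how much their test accuracy jumps around; parameters such as min_samples_leaf and min_samples_split are the usual guards against splitting nodes that hold too few points.

```python
# Sketch: trees fit on 30-point resamples of one fixed problem vary a lot in
# test accuracy -- the high variance that small nodes / small datasets cause.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=5, random_state=0)
X_pool, y_pool, X_test, y_test = X[:1000], y[:1000], X[1000:], y[1000:]

rng = np.random.default_rng(0)
scores = []
for _ in range(10):
    idx = rng.choice(len(X_pool), size=30, replace=False)   # only 30 training points
    tree = DecisionTreeClassifier(min_samples_leaf=1, random_state=0)
    scores.append(tree.fit(X_pool[idx], y_pool[idx]).score(X_test, y_test))

# A wide spread across resamples signals variance from data-starved splits.
print("test accuracy per 30-sample fit:", np.round(scores, 2))
```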
6. Target Variable Is Well-Defined
Assumes that the target classes are mutually exclusive and exhaustive.
Example: A loan application is either “Approved” or “Rejected”, not both.
Summary
What Trees DON’T assume: linearity, normality, equal variance, feature scaling.
What Trees DO assume:
Recursive partitioning can separate data meaningfully.
Some features are predictive.
Enough samples exist per node to make good splits.
👉 This small set of assumptions is why Decision Trees work well in practice, especially when extended to ensembles (Random Forests, Gradient Boosting).
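As a closing sketch (synthetic data, default scikit-learn settings, chosen only for illustration): the same minimal assumptions carry over when single trees are bagged or boosted, and the ensembles mainly trade the lone tree's variance for stability.

```python
# Sketch: a single tree vs. two common tree ensembles on one synthetic problem.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
for model in (DecisionTreeClassifier(random_state=0),
              RandomForestClassifier(random_state=0),
              GradientBoostingClassifier(random_state=0)):
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{type(model).__name__:30s} {score:.3f}")
```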