- Decision trees are classifiers
- Each node is a choice based on a certain attribute
- Each leaf is a result - a classification
- Want the decision tree to be as shallow as possible
When to use them:
- Instances are described by attribute/value pairs
- The target function is discrete
- Noisy data
- Missing attributes
- Error rate: the percentage of times the tree gives the wrong answer, measured on its training data (training error) or its test data (test error)
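A minimal sketch of how that error rate could be computed, assuming we already have the tree's predictions alongside the true labels (the example inputs are hypothetical):

```python
def error_rate(predictions, labels):
    """Fraction of examples the classifier gets wrong."""
    wrong = sum(p != y for p, y in zip(predictions, labels))
    return wrong / len(labels)

# Hypothetical example: 1 mistake out of 4 examples.
print(error_rate(["+", "-", "+", "+"], ["+", "-", "-", "+"]))  # → 0.25
```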
Heuristic: Information Gain
- To build a decision tree, we have to work out which attribute is best to partition on at any given point
- Entropy: The uncertainty or impurity of a data set - how much information it contains.
- Information gain: the amount of entropy we lose by asking a question.
Why you might not get an optimal tree:
- No backtracking
- At some nodes, all possible choices have the same information gain, so the algorithm picks the first one. But another choice might give us a better set of subtrees.
Entropy: Entropy(S) = - Σ_{i=1..c} p_i log2(p_i), where p_i is the proportion of examples in S belonging to class i
Information gain:
- For an attribute A: Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) Entropy(S_v)
- For each value v of A, add (number of examples with value v / total examples) * entropy of that subset, then subtract the sum from Entropy(S)
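The two formulas above can be sketched directly in Python. This is a minimal illustration, not a full tree builder; the toy "windy"/"play" dataset and the dictionary-based example format are assumptions for the demo:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i), in bits."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(examples, attribute, target):
    """Gain(S, A) = Entropy(S) - sum_v (|S_v|/|S|) * Entropy(S_v)."""
    labels = [ex[target] for ex in examples]
    n = len(examples)
    remainder = 0.0
    for value in {ex[attribute] for ex in examples}:
        # S_v: the subset of examples where attribute A takes this value.
        subset = [ex[target] for ex in examples if ex[attribute] == value]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy(labels) - remainder

# A 50/50 class split is maximally uncertain: 1 bit of entropy.
print(entropy(["+", "+", "-", "-"]))  # → 1.0

# Hypothetical data: "windy" perfectly predicts "play", so it gains the full bit.
data = [
    {"windy": "yes", "play": "no"},
    {"windy": "yes", "play": "no"},
    {"windy": "no",  "play": "yes"},
    {"windy": "no",  "play": "yes"},
]
print(information_gain(data, "windy", "play"))  # → 1.0
```

When building the tree, we would compute this gain for every remaining attribute and partition on the one with the highest value.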