# Decision Trees

- Decision trees are classifiers
- Each internal node is a test on one attribute
- Each leaf is a result - a classification
- Want the decision tree to be as shallow as possible

When to use them:

- Instances are described by attribute/value pairs
- The target function is discrete
- The training data may be noisy
- The training data may have missing attribute values

Error rate:

- The percentage of examples the tree misclassifies, measured on its training data or on its test data
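A minimal sketch of this measurement (function name is my own):

```python
def error_rate(predictions, labels):
    """Fraction of examples the tree classifies incorrectly."""
    wrong = sum(p != y for p, y in zip(predictions, labels))
    return wrong / len(labels)

# Two of four predictions disagree with the true labels.
rate = error_rate([1, 0, 1, 1], [1, 1, 1, 0])  # 0.5
```

The same function gives the training error or the test error depending on which labels you pass in.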

Heuristic: Information Gain

- To build a decision tree, we have to work out which attribute is the best one to partition on at each point
- Entropy: the uncertainty or impurity of a data set - how much information is needed to describe it
- Information gain: the reduction in entropy we get by asking a question (partitioning on an attribute)

Why you might not get an optimal tree:

- The algorithm is greedy: once it commits to a split, it never backtracks
- At some points, all candidate attributes have the same information gain, so the algorithm picks the first one. But another choice might give us a better set of subtrees.

Entropy:

- Entropy(S) = − Σᵢ₌₁ᶜ pᵢ log₂(pᵢ)
- pᵢ is the proportion of examples in S that belong to class i

Information gain:

- For an attribute A: Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|Sᵥ| / |S|) · Entropy(Sᵥ)
- For each value v, add (number of examples with that value / total examples) × entropy of that subset, then subtract the total from the overall entropy
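The two formulas can be sketched directly in Python (function names are my own):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i) over the class proportions."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(labels, attribute_values):
    """Gain(S, A) = Entropy(S) - sum_v (|S_v| / |S|) * Entropy(S_v),
    where S_v holds the labels of examples whose attribute equals v."""
    total = len(labels)
    subsets = {}
    for label, value in zip(labels, attribute_values):
        subsets.setdefault(value, []).append(label)
    remainder = sum((len(s) / total) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder
```

For example, a 50/50 split of two classes has entropy 1.0, and an attribute that separates the classes perfectly has an information gain of 1.0 on that set.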