• Decision trees are classifiers
• Each node is a choice based on a certain attribute
• Each leaf is a result - a classification
• Want the decision tree to be as shallow as possible

When to use them:

• Instances are described by attribute/value pairs
• The target function is discrete-valued
• The training data may be noisy
• The training data may have missing attribute values

Error rate:

• Percentage of examples the tree misclassifies, measured on either its training data or its test data
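The error rate above can be sketched as a short function. This is a minimal illustration, not part of the notes; the `tree_predict` stub and the toy labels are hypothetical stand-ins for a real trained tree and data set.

```python
def error_rate(tree_predict, examples, labels):
    """Fraction of examples where the tree's prediction disagrees with the true label."""
    wrong = sum(1 for x, y in zip(examples, labels) if tree_predict(x) != y)
    return wrong / len(labels)

# Hypothetical stub tree that always predicts "yes"
predict = lambda x: "yes"

# 1 wrong answer out of 4 examples -> error rate 0.25
print(error_rate(predict, [1, 2, 3, 4], ["yes", "yes", "no", "yes"]))  # 0.25
```

Running the same function on the training set gives the training error; running it on held-out examples gives the test error.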

Heuristic: Information Gain

• To build a decision tree, we have to work out which is the best attribute to partition on at any given point
• Entropy: the uncertainty or impurity of a data set - how much information it contains
• Information gain: the reduction in entropy we get by partitioning on an attribute

Why you might not get an optimal tree:

• No backtracking
• At some nodes, all possible choices have the same information gain, so the algorithm picks the first one. But another choice might give us a better set of subtrees.

Entropy:

• Entropy(S) = - Σ_{i=1..c} p_i log2(p_i), where p_i is the proportion of examples in S belonging to class i

Information gain:

• For an attribute A: Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) Entropy(S_v)
• For each value v, add (number of examples with that value / total examples) × entropy of that subset, then subtract the sum from Entropy(S)