• Decision trees are classifiers
  • Each node is a choice based on a certain attribute
  • Each leaf is a result - a classification
  • Want the decision tree to be as shallow as possible
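
A rough Python sketch (not from these notes) of how such a tree might be represented and walked to classify an instance; the Node/Leaf classes and the weather attributes are invented purely for illustration:

    class Leaf:
        """A leaf holds the final classification."""
        def __init__(self, label):
            self.label = label

    class Node:
        """An internal node tests one attribute and branches on its value."""
        def __init__(self, attribute, children):
            self.attribute = attribute   # attribute tested at this node
            self.children = children     # maps attribute value -> subtree

    def classify(tree, instance):
        """Follow attribute tests from the root down to a leaf."""
        while isinstance(tree, Node):
            tree = tree.children[instance[tree.attribute]]
        return tree.label

    # A shallow example tree over made-up weather attributes:
    tree = Node("outlook", {
        "sunny": Node("humidity", {"high": Leaf("no"), "normal": Leaf("yes")}),
        "overcast": Leaf("yes"),
        "rain": Leaf("no"),
    })
    print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # -> yes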

When to use them:

  • Instances are described by attribute/value pairs
  • The target function is discrete
  • Noisy training data
  • Missing attribute values

Error rate:

  • The percentage of examples the tree classifies incorrectly, measured on either its training data (training error) or its test data (test error)
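
As a small sketch (assuming some predict function such as the classify example above), the error rate is just the fraction of labelled examples the classifier gets wrong:

    def error_rate(predict, examples):
        """examples: list of (instance, true_label) pairs."""
        wrong = sum(1 for instance, label in examples
                    if predict(instance) != label)
        return wrong / len(examples)

    # Evaluate on the training set for training error,
    # or on a held-out test set for test error.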

Heuristic: Information Gain

  • To build a decision tree, we have to work out which is the best attribute to partition on at any given point
  • Entropy: The uncertainty or impurity of a data set - how much information it contains.
  • Information gain: the amount of entropy we lose by asking a question.

Why you might not get an optimal tree:

  • No backtracking
  • Sometimes all possible choices at a node have the same information gain, so the first one is picked - but another choice might give a better set of subtrees

Entropy: Entropy(S) = - Σ pᵢ log₂(pᵢ), summed over the c classes, where pᵢ is the proportion of examples in S that belong to class i
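
A direct translation of this formula into Python; class_counts is an assumed representation giving the number of examples in each class:

    import math

    def entropy(class_counts):
        """Entropy(S) = -sum over classes of p_i * log2(p_i)."""
        total = sum(class_counts)
        return -sum((n / total) * math.log2(n / total)
                    for n in class_counts if n > 0)

    # e.g. a set with 9 positive and 5 negative examples:
    print(entropy([9, 5]))   # about 0.940 bits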

Information gain:

  • For an attribute A: Gain(S, A) = Entropy(S) - Σᵥ (|Sᵥ| / |S|) Entropy(Sᵥ), where the sum runs over each value v of A and Sᵥ is the subset of examples with A = v
  • In other words: for each value, add (number of examples with that value / total number of examples) * entropy of that subset (its positive and negative examples), then subtract the total from Entropy(S)
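
A sketch of the gain calculation, assuming examples are represented as (attribute-dict, label) pairs; the best_attribute helper is hypothetical and just illustrates the greedy choice noted earlier, where max() keeps the first attribute among ties:

    import math
    from collections import Counter

    def entropy(labels):
        total = len(labels)
        return -sum((n / total) * math.log2(n / total)
                    for n in Counter(labels).values())

    def information_gain(examples, attribute):
        """Gain(S, A) = Entropy(S) - sum_v (|S_v| / |S|) * Entropy(S_v)."""
        labels = [label for _, label in examples]
        # Partition S into the subsets S_v, one per value v of the attribute.
        subsets = {}
        for instance, label in examples:
            subsets.setdefault(instance[attribute], []).append(label)
        # Subtract the weighted entropy of each subset from Entropy(S).
        return entropy(labels) - sum(
            len(s) / len(examples) * entropy(s) for s in subsets.values())

    def best_attribute(examples, attributes):
        """Greedy choice: the first attribute with the highest gain wins."""
        return max(attributes, key=lambda a: information_gain(examples, a))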