Decision trees

Decision trees are a class of supervised learning algorithms. They work like a flow chart composed of a sequence of nodes, where the feature values of a sample are used to decide which node to move to next.

The following example gives an idea of how decision trees work in practice:
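
A small decision tree for classifying weather can be sketched as nested conditionals, where each if/else corresponds to one decision node and each return is a leaf giving the predicted class. The tree shape and class labels below are illustrative assumptions:

```python
def classify_weather(conditions, wind):
    """Walk a hand-built decision tree.

    Each if/else is one decision node; each return is a leaf.
    The tree shape and labels here are illustrative assumptions.
    """
    if conditions == "is raining":      # first node: check precipitation
        if wind == "very windy":        # second node: check wind strength
            return "bad weather"
        return "poor weather"
    if wind == "very windy":
        return "poor weather"
    return "good weather"

# A rainy, very windy sample is classed as bad weather.
print(classify_weather("is raining", "very windy"))
```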

As with most classification algorithms, there are two stages to using them:

  • The first stage is the training stage, where a tree is built using training data. While the nearest neighbor algorithm from the previous chapter did not have a training phase, it is needed for decision trees. In this way, the nearest neighbor algorithm is a lazy learner, only doing any work when it needs to make a prediction. In contrast, decision trees, like most classification methods, are eager learners, undertaking work at the training stage and therefore needing to do less in the predicting stage.
  • The second stage is the predicting stage, where the trained tree is used to predict the classification of new samples. Using the previous example tree, a data point of ["is raining", "very windy"] would be classed as bad weather.

There are many algorithms for creating decision trees. Many of them are iterative and greedy: they start at the base node and decide the best feature to use for the first decision, then move to each resulting node and choose the next best feature, and so on. The process stops at the point where it is decided that nothing more can be gained by extending the tree further.

The scikit-learn package implements the Classification and Regression Trees (CART) algorithm as its default decision tree class, which can use both categorical and continuous features.
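A minimal usage sketch with scikit-learn's DecisionTreeClassifier, showing the two stages described above; the tiny dataset is an illustrative assumption:

```python
from sklearn.tree import DecisionTreeClassifier

# Toy training data: two numeric features per sample.
# The class label here happens to equal the first feature.
X = [[0, 0], [1, 1], [0, 1], [1, 0]]
y = [0, 1, 0, 1]

# Training stage: fit() builds the tree from the training data.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# Predicting stage: the trained tree classifies new samples.
print(clf.predict([[1, 1], [0, 0]]))
```

Because decision trees are eager learners, nearly all of the computation happens in the call to fit(); predict() only walks the already-built tree.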