.. ******************************************************************************
.. * Copyright 2021 Intel Corporation
.. *
.. * Licensed under the Apache License, Version 2.0 (the "License");
.. * you may not use this file except in compliance with the License.
.. * You may obtain a copy of the License at
.. *
.. *     http://www.apache.org/licenses/LICENSE-2.0
.. *
.. * Unless required by applicable law or agreed to in writing, software
.. * distributed under the License is distributed on an "AS IS" BASIS,
.. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
.. * See the License for the specific language governing permissions and
.. * limitations under the License.
.. *******************************************************************************/

.. _dt:

Decision Tree
*************

Decision trees partition the feature space into a set of hypercubes,
and then fit a simple model in each hypercube. The simple model can
be a prediction model, which ignores all predictors and predicts the
majority (most frequent) class (or the mean of a dependent variable
for regression), also known as 0-R or constant classifier.

Decision tree induction forms a tree-like graph structure as shown in
the figure below, where:

-  Each internal (non-leaf) node denotes a test on one of the features
-  Each branch descending from a non-leaf node corresponds to an outcome of the
   test
-  Each external node (leaf) denotes the mentioned simple model

.. figure:: images/decision-tree-structure.png
  :width: 600
  :alt:

  Decision Tree Structure

A test is a rule for partitioning the feature space. A test
depends on feature values. Each outcome of a test represents an
appropriate hypercube associated with both the test and one of the
descending branches.

If a test is a Boolean expression (for
example, :math:`f < c` or :math:`f = c`, where :math:`f` is a feature and :math:`c` is a constant fitted
during decision tree induction), the inducted decision tree is a
binary tree, so its non-leaf nodes have exactly two branches,
'true' and 'false', each corresponding to the result of the Boolean
expression.

Prediction is performed by starting at the root node of the tree,
testing features by the test specified in this node, then moving down
the tree branch corresponding to the outcome of the test for the
given sample. This process is then repeated for the subtree rooted
at the node, discovered at the selected branch. The final result is the prediction of the simple
model at the leaf node.

Decision trees are often used in ensemble algorithms, such as boosting, bagging, or decision forest.