Decision Forest Classification and Regression (DF)
Decision Forest (DF) classification and regression algorithms are based on an ensemble of tree-structured classifiers known as decision trees. A decision forest is built using the general technique of bagging (bootstrap aggregation) combined with a random choice of features. For more details, see [Breiman84] and [Breiman2001].
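The two sources of randomness named above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: `bootstrap_indices`, `out_of_bag_mask`, and `random_feature_subset` are hypothetical helper names, and the use of `std::mt19937` is an assumption made only for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Draws n row indices with replacement from [0, n): one bootstrap sample.
std::vector<std::size_t> bootstrap_indices(std::size_t n, std::mt19937& rng) {
    assert(n > 0);
    std::uniform_int_distribution<std::size_t> pick(0, n - 1);
    std::vector<std::size_t> idx(n);
    for (auto& i : idx) i = pick(rng);
    return idx;
}

// Marks the rows that never appear in the sample (the out-of-bag rows).
std::vector<bool> out_of_bag_mask(std::size_t n, const std::vector<std::size_t>& sample) {
    std::vector<bool> oob(n, true);
    for (std::size_t i : sample) oob[i] = false;
    return oob;
}

// Picks k distinct feature indices out of p, uniformly at random.
std::vector<std::size_t> random_feature_subset(std::size_t p, std::size_t k, std::mt19937& rng) {
    assert(k <= p);
    std::vector<std::size_t> features(p);
    std::iota(features.begin(), features.end(), std::size_t{0});
    std::shuffle(features.begin(), features.end(), rng);
    features.resize(k);
    return features;
}
```

Each tree would be trained on its own bootstrap sample, and the rows left out of that sample are the ones used for out-of-bag estimates.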
| Operation | Computational methods | Programming Interface | |||
Mathematical formulation
Refer to Developer Guide: Decision Forest Classification and Regression.
Programming Interface
All types and functions in this section are declared in the
oneapi::dal::decision_forest namespace and are available via inclusion of the
oneapi/dal/algo/decision_forest.hpp header file.
Enum classes
- 
enum class error_metric_mode
- error_metric_mode::none
- Do not compute the error metric.
- error_metric_mode::out_of_bag_error
- Training produces a \(1 \times 1\) table with the cumulative prediction error for out-of-bag observations.
- error_metric_mode::out_of_bag_error_per_observation
- Training produces an \(n \times 1\) table with the prediction error for each out-of-bag observation.
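To make the two table shapes concrete, the sketch below computes a per-observation error vector and a single cumulative value for a classification forest. It assumes that each row is scored by a majority vote of the trees for which it was out of bag, and that the cumulative error is the mean over rows scored at least once. Both are assumptions for illustration; `oob_errors` and `out_of_bag_error` are hypothetical names, not library API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct oob_errors {
    std::vector<double> per_observation; // plays the role of the n x 1 table
    double cumulative;                   // plays the role of the 1 x 1 table
};

// predictions[t][i]: class predicted by tree t for row i.
// is_oob[t][i]: true if row i was not in the bootstrap sample of tree t.
oob_errors out_of_bag_error(const std::vector<std::vector<int>>& predictions,
                            const std::vector<std::vector<bool>>& is_oob,
                            const std::vector<int>& labels,
                            std::size_t class_count) {
    assert(predictions.size() == is_oob.size());
    const std::size_t n = labels.size();
    oob_errors result{ std::vector<double>(n, 0.0), 0.0 };
    std::size_t scored = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::vector<int> votes(class_count, 0);
        int total = 0;
        for (std::size_t t = 0; t < predictions.size(); ++t) {
            if (is_oob[t][i]) {
                ++votes[static_cast<std::size_t>(predictions[t][i])];
                ++total;
            }
        }
        if (total == 0) continue; // row was in every bootstrap sample
        std::size_t best = 0;
        for (std::size_t c = 1; c < class_count; ++c)
            if (votes[c] > votes[best]) best = c;
        result.per_observation[i] = (static_cast<int>(best) == labels[i]) ? 0.0 : 1.0;
        result.cumulative += result.per_observation[i];
        ++scored;
    }
    if (scored > 0) result.cumulative /= static_cast<double>(scored);
    return result;
}
```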
 
- 
enum class variable_importance_mode
- variable_importance_mode::none
- Do not compute variable importance.
- variable_importance_mode::mdi
- Mean Decrease Impurity. Computed as the sum of weighted impurity decreases for all nodes where the variable is used, averaged over all trees in the forest.
- variable_importance_mode::mda_raw
- Mean Decrease Accuracy (permutation importance). For each tree, the prediction error on the out-of-bag portion of the data is computed (error rate for classification, MSE for regression). The same is done after permuting each predictor variable. The differences between the two errors are then averaged over all trees.
- variable_importance_mode::mda_scaled
- Mean Decrease Accuracy (permutation importance). This is the mda_raw value scaled by its standard deviation.
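The relationship between the raw and scaled permutation importance can be sketched as follows for a single feature. The input is the per-tree difference in out-of-bag error after and before permuting that feature. Dividing by the sample standard deviation of those differences is one reading of "scaled by its standard deviation" and is an assumption here; the library may normalize differently. `mda` and `mean_decrease_accuracy` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct mda {
    double raw;    // mean of per-tree error differences
    double scaled; // raw divided by the sample standard deviation (assumed scaling)
};

// diffs[t] = (OOB error of tree t after permuting the feature) - (OOB error before).
mda mean_decrease_accuracy(const std::vector<double>& diffs) {
    assert(diffs.size() > 1);
    const double tree_count = static_cast<double>(diffs.size());
    double mean = 0.0;
    for (double d : diffs) mean += d;
    mean /= tree_count;
    double sum_sq = 0.0;
    for (double d : diffs) sum_sq += (d - mean) * (d - mean);
    const double sd = std::sqrt(sum_sq / (tree_count - 1.0));
    // A zero deviation gives no meaningful scale, so report 0 in that case.
    return { mean, sd > 0.0 ? mean / sd : 0.0 };
}
```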
 
- 
enum class infer_mode
- infer_mode::class_labels
- Inference produces an \(n \times 1\) table with the predicted labels.
- infer_mode::class_responses
- Deprecated.
- infer_mode::class_probabilities
- Inference produces an \(n \times c\) table with the predicted class probabilities for each observation.
 
- 
enum class voting_mode
- voting_mode::weighted
- The final prediction is obtained by weighted majority voting.
- voting_mode::unweighted
- The final prediction is obtained by simple majority voting.
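The difference between the two voting modes can be shown on a single observation. The sketch assumes that weighted voting sums the per-tree class probabilities while unweighted voting gives each tree one vote for its most likely class. The source states only "weighted" and "simple" majority voting, so the weighting scheme is an assumption, and `argmax`, `vote_unweighted`, and `vote_weighted` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index of the largest element; the first one wins ties.
std::size_t argmax(const std::vector<double>& v) {
    assert(!v.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < v.size(); ++i)
        if (v[i] > v[best]) best = i;
    return best;
}

// tree_probs[t][c]: probability of class c reported by tree t for one observation.
std::size_t vote_unweighted(const std::vector<std::vector<double>>& tree_probs) {
    assert(!tree_probs.empty());
    std::vector<double> votes(tree_probs.front().size(), 0.0);
    for (const auto& p : tree_probs) votes[argmax(p)] += 1.0; // one vote per tree
    return argmax(votes);
}

std::size_t vote_weighted(const std::vector<std::vector<double>>& tree_probs) {
    assert(!tree_probs.empty());
    std::vector<double> score(tree_probs.front().size(), 0.0);
    for (const auto& p : tree_probs)
        for (std::size_t c = 0; c < p.size(); ++c) score[c] += p[c]; // probability mass
    return argmax(score);
}
```

With two lukewarm trees favoring class 0 and one confident tree favoring class 1, the two schemes can disagree, which is why the mode is configurable.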
 
Descriptor
- 
template<typename Float = float, typename Method = method::by_default, typename Task = task::by_default>
class descriptor
- Template Parameters
- Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
- Method – Tag-type that specifies the implementation of the algorithm. Can be method::dense or method::hist.
- Task – Tag-type that specifies the type of the problem to solve. Can be task::classification or task::regression.
 
 - Constructors - 
descriptor() = default
- Creates a new instance of the class with the default property values. 
 - Properties - 
std::int64_t max_tree_depth
- The maximal depth of the tree. If 0, nodes are expanded until all leaves are pure or until all leaves contain at most the minimal number of observations in a leaf node. Default value: 0.
- Getter & Setter
- std::int64_t get_max_tree_depth() const
- auto & set_max_tree_depth(std::int64_t value)
 
 - 
std::int64_t max_bins
- The maximal number of discrete bins used to bucket continuous features. Used with the method::hist split-finding method only. Increasing the number results in higher computation costs. Default value: 256.
- Getter & Setter
- std::int64_t get_max_bins() const
- auto & set_max_bins(std::int64_t value)
- Invariants
- max_bins > 1
 
 - 
std::int64_t min_observations_in_split_node
- The minimal number of observations in a split node. Default value: 2.
- Getter & Setter
- std::int64_t get_min_observations_in_split_node() const
- auto & set_min_observations_in_split_node(std::int64_t value)
- Invariants
 
 - 
infer_mode infer_mode
- The infer mode. Used with task::classification only.
- Getter & Setter
- template <typename T = Task, typename None = detail::enable_if_classification_t<T>> infer_mode get_infer_mode() const
- template <typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_infer_mode(infer_mode value)
 
 - 
double min_weight_fraction_in_leaf_node
- The minimal weight fraction in a leaf node: the minimum weighted fraction of the total sum of weights (of all input observations) required to be at a leaf node. Default value: 0.0.
- Getter & Setter
- double get_min_weight_fraction_in_leaf_node() const
- auto & set_min_weight_fraction_in_leaf_node(double value)
- Invariants
 
 - 
std::int64_t tree_count
- The number of trees in the forest. Default value: 100.
- Getter & Setter
- std::int64_t get_tree_count() const
- auto & set_tree_count(std::int64_t value)
- Invariants
- tree_count > 0
 
 - 
std::int64_t seed
- Seed for the random number generator used by the algorithm.
- Getter & Setter
- std::int64_t get_seed() const
- auto & set_seed(std::int64_t value)
- Invariants
- tree_count > 0
 
 - 
std::int64_t max_leaf_nodes
- The maximal number of leaf nodes. If 0, the number of leaf nodes is not limited. Default value: 0.
- Getter & Setter
- std::int64_t get_max_leaf_nodes() const
- auto & set_max_leaf_nodes(std::int64_t value)
 
 - 
bool bootstrap
- The bootstrap mode. If true, the training set for each tree is a bootstrap sample of the whole training set; if false, the whole dataset is used to build each tree. Default value: true.
- Getter & Setter
- bool get_bootstrap() const
- auto & set_bootstrap(bool value)
 
 - 
variable_importance_mode variable_importance_mode
- The variable importance mode. Default value: variable_importance_mode::none.
- Getter & Setter
- variable_importance_mode get_variable_importance_mode() const
- auto & set_variable_importance_mode(variable_importance_mode value)
 
 - 
double impurity_threshold
- The impurity threshold. A node is split if the split induces a decrease of the impurity greater than or equal to this value. Default value: 0.0.
- Getter & Setter
- double get_impurity_threshold() const
- auto & set_impurity_threshold(double value)
- Invariants
- impurity_threshold >= 0.0
 
 - 
bool memory_saving_mode
- The memory saving mode. Default value: false.
- Getter & Setter
- bool get_memory_saving_mode() const
- auto & set_memory_saving_mode(bool value)
 
 - 
voting_mode voting_mode
- The voting mode. Used with task::classification only.
- Getter & Setter
- template <typename T = Task, typename None = detail::enable_if_classification_t<T>> voting_mode get_voting_mode() const
- template <typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_voting_mode(voting_mode value)
 
 - 
std::int64_t min_bin_size
- The minimal number of observations in a bin. Used with the method::hist split-finding method only. Default value: 5.
- Getter & Setter
- std::int64_t get_min_bin_size() const
- auto & set_min_bin_size(std::int64_t value)
- Invariants
- min_bin_size > 0
 
 - 
std::int64_t features_per_node
- The number of features to consider when looking for the best split for a node. Default value: sqrt(p) for task::classification and p/3 for task::regression, where p is the total number of features.
- Getter & Setter
- std::int64_t get_features_per_node() const
- auto & set_features_per_node(std::int64_t value)
 
 - 
double min_impurity_decrease_in_split_node
- The minimal impurity decrease in a split node, used as a threshold for stopping tree growth early. A node is split if its impurity is above the threshold; otherwise it becomes a leaf. Default value: 0.0.
- Getter & Setter
- double get_min_impurity_decrease_in_split_node() const
- auto & set_min_impurity_decrease_in_split_node(double value)
- Invariants
 
 - 
double observations_per_tree_fraction
- The fraction of observations per tree. Default value: 1.0.
- Getter & Setter
- double get_observations_per_tree_fraction() const
- auto & set_observations_per_tree_fraction(double value)
- Invariants
 
 - 
std::int64_t class_count
- The class count. Used with task::classification only. Default value: 2.
- Getter & Setter
- template <typename T = Task, typename None = detail::enable_if_classification_t<T>> std::int64_t get_class_count() const
- template <typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_class_count(std::int64_t value)
 
 - 
std::int64_t min_observations_in_leaf_node
- The minimal number of observations in a leaf node. Default value: 1 for classification, 5 for regression.
- Getter & Setter
- std::int64_t get_min_observations_in_leaf_node() const
- auto & set_min_observations_in_leaf_node(std::int64_t value)
- Invariants
 
 - 
error_metric_mode error_metric_mode
- The error metric mode. Default value: error_metric_mode::none.
- Getter & Setter
- error_metric_mode get_error_metric_mode() const
- auto & set_error_metric_mode(error_metric_mode value)
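The setters listed above return `auto &`, which allows calls to be chained. The sketch below mimics that pattern on a hypothetical `toy_descriptor` (not the library class) with four of the documented properties, their documented defaults, and their documented invariants. Throwing `std::invalid_argument` on a violated invariant is an assumption, as is the truncating rounding in `default_features_per_node`.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for decision_forest::descriptor; illustration only.
class toy_descriptor {
public:
    std::int64_t get_tree_count() const { return tree_count_; }
    std::int64_t get_max_bins() const { return max_bins_; }
    std::int64_t get_min_bin_size() const { return min_bin_size_; }
    double get_impurity_threshold() const { return impurity_threshold_; }

    // Each setter checks the documented invariant and returns *this for chaining.
    auto& set_tree_count(std::int64_t value) {
        require(value > 0, "tree_count > 0");
        tree_count_ = value;
        return *this;
    }
    auto& set_max_bins(std::int64_t value) {
        require(value > 1, "max_bins > 1");
        max_bins_ = value;
        return *this;
    }
    auto& set_min_bin_size(std::int64_t value) {
        require(value > 0, "min_bin_size > 0");
        min_bin_size_ = value;
        return *this;
    }
    auto& set_impurity_threshold(double value) {
        require(value >= 0.0, "impurity_threshold >= 0.0");
        impurity_threshold_ = value;
        return *this;
    }

private:
    static void require(bool ok, const char* invariant) {
        if (!ok) throw std::invalid_argument(invariant);
    }
    std::int64_t tree_count_ = 100;   // documented default
    std::int64_t max_bins_ = 256;     // documented default
    std::int64_t min_bin_size_ = 5;   // documented default
    double impurity_threshold_ = 0.0; // documented default
};

// Documented default for features_per_node: sqrt(p) for classification, p/3 for
// regression. Truncation and the lower bound of 1 are assumptions of this sketch.
std::int64_t default_features_per_node(std::int64_t p, bool classification) {
    assert(p > 0);
    const std::int64_t k = classification
        ? static_cast<std::int64_t>(std::sqrt(static_cast<double>(p)))
        : p / 3;
    return k > 0 ? k : 1;
}
```

A chained call such as `toy_descriptor{}.set_tree_count(50).set_max_bins(64)` then reads like a configuration block while still checking each value as it is set.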
 
 
Method tags
Task tags
- 
struct classification
- Tag-type that parameterizes entities used for solving a classification problem.
- 
struct regression
- Tag-type that parameterizes entities used for solving a regression problem.
- 
using by_default = classification
- Alias tag-type for the classification task.
Model
- 
template<typename Task = task::by_default>
class model
- Template Parameters
- Task – Tag-type that specifies the type of the problem to solve. Can be task::classification or task::regression.
 - Constructors - 
model()
- Creates a new instance of the class with the default property values. 
 - Public Methods - 
std::int64_t get_tree_count() const
- The number of trees in the forest. 
 - 
template<typename T = Task, typename None = detail::enable_if_classification_t<T>>
std::int64_t get_class_count() const
- The class count. Used with oneapi::dal::decision_forest::task::classification only.
 - 
template<typename Visitor>
void traverse_depth_first(std::int64_t tree_idx, Visitor &&visitor) const
- Performs a depth-first traversal of the tree with index tree_idx.
- Parameters
- tree_idx – Index of the tree to traverse.
- visitor – Functor that is notified when tree nodes are visited, via the corresponding operators: bool operator()(const decision_forest::split_node_info<Task>&) and bool operator()(const decision_forest::leaf_node_info<Task>&).
 
 
 - 
template<typename Visitor>
void traverse_breadth_first(std::int64_t tree_idx, Visitor &&visitor) const
- Performs a breadth-first traversal of the tree with index tree_idx.
- Parameters
- tree_idx – Index of the tree to traverse.
- visitor – Functor that is notified when tree nodes are visited, via the corresponding operators: bool operator()(const decision_forest::split_node_info<Task>&) and bool operator()(const decision_forest::leaf_node_info<Task>&).
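The visitor shape described above (one `bool operator()` per node kind) can be illustrated on a toy tree. Everything here is hypothetical: `toy_split_info`, `toy_leaf_info`, `toy_node`, and `toy_traverse_depth_first` stand in for the library types and method, and treating a `false` return value as "stop traversing" is an assumption of this sketch rather than documented behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for split_node_info<Task> and leaf_node_info<Task>.
struct toy_split_info { std::int64_t feature_index; double feature_value; };
struct toy_leaf_info { double response; };

struct toy_node {
    bool is_leaf;
    toy_split_info split; // meaningful when is_leaf == false
    toy_leaf_info leaf;   // meaningful when is_leaf == true
    int left;             // child positions in the node array, -1 for leaves
    int right;
};

// Pre-order depth-first walk. Returns false as soon as the visitor returns false.
template <typename Visitor>
bool toy_traverse_depth_first(const std::vector<toy_node>& tree, int idx, Visitor&& visitor) {
    assert(idx >= 0 && static_cast<std::size_t>(idx) < tree.size());
    const toy_node& node = tree[static_cast<std::size_t>(idx)];
    if (node.is_leaf) return visitor(node.leaf);
    if (!visitor(node.split)) return false;
    return toy_traverse_depth_first(tree, node.left, visitor) &&
           toy_traverse_depth_first(tree, node.right, visitor);
}

// A visitor with one call operator per node kind, mirroring the documented shape.
struct node_counter {
    int splits = 0;
    int leaves = 0;
    bool operator()(const toy_split_info&) { ++splits; return true; }
    bool operator()(const toy_leaf_info&) { ++leaves; return true; }
};
```

Overload resolution on the argument type is what routes each node to the right operator, so the traversal code itself never needs to know what the visitor does.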
 
 
 
Training train(...)
Input
- 
template<typename Task = task::by_default>
class train_input
- Template Parameters
- Task – Tag-type that specifies the type of the problem to solve. Can be task::classification or task::regression.
 - Constructors - 
train_input(const table &data, const table &responses)
- Creates a new instance of the class with the given data and responses property values.
 - Properties - 
const table &labels
- Vector of labels \(y\) for the training set \(X\). Default value: table{}.
- Getter & Setter
- const table & get_labels() const
- auto & set_labels(const table &value)
 
 
Result
- 
template<typename Task = task::by_default>
class train_result
- Template Parameters
- Task – Tag-type that specifies the type of the problem to solve. Can be task::classification or task::regression.
 - Constructors - 
train_result()
- Creates a new instance of the class with the default property values. 
 - Properties - 
const table &oob_err_per_observation
- An \(n \times 1\) table containing the out-of-bag error value per observation. Computed when error_metric_mode is set to error_metric_mode::out_of_bag_error_per_observation. Default value: table{}.
- Getter & Setter
- const table & get_oob_err_per_observation() const
- auto & set_oob_err_per_observation(const table &value)
 
 - 
const table &oob_err
- A \(1 \times 1\) table containing the cumulative out-of-bag error value. Computed when error_metric_mode is set to error_metric_mode::out_of_bag_error. Default value: table{}.
- Getter & Setter
- const table & get_oob_err() const
- auto & set_oob_err(const table &value)
 
 - 
const table &var_importance
- A \(1 \times p\) table containing the variable importance value for each feature. Computed when variable_importance_mode != variable_importance_mode::none. Default value: table{}.
- Getter & Setter
- const table & get_var_importance() const
- auto & set_var_importance(const table &value)
 
 
Operation
- 
template<typename Descriptor>
decision_forest::train_result train(const Descriptor &desc, const decision_forest::train_input &input)
- Parameters
- desc – Decision Forest algorithm descriptor decision_forest::descriptor.
- input – Input data for the training operation.
 
 - Preconditions
- input.data.is_empty == false
- input.labels.is_empty == false
- input.labels.column_count == 1
- desc.get_bootstrap() == true || (desc.get_bootstrap() == false && desc.get_variable_importance_mode() != variable_importance_mode::mda_raw && desc.get_variable_importance_mode() != variable_importance_mode::mda_scaled)
- desc.get_bootstrap() == true || (desc.get_bootstrap() == false && desc.get_error_metric_mode() == error_metric_mode::none)
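The last two preconditions say that anything relying on out-of-bag data (MDA importance or an out-of-bag error metric) requires bootstrap to be enabled. A minimal sketch of that check follows. The enums are local mirrors declared only for this example, and `bootstrap_preconditions_hold` is a hypothetical helper, not a library function.

```cpp
#include <cassert>

// Local mirrors of the documented enums, declared here so the sketch is self-contained.
enum class error_metric_mode { none, out_of_bag_error, out_of_bag_error_per_observation };
enum class variable_importance_mode { none, mdi, mda_raw, mda_scaled };

// True when the bootstrap-related training preconditions hold.
bool bootstrap_preconditions_hold(bool bootstrap,
                                  variable_importance_mode importance,
                                  error_metric_mode error_metric) {
    if (bootstrap) return true; // out-of-bag rows exist, so every mode is allowed
    const bool needs_oob_importance =
        importance == variable_importance_mode::mda_raw ||
        importance == variable_importance_mode::mda_scaled;
    const bool needs_oob_error = error_metric != error_metric_mode::none;
    return !needs_oob_importance && !needs_oob_error;
}
```

Without bootstrap every tree sees every row, so there are no out-of-bag observations to evaluate on, which is the reason these combinations are excluded.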
 
Inference infer(...)
Input
- 
template<typename Task = task::by_default>
class infer_input
- Template Parameters
- Task – Tag-type that specifies the type of the problem to solve. Can be task::classification or task::regression.
 - Constructors - 
infer_input(const model<Task> &trained_model, const table &data)
- Creates a new instance of the class with the given model and data property values.
 - Properties 
Result
- 
template<typename Task = task::by_default>
class infer_result
- Template Parameters
- Task – Tag-type that specifies the type of the problem to solve. Can be task::classification or task::regression.
 - Constructors - 
infer_result()
- Creates a new instance of the class with the default property values. 
 - Properties - 
const table &labels
- The \(n \times 1\) table with the predicted labels. Default value: table{}.
- Getter & Setter
- const table & get_labels() const
- auto & set_labels(const table &value)
 
 - 
const table &probabilities
- An \(n \times c\) table with the predicted class probabilities for each observation.
- Getter & Setter
- template <typename T = Task, typename None = detail::enable_if_classification_t<T>> const table & get_probabilities() const
- template <typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_probabilities(const table &value)
 
 
Operation
- 
template<typename Descriptor>
decision_forest::infer_result infer(const Descriptor &desc, const decision_forest::infer_input &input)
- Parameters
- desc – Decision Forest algorithm descriptor decision_forest::descriptor.
- input – Input data for the inference operation.
 
 - Preconditions
- input.data.is_empty == false
 
