k-Nearest Neighbors Classification (k-NN)¶

\(k\)-NN classification and search algorithms are based on finding the \(k\) nearest observations to the training set. For classification, the problem is to infer the class of a new feature vector by computing the majority vote of its \(k\) nearest observations from the training set. For search, the problem is to infer \(k\) nearest observations from the training set to a new feature vector. The nearest observations are computed based on the chosen distance metric.

Operation	Computational methods		Programming Interface
Training	Brute-force	k-d tree	train(…)	train_input	train_result
Inference	Brute-force	k-d tree	infer(…)	infer_input	infer_result

Mathematical formulation¶

Refer to Developer Guide: k-Nearest Neighbors Classification.

Programming Interface¶

All types and functions in this section are declared in the oneapi::dal::knn namespace and be available via inclusion of the oneapi/dal/algo/knn.hpp header file.

Enum classes¶

enum class voting_mode¶

voting_mode::uniform: Uniform weights for neighbors for prediction voting.
voting_mode::distance: Weight neighbors by the inverse of their distance.

Result options¶

class result_option_id¶

Public Methods

constexpr result_option_id() = default¶

constexpr result_option_id(const result_option_id_base &base)¶

Descriptor¶

template<typename Float = float, typename Method = method::by_default, typename Task = task::by_default, typename Distance = oneapi::dal::minkowski_distance::descriptor<Float>> class descriptor¶

Template Parameters

Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
Method – Tag-type that specifies an implementation of algorithm. Can be method::brute_force or method::kd_tree.
Task – Tag-type that specifies type of the problem to solve. Can be task::classification, task::regression, or task::search.
Distance – The descriptor of the distance used for computations. Can be minkowski_distance::descriptor or chebyshev_distance::descriptor.

Constructors

descriptor(std::int64_t class_count, std::int64_t neighbor_count)¶: Creates a new instance of the class with the given class_count and neighbor_count property values.

template<typename M = Method, typename None = detail::enable_if_brute_force_t<M>> descriptor(std::int64_t class_count, std::int64_t neighbor_count, const distance_t &distance)¶: Creates a new instance of the class with the given class_count, neighbor_count and distance property values. Used with method::brute_force only.

template<typename T = Task, typename None = detail::enable_if_not_classification_t<T>> descriptor(std::int64_t neighbor_count)¶: Creates a new instance of the class with the given neighbor_count property value. Used with task::search only.

template<typename T = Task, typename None = detail::enable_if_not_classification_t<T>> descriptor(std::int64_t neighbor_count, const distance_t &distance)¶: Creates a new instance of the class with the given neighbor_count and distance property values. Used with task::search only.

Properties

std::int64_t class_count¶

The number of classes c.

Getter & Setter: std::int64_t get_class_count() const

auto & set_class_count(std::int64_t value)
Invariants: class_count > 1

voting_mode voting_mode¶

The voting mode.

Getter & Setter: voting_mode get_voting_mode() const

auto & set_voting_mode(voting_mode value)

std::int64_t neighbor_count¶

The number of neighbors k.

Getter & Setter: std::int64_t get_neighbor_count() const

auto & set_neighbor_count(std::int64_t value)
Invariants: neighbor_count > 0

const distance_t &distance¶

Choose distance type for calculations. Used with method::brute_force only.

Getter & Setter: template <typename M = Method, typename None = detail::enable_if_brute_force_t<M>> const distance_t & get_distance() const

template <typename M = Method, typename None = detail::enable_if_brute_force_t<M>> auto & set_distance(const distance_t &dist)

result_option_id result_options¶

Choose which results should be computed and returned.

Getter & Setter: result_option_id get_result_options() const

auto & set_result_options(const result_option_id &value)

Method tags¶

struct brute_force¶: Tag-type that denotes brute-force computational method.

struct kd_tree¶: Tag-type that denotes k-d tree computational method.

using by_default = brute_force ¶: Alias tag-type for brute-force computational method.

Task tags¶

struct classification¶: Tag-type that parameterizes entities used for solving classification problem.

struct regression¶: Tag-type that parameterizes entities used for solving the regression problem.

struct search¶: Tag-type that parameterizes entities used for solving the search problem.

using by_default = classification ¶: Alias tag-type for classification task.

Model¶

template<typename Task = task::by_default> class model¶

Template Parameters: Task – Tag-type that specifies type of the problem to solve. Can be task::classification.

Constructors

model()¶: Creates a new instance of the class with the default property values.

Training `train(...)`¶

Input¶

template<typename Task = task::by_default> class train_input¶

Template Parameters: Task – Tag-type that specifies type of the problem to solve. Can be task::classification or task::search.

Constructors

train_input(const table &data, const table &responses)¶: Creates a new instance of the class with the given data and responses property values.

train_input(const table &data)¶

Properties

const table &labels¶

Vector of labels y for the training set X. Default value: table{}.

Getter & Setter: const table & get_labels() const

template <typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_labels(const table &value)

const table &data¶

The training set X. Default value: table{}.

Getter & Setter: const table & get_data() const

auto & set_data(const table &data)

const table &responses¶

Vector of responses y for the training set X. Default value: table{}.

Getter & Setter: const table & get_responses() const

template <typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_responses(const table &responses)

Result¶

template<typename Task = task::by_default> class train_result¶

Template Parameters: Task – Tag-type that specifies type of the problem to solve. Can be task::classification or task::search.

Constructors

train_result()¶: Creates a new instance of the class with the default property values.

Properties

const model<Task> &model¶

The trained k-NN model. Default value: model<Task>{}.

Getter & Setter: const model< Task > & get_model() const

auto & set_model(const model< Task > &value)

Operation¶

template<typename Descriptor> knn::train_result train(const Descriptor &desc, const knn::train_input &input)¶

Parameters

desc – k-NN algorithm descriptor knn::descriptor
input – Input data for the training operation

Preconditions: input.data.has_data == true

input.labels.has_data == true

input.data.row_count == input.labels.row_count

input.labels.column_count == 1

input.labels[i] >= 0

input.labels[i] < desc.class_count

Inference `infer(...)`¶

Input¶

template<typename Task = task::by_default> class infer_input¶

Template Parameters: Task – Tag-type that specifies type of the problem to solve. Can be task::classification or task::search.

Constructors

infer_input(const table &data, const model<Task> &model)¶: Creates a new instance of the class with the given model and data property values.

Properties

const table &data¶

The dataset for inference \(X'\). Default value: table{}.

Getter & Setter: const table & get_data() const

auto & set_data(const table &data)

const model<Task> &model¶

The trained k-NN model. Default value: model<Task>{}.

Getter & Setter: const model< Task > & get_model() const

auto & set_model(const model< Task > &m)

Result¶

template<typename Task = task::by_default> class infer_result¶

Template Parameters: Task – Tag-type that specifies type of the problem to solve. Can be task::classification or task::search.

Constructors

infer_result()¶: Creates a new instance of the class with the default property values.

Properties

const table &labels¶

The predicted labels. Default value: table{}.

Getter & Setter: const table & get_labels() const

template <typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_labels(const table &value)

const table &indices¶

Indices of nearest neighbors. Default value: table{}.

Getter & Setter: const table & get_indices() const

auto & set_indices(const table &value)

const result_option_id &result_options¶

Result options that indicates availability of the properties.

Getter & Setter: const result_option_id & get_result_options() const

auto & set_result_options(const result_option_id &value)

const table &distances¶

Distances to nearest neighbors. Default value: table{}.

Getter & Setter: const table & get_distances() const

auto & set_distances(const table &value)

const table &responses¶

The predicted responses. Default value: table{}.

Getter & Setter: const table & get_responses() const

template <typename T = Task, typename None = detail::enable_if_not_search_t<T>> auto & set_responses(const table &value)

Operation¶

template<typename Descriptor> knn::infer_result infer(const Descriptor &desc, const knn::infer_input &input)¶

Parameters

desc – k-NN algorithm descriptor knn::descriptor
input – Input data for the inference operation

Preconditions: input.data.has_data == true
Postconditions: result.labels.row_count == input.data.row_count

result.labels.column_count == 1

result.labels[i] >= 0

result.labels[i] < desc.class_count

Usage example¶

Training¶

knn::model<> run_training(const table& data,
                        const table& labels) {
   const std::int64_t class_count = 10;
   const std::int64_t neighbor_count = 5;
   const auto knn_desc = knn::descriptor<float>{class_count, neighbor_count};

   const auto result = train(knn_desc, data, labels);

   return result.get_model();
}

Inference¶

table run_inference(const knn::model<>& model,
                  const table& new_data) {
   const std::int64_t class_count = 10;
   const std::int64_t neighbor_count = 5;
   const auto knn_desc = knn::descriptor<float>{class_count, neighbor_count};

   const auto result = infer(knn_desc, model, new_data);

   print_table("labels", result.get_labels());
}

Examples¶

Batch Processing:

knn_cls_brute_force_dense_batch.cpp

oneDAL documentation

k-Nearest Neighbors Classification (k-NN)¶

Mathematical formulation¶

Programming Interface¶

Enum classes¶

Result options¶

Descriptor¶

Method tags¶

Task tags¶

Model¶

Training train(...)¶

Input¶

Result¶

Operation¶

Inference infer(...)¶

Input¶

Result¶

Operation¶

Usage example¶

Training¶

Inference¶

Examples¶

Training `train(...)`¶

Inference `infer(...)`¶