K-Means¶
The K-Means algorithm solves the clustering problem by partitioning \(n\) feature vectors into \(k\) clusters so as to minimize some criterion. Each cluster is characterized by a representative point, called a centroid.
Operation | Computational methods | Programming Interface
Mathematical formulation¶
Refer to Developer Guide: K-Means.
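For orientation, one iteration of Lloyd's method (the computational method behind method::lloyd_dense) can be sketched in plain C++ as follows. This is a minimal illustration, not the library's implementation: it assumes squared Euclidean distance and leaves an empty cluster's centroid unchanged, and the names point, squared_distance, nearest_centroid, and lloyd_step are hypothetical and not part of the oneDAL API. The Developer Guide remains the authoritative formulation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// A feature vector; all points are assumed to have the same dimension.
using point = std::vector<double>;

// Squared Euclidean distance between two feature vectors of equal length.
double squared_distance(const point& a, const point& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t j = 0; j < a.size(); ++j) {
        const double d = a[j] - b[j];
        sum += d * d;
    }
    return sum;
}

// Index of the centroid closest to x (ties go to the lowest index).
std::int64_t nearest_centroid(const point& x, const std::vector<point>& centroids) {
    assert(!centroids.empty());
    std::int64_t best = 0;
    double best_dist = std::numeric_limits<double>::infinity();
    for (std::size_t c = 0; c < centroids.size(); ++c) {
        const double dist = squared_distance(x, centroids[c]);
        if (dist < best_dist) {
            best_dist = dist;
            best = static_cast<std::int64_t>(c);
        }
    }
    return best;
}

// One Lloyd iteration: label every point with its nearest centroid, then move
// each centroid to the mean of the points assigned to it. Centroids are updated
// in place; a centroid with no assigned points is left as-is (an assumption).
std::vector<std::int64_t> lloyd_step(const std::vector<point>& data,
                                     std::vector<point>& centroids) {
    assert(!centroids.empty());
    const std::size_t k = centroids.size();
    const std::size_t dim = centroids.front().size();
    std::vector<std::int64_t> labels(data.size());
    std::vector<point> sums(k, point(dim, 0.0));
    std::vector<std::size_t> counts(k, 0);

    for (std::size_t i = 0; i < data.size(); ++i) {
        labels[i] = nearest_centroid(data[i], centroids);
        const std::size_t c = static_cast<std::size_t>(labels[i]);
        ++counts[c];
        for (std::size_t j = 0; j < dim; ++j) sums[c][j] += data[i][j];
    }
    for (std::size_t c = 0; c < k; ++c) {
        if (counts[c] == 0) continue;
        for (std::size_t j = 0; j < dim; ++j) {
            centroids[c][j] = sums[c][j] / static_cast<double>(counts[c]);
        }
    }
    return labels;
}
```

Calling lloyd_step repeatedly until a stop condition holds gives the overall training loop; the returned labels are computed against the centroids as they were before the update.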
Programming Interface¶
All types and functions in this section are declared in the
oneapi::dal::kmeans namespace and are available via inclusion of the
oneapi/dal/algo/kmeans.hpp header file.
Descriptor¶
- template<typename Float = float, typename Method = method::by_default, typename Task = task::by_default>
class descriptor¶
- Template Parameters
 Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
 Method – Tag-type that specifies an implementation of the algorithm. Can be method::lloyd_dense.
 Task – Tag-type that specifies the type of the problem to solve. Can be task::clustering.
Constructors
- descriptor(std::int64_t cluster_count = 2)¶
Creates a new instance of the class with the given cluster_count.
Properties
- std::int64_t max_iteration_count¶
The maximum number of iterations T. Default value: 100.
- Getter & Setter
 std::int64_t get_max_iteration_count() const
 auto & set_max_iteration_count(int64_t value)
- Invariants
 max_iteration_count >= 0
- std::int64_t cluster_count¶
The number of clusters k. Default value: 2.
- Getter & Setter
 std::int64_t get_cluster_count() const
 auto & set_cluster_count(int64_t value)
- Invariants
 cluster_count > 0
- double accuracy_threshold¶
The threshold \(\varepsilon\) for the stop condition. Default value: 0.0.
- Getter & Setter
 double get_accuracy_threshold() const
 auto & set_accuracy_threshold(double value)
- Invariants
 accuracy_threshold >= 0.0
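The properties above share one pattern: a getter, a setter that returns a reference so calls can be chained, and an invariant on the value. The sketch below mirrors that pattern with a hypothetical stand-in class, kmeans_params, which is not the oneDAL descriptor. Throwing std::domain_error on an invariant violation is an assumption made for illustration; this section does not say how the library reports such errors.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for the descriptor's properties; not the oneDAL class.
class kmeans_params {
public:
    explicit kmeans_params(std::int64_t cluster_count = 2) {
        set_cluster_count(cluster_count);
    }

    std::int64_t get_cluster_count() const { return cluster_count_; }
    std::int64_t get_max_iteration_count() const { return max_iteration_count_; }
    double get_accuracy_threshold() const { return accuracy_threshold_; }

    // Each setter checks its invariant and returns *this to allow chaining.
    kmeans_params& set_cluster_count(std::int64_t value) {
        if (value <= 0) throw std::domain_error("cluster_count must be > 0");
        cluster_count_ = value;
        return *this;
    }

    kmeans_params& set_max_iteration_count(std::int64_t value) {
        if (value < 0) throw std::domain_error("max_iteration_count must be >= 0");
        max_iteration_count_ = value;
        return *this;
    }

    kmeans_params& set_accuracy_threshold(double value) {
        // Written as !(value >= 0.0) so that NaN is rejected as well.
        if (!(value >= 0.0)) throw std::domain_error("accuracy_threshold must be >= 0.0");
        accuracy_threshold_ = value;
        return *this;
    }

private:
    // Defaults match the default values documented above.
    std::int64_t cluster_count_ = 2;
    std::int64_t max_iteration_count_ = 100;
    double accuracy_threshold_ = 0.0;
};
```

Returning a reference from each setter is what makes the chained form `set_cluster_count(10).set_max_iteration_count(50)` possible.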
Method tags¶
- using by_default = lloyd_dense¶
Alias tag-type for Lloyd’s computational method.
Task tags¶
- struct clustering¶
Tag-type that parameterizes entities used for solving the clustering problem.
- using by_default = clustering¶
Alias tag-type for the clustering task.
Model¶
- template<typename Task = task::by_default>
class model¶
- Template Parameters
 Task – Tag-type that specifies the type of the problem to solve. Can be task::clustering.
Constructors
- model()¶
Creates a new instance of the class with the default property values.
Public Methods
- std::int64_t get_cluster_count() const¶
Number of clusters k in the trained model.
Properties
Training train(...)¶
Input¶
- template<typename Task = task::by_default>
class train_input¶
- Template Parameters
 Task – Tag-type that specifies the type of the problem to solve. Can be task::clustering.
Constructors
- train_input(const table &data, const table &initial_centroids)¶
Creates a new instance of the class with the given data and initial_centroids.
Properties
Result¶
- template<typename Task = task::by_default>
class train_result¶
- Template Parameters
 Task – Tag-type that specifies the type of the problem to solve. Can be task::clustering.
Constructors
- train_result()¶
Creates a new instance of the class with the default property values.
Properties
- const table & labels¶
An \(n \times 1\) table with the labels \(y_i\) assigned to the samples \(x_i\) in the input data, \(1 \leq i \leq n\). Default value: table{}.
- Getter & Setter
 const table & get_labels() const
 auto & set_labels(const table &value)
- int64_t iteration_count¶
The number of iterations performed by the algorithm. Default value: 0.
- Getter & Setter
 int64_t get_iteration_count() const
 auto & set_iteration_count(std::int64_t value)
- Invariants
 iteration_count >= 0
- const model<Task> & model¶
The trained K-means model. Default value: model<Task>{}.
- Getter & Setter
 const model< Task > & get_model() const
 auto & set_model(const model< Task > &value)
- const table & responses¶
An \(n \times 1\) table with the responses \(y_i\) assigned to the samples \(x_i\) in the input data, \(1 \leq i \leq n\). Default value: table{}.
- Getter & Setter
 const table & get_responses() const
 auto & set_responses(const table &value)
- double objective_function_value¶
The value of the objective function \(\Phi_X(C)\), where C is model.centroids.
- Getter & Setter
 double get_objective_function_value() const
 auto & set_objective_function_value(double value)
- Invariants
 objective_function_value >= 0.0
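To make objective_function_value concrete, the sketch below computes the usual K-Means objective: the sum, over all samples, of the squared Euclidean distance to the closest centroid. That this matches \(\Phi_X(C)\) exactly as the library defines it is an assumption here, since this page defers the formulation to the Developer Guide. The point alias and the free function are hypothetical and not part of the oneDAL API.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// A feature vector; all points are assumed to have the same dimension.
using point = std::vector<double>;

// Sum over all samples of the squared Euclidean distance to the closest centroid.
// The result is a sum of squares, so it is never negative, which is consistent
// with the invariant objective_function_value >= 0.0.
double objective_function_value(const std::vector<point>& data,
                                const std::vector<point>& centroids) {
    assert(!centroids.empty());
    double total = 0.0;
    for (const point& x : data) {
        double best = std::numeric_limits<double>::infinity();
        for (const point& c : centroids) {
            assert(c.size() == x.size());
            double dist = 0.0;
            for (std::size_t j = 0; j < x.size(); ++j) {
                const double d = x[j] - c[j];
                dist += d * d;
            }
            if (dist < best) best = dist;
        }
        total += best;
    }
    return total;
}
```

For example, four points at the corners of a 10-by-2 rectangle with centroids at the midpoints of the two short sides each lie at squared distance 1 from their centroid, giving an objective of 4.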
Operation¶
- template<typename Descriptor>
kmeans::train_result train(const Descriptor &desc, const kmeans::train_input &input)¶
- Parameters
 desc – K-Means algorithm descriptor kmeans::descriptor
 input – Input data for the training operation
- Preconditions
- Postconditions
 result.labels.row_count == input.data.row_count
 result.labels.column_count == 1
 result.labels[i] >= 0
 result.labels[i] < desc.cluster_count
 result.iteration_count <= desc.max_iteration_count
 result.model.centroids.row_count == desc.cluster_count
 result.model.centroids.column_count == input.data.column_count
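The training postconditions can be read as a checklist over the result. The sketch below restates them as a single predicate over a hypothetical flat summary of a training run. The train_summary struct and the function name are illustrative only and do not exist in oneDAL. Storing labels in a one-dimensional vector stands in for the labels.column_count == 1 condition.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical flat summary of a training run; not a oneDAL type.
struct train_summary {
    std::vector<std::int64_t> labels;    // one label per input row
    std::int64_t iteration_count;        // iterations actually performed
    std::int64_t centroid_row_count;     // rows of model.centroids
    std::int64_t centroid_column_count;  // columns of model.centroids
};

// Returns true when every postcondition listed for train(...) holds.
bool satisfies_train_postconditions(const train_summary& result,
                                    std::int64_t data_row_count,
                                    std::int64_t data_column_count,
                                    std::int64_t cluster_count,
                                    std::int64_t max_iteration_count) {
    // labels.row_count == input.data.row_count
    if (static_cast<std::int64_t>(result.labels.size()) != data_row_count) {
        return false;
    }
    // 0 <= labels[i] < cluster_count for every row i
    for (const std::int64_t label : result.labels) {
        if (label < 0 || label >= cluster_count) return false;
    }
    // iteration_count <= max_iteration_count, and centroids are k x p
    return result.iteration_count <= max_iteration_count &&
           result.centroid_row_count == cluster_count &&
           result.centroid_column_count == data_column_count;
}
```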
Inference infer(...)¶
Input¶
- template<typename Task = task::by_default>
class infer_input¶
- Template Parameters
 Task – Tag-type that specifies the type of the problem to solve. Can be task::clustering.
Constructors
- infer_input(const model<Task> &trained_model, const table &data)¶
Creates a new instance of the class with the given model and data.
Properties
Result¶
- template<typename Task = task::by_default>
class infer_result¶
- Template Parameters
 Task – Tag-type that specifies the type of the problem to solve. Can be task::clustering.
Constructors
- infer_result()¶
Creates a new instance of the class with the default property values.
Properties
- const table & labels¶
An \(n \times 1\) table with the labels assigned to the feature vectors in the input data. Default value: table{}.
- Getter & Setter
 const table & get_labels() const
 auto & set_labels(const table &value)
- const table & responses¶
An \(n \times 1\) table with the responses assigned to the feature vectors in the input data. Default value: table{}.
- Getter & Setter
 const table & get_responses() const
 auto & set_responses(const table &value)
- double objective_function_value¶
The value of the objective function \(\Phi_X(C)\), where C is defined by the corresponding infer_input::model::centroids. Default value: 0.0.
- Getter & Setter
 double get_objective_function_value() const
 auto & set_objective_function_value(double value)
- Invariants
 objective_function_value >= 0.0
Operation¶
- template<typename Descriptor>
kmeans::infer_result infer(const Descriptor &desc, const kmeans::infer_input &input)¶
- Parameters
 desc – K-Means algorithm descriptor kmeans::descriptor
 input – Input data for the inference operation
- Preconditions
- Postconditions
Usage example¶
Training¶
kmeans::model<> run_training(const table& data,
                           const table& initial_centroids) {
   const auto kmeans_desc = kmeans::descriptor<float>{}
      .set_cluster_count(10)
      .set_max_iteration_count(50)
      .set_accuracy_threshold(1e-4);
   const auto result = train(kmeans_desc, data, initial_centroids);
   print_table("labels", result.get_labels());
   print_table("centroids", result.get_model().get_centroids());
   print_value("objective", result.get_objective_function_value());
   return result.get_model();
}
Inference¶
table run_inference(const kmeans::model<>& model,
                  const table& new_data) {
   const auto kmeans_desc = kmeans::descriptor<float>{}
      .set_cluster_count(model.get_cluster_count());
   const auto result = infer(kmeans_desc, model, new_data);
   print_table("labels", result.get_labels());
   // Return the labels so the function honors its declared return type.
   return result.get_labels();
}
Examples¶
Batch Processing: