DBSCAN¶

Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm proposed in [Ester96]. It is a density-based clustering non-parametric algorithm: given a set of observations in some space, it groups together observations that are closely packed together (observations with many nearby neighbors), marking as outliers observations that lie alone in low-density regions (whose nearest neighbors are too far away).

Operation	Computational methods	Programming Interface
Compute	Default method	compute(…)	compute_input	compute_result

Mathematical formulation¶

Refer to Developer Guide: DBSCAN.

Programming Interface¶

All types and functions in this section are declared in the oneapi::dal::dbscan namespace and are available via inclusion of the oneapi/dal/algo/dbscan.hpp header file.

Descriptor¶

template<typename Float = float, typename Method = method::by_default, typename Task = task::by_default> class descriptor¶

Template Parameters

Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
Method – Tag-type that specifies an implementation of algorithm. Can be method::brute_force.
Task – Tag-type that specifies the type of the problem to solve. Can be task::clustering.

Constructors

descriptor(double epsilon, std::int64_t min_observations)¶: Creates a new instance of the class with the given epsilon, min_observations.

Properties

result_option_id result_options¶

Choose which results should be computed and returned.

Getter & Setter: result_option_id get_result_options() const

auto & set_result_options(const result_option_id &value)

bool mem_save_mode¶

The flag for memory saving mode.

Getter & Setter: bool get_mem_save_mode() const

auto & set_mem_save_mode(bool value)

std::int64_t min_observations¶

The number of neighbors.

Getter & Setter: std::int64_t get_min_observations() const

auto & set_min_observations(int64_t value)

double epsilon¶

The distance epsilon for neighbor search.

Getter & Setter: double get_epsilon() const

auto & set_epsilon(double value)
Invariants: epsilon >= 0.0

Method tags¶

struct brute_force¶

using by_default = brute_force ¶

Task tags¶

struct clustering¶: Tag-type that parameterizes entities used for solving clustering problem.

using by_default = clustering ¶: Alias tag-type for the clustering task.

Computation `compute(...)`¶

Input¶

template<typename Task = task::by_default> class compute_input¶

Template Parameters: Task – Tag-type that specifies type of the problem to solve. Can be task::clustering.

Constructors

compute_input(const table &data = {}, const table &weights = {})¶: Creates a new instance of the class with the given data and weights.

Properties

const table &weights¶

A single column table with the weights, where each row stores one weight per observation.

Getter & Setter: const table & get_weights() const

auto & set_weights(const table &weights)

const table &data¶

An \(n \times p\) table with the data to be clustered, where each row stores one feature vector.

Getter & Setter: const table & get_data() const

auto & set_data(const table &data)

Result¶

template<typename Task = task::by_default> class compute_result¶

Template Parameters: Task – Tag-type that specifies type of the problem to solve. Can be task::clustering.

Constructors

compute_result()¶: Creates a new instance of the class with the default property values.

Properties

const table &core_observations¶

An \(m \times p\) table with the core observations in the input data. \(m\) is a number of core observations.

Getter & Setter: const table & get_core_observations() const

auto & set_core_observations(const table &value)

std::int64_t cluster_count¶

The number of clusters found by the algorithm.

Getter & Setter: std::int64_t get_cluster_count() const

auto & set_cluster_count(std::int64_t value)
Invariants: cluster_count >= 0

const table &core_observation_indices¶

An \(m \times 1\) table with the indices of core observations in the input data. \(m\) is a number of core observations.

Getter & Setter: const table & get_core_observation_indices() const

auto & set_core_observation_indices(const table &value)

const table &core_flags¶

An \(n \times 1\) table with the core flags \(y_i\) assigned to the samples \(x_i\) in the input data.

Getter & Setter: const table & get_core_flags() const

auto & set_core_flags(const table &value)

const result_option_id &result_options¶

Result options that indicates availability of the properties. Default value: default_result_options<Task>.

Getter & Setter: const result_option_id & get_result_options() const

auto & set_result_options(const result_option_id &value)

const table &responses¶

An \(n \times 1\) table with the responses \(y_i\) assigned to the samples \(x_i\) in the input data. Default value: table{}.

Getter & Setter: const table & get_responses() const

auto & set_responses(const table &value)

Operation¶

template<typename Descriptor> dbscan::compute_result compute(const Descriptor &desc, const dbscan::compute_input &input)¶

Parameters

desc – DBSCAN algorithm descriptor dbscan::descriptor
input – Input data for the compute operation

Preconditions: input.data.has_data == true

!input.weights.has_data || input.weights.row_count == input.data.row_count && input.weights.column_count == 1

Usage example¶

Compute¶

void run_compute(const table& data,
                           const table& weights) {
   double epsilon = 1.0;
   std::int64_t max_observations = 5;
   const auto dbscan_desc = kmeans::descriptor<float>{epsilon, max_observations}
      .set_result_options(dal::dbscan::result_options::responses);

   const auto result = compute(dbscan_desc, data, weights);

   print_table("responses", result.get_responses());
}

Examples¶

Batch Processing:

dbscan_brute_force_batch.cpp

oneDAL documentation

DBSCAN¶

Mathematical formulation¶

Programming Interface¶

Descriptor¶

Method tags¶

Task tags¶

Computation compute(...)¶

Input¶

Result¶

Operation¶

Usage example¶

Compute¶

Examples¶

Computation `compute(...)`¶