DBSCAN¶
Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm proposed in [Ester96]. It is a density-based clustering non-parametric algorithm: given a set of observations in some space, it groups together observations that are closely packed together (observations with many nearby neighbors), marking as outliers observations that lie alone in low-density regions (whose nearest neighbors are too far away).
Operation |
Computational methods |
Programming Interface |
||
Default method |
Mathematical formulation¶
Refer to Developer Guide: DBSCAN.
Programming Interface¶
All types and functions in this section are declared in the
oneapi::dal::dbscan
namespace and are available via inclusion of the
oneapi/dal/algo/dbscan.hpp
header file.
Descriptor¶
-
template<typename
Float
= float, typenameMethod
= method::by_default, typenameTask
= task::by_default>
classdescriptor
¶ - Template Parameters
Float – The floating-point type that the algorithm uses for intermediate computations. Can be
float
ordouble
.Method – Tag-type that specifies an implementation of algorithm. Can be
method::brute_force
.Task – Tag-type that specifies the type of the problem to solve. Can be
task::clustering
.
Constructors
-
descriptor
(double epsilon, std::int64_t min_observations)¶ Creates a new instance of the class with the given
epsilon
,min_observations
.
Properties
-
result_option_id
result_options
¶ Choose which results should be computed and returned.
- Getter & Setter
result_option_id get_result_options() const
auto & set_result_options(const result_option_id &value)
-
bool
mem_save_mode
¶ The flag for memory saving mode.
- Getter & Setter
bool get_mem_save_mode() const
auto & set_mem_save_mode(bool value)
-
std::int64_t
min_observations
¶ The number of neighbors.
- Getter & Setter
std::int64_t get_min_observations() const
auto & set_min_observations(int64_t value)
Task tags¶
-
struct
clustering
¶ Tag-type that parameterizes entities used for solving clustering problem.
-
using
by_default
= clustering¶ Alias tag-type for the clustering task.
Computation compute(...)
¶
Input¶
-
template<typename
Task
= task::by_default>
classcompute_input
¶ - Template Parameters
Task – Tag-type that specifies type of the problem to solve. Can be
task::clustering
.
Constructors
-
compute_input
(const table &data = {}, const table &weights = {})¶ Creates a new instance of the class with the given
data
andweights
.
Properties
Result¶
-
template<typename
Task
= task::by_default>
classcompute_result
¶ - Template Parameters
Task – Tag-type that specifies type of the problem to solve. Can be
task::clustering
.
Constructors
-
compute_result
()¶ Creates a new instance of the class with the default property values.
Properties
-
const table &
core_observations
¶ An \(m \times p\) table with the core observations in the input data. \(m\) is a number of core observations.
- Getter & Setter
const table & get_core_observations() const
auto & set_core_observations(const table &value)
-
std::int64_t
cluster_count
¶ The number of clusters found by the algorithm.
- Getter & Setter
std::int64_t get_cluster_count() const
auto & set_cluster_count(std::int64_t value)
- Invariants
cluster_count >= 0
-
const table &
core_observation_indices
¶ An \(m \times 1\) table with the indices of core observations in the input data. \(m\) is a number of core observations.
- Getter & Setter
const table & get_core_observation_indices() const
auto & set_core_observation_indices(const table &value)
-
const table &
core_flags
¶ An \(n \times 1\) table with the core flags \(y_i\) assigned to the samples \(x_i\) in the input data.
- Getter & Setter
const table & get_core_flags() const
auto & set_core_flags(const table &value)
-
const result_option_id &
result_options
¶ Result options that indicates availability of the properties. Default value: default_result_options<Task>.
- Getter & Setter
const result_option_id & get_result_options() const
auto & set_result_options(const result_option_id &value)
Operation¶
-
template<typename
Descriptor
>
dbscan::compute_resultcompute
(const Descriptor &desc, const dbscan::compute_input &input)¶ - Parameters
desc – DBSCAN algorithm descriptor
dbscan::descriptor
input – Input data for the compute operation
- Preconditions
Usage example¶
Compute¶
void run_compute(const table& data,
const table& weights) {
double epsilon = 1.0;
std::int64_t max_observations = 5;
const auto dbscan_desc = kmeans::descriptor<float>{epsilon, max_observations}
.set_result_options(dal::dbscan::result_options::responses);
const auto result = compute(dbscan_desc, data, weights);
print_table("responses", result.get_responses());
}
Examples¶
Batch Processing:
Batch Processing:
Batch Processing: