Principal Components Analysis (PCA)
Principal Component Analysis (PCA) is an algorithm for exploratory data analysis and dimensionality reduction. PCA transforms a set of feature vectors of possibly correlated features into a new set of uncorrelated features, called principal components. Principal components are the directions of largest variance, that is, the directions along which the data is most spread out.
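To make "directions of largest variance" concrete, the following sketch performs PCA by hand on data with exactly two features, using the closed-form eigendecomposition of the 2 × 2 sample covariance matrix. It is an illustration under that two-feature assumption, not the oneDAL implementation; Row, Pca2, pca2 and score_covariance are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

using Row = std::array<double, 2>;  // one observation with two features

struct Pca2 {
    Row mean{};                    // per-feature means
    Row eigenvalues{};             // variances along each component, descending
    std::array<Row, 2> vectors{};  // vectors[i] is the i-th unit eigenvector
};

// PCA of two-feature data via the closed-form eigendecomposition of the
// 2 x 2 sample covariance matrix C = [[a, b], [b, d]].
Pca2 pca2(const std::vector<Row>& x) {
    assert(x.size() >= 2);
    const double n = static_cast<double>(x.size());
    Pca2 r;
    for (const Row& v : x) {
        r.mean[0] += v[0] / n;
        r.mean[1] += v[1] / n;
    }
    double a = 0.0, b = 0.0, d = 0.0;
    for (const Row& v : x) {
        const double u0 = v[0] - r.mean[0], u1 = v[1] - r.mean[1];
        a += u0 * u0;
        b += u0 * u1;
        d += u1 * u1;
    }
    a /= n - 1.0;
    b /= n - 1.0;
    d /= n - 1.0;
    // Eigenvalues of a symmetric 2 x 2 matrix: (a+d)/2 +/- sqrt(((a-d)/2)^2 + b^2).
    const double half_trace = (a + d) / 2.0;
    const double disc = std::sqrt((a - d) * (a - d) / 4.0 + b * b);
    r.eigenvalues = {half_trace + disc, half_trace - disc};
    for (int i = 0; i < 2; ++i) {
        const double lambda = r.eigenvalues[i];
        // If b != 0, (b, lambda - a) solves (C - lambda I) v = 0.
        // If b == 0, C is diagonal and the eigenvectors are the coordinate axes.
        const Row dir = std::abs(b) > 1e-12 ? Row{b, lambda - a}
                        : ((i == 0) == (a >= d)) ? Row{1.0, 0.0}
                                                 : Row{0.0, 1.0};
        const double norm = std::hypot(dir[0], dir[1]);
        r.vectors[i] = {dir[0] / norm, dir[1] / norm};
    }
    return r;
}

// Sample covariance between the scores on component 0 and component 1;
// for a correct PCA it is zero up to rounding error.
double score_covariance(const std::vector<Row>& x, const Pca2& r) {
    double cross = 0.0;
    for (const Row& v : x) {
        const double c0 = v[0] - r.mean[0], c1 = v[1] - r.mean[1];
        const double s0 = c0 * r.vectors[0][0] + c1 * r.vectors[0][1];
        const double s1 = c0 * r.vectors[1][0] + c1 * r.vectors[1][1];
        cross += s0 * s1;
    }
    return cross / static_cast<double>(x.size() - 1);
}
```

For the nearly linear data (1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1), the first direction has a slope close to 2, almost all of the variance lies on the first component, and score_covariance is zero up to rounding, which is the "uncorrelated features" property described above.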
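Operation | Computational methods | Programming Interface
Training | Covariance, SVD | train(...), train_input, train_result
Inference | Covariance, SVD | infer(...), infer_input, infer_result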
Mathematical formulation
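A textbook statement of PCA by the covariance method, written with this page's symbols (\(p\) features, \(r\) components). It is a sketch: the \(n - 1\) normalization in particular is an assumption rather than something this page specifies.

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, &
C &= \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^{\top} \in \mathbb{R}^{p \times p},\\
C v_j &= \lambda_j v_j, &
\lambda_1 &\ge \lambda_2 \ge \dots \ge \lambda_p \ge 0, \quad \lVert v_j \rVert_2 = 1,\\
V &= \begin{bmatrix} v_1 & \cdots & v_r \end{bmatrix}^{\top} \in \mathbb{R}^{r \times p}, &
t_i &= V (x_i - \bar{x}) \in \mathbb{R}^{r}.
\end{aligned}
```

Here \(\lambda_j = v_j^{\top} C v_j\) is the variance of the data along \(v_j\), the rows of \(V\) match the \(r \times p\) eigenvector layout used below, and keeping all eigenvectors corresponds to \(r = p\).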
Programming Interface
All types and functions in this section are declared in the oneapi::dal::pca namespace and are available via inclusion of the oneapi/dal/algo/pca.hpp header file.
Descriptor
template<typename Float = float, typename Method = method::by_default, typename Task = task::by_default>
class descriptor
Template Parameters
Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
Method – Tag-type that specifies an implementation of the algorithm. Can be method::cov or method::svd.
Task – Tag-type that specifies the type of the problem to solve. Can be task::dim_reduction.
Constructors
descriptor(std::int64_t component_count = 0)
Creates a new instance of the class with the given component_count property value.
Properties
bool deterministic
Specifies whether the algorithm applies the sign-flip technique. If true, the sign of each eigenvector is chosen deterministically, so the resulting directions are reproducible. Default value: true.
Getter & Setter
bool get_deterministic() const
auto & set_deterministic(bool value)
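An eigenvector is only defined up to its sign, so v and -v are equally valid results. This page does not say which sign rule the library uses; the sketch below assumes one common convention (make the entry of largest magnitude positive, similar in spirit to scikit-learn's svd_flip). sign_flip is a hypothetical helper operating on a row-major r × p array.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sign-flip: for each eigenvector (one row of an r x p row-major
// matrix), negate the row if its entry of largest magnitude is negative.
void sign_flip(std::vector<double>& eigenvectors, std::size_t r, std::size_t p) {
    assert(eigenvectors.size() == r * p);
    if (p == 0) return;
    for (std::size_t i = 0; i < r; ++i) {
        double* row = eigenvectors.data() + i * p;
        std::size_t max_j = 0;  // index of the largest-magnitude entry
        for (std::size_t j = 1; j < p; ++j) {
            if (std::abs(row[j]) > std::abs(row[max_j])) max_j = j;
        }
        if (row[max_j] < 0.0) {
            for (std::size_t j = 0; j < p; ++j) row[j] = -row[j];
        }
    }
}
```

Because the rule depends only on the entries of each vector, v and -v map to the same output, which is what makes the directions reproducible across runs.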
std::int64_t component_count
The number of principal components \(r\). If it is zero, the algorithm computes the eigenvectors for all features, \(r = p\), where \(p\) is the number of features. Default value: 0.
Getter & Setter
std::int64_t get_component_count() const
auto & set_component_count(std::int64_t value)
Invariants
component_count >= 0
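The rule for resolving \(r\) from component_count can be written as a small helper. effective_component_count is a hypothetical name for illustration only; whether the library also rejects component_count > p is not stated here, so the sketch does not check it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the oneDAL API): resolves the number of
// principal components r from component_count and the feature count p.
std::int64_t effective_component_count(std::int64_t component_count,
                                       std::int64_t feature_count) {
    assert(component_count >= 0);  // invariant of the descriptor property
    assert(feature_count > 0);
    // Zero means "compute eigenvectors for all features", that is, r = p.
    return component_count == 0 ? feature_count : component_count;
}
```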
Method tags
struct cov
Tag-type that denotes the Covariance computational method.
using by_default = cov
Alias tag-type for the Covariance computational method.
Task tags
struct dim_reduction
Tag-type that parameterizes entities used for solving the dimensionality reduction problem.
using by_default = dim_reduction
Alias tag-type for the dimensionality reduction task.
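Method and task tags are empty types whose only job is to select behavior at compile time. The sketch below mocks that pattern in a self-contained way, mirroring the descriptor's template defaults; the namespace mock_pca and the functions method_name and describe are hypothetical, not oneDAL declarations, and the real library may dispatch differently.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

namespace mock_pca {
namespace method {
struct cov {};
struct svd {};
using by_default = cov;
}  // namespace method

namespace task {
struct dim_reduction {};
using by_default = dim_reduction;
}  // namespace task

// Mirrors the descriptor's template parameters and their defaults.
template <typename Float = float,
          typename Method = method::by_default,
          typename Task = task::by_default>
struct descriptor {
    static_assert(std::is_same_v<Float, float> || std::is_same_v<Float, double>,
                  "Float must be float or double");
};

// Overloads chosen at compile time by the type of the empty tag object.
inline std::string method_name(method::cov) { return "covariance"; }
inline std::string method_name(method::svd) { return "svd"; }

template <typename Float, typename Method, typename Task>
std::string describe(const descriptor<Float, Method, Task>&) {
    return method_name(Method{});
}
}  // namespace mock_pca
```

With no template arguments, descriptor<> picks the default cov tag, so describe reports the covariance method; passing method::svd explicitly selects the other overload without any runtime branching.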
Model
template<typename Task = task::by_default>
class model
Template Parameters
Task – Tag-type that specifies the type of the problem to solve. Can be task::dim_reduction.
Constructors
model()
Creates a new instance of the class with the default property values.
Properties
Training train(...)
Input
template<typename Task = task::by_default>
class train_input
Template Parameters
Task – Tag-type that specifies the type of the problem to solve. Can be task::dim_reduction.
Constructors
train_input(const table &data)
Creates a new instance of the class with the given data property value.
Properties
Result
template<typename Task = task::by_default>
class train_result
Template Parameters
Task – Tag-type that specifies the type of the problem to solve. Can be task::dim_reduction.
Constructors
train_result()
Creates a new instance of the class with the default property values.
Public Methods
const table & get_eigenvectors() const
An \(r \times p\) table with the eigenvectors. Each row contains one eigenvector.
Properties
const table & eigenvalues
A \(1 \times r\) table that contains the eigenvalues of the first \(r\) principal components. Default value: table{}.
Getter & Setter
const table & get_eigenvalues() const
auto & set_eigenvalues(const table &value)
const table & variances
A \(1 \times r\) table that contains the variances for the first \(r\) features. Default value: table{}.
Getter & Setter
const table & get_variances() const
auto & set_variances(const table &value)
Operation
template<typename Descriptor>
pca::train_result train(const Descriptor &desc, const pca::train_input &input)
Parameters
desc – PCA algorithm descriptor pca::descriptor
input – Input data for the training operation
Preconditions
Postconditions
result.means.row_count == 1
result.means.column_count == desc.component_count
result.variances.row_count == 1
result.variances.column_count == desc.component_count
result.variances[i] >= 0.0
result.eigenvalues.row_count == 1
result.eigenvalues.column_count == desc.component_count
result.model.eigenvectors.row_count == desc.component_count
result.model.eigenvectors.column_count == input.data.column_count
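The shape relations for eigenvalues and variances, together with the \(r \times p\) eigenvector layout described under get_eigenvectors, can be checked mechanically. mock_table and check_train_shapes are hypothetical stand-ins, not the oneDAL table API, and the sketch assumes \(r\) has already been resolved (so \(r = p\) when component_count is 0).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a table: a shape plus row-major values.
struct mock_table {
    std::int64_t row_count = 0;
    std::int64_t column_count = 0;
    std::vector<double> values;
};

// True when eigenvalues and variances are 1 x r, all variances are
// non-negative, and eigenvectors are r x p (one eigenvector per row).
bool check_train_shapes(const mock_table& eigenvalues,
                        const mock_table& variances,
                        const mock_table& eigenvectors,
                        std::int64_t r, std::int64_t p) {
    assert(r > 0 && p > 0);
    if (eigenvalues.row_count != 1 || eigenvalues.column_count != r) return false;
    if (variances.row_count != 1 || variances.column_count != r) return false;
    for (double v : variances.values) {
        if (v < 0.0) return false;
    }
    return eigenvectors.row_count == r && eigenvectors.column_count == p;
}
```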
Inference infer(...)
Input
template<typename Task = task::by_default>
class infer_input
Template Parameters
Task – Tag-type that specifies the type of the problem to solve. Can be task::dim_reduction.
Constructors
infer_input(const model<Task> &trained_model, const table &data)
Creates a new instance of the class with the given model and data property values.
Properties
Result
template<typename Task = task::by_default>
class infer_result
Template Parameters
Task – Tag-type that specifies the type of the problem to solve. Can be task::dim_reduction.
Constructors
infer_result()
Creates a new instance of the class with the default property values.
Properties
Operation
template<typename Descriptor>
pca::infer_result infer(const Descriptor &desc, const pca::infer_input &input)
Parameters
desc – PCA algorithm descriptor pca::descriptor
input – Input data for the inference operation
Preconditions
Postconditions
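Inference maps data into the space of principal components. A minimal sketch, assuming row-major storage and that each row has already been centered with the training means (whether infer centers or scales the data is not specified on this page): the hypothetical helper project multiplies the \(n \times p\) data by the transposed \(r \times p\) eigenvector matrix, giving an \(n \times r\) result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical projection T = X V^T: x is n x p, eigenvectors is r x p
// (one eigenvector per row), and the returned table is n x r, row-major.
std::vector<double> project(const std::vector<double>& x, std::size_t n, std::size_t p,
                            const std::vector<double>& eigenvectors, std::size_t r) {
    assert(x.size() == n * p && eigenvectors.size() == r * p);
    std::vector<double> t(n * r, 0.0);
    for (std::size_t i = 0; i < n; ++i)          // each observation
        for (std::size_t j = 0; j < r; ++j)      // each principal component
            for (std::size_t k = 0; k < p; ++k)  // dot product over features
                t[i * r + j] += x[i * p + k] * eigenvectors[j * p + k];
    return t;
}
```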
Usage example
Training
#include "oneapi/dal/algo/pca.hpp"

using namespace oneapi::dal;

// print_table() is assumed to be a user-provided helper that prints a table.
pca::model<> run_training(const table& data) {
    const auto pca_desc = pca::descriptor<float>{}
        .set_component_count(5)
        .set_deterministic(true);
    const auto result = train(pca_desc, data);
    print_table("means", result.get_means());
    print_table("variances", result.get_variances());
    print_table("eigenvalues", result.get_eigenvalues());
    print_table("eigenvectors", result.get_eigenvectors());
    return result.get_model();
}
Inference
table run_inference(const pca::model<>& model,
                    const table& new_data) {
    // The component count equals the number of eigenvectors (rows) in the model.
    const auto pca_desc = pca::descriptor<float>{}
        .set_component_count(model.get_eigenvectors().get_row_count());
    const auto result = infer(pca_desc, model, new_data);
    print_table("transformed_data", result.get_transformed_data());
    return result.get_transformed_data();
}
Examples
Batch Processing: