Z-score

Z-score normalization is an algorithm that transforms data so that each feature (column) has zero mean and, when scaling is enabled, unit variance.

Details

Given a set \(X\) of \(n\) feature vectors \(x_1 = (x_{11}, \ldots, x_{1p}), \ldots, x_n = (x_{n1}, \ldots, x_{np})\) of dimension \(p\), the problem is to compute the matrix \(Y = (y_{ij})\) of dimension \(n \times p\) as follows:

\[y_{ij} = \frac {x_{ij} - m_j} {\Delta}\]

where:

\(m_j\) is the mean of the \(j\)-th component of the set \((X)_j\), where \(j = \overline{1, p}\)
the value of \(\Delta\) depends on the computation mode
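As a small illustration with made-up numbers (not taken from oneDAL), take one feature \(j\) with \(n = 3\) values \(2, 4, 6\). Applying the definition of \(m_j\) and the formula for \(y_{ij}\) with a generic \(\Delta\) gives:

\[m_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij} = \frac{2 + 4 + 6}{3} = 4, \qquad (y_{1j}, y_{2j}, y_{3j}) = \left(\frac{2 - 4}{\Delta}, \frac{4 - 4}{\Delta}, \frac{6 - 4}{\Delta}\right) = \left(-\frac{2}{\Delta}, 0, \frac{2}{\Delta}\right)\]

Whatever value \(\Delta\) takes, the transformed values sum to zero, so the centered feature has zero mean.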

oneDAL provides two modes for computing the result matrix. Select the mode with the doScale flag (for details, see Algorithm Parameters):

Centering only (doScale is false). In this case, \(\Delta = 1\) and no scaling is performed. After normalization, the mean of the \(j\)-th component of the result set \((Y)_j\) is zero.
Centering and scaling (doScale is true, the default). In this case, \(\Delta = \sigma_j\), where \(\sigma_j\) is the standard deviation of the \(j\)-th component of the set \((X)_j\). After normalization, the \(j\)-th component of the result set \((Y)_j\) has zero mean and unit variance.
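The two modes can be sketched in plain C++ as below. This is not the oneDAL API: `Matrix` and `zscoreNormalize` are hypothetical names, the `doScale` argument only mirrors the parameter of the same name, and the code assumes \(\sigma_j\) is the sample standard deviation (divisor \(n - 1\)), which this page does not specify.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical row-major n x p matrix: x[i][j] is feature j of observation i.
using Matrix = std::vector<std::vector<double>>;

// Sketch of Z-score normalization (not the oneDAL API).
//   doScale == false: centering only, Delta = 1.
//   doScale == true:  centering and scaling, Delta = sigma_j.
// Assumption: sigma_j is the sample standard deviation (divisor n - 1).
Matrix zscoreNormalize(const Matrix& x, bool doScale) {
    const std::size_t n = x.size();
    if (n == 0) return {};
    const std::size_t p = x[0].size();
    for (const auto& row : x) assert(row.size() == p);  // every row must have p features

    Matrix y(n, std::vector<double>(p));
    for (std::size_t j = 0; j < p; ++j) {
        // m_j: mean of the j-th feature
        double sum = 0.0;
        for (std::size_t i = 0; i < n; ++i) sum += x[i][j];
        const double mean = sum / static_cast<double>(n);

        // Delta stays 1 unless scaling is requested and sigma_j is defined and nonzero
        double delta = 1.0;
        if (doScale && n > 1) {
            double sq = 0.0;
            for (std::size_t i = 0; i < n; ++i) {
                const double d = x[i][j] - mean;
                sq += d * d;
            }
            const double sigma = std::sqrt(sq / static_cast<double>(n - 1));
            if (sigma > 0.0) delta = sigma;  // sketch choice: constant columns are only centered
        }

        // y_ij = (x_ij - m_j) / Delta
        for (std::size_t i = 0; i < n; ++i) y[i][j] = (x[i][j] - mean) / delta;
    }
    return y;
}
```

For a feature with values 2, 4, 6, `zscoreNormalize(x, false)` yields -2, 0, 2 and `zscoreNormalize(x, true)` yields -1, 0, 1. Leaving constant columns unscaled is a choice made here to avoid division by zero; how oneDAL treats \(\sigma_j = 0\) is not covered on this page.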

Note

Some algorithms require normalization parameters (mean and variance) as input. By default, the oneDAL implementation of the Z-score algorithm does not return these values. To request them, set the resultsToCompute parameter. For details, see Algorithm Parameters.

Batch Processing

Algorithm Input

The Z-score normalization algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Algorithm Input for Z-score (Batch Processing)
Input ID	Input
`data`	Pointer to the \(n \times p\) numeric table with the data to normalize. Note: This table can be an object of any class derived from `NumericTable`.

Algorithm Parameters

The Z-score normalization algorithm has the following parameters. Some of them are required only for specific values of the computation method parameter `method`:

Algorithm Parameters for Z-score (Batch Processing)
Parameter	method	Default Value	Description
`algorithmFPType`	`defaultDense` or `sumDense`	`float`	The floating-point type that the algorithm uses for intermediate computations. Can be `float` or `double`.
`method`	Not applicable	`defaultDense`	Available computation methods: `defaultDense`, a performance-oriented method in which means and variances are computed by the low order moments algorithm (for details, see Batch Processing for Moments of Low Order); `sumDense`, a method that uses the basic statistics associated with the numeric table of pre-computed sums and returns an error if the pre-computed sums are not defined.
`moments`	`defaultDense`	`SharedPtr<low_order_moments::Batch<algorithmFPType, low_order_moments::defaultDense> >`	Pointer to the low order moments algorithm that computes the means and standard deviations used for Z-score normalization with the `defaultDense` method.
`doScale`	`defaultDense` or `sumDense`	`true`	If true, the algorithm applies both centering and scaling. Otherwise, the algorithm applies only centering.
`resultsToCompute`	`defaultDense` or `sumDense`	Not applicable	Optional. Selects the characteristics to return in addition to the normalized data: `mean` (means) and `variance` (variances). Provide one of these values to request a single characteristic, or combine them with bitwise OR to request both.
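The `sumDense` method works from pre-computed sums instead of a full pass over the data. The sketch below shows, under stated assumptions, how per-feature means and variances can be recovered from such sums: it assumes both the column sums \(S_j = \sum_i x_{ij}\) and the sums of squares \(Q_j = \sum_i x_{ij}^2\) are available and uses the sample variance \((Q_j - S_j^2/n)/(n - 1)\). `ColumnStats` and `statsFromSums` are hypothetical; which statistics oneDAL actually reads from the numeric table is not described on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-feature statistics recovered from pre-computed sums.
struct ColumnStats {
    std::vector<double> means;
    std::vector<double> variances;  // assumption: sample variances (divisor n - 1)
};

// Hypothetical helper: s1[j] is the sum of feature j, s2[j] the sum of its squares,
// both taken over n observations.
ColumnStats statsFromSums(const std::vector<double>& s1,
                          const std::vector<double>& s2,
                          std::size_t n) {
    assert(s1.size() == s2.size());
    assert(n > 1);  // the sample variance needs at least two observations
    ColumnStats st;
    st.means.reserve(s1.size());
    st.variances.reserve(s1.size());
    const double dn = static_cast<double>(n);
    for (std::size_t j = 0; j < s1.size(); ++j) {
        st.means.push_back(s1[j] / dn);  // m_j = S_j / n
        // (Q_j - S_j^2 / n) / (n - 1)
        st.variances.push_back((s2[j] - s1[j] * s1[j] / dn) / (dn - 1.0));
    }
    return st;
}
```

For a feature with values 2, 4, 6, the sums are \(S_j = 12\) and \(Q_j = 56\), giving a mean of 4 and a variance of 4. This one-pass form can lose precision through cancellation when a feature's mean is large relative to its spread; that is a property of the formula, not a statement about oneDAL's implementation.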

Algorithm Output

The Z-score normalization algorithm calculates the results described below. Pass the Result ID as a parameter to the methods that access the results of your algorithm. For more details, see Algorithms.

Algorithm Output for Z-score (Batch Processing)
Result ID	Result
`normalizedData`	Pointer to the \(n \times p\) numeric table that stores the result of normalization. Note: By default, the result is an object of the `HomogenNumericTable` class, but you can define the result as an object of any class derived from `NumericTable` except `PackedTriangularMatrix`, `PackedSymmetricMatrix`, and `CSRNumericTable`.
`means`	Optional. Pointer to the \(1 \times p\) numeric table that contains the mean value of each feature. If this result is not requested through the `resultsToCompute` parameter, the pointer is `NULL`.
`variances`	Optional. Pointer to the \(1 \times p\) numeric table that contains the variance of each feature. If this result is not requested through the `resultsToCompute` parameter, the pointer is `NULL`.

Note

By default, each numeric table in the result collection is an object of the `HomogenNumericTable` class. You can also define the result as an object of any class derived from `NumericTable`, except for `PackedSymmetricMatrix`, `PackedTriangularMatrix`, and `CSRNumericTable`.
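The interplay between `resultsToCompute` and the optional `means` and `variances` results can be sketched as follows (C++17). All names and flag values here (`ResultToCompute`, `kNone`, `kMean`, `kVariance`, `ZScoreStatsResult`, `selectResults`) are hypothetical stand-ins, not oneDAL identifiers, and `std::optional` plays the role of the `NULL` pointer returned for characteristics that were not requested.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical bit flags mirroring the `mean` and `variance` values of
// `resultsToCompute`; the numeric values are an assumption for this sketch.
enum ResultToCompute : std::uint64_t {
    kNone = 0,
    kMean = 1u << 0,
    kVariance = 1u << 1,
};

// Hypothetical result holder: an empty optional stands in for a NULL result.
struct ZScoreStatsResult {
    std::optional<std::vector<double>> means;      // 1 x p when requested
    std::optional<std::vector<double>> variances;  // 1 x p when requested
};

// Keeps only the per-feature statistics whose flag bits are set.
ZScoreStatsResult selectResults(const std::vector<double>& means,
                                const std::vector<double>& variances,
                                std::uint64_t resultsToCompute) {
    assert(means.size() == variances.size());  // both are 1 x p
    ZScoreStatsResult r;
    if (resultsToCompute & kMean) r.means = means;
    if (resultsToCompute & kVariance) r.variances = variances;
    return r;
}
```

For example, `selectResults(m, v, kMean | kVariance)` fills both fields, while passing `kMean` alone leaves `variances` empty, matching the bitwise-OR usage described for `resultsToCompute`.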

Examples

Batch Processing:

zscore_dense_batch.cpp

oneDAL documentation