.. ****************************************************************************** .. * Copyright 2020-2021 Intel Corporation .. * .. * Licensed under the Apache License, Version 2.0 (the "License"); .. * you may not use this file except in compliance with the License. .. * You may obtain a copy of the License at .. * .. * http://www.apache.org/licenses/LICENSE-2.0 .. * .. * Unless required by applicable law or agreed to in writing, software .. * distributed under the License is distributed on an "AS IS" BASIS, .. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. .. * See the License for the specific language governing permissions and .. * limitations under the License. .. *******************************************************************************/ Z-score ======= Z-score normalization is an algorithm that produces data with each feature (column) having zero mean and unit variance. Details ******* Given a set :math:`X` of :math:`n` feature vectors :math:`x_1 = (x_{11}, \ldots, x_{1p}), \ldots, x_n = (x_{n1}, \ldots, x_{np})` of dimension :math:`p`, the problem is to compute the matrix :math:`Y = (y_{ij})` of dimension :math:`n \times p` as following: .. math:: y_{ij} = \frac {x_{ij} - m_j} {\Delta} where: - :math:`m_j` is the mean of :math:`j`-th component of set :math:`(X)_j`, where :math:`j = \overline{1, p}` - value of :math:`\Delta` depends omn a computation mode |short_name| provides two modes for computing the result matrix. You can enable the mode by setting the flag ``doScale`` to a certain position (for details, see `Algorithm Parameters`_). The mode may include: - **Centering only.** In this case, :math:`\Delta = 1` and no scaling is performed. After normalization, the mean of :math:`j`-th component of result set :math:`(Y)_j` will be zero. - **Centering and scaling.** In this case, :math:`\Delta = \sigma_j`, where :math:`\sigma_j` is the standard deviation of :math:`j`-th component of set :math:`(X)_j`. After normalization, the mean of :math:`j`-th component of result set :math:`(Y)_j` will be zero and its variance will get a value of one. .. note:: Some algorithms require normalization parameters (mean and variance) as an input. The implementation of Z-score algorithm in |short_name| does not return these values by default. Enable this option by setting the resultsToCompute flag. For details, see `Algorithm Parameters`_. Batch Processing **************** Algorithm Input --------------- Z-score normalization algorithm accepts an input as described below. Pass the ``Input ID`` as a parameter to the methods that provide input for your algorithm. For more details, see :ref:`algorithms`. .. tabularcolumns:: |\Y{0.2}|\Y{0.8}| .. list-table:: Algorithm Input for Z-score (Batch Processing) :widths: 10 60 :header-rows: 1 * - Input ID - Input * - ``data`` - Pointer to the numeric table of size :math:`n \times p`. .. note:: This table can be an object of any class derived from ``NumericTable``. Algorithm Parameters -------------------- Z-score normalization algorithm has the following parameters. Some of them are required only for specific values of the computation method parameter ``method``: .. tabularcolumns:: |\Y{0.15}|\Y{0.15}|\Y{0.15}|\Y{0.55}| .. list-table:: Algorithm Parameters for Z-score (Batch Processing) :header-rows: 1 :widths: 10 10 10 60 :align: left :class: longtable * - Parameter - method - Default Value - Description * - ``algorithmFPType`` - ``defaultDense`` or ``sumDense`` - ``float`` - The floating-point type that the algorithm uses for intermediate computations. Can be ``float`` or ``double``. * - ``method`` - Not applicable - ``defaultDense`` - Available computation methods: defaultDense a performance-oriented method. Mean and variance are computed by low order moments algorithm. For details, see :ref:`Batch Processing for Moments of Low Order `. sumDense a method that uses the basic statistics associated with the numeric table of pre-computed sums. Returns an error if pre-computed sums are not defined. * - ``moments`` - ``defaultDense`` - `SharedPtr >` - Pointer to the low order moments algorithm that computes means and standard deviations to be used for Z-score normalization with the ``defaultDense`` method. * - ``doScale`` - ``defaultDense`` or ``sumDense`` - ``true`` - If true, the algorithm applies both centering and scaling. Otherwise, the algorithm provides only centering. * - ``resultsToCompute`` - ``defaultDense`` or ``sumDense`` - Not applicable - *Optional*. Pointer to the data collection containing the following key-value pairs for Z-score: - ``mean`` - means - ``variance`` - variances Provide one of these values to request a single characteristic or use bitwise OR to request a combination of them. Algorithm Output ---------------- Z-score normalization algorithm calculates the result as described below. Pass the ``Result ID`` as a parameter to the methods that access the results of your algorithm. For more details, see :ref:`algorithms`. .. tabularcolumns:: |\Y{0.2}|\Y{0.8}| .. list-table:: Algorithm Output for Z-score (Batch Processing) :widths: 10 60 :header-rows: 1 :class: longtable * - Result ID - Result * - ``normalizedData`` - Pointer to the :math:`n \times p` numeric table that stores the result of normalization. .. note:: By default, the result is an object of the ``HomogenNumericTable`` class, but you can define the result as an object of any class derived from ``NumericTable`` except ``PackedTriangularMatrix``, ``PackedSymmetricMatrix``, and ``CSRNumericTable``. * - ``means`` - *Optional*. Pointer to the :math:`1 \times p` numeric table that contains mean values for each feature. If the function result is not requested through the ``resultsToCompute`` parameter, the numeric table contains a ``NULL`` pointer. * - ``variances`` - *Optional*. Pointer to the :math:`1 \times p` numeric table that contains variance values for each feature. If the function result is not requested through the ``resultsToCompute`` parameter, the numeric table contains a ``NULL`` pointer. - .. note:: By default, each numeric table specified by the collection elements is an object of the ``HomogenNumericTable`` class. You can also define the result as an object of any class derived from ``NumericTable``, except for ``PackedSymmetricMatrix``, ``PackedTriangularMatrix``, and ``CSRNumericTable``. Examples ******** .. tabs:: .. tab:: C++ (CPU) Batch Processing: - :cpp_example:`zscore_dense_batch.cpp ` .. tab:: Java* .. note:: There is no support for Java on GPU. Batch Processing: - :java_example:`ZScoreDenseBatch.java ` .. tab:: Python* Batch Processing: - :daal4py_example:`normalization_zscore_batch.py`