.. ******************************************************************************
.. * Copyright 2020-2021 Intel Corporation
.. *
.. * Licensed under the Apache License, Version 2.0 (the "License");
.. * you may not use this file except in compliance with the License.
.. * You may obtain a copy of the License at
.. *
.. *     http://www.apache.org/licenses/LICENSE-2.0
.. *
.. * Unless required by applicable law or agreed to in writing, software
.. * distributed under the License is distributed on an "AS IS" BASIS,
.. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
.. * See the License for the specific language governing permissions and
.. * limitations under the License.
.. *******************************************************************************/

.. _kmeans_computation_batch:

Batch Processing
****************

Algorithm Input
+++++++++++++++

The K-Means clustering algorithm accepts the input described
below. Pass the ``Input ID`` as a parameter to the methods that
provide input for your algorithm.

.. tabularcolumns::  |\Y{0.2}|\Y{0.8}|

.. list-table:: Algorithm Input for K-Means Computation (Batch Processing)
   :header-rows: 1
   :widths: 10 60
   :align: left
   :class: longtable

   * - Input ID
     - Input
   * - ``data``
     - Pointer to the :math:`n \times p` numeric table with the data to be clustered.
   * - ``inputCentroids``
     - Pointer to the :math:`nClusters \times p` numeric table with the initial centroids.

.. note:: The input for ``data`` and ``inputCentroids`` can be an object of any class derived from ``NumericTable``.
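
The shape contract of the two inputs can be sketched with a plain row-major container. ``Table``, ``at``, and ``inputsAreConsistent`` are hypothetical illustrations, not part of the library. A real ``NumericTable`` manages its own storage and memory layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major stand-in for a dense NumericTable:
// element (i, j), that is, observation i and feature j, lives at values[i * cols + j].
struct Table {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<double> values;
};

// Reads element (i, j) with a debug-only bounds check.
double at(const Table& t, std::size_t i, std::size_t j) {
    assert(i < t.rows && j < t.cols);
    return t.values[i * t.cols + j];
}

// Shape contract of the two inputs: data is n x p and
// inputCentroids is nClusters x p, with the same number of features p.
bool inputsAreConsistent(const Table& data, const Table& inputCentroids,
                         std::size_t nClusters) {
    return data.rows > 0 && data.cols > 0 &&
           data.values.size() == data.rows * data.cols &&
           inputCentroids.rows == nClusters &&
           inputCentroids.cols == data.cols &&
           inputCentroids.values.size() == nClusters * data.cols;
}
```

A centroid table whose width differs from the number of features in ``data``, or whose row count differs from ``nClusters``, fails the check.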

Algorithm Parameters
++++++++++++++++++++

The K-Means clustering algorithm has the following parameters:

.. tabularcolumns::  |\Y{0.15}|\Y{0.15}|\Y{0.7}|

.. list-table:: Algorithm Parameters for K-Means Computation (Batch Processing)
   :header-rows: 1
   :widths: 10 10 60
   :align: left
   :class: longtable

   * - Parameter
     - Default Value
     - Description
   * - ``algorithmFPType``
     - ``float``
     - The floating-point type that the algorithm uses for intermediate computations. Can be ``float`` or ``double``.
   * - ``method``
     - ``defaultDense``
     - Available computation methods for K-Means clustering:

       For CPU:

       - ``defaultDense`` - implementation of Lloyd's algorithm
       - ``lloydCSR`` - implementation of Lloyd's algorithm for CSR numeric tables

       For GPU:

       - ``defaultDense`` - implementation of Lloyd's algorithm

   * - ``nClusters``
     - Not applicable
     - The number of clusters. Required to initialize the algorithm.
   * - ``maxIterations``
     - Not applicable
     - The maximum number of iterations. Required to initialize the algorithm.
   * - ``accuracyThreshold``
     - :math:`0.0`
     - The threshold for termination of the algorithm.
   * - ``gamma``
     - :math:`1.0`
     - The weight to be used in distance calculation for binary categorical features.
   * - ``distanceType``
     - ``euclidean``
     - The measure of closeness between points (observations) being clustered. The only distance type supported so far is the Euclidean distance.
   * - **DEPRECATED:** ``assignFlag``

       **USE INSTEAD:** ``resultsToEvaluate``

     - ``true``
     - A flag that enables computation of assignments, that is, assigning cluster indices to respective observations.
   * - ``resultsToEvaluate``
     - ``computeCentroids`` | ``computeAssignments`` | ``computeExactObjectiveFunction``
     - The 64-bit integer flag that specifies which extra characteristics of the K-Means algorithm to compute.

       Provide one of the following values to request a single characteristic or use bitwise OR to request a combination of the characteristics:

       - ``computeCentroids`` for computation of centroids.
       - ``computeAssignments`` for computation of assignments, that is, assigning cluster indices to respective observations.
       - ``computeExactObjectiveFunction`` for computation of the exact objective function.
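
Lloyd's algorithm, which both computation methods implement, alternates an assignment step (label each observation with its nearest centroid) and an update step (move each centroid to the mean of its members). The following is a minimal C++ sketch for intuition, not the library's implementation. It assumes dense row-major ``std::vector`` storage and squared Euclidean distance, lets empty clusters keep their previous centroid, and, as an assumption about ``accuracyThreshold``, stops once the objective function decreases by less than that threshold. The library's actual stopping rule and empty-cluster handling may differ. ``Matrix``, ``LloydResult``, ``assignStep``, and ``lloyd`` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical row-major stand-in for a dense numeric table.
struct Matrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<double> values;
    double at(std::size_t i, std::size_t j) const { return values[i * cols + j]; }
};

struct LloydResult {
    Matrix centroids;              // nClusters x p
    std::vector<int> assignments;  // n x 1
    double objective = 0.0;        // sum of squared distances to the nearest centroid
    std::size_t nIterations = 0;   // update steps actually performed
};

// Assignment step: label each observation with its nearest centroid
// (squared Euclidean distance) and return the resulting objective value.
double assignStep(const Matrix& data, const Matrix& centroids, std::vector<int>& labels) {
    double objective = 0.0;
    labels.assign(data.rows, 0);
    for (std::size_t i = 0; i < data.rows; ++i) {
        double best = std::numeric_limits<double>::max();
        for (std::size_t c = 0; c < centroids.rows; ++c) {
            double dist = 0.0;
            for (std::size_t j = 0; j < data.cols; ++j) {
                const double diff = data.at(i, j) - centroids.at(c, j);
                dist += diff * diff;
            }
            if (dist < best) {
                best = dist;
                labels[i] = static_cast<int>(c);
            }
        }
        objective += best;
    }
    return objective;
}

// Assumed stopping rule: at most maxIterations update steps, ending early
// once the objective decreases by less than accuracyThreshold.
LloydResult lloyd(const Matrix& data, Matrix centroids,
                  std::size_t maxIterations, double accuracyThreshold) {
    assert(data.cols == centroids.cols && centroids.rows > 0);
    LloydResult r;
    double previous = assignStep(data, centroids, r.assignments);
    for (std::size_t it = 0; it < maxIterations; ++it) {
        // Update step: move each centroid to the mean of its members.
        std::vector<double> sums(centroids.values.size(), 0.0);
        std::vector<std::size_t> counts(centroids.rows, 0);
        for (std::size_t i = 0; i < data.rows; ++i) {
            const std::size_t c = static_cast<std::size_t>(r.assignments[i]);
            ++counts[c];
            for (std::size_t j = 0; j < data.cols; ++j) sums[c * data.cols + j] += data.at(i, j);
        }
        for (std::size_t c = 0; c < centroids.rows; ++c) {
            if (counts[c] == 0) continue;  // simplification: empty clusters keep their centroid
            for (std::size_t j = 0; j < data.cols; ++j)
                centroids.values[c * data.cols + j] = sums[c * data.cols + j] / counts[c];
        }
        // Re-run the assignment step against the updated centroids.
        const double current = assignStep(data, centroids, r.assignments);
        r.nIterations = it + 1;
        const bool converged = previous - current < accuracyThreshold;
        previous = current;
        if (converged) break;
    }
    r.centroids = centroids;
    r.objective = previous;
    return r;
}
```

With ``maxIterations`` set to zero, the sketch performs no update step and returns the input centroids together with the assignments they induce.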

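
Requesting several characteristics through ``resultsToEvaluate`` works as with any bit-flag enumeration: combine values with bitwise OR and test them with bitwise AND. In the sketch below the enumerator names mirror the parameter table, but their numeric values and the ``isRequested`` helper are hypothetical. The library defines its own constants, so use those rather than literal numbers.

```cpp
#include <cstdint>

// Hypothetical bit values chosen for illustration only.
enum ResultToCompute : std::uint64_t {
    computeCentroids              = 1ULL << 0,
    computeAssignments            = 1ULL << 1,
    computeExactObjectiveFunction = 1ULL << 2,
};

// True if every bit of `what` is set in `flags`.
constexpr bool isRequested(std::uint64_t flags, std::uint64_t what) {
    return (flags & what) == what;
}

// Mirrors the default value: all three characteristics combined with bitwise OR.
constexpr std::uint64_t defaultResults =
    computeCentroids | computeAssignments | computeExactObjectiveFunction;

// A single characteristic: assignments only.
constexpr std::uint64_t assignmentsOnly = computeAssignments;
```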

Algorithm Output
++++++++++++++++

The K-Means clustering algorithm calculates the result described
below. Pass the ``Result ID`` as a parameter to the methods that access
the results of your algorithm.

.. tabularcolumns::  |\Y{0.2}|\Y{0.8}|

.. list-table:: Algorithm Output for K-Means Computation (Batch Processing)
   :header-rows: 1
   :widths: 10 60
   :align: left
   :class: longtable

   * - Result ID
     - Result
   * - ``centroids``
     -
       Pointer to the :math:`nClusters \times p` numeric table with the cluster centroids,
       computed when the ``computeCentroids`` option is enabled.

       .. include:: ./../../includes/default_result_numeric_table.rst

   * - ``assignments``
     -
       Pointer to the :math:`n \times 1` numeric table with
       assignments of cluster indices to feature vectors in the input data,
       computed when the ``computeAssignments`` option is enabled.

       .. include:: ./../../includes/default_result_numeric_table.rst

   * - ``objectiveFunction``
     -
       Pointer to the :math:`1 \times 1` numeric table with the minimum value of the objective function
       obtained at the last iteration of the algorithm. This value might be inexact.
       When the ``computeExactObjectiveFunction`` option is enabled, the exact objective function is computed.

       .. include:: ./../../includes/default_result_numeric_table.rst

   * - ``nIterations``
     -
       Pointer to the :math:`1 \times 1` numeric table with the actual number of iterations
       performed by the algorithm.

       .. include:: ./../../includes/default_result_numeric_table.rst

.. note::
  You can skip updating the centroids and the objective function in the
  result and compute assignments using the original ``inputCentroids``.
  To do this, set the ``resultsToEvaluate`` flag to ``computeAssignments`` only and set ``maxIterations`` to zero.
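
Computing assignments against fixed centroids amounts to a single nearest-centroid pass that never modifies the centroids. The sketch below assumes dense row-major storage and defines the objective function as the sum of squared Euclidean distances from each observation to its nearest centroid; the ``gamma`` weighting for binary categorical features is ignored. ``Labeling`` and ``assignToFixedCentroids`` are hypothetical names, not library API.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Result of labeling observations against fixed centroids (illustrative names).
struct Labeling {
    std::vector<int> assignments;  // n x 1: index of the nearest centroid
    double objective = 0.0;        // 1 x 1: sum of squared distances
};

// Single assignment pass over row-major data (n x p) and centroids
// (nClusters x p); the centroids are read but never updated.
Labeling assignToFixedCentroids(const std::vector<double>& data,
                                const std::vector<double>& centroids,
                                std::size_t p) {
    assert(p > 0 && !centroids.empty());
    assert(data.size() % p == 0 && centroids.size() % p == 0);
    const std::size_t n = data.size() / p;
    const std::size_t k = centroids.size() / p;
    Labeling out;
    out.assignments.assign(n, -1);
    for (std::size_t i = 0; i < n; ++i) {
        double best = std::numeric_limits<double>::max();
        for (std::size_t c = 0; c < k; ++c) {
            double dist = 0.0;
            for (std::size_t j = 0; j < p; ++j) {
                const double diff = data[i * p + j] - centroids[c * p + j];
                dist += diff * diff;
            }
            if (dist < best) {
                best = dist;
                out.assignments[i] = static_cast<int>(c);
            }
        }
        out.objective += best;
    }
    return out;
}
```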