Cosine Distance Matrix

Given \(n\) feature vectors \(x_1 = (x_{11}, \ldots, x_{1p}), \ldots, x_n = (x_{n1}, \ldots, x_{np})\) of dimension \(p\), the problem is to compute the symmetric \(n \times n\) matrix \(D_{\text{cos}} = (d_{ij})\) of distances between feature vectors, where

\[d_{ij} = 1 - \frac {\sum_{k=1}^{p} x_{ik} x_{jk}} {\sqrt{ \sum_{k=1}^{p} x_{ik}^2 } \sqrt{ \sum_{k=1}^{p} x_{jk}^2 }}\]
\[i = \overline{1, n}\]
\[j = \overline{1, n}\]
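As an illustration of the formula above, and not the library's implementation, the following C++ sketch computes \(D_{\text{cos}}\) directly. It assumes the input is a row-major buffer in which row \(i\) holds \(x_i\), and that no feature vector is all zeros (the formula is undefined in that case). The function name cosineDistanceMatrix is hypothetical, and the template parameter FPType stands in for algorithmFPType. Because \(d_{ij} = d_{ji}\), only the lower triangle is computed and then mirrored, and each row norm is computed once rather than once per pair.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: full n x n cosine distance matrix for an n x p input
// stored row-major in `data` (row i holds x_i). FPType mirrors algorithmFPType.
// Assumption: no row is all zeros, otherwise the denominator would be zero.
template <typename FPType>
std::vector<FPType> cosineDistanceMatrix(const std::vector<FPType>& data,
                                         std::size_t n, std::size_t p) {
    // Precompute the Euclidean norm of every row: sqrt(sum_k x_ik^2).
    std::vector<FPType> norms(n, FPType(0));
    for (std::size_t i = 0; i < n; ++i) {
        FPType sq = 0;
        for (std::size_t k = 0; k < p; ++k) sq += data[i * p + k] * data[i * p + k];
        norms[i] = std::sqrt(sq);
    }

    // Result stored as a full row-major n x n matrix.
    std::vector<FPType> d(n * n, FPType(0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j <= i; ++j) {  // lower triangle only (symmetry)
            FPType dot = 0;
            for (std::size_t k = 0; k < p; ++k) dot += data[i * p + k] * data[j * p + k];
            const FPType dij = FPType(1) - dot / (norms[i] * norms[j]);
            d[i * n + j] = dij;  // d_ij
            d[j * n + i] = dij;  // d_ji = d_ij
        }
    }
    return d;
}
```

For example, with \(x_1 = (1, 0)\), \(x_2 = (1, 1)\), and \(x_3 = (0, 2)\), the sketch gives \(d_{12} = d_{23} = 1 - 1/\sqrt{2} \approx 0.293\), \(d_{13} = 1\) (orthogonal vectors), and zeros on the diagonal up to rounding.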

Batch Processing

Algorithm Input

The cosine distance matrix algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Algorithm Input for Cosine Distance Matrix (Batch Processing)

Input ID | Input
data | Pointer to the \(n \times p\) numeric table for which the distance matrix is computed. The input can be an object of any class derived from NumericTable.

Algorithm Parameters

The cosine distance matrix algorithm has the following parameters:

Algorithm Parameters for Cosine Distance Matrix (Batch Processing)

Parameter | Default Value | Description
algorithmFPType | float | The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
method | defaultDense | Performance-oriented computation method, the only method supported by the algorithm.

Algorithm Output

The cosine distance matrix algorithm calculates the result described below. Pass the Result ID as a parameter to the methods that access the results of your algorithm. For more details, see Algorithms.

Algorithm Output for Cosine Distance Matrix (Batch Processing)

Result ID | Result
cosineDistance | Pointer to the numeric table that represents the \(n \times n\) symmetric distance matrix \(D_{\text{cos}}\). By default, the result is an object of the PackedSymmetricMatrix class with the lowerPackedSymmetricMatrix layout. However, you can define the result as an object of any class derived from NumericTable except PackedTriangularMatrix and CSRNumericTable.
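A packed symmetric layout stores only one triangle, so the \(n \times n\) result needs \(n(n+1)/2\) values instead of \(n^2\). The sketch below shows how such a lower packed buffer can be indexed. It assumes the lower triangle is packed row by row, that is (0,0), (1,0), (1,1), (2,0), and so on; the exact packing order used by PackedSymmetricMatrix is an assumption here and should be checked against the library's reference before relying on it. The helpers lowerPackedIndex and packLower are hypothetical.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: slot of element (i, j) in a lower packed symmetric buffer,
// ASSUMING row-by-row packing of the lower triangle: (0,0), (1,0), (1,1), (2,0), ...
inline std::size_t lowerPackedIndex(std::size_t i, std::size_t j) {
    if (i < j) std::swap(i, j);  // symmetric: (i, j) and (j, i) share one slot
    return i * (i + 1) / 2 + j;  // rows 0..i-1 hold 1 + 2 + ... + i values
}

// Hypothetical helper: packs a full row-major n x n symmetric matrix
// into n(n+1)/2 values using the layout assumed above.
template <typename FPType>
std::vector<FPType> packLower(const std::vector<FPType>& full, std::size_t n) {
    std::vector<FPType> packed(n * (n + 1) / 2);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j <= i; ++j)
            packed[lowerPackedIndex(i, j)] = full[i * n + j];
    return packed;
}
```

Reading \(d_{ij}\) with \(i < j\) goes through the same slot as \(d_{ji}\), which is why the helper swaps the indices first.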


Performance Considerations

To get the best overall performance when computing the cosine distance matrix:

  • If the input data is homogeneous, provide the input data and store the results in homogeneous numeric tables of the same type as specified in the algorithmFPType class template parameter.

  • If the input data is non-homogeneous, use the AOS (array of structures) layout rather than the SOA (structure of arrays) layout.
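The source does not say why AOS is preferred. One plausible reason, offered here as an assumption rather than a documented fact: the cosine formula is built from dot products between whole feature vectors \(x_i\) and \(x_j\), and in AOS (row-major) storage all \(p\) values of one feature vector are contiguous, whereas SOA keeps one separate array per feature. The sketch below, with hypothetical helpers soaToRowMajor and rowDot, shows the two layouts and a row dot product that reads two contiguous runs of memory.

```cpp
#include <cstddef>
#include <vector>

// SOA: one array per feature, soa[k][i] is feature k of observation i.
// AOS / row-major: aos[i * p + k], so the p features of observation i are contiguous.
// Hypothetical helper: converts SOA columns into a row-major buffer.
template <typename FPType>
std::vector<FPType> soaToRowMajor(const std::vector<std::vector<FPType>>& soa, std::size_t n) {
    const std::size_t p = soa.size();
    std::vector<FPType> aos(n * p);
    for (std::size_t k = 0; k < p; ++k)
        for (std::size_t i = 0; i < n; ++i)
            aos[i * p + k] = soa[k][i];
    return aos;
}

// Hypothetical helper: dot product of rows i and j (the numerator of the cosine term).
// With row-major storage both operands are read sequentially.
template <typename FPType>
FPType rowDot(const std::vector<FPType>& aos, std::size_t p, std::size_t i, std::size_t j) {
    FPType s = 0;
    for (std::size_t k = 0; k < p; ++k) s += aos[i * p + k] * aos[j * p + k];
    return s;
}
```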

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.

Notice revision #20201201