Basic Statistics

The basic statistics algorithm computes the following quantitative characteristics of a dataset, one value per feature:

  • minimums/maximums

  • sums

  • means

  • sums of squares

  • sums of squared differences from the means

  • second order raw moments

  • variances

  • standard deviations

  • variation coefficients

Operation | Computational methods | Programming Interface
--- | --- | ---
Computing | dense | compute(…), compute_input, compute_result

Mathematical formulation

Computing

Given a set \(X\) of \(n\) \(p\)-dimensional feature vectors \(x_1 = (x_{11}, \ldots, x_{1p}), \ldots, x_n = (x_{n1}, \ldots, x_{np})\), the problem is to compute the following sample characteristics for each feature \(j = 1, \ldots, p\) in the data set, where the index \(i\) runs over the observations \(1, \ldots, n\):

Statistic | Definition
--- | ---
Minimum | \(min(j) = \smash{\displaystyle \min_i } \{x_{ij}\}\)
Maximum | \(max(j) = \smash{\displaystyle \max_i } \{x_{ij}\}\)
Sum | \(s(j) = \sum_i x_{ij}\)
Sum of squares | \(s_2(j) = \sum_i x_{ij}^2\)
Mean | \(m(j) = \frac {s(j)} {n}\)
Second order raw moment | \(a_2(j) = \frac {s_2(j)} {n}\)
Sum of squared differences from the mean | \(\text{SDM}(j) = \sum_i (x_{ij} - m(j))^2\)
Variance | \(k_2(j) = \frac {\text{SDM}(j) } {n - 1}\)
Standard deviation | \(\text{stdev}(j) = \sqrt {k_2(j)}\)
Variation coefficient | \(V(j) = \frac {\text{stdev}(j)} {m(j)}\)
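The definitions above can be checked against a small reference implementation. The following is a minimal sketch in plain Python, not the library's implementation: the function name `basic_statistics` and the dictionary keys are hypothetical, and returning NaN for the variation coefficient when the mean is zero is an assumption, since the definition leaves that case open.

```python
import math

def basic_statistics(rows):
    """Per-feature statistics for a list of equal-length numeric rows.

    Illustrative only: names and return layout are assumptions.
    """
    n = len(rows)
    if n < 2:
        raise ValueError("the variance uses n - 1, so n must be at least 2")
    stats = []
    for column in zip(*rows):                      # column j holds x_1j, ..., x_nj
        s = sum(column)                            # s(j)
        s2 = sum(x * x for x in column)            # s_2(j)
        m = s / n                                  # m(j)
        sdm = sum((x - m) ** 2 for x in column)    # SDM(j)
        k2 = sdm / (n - 1)                         # k_2(j), sample variance
        stdev = math.sqrt(k2)
        stats.append({
            "min": min(column),
            "max": max(column),
            "sum": s,
            "sum_squares": s2,
            "mean": m,
            "second_order_raw_moment": s2 / n,     # a_2(j)
            "sum_squares_centered": sdm,
            "variance": k2,
            "standard_deviation": stdev,
            # Assumption: the coefficient is undefined for a zero mean.
            "variation": stdev / m if m != 0 else math.nan,
        })
    return stats

data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
first = basic_statistics(data)[0]
print(first["mean"], first["variance"], first["variation"])  # 2.0 1.0 0.5
```

Note that the variance divides by \(n - 1\) while the second order raw moment divides by \(n\), exactly as in the table.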

Computation method: dense

The method computes the basic statistics for each feature in the data set.

Programming Interface

Refer to API Reference: Basic statistics.

Distributed mode

The algorithm supports distributed execution in SPMD mode (only on GPU).
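In a single-program, multiple-data setting, each process holds only its own block of rows. One way to see why these statistics suit that model: the count, minimum, maximum, sum, and sum of squares of two blocks can be merged directly, and the remaining characteristics follow from the merged values through the identity \(\text{SDM}(j) = s_2(j) - n \, m(j)^2\). The sketch below illustrates this idea for a single feature in plain Python. It is not a description of how the library distributes the computation, and the names `partial_stats`, `merge_stats`, and `finalize` are hypothetical.

```python
import math

def partial_stats(column):
    """Mergeable summary of one block of values for a single feature."""
    return {
        "n": len(column),
        "min": min(column),
        "max": max(column),
        "sum": sum(column),
        "sum_squares": sum(x * x for x in column),
    }

def merge_stats(a, b):
    """Combine the summaries of two disjoint blocks."""
    return {
        "n": a["n"] + b["n"],
        "min": min(a["min"], b["min"]),
        "max": max(a["max"], b["max"]),
        "sum": a["sum"] + b["sum"],
        "sum_squares": a["sum_squares"] + b["sum_squares"],
    }

def finalize(p):
    """Derive mean, SDM, variance, and standard deviation from a summary."""
    n = p["n"]
    if n < 2:
        raise ValueError("the variance uses n - 1, so n must be at least 2")
    m = p["sum"] / n
    sdm = p["sum_squares"] - n * m * m       # SDM = s_2 - n * m^2
    k2 = max(sdm, 0.0) / (n - 1)             # clamp tiny negative rounding error
    return {
        "mean": m,
        "sum_squares_centered": sdm,
        "variance": k2,
        "standard_deviation": math.sqrt(k2),
    }

merged = merge_stats(partial_stats([1.0, 2.0]), partial_stats([3.0, 4.0, 5.0]))
print(finalize(merged)["variance"])  # 2.5
```

The identity used in `finalize` subtracts two numbers of similar size, so it can lose precision when the mean is large relative to the spread of the data. A production implementation would typically merge centered sums instead.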