Mean Squared Error Algorithm

Note

The mean squared error algorithm is not supported on GPU.

Details

Given a set of feature vectors \(x_i = (x_{i1}, \ldots, x_{ip}) \in R^p\), \(i \in \{1, \ldots, n\}\), and a set of respective responses \(y_i\), the mean squared error (MSE) objective function \(F(\theta; x, y)\) has the form:

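\[F(\theta; x, y) = \sum _{i=1}^{n} F_i(\theta; x, y) = \frac {1}{2n} \sum _{i=1}^{n} (y_i - h(\theta, x_i))^2\]

The non-smooth term \(M(\theta)\) is zero for this objective, so its proximal operator leaves each coefficient unchanged:

\[M(\theta) = 0\]
\[\mathrm{prox}_\gamma^M (\theta_j) = \theta_j, j = 1, \ldots, p\]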

In the oneDAL implementation of the MSE, \(h(\theta, x_i)\) is represented as:

\[h(\theta, x_i) = \theta_0 + \sum _{j=1}^{p} \theta_j x_{ij}\]
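
The objective and the linear model above can be illustrated with a minimal sketch in plain Python. It does not use the oneDAL API; the function names `h` and `mse_objective` and the toy data are assumptions made for illustration only.

```python
def h(theta, x_i):
    """Linear model: theta[0] is the intercept, theta[1:] are the p coefficients."""
    return theta[0] + sum(t * x for t, x in zip(theta[1:], x_i))


def mse_objective(theta, X, y):
    """F(theta; x, y) = 1/(2n) * sum_i (y_i - h(theta, x_i))^2."""
    n = len(X)
    return sum((y_i - h(theta, x_i)) ** 2 for x_i, y_i in zip(X, y)) / (2 * n)


# Toy data (made up for illustration): n = 3 observations, p = 2 features.
X = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
y = [3.0, 1.0, 2.0]
theta = [0.5, 1.0, 0.5]  # (theta_0, theta_1, theta_2)

value = mse_objective(theta, X, y)
print(value)
```

Here the residuals are 0.5, 0 and -1, so the sum of squares is 1.25 and the objective is 1.25 / (2 * 3).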

For a given set of indices \(I = \{i_1, i_2, \ldots, i_m\}\), \(1 \leq i_r \leq n\), \(r \in \{1, \ldots, m\}\), \(|I| = m\), the value and the gradient of the sum of functions with respect to the argument \(\theta\) have the following form, respectively:

\[F_I(\theta; x, y) = \frac {1}{2m} \sum_{i_k \in I} (y_{i_k} - h(\theta, x_{i_k}))^2\]
\[\nabla F_I(\theta; x, y) = \left\{ \frac{\partial F_I}{\partial \theta_0}, \ldots, \frac{\partial F_I}{\partial \theta_p} \right\}\]

where

\[\frac{\partial F_I}{\partial \theta_0} = \frac{1}{m} \sum_{i_k \in I} (h(\theta, x_{i_k}) - y_{i_k})\]
\[\frac{\partial F_I}{\partial \theta_j} = \frac{1}{m} \sum_{i_k \in I} (h(\theta, x_{i_k}) - y_{i_k}) x_{i_k j}, j = 1, \ldots, p\]
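
As a sanity check on the batch formulas, the following plain-Python sketch evaluates \(F_I\) and its gradient for a batch of indices and compares the result with a central finite-difference approximation. The gradient is obtained by differentiating \(F_I\) with respect to \(\theta\), so each residual enters as \(h(\theta, x_{i_k}) - y_{i_k}\). The function names and data are hypothetical, indices are 0-based (unlike the 1-based notation in the text), and none of this is the oneDAL API.

```python
def h(theta, x_i):
    """Linear model with intercept theta[0]."""
    return theta[0] + sum(t * x for t, x in zip(theta[1:], x_i))


def batch_value_and_gradient(theta, X, y, batch):
    """Return F_I and its gradient for the 0-based row indices in `batch`."""
    m = len(batch)
    p = len(theta) - 1
    value = 0.0
    grad = [0.0] * (p + 1)
    for i in batch:
        err = h(theta, X[i]) - y[i]       # derivative of (y - h)^2 / 2 w.r.t. h
        value += err * err
        grad[0] += err                    # dh/dtheta_0 = 1
        for j in range(p):
            grad[j + 1] += err * X[i][j]  # dh/dtheta_j = x_ij
    return value / (2 * m), [g / m for g in grad]


def numeric_gradient(theta, X, y, batch, eps=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    out = []
    for j in range(len(theta)):
        up = list(theta)
        dn = list(theta)
        up[j] += eps
        dn[j] -= eps
        f_up, _ = batch_value_and_gradient(up, X, y, batch)
        f_dn, _ = batch_value_and_gradient(dn, X, y, batch)
        out.append((f_up - f_dn) / (2 * eps))
    return out


# Toy data (made up for illustration): n = 4, p = 2, batch size m = 3.
X = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0], [2.0, 2.0]]
y = [3.0, 1.0, 2.0, 0.0]
theta = [0.5, 1.0, 0.5]
batch = [0, 2, 3]

value, grad = batch_value_and_gradient(theta, X, y, batch)
approx = numeric_gradient(theta, X, y, batch)
print(value, grad, approx)
```

Because \(F_I\) is quadratic in \(\theta\), the finite-difference estimate should agree with the analytic gradient up to rounding error.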

The Lipschitz constant is computed as:

\[\mathrm{lipschitzConstant} = \max_{i = 1, \ldots, n} \| x_i \|_2\]
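
A minimal plain-Python sketch of this formula takes the largest Euclidean norm among the feature vectors. It mirrors the expression as written here and nothing more: the function name is hypothetical, and whether the library applies additional scaling or accounts for the intercept is not stated in this section.

```python
import math


def lipschitz_constant(X):
    """Largest L2 norm over the rows (feature vectors) of X."""
    return max(math.sqrt(sum(v * v for v in row)) for row in X)


# Toy data (made up for illustration): the first row has norm sqrt(9 + 16).
X = [[3.0, 4.0], [1.0, 0.0], [0.0, -2.0]]
L = lipschitz_constant(X)
print(L)
```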

Computation

Algorithm Input

The mean squared error algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Algorithm Input for MSE Computation

| Input ID | Input |
|---|---|
| argument | A numeric table of size \((p + 1) \times 1\) with the input argument \(\theta\) of the objective function. |
| data | A numeric table of size \(n \times p\) with the data \(x_{ij}\). |
| dependentVariables | A numeric table of size \(n \times 1\) with dependent variables \(y_i\). |

Optional Algorithm Input

The mean squared error algorithm accepts the optional input described below. Pass the Optional Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Optional Algorithm Input for MSE Computation

| Input ID | Input |
|---|---|
| weights | Optional input. Pointer to the \(1 \times n\) numeric table with weights of samples. The input can be an object of any class derived from NumericTable except for PackedTriangularMatrix and PackedSymmetricMatrix. By default, all weights are equal to \(1\). |
| gramMatrix | Optional input. Pointer to the \(p \times p\) numeric table with the pre-computed Gram matrix. The input can be an object of any class derived from NumericTable except for PackedTriangularMatrix and PackedSymmetricMatrix. By default, the table is set to an empty numeric table. |
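
For orientation, a \(p \times p\) Gram matrix of the data is conventionally the product \(X^T X\). The plain-Python sketch below assumes that unnormalized definition. This section gives only the table size, so the exact definition and any scaling expected by the library are assumptions here, and the function name is hypothetical.

```python
def gram_matrix(X):
    """Return the p x p matrix G with G[j][k] = sum_i x_ij * x_ik (X^T X)."""
    p = len(X[0])
    return [[sum(row[j] * row[k] for row in X) for k in range(p)]
            for j in range(p)]


# Toy data (made up for illustration): n = 3 observations, p = 2 features.
X = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
G = gram_matrix(X)
print(G)
```

The result is symmetric, which is a quick way to check a pre-computed table before passing it on.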

Algorithm Parameters

The mean squared error algorithm has the following parameters. Some of them are required only for specific values of the method parameter, which selects the computation method:

Algorithm Parameters for MSE Computation

| Parameter | Default value | Description |
|---|---|---|
| penaltyL1 | \(0\) | The numeric table of size \(1 \times \mathrm{nDependentVariables}\) with L1 regularization coefficients. |
| penaltyL2 | \(0\) | The numeric table of size \(1 \times \mathrm{nDependentVariables}\) with L2 regularization coefficients. |
| interceptFlag | true | Flag that indicates whether or not to compute the intercept. |
| algorithmFPType | float | The floating-point type that the algorithm uses for intermediate computations. Can be float or double. |
| method | defaultDense | Performance-oriented computation method. |
| numberOfTerms | Not applicable | The number of terms in the objective function. |
| batchIndices | Not applicable | The numeric table of size \(1 \times m\), where \(m\) is the batch size, with a batch of indices to be used to compute the function results. If no indices are provided, the implementation uses all the terms in the computation. Note: this parameter can be an object of any class derived from NumericTable except for PackedTriangularMatrix and PackedSymmetricMatrix. |
| resultsToCompute | gradient | The 64-bit integer flag that specifies which characteristics of the objective function to compute. Provide one of the following values to request a single characteristic, or use bitwise OR to request a combination of characteristics: value (value of the objective function); nonSmoothTermValue (value of the non-smooth term of the objective function); gradient (gradient of the smooth term of the objective function); hessian (Hessian of the smooth term of the objective function); proximalProjection (projection of the proximal operator for the non-smooth term of the objective function); lipschitzConstant (Lipschitz constant of the smooth term of the objective function). |
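
The bitwise OR combination of resultsToCompute values can be sketched with a Python flag enumeration. The class name and the numeric bit values below are hypothetical, chosen only so that each flag occupies its own bit; the actual constants are defined by the library and are not given in this section.

```python
from enum import IntFlag


class ResultsToCompute(IntFlag):
    # Hypothetical bit values for illustration; not the library's constants.
    VALUE = 1
    NON_SMOOTH_TERM_VALUE = 2
    GRADIENT = 4
    HESSIAN = 8
    PROXIMAL_PROJECTION = 16
    LIPSCHITZ_CONSTANT = 32


# Request both the objective value and the gradient in one flag word.
requested = ResultsToCompute.VALUE | ResultsToCompute.GRADIENT

# Test individual bits with bitwise AND.
wants_gradient = bool(requested & ResultsToCompute.GRADIENT)
wants_hessian = bool(requested & ResultsToCompute.HESSIAN)
print(int(requested), wants_gradient, wants_hessian)
```

Since every characteristic has a distinct bit, any combination fits in a single integer and each request can be tested independently.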

Algorithm Output

For the output of the mean squared error algorithm, see Output for objective functions.

Examples