Stochastic Gradient Descent Algorithm

The stochastic gradient descent (SGD) algorithm is a special case of an iterative solver. See Iterative Solver for more details.

Computation methods

The following computation methods are available in oneDAL for the stochastic gradient descent algorithm:

Mini-batch method

The mini-batch method (miniBatch) of the stochastic gradient descent algorithm [Mu2014] follows the algorithmic framework of an iterative solver with an empty set of intrinsic parameters \(S_t\). The algorithm-specific transformation \(T\) is defined for the learning rate sequence \({\{\eta_t\}}_{t=1, \ldots, \text{nIterations}}\), the conservative sequence \({\{\gamma_t\}}_{t=1, \ldots, \text{nIterations}}\), and the number of iterations in the internal loop \(L\). The transformation \(T\), the algorithm-specific vector \(U\), and the power \(d\) of the Lebesgue space are defined as follows:

\[T\left({\theta }_{t-1}, g\left({\theta }_{t-1}\right), {S}_{t-1}\right)\]

For \(l\) from \(1\) until \(L\):

  1. Update the function argument: \({\theta }_{t}:= {\theta }_{t}-{\eta }_{t}\left(g\left({\theta }_{t}\right)+{\gamma }_{t}\left({\theta }_{t}-{\theta }_{t-1 }\right)\right)\)

  2. Compute the gradient: \(g\left({\theta }_{t}\right)=\nabla {F}_{I}\left({\theta }_{t}\right)\)

Convergence check: \(U=g\left({\theta }_{t-1}\right), d=2\)
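The update rule above can be sketched in plain Python. This is a minimal illustration, not the oneDAL implementation: the names `minibatch_sgd`, `batch_grad`, and `mean_residual_grad` are hypothetical, and the sketch assumes that the inner loop starts from \(\theta_{t-1}\), that batch indices are drawn without replacement, and that the convergence check is a plain threshold on the 2-norm of \(U\) (the exact stopping rule belongs to the iterative solver framework).

```python
import math
import random


def minibatch_sgd(batch_grad, n_terms, theta0, n_iterations, batch_size,
                  inner_n_iterations, eta, gamma, accuracy=1e-12, seed=0):
    """Sketch of the mini-batch update; batch_grad(theta, batch) stands in for g."""
    rng = random.Random(seed)
    theta_prev = list(theta0)
    for t in range(n_iterations):
        # Assumption: the batch I is sampled without replacement.
        batch = rng.sample(range(n_terms), batch_size)
        grad = batch_grad(theta_prev, batch)  # g(theta_{t-1})
        # Simplified convergence check with U = g(theta_{t-1}) and d = 2.
        if math.sqrt(sum(u * u for u in grad)) < accuracy:
            break
        theta = list(theta_prev)  # assumption: inner loop starts at theta_{t-1}
        for _ in range(inner_n_iterations):
            # 1. Update the argument with the conservative term.
            theta = [x - eta[t] * (g + gamma[t] * (x - p))
                     for x, g, p in zip(theta, grad, theta_prev)]
            # 2. Recompute the gradient on the same batch.
            grad = batch_grad(theta, batch)
        theta_prev = theta
    return theta_prev


# Toy objective: the mean of 0.5 * (theta - a_i)^2 over the selected terms.
data = [1.0, 2.0, 3.0, 4.0]


def mean_residual_grad(theta, batch):
    return [sum(theta[0] - data[i] for i in batch) / len(batch)]


result = minibatch_sgd(mean_residual_grad, len(data), [0.0],
                       n_iterations=50, batch_size=4, inner_n_iterations=5,
                       eta=[0.1] * 50, gamma=[1.0] * 50)
```

In this toy run the batch covers all four terms, so each gradient is exact and `result[0]` approaches the mean of `data`.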

Default method

The default method (defaultDense) is a particular case of the mini-batch method with the batch size \(b=1\), \(L=1\), and conservative sequence \({\gamma }_{t}\equiv 0\).
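As a worked special case, and assuming the inner loop starts from \(\theta_t = \theta_{t-1}\) (an initialization this section does not state explicitly), substituting \(L=1\) and \(\gamma_t \equiv 0\) into the mini-batch update gives a single plain gradient step per iteration:

\[{\theta }_{t}={\theta }_{t-1}-{\eta }_{t}\, g\left({\theta }_{t-1}\right)\]

Because \(b=1\), the gradient \(g\) here is computed from a single term of the objective function.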

Momentum method

The momentum method (momentum) of the stochastic gradient descent algorithm [Rumelhart86] follows the algorithmic framework of an iterative solver with the set of intrinsic parameters \(S_t\). The algorithm-specific transformation \(T\) is defined for the learning rate sequence \({\{\eta_t\}}_{t=1, \ldots, \text{nIterations}}\) and the momentum parameter \(\mu \in [0,1]\). The transformation \(T\), the algorithm-specific vector \(U\), and the power \(d\) of the Lebesgue space are defined as follows:

\[T\left({\theta }_{t-1}, g\left({\theta }_{t-1}\right), {S}_{t-1}\right)\]
  1. \({v}_{t}=\mu \cdot {v}_{t-1}+{\eta }_{t}\cdot g\left({\theta }_{t-1}\right)\)

  2. \({\theta }_{t}={\theta }_{t-1}-{v}_{t}\)

For the momentum method of the SGD algorithm, the set of intrinsic parameters \(S_t\) only contains the last update vector \(v_t\).

Convergence check: \(U=g\left({\theta }_{t-1}\right), d=2\)
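The two momentum steps can be sketched as follows. This is an illustrative sketch rather than the library code: `momentum_sgd` and `grad` are hypothetical names, `grad(theta)` stands in for \(g(\theta)\), and the initial update vector \(v_0 = 0\) is an assumption not stated in this section.

```python
def momentum_sgd(grad, theta0, n_iterations, eta, mu=0.9):
    """Sketch of the momentum update; returns the final theta and v."""
    theta = list(theta0)
    v = [0.0] * len(theta)  # assumption: v_0 = 0
    for t in range(n_iterations):
        g = grad(theta)  # g(theta_{t-1})
        # 1. v_t = mu * v_{t-1} + eta_t * g(theta_{t-1})
        v = [mu * vi + eta[t] * gi for vi, gi in zip(v, g)]
        # 2. theta_t = theta_{t-1} - v_t
        theta = [x - vi for x, vi in zip(theta, v)]
    return theta, v


# Toy objective 0.5 * (theta - 3)^2 with a deterministic gradient.
theta, v = momentum_sgd(lambda th: [th[0] - 3.0], [0.0],
                        n_iterations=500, eta=[0.1] * 500, mu=0.9)
```

The returned `v` corresponds to the only intrinsic parameter \(S_t\) that the method carries between iterations.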

Computation

Because the stochastic gradient descent algorithm is a special case of an iterative solver, its parameters, input, and output are those of iterative solvers; see the Computation section of Iterative Solver.

Algorithm Parameters

In addition to parameters of the iterative solver, the stochastic gradient descent algorithm has the following parameters. Some of them are required only for specific values of the computation method parameter method:

Algorithm Parameters for Stochastic Gradient Descent Algorithm Computation

Parameter

method

Default Value

Description

algorithmFPType

defaultDense, miniBatch, momentum

float

The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

method

Not applicable

defaultDense

Available computation methods:

For CPU:

  • defaultDense

  • miniBatch

  • momentum

For GPU:

  • miniBatch

batchIndices

defaultDense, miniBatch, momentum

Not applicable

The numeric table with 32-bit integer indices of terms in the objective function. The method parameter determines the size of the numeric table:

  • defaultDense: nIterations x 1

  • miniBatch and momentum: nIterations x batchSize

If no indices are provided, the implementation generates random indices.

batchSize

miniBatch, momentum

\(128\)

The number of batch indices used to compute the stochastic gradient.

If batchSize equals the number of terms in the objective function, no random sampling is performed, and all terms are used to calculate the gradient.

The algorithm ignores this parameter if the batchIndices parameter is provided.

For the defaultDense value of method, one term is used to compute the gradient on each iteration.

conservativeSequence

miniBatch

A numeric table of size \(1 \times 1\) that contains the default conservative coefficient equal to 1.

The numeric table of size \(1 \times \text{nIterations}\) or \(1 \times 1\). The contents of the table depend on its size:

  • size = \(1 \times \text{nIterations}\): values of the conservative coefficient sequence \(\gamma^k\) for \(k = 1, \ldots, \text{nIterations}\).

  • size = \(1 \times 1\): the value of conservative coefficient at each iteration \(\gamma^1 = \ldots = \gamma^\text{nIterations}\).

innerNIterations

miniBatch

\(5\)

The number of inner iterations for the miniBatch method.

learningRateSequence

defaultDense, miniBatch, momentum

A numeric table of size \(1 \times 1\) that contains the default step length equal to 1.

The numeric table of size \(1 \times \text{nIterations}\) or \(1 \times 1\). The contents of the table depend on its size:

  • size = \(1 \times \text{nIterations}\): values of the learning rate sequence \(\eta^k\) for \(k = 1, \ldots, \text{nIterations}\).

  • size = \(1 \times 1\): the value of learning rate at each iteration \(\eta^1 = \ldots = \eta^\text{nIterations}\).

momentum

momentum

\(0.9\)

The momentum value.

engine

defaultDense, miniBatch, momentum

SharedPtr<engines::mt19937::Batch>()

Pointer to the random number generator engine that is used internally for generation of 32-bit integer indices of terms in the objective function.
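The table-shaped parameters above can be illustrated with a short sketch. It is not oneDAL API: `sequence_value` and `random_batch_indices` are hypothetical helpers, tables are modeled as plain Python lists, and sampling without replacement within a row is an assumption. Python's `random` module also uses the Mersenne Twister (MT19937), but its seeding and output stream should not be expected to match the oneDAL engine.

```python
import random


def sequence_value(table, t, n_iterations):
    """Value of a learning rate or conservative sequence at iteration t (zero-based).

    A 1 x 1 table applies the same value at every iteration;
    a 1 x nIterations table supplies one value per iteration.
    """
    if len(table) == 1:
        return table[0]
    if len(table) == n_iterations:
        return table[t]
    raise ValueError("table must have size 1 x 1 or 1 x nIterations")


def random_batch_indices(n_iterations, batch_size, n_terms, seed=0):
    """Build an nIterations x batchSize table of term indices."""
    if batch_size == n_terms:
        # No random sampling: every term is used on every iteration.
        return [list(range(n_terms)) for _ in range(n_iterations)]
    rng = random.Random(seed)
    # Assumption: indices within one row are drawn without replacement.
    return [rng.sample(range(n_terms), batch_size) for _ in range(n_iterations)]


constant_rate = sequence_value([0.5], t=3, n_iterations=10)
scheduled_rate = sequence_value([0.1, 0.2, 0.3], t=1, n_iterations=3)
indices = random_batch_indices(n_iterations=4, batch_size=2, n_terms=10)
full_batch = random_batch_indices(n_iterations=3, batch_size=5, n_terms=5)
```

For the defaultDense method the same helper would be called with `batch_size=1`, which yields the documented nIterations x 1 shape.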