.. ******************************************************************************
.. * Copyright 2020-2021 Intel Corporation
.. *
.. * Licensed under the Apache License, Version 2.0 (the "License");
.. * you may not use this file except in compliance with the License.
.. * You may obtain a copy of the License at
.. *
.. *     http://www.apache.org/licenses/LICENSE-2.0
.. *
.. * Unless required by applicable law or agreed to in writing, software
.. * distributed under the License is distributed on an "AS IS" BASIS,
.. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
.. * See the License for the specific language governing permissions and
.. * limitations under the License.
.. *******************************************************************************/

.. re-use for math equations:
.. |x_vector| replace:: :math:`(x_1, \ldots, x_p)`
.. |j_1_k| replace:: :math:`j = 1, \ldots, k`

.. _elastic_net:

Elastic Net
===========

Elastic Net is a method for modeling the relationship between a dependent variable
(which may be a vector) and one or more explanatory variables by fitting a regularized
least squares model. The Elastic Net regression model has a special penalty, a sum of
L1 and L2 regularizations, that takes advantage of both :ref:`ridge` and
:ref:`LASSO <lasso>` algorithms. This penalty is particularly useful in situations
with many correlated predictor variables [Friedman2010]_.

Details
*******

Let |x_vector| be a vector of input variables and :math:`y = (y_1, \ldots, y_k)`
be the response. For each |j_1_k|, the Elastic Net model has a form similar to the
linear and ridge regression models [Hoerl70]_, with one exception: the coefficients
are estimated by minimizing a mean squared error (MSE) objective function that is
regularized by :math:`L_1` and :math:`L_2` penalties.

.. math::
    y_j = \beta_{0j} + x_1 \beta_{1j} + \ldots + x_p \beta_{pj}

Here :math:`x_i`, :math:`i = 1, \ldots, p`, are referred to as independent variables,
and :math:`y_j`, |j_1_k|, is referred to as a dependent variable or response.

Training Stage
--------------

Let :math:`(x_{11}, \ldots, x_{1p}, y_{11}, \ldots, y_{1k}), \ldots,
(x_{n1}, \ldots, x_{np}, y_{n1}, \ldots, y_{nk})` be a set of training data
(for a regression task, :math:`n \gg p`; for feature selection, :math:`p` may be
greater than :math:`n`). The matrix :math:`X` of size :math:`n \times p` contains
the observations :math:`x_{ij}`, :math:`i = 1, \ldots, n`, :math:`j = 1, \ldots, p`,
of the independent variables.

For each :math:`y_j`, |j_1_k|, Elastic Net regression estimates
:math:`(\beta_{0j}, \beta_{1j}, \ldots, \beta_{pj})` by minimizing the objective function:

.. math::
    F_j(\beta) = \frac{1}{2n} \sum_{i=1}^{n} \left( y_{ij} - \beta_{0j} -
    \sum_{q=1}^{p} \beta_{qj} x_{iq} \right)^2 +
    \lambda_{1j} \sum_{q=1}^{p} |\beta_{qj}| +
    \frac{\lambda_{2j}}{2} \sum_{q=1}^{p} \beta_{qj}^{2}

In the equation above, the first term is the mean squared error function, and the
second and third terms are regularization terms that penalize the :math:`L_1` and
:math:`L_2` norms of the vector :math:`\beta_j`, where :math:`\lambda_{1j} \geq 0`
and :math:`\lambda_{2j} \geq 0`, |j_1_k|.

For more details, see [Hastie2009]_ and [Friedman2010]_.

By default, the :ref:`Coordinate Descent <cd_solver>` iterative solver is used to
minimize the objective function. The :ref:`SAGA <saga_solver>` solver is also
applicable for minimization.
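
Following [Friedman2010]_, the coordinate descent update for this objective has a
closed form for each coordinate: soft-threshold the correlation of column :math:`q`
with the partial residual, then shrink by the :math:`L_2` term. For a single
response :math:`j`,

.. math::
    \beta_{qj} \leftarrow \frac{S\left(\frac{1}{n}\sum_{i=1}^{n} x_{iq}\, r_{ij}^{(q)},\,
    \lambda_{1j}\right)}{\frac{1}{n}\sum_{i=1}^{n} x_{iq}^{2} + \lambda_{2j}},
    \qquad S(z, \gamma) = \operatorname{sign}(z)\max(|z| - \gamma, 0),

where :math:`r_{ij}^{(q)} = y_{ij} - \beta_{0j} - \sum_{s \neq q} \beta_{sj} x_{is}`
is the partial residual with coordinate :math:`q` excluded. The following sketch
illustrates this update in plain NumPy for a single response. It is not the oneDAL
implementation; the names ``elastic_net_cd`` and ``soft_threshold`` and all parameter
defaults are illustrative assumptions.

.. code-block:: python

    import numpy as np

    def soft_threshold(z, gamma):
        """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
        return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

    def elastic_net_cd(X, y, lam1, lam2, n_iter=1000, tol=1e-8):
        """Cyclic coordinate descent for
        F(b) = 1/(2n) * ||y - b0 - X @ b||^2 + lam1 * ||b||_1 + lam2/2 * ||b||_2^2.
        Returns the intercept b0 and the coefficient vector b."""
        n, p = X.shape
        b = np.zeros(p)
        b0 = y.mean()                         # intercept is not penalized
        col_sq = (X ** 2).sum(axis=0) / n     # (1/n) * x_q^T x_q per column
        r = y - b0 - X @ b                    # current residual
        for _ in range(n_iter):
            b_prev = b.copy()
            for q in range(p):
                r += X[:, q] * b[q]           # partial residual without feature q
                rho = X[:, q] @ r / n
                b[q] = soft_threshold(rho, lam1) / (col_sq[q] + lam2)
                r -= X[:, q] * b[q]
            r += b0                           # re-center the unpenalized intercept
            b0 = r.mean()
            r -= b0
            if np.max(np.abs(b - b_prev)) < tol:
                break
        return b0, b

Note that the residual is updated incrementally inside the inner loop, so one full
sweep over all :math:`p` coordinates costs :math:`O(np)`, which is what makes
coordinate descent attractive for this objective.
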
Prediction Stage
----------------

Prediction based on Elastic Net regression is done for an input vector |x_vector|
using the equation :math:`y_j = \beta_{0j} + x_1 \beta_{1j} + \ldots + x_p \beta_{pj}`
for each |j_1_k|.
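
As a hypothetical illustration of this step (plain NumPy, with made-up coefficient
values standing in for the output of a real training run), the prediction equation
is a single affine map that produces all :math:`k` responses at once:

.. code-block:: python

    import numpy as np

    # Illustrative values: b0 holds the intercepts beta_{0j}, j = 1..k,
    # and B holds the coefficients beta_{qj}, q = 1..p (one column per response).
    b0 = np.array([0.7, -0.2])
    B = np.array([[1.5, 0.0],
                  [0.0, 0.3],
                  [-2.0, 1.1]])
    x = np.array([0.5, -1.0, 2.0])   # input vector (x_1, ..., x_p)

    # y_j = beta_0j + x_1 * beta_1j + ... + x_p * beta_pj, for each j = 1..k
    y_hat = b0 + x @ B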