
Skim through fundamental machine learning concepts and mathematical implications.

Importance of Scaling Data

Regression analysis is very sensitive to the scale of the data.
So, when the input variables differ in range and distribution, it is recommended to scale them to minimize that influence.

Min-Max Scaling rescales each variable to the range $[0, 1]$, so differences are compared according to their relative rate of change.

Standard Scaling standardizes each variable to zero mean and unit variance, so values are compared by how far they deviate within their own distribution.
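
As a minimal sketch, both scalers can be applied with scikit-learn (assumed to be available); the toy data below is made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two input variables with very different ranges.
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0],
              [4.0, 500.0]])

# Min-Max Scaling: rescale each column to the [0, 1] range.
print(MinMaxScaler().fit_transform(X))

# Standard Scaling: transform each column to zero mean and unit variance.
print(StandardScaler().fit_transform(X))
```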

Coefficient of Determination $R^2$

$R^2$ indicates how much better the predicted value $\hat{Y}$ explains the actual value $Y$ than the mean value $\overline{Y}$ does.

$R^2$ ranges from $0$ to $1$, and the closer it gets to $1$, the more powerful the regression model is.
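
As a minimal sketch, $R^2$ can be computed directly as $1 - SSE/SST$ and cross-checked against scikit-learn's r2_score; the values below are illustrative only.

```python
import numpy as np
from sklearn.metrics import r2_score

y = np.array([3.0, 5.0, 7.0, 9.0])      # actual values Y
y_hat = np.array([2.8, 5.1, 7.3, 8.9])  # predicted values Y_hat

sse = np.sum((y - y_hat) ** 2)          # error of the prediction
sst = np.sum((y - y.mean()) ** 2)       # error of the mean
print(1 - sse / sst)                    # manual R^2
print(r2_score(y, y_hat))               # same value from scikit-learn
```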

Overfitting

Overfitting is when a model fails to generalize because it is fitted too closely to the training data.

So it is recommended to use $R_{adj}^2$ and $R_{pred}^2$ to evaluate the performance of the model.

Adjusted $R^2$ ($R_{adj}^2$) penalizes the model for the number of input variables $p$.

$R_{adj}^2 = 1 - \frac{SSE(n-1)}{SST(n - p - 1)}$
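
The formula translates directly into code; the sketch below uses hypothetical SSE, SST, $n$, and $p$ values chosen only for illustration.

```python
def adjusted_r2(sse: float, sst: float, n: int, p: int) -> float:
    """Adjusted R^2 = 1 - (SSE * (n - 1)) / (SST * (n - p - 1))."""
    return 1 - (sse * (n - 1)) / (sst * (n - p - 1))

# e.g. 50 samples, 5 input variables (made-up numbers)
print(adjusted_r2(sse=12.0, sst=80.0, n=50, p=5))
```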

Predicted $R^2$ ($R_{pred}^2$) evaluates the model on validation data.

$R_{pred}^2 = 1 - \frac{\Sigma_{i=1}^k(y_i - \hat{y}_i)^2}{\Sigma_{i=1}^k(y_i - \overline{y})^2}$
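
A minimal sketch of the idea: fit a model on training data and measure $R^2$ on held-out validation data. The synthetic data and the train/validation split here are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=100)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print(r2_score(y_train, model.predict(X_train)))  # R^2 on training data
print(r2_score(y_val, model.predict(X_val)))      # predicted R^2 on validation data
```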

If $R^2$ is much greater than $R_{adj}^2$ and $R_{pred}^2$, the model is overfitted.

Variance Inflation Factor (VIF)

The Variance Inflation Factor (VIF) evaluates multicollinearity between input variables.
If the variables are highly correlated, the estimated regression coefficients become highly unstable.

$VIF_k = \frac{1}{1 - R_k^2}$

Here $R_k^2$ is the $R^2$ obtained by regressing the $k$-th variable on the remaining variables. If $VIF \ge 10$, the variable has a multicollinearity problem.
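
A minimal sketch of the formula: each $R_k^2$ comes from regressing one variable on the others. statsmodels also provides a variance_inflation_factor helper, but the version below spells out the definition; the synthetic data is illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.95 * x1 + rng.normal(scale=0.1, size=200)  # highly correlated with x1
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

for k in range(X.shape[1]):
    others = np.delete(X, k, axis=1)              # all variables except the k-th
    r2_k = LinearRegression().fit(others, X[:, k]).score(others, X[:, k])
    print(f"VIF_{k + 1} = {1 / (1 - r2_k):.1f}")  # VIF >= 10 signals multicollinearity
```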

Regularization

Regularization suppresses the effect of unhelpful input variables by adding a constraint $f(\hat{\beta})$ on the regression coefficients.

$\min\Sigma_{i=1}^n(y_i - \hat{y}_i)^2 + f(\hat{\beta})$

Ridge Regression

Ridge Regression (L2 Regularization) uses the sum of the squares of the regression coefficients as the constraint $f(\hat{\beta})$.

($\lambda$ is a hyperparameter: the higher it is, the more strongly the coefficients are shrunk toward $0$.)

$\min\Sigma_{i=1}^n(y_i - \hat{y}_i)^2 + \lambda\Sigma_{j=1}^p\hat{\beta}_j^2$

Ridge shrinks the regression coefficients close to $0$, but not exactly to $0$.
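
A minimal sketch with scikit-learn's Ridge, where alpha plays the role of $\lambda$; the synthetic data is illustrative. As alpha grows, the coefficients shrink toward $0$ but stay non-zero.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=100)

for alpha in [0.1, 1.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    print(alpha, np.round(model.coef_, 3))  # coefficients shrink, but rarely hit exactly 0
```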

LASSO

LASSO (L1 Regularization) uses the sum of the absolute values of the regression coefficients as the constraint $f(\hat{\beta})$.

($\lambda$ is a hyperparameter: the higher it is, the more coefficients are set exactly to $0$.)

$\min\Sigma_{i=1}^n(y_i - \hat{y}_i)^2 + \lambda\Sigma_{j=1}^p|\hat{\beta}_j|$

LASSO reduces some regression coefficients exactly to $0$, so it also performs variable selection.
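
A minimal sketch with scikit-learn's Lasso on the same kind of synthetic data; unlike Ridge, a large enough alpha ($\lambda$) drives some coefficients exactly to $0$.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=100)

for alpha in [0.01, 0.1, 1.0]:
    model = Lasso(alpha=alpha).fit(X, y)
    print(alpha, np.round(model.coef_, 3))  # more coefficients become exactly 0 as alpha grows
```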
