# Statistical Learning Theory Note by Hisashi Kashima

The course was taught in Spring 2017.
The course material is here.

- Apr. 17
- Represent discrete input by one-hot encoding

- Regularization in Ridge regression
- Include penalty on the norm of weights $w$ to avoid instability

# Apr. 17

If an input has $N$ possible discrete values, we encode
the $i$-th value as the $N$-dimensional vector whose $i$-th component
is $1$ and whose other components are $0$.

In essence, we introduce an $N$-dimensional binary-valued subspace
for the input.

It is not appropriate to represent it as a single variable in $\mathbb{Z}_N$,
because that encoding embeds the value in $\mathbb{R}$ and introduces
a spurious magnitude (and ordering) among the categories.
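As a minimal sketch of the encoding above (the function name `one_hot` is my own, not from the course material):

```python
import numpy as np

def one_hot(i, n):
    """Encode the i-th of n discrete values as an n-dimensional binary vector."""
    v = np.zeros(n)
    v[i] = 1.0
    return v

one_hot(2, 4)  # array([0., 0., 1., 0.])
```

Each category gets its own axis, so no category is "larger" than another, which is exactly the point of avoiding a $\mathbb{Z}_N$ encoding.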

# Regularization in Ridge regression

Ridge regression shares the idea of weight decay in machine learning,
but their starting points differ.

## Include penalty on the norm of weights $w$ to avoid instability

When we introduce the regularization term, the least-squares solution changes to
$w = (X^\top X + \lambda I)^{-1} X^\top y$,
which is stable even when $X^\top X$ is (nearly) singular.

I originally thought that a small enough $\lambda$
should both ensure the stability of the solution
and approximate the original solution.
But in fact the $\lambda$ here can be plugged back into the original
loss function, which becomes

$$L(w) = \|y - Xw\|^2 + \lambda \|w\|^2.$$

The latter term is simply the weight decay penalty in machine learning.
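The closed-form ridge solution can be checked numerically. This is a sketch with made-up data; the value of $\lambda$ (`lam`) and the true weights are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=50)

lam = 0.1  # regularization strength (hypothetical value)
d = X.shape[1]

# Ridge solution: minimizer of ||y - Xw||^2 + lam * ||w||^2
w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```

For small $\lambda$ the solution stays close to the ordinary least-squares one, while the $\lambda I$ term keeps the matrix being inverted well-conditioned.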

Written on April 17th, 2017 by Hanezu.
Feel free to share!