
Week 3 task of Stanford CS224n: Natural Language Processing with Deep Learning

Lecture

Named Entity Recognition (NER)

Named Entity Recognition identifies and classifies named entities
into predefined entity categories such as person names, organizations…

For example,

Harry Kane missed his penalty at the World Cup 2022.

“Harry Kane” - (Person Name)
“World Cup 2022” - (Event)
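
As a quick illustration (not from the lecture), an off-the-shelf tagger can produce such labels automatically. The sketch below uses spaCy and assumes its small English model `en_core_web_sm` has been installed; the exact spans and labels it predicts depend on the model.

```python
# Minimal NER sketch using spaCy (not part of the lecture; assumes the
# en_core_web_sm model has been downloaded separately).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Harry Kane missed his penalty at the World Cup 2022.")

for ent in doc.ents:
    # Print each detected entity span with its predicted label,
    # e.g. PERSON for "Harry Kane"; exact output depends on the model.
    print(ent.text, ent.label_)
```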

Binary Word Window Classification

Binary Word Window Classification classifies the center word of a given context window,
one class at a time, based on the words surrounding it.

The classification is binary because it assigns the text a {yes/no} label
according to the {presence/absence} of the target word.

For example,

“Heungmin Son scored a Hat-trick last week.” (target word -> “Hat-trick”)

The classifier checks for the presence of the target word “Hat-trick” in the sentence:
label $1$ if “Hat-trick” is present, and label $0$ otherwise.
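
A minimal sketch of such a window classifier follows, assuming toy random embeddings, a window of five words centered on “Hat-trick”, and the same $W$, $b$, $u$ names used in the matrix-calculus section below; the sizes are arbitrary.

```python
# Toy binary window classifier sketch (embeddings and weights are random
# placeholders; only the shapes and the forward pass matter here).
import numpy as np

rng = np.random.default_rng(0)

d = 4                                                    # embedding dimension (assumed)
window = ["scored", "a", "Hat-trick", "last", "week"]    # center word: "Hat-trick"
emb = {w: rng.standard_normal(d) for w in window}

x = np.concatenate([emb[w] for w in window])             # concatenated window vector in R^{5d}
W = rng.standard_normal((8, x.size))                     # hidden-layer weights
b = rng.standard_normal(8)                               # hidden-layer bias
u = rng.standard_normal(8)                               # output weights

z = W @ x + b                                            # z = Wx + b
h = np.tanh(z)                                           # h = f(z)
s = u @ h                                                # scalar score s = u^T h
p = 1.0 / (1.0 + np.exp(-s))                             # P(label = 1 | window)

print(f"score = {s:.3f}, P(yes) = {p:.3f}")
```

The scalar score is pushed through a sigmoid so the output is a yes/no probability for the center word.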

Matrix Calculus

Why calculate gradients using matrix calculus?

  1. Much faster computation than non-vectorized, element-by-element gradients (see the timing sketch below)
  2. An effective way to express many similar, repeated operations as a single matrix operation
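
As a rough illustration of the first point (my own example, not from the lecture), the snippet below compares a column-by-column loop with a single vectorized matrix multiply; the sizes are arbitrary and timings will vary by machine.

```python
# Rough comparison of a looped versus vectorized computation of W x
# over many inputs (arbitrary sizes; timings vary by machine).
import time
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((500, 500))
X = rng.standard_normal((500, 1000))          # 1000 input vectors as columns

t0 = time.perf_counter()
out_loop = np.stack([W @ X[:, i] for i in range(X.shape[1])], axis=1)
t1 = time.perf_counter()
out_vec = W @ X                               # one matrix-matrix multiply
t2 = time.perf_counter()

assert np.allclose(out_loop, out_vec)         # same result, very different cost
print(f"loop: {t1 - t0:.4f}s, vectorized: {t2 - t1:.4f}s")
```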

The Jacobian matrix is an $m \times n$ matrix of partial derivatives, with entry $(i, j)$ equal to $\frac{\partial f_i}{\partial x_j}$.

$f : \mathbb{R}^n \to \mathbb{R}^m$, where $n$ is the number of inputs and $m$ is the number of outputs
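
A small numerical sketch of this definition (the toy function is my own choice, not from the lecture): the Jacobian of $f(x) = (x_0 x_1, \sin x_0)$ is approximated column by column with central differences and compared against the analytic answer.

```python
# Numerical Jacobian sketch for a toy f : R^2 -> R^2 (function chosen
# only for illustration).
import numpy as np

def f(x):
    return np.array([x[0] * x[1], np.sin(x[0])])

def jacobian(func, x, eps=1e-6):
    m, n = func(x).size, x.size
    J = np.zeros((m, n))                      # m outputs (rows) x n inputs (columns)
    for j in range(n):
        dx = np.zeros(n)
        dx[j] = eps
        J[:, j] = (func(x + dx) - func(x - dx)) / (2 * eps)   # central difference
    return J

x = np.array([1.0, 2.0])
print(jacobian(f, x))
# Analytic Jacobian at (1, 2): [[x1, x0], [cos(x0), 0]] = [[2, 1], [0.5403, 0]]
```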

Procedures

$x$ = input
$z = Wx + b$
$h = f(z)$
$s = u^T h$

Input Layer

$\frac{\partial z}{\partial x} = W$

Hidden Layer

$\frac{\partial h}{\partial z} = diag(f'(z))$

Output Layer

$\frac{\partial s}{\partial h} = u^T$

Jacobian Matrix

$\frac{\partial s}{\partial u} = h^T$

$\frac{\partial s}{\partial W} = \frac{\partial s}{\partial h}\frac{\partial h}{\partial z}\frac{\partial z}{\partial W}$

$\frac{\partial s}{\partial b} = \frac{\partial s}{\partial h}\frac{\partial h}{\partial z}\frac{\partial z}{\partial b}$

$\frac{\partial s}{\partial h}\frac{\partial h}{\partial z} = \delta$

$\frac{\partial s}{\partial b} = u^T diag(f'(z)) I = \delta$

$\frac{\partial s}{\partial W} = \delta^Tx^T$
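
To sanity-check these analytic results, the sketch below compares $\delta$, $\frac{\partial s}{\partial W}$, and $\frac{\partial s}{\partial b}$ against finite differences for $s = u^T \tanh(Wx + b)$; the sizes and the choice of $f = \tanh$ are assumptions made for illustration.

```python
# Gradient-check sketch for s = u^T f(Wx + b) with f = tanh
# (toy sizes; follows the delta = u^T diag(f'(z)) convention above).
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 4                                  # input and hidden sizes (assumed)
x = rng.standard_normal(n)
W = rng.standard_normal((m, n))
b = rng.standard_normal(m)
u = rng.standard_normal(m)

def score(W, b):
    return u @ np.tanh(W @ x + b)            # s = u^T h,  h = f(z),  z = Wx + b

# Analytic gradients from the derivation above
z = W @ x + b
delta = u * (1 - np.tanh(z) ** 2)            # delta = u^T diag(f'(z)); tanh' = 1 - tanh^2
grad_b = delta                               # ds/db = delta
grad_W = np.outer(delta, x)                  # ds/dW = delta^T x^T (outer product)

# Finite-difference check for one entry of W and one entry of b
eps = 1e-6
dW = np.zeros_like(W); dW[1, 2] = eps
num_W = (score(W + dW, b) - score(W - dW, b)) / (2 * eps)
db = np.zeros_like(b); db[0] = eps
num_b = (score(W, b + db) - score(W, b - db)) / (2 * eps)

print(np.isclose(grad_W[1, 2], num_W), np.isclose(grad_b[0], num_b))   # True True
```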

Back Propagation

Backpropagation reuses intermediate values from the forward pass and gradients already
computed for higher layers to efficiently compute the gradient of the loss with respect
to every weight, so the weights can be updated in the direction that reduces the loss.

Backpropagation steps

  1. Feed the input $x$ forward through the network to produce the output $\hat{y}$
  2. Calculate the difference (loss) between the output $\hat{y}$ and the target $y$
  3. Backpropagate the derivative of the loss function with respect to $\hat{y}$
  4. Backpropagate the derivative of $\hat{y}$ with respect to the hidden layer
  5. Multiply the gradients from steps 3 and 4 (chain rule)
  6. Update the weights in the negative gradient direction (a minimal sketch follows below)
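
Putting the steps together, here is a minimal sketch of the full loop for the one-hidden-layer scorer used above; the squared-error loss, the target value, and the learning rate are assumptions made purely for illustration.

```python
# Minimal forward/backward/update loop for s = u^T tanh(Wx + b)
# (squared-error loss, target, and learning rate are assumed for illustration).
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 4
x = rng.standard_normal(n)
y = 1.0                                      # target value (assumed)
W = rng.standard_normal((m, n))
b = np.zeros(m)
u = rng.standard_normal(m)
lr = 0.1                                     # learning rate (assumed)

for step in range(50):
    # 1. Feed forward
    z = W @ x + b
    h = np.tanh(z)
    y_hat = u @ h
    # 2. Loss between y_hat and target y
    loss = 0.5 * (y_hat - y) ** 2
    # 3. Derivative of the loss with respect to y_hat
    dL_dyhat = y_hat - y
    # 4.-5. Backpropagate through the output and hidden layers (chain rule)
    delta = dL_dyhat * u * (1 - h ** 2)      # dL/dz
    grad_u = dL_dyhat * h
    grad_W = np.outer(delta, x)
    grad_b = delta
    # 6. Update weights in the negative gradient direction
    u -= lr * grad_u
    W -= lr * grad_W
    b -= lr * grad_b

print(f"final loss: {loss:.6f}")
```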
