
Gradient Descent

To understand gradient descent, consider a simpler linear unit, whose output is


\begin{displaymath}o = w_{0} + w_{1}x_1 + \cdots + w_n x_n \end{displaymath}
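For instance, a unit with weights $w_{0}=1$, $w_{1}=2$ given input $x_{1}=3$ outputs $o = 1 + 2 \cdot 3 = 7$.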

Let's learn weights $w_{i}$ that minimize the squared error


\begin{displaymath}E[\vec{w}] \equiv \frac{1}{2}\sum_{d \in D}(t_{d} - o_{d})^{2} \end{displaymath}

where $D$ is the set of training examples, $t_{d}$ is the target output for training example $d$, and $o_{d}$ is the output of the linear unit for that example.
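
Differentiating $E$ with respect to each weight (with the convention $x_{0,d} = 1$) gives the gradient components


\begin{displaymath}\frac{\partial E}{\partial w_{i}} = \sum_{d \in D}(t_{d} - o_{d})(-x_{i,d}) \end{displaymath}

so gradient descent repeatedly updates $w_{i} \leftarrow w_{i} + \eta \sum_{d \in D}(t_{d} - o_{d})x_{i,d}$ for some small learning rate $\eta$. As a concrete illustration, not part of the original notes, here is a minimal batch gradient descent sketch in Python/NumPy; the learning rate and epoch count are illustrative choices.

\begin{verbatim}
import numpy as np

def gradient_descent(X, t, eta=0.01, epochs=500):
    """Batch gradient descent for the linear unit
    o = w0 + w1*x1 + ... + wn*xn.

    X: (m, n) array, one training example per row; t: (m,) targets.
    eta and epochs are illustrative hyperparameters.
    """
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])  # prepend x0 = 1 so w0 is the bias
    w = np.zeros(n + 1)                   # start from zero weights
    for _ in range(epochs):
        o = Xb @ w                        # unit outputs o_d for every example d
        w += eta * Xb.T @ (t - o)         # w_i += eta * sum_d (t_d - o_d) x_{i,d}
    return w

# Usage: recover weights (w0, w1, w2) = (1, 2, -3) from noiseless data.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
t = 1 + 2 * X[:, 0] - 3 * X[:, 1]
print(gradient_descent(X, t))             # approximately [ 1.  2. -3.]
\end{verbatim}

Because $E$ is quadratic in the weights, the error surface has a single global minimum, so batch gradient descent with a sufficiently small $\eta$ converges to it.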


