ch4


More on Backpropagation

Gradient descent over entire network weight vector

Easily generalized to arbitrary directed graphs

Will find a local, not necessarily global, error minimum

In practice, often works well (can run multiple times from different initial weights)
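The local-minimum point can be illustrated on a toy one-dimensional error surface. The sketch below is not a neural network: the function `error`, its gradient, the step size, and the starting points are all assumptions chosen so that the surface has one shallow and one deep minimum. It only shows that plain gradient descent ends in whichever minimum its starting point leads to, and that running it several times and keeping the best result can help.

```python
def error(w):
    # Toy error surface (assumed for illustration): a shallow minimum
    # near w = 1.13 and a deeper one near w = -1.30.
    return w**4 - 3 * w**2 + w


def error_gradient(w):
    # Derivative of error(w) with respect to w.
    return 4 * w**3 - 6 * w + 1


def gradient_descent(w, eta=0.01, steps=2000):
    # Repeatedly step against the gradient from the starting weight w.
    for _ in range(steps):
        w -= eta * error_gradient(w)
    return w


def best_of_restarts(starts):
    # Run gradient descent once per starting weight; keep the lowest error.
    finals = [gradient_descent(w0) for w0 in starts]
    return min(finals, key=error)


if __name__ == "__main__":
    for w0 in (-1.5, 1.5):
        w = gradient_descent(w0)
        print(f"start {w0:+.1f} -> w = {w:+.3f}, error = {error(w):.3f}")
    print("best of restarts:", round(best_of_restarts([-1.5, 0.5, 1.5]), 3))
```

In a real network the starting points would be different random initial weight vectors rather than hand-picked scalars.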

Often include weight momentum $\alpha$

$$\Delta w_{i,j}(n) = \eta \delta_{j} x_{i,j} + \alpha \Delta w_{i,j}(n-1)$$
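A minimal sketch of the momentum rule above, assuming the weights are held in a flat list and that the caller has already computed the product $\delta_{j} x_{i,j}$ for each weight. The function name `momentum_update`, the argument names, and the default values of `eta` and `alpha` are hypothetical.

```python
def momentum_update(weights, gradient_terms, prev_deltas, eta=0.1, alpha=0.5):
    """Apply one momentum step to every weight.

    gradient_terms[k] stands for delta_j * x_{i,j} of weight k, and
    prev_deltas[k] is that weight's update from iteration n-1.
    """
    # Delta w(n) = eta * delta_j * x_{i,j} + alpha * Delta w(n-1)
    deltas = [eta * g + alpha * d for g, d in zip(gradient_terms, prev_deltas)]
    new_weights = [w + d for w, d in zip(weights, deltas)]
    return new_weights, deltas


if __name__ == "__main__":
    weights, deltas = [0.0], [0.0]
    # Feed the same gradient term at every iteration to watch the step grow.
    for n in range(1, 6):
        weights, deltas = momentum_update(weights, [1.0], deltas)
        print(f"n={n}: delta_w = {deltas[0]:.4f}, w = {weights[0]:.4f}")
```

When the gradient term stays constant, the step size grows from $\eta$ toward $\eta / (1 - \alpha)$, which is how momentum speeds up travel across flat regions of the error surface.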

Minimizes error over training examples

Will it generalize well to subsequent examples?

Training can take thousands of iterations $\rightarrow$ slow!

Using network after training is very fast

Don Patterson 2001-12-13