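To understand, consider the simpler case of a linear unit, where
$$o = w_0 + w_1 x_1 + \cdots + w_n x_n$$
Let's learn the weights $w_i$ that minimize the squared error
$$E[\vec{w}] \equiv \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$$
where $D$ is the set of training examples, $t_d$ is the target output for example $d$, and $o_d$ is the linear unit's output for that example.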
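
As a rough illustration, the sketch below fits a linear unit by minimizing the squared error over a small training set. It assumes the usual definitions: output o = w0 + w1*x1 + ... + wn*xn, and error E = 1/2 * sum over training examples of (t - o)^2, with t the target value. The minimizer is plain batch gradient descent, which is an assumption here, as are the function names, the learning rate `eta`, the epoch count, and the toy data.

```python
def linear_output(w, x):
    """Linear unit: o = w[0] + w[1]*x[0] + ... + w[n]*x[n-1]."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def squared_error(w, examples):
    """E(w) = 1/2 * sum over (x, t) in examples of (t - o)^2."""
    return 0.5 * sum((t - linear_output(w, x)) ** 2 for x, t in examples)


def gradient_descent(examples, n_inputs, eta=0.05, epochs=2000):
    """Batch gradient descent on E(w); eta and epochs are illustrative."""
    w = [0.0] * (n_inputs + 1)  # w[0] is the bias weight
    for _ in range(epochs):
        # dE/dw_i = -sum (t - o) * x_i, with x_0 = 1 for the bias
        grad = [0.0] * len(w)
        for x, t in examples:
            err = t - linear_output(w, x)
            grad[0] -= err
            for i, xi in enumerate(x):
                grad[i + 1] -= err * xi
        # Step against the gradient
        w = [wi - eta * gi for wi, gi in zip(w, grad)]
    return w


# Toy training set generated from t = 1 + 2*x (hypothetical data)
examples = [((0.0,), 1.0), ((1.0,), 3.0), ((2.0,), 5.0), ((3.0,), 7.0)]
w = gradient_descent(examples, n_inputs=1)
print(w, squared_error(w, examples))
```

Because the toy targets are exactly linear in the input, the learned weights should approach w0 = 1 and w1 = 2, and the error should approach zero. A learning rate that is too large for the data would make the updates diverge, so `eta` generally needs tuning.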