Amazon has many applications whose core is multilabel
classification. This talk will present progress towards a multilabel
learning method that can handle 10^7 training examples, 10^6 features, and
10^5 labels on a single workstation. A sparse linear model is learned for
each label simultaneously by stochastic gradient descent with L2 and L1
regularization. Tractability is achieved through careful use of sparse data
structures, and speed is achieved by using the latest stochastic gradient
methods that do variance reduction. Both theoretically and practically,
these methods achieve order-of-magnitude faster convergence than Adagrad.
We have extended them to handle non-differentiable L1 regularization. We
show experimental results on classifying biomedical articles into 26,853
scientific categories. [Joint work with Galen Andrew, ML intern at Amazon.]
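
As a rough illustration of how non-differentiable L1 regularization can be combined with stochastic gradient updates on sparse data, here is a minimal sketch of one proximal step for a single label. It is not the method from the talk: it uses plain proximal SGD with logistic loss rather than a variance-reduced method, and it regularizes only the features active in the current example to keep the update sparse. The names `soft_threshold`, `prox_sgd_step`, `lr`, `l1`, and `l2` are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w, t):
    """Proximal operator of t * |w|: shrink w toward zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def prox_sgd_step(weights, x, y, lr, l1, l2):
    """One proximal SGD step for one label (assumed logistic loss).

    weights: dict mapping feature index -> weight; zeros are not stored
    x:       dict mapping feature index -> value (sparse example)
    y:       0 or 1, whether the example carries this label
    """
    score = sum(weights.get(j, 0.0) * v for j, v in x.items())
    p = sigmoid(score)
    # Simplification: only features present in x are touched, so the
    # cost of a step scales with the example's nonzeros, not with the
    # total number of features.
    for j, v in x.items():
        w = weights.get(j, 0.0)
        grad = (p - y) * v + l2 * w               # smooth part: loss + L2
        w = soft_threshold(w - lr * grad, lr * l1)  # L1 via proximal map
        if w == 0.0:
            weights.pop(j, None)                  # keep the model sparse
        else:
            weights[j] = w
    return weights
```

In a multilabel setting, one such sparse weight dictionary could be kept per label, with the step applied to every label for each training example. The soft-thresholding step is what drives small weights to exactly zero, which is why the learned models stay sparse.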

Bio: Charles Elkan is the first Amazon Fellow, on leave from being a
professor of computer science at the University of California, San Diego.
In the past, he has been a visiting associate professor at Harvard and a
researcher at MIT. His published research has been mainly in machine
learning, data science, and computational biology. The MEME algorithm that
he developed with Ph.D. students has been used in over 3000 published
research projects in biology and computer science. He is fortunate to have
had inspiring undergraduate and graduate students who now hold leadership
positions, such as vice president at Google.