Title: Supervising Music Transcription
Advisors: Sham Kakade and Zaid Harchaoui (Stat)
Abstract: Music transcription can be viewed as a multi-label classification problem, in which we identify the notes present in an audio recording at time t based on a two-sided contextual window of audio surrounding t. We introduce a large-scale dataset, MusicNet, consisting of music recordings and labels suitable for supervising transcription and other learning tasks. We will discuss the construction of the dataset, including optimal-alignment protocols and the value of side-information. We will then turn our attention to network architectures and data augmentations that lead to state-of-the-art performance for music transcription. Along the way, we will consider several scientific questions: What are the low-level features of musical audio? What are the invariances of these recordings?
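
To make the multi-label framing concrete, here is a minimal sketch of how a single training example might be formed: a two-sided window of audio centered on time t, paired with a binary vector marking which notes sound at t. The function names, the zero-padding at the recording boundaries, the 128-note (MIDI-style) label range, and the (start, end, note) interval format are all assumptions made for illustration; they are not taken from the MusicNet release or from the talk.

```python
import numpy as np

def window_at(audio, t, half_width):
    """Return the 2*half_width samples surrounding sample index t.

    Assumption: positions outside the recording are filled with zeros.
    """
    padded = np.pad(audio, half_width)
    # Index t in `audio` sits at t + half_width in `padded`, so this slice
    # covers original indices t - half_width .. t + half_width - 1.
    return padded[t : t + 2 * half_width]

def labels_at(intervals, t, num_notes=128):
    """Return a binary vector with a 1 for every note sounding at index t.

    Assumption: `intervals` is a list of (start, end, note) tuples with
    half-open sample ranges [start, end) and integer note ids.
    """
    y = np.zeros(num_notes, dtype=np.float32)
    for start, end, note in intervals:
        if start <= t < end:
            y[note] = 1.0
    return y

# Hypothetical toy data: a short ramp signal and two overlapping notes.
audio = np.arange(10, dtype=np.float32)
intervals = [(0, 100, 60), (50, 150, 64)]

x = window_at(audio, 5, half_width=2)  # context around t = 5
y = labels_at(intervals, 75)           # notes 60 and 64 both sound at t = 75
```

Because several notes can sound at once, the target is a vector of independent binary labels rather than a single class, which is what distinguishes the multi-label setting from ordinary classification.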