Scientific research in the biological domain generates massive amounts of data of many different kinds. To investigate a hypothesis, researchers run large numbers of experiments that use data from human and animal subjects and produce outputs in multiple modalities, ranging from simple text to signals, images, and 3D volumes such as CT and MRI scans. Despite the scale and complexity of these data, many researchers at the forefront of the biological sciences still store their multimedia data using antiquated methods. Data are often kept in multiple locations, including computers, notebooks, and file drawers.

The goal of this research is to develop a unified methodology for organizing and retrieving biological data from scientific experiments. Our work builds on prior research in experiment management, approximate queries, and content-based image retrieval. We are developing a query framework for multimedia data that gives users a single, consistent way to access multiple types of data. Queries will be able to handle both single data types and multiple related data types, such as registered CT and MRI scans, or neuronal firing patterns together with related fMRI data. The data will be organized in a way that is both easy for users to understand and efficient for query access. A prototype system will be built and evaluated on three different applications: a study of language sites in the human brain, an analysis of the relationship between cataract formation and genetic factors, and a study of craniofacial disorders in children.
(Faculty: Shapiro)