News
Students
I am actively seeking students to collaborate with on the projects below. We have several projects underway involving cross-disciplinary teams of researchers. If you are a CSE student with an interest in advancing oceanography, biology, physics, or astronomy, or a domain science student with an interest in data-intensive scalable computing, databases, visualization, or algorithms, I would be happy to meet with you to discuss possible collaborations. Send me an email!

To new PhD students: Note that the first step in admissions to UW CSE is handled centrally by the department; an email to me does not influence the process.

Research
The bottleneck to scientific discovery is no longer data acquisition, but data analysis.

This trend can be attributed to advances in data acquisition technology: high-throughput lab techniques, remote sensing platforms, and high-resolution computational modeling. While the technology and resources necessary to collect or generate such data en masse are becoming widely available, the technology to manage and analyze the data has not kept pace. Traditionally, each data acquisition activity was coupled to a specific hypothesis, but now researchers collect data en masse (they "download the world"), exchanging the problem of how to extract knowledge from the environment for one of how to extract knowledge from a database. Data analysis, and not experimental data acquisition, is the new bottleneck to discovery.
Research Topics
Management of very large or very complex science data. Data-intensive scalable computing, scientific databases, visualization, mashups, integration of ad hoc science data.

Current Projects

Horizon: Visual Data Analytics in the Cloud
I am the lead PI on two NSF grants exploring the question of how cloud computing can support interactive, visual, exploratory science. Through an NSF Cluster Exploratory grant, and in partnership with visualization experts at the University of Utah, we are exploring the use of MapReduce as a common framework for both scalable data processing and scalable visualization. Through an NSF EAGER grant, I am developing a new visualization algebra for use with the Microsoft Azure platform. The core goal of both projects is to allow scientists to analyze terabytes of data in the cloud as efficiently, conveniently, and deeply as they can analyze megabytes of data on their laptops. This work led to the HaLoop system.
SQLShare: Database-as-a-Service for Long Tail Science
Our approach is to provide a basic system for querying data in the cloud (using Microsoft SQL Azure), then explore a set of smart services to streamline and automate analysis. Specifically: 1) queries are saved as views and can be shared with others for collaborative analysis, 2) we derive automatic starter queries directly from the data to bootstrap analysis, 3) we derive dashboards ("mashups") directly from the data to automate visual analysis, 4) we are working to translate English fragments into SQL fragments to assist SQL novices, and 5) we are using previous work here at UW on SQL Autocomplete features.

Our motivation is long tail science. In contrast to "big science" projects such as the Large Synoptic Survey Telescope and the Large Hadron Collider, the challenge faced in the long tail of science is not only data volume but also data complexity. Projects in oceanography or the life sciences may involve cleaning and integrating data from hundreds of heterogeneous data sources. Although sheer scale is not typically the defining feature of these data sources, the volumes involved are not insignificant: in the life sciences, for example, a modern short-read sequencer can generate a terabyte per day. At the University of Washington, approximately ten of these sequencers are in use on campus, and 20 more are scheduled to be purchased in the next few years. Low-cost, high-throughput mass spectrometry, microarray, and flow cytometry are similarly poised to produce exponential growth in data volumes in the next few years.

This project is supported by a Moore Foundation Grant and a 2010 Jim Gray Seed Grant from Microsoft Research.

Parallel Datalog on New Computing Platforms
Building on our work on HaLoop, we are developing a Datalog interface to massively parallel platforms including HaLoop/Hadoop, the Cray XMT, and Microsoft's Daytona Platform on the Azure cloud.
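
As a rough illustration of the kind of recursive query a Datalog interface is meant to express, the sketch below evaluates the textbook reachability program in plain Python using semi-naive iteration. It is a single-machine toy under stated assumptions, not the project's implementation: the function name `transitive_closure` and the in-memory set representation are hypothetical and do not come from HaLoop, Hadoop, the XMT, or Daytona.

```python
# Illustrative sketch only: semi-naive evaluation of the Datalog program
#   reach(X, Y) :- edge(X, Y).
#   reach(X, Y) :- reach(X, Z), edge(Z, Y).
# The function name and data layout are hypothetical, chosen for this example.

def transitive_closure(edges):
    """Return every (x, y) pair such that y is reachable from x."""
    # Index edges by source node so the join on Z is a dictionary lookup.
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, set()).add(dst)

    reach = set(edges)   # all facts derived so far (base rule)
    delta = set(edges)   # facts that were new in the previous round
    while delta:
        # Recursive rule: join only the newly derived facts against edge.
        derived = {
            (x, y)
            for (x, z) in delta
            for y in successors.get(z, ())
        }
        delta = derived - reach   # keep only facts not seen before
        reach |= delta
    return reach


if __name__ == "__main__":
    example_edges = {("a", "b"), ("b", "c"), ("c", "d")}
    for pair in sorted(transitive_closure(example_edges)):
        print(pair)
```

Each pass of the `while` loop is one round of rule application, and the loop stops when a round derives no new facts. Joining only the previous round's new facts, rather than the full `reach` set, is what makes the evaluation semi-naive.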
The Cray XMT supports massive multi-threading (millions of simultaneous threads accessing shared memory at low latency), eliminating dependence on a deep cache hierarchy for performance. PNNL is exploring the XMT as a platform for a Graph Database. While the XMT has proven capabilities in graph processing, a general-purpose semantic database necessarily involves "conventional" computation in addition to massively thread-parallel computation. A query language to insulate the user from this heterogeneity, transparently splitting a query into conventional and XMT components, does not exist. We are designing a prototype language with this property.

GridFields: Algebraic Manipulation of Unstructured Meshes
The large datasets produced by simulations typically have a grid structure that is not amenable to storage within traditional database systems. We've developed an algebra of GridFields that allows convenient manipulation of grid-structured datasets, much in the way the relational algebra allows convenient manipulation of table-structured data. This work originated in the context of CMOP, the NSF Science and Technology Center for Coastal Margin Observation and Prediction. This work is supported by a subcontract from Woods Hole Oceanographic Institution via the NSF-funded Ocean Observatories Initiative and an NSF EAGER award.

Data Pricing
I am a Co-PI on the Data Pricing project.

SciDB
Teaching
CS599c: Scientific Data Management, Spring 2010, University of Washington, with Magda Balazinska
CS410/510: Scientific Data Management, Summer 2006, Portland State University

Publications
Dissertation
Selected Talks (PowerPoint)
Some of these talks contain macros that require a visualization ActiveX control that you don't have, so you may safely respond with "disable macros" if prompted with a dialog.

All movies will appear as still images by default. If you want the movies to play, download them, unzip them in the same directory as the presentation, and make sure you open the presentation with the correct working directory (i.e., by double-clicking the file rather than by using File->Open).
Bio
Bill Howe is the Director of Research for Scalable Data Analytics at the UW eScience Institute and holds an Affiliate Assistant Professor appointment in Computer Science & Engineering, where he studies the application of scientific databases, cloud computing, and frameworks for scalable data analysis. Howe has received two Jim Gray Seed Grant awards from Microsoft Research for work on managing environmental data, and has had two papers selected to appear in VLDB Journal's "Best of Conference" issue (2004 and 2010) for work in data-intensive computing for science. Howe serves on the program and organizing committees for a number of conferences in the area of scientific data management, and serves on the Science Advisory Board of the SciDB project, a project to build a new database system expressly for science. He holds a Ph.D. in Computer Science from Portland State University, where he studied under Prof. David Maier, and a Bachelor's degree in Industrial & Systems Engineering from Georgia Tech.

Professional Service
PhD, Computer Science, Portland State University, 2006

I've been working with databases since 1995, when I worked for Delta Air Lines as a co-op in their Technical Operations facility. When I graduated from Georgia Tech, I went to work for Deloitte Consulting designing and building enterprise client-server applications, specifically Customer Relationship Management (CRM) systems with Siebel. After Deloitte and before graduate school, I worked as an independent contractor at Microsoft and at other companies ranging from newly deregulated telecommunications carriers to providers of oil field exploration services.