Roxana Geambasu
Graduate Student
Computer Science and Engineering
University of Washington
Office: 618 Paul Allen Center
Contact: my-first-name at cs dot washington dot edu.
Current Research
I am interested in systems research, especially distributed systems, storage systems, and databases. I am currently working on several research projects with three great professors at UW:
Steve Gribble, Hank Levy, and Yoshi Kohno.
Personal data organization and sharing
The sheer volume of personal data, coupled with its dispersal across a myriad of home devices and Web services, makes organizing and sharing that data extremely challenging for users. I am designing an architecture that will enable the integration of dispersed personal data and facilitate the building of new applications for distributed personal data management. I am working on two projects related to this problem:
- In Menagerie (with Steve and Hank), we are looking at the challenges posed by personal data dispersion across the Web. The radical shift from the PC desktop to Web-based services is
scattering personal data across a myriad of Web sites, such as Yahoo! Mail, Gmail, Google Docs, Flickr, YouTube, MySpace, and Amazon S3. This dispersal poses
significant new challenges for users, making it more difficult for
them to: (1) organize, search, and archive their data, much of which
is now hosted by Web sites; (2) create heterogeneous
(multi-Web-service) object collections and share them in a protected
way; and (3) manipulate Web objects with standard applications or
build new tools or scripts that operate on those objects.
Menagerie is a software framework that addresses
these challenges. Menagerie creates an integrated file and object
system from heterogeneous, personal Web-service objects dispersed
across the Internet. Our Menagerie architecture has two key parts.
The Menagerie Service Interface (MSI) defines a common Web-service API
for object naming, protection, and access. The Menagerie File System
(MFS) lets desktop applications and Web services manipulate remote Web
objects as if they were local files. Our experience shows that
Menagerie greatly simplifies the construction of new applications that
support collections of heterogeneous Web objects and fine-grained
protected sharing of those objects. We describe the Menagerie
architecture and implementation, present several novel applications we
developed on Menagerie, and provide measurements that show the
practicality of our approach.
A paper presenting the Menagerie framework will appear in WWW 2008.
- In the HomeViews project with Magda Balazinska, Steve, and Hank, we looked at how to make it easier for users to organize and share the large amounts of personal data stored on their machines. HomeViews is a middleware layer that facilitates the building of new applications for personal data management and P2P sharing. It allows users and applications to organize files into dynamic collections, share these collections with other users over the Internet, and integrate remote collections into local ones. The main technical innovation is the integration of database-style views with a capability-based protection model from operating systems.
A paper describing the HomeViews system appeared in SIGMOD 2007, and I also gave a few talks on this topic (see publications).
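As a rough illustration of how views and capabilities can combine (a sketch only, not the HomeViews implementation, query language, or API), the Python below models a view as a predicate over file records and a capability as an unguessable token that names one view. Every name here (FileRecord, ViewStore, create_view, query, revoke) is hypothetical and chosen for the example.

```python
import secrets
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical metadata for one file."""
    path: str
    kind: str
    year: int


class ViewStore:
    """Toy store: each capability token names one dynamic view."""

    def __init__(self, files: Iterable[FileRecord]) -> None:
        self._files: List[FileRecord] = list(files)
        self._views: Dict[str, Callable[[FileRecord], bool]] = {}

    def add_file(self, record: FileRecord) -> None:
        self._files.append(record)

    def create_view(self, predicate: Callable[[FileRecord], bool]) -> str:
        # The token is the capability: holding it is the only way to
        # reach the view, so sharing means handing the token to someone.
        token = secrets.token_urlsafe(32)
        self._views[token] = predicate
        return token

    def query(self, token: str) -> List[str]:
        predicate = self._views.get(token)
        if predicate is None:
            raise PermissionError("unknown or revoked capability")
        # The view is evaluated at query time, so the collection is dynamic.
        return [f.path for f in self._files if predicate(f)]

    def revoke(self, token: str) -> None:
        self._views.pop(token, None)
```

Because the predicate is evaluated on every query, a file added after the view was shared appears in the collection automatically, and revoking the token cuts off access without touching the files themselves.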
Distributed systems and databases
- During my summer internship with Microsoft Research (Silicon Valley) in 2007, I worked on a project ("Fault-tolerant System Specification"), where we created and analyzed formal specifications for several fault-tolerant file systems. Our goal was to explore the extent to which formal methods could help in fault-tolerant file system analysis, design, and comparison. For this, we created formal TLA+ specifications for three systems: Blue (the storage system used by Hotmail), GFS (the Google File System, as published in SOSP), and Chain Replication. Overall, we found specifications to be relatively easy to produce, useful for a deep understanding of system functioning, and valuable for system comparison. We used specifications for three purposes: (1) to crystallize design differences and similarities; (2) to understand and mechanically verify consistency properties; and (3) to experiment with alternative designs.
A paper describing our experience with formal specs will appear in DSN 2008.
- In the FlowDB project with Tanya Bragin (UW), Magda Balazinska (UW), and Jaeyeon Jung (Intel), we examined how relational databases can be used to store and query the rapid flows produced by one class of network monitoring applications: network intrusion detection systems (NIDS). Through careful benchmarking of DBMS query and insert performance, we determined how and when commercial databases can be used in conjunction with NIDS, and we proposed techniques to stretch RDBMS support for high input rates and efficient querying. One of the techniques we developed was on-demand view materialization and indexing (OVMI).
A short paper presenting our results and technique appeared at NetDB 2007, a workshop co-located with NSDI 2007.
- I also worked on a study of virtual machine performance over network file systems, with John P. John and Brian Bershad. Our goal was to establish whether virtual machines can be run over network file systems (specifically NFS and AFS). A report presents our findings.
Publications, Talks, Demos, and Posters
- Roxana Geambasu, Cherie Cheung, Alexander Moshchuk, Steven D. Gribble, and Henry M. Levy. The Organization and Sharing of Web-Service Objects with Menagerie. To appear in Proceedings of the 17th International World Wide Web Conference (WWW 2008), Beijing, China, April 2008.
Paper: [PDF], [HTML]; [BIBTEX]; slides: [PPT], [PDF].
- Roxana Geambasu, Andrew Birrell, and John MacCormick. Experiences with Formal Specification of Fault-Tolerant Storage Systems. To appear in Proceedings of the 38th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN-DCCS 2008), Anchorage, Alaska, June 2008.
Paper: [PDF].
- Roxana Geambasu, Magdalena Balazinska, Steven D. Gribble, and Henry M. Levy. HomeViews: Peer-to-Peer Middleware for Personal Data Sharing Applications. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, pages 235--246, Beijing, China, June 2007.
Paper: [PDF],[PS],[HTML]; slides: [PPT]; [BIBTEX].
- Roxana Geambasu, Tanya Bragin, Jaeyeon Jung, and Magdalena Balazinska. On-Demand View Materialization and Indexing for Network Forensic Analysis. In Proceedings of the Third International Workshop on Networking Meets Databases (NetDB), Boston, MA, April 2007.
Paper: [PDF], [HTML]; slides: [PPT]; [BIBTEX].
- "Specification and comparison of fault-tolerant storage systems." Microsoft Research talk, 2007.
Slides: [PPT].
- "A Web of personal files." Talk, poster, and demo at UW CSE Annual Industrial Affiliates, Oct. 2006.
Slides: [PPT]; Poster: [PDF] (note that HomeViews' name was SharedViews back then :)).
- "Capability access control for P2P data sharing." UW Qualifying paper, June 2006.
Report: [PDF]; slides: [PDF],[PPT].
- "Study of virtual machine performance over network file systems." Technical report, June 2006.
Report: [PDF].
Ph.D. Progress
Quals
I passed my Quals as of spring 2007. On Aug 30, 2006, I gave my Quals talk on the HomeViews system (the system was called SharedViews back then). See publications for the report and talk.
Graduate Coursework