Break-out Topics and Talks
Wednesday, October 19, 2016
|10:00 - 10:30am||Registration and coffee/breakfast
Gates Commons (691)
|10:30 - 11:10am||Welcome and Overview by Ed Lazowska and Hank Levy + various faculty on research areas
Gates Commons (691)
11:15am - 12:20pm
|Computing for Development
|Vision and Computer Graphics at UW
|1:00 - 1:30pm||Keynote Talk
1:30 - 2:35pm
|Systems Architecture and Networking Session 1
|Programming Languages Session 1
|Innovations in Mobile Systems
2:40 - 3:45pm
|Deep Learning Session 1
|Programming Languages Session 2
3:50 - 4:55pm
|Deep Learning Session 2
|Systems Architecture and Networking Session 2
|Physiological Sensing for Health and Input
|5:00 - 5:30pm||Open House Reception
|5:30 - 5:45pm||CSE2 Overview
|5:45 - 7:00pm||Poster Session and Lab Tours
various labs and locations around the building
|7:15 - 7:45pm||Program: Madrona Prize, People's Choice Awards
- 11:15-11:20: Introduction and Overview, Richard Ladner
- 11:20-11:40: Equitable pedestrian wayfinding, Anat Caspi
As pedestrians, we each experience the built environment differently. Our physical abilities greatly impact our access to the world and resources around us. Equitable pedestrian wayfinding is crucial for a barrier-free city, where people with different abilities can independently access customized, relevant, and up-to-date routing information. Pedestrians present heterogeneous information requirements consisting of static and transient information ranging from elevation changes to curb ramps to transient sidewalk surface conditions. However, such data, including the location of sidewalks, are generally unavailable in a user-consumable format. Moreover, existing routing solutions primarily optimize for distance, offering routes with steep inclines that are unusable by many manual wheelchair users. A data model for equitable pedestrian wayfinding must flexibly support an annotated pedestrian network: a connected graph model that can be visualized and populated with data to parametrize a personalizable cost function. In this talk, we will present a set of capabilities (some existing, some in progress) that enable ability-based pedestrian routing.
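As an illustration of what a personalizable cost function over an annotated pedestrian network might look like, here is a minimal Python sketch. The graph, the edge attributes (`length`, `incline`, `curb_ramp`), and the profile parameters are all hypothetical and are not taken from the talk; the search is plain Dijkstra.

```python
import heapq

# Hypothetical annotated pedestrian network. Each edge carries a length in
# meters, an incline (rise over run), and whether a curb ramp is present.
EDGES = {
    "A": [("B", {"length": 100, "incline": 0.12, "curb_ramp": True}),
          ("C", {"length": 80, "incline": 0.01, "curb_ramp": True})],
    "B": [("D", {"length": 50, "incline": 0.02, "curb_ramp": True})],
    "C": [("D", {"length": 120, "incline": 0.03, "curb_ramp": True})],
    "D": [],
}

def make_cost(max_incline, incline_weight, needs_curb_ramp):
    """Build a per-user cost function; returning None marks an edge unusable."""
    def cost(edge):
        if abs(edge["incline"]) > max_incline:
            return None
        if needs_curb_ramp and not edge["curb_ramp"]:
            return None
        # Distance, inflated by an assumed penalty proportional to incline.
        return edge["length"] * (1 + incline_weight * abs(edge["incline"]))
    return cost

def route(graph, start, goal, cost):
    """Dijkstra's algorithm using the personalized edge cost."""
    queue = [(0.0, start, [start])]
    settled = set()
    while queue:
        total, node, path = heapq.heappop(queue)
        if node == goal:
            return path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, edge in graph[node]:
            step = cost(edge)
            if step is not None and neighbor not in settled:
                heapq.heappush(queue, (total + step, neighbor, path + [neighbor]))
    return None

# A distance-only profile takes the short, steep route.
distance_only = make_cost(max_incline=1.0, incline_weight=0.0, needs_curb_ramp=False)
# A manual-wheelchair profile (assumed 8% incline limit) avoids the steep edge.
wheelchair = make_cost(max_incline=0.08, incline_weight=10.0, needs_curb_ramp=True)

print(route(EDGES, "A", "D", distance_only))  # ['A', 'B', 'D']
print(route(EDGES, "A", "D", wheelchair))     # ['A', 'C', 'D']
```

The same network yields different routes for different users because only the cost function changes, which is the property the abstract asks of the data model.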
The Taskar Center for Accessible Technology (TCAT) at the University of Washington Department of Computer Science & Engineering develops and deploys technologies that increase independence and improve quality of life for individuals with motor and speech impairments.
- 11:40-12:00: Making K-12 CS Education Accessible, Richard Ladner and Lauren Milne
In this talk we give the background on the Computer Science for All Initiative proposed by the White House in January 2016. We review the demographics and needs of students with disabilities in K-12 and present a summary of current accessible tools and curricula for these children. We then present our research on making block languages accessible on touch screens for children in K-5.
- 12:00-12:20: An ASL to English Dictionary that Improves with Use, Richard Ladner
Students learning American Sign Language (ASL) have trouble searching for the meaning of unfamiliar signs. ASL signs can be differentiated by a small set of simple features including hand shape, orientation, location, and movement. In a feature-based ASL-to-English dictionary, users search for a sign by providing a query, which is a set of observed features. Because there is natural variability in the way signs are executed, and observations are error-prone, an approach other than exact matching of features is needed. We propose ASL-Search, an ASL-to-English dictionary that adapts to users' queries to improve search over time.
- 11:15-11:20: Introduction and Overview, Richard Anderson
- 11:20-11:40: Security of Mobile Apps for Financial Services, Fahad Pervaiz
Digital money drives modern economies, and the global adoption of mobile phones has enabled a wide range of digital financial services in the developing world. Where there is money, there must be security, yet prior work on mobile money has identified discouraging vulnerabilities in the current ecosystem. Our work explores how real these problems are through a large-scale analysis of existing financial apps. Furthermore, we conducted a series of interviews with developers and designers in Africa and South America to better understand their security practices.
- 11:40-12:00: Crowdsourcing text transcription in developing countries, Aditya Vashistha
Speech transcription is an expensive service, with high turnaround time for audio files containing languages spoken in developing countries and regional accents of well-represented languages like English. We present Respeak, a voice-based, crowd-powered system that capitalizes on the strengths of crowdsourcing and automatic speech recognition (instead of typing) to transcribe such audio files. We created Respeak and optimized its design through a series of cognitive experiments. We deployed it with 25 college students in India who completed 5464 micro-transcription tasks, transcribing 55 minutes of widely varied audio content and collectively earning USD 46 as mobile airtime. By aligning the transcripts generated by five randomly selected users, the system transcribed audio files with a word error rate (WER) of 10.6%. The cost of speech transcription was 0.83 USD/minute with a turnaround time of 39.8 hours, substantially less than industry standards. Our findings suggest that Respeak improves the quality of speech transcription while enhancing the earning potential of low-income populations in resource-constrained settings.
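For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance (substitutions, insertions, and deletions) between a hypothesis and a reference transcript, divided by the number of reference words. The following is a minimal Python sketch of that standard definition; the function name and the sample sentences are illustrative, and the sketch is not the Respeak evaluation code.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitute = prev[j - 1] + (ref_word != hyp_word)
            delete = prev[j] + 1      # reference word missing from hypothesis
            insert = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1] / len(ref)

# One substitution (fox -> box) and one deletion (jumps) over 5 reference words.
print(word_error_rate("the quick brown fox jumps", "the quick brown box"))  # 0.4
```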
- 12:00-12:20: Community Cellular, Kurtis Heimerl
Despite the narrative of ubiquitous cellular coverage throughout the world, hundreds of millions of people still remain outside the range of traditional networks. The reason for this is primarily economic; incumbents cannot profitably serve the most rural parts of the world. Our solution for this is a new network architecture called a Community Cellular Network (CCN). CCNs are owned and operated by local agents who are able to more efficiently operate and maintain infrastructure in their communities. We describe an example CCN system and our current and future deployments of CCNs.
- 11:15-11:20: Introduction and Overview, Brian Curless
- 11:20-11:40: Dreambit, Ira Kemelmacher-Shlizerman
I'll describe and demo the Dreambit system that lets one imagine how they may look with different hairstyles, hair colors, hats, etc. just from a single photo.
- 11:40-12:00: Realistic Editing of Indoor Spaces, Edward Zhang
Many compelling virtual and augmented reality experiences involve photorealistic rerendering of real-world scenes. However, the most exciting VR/AR applications also involve changing the scene (adding virtual objects, removing real objects or changing the properties of existing objects). While existing capture systems can give highly photorealistic results, they cannot support these meaningful scene edits. We present a system that takes as input an RGBD video sequence of an indoor scene, and creates a complete description of an "empty room" scene model. This scene model enables realistic scene edits, allowing us to not only render the room devoid of all clutter, but also to add furniture, change material properties, and even to relight the scene, all with realistic global illumination effects.
- 12:00-12:20: Making Object Detection Useful, Joseph Redmon
Object detection algorithms are increasingly accurate but also increasingly computationally expensive. We focus on making object detection useful by looking at two main questions: "how can we make detection more efficient by design?" and "how can we take existing algorithms and scale them down even further?". Two years ago, object detection took 20 seconds per image on a 2,500 core GPU. Now, we can do it in 30 milliseconds on a desktop CPU. This opens up a range of applications for object detection on mobile devices and low-power, embedded systems.
- 1:30-1:35: Introduction and Overview, Xi Wang
- 1:35-1:55: Diamond: Automating Data Management and Storage for Wide-area, Reactive Applications, Irene Zhang
Users of today's popular wide-area apps (e.g., Twitter, Google Docs, and Words with Friends) no longer save and reload when updating shared data; instead, these applications are reactive, providing the illusion of continuous synchronization across mobile devices and the cloud. Achieving this illusion poses a complex distributed data management problem for programmers. This talk presents the first reactive data management service, called Diamond, which provides persistent cloud storage, reliable synchronization between storage and mobile devices, and automated execution of application code in response to shared data updates. We demonstrate that Diamond greatly simplifies the design of reactive applications, strengthens distributed data sharing guarantees, and supports automated reactivity with low performance overhead.
- 1:55-2:15: Just Say NO to Paxos Overhead: Replacing Consensus with Network Ordering, Jialin Li
Distributed applications use replication, implemented by protocols like Paxos, to ensure data availability and transparently mask server failures. This paper presents a new approach to achieving replication in the data center without the performance cost of traditional methods. Our work carefully divides replication responsibility between the network and protocol layers. The network orders requests but does not ensure reliable delivery -- using a new primitive we call ordered unreliable multicast (OUM). Implementing this primitive can be achieved with near-zero-cost in the data center. Our new replication protocol, Network-Ordered Paxos (NOPaxos), exploits network ordering to provide strongly consistent replication without coordination. The resulting system not only outperforms both latency- and throughput-optimized protocols on their respective metrics, but also yields throughput within 2% and latency within 16 us of an unreplicated system -- providing replication without the performance cost.
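The division of labor described above, where the network orders requests but may drop them, can be pictured with a toy Python sketch. The `Sequencer` and `Receiver` classes and the `DROP` log entries are assumptions made for illustration; this is not the NOPaxos protocol or its actual interface.

```python
class Sequencer:
    """Stands in for the network element that stamps each request in order."""
    def __init__(self):
        self.next_seq = 0

    def stamp(self, payload):
        message = (self.next_seq, payload)
        self.next_seq += 1
        return message

class Receiver:
    """Delivers messages in sequence order and records gaps as drops."""
    def __init__(self):
        self.expected = 0
        self.log = []

    def on_message(self, message):
        seq, payload = message
        if seq < self.expected:
            return  # stale or duplicate message; ignore it
        # Any skipped sequence numbers were lost in the network. The receiver
        # learns about the loss without any coordination with its peers.
        for missing in range(self.expected, seq):
            self.log.append(("DROP", missing))
        self.log.append(("DELIVER", payload))
        self.expected = seq + 1

sequencer = Sequencer()
a, b, c = (sequencer.stamp(p) for p in ("a", "b", "c"))

receiver = Receiver()
receiver.on_message(a)
receiver.on_message(c)  # message b was dropped by the (unreliable) network
print(receiver.log)     # [('DELIVER', 'a'), ('DROP', 1), ('DELIVER', 'c')]
```

In this sketch the replication protocol above the receiver would only need to act on the `DROP` entries; in the common case, where nothing is lost, every replica sees the same ordered stream.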
- 2:15-2:35: Push-Button Verification of File Systems via Crash Refinement, Helgi Sigurbjarnarson
The file system is an essential operating system component for persisting data on storage devices. Writing bug-free file systems is non-trivial, as they must correctly implement and maintain complex on-disk data structures even in the presence of system crashes and reorderings of disk operations. This talk presents Yggdrasil, a toolkit for writing file systems with push-button verification: Yggdrasil requires no manual annotations or proofs about the implementation code, and it produces a counterexample if there is a bug. Yggdrasil achieves this automation through a novel definition of file system correctness called crash refinement, which requires the set of possible disk states produced by an implementation (including states produced by crashes) to be a subset of those allowed by the specification. Crash refinement is amenable to fully automated satisfiability modulo theories (SMT) reasoning, and enables developers to implement file systems in a modular way for verification. With Yggdrasil, we have implemented and verified the Yxv6 journaling file system, the Ycp file copy utility, and the Ylog persistent log. Our experience shows that the ease of proof and counterexample-based debugging support make Yggdrasil practical for building reliable storage applications.
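The subset condition behind crash refinement can be illustrated with a small Python sketch that enumerates crash states by brute force. Everything here is hypothetical: the two disk layouts, the function names, and the assumption that writes reach the disk in program order (the abstract notes that real disks may reorder them). Yggdrasil itself discharges the check with an SMT solver rather than by enumeration.

```python
def crash_states(disk, writes):
    """Disk states reachable if a crash occurs after any prefix of the writes."""
    states = [tuple(disk)]
    current = list(disk)
    for index, value in writes:
        current[index] = value
        states.append(tuple(current))
    return states

# Implementation 1: overwrite a two-block value in place. Layout: [a, b].
def in_place_update(disk, new):
    return [(0, new[0]), (1, new[1])]

def in_place_view(disk):
    return (disk[0], disk[1])

# Implementation 2: write a shadow copy, then flip a pointer.
# Layout: [ptr, a0, b0, a1, b1], where ptr selects the active copy.
def shadow_update(disk, new):
    spare = 1 - disk[0]
    base = 1 + 2 * spare
    return [(base, new[0]), (base + 1, new[1]), (0, spare)]

def shadow_view(disk):
    base = 1 + 2 * disk[0]
    return (disk[base], disk[base + 1])

def check(disk, new, update, view):
    """Return a counterexample disk state, or None if the subset check passes."""
    # Specification: the update is atomic, so only old or new may be visible.
    allowed = {view(disk), tuple(new)}
    for state in crash_states(disk, update(disk, new)):
        if view(state) not in allowed:
            return state
    return None

print(check(["x", "y"], ("p", "q"), in_place_update, in_place_view))    # ('p', 'y')
print(check([0, "x", "y", "", ""], ("p", "q"), shadow_update, shadow_view))  # None
```

The in-place version fails with a torn state as its counterexample, while the shadow-copy version passes because the new value only becomes visible at the single pointer write.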
- 1:30-1:35: Introduction and Overview, Ras Bodik
- 1:35-1:55: Data Structure Synthesis, Calvin Loncaric
Many applications require specialized data structures not found in the standard libraries, but implementing new data structures by hand is tedious and error-prone. This paper presents a novel approach for synthesizing efficient implementations of complex collection data structures from high-level specifications that describe the desired retrieval operations. Our approach handles a wider range of data structures than previous work, including structures that maintain an order among their elements or have complex retrieval methods. We have prototyped our approach in a data structure synthesizer called Cozy. Four large, real-world case studies compare structures generated by Cozy against handwritten implementations in terms of correctness and performance. Structures synthesized by Cozy match the performance of handwritten data structures while avoiding human error.
- 1:55-2:15: Programming by Demonstration for Web Scraping, Sarah Chasins
Although more and more data is available on the web every day, existing tools for non-coders are not sufficient to scrape many of the large and complicated datasets represented online. We are building a new tool that combines existing work on wrapper induction and our own work on robust record and replay to help non-programmers build custom scraping scripts. We use PBD techniques to produce draft programs based on a user's demonstration of how to scrape the first row of a dataset. These draft programs can already collect datasets that no previous end-user tool could scrape. Next, we plan to express these programs in an accessible visual language that allows non-programmers to read and edit their scripts. We hope this will empower users to produce an even wider array of customized web automation scripts.
- 2:15-2:35: Synthesizing Structurally Rich SQL Queries from Input-Output Examples, Chenglong Wang
Relational databases have a wide range of users in different fields, many of whom are not professional programmers and struggle to query databases using SQL. To help end-users formulate SQL queries, we propose a programming-by-example system that synthesizes SQL queries from input-output examples provided by end-users. Our system is built on an efficient synthesis algorithm using abstract enumerative search, which scales well enough to solve over a hundred real-world benchmarks collected from the online help forum Stack Overflow.
- 1:30-1:35: Introduction and Overview, Shyam Gollakota
- 1:35-1:50: Bringing Internet Connectivity to Implanted Devices, Vikram Iyer
We introduce inter-technology backscatter, a novel approach that transforms wireless transmissions from one technology to another, on the air. Specifically, we show for the first time that Bluetooth transmissions can be used to create Wi-Fi and ZigBee-compatible signals using backscatter communication. Since Bluetooth, Wi-Fi and ZigBee radios are widely available, this approach enables a backscatter design that works using only commodity devices. We build prototype backscatter hardware using an FPGA and experiment with various Wi-Fi, Bluetooth and ZigBee devices. Our experiments show we can create 2–11 Mbps Wi-Fi standards-compliant signals by backscattering Bluetooth transmissions. To show the generality of our approach, we also demonstrate generation of standards-compliant ZigBee signals by backscattering Bluetooth transmissions. Finally, we build proof-of-concepts for previously infeasible applications including the first contact lens form-factor antenna prototype and an implantable neural recording interface that communicate directly with commodity devices such as smartphones and watches, thus enabling the vision of Internet connected implanted devices.
- 1:50-2:05: Next big leap in backscatter communication, Vamsi Talla
This talk overturns the conventional wisdom that backscatter is limited to short-ranges and presents the first wide-area backscatter communication system.
- 2:05-2:20: Sending passwords over the human body, Mehrdad Hessar
We show for the first time that commodity devices can be used to generate wireless data transmissions that are confined to the human body. Specifically, we show that commodity input devices such as fingerprint sensors and touchpads can be used to transmit information to only wireless receivers that are in contact with the body. We characterize the propagation of the resulting transmissions across the whole body and run experiments with ten subjects to demonstrate that our approach generalizes across different body types and postures. We also evaluate our communication system in the presence of interference from other wearable devices such as smartwatches and nearby metallic surfaces. Finally, by modulating the operations of these input devices, we demonstrate bit rates of up to 50 bits per second over the human body.
- 2:20-2:35: Fine grained finger tracking using active sonar, Rajalakshmi Nandakumar
FingerIO is a novel fine-grained finger tracking solution that transforms any space around off-the-shelf smartphones or smartwatches into an interactive surface. FingerIO does not require instrumenting the finger with sensors and works even in the presence of occlusions between the finger and the device. We achieve this by transforming the device into an active sonar system that transmits inaudible sound signals and tracks the echoes of the finger at its microphones. To achieve subcentimeter-level tracking accuracies, we present an innovative approach that uses a modulation technique common in wireless communication called Orthogonal Frequency Division Multiplexing (OFDM). Our evaluation shows that FingerIO can achieve 2-D finger tracking with an average accuracy of 8 mm using the built-in microphones and speaker of an Android smartphone. It also tracks subtle finger motion around the device when the phone is inside a pocket. Finally, we prototype a smartwatch form-factor FingerIO device and show that it can extend the interaction space to a 0.5 × 0.25 m² region on either side of the device and work even when it is fully occluded from the finger.
- 2:40-2:45: Introduction and Overview, Luke Zettlemoyer
- 2:45-2:55: Situation Recognition: Structured Prediction with Deep Networks, Mark Yatskar
We introduce situation recognition, the problem of producing a concise summary of the situation an image depicts including: (1) the main activity (e.g., clipping), (2) the participating actors, objects, substances, and locations (e.g., man, shears, sheep, wool, and field) and most importantly (3) the roles these participants play in the activity (e.g., the man is clipping, the shears are his tool, the wool is being clipped from the sheep, and the clipping is in a field). We also propose structured prediction models and use them to show that, in activity-centric images, situation-driven prediction of objects and activities outperforms independent object and activity recognition. Furthermore, we show how compositional models can be used to tackle problems of structured sparsity inherent in representations of situations.
- 2:55-3:05: Learning to be Obama, Supasorn Suwajanakorn
We all know President Obama very well: how he looks, how he talks, how he sounds. But what makes up his persona? Can we teach a computer someone's persona and ultimately replicate that person with a lively 3D model that acts just like them? In this talk, I will demonstrate our first effort toward that goal and a novel system capable of visually generating Obama's speech given only raw audio. Unlike others, our lip-sync technique does not require any hand-crafted animation, speech recognition, traditional mouth motion modeling, or a driver video sequence, and operates purely from audio input using a recurrent neural network trained on 20 hours of Obama weekly addresses.
- 3:05-3:25: Summarizing Source Code using a Neural Attention Model, Srini Iyer
High quality source code is often paired with high level summaries of the computation it performs, for example in code documentation or in descriptions posted in online forums. Such summaries are extremely useful for applications such as code search but are expensive to author manually, and hence are written for only a small fraction of all code that is produced. In this paper, we present the first completely data-driven approach for generating high level summaries of source code. Our model, CODE-NN, uses Long Short Term Memory (LSTM) networks with attention to produce sentences that describe C# code snippets and SQL queries. CODE-NN is trained on a new corpus that is automatically collected from StackOverflow, which we release. Experiments demonstrate strong performance on two tasks: (1) code summarization, where we establish the first end-to-end learning results and outperform strong baselines, and (2) code retrieval, where our learned model improves the state of the art on a recently introduced C# benchmark by a large margin.
- 3:25-3:35: Self-supervised Learning of Dense Visual Descriptors, Tanner Schmidt
Robust estimation of correspondences between image pixels is an important problem in robotics, with applications in tracking, mapping, and recognition of objects, environments, and other agents. Correspondence estimation has long been the domain of hand-engineered features, but more recently deep learning techniques have provided powerful tools for learning features from raw data. The drawback of the latter approach is that a vast amount of (labelled, typically) training data is required for learning. We advocate a new approach to learning dense image correspondences in which we harness the power of a strong 3D generative model to automatically label correspondences in video data. A fully-convolutional network is trained using a contrastive loss to produce viewpoint- and lighting-invariant features. As a proof of concept, we collected two datasets: the first depicts the upper torso and head of the same person in widely varied settings, and the second depicts an office as seen on multiple days with objects re-arranged within. Our datasets focus on re-visitation of the same objects and environments, and we show that by training the CNN only from local tracking data, our learned visual descriptor generalizes towards identifying non-labelled correspondences across videos. We furthermore show that our approach to descriptor learning can be used to achieve state-of-the-art single-frame localization results on the MSR 7-scenes dataset without using any training labels identifying correspondences between the separate videos of the same scenes in the dataset.
- 3:35-3:45: Modeling Scene Dynamics using Deep Neural Networks, Arunkumar Byravan
The ability to predict how an environment changes based on forces applied to it is fundamental for a robot to achieve specific goals. For instance, in order to arrange a table, a robot has to reason about where and how to push objects, which requires some understanding of physics. Research has shown that humans learn mental models of physics right from a young age, enabling them to robustly perform complex real world tasks. In this work, we use deep learning to model this concept of "physical intuition", learning a model to predict scene dynamics based on applied actions. Given an input scene and an action, our model identifies objects in the scene and predicts a rigid body motion for each salient object. We present results on multiple simulated scenarios and preliminary results on a robot pushing task which show that our method produces meaningful and interpretable results while outperforming deep flow-based baselines.
- 2:40-2:45: Introduction and Overview, Dan Grossman
- 2:45-3:05: Ouroboros: Secure Kernel Extensibility for Linux, Jared Roesch and Luke Nelson
Modern operating systems increasingly utilize embedded languages to give user space more control over kernel behavior. For example, Linux has recently introduced an extended BPF (eBPF) interpreter, sometimes known as "the universal in-kernel virtual machine," which allows user-space applications to control packet filtering, conduct performance analysis, and specify security policies. This added functionality also increases the surface area for attacks on the kernel by malicious applications: a number of bugs have been found in eBPF, the consequences of which range from kernel crashes to whole-system compromises. This work presents Ouroboros, a formally verified eBPF interpreter that eliminates many security risks associated with in-kernel interpreters.
- 3:05-3:25: Disciplined Locking: No More Data Races, Michael D. Ernst
A locking discipline prevents certain concurrency errors by indicating which locks must be held when a given operation occurs. For instance, a lock may protect accesses to a shared resource, preventing race conditions that can result in corrupted data. The @GuardedBy annotation (from Java Concurrency In Practice) is a common way to express a locking discipline in Java. This talk will show that in the presence of aliasing (two references to the same object), it does not prevent concurrency errors. We introduce a value-based interpretation that is consistent with the JVM and prevents concurrency errors. We also present a tool that verifies @GuardedBy annotations, giving you a practical way to prevent concurrency errors in your code.
- 3:25-3:45: Getting Memory Consistency Right, James Bornholt
If you’ve ever written low-level concurrent code, you might have run into the issue of memory consistency: hardware (or compilers) reordering your memory operations to do things you don’t expect. Relaxed memory consistency is unintuitive, subtle, varies between platforms, and is often undocumented or underspecified. These perils engender bugs in fundamental systems code like kernels and compilers. To bring some clarity to this problem, we’ve been working on a tool called MemSynth that automatically synthesizes specifications for relaxed memory consistency models. MemSynth starts from simple concrete examples of relaxed behavior and builds them up to a full, formal specification, which can then be used to verify the correctness of, or make repairs to, concurrent code. MemSynth can synthesize specifications for the x86 and PowerPC architectures in under 30 seconds, and can automatically probe these specifications for unresolved ambiguities.
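A small concrete example of relaxed behavior is the classic "store buffering" litmus test: two threads each store to one variable and then load the other. The Python sketch below enumerates every interleaving under sequential consistency and shows that both loads can never return 0; x86 hardware does permit that outcome because stores may sit in a store buffer. The encoding of threads and operations is an assumption made for illustration and is unrelated to MemSynth's own input format.

```python
def interleavings(a, b):
    """Yield every merge of sequences a and b that preserves each one's order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

# Store-buffering litmus test; x and y both start at 0.
THREAD_0 = [("store", "x", 1), ("load", "y", "r0")]
THREAD_1 = [("store", "y", 1), ("load", "x", "r1")]

def run(schedule):
    """Execute one interleaving against a single shared memory."""
    memory = {"x": 0, "y": 0}
    registers = {}
    for kind, location, operand in schedule:
        if kind == "store":
            memory[location] = operand
        else:
            registers[operand] = memory[location]
    return (registers["r0"], registers["r1"])

sc_outcomes = {run(s) for s in interleavings(THREAD_0, THREAD_1)}
print(sorted(sc_outcomes))    # [(0, 1), (1, 0), (1, 1)]
print((0, 0) in sc_outcomes)  # False: forbidden under sequential consistency
```

Observing `(0, 0)` on real hardware is therefore evidence that the machine's memory model is weaker than sequential consistency, which is the kind of observation a specification has to account for.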
- 2:40-2:45: Introduction and Overview, James Fogarty
- 2:45-2:57: Crumbs: Lightweight Daily Food Challenges to Promote Engagement and Mindfulness, Daniel Epstein
Many people struggle with efforts to make healthy behavior changes, such as healthy eating. Several existing approaches promote healthy eating, but present high barriers and yield limited engagement. As a lightweight alternative approach to promoting mindful eating, we introduce and examine crumbs: daily food challenges completed by consuming one food that meets the challenge. We examine crumbs through developing and deploying the iPhone application Food4Thought. In a 3-week field study with 61 participants, crumbs supported engagement and mindfulness while offering opportunities to learn about food. Our 2x2 study compared nutrition versus non-nutrition crumbs coupled with social versus non-social features. Nutrition crumbs often felt more purposeful to participants, but non-nutrition crumbs increased mindfulness more than nutrition crumbs.
- 2:57-3:09: Supporting Patient-Provider Collaboration to Identify Individual Triggers using Food/Symptom Journals, Jessica Schroeder
Although patient-generated health data has the potential to help healthcare providers improve diagnoses and personalize treatment recommendations, the patient and provider often struggle to interpret the data once it has been collected. Using the example of irritable bowel syndrome, we created two interactive visualizations to help patients and providers collaboratively investigate a patient’s food and symptom data and identify symptom triggers. We then examined the visualizations in interviews with pairs of patients and providers. We found that collaboratively reviewing such visualizations can help patients and providers better explain their experiences and recommendations, and can foster mutual trust in their relationship.
- 3:09-3:21: TummyTrials: Using Self-Experimentation to Detect Individualized Food Triggers, Ravi Karkar
Diagnostic self-tracking, the recording of personal information to diagnose or manage a health condition, is a common practice, especially for people with chronic conditions. However, people often lack knowledge and skills needed to design and conduct scientifically rigorous experiments, and current tools provide little support. To address these shortcomings and explore opportunities for diagnostic self tracking, we designed, developed, and evaluated a mobile app that applies a self experimentation framework to support patients suffering from irritable bowel syndrome (IBS) in identifying their personal food triggers. TummyTrials aids a person in designing, executing, and analyzing self experiments to evaluate whether a specific food triggers their symptoms. We examined the feasibility of this approach in a field study with 15 IBS patients, finding that participants could use the tool to reliably undergo a self-experiment. However, we also discovered an underlying tension between scientific validity and the lived experience of self experimentation. We discuss challenges of applying clinical research methods in everyday life, motivating a need for the design of self experimentation systems to balance rigor with the uncertainties of everyday life.
- 3:21-3:33: Making Sense of Sleep Sensors, Ruth Ravichandran
Sleep is difficult for people to track manually because it is an unconscious activity. The ability to sense sleep has lowered the barriers to tracking an important aspect of our health. Although sleep sensing is widely available, its usefulness and potential to promote healthy sleep behaviors have not been fully realized. To understand how to improve sleep sensing devices, we surveyed 87 and interviewed 12 people who currently use or have previously used sleep sensors, interviewed 5 sleep medical experts, and conducted an in-depth qualitative analysis of 6986 reviews of the most popular commercial sleep sensing technologies. We found that the feedback provided by current sleep sensing technologies affects users’ perception of their sleep and creates goals that are in tension with the modifiable behaviors that health experts recommend as key to improving sleep. Our research provides design guidelines for improving the feedback of sleep sensing technologies, bridging the gap between expert goals and user goals to help effect positive change in sleep.
- 3:33-3:45: From Personal Informatics to Family Informatics: Understanding Family Practices around Health Monitoring, Laura Pina
In families composed of parents and children, the health of parents and children is often interrelated: the health of children can have an impact on the health of parents, and vice versa. However, the design of health tracking technologies typically focuses on individual self-tracking and self-management, not yet addressing family health in a unified way. To examine opportunities for family-centered health informatics, we interviewed 14 typically healthy families, interviewed 10 families with a child with a chronic condition, and conducted three participatory design sessions with children aged 7 to 11. Although we identified similarities between family-centered tracking and personal self-tracking, we also found families want to: (1) identify ripple effects between family members; (2) consider both caregivers and children as trackers to support distributing the burdens of tracking across family members; and (3) identify and pursue health guidelines that consider the state of their family (e.g., specific health guidelines for families that include a child with a chronic condition). We contribute to expanding the design lens from self-tracking to family-centered health tracking.
- 3:50-3:55: Introduction and Overview, Dieter Fox
- 3:55-4:10: Where did that thing go? Deep Object Tracking In Video, Daniel Gordon
Tracking is a crucial part of many computer vision and robotics applications. Sometimes the objects are known beforehand; this is especially true in robotics settings in a controlled lab environment. However, sometimes the objects are unknown. We propose two real-time systems that use deep neural networks to track known and unknown objects in videos. These approaches work better than previously proposed deep methods, achieving the best known results on object tracking in Imagenet Video, while still operating at 30-75 frames per second. Furthermore, we explore the effects of occlusion, a known stumbling point of many tracking systems, on our method and compare favorably in our new "simulated flying objects" benchmark.
- 4:10-4:25: Submodular sum-product networks for scene understanding, Abram Friesen
Scene understanding is a challenging problem requiring the simultaneous detection, segmentation, and recognition of all objects in a scene. While advances in deep learning have greatly improved accuracy in each of these tasks, modern approaches to scene understanding are unable to understand and reason about high-level relationships among objects, such as constituency and subcategorization. Such approaches are akin to trying to understand natural language using only part-of-speech tagging, instead of inferring a parse tree from a probabilistic context-free grammar (PCFG). Unfortunately, parsing images using a direct application of PCFGs is highly intractable. In this paper, we define submodular sum-product networks (SSPNs), which are sum-product networks where the weight of each sum node is given by a conditional random field with submodular potentials. By combining submodularity, which permits efficient image segmentation, with sum-product networks, a class of tractable deep probabilistic models that subsumes PCFGs, we are able to define an efficient and convergent inference algorithm for finding the (approximate) best parse of an image. Using this algorithm, we show that SSPNs can be learned discriminatively from data and when combined with modern convnet-based features the whole system can be trained end-to-end. Empirically, SSPNs achieve promising results on multiple semantic segmentation tasks, which we attribute to their ability to represent and reason about the high-level structure of objects in the scene.
- 4:25-4:40: Towards Perceiving and Manipulating Liquids using Deep Learning, Connor Schenck
Liquids are ubiquitous in human environments, yet little research has been done on how robots can interact with and reason about liquids from grounded sensory data. In this work, we combine the recent successes of deep learning with this relatively unexplored topic in order to see if robots can robustly perceive and manipulate liquids. We investigate two primary research questions: "Is it possible for robots to perceive and reason about liquids?" and "Can a robot use this in a control task involving liquids?" We show that not only can robots perceive and reason about liquids, but that they can also use these perceptions to solve a pouring task with even a relatively simplistic controller. Our work clearly shows that robots can robustly perceive and manipulate liquids in real, human environments.
- 4:40-4:55: Global Neural CCG Parsing with Optimality Guarantees, Kenton Lee
We introduce the first global recursive neural parsing model with optimality guarantees during decoding. To support global features, we give up dynamic programs and instead search directly in the space of all possible subtrees. Although this space is exponentially large in the sentence length, we show it is possible to learn an efficient A* parser. We augment existing parsing models, which have informative bounds on the outside score, with a global model that has loose bounds but only needs to model non-local phenomena. The global model is trained with a novel objective that encourages the parser to search both efficiently and accurately. The approach is applied to CCG parsing, improving state-of-the-art accuracy by 0.4 F1. The parser finds the optimal parse for 99.9% of held-out sentences, exploring on average only 190 subtrees.
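As a rough illustration of why A* search can keep an optimality guarantee while exploring few candidates, here is a minimal sketch on a toy labeling problem. It is not the parser from the talk: the label sequence stands in for a parse, `local_scores` stands in for a locally factored model with a tight outside bound, and `global_penalty` stands in for a global model whose only bound is that it never adds a positive score. All names and numbers are hypothetical.

```python
import heapq


def astar_best_sequence(local_scores, global_penalty):
    """Return (best_score, labels, explored) for a toy labeling problem.

    local_scores[i][label] is a log-score for putting `label` at position i.
    global_penalty(prev_label, label) must return a value <= 0; that loose
    upper bound of 0 is what keeps the search admissible.
    """
    n = len(local_scores)
    # outside[i] is an upper bound on the score still obtainable from
    # positions i..n-1: the best local score at each position, penalty 0.
    outside = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        outside[i] = outside[i + 1] + max(local_scores[i].values())

    # heapq is a min-heap, so priorities are stored negated.
    agenda = [(-outside[0], 0.0, ())]
    explored = 0
    while agenda:
        _, inside, labels = heapq.heappop(agenda)
        explored += 1
        i = len(labels)
        if i == n:
            # The first complete item popped is optimal because every
            # priority is an upper bound on the best completion.
            return inside, list(labels), explored
        prev = labels[-1] if labels else None
        for label, score in local_scores[i].items():
            penalty = global_penalty(prev, label)
            assert penalty <= 0, "global model must be bounded above by 0"
            new_inside = inside + score + penalty
            priority = new_inside + outside[i + 1]
            heapq.heappush(agenda, (-priority, new_inside, labels + (label,)))
    return None


# Hypothetical scores: two positions, two labels each.
toy_scores = [{"A": -0.1, "B": -1.0}, {"A": -0.2, "B": -0.3}]


def toy_penalty(prev, label):
    # Hypothetical non-local constraint: discourage two "A" labels in a row.
    return -2.0 if prev == "A" and label == "A" else 0.0


best_score, best_labels, explored = astar_best_sequence(toy_scores, toy_penalty)
```

Because each priority overestimates (or equals) the best achievable total, the search can stop at the first complete sequence it pops, without enumerating every alternative.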
- 3:50-3:55: Introduction and Overview, Arvind Krishnamurthy
- 3:55-4:15: IncBricks: Enabling In-network Computations with a Programmable Network Middlebox, Ming Liu
The emergence of programmable network devices and the increasing data traffic of datacenters motivate our in-network computation idea. By offloading computing operations onto intermediate networking devices (e.g., switches, middleboxes), one can (1) serve network requests on the fly with low latency; (2) reduce datacenter traffic and mitigate network congestion; (3) save energy by running servers in a low-power mode. However, since (1) existing switch technology doesn’t provide general computing capabilities, and (2) commodity datacenter networks are complex (e.g., hierarchical fat-tree topologies, multipath communication), enabling in-network computation inside a datacenter is challenging. In this talk, we present IncBricks, a hardware-software co-designed system that supports in-network computation on a programmable networking middlebox. As a Memcached accelerator, our prototype lowers GET latency by over 25% and doubles throughput for 64-byte values in a common cluster configuration. When doing computation on cached values, IncBricks provides 3 times more throughput and a third of the latency of client-side computation.
- 4:15-4:35: High Performance Packet Processing with FlexNIC, Antoine Kaufmann
The recent surge of network I/O performance has put enormous pressure on memory and software I/O processing subsystems. We argue that the primary reason for high memory and processing overheads is the inefficient use of these resources by current commodity network interface cards (NICs). We propose FlexNIC, a flexible network DMA interface that can be used by operating systems and applications alike to reduce packet processing overheads. FlexNIC allows services to install packet processing rules into the NIC, which then executes simple operations on packets while exchanging them with host memory. Thus, our proposal moves some of the packet processing traditionally done in software to the NIC, where it can be done flexibly and at high speed. We quantify the potential benefits of FlexNIC by emulating the proposed FlexNIC functionality with existing hardware or in software. We show that significant gains in application performance are possible, in terms of both latency and throughput, for several widely used applications, including a key-value store, a stream processing system, and an intrusion detection system.
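The idea of installing packet processing rules into the NIC can be pictured as a small match-action table. The sketch below is a software toy under stated assumptions, not the FlexNIC interface: `Packet`, `Rule`, `ToyNic`, the `GET <key>` payload format, and port 11211 are all hypothetical choices made for illustration. It steers key-value requests to a receive queue chosen by a hash of the key, so requests for the same key always reach the same queue.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Packet:
    dst_port: int
    payload: bytes


@dataclass
class Rule:
    match: Callable[[Packet], bool]   # does this rule apply to the packet?
    action: Callable[[Packet], int]   # which receive queue should get it?


class ToyNic:
    """Software stand-in for a NIC with an installable match-action table."""

    def __init__(self, num_queues: int, default_queue: int = 0):
        self.queues: List[List[Packet]] = [[] for _ in range(num_queues)]
        self.default_queue = default_queue
        self.rules: List[Rule] = []

    def install(self, rule: Rule) -> None:
        self.rules.append(rule)

    def receive(self, pkt: Packet) -> int:
        # The first matching rule decides the queue; otherwise use the default.
        queue = self.default_queue
        for rule in self.rules:
            if rule.match(pkt):
                queue = rule.action(pkt) % len(self.queues)
                break
        self.queues[queue].append(pkt)
        return queue


def key_hash(pkt: Packet) -> int:
    # Assumed payload format: b"GET <key>". Hash only the key so that
    # every request for one key maps to the same queue.
    key = pkt.payload.split(b" ", 1)[1]
    return zlib.crc32(key)


nic = ToyNic(num_queues=4)
nic.install(Rule(match=lambda p: p.dst_port == 11211 and p.payload.startswith(b"GET "),
                 action=key_hash))

q1 = nic.receive(Packet(11211, b"GET user:42"))
q2 = nic.receive(Packet(11211, b"GET user:42"))
q3 = nic.receive(Packet(80, b"GET /index.html"))
```

In this toy, `q1` and `q2` are equal because the rule hashes the same key, and `q3` falls through to the default queue because no rule matches port 80.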
- 4:35-4:55: In-camera processing hardware for computer vision and virtual reality, Amrita Mazumdar
Cameras are becoming the universal sensor, supporting sophisticated vision and machine-learning algorithms and critical for many modern applications. Camera systems are increasingly mobile and latency-driven, expected to deliver live virtual reality video or perform complex image classification without draining battery life. In this talk, we discuss two camera system designs that push the extremes of energy and performance scaling, and explore the tradeoffs between computation and communication. One system is designed to detect and authenticate faces, operating solely on energy harvested from RFID readers. The other processes video from a 16-camera rig to produce real-time 3D-360 virtual reality video. We evaluate a wide range of hardware acceleration strategies, from GPUs to FPGAs and ASIC designs, and consider the space of design points both inside the camera node and in the cloud. Exploring when it is most efficient to process in-camera or offload computation allows us to design camera systems that achieve 100x lower-power face authentication and 10x faster virtual reality content generation than conventional architectures.
- 3:50-3:55: Introduction and Overview, Shwetak Patel
- 3:55-4:15: HemaApp: Noninvasive Blood Screening of Hemoglobin using Smartphone Cameras, Edward Wang
HemaApp is a smartphone application that noninvasively monitors blood hemoglobin concentration using the smartphone’s camera and various lighting sources. Hemoglobin measurement is a standard clinical tool commonly used for screening anemia and assessing a patient’s response to iron supplement treatments. Given a light source shining through a patient’s finger, we perform a chromatic analysis, analyzing the color of their blood to estimate hemoglobin level. We evaluate HemaApp on 31 patients ranging from 6 to 77 years of age, yielding a 0.82 rank order correlation with the gold standard blood test. In screening for anemia, HemaApp achieves a sensitivity and precision of 85.7% and 76.5%, respectively. Both the regression and classification performance compare favorably with our control, an FDA-approved noninvasive hemoglobin measurement device. We also evaluate and discuss the effect of using different kinds of lighting sources.
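For readers less familiar with the two screening metrics quoted above, the following sketch shows how sensitivity and precision are computed from confusion-matrix counts. The counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not the study's data.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of truly anemic patients that the screen flags."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no positive cases to evaluate")
    return true_pos / total


def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of flagged patients who are truly anemic."""
    total = true_pos + false_pos
    if total == 0:
        raise ValueError("no flagged cases to evaluate")
    return true_pos / total


# Hypothetical counts, not taken from the HemaApp evaluation.
tp, fn, fp = 9, 1, 3
sens = sensitivity(tp, fn)   # 9 / (9 + 1)
prec = precision(tp, fp)     # 9 / (9 + 3)
```

A screening tool usually favors high sensitivity, since a missed case is costlier than a follow-up blood test for a false alarm.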
- 4:15-4:35: Ocular Symptom Detection using Smartphones, Alex Mariakakis
Medical specialists often make clinical judgments through observation. Those judgments can be coarse-grained or highly qualitative. Furthermore, there can be disagreements between multiple specialists observing the exact same patient. By applying computer vision and machine learning to videos captured from a smartphone camera, we can automate and systematize such observations for specialists and non-experts without the need to purchase specialized equipment. This talk will highlight two ongoing projects that screen for different conditions of the eye with minimal instrumentation. The first project estimates jaundice in the sclera and can be used as a screening tool for conditions like pancreatic cancer. The second project measures the pupil's response to a light stimulus, a test that is used as a proxy for assessing significant head trauma.
- 4:35-4:45: EyeContact: Scleral Coil Eye Tracking for Virtual Reality, Eric Whitmire
Eye tracking is a technology of growing importance for mobile and wearable systems, particularly for newly emerging virtual and augmented reality applications (VR and AR). Current eye tracking solutions for wearable AR and VR headsets rely on optical tracking and achieve a typical accuracy of 0.5° to 1°. We investigate a high temporal and spatial resolution eye tracking system based on magnetic tracking using scleral search coils. This technique has historically relied on large generator coils several meters in diameter or required a restraint for the user’s head. We propose a wearable scleral search coil tracking system that allows the user to walk around, and eliminates the need for a head restraint or room-sized coils. Our technique involves a unique placement of generator coils as well as a new calibration approach that accounts for the less uniform magnetic field created by the smaller coils. Using this technique, we can estimate the orientation of the eye with a mean calibrated accuracy of 0.094°.