Research Showcase Agenda
Tuesday, November 14, 2023
10:00 - 10:30am
Registration and coffee
Singh Gallery (4th floor Gates Center)
10:30 - 11:10am
Welcome and Overview by Magda Balazinska and Shwetak Patel + various faculty on research areas
Zillow Commons (4th floor Gates Center)
11:15am - 12:20pm
ABET feedback session
Gates Center, Room 371
Edge and Mobile Intelligence
Gates Center, Zillow Commons
12:25 - 1:25pm
Lunch + Keynote Talk:
Open Language Model (OLMo): The science of language models and language models for science,
Hanna Hajishirzi, Paul G. Allen School of Computer Science & Engineering
Microsoft Atrium in the Allen Center
1:30 - 2:35pm
Gates Center, Room 271
Gates Center, Room 371
Gates Center, Zillow Commons
2:40 - 3:45pm
Natural Language Processing
Gates Center, Room 271
Tools for Intelligent Transportation
Gates Center, Room 371
Computing for Health
Gates Center, Zillow Commons
3:50 - 4:55pm
Systems and Networking
Gates Center, Room 271
Graphics and Audio
Gates Center, Room 371
Computing for Sustainability
Gates Center, Zillow Commons
5:00 - 7:00pm
Open House: Reception + Poster Session
Microsoft Atrium in the Allen Center
7:15 - 7:30pm
Program: Madrona Prize, People's Choice Awards
Microsoft Atrium in the Allen Center
- 11:15-11:20: Introduction and Overview, Crystal Eney
- 11:20-12:20: Discussion
Join our ABET Faculty coordinator and Vice Director of the Allen School along with our Director of Student Services and Program Operations Specialist working on our accreditation to provide feedback on how students from the Allen School are contributing to industry. Come learn about what's new in CSE education and what ideas folks have for any growth areas where our students seem to be consistently struggling.
- 11:15-11:20: Introduction and Overview, Malek Itani
- 11:20-11:35: Semantic Hearing: Programming Acoustic Scenes with Binaural Hearables, Malek Itani
Imagine being able to listen to the birds chirping in a park without hearing the chatter from other hikers, or being able to block out traffic noise on a busy street while still being able to hear emergency sirens and car honks. We introduce semantic hearing, a novel capability for hearable devices that enables them to, in real-time, focus on, or ignore, specific sounds from real-world environments, while also preserving the spatial cues. To achieve this, we make two technical contributions: 1) we present the first neural network that can achieve binaural target sound extraction in the presence of interfering sounds and background noise, and 2) we design a training methodology that allows our system to generalize to real-world use. Results show that our system can operate with 20 sound classes and that our transformer-based network has a runtime of 6.56 ms on a connected smartphone. In-the-wild evaluation with participants in previously unseen indoor and outdoor scenarios shows that our proof-of-concept system can extract the target sounds and generalize to preserve the spatial cues in its binaural output.
- 11:35-11:50: NeuriCam: Key-Frame Video Super-Resolution and Colorization for IoT Cameras, Bandhav Veluri
We present NeuriCam, a novel deep learning-based system to achieve video capture from low-power dual-mode IoT camera systems. Our idea is to design a dual-mode camera system where the first mode is low power (1.1 mW) but only outputs greyscale, low-resolution, and noisy video and the second mode consumes much higher power (100 mW) but outputs color and higher-resolution images. To reduce total energy consumption, we heavily duty cycle the high power mode to output an image only once every second. The data for this camera system is then wirelessly sent to a nearby plugged-in gateway, where we run our real-time neural network decoder to reconstruct a higher-resolution color video. To achieve this, we introduce an attention feature filter mechanism that assigns different weights to different features, based on the correlation between the feature map and the contents of the input frame at each spatial location. We design a wireless hardware prototype using off-the-shelf cameras and address practical issues including packet loss and perspective mismatch. Our evaluations show that our dual-camera approach reduces energy consumption by 7x compared to existing systems. Further, our model achieves an average greyscale PSNR gain of 3.7 dB over prior single and dual-camera video super-resolution methods and 5.6 dB RGB gain over prior color propagation methods.
- 11:50-12:05: ClearBuds: wireless binaural earbuds for learning-based speech enhancement, Maruchi Kim
We present ClearBuds, the first hardware and software system that utilizes a neural network to enhance speech streamed from two wireless earbuds. Real-time speech enhancement for wireless earbuds requires high-quality sound separation and background cancellation, operating in real-time and on a mobile phone. ClearBuds bridges state-of-the-art deep learning for blind audio source separation and in-ear mobile systems by making two key technical contributions: 1) a new wireless earbud design capable of operating as a synchronized, binaural microphone array, and 2) a lightweight dual-channel speech enhancement neural network that runs on a mobile device. Our neural network has a novel cascaded architecture that combines a time-domain conventional neural network with a spectrogram-based frequency masking neural network to reduce the artifacts in the audio output. Results show that our wireless earbuds achieve a synchronization error less than 64 μs and our network has a runtime of 21.4 ms on an accompanying mobile phone. In-the-wild evaluation with eight users in previously unseen indoor and outdoor multipath scenarios demonstrates that our neural network generalizes to learn both spatial and acoustic cues to perform noise suppression and background speech removal. In a user study with 37 participants who spent over 15.4 hours rating 1041 audio samples collected in-the-wild, our system achieves improved mean opinion score and background noise suppression.
- 12:05-12:20: Creating speech zones with self-distributing acoustic swarms, Tuochao Chen
Imagine being in a crowded room with a cacophony of speakers and having the ability to focus on or remove speech from a specific 2D region. This would require understanding and manipulating an acoustic scene, isolating each speaker, and associating a 2D spatial context with each constituent speech. However, separating speech from a large number of concurrent speakers in a room into individual streams and identifying their precise 2D locations is challenging, even for the human brain. Here, we present the first acoustic swarm that demonstrates cooperative navigation with centimeter-resolution using sound, eliminating the need for cameras or external infrastructure. Our acoustic swarm forms a self-distributing wireless microphone array, which, along with our attention-based neural network framework, lets us separate and localize concurrent human speakers in the 2D space, enabling speech zones. Our evaluations showed that the acoustic swarm could localize and separate 3-5 concurrent speech sources in real-world unseen reverberant environments with median and 90-percentile 2D errors of 15 cm and 50 cm, respectively. Our system enables applications like mute zones (parts of the room where sounds are muted), active zones (regions where sounds are captured), multi-conversation separation and location-aware interaction.
- 1:30-1:35: Introduction and Overview, Katharina Reinecke
- 1:35-1:47: What Makes Online Communities 'Better'? Characterizing Community Member Values on Reddit, Galen Weld
Making online social communities ‘better’ is a challenging undertaking, as online communities are extraordinarily varied in their size, topical focus, and governance. As such, what is valued by one community may not be valued by another. However, community values are challenging to measure as they are rarely explicitly stated. In this work, we measure community values through two large-scale surveys. Through qualitative and quantitative analyses of survey responses and a quantitative analysis of publicly available Reddit data, we characterize the values that are important to community members, and how these values vary within and across communities. We show that communities are deeply varied in their values, and that there is no "one size fits all" solution to improving communities. We show that community members disagree about how safe their communities are, and that community moderators want their communities to be 56.7% less democratic than non-moderator community members. These findings have important implications, including suggesting that care must be taken to protect vulnerable community members, and that participatory governance strategies may be difficult to implement. Accurate and scalable modeling of community values enables research and governance which is tuned to each community's different values. We make our taxonomy and data public to inform community design and governance.
- 1:47-1:59: Building Tools for Social Alignment of AI, Quan Ze (Jim) Chen
Researchers and practitioners are increasingly building and deploying large scale AI-backed systems, like large language models and content moderation classifiers, that need to make decisions on socially defined concepts like whether it's ethical to give assistance on a user's query or whether some content is inappropriate for a community. To do this, we also increasingly rely on collecting, training, and testing against judgments collected from individuals and groups of people on these concepts. However, the ways we ask human decision-makers for input, and the data representations we allow them to answer through are often dated, having traditionally been built under assumptions that individuals are certain about their decisions, and that it is possible to arrive at a universally shared "correct" answer. In this talk, I will discuss two lines of my work in building tools to pave the way for social alignment of AI: (1) creating new representations of answers through uncertainty-aware annotation tooling; followed by (2) constructing and applying "case repositories" as a new way to ask for preferences around orders that also simulates open ended answers. I conclude with a discussion on how new tooling around judgments will be increasingly important for producing data for the future of AI alignment involving socially defined concepts.
- 1:59-2:11: Breaking Barriers with AI: Addressing Linguistic Disparities for AAVE Speakers, Jeffrey Basoah
Large language models, which power natural language processing tools, are increasingly under scrutiny for potentially exhibiting linguistic preferences that align with specific groups, contributing to group disparities and fairness issues in technology (Blodgett et al., 2016; Deas et al., 2023; Groenwold et al., 2020). Users of African-American Vernacular English (AAVE) already encounter discrimination in various aspects of their daily lives, such as employment, housing, healthcare, the legal system, and even classrooms (Mengesha et al., 2021). This paper explores the discourse on fairness and equity in technology, with a particular focus on AI and its impact on marginalized communities. Members of marginalized communities possess unique lived experiences that are rarely reflected in their interactions with AI, raising questions about who establishes societal standards and the often inadequate consideration of diverse cultural backgrounds. Our study addresses disparities in AI performance, specifically with AI-supported writing technologies (ASWT), for AAVE speakers, stemming from historical underrepresentation. Our study investigates the psychological and experiential effects of linguistic disparities in ASWT for these users. Through a multi-faceted approach, we aim to understand user perceptions, expectations, and experiences. Our findings reveal significant psychological and experiential impacts of these disparities, highlighting the need for deeper engagement between technologists and affected communities to prevent systematic discrimination.
- 2:11-2:23: From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models, Shangbin Feng
Language models (LMs) are pretrained on diverse data sources, including news, discussion forums, books, and online encyclopedias. A significant portion of this data includes opinions and perspectives which, on one hand, celebrate democracy and diversity of ideas, and on the other hand are inherently socially biased. Our work develops new methods to (1) measure political biases in LMs trained on such corpora, along social and economic axes, and (2) measure the fairness of downstream NLP models trained on top of politically biased LMs. We focus on hate speech and misinformation detection, aiming to empirically quantify the effects of political (social, economic) biases in pretraining data on the fairness of high-stakes social-oriented tasks. Our findings reveal that pretrained LMs do have political leanings that reinforce the polarization present in pretraining corpora, propagating social biases into hate speech predictions and misinformation detectors. We discuss the implications of our findings for NLP research and propose future directions to mitigate unfairness.
- 2:23-2:35: Gendered Mental Health Stigma in Masked Language Models, Inna Wanyin Lin
Mental health stigma prevents many individuals from receiving the appropriate care, and social psychology studies have shown that mental health tends to be overlooked in men. In this talk, I will present our work investigating gendered mental health stigma in masked language models. In doing so, we operationalize mental health stigma by developing a framework grounded in psychology research: we use clinical psychology literature to curate prompts, then evaluate the models' propensity to generate gendered words. We find that masked language models capture societal stigma about gender in mental health: models are consistently more likely to predict female subjects than male in sentences about having a mental health condition (32% vs. 19%), and this disparity is exacerbated for sentences that indicate treatment-seeking behavior. Furthermore, we find that different models capture dimensions of stigma differently for men and women, associating stereotypes like anger, blame, and pity more with women with mental health conditions than with men. In showing the complex nuances of models' gendered mental health stigma, we demonstrate that context and overlapping dimensions of identity are important considerations when assessing computational models' social biases.
- 1:30-1:35: Introduction and Overview, Jon Froehlich
- 1:35-1:50: Notably Inaccessible — Data Driven Understanding of Data Science Notebook (In)Accessibility, Venkatesh Potluri
Computational notebooks, tools that facilitate storytelling through exploration, data analysis, and information visualization, have become the widely accepted standard in the data science community across academic and industry settings. While there is extensive research to learn how data scientists use computational notebooks, identify their pain points, and enable collaborative data science practices, very little is known about the various accessibility barriers experienced by blind and visually impaired (BVI) users using these notebooks. In this talk, I will present findings from our large-scale systematic analysis of 100,000 Jupyter notebooks to identify various accessibility challenges in published notebooks. Accessibility barriers are caused by the tools used, infrastructures available, and authoring practices that are followed to create and share these notebooks. I will discuss recommendations to improve accessibility of the data artifacts of a notebook, suggest authoring practices, and propose changes to infrastructure to make notebooks accessible.
- 1:50-2:05: Jod: Examining the Design and Implementation of a Videoconferencing Platform for Mixed Hearing Groups, Anant Mittal
Videoconferencing usage has surged in recent years, but current platforms present significant accessibility barriers for the 430 million d/Deaf or hard of hearing people worldwide. Informed by prior work examining accessibility barriers in current videoconferencing platforms, we designed and developed Jod, a videoconferencing platform to facilitate communication in mixed hearing groups. Key features include support for customizing visual layouts and a notification system to request attention and influence behavior. Using Jod, we conducted six mixed hearing group sessions with 34 participants, including 18 d/Deaf or hard of hearing participants, 10 hearing participants, and 6 sign language interpreters. We found participants engaged in visual layout rearrangements based on their hearing ability and dynamically adapted to the changing group communication context, and that notifications were useful but raised a need for designs to cause fewer interruptions. We further provide insights for future videoconferencing designs.
- 2:05-2:20: An Autoethnographic Case Study of Generative Artificial Intelligence's Utility for Accessibility, Kate Glazko
With the recent rapid rise in Generative Artificial Intelligence (GAI) tools, it is imperative that we understand their impact on people with disabilities, both positive and negative. However, although we know that AI in general poses both risks and opportunities for people with disabilities, little is known specifically about GAI. To address this, we conducted a three-month autoethnography of our use of GAI to meet personal and professional needs as a team of researchers with and without disabilities. Our findings demonstrate a wide variety of potential accessibility-related uses for GAI while also highlighting concerns around verifiability, training data, ableism, and false promises.
- 2:20-2:35: Principles for Designing Nonvisual Trip Planning: The AccessMap Multimodal NonVisual Interface, Kunal Mehta
In the realm of applications for navigation and trip planning, there is widespread availability of wayfinding and prospective trip exploration interfaces. However, there exists a notable gap in work concerning the design of these interfaces for users relying on screen readers and other assistive technologies within this specific context. To bridge this void, we propose a design space that caters to the nonvisual aspects of interaction design in the domain of navigation and trip planning applications. This design space delineates four principles, each aimed at exploring challenges and opportunities unique to this interaction design area. Our purpose is to offer guidance for the creation of innovative interaction methods, providing both a framework and pluggable React library for researchers and practitioners engaged in the development of nonvisual features for navigation and trip planning applications. In this work, we demonstrate the use of these interaction design methods in a web application called AccessMap Multimodal.
- 1:30-1:35: Introduction and Overview, Maya Cakmak
- 1:35-1:47: Towards General Single-Utensil Food Acquisition with Human-Informed Actions, Ethan Gordon
Food acquisition with common general-purpose utensils is a necessary component of robot applications like in-home assistive feeding. Learning acquisition policies in this space is difficult in part because any model will need to contend with extensive state and action spaces. Food is extremely diverse and generally difficult to simulate, and acquisition actions like skewers, scoops, wiggles, and twirls can be parameterized in myriad ways. However, food’s visual diversity can belie a degree of physical homogeneity, and many foods allow flexibility in how they are acquired. Our key insight is that a potent subset of actions can be sufficient to acquire a wide variety of food items. In this talk, we present a methodology for identifying such a subset from limited human trajectory data. We first develop a continuous action space of robot acquisition trajectories that capture the variety of human food acquisition techniques. By mapping human trajectories into this space and using clustering for spatially-diverse sampling, we construct a discrete set of 11 actions. We demonstrate that this set is capable of acquiring a variety of food items with a ≥ 80% success rate, a rate that users have said is sufficient for in-home robot-assisted feeding. Furthermore, since this set is so small, we also show that we can use online learning to determine a sufficiently optimal action for a previously-unseen food item over the course of a single meal.
- 1:47-1:59: Learning to Generalize with Limited Demonstrations, Qiuyu Chen
Robot learning methods have the potential for widespread generalization across tasks, environments, and objects. However, these methods require either large, diverse datasets that are expensive to collect in real-world robotics settings or training in large-scale simulation environments that are built through costly hand-engineering efforts. For robot learning to generalize, we must be able to leverage sources of data or priors beyond the robot's own experience. Toward this goal, I present our efforts in bridging the data scarcity gap by creating novel and diverse data that helps robots generalize to unseen scenarios. We show various methods to automatically generate new scenes with minimal human effort. The generated data, which capture the visual realism and complexity of the real world, are used to bootstrap the robot policy and improve zero-shot deployment.
- 1:59-2:11: Learning to Grasp in Clutter with Interactive Visual Failure Prediction, Michael Murray
Modern warehouses process millions of unique objects which are often stored in densely packed containers. To automate tasks in this environment, a robot must be able to pick diverse objects from highly cluttered scenes. Real-world learning is a promising approach, but executing picks in the real world is time-consuming, can induce costly failures, and often requires extensive human intervention, which causes operational burden and limits the scope of data collection and deployments. In this work, we leverage interactive probes to visually evaluate grasps in clutter without fully executing picks, a capability we refer to as Interactive Visual Failure Prediction (IVFP). This enables autonomous verification of grasps during execution to avoid costly downstream failures as well as autonomous reward assignment, providing supervision to continuously shape and improve grasping behavior as the robot gathers experience in the real world, without constantly requiring human intervention. Through experiments on a Stretch RE1 robot, we study the effect that IVFP has on performance - both in terms of effective data throughput and success rate, and show that this approach leads to grasping policies that outperform policies trained with human supervision alone, while requiring significantly less human intervention.
- 2:11-2:23: TerrainNet: Visual Modeling of Complex Terrains for High-speed, Off-road Navigation, Xiangyun Meng
Effective use of camera-based vision systems is essential for robust performance in autonomous off-road driving, particularly in the high-speed regime. Despite success in structured, on-road settings, current end-to-end approaches for scene prediction have yet to be successfully adapted for complex outdoor terrain. To this end, we present TerrainNet, a vision-based terrain perception system for semantic and geometric terrain prediction for aggressive, off-road navigation. The approach relies on several key insights and practical considerations for achieving reliable terrain modeling. The network includes a multi-headed output representation to capture fine- and coarse-grained terrain features necessary for estimating traversability. Accurate depth estimation is achieved using self-supervised depth completion with multi-view RGB and stereo inputs. Requirements for real-time performance and fast inference speeds are met using efficient, learned image feature projections. Furthermore, the model is trained on a large-scale, real-world off-road dataset collected across a variety of diverse outdoor environments. We show how TerrainNet can also be used for costmap prediction and provide a detailed framework for integration into a planning module. We demonstrate the performance of TerrainNet through extensive comparison to current state-of-the-art baselines for camera-only scene prediction. Finally, we showcase the effectiveness of integrating TerrainNet within a complete autonomous-driving stack by conducting a real-world vehicle test in a challenging off-road scenario.
- 2:23-2:35: DYNAMO-GRASP: DYNAMics-aware Optimization for GRASP Point Detection in Suction Grippers, Boling Yang
In this research, we introduce a novel approach to the challenge of suction grasp point detection. Our method, exploiting the strengths of physics-based simulation and data-driven modeling, accounts for object dynamics during the grasping process, markedly enhancing the robot’s capability to handle previously unseen objects and scenarios in real-world settings. We benchmark DYNAMO-GRASP against established approaches via comprehensive evaluations in both simulated and real-world environments. DYNAMO-GRASP delivers improved grasping performance with greater consistency in both simulated and real-world settings. Remarkably, in real-world tests with challenging scenarios, our method demonstrates a success rate improvement of up to 48% over SOTA methods. Demonstrating a strong ability to adapt to complex and unexpected object dynamics, our method offers robust generalization to real-world challenges. The results of this research set the stage for more reliable and resilient robotic manipulation in intricate real-world situations.
- 2:40-2:45: Introduction and Overview, Niloofar Mireshghallah
- 2:45-3:05: Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory, Niloofar Mireshghallah
The interactive use of large language models (LLMs) in AI assistants (at work, home, etc.) introduces a new set of inference-time privacy risks: LLMs are fed different types of information from multiple sources in their inputs and we expect them to reason about what to share in their outputs, for what purpose and with whom, in a given context. In this work, we draw attention to the highly critical yet overlooked notion of contextual privacy by proposing ConfAIde, a benchmark designed to identify critical weaknesses in the privacy reasoning capabilities of instruction-tuned LLMs. Our experiments show that even the most capable models such as GPT-4 and ChatGPT reveal private information in contexts that humans would not, 39% and 57% of the time, respectively. This leakage persists even when we employ privacy-inducing prompts or chain-of-thought reasoning. Our work underscores the immediate need to explore novel inference-time privacy-preserving approaches, based on reasoning and theory of mind.
- 3:05-3:25: Self-reflective Language Models with Retrieval, Akari Asai
Despite their remarkable capabilities, large language models (LLMs) often produce responses containing factual inaccuracies due to their sole reliance on the parametric knowledge they encapsulate. Retrieval-Augmented Generation (RAG), an ad hoc approach that augments LMs with retrieval of relevant knowledge, decreases such issues. However, indiscriminately retrieving and incorporating a fixed number of retrieved passages, regardless of whether retrieval is necessary, or passages are relevant, diminishes LM versatility or can lead to unhelpful response generation. We introduce a new framework called Self-Reflective Retrieval-Augmented Generation (Self-RAG) that enhances an LM’s quality and factuality through retrieval and self-reflection. Our framework trains a single arbitrary LM that adaptively retrieves passages on-demand, and generates and reflects on retrieved passages and its own generations using special tokens, called reflection tokens. Generating reflection tokens makes the LM controllable during the inference phase, enabling it to tailor its behavior to diverse task requirements. Experiments show that Self-RAG (7B and 13B parameters) significantly outperforms state-of-the-art LLMs and retrieval-augmented models on a diverse set of tasks. Specifically, Self-RAG outperforms ChatGPT and retrieval-augmented Llama2-chat on Open-domain QA, reasoning and fact verification tasks, and it shows significant gains in improving factuality and citation accuracy for long-form generations relative to these models.
- 3:25-3:45: QLoRA: Finetuning ChatGPT-quality Large Language Models on Your Personal Desktop Computer, Tim Dettmers
Finetuning raw large language models into chatbots is very expensive, with the largest Llama 2 model requiring 840 GB of GPU memory or 17 GPUs to be fine-tuned. In this talk, I present QLoRA, which compresses the large language model (LLM) to 4-bit before finetuning. QLoRA maintains 16-bit chatbot quality while only requiring a single GPU for finetuning. This enables chatbots that rival ChatGPT and only require a $3,000 desktop computer for development. The talk will be self-contained, providing the basics to understand the QLoRA approach. In the talk, I will highlight which factors are important to create high-quality chatbots.
- 2:40-2:45: Introduction and Overview, Anat Caspi
- 2:45-2:57: Tools to support Accessibility-first Intelligent Transportation Agenda, Anat Caspi
Let's discuss some fundamental gaps in equitable and accessible intelligent transportation systems. Standardized Transportation data production and consumption, along with data tooling ecosystems to support producers and consumers, can improve outcomes for all travelers. A comprehensive Accessibility-first intelligent transportation agenda can help data producers address common problems: Tooling, quality assurance and best practices to collect, vet, maintain and disseminate Multimodal Accessibility-forward Transportation Data at scale.
- 2:57-3:09: The Transportation Data Equity Initiative: an open data sharing platform for resource-constrained transportation data streams, Suresh Devalapalli
The Transportation Data Equity Initiative (TDEI) is a project conducted at the Paul G. Allen School and sponsored by the ITS4US Program of the US Department of Transportation. We improve equitable access in travel services, travel environments and in transportation data. We build open-source data collection and vetting tools, transportation data digital infrastructure, and governance frameworks that enable public-private transportation data sharing and interoperability.
- 3:09-3:21: AccessMap Multimodal, Kunal Mehta / Wisam Yasen
Trip planners have significantly improved the lives of many travelers, but they often neglect the diverse needs of pedestrians and public transit users. AccessMap Multimodal integrates detailed transit, pedestrian and transit station path data to create a seamless travel experience for travelers including those with mobility constraints and vision disabilities.
- 3:21-3:33: OpenSidewalks, Ricky Zhang
Many transportation decisions can be improved on the basis of accurate pedestrian network data, including sidewalk/crossing interactions, and connectivity with the road networks of other modes of travel. A connected pedestrian path network is vital to transportation activities, as sidewalks and crossings connect pedestrians to other modes of transportation. However, information about these paths' location and connectivity is often missing or inaccurate in city planning systems and wayfinding applications, causing severe information gaps and errors for planners and pedestrians. Our work introduces machine learning methods to perform this collection at scale, creating open, shareable pedestrian graph data. We demonstrate the efficacy of the method in collecting data in six U.S. counties, and discuss downstream implications of this work.
- 3:33-3:45: “I never realized sidewalks were a big deal”: A Case Study of a Community-Driven Sidewalk Accessibility Assessment using Project Sidewalk, Chu Li
In this project, we examine opportunities for community-driven digital civics for sidewalk accessibility assessment through a deployment study of a crowdsourcing tool called Project Sidewalk in Oradell, New Jersey. We explore Project Sidewalk’s potential as a platform for civic learning and service, specifically assessing whether it could be an effective tool for youth to learn about urban accessibility, disability, and human mobility. As part of this study, we designed introductory materials, facilitated hybrid mapathons, and conducted post-study interviews with Scout members, educators, healthcare professionals, and other community members to gather insights about the project and learning experiences. Our findings demonstrate that community-driven digital civics can support accessibility advocacy and education, raise community awareness, and drive pro-social behavioral change in participants with respect to urban accessibility.
- 2:40-2:45: Introduction and Overview, Su-In Lee
- 2:45-3:00: GigaPath: real-world pathology foundation model with long-context, multimodal learning from a billion tissue patches, Hanwen Xu
There has been rising interest in pretraining pathology foundation models from whole-slide images, but prior work tends to focus on public images with limited availability of patient information. Pretraining also poses challenges given the sheer size of the data, as a gigapixel whole-slide image may comprise 70,121 tissue patches. Prior pathology models often resort to subsampling a small portion, thus missing out on global context in complex tasks. We introduce GigaPath, a pathology foundation model pretrained on diverse, multimodal real-world data from a large U.S. health network. GigaPath adopts two-stage curriculum learning, combining a tile encoder trained with DINO on 1 billion tissue patches and a slide encoder trained with a masked autoencoder on 171,189 whole slides. To assess GigaPath in real-world applications, we leverage the associated longitudinal patient records that provide disease diagnosis, tumor genomics, and survival, and construct a comprehensive multimodal pathology benchmark. We show that GigaPath can significantly increase predictive accuracy for 33 clinical tasks.
- 3:00-3:15: A deep generative model for the analysis of single-cell methylomic data, Ethan Weinberger
Single-cell DNA methylome profiling platforms based on bisulfite sequencing techniques promise to enable the exploration of epigenomic heterogeneity at an unprecedented resolution. However, substantial noise resulting from technical limitations of these platforms can impede downstream analyses of the data. Here we present methylVI, a deep generative model that learns probabilistic representations of single-cell methylation data which explicitly account for the unique characteristics of bisulfite-sequencing-derived methylomic data. After initially validating the quality of our model's fit, we demonstrate how methylVI can facilitate common downstream analysis tasks, including integrating data collected using different sequencing platforms and producing denoised methylome profiles. Our implementation of methylVI is publicly available at https://github.com/suinleelab/methylVI.
- 3:15-3:30: Developing and Validating Beacon: A Portable Device for Self-Administering a Measure of Critical Flicker Frequency, Richard Li
Beacon is a device to enable at-home self-measurement of cognitive function by patients with cirrhosis via critical flicker frequency. We reflect on our multi-year journey taking Beacon from an initial proof-of-concept prototype to a platform supporting at-home measurement. We share our experiences and perspectives on iteratively refining a hardware-software platform through multiple clinical validation studies. Specifically, we share findings from a study with 153 patients, both validating Beacon against a gold standard device and establishing that patients considered Beacon much more usable. We then report results from an at-home study with 15 patients, showing successful patient self-measurement over a 6-week deployment and finding stability in gathered measurements. We aim to both shed light on aspects of translation that are often unreported in early publications and to encourage greater dissemination from the community of HCI researchers interested in translating their research innovations toward clinical practice.
- 3:30-3:45: Thermal Earring: Low-power Wireless Earring for Longitudinal Earlobe Temperature Sensing, Shirley Xue
We present Thermal Earring, a first-of-its-kind wireless smart earring that enables a reliable wearable solution for continuous temperature monitoring. We develop a hardware prototype in the form factor of real earrings, measuring a maximum width of 11.3 mm and a length of 31 mm, weighing 335 mg, and with a battery life of one month. We investigated real-world use cases for earlobe temperature by gathering data from 5 febrile patients and 20 healthy participants, and demonstrated Thermal Earring's ability to detect fever. Further, we observed in our user testing that the relative change in earlobe temperature can identify activities such as eating and exercise, as well as stressful events such as public speaking and exams. Rather than attempting to convert earlobe temperature into core body temperature, which generally remains around 37 °C (98.6 °F) except during fever, we focused on exploring novel applications based on relative changes in earlobe temperature within everyday contexts.
- 3:50-3:55: Introduction and Overview, Theano Stavrinos
- 3:55-4:15: Punica: Serving multiple LoRA finetuned LLMs at the cost of one, Lequn Chen
Low-rank adaptation (LoRA) has become an important and popular method to adapt pre-trained models to specific domains. We present Punica, a system to serve multiple LoRA models in a shared GPU cluster. Punica contains a new CUDA kernel design that allows batching of GPU operations for different LoRA models. This allows a GPU to hold only a single copy of the underlying pre-trained model when serving multiple, different LoRA models, significantly enhancing GPU efficiency in terms of both memory and computation. Our scheduler consolidates multi-tenant LoRA serving workloads in a shared GPU cluster. With a fixed-size GPU cluster, our evaluations show that Punica achieves 12x higher throughput in serving multiple LoRA models compared to state-of-the-art LLM serving systems while adding only 2 ms of latency per token. Punica is open source at https://github.com/punica-ai/punica
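The sharing idea in this abstract can be illustrated with a small sketch. It assumes the standard LoRA formulation, in which each adapter adds a low-rank update so that the output is y = W x + B (A x). It is not Punica's CUDA kernel or scheduler: the function names, adapter ids, and toy matrices below are all hypothetical, and the loop only shows that one copy of the base weight W can serve requests for different adapters in the same batch.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def batched_lora_forward(
    base_w: Matrix,
    adapters: Dict[str, Tuple[Matrix, Matrix]],
    batch: List[Tuple[str, Vector]],
) -> List[Vector]:
    """Compute y = W x + B (A x) for each request in a mixed batch.

    base_w is stored once and shared by every request; only the small
    low-rank pair (A, B) differs from adapter to adapter.
    """
    outputs = []
    for adapter_id, x in batch:
        a, b = adapters[adapter_id]
        base_out = matvec(base_w, x)      # shared base-model work
        delta = matvec(b, matvec(a, x))   # per-adapter low-rank update
        outputs.append([p + q for p, q in zip(base_out, delta)])
    return outputs


# Toy 2x2 base weight and two rank-1 adapters (hypothetical values).
BASE_W = [[1.0, 0.0], [0.0, 1.0]]
ADAPTERS = {
    "legal": ([[1.0, 1.0]], [[0.5], [0.0]]),     # A is 1x2, B is 2x1
    "medical": ([[1.0, -1.0]], [[0.0], [2.0]]),
}
BATCH = [("legal", [2.0, 4.0]), ("medical", [2.0, 4.0])]
RESULT = batched_lora_forward(BASE_W, ADAPTERS, BATCH)
print(RESULT)
```

Both requests use the same input and the same base weight, yet produce different outputs because each applies its own adapter. A real serving system would fuse this per-request work into batched GPU operations rather than looping in Python.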
- 4:15-4:35: Respect the ORIGIN! A Best-case Evaluation of Connection Coalescing, Sudheesh Singanamalla
Connection coalescing, enabled by HTTP/2, permits a client to use an existing connection to request additional resources at the connected hostname. The potential for requests to be coalesced is hindered by the practice of domain sharding introduced by HTTP/1.1, because subresources are scattered across subdomains in an effort to improve performance with additional connections. When this happens, HTTP/2 clients invoke additional DNS queries and new connections to retrieve content that is available at the same server. ORIGIN Frames is an HTTP/2 extension standardized by the IETF in 2018 that web servers can use to give explicit indications to the client about the domains that are reachable on the connection. However, no server implementation of ORIGIN Frames exists and only one browser supports them. In this talk, I describe the work behind collecting and characterizing a large dataset of Internet scans, and how we use it to model connection coalescing and identify a least-effort set of certificate changes that maximize opportunities for clients to coalesce. In collaboration with a partner CDN to reissue certificates, the talk shows results from building and deploying ORIGIN frame support globally at scale, evaluating and validating our modelling with both passive and active measurement of 5,000 real-world domains.
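The client-side reuse decision described above can be sketched roughly as follows. This is a simplified model, not any browser's implementation: the `Connection` class and function names are hypothetical, ORIGIN frame entries are reduced to bare hostnames (the real frame carries full origins), and certificate matching handles only exact names and single-label wildcards.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Connection:
    """Hypothetical view of an open HTTP/2 connection."""
    server_ip: str
    cert_names: Set[str]                                # subjectAltName entries
    origin_set: Set[str] = field(default_factory=set)   # hosts from ORIGIN frames


def cert_covers(cert_names: Set[str], host: str) -> bool:
    """Exact match, or a single-label wildcard such as *.example.com."""
    if host in cert_names:
        return True
    _, _, parent = host.partition(".")
    return bool(parent) and f"*.{parent}" in cert_names


def can_coalesce(conn: Connection, host: str, resolved_ips: Set[str]) -> bool:
    """Return True if a request for host may reuse conn."""
    # The server must be authoritative for the host per its certificate.
    if not cert_covers(conn.cert_names, host):
        return False
    # An ORIGIN frame entry lets the client skip the DNS comparison.
    if host in conn.origin_set:
        return True
    # Otherwise the host must resolve to the address already connected to.
    return conn.server_ip in resolved_ips


conn = Connection(
    server_ip="192.0.2.10",
    cert_names={"www.example.com", "*.example.com"},
    origin_set={"static.example.com"},
)
print(can_coalesce(conn, "static.example.com", set()))           # ORIGIN entry
print(can_coalesce(conn, "img.example.com", {"192.0.2.10"}))     # same IP
print(can_coalesce(conn, "img.example.com", {"198.51.100.7"}))   # different IP
print(can_coalesce(conn, "cdn.example.net", {"192.0.2.10"}))     # not in cert
```

Under this model, a sharded subdomain that is missing from the certificate can never be coalesced, which is why certificate changes are the lever the abstract focuses on.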
- 4:35-4:55: Dissecting Overheads of Service Mesh Sidecars, Xiangfeng Zhu
Service meshes play a central role in the modern application ecosystem by providing an easy and flexible way to connect microservices of a distributed application. However, because of how they interpose on application traffic, they can substantially increase application latency and its resource consumption. We develop a tool called MeshInsight to help developers quantify the overhead of service meshes in deployment scenarios of interest and make informed trade-offs about their functionality vs. overhead. Using MeshInsight, we confirm that service meshes can have high overhead (up to 269% higher latency and up to 163% more virtual CPU cores for our benchmark applications), but the severity is intimately tied to how they are configured and the application workload. IPC (inter-process communication) and socket writes dominate when the service mesh operates as a TCP proxy, but protocol parsing dominates when it operates as an HTTP proxy. MeshInsight also enables us to study the end-to-end impact of optimizations to service meshes. We show that not all seemingly promising optimizations lead to a notable overhead reduction in realistic settings.
- 3:50-3:55: Introduction and Overview, Ira Kemelmacher-Shlizerman
- 3:55-4:10: Animating Street View, Mengyi Shan
We present a system that automatically brings street view imagery to life by populating it with naturally behaving, animated pedestrians and vehicles. Our approach is to remove existing people and vehicles from the input image, insert moving objects with proper scale, angle, motion, and appearance, plan paths and traffic behavior, and render the scene with plausible occlusion and shadowing effects. The system achieves this by reconstructing the still image street scene, simulating crowd behavior, and rendering with consistent lighting, visibility, occlusions, and shadows. We demonstrate results on a diverse range of street scenes, including regular still images and panoramas.
- 4:10-4:25: HRTF Estimation in the Wild, Vivek Jayaram
Head Related Transfer Functions (HRTFs) play a crucial role in creating immersive spatial audio experiences. However, HRTFs differ significantly from person to person, and traditional methods for estimating personalized HRTFs are expensive, time-consuming, and require specialized equipment. In this paper we present a new method to measure a listener's personalized HRTF using only headphones and sounds in their environment.
- 4:25-4:40: TryOnDiffusion: A Tale of Two UNets, Luyang Zhu
Given two images depicting a person and a garment worn by another person, our goal is to generate a visualization of how the garment might look on the input person. A key challenge is to synthesize a photorealistic, detail-preserving visualization of the garment while warping the garment to accommodate a significant body pose and shape change across the subjects. Previous methods either focus on garment detail preservation without effective pose and shape variation, or allow try-on with the desired shape and pose but lack garment details. In this paper, we propose a diffusion-based architecture that unifies two UNets (referred to as Parallel-UNet), which allows us to preserve garment details and warp the garment for significant pose and body change in a single network. The key ideas behind Parallel-UNet are: 1) the garment is warped implicitly via a cross-attention mechanism, and 2) garment warp and person blend happen as part of a unified process as opposed to a sequence of two separate tasks. Experimental results indicate that TryOnDiffusion achieves state-of-the-art performance both qualitatively and quantitatively.
- 4:40-4:55: DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion, Johanna Karras
In this work, we present DreamPose, a diffusion-based method for generating realistic animated fashion videos from still images. Given an image and a sequence of human body poses, our method synthesizes a video containing both human and fabric motion. To achieve this, we transform a pretrained text-to-image model (Stable Diffusion) into a pose-and-image guided video synthesis model, using a novel fine-tuning strategy, a set of architectural changes to support the added conditioning signals, and techniques to encourage temporal consistency. We fine-tune on a collection of fashion videos from the UBC Fashion dataset. We evaluate our method on a variety of clothing styles and poses, and demonstrate that our method produces state-of-the-art results on fashion video animation.
- 3:50-3:55: Introduction and Overview, Vikram Iyer
- 3:55-4:10: MilliMobile: An Autonomous Battery-free Wireless Microrobot, Zachary Englhardt
We present MilliMobile: a first-of-its-kind battery-free autonomous robot capable of operating on harvested solar and RF power. We challenge the conventional assumption that motion and actuation are beyond the capabilities of battery-free devices and demonstrate completely untethered autonomous operation in realistic indoor and outdoor lighting as well as RF power delivery scenarios. We show first that through miniaturizing a robot to gram scale, we can significantly reduce the energy required to move it. Second, we develop methods to produce intermittent motion by discharging a small capacitor (47-150 μF) to move a motor in discrete steps, enabling motion from as little as 50 μW of power or less. In addition to operating on harvested power, our robot demonstrates sensor and control autonomy by seeking light using onboard photodiodes, and can transmit sensor data wirelessly to a base station over 200 m away.
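The intermittent-motion idea above comes down to simple capacitor arithmetic, sketched below. The capacitance (47 μF) and harvested power (50 μW) are taken from the abstract, while the 3.0 V and 2.0 V charge and discharge thresholds are assumptions chosen only for illustration. The sketch ignores leakage and converter losses, so it gives an idealized lower bound on the wait between steps, not a figure reported by the authors.

```python
def usable_energy_joules(capacitance_f: float, v_start: float, v_stop: float) -> float:
    """Energy released when a capacitor discharges from v_start to v_stop.

    Uses E = 1/2 * C * (v_start^2 - v_stop^2).
    """
    return 0.5 * capacitance_f * (v_start ** 2 - v_stop ** 2)


def recharge_seconds(energy_j: float, harvest_power_w: float) -> float:
    """Ideal time to harvest energy_j at a constant harvest_power_w."""
    return energy_j / harvest_power_w


CAPACITANCE_F = 47e-6      # 47 uF, lower end of the range in the abstract
HARVEST_POWER_W = 50e-6    # 50 uW, from the abstract
V_START = 3.0              # assumed voltage when a motor step fires
V_STOP = 2.0               # assumed voltage after the step

step_energy = usable_energy_joules(CAPACITANCE_F, V_START, V_STOP)
wait = recharge_seconds(step_energy, HARVEST_POWER_W)
print(f"energy per step: {step_energy * 1e6:.1f} uJ, recharge time: {wait:.2f} s")
```

With these assumed thresholds, each step releases roughly a hundred microjoules and takes a few seconds to recharge, which is consistent with motion in discrete steps rather than continuous driving.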
- 4:10-4:25: Solar-powered shape-changing origami microfliers, Vicente Arroyos
Using wind to disperse microfliers that fall like seeds and leaves can help automate large-scale sensor deployments. Here, we present battery-free microfliers that can change shape in mid-air to vary their dispersal distance. We design origami microfliers using bi-stable leaf-out structures and uncover an important property: a simple change in the shape of these origami structures causes two dramatically different falling behaviors. When unfolded and flat, the microfliers exhibit a tumbling behavior that increases lateral displacement in the wind. When folded inward, their orientation is stabilized, resulting in a downward descent that is less influenced by wind. To electronically transition between these two shapes, we designed a low-power electromagnetic actuator that produces peak forces of up to 200 millinewtons within 25 milliseconds while powered by solar cells. We fabricated a circuit directly on the folded origami structure that includes a programmable microcontroller, Bluetooth radio, solar power harvesting circuit, a pressure sensor to estimate altitude and a temperature sensor. Outdoor evaluations show that our 414 milligram origami microfliers are able to electronically change their shape mid-air, travel up to 98 meters in a light breeze, and wirelessly transmit data via Bluetooth up to 60 meters away, using only power collected from the sun.
- 4:25-4:40: Computational Design of Dense Servers for Immersion Cooling, Milin Kodnongbua
The growing demands for computational power in cloud computing have led to a significant increase in the deployment of high-performance servers, necessitating more efficient cooling solutions such as immersion liquid cooling. Its superior heat exchange capabilities eliminate the need for bulky heatsinks and fans, paving the way for innovative server designs aimed at maximizing density. In this work, we present a computational framework to explore designs of servers in three-dimensional space, specifically targeting the maximization of server density within immersion cooling tanks. We show that our optimized server designs reduce volume utilization by approximately 20% relative to traditional flat server designs. This increased density not only optimizes data center floor space usage but also significantly reduces cooling costs, marking a pivotal step forward in sustainable and efficient data center management.
- 4:40-4:55: One Step toward Sustainable Computing: Recyclable Printed Circuit Board for Circular Electronics, Zhihan Zhang
Electronics are integral to modern life; however, at their end-of-life these devices produce environmentally hazardous electronic waste (e-waste). Recycling the ubiquitous printed circuit boards (PCBs) that make up a substantial mass and volume fraction of e-waste is challenging due to their use of irreversibly cured thermoset epoxies. We present a PCB formulation using transesterification vitrimers (vPCBs), and an end-to-end fabrication process compatible with standard manufacturing ecosystems. We create functional prototypes of IoT devices transmitting 2.4 GHz radio signals on vPCBs with electrical and mechanical properties meeting industry standards. Fractures and holes in vPCBs can be repaired while retaining comparable performance over more than four repair cycles. We further demonstrate non-destructive decomposition of transesterification vitrimer composites with solid inclusions and metal attachments by polymer swelling with small-molecule solvents. We hypothesize that, unlike traditional solvolysis recycling, swelling does not degrade the materials. Through dynamic mechanical analysis, we find negligible catalyst loss, minimal changes in storage modulus, and equivalent polymer backbone composition across multiple recycling cycles. We achieve 98% polymer recovery, 100% fiber recovery, and 91% solvent recovery, and we reuse the recovered materials to create new vPCBs without degraded performance. Our cradle-to-cradle life-cycle assessment shows substantial environmental impact reduction over conventional PCBs in 11 categories.