Research Projects | Paul G. Allen School of Computer Science & Engineering

Contact

Shyam Gollakota

CSE 568

gshyam

cs.washington.edu

Computer Systems & Networking, Machine Learning, Robotics, Wireless & Sensor Systems

Areas of interest:

Computational health, AI for sound, networks, bio-robotics, wireless, mobile and ubiquitous computing, sensing, security and privacy

Target conversation extraction

Extracting the speech of participants in a conversation amidst interfering speakers and noise presents a challenging problem. In this paper, we introduce the novel task of target conversation extraction, where the goal is to extract the audio of a target conversation based on the speaker embedding of one of its participants. To accomplish this, we propose leveraging temporal patterns inherent in human conversations, particularly turn-taking dynamics, which uniquely characterize speakers engaged in conversation and distinguish them from interfering speakers and noise.

Knowledge boosting: Model collaboration across devices for low-latency applications

Models for low-latency, streaming applications could benefit from the knowledge capacity of larger models, but edge devices cannot run these models due to resource constraints. A possible solution is to transfer hints during inference from a large model running remotely to a small model running on-device. However, this incurs a communication delay that breaks real-time requirements and does not guarantee that both models will operate on the same data at the same time.

Target speech hearing

Want to hear a friend in a noisy café? We designed deep learning-based headphones that let you isolate the speech from a specific person just by *looking* at them for a few seconds. CHI'24 honorable mention award.

Semantic hearing: Programming acoustic scenes in real-time using hearables

Imagine being able to listen to the birds chirping in a park without hearing the chatter from other hikers, or being able to block out traffic noise on a busy street while still being able to hear emergency sirens and car honks. We introduce semantic hearing, a novel capability for hearable devices that enables them to, in real-time, focus on, or ignore, specific sounds from real-world environments, while also preserving the spatial cues.

Battery-free origami microfliers

Using wind to disperse microfliers that fall like seeds and leaves can help automate large-scale sensor deployments. Here, we present battery-free microfliers that can change shape in mid-air to vary their dispersal distance. We design origami microfliers using bi-stable leaf-out structures and uncover an important property: a simple change in the shape of these origami structures causes two dramatically different falling behaviors. When unfolded and flat, the microfliers exhibit a tumbling behavior that increases lateral displacement in the wind.

Speech separation and 2D localization using distributed microphone arrays

Imagine being in a crowded room with a cacophony of speakers and having the ability to focus on or remove speech from a specific 2D region. This would require understanding and manipulating an acoustic scene, isolating each speaker, and associating a 2D spatial context with each constituent speech. However, separating speech from a large number of concurrent speakers in a room into individual streams and identifying their precise 2D locations is challenging, even for the human brain.

Bringing underwater GPS to smart devices

The emergence of waterproof mobile and wearable devices (e.g., Garmin Descent and Apple Watch Ultra) designed for underwater activities like professional scuba diving opens up opportunities for underwater networking and localization capabilities on these devices. Here, we present the first underwater acoustic positioning system for smart devices. Unlike conventional systems that use floating buoys as anchors at known locations, we design a system where a dive leader can compute the relative positions of all other divers, without any external infrastructure.
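As a rough illustration of how relative positions can be recovered without anchors at known locations, the sketch below places three devices in a plane using only their pairwise distances. It is a minimal example under stated assumptions, not the system's actual algorithm: the function name `place_three`, the 2D setting, and the availability of all three pairwise distances are hypothetical choices made for illustration.

```python
import math


def place_three(d_la, d_lb, d_ab):
    """Place leader L, diver A and diver B in 2D from pairwise distances.

    Hypothetical helper for illustration only. The leader is fixed at the
    origin and diver A on the positive x-axis, so the result is a *relative*
    layout: no external anchor or absolute coordinate is needed.
    """
    leader = (0.0, 0.0)
    a = (d_la, 0.0)
    # Law of cosines gives B's x-coordinate in the leader's frame.
    x = (d_la ** 2 + d_lb ** 2 - d_ab ** 2) / (2 * d_la)
    # Clamp at zero so small measurement noise cannot produce sqrt(negative).
    y = math.sqrt(max(d_lb ** 2 - x ** 2, 0.0))
    # Distances alone cannot tell +y from -y (mirror ambiguity); +y is chosen.
    return leader, a, (x, y)


leader, diver_a, diver_b = place_three(3.0, 4.0, 5.0)
```

The layout is only defined up to rotation and reflection, which is why the sketch pins the leader and one diver to fixed axes and picks one of the two mirror solutions arbitrarily.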

Waveformer: Real-time target sound extraction

We present the first neural network model to achieve real-time and streaming target sound extraction. To accomplish this, we propose Waveformer, an encoder-decoder architecture with a stack of dilated causal convolution layers as the encoder, and a transformer decoder layer as the decoder. This hybrid architecture uses dilated causal convolutions to process large receptive fields in a computationally efficient manner, while retaining the performance benefits of transformer-based architectures.
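To make the efficiency argument concrete, the sketch below shows why a stack of dilated causal convolutions covers a large receptive field with few layers. It is a minimal pure-Python illustration, not Waveformer's implementation: the kernel size of 3, the ten layers, and the doubling dilation schedule are assumptions chosen for the example, and the function names are hypothetical.

```python
def receptive_field(kernel_size, dilations):
    """Number of input samples that a single output sample depends on."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def causal_dilated_conv(x, weights, dilation):
    """y[t] = sum_k weights[k] * x[t - k * dilation], treating x[<0] as zero.

    Causal: y[t] never reads samples after t, which is what allows
    streaming inference.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


# Assumed configuration for illustration: 10 layers, dilation doubling per layer.
dilations = [2 ** i for i in range(10)]
rf = receptive_field(3, dilations)  # 1 + 2 * 1023 = 2047 samples
```

With doubling dilations the receptive field grows exponentially in the number of layers, while the per-sample cost grows only linearly.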

New-born hearing screening using smartphones

The World Health Organization estimates that the vast majority of people affected by hearing loss are in low- and middle-income countries. Hearing loss is particularly harmful for neuro-development if it is left undetected in early childhood. As a result, it is common practice in high-income countries (e.g., USA) to adopt guidelines for universal infant hearing screening and require newborn hearing tests for all babies born in the hospital.

How to send underwater messages using smartphones

Since its inception, underwater digital acoustic communication has required custom hardware that neither has the economies of scale nor is pervasive. We present the first acoustic system that brings underwater messaging capabilities to existing mobile devices like smartphones and smartwatches. Our software-only solution leverages audio sensors, i.e., microphones and speakers, ubiquitous in today's devices to enable acoustic underwater communication between mobile devices.

AI for earbuds: Wireless binaural earbuds for speech enhancement

We present ClearBuds, a state-of-the-art hardware and software system for real-time speech enhancement. Our neural network runs completely on an iPhone, allowing you to suppress unwanted noises while taking phone calls on the go. ClearBuds bridges state-of-the-art deep learning for blind audio source separation and in-ear mobile systems by making two key technical contributions: 1) a new wireless earbud design capable of operating as a synchronized, binaural microphone array, and 2) a lightweight dual-channel speech enhancement neural network that runs on a mobile device.

Middle ear disease diagnosis using earables

Middle ear disorders are one of the most common causes of preventable hearing loss. Unfortunately, while the developing world bears a disproportionate burden of these disorders, it often lacks access to diagnostic tools.

Wind dispersal of battery-free devices for planetary-scale environmental sensing

Plants cover a large fraction of the Earth’s land mass despite most species having limited to no mobility. Many plants have evolved mechanisms to disperse their seeds using the wind. A dandelion seed, for example, can travel as far as a kilometer in dry, windy, and warm conditions. Inspired by this, we demonstrate wind dispersal of battery-free wireless sensing devices. Our millimeter-scale devices are designed on a flexible substrate using programmable, off-the-shelf parts to enable scalability and flexibility for various sensing and computing applications.

Laser speckle using smartphone LiDAR

We present the first system to determine fluid properties using the LiDAR sensors present on modern smartphones. Traditional methods of measuring properties like viscosity require expensive laboratory equipment or a relatively large amount of fluid. In contrast, our smartphone-based method is accessible, contactless and works with just a single drop of liquid. Our design works by targeting a coherent LiDAR beam from the phone onto the liquid.

Blood coagulation testing using smartphones

Frequent blood clot testing is critical for millions of people on lifelong anticoagulation with warfarin. Currently, testing is performed in hospital laboratories or with expensive point-of-care devices, which limits the ability to test frequently and affordably. We report a proof-of-concept blood clot testing system that uses the vibration motor and camera on smartphones to track micro-mechanical movements of a copper particle.

AI for contactless cardiology

Cardiology as a field has seen phenomenal technological advances over the last few decades. A number of tools in cardiology, however, require sensors and/or electrodes on the human body to capture the cardiac signals and diagnose patients. In this project, we aim to create medical tools that use smartphones and smart speakers to contactlessly detect cardiac conditions. We present a proof-of-concept system for acquiring individual heart beats using smart speakers in a fully contact-free manner.

Deep learning for directional hearing

On-device directional hearing requires audio source separation from a given direction while achieving stringent human-imperceptible latency requirements. While neural networks can achieve significantly better performance than traditional beamformers, all existing models fall short of supporting low-latency causal inference on computationally-constrained wearables. We present DeepBeam, a hybrid model that combines traditional beamformers with a custom lightweight neural network.
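For readers unfamiliar with traditional beamformers, the sketch below shows delay-and-sum, one classic example of the family. It is offered only as background: the text does not say which beamformer DeepBeam builds on, and the function name, integer sample delays, and two-microphone setup are assumptions made for illustration.

```python
def delay_and_sum(channels, delays):
    """Steer a microphone array by aligning and averaging its channels.

    channels: list of equal-rate sample lists, one per microphone.
    delays:   integer sample delay of the target direction at each microphone
              (assumed known here; in practice derived from array geometry).
    """
    # Keep only the samples that every channel can supply after shifting.
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[t + d] for ch, d in zip(channels, delays)) / len(channels)
        for t in range(n)
    ]


# A source that reaches the second microphone two samples later than the first.
source = [1.0, -2.0, 3.0, 0.5]
mic0 = source + [0.0, 0.0]
mic1 = [0.0, 0.0] + source
steered = delay_and_sum([mic0, mic1], [0, 2])
```

Sound arriving from the steered direction adds coherently after alignment, while sound from other directions is misaligned and partially cancels in the average.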

Battery-free gesture recognition

Existing gesture-recognition systems consume significant power and computational resources that limit how they may be used in low-end devices. We introduce AllSee, the first gesture-recognition system that can operate on a range of computing devices including those with no batteries. AllSee consumes three to four orders of magnitude lower power than state-of-the-art systems and can enable always-on gesture recognition for smartphones and tablets.

Ambient Backscatter

As computing devices become smaller and more numerous, powering them becomes more difficult; wires are often not feasible, and batteries add weight, bulk, and cost, and require recharging or replacement that is impractical at large scales. Ambient backscatter communication solves this problem by having devices leverage existing TV and cellular transmissions rather than generate their own radio waves. This novel technique enables ubiquitous communication where devices can communicate among themselves at unprecedented scales and in locations that were previously inaccessible.

Wi-Fi gesture recognition

WiSee is a novel interaction interface that leverages ongoing wireless transmissions in the environment (e.g., WiFi) to enable whole-home sensing and recognition of human gestures. Since wireless signals do not require line-of-sight and can traverse through walls, WiSee can enable whole-home gesture recognition using few wireless sources (e.g., a Wi-Fi router and a few mobile devices in the living room).

Random Access MIMO Networks

This project presents 802.11n+, the first fully distributed random access protocol that allows nodes to contend not just for time, but also for the concurrent transmissions supported by multiple antennae. In n+, even when the medium is occupied, nodes with more antennae can transmit concurrently without harming ongoing transmissions. Furthermore, such nodes can contend for the medium in a fully distributed way. Our testbed evaluation shows that even for a small network with three competing node pairs, the resulting system roughly doubles the average network throughput.

Password-Free Wireless Security

This project presents tamper-evident pairing, the first wireless pairing protocol that works in-band, with no pre-shared keys, and protects against MITM attacks. The main innovation is a new key exchange message constructed in a manner that ensures an adversary can neither hide the fact that a message was transmitted, nor alter its payload without being detected. Thus, any attempt by an adversary to interfere with the key exchange translates into the pairing devices detecting either invalid pairing messages or an unacceptable increase in the number of such messages.

Securing Medical Implants

Wireless communication has become an intrinsic part of modern implantable medical devices (IMDs). Recent work, however, has demonstrated that wireless connectivity can be exploited to compromise the confidentiality of IMDs’ transmitted data or to send unauthorized commands to IMDs—even commands that cause the device to deliver an electric shock to the patient. The key challenge in addressing these attacks stems from the difficulty of modifying or replacing already-implanted IMDs.