Extracting the speech of participants in a conversation amidst interfering speakers and noise presents a challenging problem. In this paper, we introduce the novel task of target conversation extraction, where the goal is to extract the audio of a target conversation based on the speaker embedding of one of its participants. To accomplish this, we propose leveraging temporal patterns inherent in human conversations, particularly turn-taking dynamics, which uniquely characterize speakers engaged in conversation and distinguish them from interfering speakers and noise.
Models for low-latency, streaming applications could benefit from the knowledge capacity of larger models, but edge devices cannot run these models due to resource constraints. A possible solution is to transfer hints during inference from a large model running remotely to a small model running on-device. However, this incurs a communication delay that breaks real-time requirements and does not guarantee that both models will operate on the same data at the same time.
Want to hear a friend in a noisy café? We designed deep learning-based headphones that let you isolate the speech from a specific person just by *looking* at them for a few seconds. CHI'24 honorable mention award.
Imagine being able to listen to the birds chirping in a park without hearing the chatter from other hikers, or being able to block out traffic noise on a busy street while still being able to hear emergency sirens and car honks. We introduce semantic hearing, a novel capability for hearable devices that enables them to focus on or ignore specific sounds from real-world environments in real time, while also preserving the spatial cues.
Using wind to disperse microfliers that fall like seeds and leaves can help automate large-scale sensor deployments. Here, we present battery-free microfliers that can change shape in mid-air to vary their dispersal distance. We design origami microfliers using bi-stable leaf-out structures and uncover an important property: a simple change in the shape of these origami structures causes two dramatically different falling behaviors. When unfolded and flat, the microfliers exhibit a tumbling behavior that increases lateral displacement in the wind.
Imagine being in a crowded room with a cacophony of speakers and having the ability to focus on or remove speech from a specific 2D region. This would require understanding and manipulating an acoustic scene, isolating each speaker, and associating a 2D spatial context with each constituent speech. However, separating speech from a large number of concurrent speakers in a room into individual streams and identifying their precise 2D locations is challenging, even for the human brain.
The emergence of waterproof mobile and wearable devices (e.g., Garmin Descent and Apple Watch Ultra) designed for underwater activities like professional scuba diving opens up opportunities for underwater networking and localization capabilities on these devices. Here, we present the first underwater acoustic positioning system for smart devices. Unlike conventional systems that use floating buoys as anchors at known locations, we design a system where a dive leader can compute the relative positions of all other divers, without any external infrastructure.
We present the first neural network model to achieve real-time and streaming target sound extraction. To accomplish this, we propose Waveformer, an encoder-decoder architecture with a stack of dilated causal convolution layers as the encoder, and a transformer decoder layer as the decoder. This hybrid architecture uses dilated causal convolutions for processing large receptive fields in a computationally efficient manner, while also benefiting from the performance that transformer-based architectures provide.
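To illustrate why stacked dilated causal convolutions cover a large receptive field cheaply, here is a minimal pure-Python sketch. It is not the Waveformer implementation: the kernel size, the doubling dilation schedule, and the function names `causal_dilated_conv` and `receptive_field` are assumptions chosen for illustration.

```python
def causal_dilated_conv(x, weights, dilation):
    """1-D causal dilated convolution.

    y[t] = sum_k weights[k] * x[t - k * dilation], with x[t] = 0 for t < 0,
    so each output depends only on the current and past samples.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # samples before the start of the stream count as zero
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Number of input samples that one output of the whole stack can see."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical stack: kernel size 3, dilation doubling at every layer.
dilations = [1, 2, 4, 8]
rf = receptive_field(3, dilations)

# An impulse at t=0 reappears exactly `dilation` samples later, never earlier.
impulse_response = causal_dilated_conv([1, 0, 0, 0, 0], [1, 1], dilation=2)
```

With a doubling schedule the receptive field grows exponentially in the number of layers while the per-layer cost stays fixed, which is the efficiency argument the abstract makes.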
The World Health Organization estimates that the vast majority of people affected by hearing loss are in low- and middle-income countries. Hearing loss is particularly harmful for neurodevelopment if it is left undetected in early childhood. As a result, it is common practice in high-income countries (e.g., the USA) to adopt guidelines for universal infant hearing screening and require newborn hearing tests for all babies born in the hospital.
Since its inception, underwater digital acoustic communication has required custom hardware that neither has the economies of scale nor is pervasive. We present the first acoustic system that brings underwater messaging capabilities to existing mobile devices like smartphones and smart watches. Our software-only solution leverages audio sensors, i.e., microphones and speakers, ubiquitous in today's devices to enable acoustic underwater communication between mobile devices.
We present ClearBuds, a state-of-the-art hardware and software system for real-time speech enhancement. Our neural network runs entirely on an iPhone, allowing you to suppress unwanted noise while taking phone calls on the go. ClearBuds bridges state-of-the-art deep learning for blind audio source separation and in-ear mobile systems by making two key technical contributions: 1) a new wireless earbud design capable of operating as a synchronized, binaural microphone array, and 2) a lightweight dual-channel speech enhancement neural network that runs on a mobile device.
Middle ear disorders are one of the most common causes of preventable hearing loss. Unfortunately, while the developing world bears a disproportionate burden of these disorders, it often lacks access to diagnostic tools.
Plants cover a large fraction of the Earth’s land mass despite most species having limited to no mobility. Many plants have evolved mechanisms to disperse their seeds using the wind. A dandelion seed, for example, can travel as far as a kilometer in dry, windy, and warm conditions. Inspired by this, we demonstrate wind dispersal of battery-free wireless sensing devices. Our millimeter-scale devices are designed on a flexible substrate using programmable, off-the-shelf parts to enable scalability and flexibility for various sensing and computing applications.
We present the first system to determine fluid properties using the LiDAR sensors present on modern smartphones. Traditional methods of measuring properties like viscosity require expensive laboratory equipment or a relatively large amount of fluid. In contrast, our smartphone-based method is accessible, contactless and works with just a single drop of liquid. Our design works by targeting a coherent LiDAR beam from the phone onto the liquid.
Frequent blood clot testing is critical for millions of people on lifelong anticoagulation with warfarin. Currently, testing is performed in hospital laboratories or with expensive point-of-care devices, limiting the ability to test frequently and affordably. We report a proof-of-concept blood clot testing system that uses the vibration motor and camera on smartphones to track micro-mechanical movements of a copper particle.
Cardiology as a field has seen phenomenal technological advances over the last few decades. A number of tools in cardiology, however, require sensors and/or electrodes on the human body to capture cardiac signals and diagnose patients. In this project, we aim to create medical tools that use smartphones and smart speakers to contactlessly detect cardiac conditions. We present a proof-of-concept system for acquiring individual heart beats using smart speakers in a fully contact-free manner.
On-device directional hearing requires audio source separation from a given direction while meeting stringent human-imperceptible latency requirements. While neural networks can achieve significantly better performance than traditional beamformers, all existing models fall short of supporting low-latency causal inference on computationally constrained wearables. We present DeepBeam, a hybrid model that combines traditional beamformers with a custom lightweight neural network.
Existing gesture-recognition systems consume significant power and computational resources that limit how they may be used in low-end devices. We introduce AllSee, the first gesture-recognition system that can operate on a range of computing devices including those with no batteries. AllSee consumes three to four orders of magnitude lower power than state-of-the-art systems and can enable always-on gesture recognition for smartphones and tablets.
As computing devices become smaller and more numerous, powering them becomes more difficult; wires are often not feasible, and batteries add weight, bulk, and cost, and require recharging or replacement that is impractical at large scales. Ambient backscatter communication addresses this problem: devices leverage existing TV and cellular transmissions rather than generating their own radio waves. This novel technique enables ubiquitous communication where devices can communicate among themselves at unprecedented scales and in locations that were previously inaccessible.
WiSee is a novel interaction interface that leverages ongoing wireless transmissions in the environment (e.g., WiFi) to enable whole-home sensing and recognition of human gestures. Since wireless signals do not require line-of-sight and can traverse through walls, WiSee can enable whole-home gesture recognition using few wireless sources (e.g., a Wi-Fi router and a few mobile devices in the living room).
This project presents 802.11n+, the first fully distributed random access protocol that allows nodes to contend not just for time, but also for the concurrent transmissions supported by multiple antennas. In n+, even when the medium is occupied, nodes with more antennas can transmit concurrently without harming ongoing transmissions. Furthermore, such nodes can contend for the medium in a fully distributed way. Our testbed evaluation shows that even for a small network with three competing node pairs, the resulting system roughly doubles the average network throughput.
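The summary above does not say how a multi-antenna node avoids harming an ongoing transmission. One standard multi-antenna technique for this is interference nulling, sketched below under simplifying assumptions: a two-antenna transmitter, a single-antenna receiver to protect, a flat narrowband channel, and perfect channel knowledge. The channel values and the helper name `nulling_weights` are hypothetical.

```python
def nulling_weights(h1, h2):
    """Transmit weights (w1, w2) that cancel at the protected receiver.

    h1, h2 are the complex channel gains from the two transmit antennas to
    the receiver that must not be disturbed. Choosing (w1, w2) proportional
    to (h2, -h1) gives h1*w1 + h2*w2 = 0; dividing by the norm keeps the
    total transmit power at 1.
    """
    norm = (abs(h1) ** 2 + abs(h2) ** 2) ** 0.5
    return h2 / norm, -h1 / norm


# Made-up channel gains to the ongoing transmission's receiver.
h1, h2 = complex(0.8, 0.3), complex(-0.2, 0.9)
w1, w2 = nulling_weights(h1, h2)

# Signal leaking into the protected receiver: zero up to rounding error.
leak = h1 * w1 + h2 * w2

# Made-up channel gains to the new transmitter's own receiver; the same
# weights still deliver a nonzero signal there.
g1, g2 = complex(0.5, -0.4), complex(0.7, 0.1)
delivered = g1 * w1 + g2 * w2
```

The second antenna is what makes this possible: with one antenna there is no weight choice that cancels the signal at the protected receiver while still transmitting.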
This project presents tamper-evident pairing, the first wireless pairing protocol that works in-band, with no pre-shared keys, and protects against MITM attacks. The main innovation is a new key exchange message constructed in a manner that ensures an adversary can neither hide the fact that a message was transmitted, nor alter its payload without being detected. Thus, any attempt by an adversary to interfere with the key exchange translates into the pairing devices detecting either invalid pairing messages or an unacceptable increase in the number of such messages.
Wireless communication has become an intrinsic part of modern implantable medical devices (IMDs). Recent work, however, has demonstrated that wireless connectivity can be exploited to compromise the confidentiality of IMDs’ transmitted data or to send unauthorized commands to IMDs—even commands that cause the device to deliver an electric shock to the patient. The key challenge in addressing these attacks stems from the difficulty of modifying or replacing already-implanted IMDs.