GRAIL: Publications

 

Fast Algorithms for L_infty Problems in Multiview Geometry

Abstract:
Many problems in multi-view geometry, when posed as minimization of the maximum reprojection error across observations, can be solved optimally in polynomial time. We show that these problems are instances of a convex-concave generalized fractional program. We survey the major solution methods for solving problems of this form and present them in a unified framework centered around a single parametric optimization problem. We propose two new algorithms and show that the algorithm proposed by Olsson et al. [21] is a special case of a classical algorithm for generalized fractional programming. The performance of all the algorithms is compared on a variety of datasets, and the algorithm proposed by Gugat [12] stands out as a clear winner. An open source MATLAB toolbox that implements all the algorithms presented here is made available.

 
Citation:
Agarwal, S., Snavely, N., and Seitz, S. M. Fast Algorithms for L_infty Problems in Multiview Geometry. CVPR 2008.

 
On-line documents:
Complete article (PDF, 1MB)

 

In Defense of Nearest-Neighbor Based Image Classification

Abstract:
State-of-the-art image classification methods require an intensive learning/training stage (using SVM, Boosting, etc.). In contrast, non-parametric Nearest-Neighbor (NN) based image classifiers require no training time and have other favorable properties. However, the large performance gap between these two families of approaches rendered NN-based image classifiers useless.
We claim that the effectiveness of non-parametric NN-based image classification has been considerably undervalued. We argue that two practices commonly used in image classification methods have led to the inferior performance of NN-based image classifiers: (i) Quantization of local image descriptors (used to generate "bags-of-words," codebooks). (ii) Computation of 'Image-to-Image' distance, instead of 'Image-to-Class' distance.
We propose a trivial NN-based classifier, NBNN (Naive-Bayes Nearest-Neighbor), which employs NN distances in the space of the local image descriptors (and not in the space of images). NBNN computes direct 'Image-to-Class' distances without descriptor quantization. We further show that under the Naive-Bayes assumption, the theoretically optimal image classifier can be accurately approximated by NBNN.
Although NBNN is extremely simple, efficient, and requires no learning/training phase, its performance ranks among the leading learning-based image classifiers. Empirical comparisons are shown on several challenging databases (Caltech-101, Caltech-256 and Graz-01).
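
The 'Image-to-Class' idea in the abstract above can be illustrated with a minimal sketch. It assumes each image is already represented as a list of local descriptors (plain tuples of floats here), uses a brute-force nearest-neighbor search, and scores each class by the sum of squared NN distances; the names nbnn_classify and squared_distance are hypothetical and are not taken from the paper or any released code.

```python
from typing import Dict, List, Sequence

Descriptor = Sequence[float]


def squared_distance(a: Descriptor, b: Descriptor) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nbnn_classify(query: List[Descriptor],
                  class_pools: Dict[str, List[Descriptor]]) -> str:
    """Return the class whose descriptor pool is closest to the query image.

    For every query descriptor we take its nearest neighbor inside each
    class pool (no quantization, no image-to-image comparison) and sum the
    squared distances; the class with the smallest total wins.
    """
    best_label, best_cost = None, float("inf")
    for label, pool in class_pools.items():
        cost = sum(min(squared_distance(d, p) for p in pool) for d in query)
        if cost < best_cost:
            best_label, best_cost = label, cost
    return best_label


# Toy data: two classes with 2-D "descriptors" (illustrative values only).
pools = {
    "cat": [(0.0, 0.0), (1.0, 0.0)],
    "dog": [(5.0, 5.0), (6.0, 5.0)],
}
print(nbnn_classify([(0.1, 0.0), (0.9, 0.2)], pools))
```

A practical system would replace the brute-force inner loop with an approximate nearest-neighbor index, since class pools hold descriptors from all training images of that class.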

 
Citation:
Boiman, O., Shechtman, E., and Irani, M. In Defense of Nearest-Neighbor Based Image Classification. CVPR 2008.

 
On-line documents:
Complete article (PDF, 1MB)

 

Summarizing Visual Data Using Bidirectional Similarity

Abstract:
We propose a principled approach to summarization of visual data (images or video) based on optimization of a well-defined similarity measure. The problem we consider is re-targeting (or summarization) of image/video data into smaller sizes. A good "visual summary" should satisfy two properties: (1) it should contain as much visual information from the input data as possible; (2) it should introduce as few new visual artifacts that were not in the input data as possible (i.e., preserve visual coherence). We propose a bi-directional similarity measure which quantitatively captures these two requirements: Two signals S and T are considered visually similar if all patches of S (at multiple scales) are contained in T, and vice versa.
The problem of summarization/re-targeting is posed as an optimization problem of this bi-directional similarity measure. We show summarization results for image and video data. We further show that the same approach can be used to address a variety of other problems, including automatic cropping, completion and synthesis of visual data, image collage, object removal, photo reshuffling and more.
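
As a rough illustration of the bi-directional measure described in this abstract, the sketch below scores two 1-D signals by averaging, in each direction, the distance from every patch to its best match in the other signal. It is a simplification under stated assumptions: a single patch size instead of multiple scales, sum-of-squared-differences as the patch distance, and hypothetical function names (patches, directed_term, bidirectional_dissimilarity).

```python
def patches(signal, size):
    """All contiguous patches of the given size, as tuples."""
    return [tuple(signal[i:i + size]) for i in range(len(signal) - size + 1)]


def ssd(p, q):
    """Sum of squared differences between two equal-length patches."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def directed_term(src, dst, size):
    """Mean distance from each patch of src to its nearest patch in dst."""
    src_patches = patches(src, size)
    dst_patches = patches(dst, size)
    total = sum(min(ssd(p, q) for q in dst_patches) for p in src_patches)
    return total / len(src_patches)


def bidirectional_dissimilarity(s, t, size=3):
    """Completeness (S covered by T) plus coherence (T covered by S)."""
    completeness = directed_term(s, t, size)
    coherence = directed_term(t, s, size)
    return completeness + coherence


source = [1, 2, 3, 4, 5, 6]
summary = [1, 2, 3, 4]
print(directed_term(summary, source, 3))             # coherence term
print(bidirectional_dissimilarity(source, summary))  # both terms
```

In this toy case the summary introduces no new patches, so the coherence term is zero, while the completeness term is positive because the tail of the source is not represented in the summary.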

 
Citation:
Simakov, D., Caspi, Y., Shechtman, E., and Irani, M. Summarizing Visual Data Using Bidirectional Similarity. CVPR 2008.

 
On-line documents:
Complete article (PDF, 2.5MB)

 

MySong: Automatic Accompaniment Generation for Vocal Melodies


 
Citation:
Simon, I., Morris, D., and Basu, S. MySong: Automatic Accompaniment Generation for Vocal Melodies. CHI 2008.

 
On-line documents:
Complete article (PDF, 1MB)

 

Finding Paths through the World's Photos

Abstract:
When a scene is photographed many times by different people, the viewpoints often cluster along certain paths. These paths are largely specific to the scene being photographed, and follow interesting regions and viewpoints. We seek to discover a range of such paths and turn them into controls for image-based rendering. Our approach takes as input a large set of community or personal photos, reconstructs camera viewpoints, and automatically computes orbits, panoramas, canonical views, and optimal paths between views. The scene can then be interactively browsed in 3D using these controls or with six degree-of-freedom free-viewpoint control. As the user browses the scene, nearby views are continuously selected and transformed, using control-adaptive reprojection techniques.

 
Citation:
Snavely, N., Garg, R., Seitz, S. M., and Szeliski, R. Finding Paths through the World's Photos. ACM Transactions on Graphics 27(3), August 2008.

 
On-line documents:
Complete article (PDF, 12MB)

 

Modeling the World from Internet Photo Collections

Abstract:
There are billions of photographs on the Internet, comprising the largest and most diverse photo collection ever assembled. How can computer vision researchers exploit this imagery? This paper explores this question from the standpoint of 3D scene modeling and visualization. We present structure-from-motion and image-based rendering algorithms that operate on hundreds of images downloaded as a result of keyword-based image search queries like "Notre Dame" or "Trevi Fountain." This approach, which we call Photo Tourism, has enabled reconstructions of numerous well-known world sites. This paper presents these algorithms and results as a first step towards 3D modeling of the world's well-photographed sites, cities, and landscapes from Internet imagery, and discusses key open problems and challenges for the research community.

 
Citation:
Snavely, N., Seitz, S. M., and Szeliski, R. Modeling the World from Internet Photo Collections. Accepted to IJCV, 2008.

 
On-line documents:
Complete article (PDF, 2MB)

 

Skeletal Graphs for Efficient Structure from Motion

Abstract:
We address the problem of efficient structure from motion for large, unordered, highly redundant, and irregularly sampled photo collections, such as those found on Internet photo-sharing sites. Our approach computes a small skeletal subset of images, reconstructs the skeletal set, and adds the remaining images using pose estimation. Our technique drastically reduces the number of parameters that are considered, resulting in dramatic speedups, while provably approximating the covariance of the full set of parameters. To compute a skeletal image set, we first estimate the accuracy of two-frame reconstructions between pairs of overlapping images, then use a graph algorithm to select a subset of images that, when reconstructed, approximates the accuracy of the full set. A final bundle adjustment can then optionally be used to restore any loss of accuracy.

 
Citation:
Snavely, N., Seitz, S. M., and Szeliski, R. Skeletal Graphs for Efficient Structure from Motion. CVPR 2008.

 
On-line documents:
Complete article (PDF, 2MB)

 

Automated Generation of Interactive 3D Exploded View Diagrams

Abstract:
We present a system for creating and viewing interactive exploded views of complex 3D models. In our approach, a 3D input model is organized into an explosion graph that encodes how parts explode with respect to each other. We present an automatic method for computing explosion graphs that takes into account part hierarchies in the input models and handles common classes of interlocking parts. Our system also includes an interface that allows users to interactively explore our exploded views using both direct controls and higher-level interaction modes.

 
Citation:
Li, W., Agrawala, M., Curless, B., and Salesin, D. Automated Generation of Interactive 3D Exploded View Diagrams. ACM Transactions on Graphics 27(3), August 2008.

 
On-line documents:
Complete article (PDF, 4.5MB)
Project page

 

Rectified Surface Mosaics

Abstract:
We approach mosaicing as a camera tracking problem within a known parameterized surface. From a video of a camera moving within a surface, we compute a mosaic representing the texture of that surface, flattened onto a planar image. Our approach works by defining a warp between images as a function of surface geometry and camera pose. Globally optimizing this warp to maximize alignment across all frames determines the camera trajectory, and the corresponding flattened mosaic image. In contrast to previous mosaicing methods which assume planar or distant scenes, or controlled camera motion, our approach enables mosaicing in cases where the camera moves unpredictably through proximal surfaces, such as in medical endoscopy applications.

 
Citation:
Rectified Surface Mosaics. Carroll, R. E. and Seitz, S. M. IEEE Computer Society Workshop on Mathematical Methods in Biomedical Image Analysis (MMBIA 2007), Rio de Janeiro, Brazil, October 2007.

 
On-line documents:
Complete article (PDF, 2.6MB)

 

A Probabilistic Model for Object Recognition, Segmentation, and Non-Rigid Correspondence

Abstract:
We describe a method for fully automatic object recognition and segmentation using a set of reference images to specify the appearance of each object. Our method uses a generative model of image formation that takes into account occlusions, simple lighting changes, and object deformations. We take advantage of local features to identify, locate, and extract multiple objects in the presence of large viewpoint changes, nonrigid motions with large numbers of degrees of freedom, occlusions, and clutter. We simultaneously compute an object-level segmentation and a dense correspondence between the pixels of the appropriate reference images and the image to be segmented.

 
Citation:
A Probabilistic Model for Object Recognition, Segmentation, and Non-Rigid Correspondence. Simon, I. and Seitz, S. M. Proceedings of CVPR 2007, Minneapolis, Minnesota, June 2007.

 
On-line documents:
Complete article (PDF, 1.6MB)

Scene Summarization for Online Image Collections

Abstract:
We formulate the problem of scene summarization as selecting a set of images that efficiently represents the visual content of a given scene. The ideal summary presents the most interesting and important aspects of the scene with minimal redundancy. We propose a solution to this problem using multi-user image collections from the Internet. Our solution examines the distribution of images in the collection to select a set of canonical views to form the scene summary, using clustering techniques on visual features. The summaries we compute also lend themselves naturally to the browsing of image collections, and can be augmented by analyzing user-specified image tag data. We demonstrate the approach using a collection of images of the city of Rome, showing the ability to automatically decompose the images into separate scenes, and identify canonical views for each scene.

 
Citation:
Scene Summarization for Online Image Collections. Simon, I., Snavely, N. and Seitz, S. M. Proceedings of ICCV 2007, Rio de Janeiro, Brazil, October 2007.

 
On-line documents:
Complete article (PDF, 2.4MB)
Project Page

Multi-View Stereo for Community Photo Collections

Abstract:
We present a multi-view stereo algorithm that addresses the extreme changes in lighting, scale, clutter, and other effects in large online community photo collections. Our idea is to intelligently choose images to match, both at a per-view and per-pixel level. We show that such adaptive view selection enables robust performance even with dramatic appearance variability. The stereo matching technique takes as input sparse 3D points reconstructed from structure-from-motion methods and iteratively grows surfaces from these points. Optimizing for surface normals within a photoconsistency measure significantly improves the matching results. While the focus of our approach is to estimate high-quality depth maps, we also show examples of merging the resulting depth maps into compelling scene reconstructions. We demonstrate our algorithm on standard multi-view stereo datasets and on casually acquired photo collections of famous scenes gathered from the Internet.

 
Citation:
Multi-View Stereo for Community Photo Collections. Goesele, M., Snavely, N., Curless, B., Hoppe, H. and Seitz, S. M. Proceedings of ICCV 2007, Rio de Janeiro, Brazil, October 2007.

 
On-line documents:
Complete article (PDF, 9.2MB)
Project Page

 

Globally Optimal Affine and Metric Upgrades in Stratified Autocalibration

Abstract:
We present a practical, stratified autocalibration algorithm with theoretical guarantees of global optimality. Given a projective reconstruction, the first stage of the algorithm upgrades it to affine by estimating the position of the plane at infinity. The plane at infinity is computed by globally minimizing a least squares formulation of the modulus constraints. In the second stage, the algorithm upgrades this affine reconstruction to a metric one by globally minimizing the infinite homography relation to compute the dual image of the absolute conic (DIAC). The positive semidefiniteness of the DIAC is explicitly enforced as part of the optimization process, rather than as a post-processing step.

For each stage, we construct and minimize tight convex relaxations of the highly non-convex objective functions in a branch and bound optimization framework. We exploit the problem structure to restrict the search space for the DIAC and the plane at infinity to a small, fixed number of branching dimensions, independent of the number of views.

Experimental evidence of the accuracy, speed and scalability of our algorithm is presented on synthetic and real data. MATLAB code for the implementation is made available to the community.

 
Citation:
Globally Optimal Affine and Metric Upgrades in Stratified Autocalibration. Chandraker, M., Agarwal, S., Kriegman, D. and Belongie, S. Proceedings of ICCV 2007, Rio de Janeiro, Brazil, October 2007.

 
On-line documents:
Complete article (PDF, 1.5MB)

Relations, Cards, and Search Templates: User-Guided Web Data Integration and Layout

Abstract:
We present three new interaction techniques for aiding users in collecting and organizing Web content. First, we demonstrate an interface for creating associations between websites, which facilitate the automatic retrieval of related content. Second, we present an authoring interface that allows users to quickly merge content from many different websites into a uniform and personalized representation, which we call a card. Finally, we introduce a novel search paradigm that leverages the relationships in a card to direct search queries to extract relevant content from multiple Web sources and fill a new series of cards instead of just returning a list of webpage URLs. Preliminary feedback from users is positive and validates our design.

 
Citation:
Relations, Cards, and Search Templates: User-Guided Web Data Integration and Layout. Dontcheva, M., Drucker, S. M., Salesin, D. H. and Cohen, M. F. Proceedings of UIST 2007, Newport, Rhode Island, October 2007.

 
On-line documents:
Complete article (PDF, 9.3MB)

Near-optimal Character Animation with Continuous Control

Abstract:
We present a new model for real-time character animation with multidimensional, interactive control. The underlying motion engine is data-driven, enables rapid transitions, and automatically enforces foot-skate constraints without inverse kinematics. On top of this motion space, our algorithm learns approximately optimal controllers which use a compact basis representation to guide the system through multidimensional state-goal spaces. These controllers enable real-time character animation that fluidly responds to changing user directives and environmental constraints.

 
Citation:
Near-optimal Character Animation with Continuous Control. Treuille, A., Lee, Y., and Popović, Z. ACM Transactions on Graphics 26(3), August 2007.

 
On-line documents:
Complete article (PDF, 0.8MB)
Project Page

Video Watercolorization using Bidirectional Texture Advection

Abstract:
In this paper, we present a method for creating watercolor-like animation, starting from video as input. The method involves two main steps: applying textures that simulate a watercolor appearance; and creating a simplified, abstracted version of the video to which the texturing operations are applied. Both of these steps are subject to highly visible temporal artifacts, so the primary technical contributions of the paper are extensions of previous methods for texturing and abstraction to provide temporal coherence when applied to video sequences. To maintain coherence for textures, we employ texture advection along lines of optical flow. We furthermore extend previous approaches by incorporating advection in both forward and reverse directions through the video, which allows for minimal texture distortion, particularly in areas of disocclusion that are otherwise highly problematic. To maintain coherence for abstraction, we employ mathematical morphology extended to the temporal domain, using filters whose temporal extents are locally controlled by the degree of distortions in the optical flow. Together, these techniques provide the first practical and robust approach for producing watercolor animations from video, which we demonstrate with a number of examples.

 
Citation:
Video Watercolorization using Bidirectional Texture Advection. Adrien Bousseau, Fabrice Neyret, Joëlle Thollot, David Salesin. ACM Transactions on Graphics 26(3), August 2007.

 
On-line documents:
Complete article (PDF, 5.3MB)
Project Page

Active Learning for Real-time Motion Controllers

Abstract:
This paper describes an approach to building real-time highly-controllable characters. A kinematic character controller is built on-the-fly during a capture session, and updated after each new motion clip is acquired. Active learning is used to identify which motion sequence the user should perform next, in order to improve the quality and responsiveness of the controller. Because motion clips are selected adaptively, we avoid the difficulty of manually determining which ones to capture, and can build complex controllers from scratch while significantly reducing the number of necessary motion samples.

 
Citation:
Active Learning for Real-time Motion Controllers. Seth Cooper, Aaron Hertzmann, Zoran Popović. ACM Transactions on Graphics 26(3), August 2007.

 
On-line documents:
Complete article (PDF, 2.4MB)
Project Page

Layered Depth Panoramas

Abstract:
Representations for interactive photorealistic visualization of scenes range from compact 2D panoramas to data-intensive 4D light fields. In this paper, we propose a technique for creating a layered representation from a sparse set of images taken with a hand-held camera. This representation, which we call a layered depth panorama (LDP), allows the user to experience 3D by off-axis panning. It combines the compelling experience of panoramas with limited 3D navigation. Our choice of representation is motivated by ease of capture and compactness. We formulate the problem of constructing the LDP as the recovery of color and geometry in a multi-perspective cylindrical disparity space. We leverage a graph cut approach to sequentially determine the disparity and color of each layer using multi-view stereo. Geometry visible through the cracks at depth discontinuities in a frontmost layer is determined and assigned to layers behind the frontmost layer. All layers are then used to render novel panoramic views with parallax. We demonstrate our approach on a variety of complex outdoor and indoor scenes.

 
Citation:
Layered Depth Panoramas. Ke Colin Zheng, Sing Bing Kang, Michael Cohen, Richard Szeliski. CVPR 2007, Minneapolis, Minnesota.

 
On-line documents:
Complete article (PDF, 4.4MB)
Project Page

Soft Scissors: An Interactive Tool for Realtime High Quality Matting

Abstract:
We present Soft Scissors, an interactive tool for extracting alpha mattes of foreground objects in realtime. We recently proposed a novel offline matting algorithm capable of extracting high-quality mattes for complex foreground objects such as furry animals [Wang and Cohen 2007]. In this paper we both improve the quality of our offline algorithm and give it the ability to incrementally update the matte in an online interactive setting. Our realtime system efficiently estimates foreground color thereby allowing both the matte and the final composite to be revealed instantly as the user roughly paints along the edge of the foreground object. In addition, our system can dynamically adjust the width and boundary conditions of the scissoring paint brush to approximately capture the boundary of the foreground object that lies ahead on the scissor's path. These advantages in both speed and accuracy create the first interactive tool for high quality image matting and compositing.

 
Citation:
Soft Scissors: An Interactive Tool for Realtime High Quality Matting. Jue Wang, Maneesh Agrawala and Michael Cohen. ACM Transactions on Graphics 26(3), August 2007.

 
On-line documents:
Complete article (PDF, 5.6MB)

Simultaneous Matting and Compositing

Abstract:
Recent work in matting, hole filling, and compositing allows image elements to be mixed in a new composite image. Previous algorithms for matting foreground elements have assumed that the new background for compositing is unknown. We show that, if the new background is known, the matting algorithm has more freedom to create a successful matte by simultaneously optimizing the matting and compositing operations.

We propose a new algorithm that integrates matting and compositing into a single optimization process. The system is able to compose foreground elements onto a new background more efficiently and with fewer artifacts compared with previous approaches. In our examples, we show how one can enlarge the foreground while maintaining the wide angle view of the background. We also demonstrate composing a foreground element on top of similar backgrounds to help remove unwanted portions of the background or to re-scale or re-arrange the composite. We compare and contrast our method with a number of previous matting and compositing systems.

 
Citation:
Simultaneous Matting and Compositing. Jue Wang and Michael Cohen. CVPR 2007, Minneapolis, Minnesota.

 
On-line documents:
Complete article (PDF, 6.8MB)

Optimized Color Sampling for Robust Matting

Abstract:
Image matting is the problem of determining for each pixel in an image whether it is foreground, background, or the mixing parameter, "alpha," for those pixels that are a mixture of foreground and background. Matting is inherently an ill-posed problem. Previous matting approaches either use naive color sampling methods to estimate foreground and background colors for unknown pixels, or use propagation-based methods to avoid color sampling under weak assumptions about image statistics. We argue that neither method itself is enough to generate good results for complex natural images.

We analyze the weaknesses of previous matting approaches, and propose a new robust matting algorithm. In our approach we also sample foreground and background colors for unknown pixels, but more importantly, analyze the confidence of these samples. Only high confidence samples are chosen to contribute to the matting energy function which is minimized by a Random Walk. The energy function we define also contains a neighborhood term to enforce the smoothness of the matte. To validate the approach, we present an extensive and quantitative comparison between our algorithm and a number of previous approaches in hopes of providing a benchmark for future matting research.
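
For readers unfamiliar with the mixing parameter "alpha" mentioned above, the sketch below shows the standard compositing model (observed color = alpha * foreground + (1 - alpha) * background) and the usual least-squares estimate of alpha once a foreground and background color sample have been chosen for a pixel. This is a generic illustration, not the sampling or confidence scheme of the paper; the function names composite and estimate_alpha are hypothetical.

```python
def composite(alpha, fg, bg):
    """Blend foreground and background colors channel by channel."""
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


def estimate_alpha(color, fg, bg):
    """Least-squares alpha for one pixel given sampled fg and bg colors.

    Projects (color - bg) onto (fg - bg) and clamps the result to [0, 1].
    Assumes fg and bg are distinct colors.
    """
    diff = [f - b for f, b in zip(fg, bg)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        raise ValueError("foreground and background samples are identical")
    numer = sum((c - b) * d for c, b, d in zip(color, bg, diff))
    return min(1.0, max(0.0, numer / denom))


red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
mixed = composite(0.25, red, blue)
print(mixed, estimate_alpha(mixed, red, blue))
```

The estimate is only as good as the chosen samples, which is why an approach that weighs samples by confidence, as the abstract describes, matters for complex images.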

 
Citation:
Optimized Color Sampling for Robust Matting. Jue Wang and Michael Cohen. CVPR 2007, Minneapolis, Minnesota.

 
On-line documents:
Complete article (PDF, 3.2MB)

Principal Curvature-Based Region Detector for Object Recognition

Abstract:
This paper presents a new structure-based interest region detector called Principal Curvature-Based Regions (PCBR) which we use for object class recognition. The PCBR interest operator detects stable watershed regions within the multi-scale principal curvature image. To detect robust watershed regions, we "clean" a principal curvature image using a combination of grayscale morphological closing and a new "eigenvector flow" hysteresis thresholding. Robustness across scales is achieved by selecting the maximal stable regions across consecutive scales. PCBR typically detects distinctive patterns distributed evenly on the objects and it shows significant robustness to local intensity perturbations and intra-class variations. We evaluate PCBR both qualitatively (through visual inspection) and quantitatively (by measuring repeatability and classification accuracy in real-world object-class recognition problems). Experiments on different benchmark datasets show that PCBR is comparable or superior to state-of-the-art detectors for both feature matching and object recognition problems. Moreover, we demonstrate the application of PCBR to symmetry detection.

 
Citation:
Principal Curvature-Based Region Detector for Object Recognition. Hongli Deng, Wei Zhang, Eric Mortensen, Thomas Dietterich, Linda Shapiro. CVPR 2007, Minneapolis, Minnesota.

 
On-line documents:
Complete article (PDF, 3.0MB)

Using Photographs to Enhance Videos of a Static Scene

Abstract:
We present a framework for automatically enhancing videos of a static scene using a few photographs of the same scene. For example, our system can transfer photographic qualities such as high resolution, high dynamic range and better lighting from the photographs to the video. Additionally, the user can quickly modify the video by editing only a few still images of the scene. Finally, our system allows a user to remove unwanted objects and camera shake from the video. These capabilities are enabled by two technical contributions presented in this paper. First, we make several improvements to a state-of-the-art multiview stereo algorithm in order to compute view-dependent depths using video, photographs, and structure-from-motion data. Second, we present a novel image-based rendering algorithm that can re-render the input video using the appearance of the photographs while preserving certain temporal dynamics such as specularities and dynamic scene lighting.

 
Citation:
Using Photographs to Enhance Videos of a Static Scene. Pravin Bhat, C. Lawrence Zitnick, Noah Snavely, Aseem Agarwala, Maneesh Agrawala, Michael Cohen, Brian Curless, Sing Bing Kang. Eurographics Symposium on Rendering 2007.

 
On-line documents:
Complete article (PDF, 20.0MB)
Project Page

Automated Insect Identification through Concatenated Histograms of Local Appearance Features

Abstract:
This paper describes a computer vision approach to automated rapid-throughput taxonomic identification of stonefly larvae. The long-term goal of this research is to develop a cost-effective method for environmental monitoring based on automated identification of indicator species. Recognition of stonefly larvae is challenging because they are highly articulated, they exhibit a high degree of intraspecies variation in size and color, and some species are difficult to distinguish visually, despite prominent dorsal patterning. The stoneflies are imaged via an apparatus that manipulates the specimens into the field of view of a microscope so that images are obtained under highly repeatable conditions. The images are then classified through a process that involves (a) identification of regions of interest, (b) representation of those regions as SIFT vectors [1], (c) classification of the SIFT vectors into learned "features" to form a histogram of detected features, and (d) classification of the feature histogram via state-of-the-art ensemble classification algorithms. The steps (a) to (c) compose the concatenated feature histogram (CFH) method. We apply three region detectors for part (a) above, including a newly developed principal curvature-based region (PCBR) detector. This detector finds stable regions of high curvature via a watershed segmentation algorithm. We compute a separate dictionary of learned features for each region detector, and then concatenate the histograms prior to the final classification step.

We evaluate this classification methodology on a task of discriminating among four stonefly taxa, two of which, Calineuria and Doroneuria, are difficult even for experts to discriminate. The results show that the combination of all three detectors gives four-class accuracy of 82% and three-class accuracy (pooling Calineuria and Doroneuria) of 95%. Each region detector makes a valuable contribution. In particular, our new PCBR detector is able to discriminate Calineuria and Doroneuria much better than the other detectors.

 
Citation:
Automated Insect Identification through Concatenated Histograms of Local Appearance Features: Feature Vector Generation and Region Detection for Deformable Objects. Enrique Larios, Hongli Deng, Wei Zhang, Matt Sarpola, Jenny Yuen, Robert Paasch, Andrew Moldenke, David Lytle, Salvador Ruiz Correa, Eric Mortensen, Linda Shapiro, and Tom Dietterich. In Machine Vision and Applications, 2007.

 
On-line documents:
Complete article (PDF, 0.9MB)

Interactive Cutaway Illustrations of Complex 3D Models

Abstract:
We present a system for authoring and viewing interactive cutaway illustrations of complex 3D models using conventions of traditional scientific and technical illustration. Our approach is based on the two key ideas that 1) cuts should respect the geometry of the parts being cut, and 2) cutaway illustrations should support interactive exploration. In our approach, an author instruments a 3D model with auxiliary parameters, which we call "rigging," that define how cutaways of that structure are formed. We provide an authoring interface that automates most of the rigging process. We also provide a viewing interface that allows viewers to explore rigged models using high-level interactions. In particular, the viewer can just select a set of target structures, and the system will automatically generate a cutaway illustration that exposes those parts. We have tested our system on a variety of CAD and anatomical models, and our results demonstrate that our approach can be used to create and view effective interactive cutaway illustrations for a variety of complex objects with little user effort.

 
Citation:
Interactive Cutaway Illustrations of Complex 3D Models. Wilmot Li, Lincoln Ritter, Maneesh Agrawala, Brian Curless, David Salesin. ACM Transactions on Graphics 26(3), August 2007.

 
On-line documents:
Complete article (PDF, 15.0MB)
Project page

A Theory of Frequency Domain Invariants: Spherical Harmonic Identities for BRDF / Lighting Transfer and Image Consistency

Abstract:
This paper develops a theory of frequency domain invariants in computer vision. We derive novel identities using spherical harmonics, which are the angular frequency domain analog to common spatial domain invariants such as reflectance ratios. These invariants are derived from the spherical harmonic convolution framework for reflection from a curved surface. Our identities apply in a number of canonical cases, including single and multiple images of objects under the same and different lighting conditions. One important case we consider is two different glossy objects in two different lighting environments. For this case, we derive a novel identity, independent of the specific lighting configurations or BRDFs, that allows us to directly estimate the fourth image if the other three are available. The identity can also be used as an invariant to detect tampering in the images.

While this paper is primarily theoretical, it has the potential to lay the mathematical foundations for two important practical applications. First, we can develop more general algorithms for inverse rendering problems, which can directly relight and change material properties by transferring the BRDF or lighting from another object or illumination. Second, we can check the consistency of an image, to detect tampering or image splicing.

 
Citation:
A Theory of Frequency Domain Invariants: Spherical Harmonic Identities for BRDF / Lighting Transfer and Image Consistency. Dhruv Mahajan, Ravi Ramamoorthi, and Brian Curless. To appear, IEEE Pattern Analysis and Machine Intelligence.

 
On-line documents:
Complete article (PDF, 3.0MB)

Devices That Tell On You: Privacy Trends in Consumer Ubiquitous Computing

Abstract:
We analyze three new consumer electronic gadgets in order to gauge the privacy and security trends in mass-market UbiComp devices. Our study of the Slingbox Pro uncovers a new information leakage vector for encrypted streaming multimedia. By exploiting properties of variable bitrate encoding schemes, we show that a passive adversary can determine with high probability the movie that a user is watching via her Slingbox, even when the Slingbox uses encryption. We experimentally evaluated our method against a database of over 100 hours of network traces for 26 distinct movies.

Despite an opportunity to provide significantly more location privacy than existing devices, like RFIDs, we find that an attacker can trivially exploit the Nike+iPod Sport Kit's design to track users; we demonstrate this with a GoogleMaps-based distributed surveillance system. We also uncover security issues with the way Microsoft Zunes manage their social relationships.

We show how these products' designers could have significantly raised the bar against some of our attacks. We also use some of our attacks to motivate fundamental security and privacy challenges for future UbiComp devices.

 
Citation:
Devices That Tell On You: Privacy Trends in Consumer Ubiquitous Computing. T. Scott Saponas, Jonathan Lester, Carl Hartung, Sameer Agarwal and Tadayoshi Kohno, to appear USENIX Security 2007.

 
On-line documents:
Complete article (PDF, 1.5MB)
Project Page

 

Generalized Non-metric Multidimensional Scaling

Citation:
Generalized Non-metric Multidimensional Scaling. Sameer Agarwal, Josh Wills, Lawrence Cayton, Gert Lanckriet, David Kriegman and Serge Belongie. AISTATS 2007, San Juan, Puerto Rico.

 
On-line documents:
Complete article (PDF, 0.9MB)

ShadowCuts: Photometric Stereo with Shadows

Abstract:
We present an algorithm for performing Lambertian photometric stereo in the presence of shadows. The algorithm has three novel features. First, a fast graph cuts based method is used to estimate per pixel light source visibility. Second, it allows images to be acquired with multiple illuminants, and there can be fewer images than light sources. This leads to better surface coverage and improves the reconstruction accuracy by enhancing the signal to noise ratio and the condition number of the light source matrix. The ability to use fewer images than light sources means that the imaging effort grows sublinearly with the number of light sources. Finally, the recovered shadow maps are combined with shading information to perform constrained surface normal integration. This reduces the low frequency bias inherent to the normal integration process and ensures that the recovered surface is consistent with the shadowing configuration.

The algorithm works with as few as four light sources and four images. We report results for light source visibility detection and high quality surface reconstructions for synthetic and real datasets.
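
To make the role of the per-pixel visibility estimate concrete, here is a minimal sketch (not the paper's implementation) of Lambertian photometric stereo at a single pixel once the shadowed lights are known. It assumes one light source per image, which is simpler than the multi-illuminant setup described above, and it takes the visibility mask as given rather than estimating it with graph cuts. The function name `lambertian_normal` and the synthetic data are hypothetical.

```python
import numpy as np

def lambertian_normal(intensities, lights, visible):
    """Least-squares surface normal and albedo for one pixel.

    intensities: (m,) observed intensity under each light source
    lights:      (m, 3) light direction vectors
    visible:     (m,) boolean mask, True where the light reaches the pixel
    """
    L = lights[visible]          # keep only unshadowed observations
    I = intensities[visible]
    if L.shape[0] < 3 or np.linalg.matrix_rank(L) < 3:
        raise ValueError("need at least three non-coplanar visible lights")
    # Lambertian model: I = L @ (albedo * normal)
    g, *_ = np.linalg.lstsq(L, I, rcond=None)
    albedo = np.linalg.norm(g)
    return g / albedo, albedo

# Synthetic pixel lit by four sources; the fourth is assumed to be shadowed.
lights = np.array([[0.0, 0.0, 1.0],
                   [0.6, 0.0, 0.8],
                   [0.0, 0.6, 0.8],
                   [-0.6, 0.0, 0.8]])
true_n = np.array([0.2, -0.1, 1.0])
true_n /= np.linalg.norm(true_n)
intensities = 0.7 * lights @ true_n
intensities[3] = 0.0                          # shadow: no light arrives
visible = np.array([True, True, True, False])

n_est, rho_est = lambertian_normal(intensities, lights, visible)
```

Dropping the shadowed row keeps the zero intensity from biasing the fit, which is why an accurate visibility estimate matters before the normals are integrated.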

 
Citation:
ShadowCuts: Photometric Stereo with Shadows. Manmohan Chandraker, Sameer Agarwal, David Kriegman. CVPR 2007, Minneapolis, Minnesota.

 
On-line documents:
Complete article (PDF, 1.6MB)

Autocalibration via Rank-Constrained Estimation of the Absolute Quadric

Abstract:
We present an autocalibration algorithm for upgrading a projective reconstruction to a metric reconstruction by estimating the absolute dual quadric. The algorithm enforces the rank degeneracy and the positive semidefiniteness of the dual quadric as part of the estimation procedure, rather than as a post-processing step. Furthermore, the method allows the user, if he or she so desires, to enforce conditions on the plane at infinity so that the reconstruction satisfies the chirality constraints.

The algorithm works by constructing low degree polynomial optimization problems, which are solved to their global optimum using a series of convex linear matrix inequality relaxations. The algorithm is fast, stable, robust and has time complexity independent of the number of views. We show extensive results on synthetic as well as real datasets to validate our algorithm.

 
Citation:
Autocalibration via Rank-Constrained Estimation of the Absolute Quadric. Manmohan Chandraker, Sameer Agarwal, Fredrik Kahl, David Nistér, David Kriegman. CVPR 2007, Minneapolis, Minnesota.

 
On-line documents:
Complete article (PDF, 0.2MB)

Stylizing 2.5-D Video

Abstract:
In recent years considerable interest has been given to non-photorealistic rendering of photographs, video, and 3D models for illustrative or artistic purposes. Conventional 2D inputs such as photographs and video are easy to create and capture, while 3D models allow for a wider variety of stylization techniques, such as cross-hatching. In this paper, we propose using video with depth information (2.5D video) to combine the advantages of 2D and 3D input. 2.5D video is becoming increasingly easy to capture, and with the additional depth information, stylization techniques that require shape information can be applied. However, because 2.5D video contains only limited shape information and 3D correspondence over time is unknown, it is difficult to create temporally coherent stylized animations directly from raw 2.5D video. In this paper, we present techniques for processing 2.5D video to overcome these drawbacks, and demonstrate several styles that can be created using these techniques.

 
Citation:
Stylizing 2.5-D Video. Noah Snavely, C. Lawrence Zitnick, Sing Bing Kang, Michael Cohen. In Proc. Symposium on Non-Photorealistic Animation and Rendering (NPAR) 2006, pages 63-69.

 
On-line documents:
Complete article (PDF, 0.8MB)

Summarizing Personal Web Browsing Sessions

Abstract:
We describe a system, implemented as a browser extension, that enables users to quickly and easily collect, view, and share personal Web content. Our system employs a novel interaction model, which allows a user to specify webpage extraction patterns by interactively selecting webpage elements and applying these patterns to automatically collect similar content. Further, we present a technique for creating visual summaries of the collected information by combining user labeling with predefined layout templates. These summaries are interactive in nature: depending on the behaviors encoded in their templates, they may respond to mouse events, in addition to providing a visual summary. Finally, the summaries can be saved or sent to other users to continue the research at another place or time. Informal evaluation shows that our approach works well for popular websites, and that users can quickly learn this interaction model for collecting Web content.

 
Citation:
Mira Dontcheva, Steven Drucker, Geraldine Wade, David Salesin and Michael F. Cohen. Summarizing Personal Web Browsing Sessions. Proceedings of ACM UIST 2006.

 
On-line documents:
Complete article (PDF, 5.0MB)
Project Page

Painting With Texture

Abstract:
We present an interactive texture painting system that allows the user to author digital images by painting with a palette of input textures. At the core of our system is an interactive texture synthesis algorithm that generates textures with natural-looking boundary effects and alpha information as the user paints. Furthermore, we describe an intuitive layered painting model that allows strokes of texture to be merged, intersected and overlapped while maintaining the appropriate boundaries between texture regions. We demonstrate the utility and expressiveness of our system by painting several images using textures that exhibit a range of different boundary effects.

 
Citation:
Lincoln Ritter, Wilmot Li, Maneesh Agrawala, Brian Curless, David Salesin. Painting With Texture. Proceedings of the 17th Eurographics Symposium on Rendering, 2006.

 
On-line documents:
Complete article (PDF, 1.7MB)
Project Page

Learning a correlated model of identity and pose-dependent body shape variation for real-time synthesis

Abstract:
We present a method for learning a model of human body shape variation from a corpus of 3D range scans. Our model is the first to capture both identity-dependent and pose-dependent shape variation in a correlated fashion, enabling creation of a variety of virtual human characters with realistic and non-linear body deformations that are customized to the individual. Our learning method is robust to irregular sampling in pose-space and identity space, and also to missing surface data in the examples. Our synthesized character models are based on standard skinning techniques and can be rendered in real time.

 
Citation:
Brett Allen, Brian Curless, Zoran Popović, Aaron Hertzmann. Learning a correlated model of identity and pose-dependent body shape variation for real-time synthesis. Proceedings of the ACM SIGGRAPH / Eurographics Symposium on Computer Animation, 2006, pp. 147-156.

 
On-line documents:
Complete article (PDF, 1.9MB)
Project Page

Gaze-Based Interaction for Semi-Automatic Photo Cropping

Abstract:
We present an interactive method for cropping photographs given minimal information about the location of important content, provided by eye tracking. Cropping is formulated in a general optimization framework that facilitates adding new composition rules, as well as adapting the system to particular applications. Our system uses fixation data to identify important content and compute the best crop for any given aspect ratio or size, enabling applications such as automatic snapshot recomposition, adaptive documents, and thumbnailing. We validate our approach with studies in which users compare our crops to ones produced by hand and by a completely automatic approach. Experiments show that viewers prefer our gaze-based crops to uncropped images and fully automatic crops.

 
Citation:
Anthony Santella, Maneesh Agrawala, Doug DeCarlo, David H. Salesin, Michael F. Cohen. Gaze-Based Interaction for Semi-Automatic Photo Cropping. ACM Human Factors in Computing Systems (CHI), 2006, pp. 771-780.

 
On-line documents:
Complete article (PDF, 2.2MB)
Project Page

Photo Tourism: Exploring Photo Collections in 3D

Abstract:
We present a system for interactively browsing and exploring large unstructured collections of photographs of a scene using a novel 3D interface. Our system consists of an image-based modeling front end that automatically computes the viewpoint of each photograph as well as a sparse 3D model of the scene and image-to-model correspondences. Our photo explorer uses image-based rendering techniques to smoothly transition between photographs, while also enabling full 3D navigation and exploration of the set of images and world geometry, along with auxiliary information such as overhead maps. Our system also makes it easy to construct photo tours of scenic or historic locations, and to annotate image details, which are automatically transferred to other relevant images. We demonstrate our system on several large personal photo collections as well as images gathered from Internet photo sharing sites.

 
Citation:
Noah Snavely, Steven M. Seitz, Richard Szeliski. Photo Tourism: Exploring Photo Collections in 3D. ACM Transactions on Graphics 25(3) (ACM SIGGRAPH 2006), July 2006.

 
On-line documents:
Complete article (PDF, 1.7MB)
Project Page

The Cartoon Animation Filter

Abstract:
We present the "Cartoon Animation Filter," a simple filter that takes an arbitrary input motion signal and modulates it in such a way that the output motion is more "alive" or "animated." The filter adds a smoothed, inverted, and (sometimes) time shifted version of the second derivative (the acceleration) of the signal back into the original signal. Almost all parameters of the filter are automated. The user only needs to set the desired strength of the filter. The beauty of the animation filter lies in its simplicity and generality. We apply the filter to motions ranging from hand drawn trajectories, to simple animations within PowerPoint presentations, to motion captured DOF curves, to video segmentation results. Experimental results show that the filtered motion exhibits anticipation, follow-through, exaggeration and squash-and-stretch effects which are not present in the original input motion data.
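
The filter described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' code: the Gaussian smoothing kernel, the finite-difference second derivative, and the parameter names `strength`, `sigma`, and `shift` are all choices made here for the example.

```python
import numpy as np

def cartoon_filter(x, strength=1.0, sigma=3.0, shift=0):
    """Subtract a smoothed (optionally shifted) second derivative from x."""
    x = np.asarray(x, dtype=float)
    # Normalized Gaussian kernel used for smoothing (an assumption).
    radius = int(np.ceil(3 * sigma))
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()
    # Second derivative (acceleration) by repeated finite differences.
    accel = np.gradient(np.gradient(x))
    # Edge-pad so the smoothed signal has the same length as x.
    padded = np.pad(accel, radius, mode="edge")
    smooth = np.convolve(padded, kernel, mode="valid")
    # Optional time shift in samples; np.roll wraps around at the ends.
    smooth = np.roll(smooth, shift)
    # "Inverted" means the smoothed acceleration is subtracted.
    return x - strength * smooth

# A motion that rests, moves at constant speed, then rests again.
motion = np.concatenate([np.zeros(50), np.linspace(0.0, 1.0, 20), np.ones(50)])
filtered = cartoon_filter(motion, strength=20.0)
```

On this input the filtered curve dips below the rest position just before the move and overshoots the target just after it, which corresponds to the anticipation and follow-through effects mentioned in the abstract.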

 
Citation:
Jue Wang, Steven M. Drucker, Maneesh Agrawala, Michael F. Cohen. The Cartoon Animation Filter. ACM Transactions on Graphics 25(3) (ACM SIGGRAPH 2006), July 2006.

 
On-line documents:
Complete article (PDF, 0.6MB)
Project Page

Composition of Complex Optimal Multi-Character Motions

Abstract:
This paper presents a physics-based method for creating complex multi-character motions from short single-character sequences. We represent multi-character motion synthesis as a spacetime optimization problem where constraints represent the desired character interactions. We extend standard spacetime optimization with a novel timewarp parameterization in order to jointly optimize the motion and the interaction constraints. In addition, we present an optimization algorithm based on block coordinate descent and continuations that can be used to solve the large problems that multiple characters usually generate. This framework allows us to synthesize multi-character motion drastically different from the input motion. Consequently, a small set of input motions is sufficient to express a wide variety of multi-character motions.

 
Citation:
C. Karen Liu, Aaron Hertzmann, Zoran Popović. Composition of Complex Optimal Multi-Character Motions. ACM SIGGRAPH / Eurographics Symposium on Computer Animation, 2006.

 
On-line documents:
Complete article (PDF, 3.5MB)
Project Page

Photographing Long Scenes with Multi-Viewpoint Panoramas

Abstract:
We present a system for producing multi-viewpoint panoramas of long, roughly planar scenes, such as the facades of buildings along a city street, from a relatively sparse set of photographs captured with a handheld still camera that is moved along the scene. Our work is a significant departure from previous methods for creating multi-viewpoint panoramas, which composite thin vertical strips from a video sequence captured by a translating video camera, in that the resulting panoramas are composed of relatively large regions of ordinary perspective. In our system, the only user input required beyond capturing the photographs themselves is to identify the dominant plane of the photographed scene; our system then computes a panorama automatically using Markov Random Field optimization. Users may exert additional control over the appearance of the result by drawing rough strokes that indicate various high-level goals. We demonstrate the results of our system on several scenes, including urban streets, a river bank, and a grocery store aisle.

 
Citation:
Aseem Agarwala, Maneesh Agrawala, Michael F. Cohen, David H. Salesin, Richard Szeliski. Photographing Long Scenes with Multi-Viewpoint Panoramas. ACM Transactions on Graphics 25(3) (ACM SIGGRAPH 2006), July 2006.

 
On-line documents:
Complete article (PDF, 4.3MB)
Project Page

Volumetric Density Capture From a Single Image

Abstract:
We propose a new approach to capture the volumetric density of scattering media instantaneously with a single image. The volume is probed with a set of laser lines and the scattered intensity is recorded by a conventional camera. We then determine the density along the laser lines taking the scattering properties of the media into account. A specialized interpolation technique reconstructs the full density field in the volume. We apply the technique to capture the volumetric density of participating media such as smoke.

 
Citation:
Christian Fuchs, Tongbo Chen, Michael Goesele, Holger Theisel, Hans-Peter Seidel. Volumetric Density Capture From a Single Image, Proceedings of the International Workshop on Volume Graphics 2006, July 2006.

 
On-line documents:
Complete article (PDF, 3.5MB)

Model Reduction for Real-time Fluids

Abstract:
We present a new model reduction approach to fluid simulation, enabling large, real-time, detailed flows with continuous user interaction. Our reduced model can also handle moving obstacles immersed in the flow. We create separate models for the velocity field and for each moving boundary, and show that the coupling forces may be reduced as well. Our results indicate that surprisingly few basis functions are needed to resolve small but visually important features such as spinning vortices.

 
Citation:
Adrien Treuille, Andrew Lewis, Zoran Popović. Model Reduction for Real-time Fluids, ACM Transactions on Graphics 25(3) (SIGGRAPH 2006), July 2006.

 
On-line documents:
Complete article (PDF, 3MB)
Project Page

Continuum Crowds

Abstract:
We present a real-time crowd model based on continuum dynamics. In our model, a dynamic potential field simultaneously integrates global navigation with moving obstacles such as other people, efficiently solving for the motion of large crowds without the need for explicit collision avoidance. Simulations created with our system run at interactive rates, demonstrate smooth flow under a variety of conditions, and naturally exhibit emergent phenomena that have been observed in real crowds.

 
Citation:
Adrien Treuille, Seth Cooper, Zoran Popović. Continuum Crowds, ACM Transactions on Graphics 25(3) (SIGGRAPH 2006), July 2006.

 
On-line documents:
Complete article (PDF, 3.4MB)
Project Page

Schematic Storyboarding for Video Visualization and Editing

Abstract:
We present a method for visualizing short video clips in a single static image, using the visual language of storyboards. These schematic storyboards are composed from multiple input frames and annotated using outlines, arrows, and text describing the motion in the scene. The principal advantage of this storyboard representation over standard representations of video (generally either a static thumbnail image or a playback of the video clip in its entirety) is that it requires only a moment to observe and comprehend but at the same time retains much of the detail of the source video. Our system renders a schematic storyboard layout based on a small amount of user interaction. We also demonstrate an interaction technique to scrub through time using the natural spatial dimensions of the storyboard. Potential applications include video editing, surveillance summarization, assembly instructions, composition of graphic novels, and illustration of camera technique for film studies.

 
Citation:
Dan B Goldman, Brian Curless, David H. Salesin, Steven M. Seitz. Schematic Storyboarding for Video Visualization and Editing, ACM Transactions on Graphics 25(3), (SIGGRAPH 2006), July 2006.

 
On-line documents:
Complete article (PDF, 6MB)
Project Page

Spatio-Angular Resolution Tradeoff in Integral Photography

Abstract:
An integral camera samples the 4D light field of a scene within a single photograph. This paper explores the fundamental tradeoff between spatial resolution and angular resolution that is inherent to integral photography. Based on our analysis we divide previous integral camera designs into two classes depending on how the 4D light field is distributed (multiplexed) over the 2D sensor. Our optical treatment is mathematically rigorous and extensible to the broader area of light field research. We argue that for many real-world scenes it is beneficial to sacrifice angular resolution for higher spatial resolution. The missing angular resolution is then interpolated using techniques from computer vision. We have developed a prototype integral camera that uses a system of lenses and prisms as an external attachment to a conventional camera. We have used this prototype to capture the light fields of a variety of scenes. We show examples of novel view synthesis and refocusing where the spatial resolution is significantly higher than is possible with previous designs.

 
Citation:
Todor Georgiev, Ke Colin Zheng, Brian Curless, David H. Salesin, Shree Nayar, Chintan Intwala. Spatio-Angular Resolution Tradeoff in Integral Photography, Proceedings of Eurographics Symposium on Rendering, 2006.

 
On-line documents:
Complete article (PDF, 0.6MB)
Project Page

Multi-View Stereo Revisited

Abstract:
We present an extremely simple yet robust multi-view stereo algorithm and analyze its properties. The algorithm first computes individual depth maps using a window-based voting approach that returns only good matches. The depth maps are then merged into a single mesh using a straightforward volumetric approach. We show results for several datasets, showing accuracy comparable to the best of the current state of the art techniques and rivaling more complex algorithms.

 
Citation:
Michael Goesele, Steven M. Seitz and Brian Curless. Multi-View Stereo Revisited, Proceedings of CVPR 2006, New York, NY, USA, June 2006.

 
On-line documents:
Complete article (PDF, 5.3MB)

Mesostructure from Specularity

Abstract:
We describe a simple and robust method for surface mesostructure acquisition. Our method builds on the observation that specular reflection is a reliable visual cue for surface mesostructure perception. In contrast to most photometric stereo methods, which take specularities as outliers and discard them, we propose a progressive acquisition system that captures a dense specularity field as the only information for mesostructure reconstruction. Our method can efficiently recover surfaces with fine-scale geometric details from complex real-world objects with a wide variety of reflection properties, including translucent, low albedo, and highly specular objects. We show results for a variety of objects including human skin, dried apricot, orange, jelly candy, black leather and dark chocolate.

 
Citation:
Tongbo Chen, Michael Goesele and Hans-Peter Seidel. Mesostructure from Specularity, Proceedings of CVPR 2006, New York, NY, USA, June 2006.

 
On-line documents:
Complete article (PDF, 4.0MB)
Project Page

Piecewise Image Registration in the Presence of Multiple Large Motions

Abstract:
We present a technique for computing a dense pixel correspondence between two images of a scene containing multiple large, rigid motions. We model each motion with either a homography (for planar objects) or a fundamental matrix. The various motions in the scene are first extracted by clustering an initial sparse set of correspondences between feature points; we then perform a multi-label graph cut optimization which assigns each pixel to an independent motion and computes its disparity with respect to that motion. We demonstrate our technique on several example scenes and compare our results with previous approaches.

 
Citation:
Pravin Bhat, Ke Colin Zheng, Noah Snavely, Aseem Agarwala, Maneesh Agrawala, Michael F. Cohen and Brian Curless. Piecewise Image Registration in the Presence of Multiple Large Motions, Proceedings of CVPR 2006, New York, NY, USA, June 2006.

 
On-line documents:
Complete article (PDF, 0.8MB)

A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms

Abstract:
This paper presents a quantitative comparison of several multi-view stereo reconstruction algorithms. Until now, the lack of suitable calibrated multi-view image datasets with known ground truth (3D shape models) has prevented such direct comparisons. In this paper, we first survey multi-view stereo algorithms and compare them qualitatively using a taxonomy that differentiates their key properties. We then describe our process for acquiring and calibrating multi-view image datasets with high-accuracy ground truth and introduce our evaluation methodology. Finally, we present the results of our quantitative comparison of state-of-the-art multi-view stereo reconstruction algorithms on six benchmark datasets. The datasets, evaluation details, and instructions for submitting new models are available online at http://vision.middlebury.edu/mview.

 
Citation:
Steven M. Seitz, Brian Curless, James Diebel, Daniel Scharstein, and Richard Szeliski. A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms, Proceedings of CVPR 2006, New York, NY, USA, June 2006.

 
On-line documents:
Complete article (PDF, 1.8MB)
Project Page

A Theory of Spherical Harmonic Identities for BRDF/Lighting Transfer and Image Consistency

Abstract:
We develop new mathematical results based on the spherical harmonic convolution framework for reflection from a curved surface. We derive novel identities, which are the angular frequency domain analogs to common spatial domain invariants such as reflectance ratios. They apply in a number of canonical cases, including single and multiple images of objects under the same and different lighting conditions. One important case we consider is two different glossy objects in two different lighting environments. Denote the spherical harmonic coefficients by $B^{\text{light},\text{material}}_{lm}$, where the subscripts refer to the spherical harmonic indices, and the superscripts to the lighting (1 or 2) and object or material (again 1 or 2). We derive a basic identity, $B^{1,1}_{lm} B^{2,2}_{lm} = B^{1,2}_{lm} B^{2,1}_{lm}$, independent of the specific lighting configurations or BRDFs. While this paper is primarily theoretical, it has the potential to lay the mathematical foundations for two important practical applications. First, we can develop more general algorithms for inverse rendering problems, which can directly relight and change material properties by transferring the BRDF or lighting from another object or illumination. Second, we can check the consistency of an image, to detect tampering or image splicing.
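
One way to see why such an identity can hold is the short derivation below. It is a sketch under an assumption the abstract does not spell out: that reflection acts as a convolution, so each coefficient factors into a material-dependent filter term and a lighting term. The symbols $A^{j}_{l}$ (filter for material $j$) and $L^{i}_{lm}$ (coefficients of lighting $i$) are introduced here for illustration; superscripts on $B$ are ordered (lighting, material).

```latex
% Assumed convolution form: lighting i, material j
B^{i,j}_{lm} = A^{j}_{l}\, L^{i}_{lm}

% Both products contain the same four factors
B^{1,1}_{lm}\, B^{2,2}_{lm}
  = A^{1}_{l} L^{1}_{lm}\, A^{2}_{l} L^{2}_{lm}
  = A^{2}_{l} L^{1}_{lm}\, A^{1}_{l} L^{2}_{lm}
  = B^{1,2}_{lm}\, B^{2,1}_{lm}

% Estimating the fourth image from the other three (where B^{1,1}_{lm} \neq 0)
B^{2,2}_{lm} = \frac{B^{1,2}_{lm}\, B^{2,1}_{lm}}{B^{1,1}_{lm}}
```

Under this assumption the lighting and material terms cancel pairwise, so neither has to be known explicitly, and a violation of the equality in measured coefficients would suggest that the images are not mutually consistent.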

 
Citation:
Dhruv Mahajan, Ravi Ramamoorthi and Brian Curless. A Theory of Spherical Harmonic Identities for BRDF/Lighting Transfer and Image Consistency, in Proceedings of the Ninth European Conference on Computer Vision (ECCV 2006), Graz, Austria, May 2006.

 
On-line documents:
Complete article (PDF, 16.0MB)

 

Audio Analogies: Creating new music from an existing performance by concatenative synthesis

Abstract:
This paper describes a method for creating new music by concatenative synthesis. Given a MIDI score and an audio recording of an example piece of monophonic music, our method synthesizes audio to correspond with a new MIDI score. The algorithm we use is based on concatenative synthesis, commonly used for generating speech. Two versions of our algorithm are explored, one in which individual notes from the example piece are concatenated, and one in which pairs of adjacent notes from the example piece are concatenated. We examine the range of example pieces and target scores for which each version of our algorithm yields good results. Our underlying framework remains general enough to be applicable to other problems, such as rendering a stylized version of the target score, and other types of sound analogies.

 
Citation:
Audio Analogies: Creating new music from an existing performance by concatenative synthesis. Simon, I., Basu, S., Salesin, D. H. and Agrawala, M. Proceedings of ICMC 2005, Barcelona, Spain.

 
On-line documents:
Complete article (PDF, 0.4MB)

Dance reveals symmetry especially in young men

Abstract:
Dance is a common part of human courtship. Is it just for fun or does it carry a hidden message? This question was tackled in a population -- Jamaican -- where dance is particularly important. One property that dance might reflect is bodily symmetry, often used in evolutionary studies to measure developmental stability and genetic quality. A study using motion capture cameras to create video images of the dancers reveals a strong link between symmetry and dancing ability. The effect is stronger for men than for women, and women rate dances by symmetrical men relatively more positively than do men. It works both ways; symmetrical men value symmetry in women dancers more highly than less symmetrical men. In Jamaica at least, it seems that dance is a factor in sexual selection and reveals important information about the dancer. Freeze-frame images on the cover (by William M. Brown) show a symmetrical male dancer in action.

 
Citation:
William M. Brown, Lee Cronk, Keith Grochow, Amy Jacobson, C. Karen Liu, Zoran Popović, Robert Trivers. Dance reveals symmetry especially in young men. Nature 438(7071), 22 Dec 2005, pp. 1148-1150.

 
On-line documents:
Complete article (PDF, 0.2MB)
Project Page

A Theory of Inverse Light Transport

Abstract:
In this paper we consider the problem of computing and removing interreflections in photographs of real scenes. Towards this end, we introduce the problem of inverse light transport -- given a photograph of an unknown scene, decompose it into a sum of n-bounce images, where each image records the contribution of light that bounces exactly n times before reaching the camera. We prove the existence of a set of interreflection cancelation operators that enable computing each n-bounce image by multiplying the photograph by a matrix. This matrix is derived from a set of "impulse images" obtained by probing the scene with a narrow beam of light. The operators work under unknown and arbitrary illumination, and exist for scenes that have arbitrary spatially-varying BRDFs. We derive a closed-form expression for these operators in the Lambertian case and present experiments with textured and untextured Lambertian scenes that confirm our theory's predictions.
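
The idea of recovering each n-bounce image with a single matrix multiply can be illustrated on a toy linear model. This sketch is not the paper's construction: it assumes a known, synthetic one-bounce interreflection matrix `A` (hypothetical), whereas the paper derives its operators from measured impulse images. Under the toy model the photograph satisfies photo = direct + A @ photo.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 5                                   # number of scene patches (toy size)

# Synthetic one-bounce transfer between patches; rows scaled so that the
# largest row sum is 0.5, which makes the bounce series converge.
A = rng.uniform(0.0, 1.0, (k, k))
np.fill_diagonal(A, 0.0)
A *= 0.5 / A.sum(axis=1).max()

direct = rng.uniform(0.5, 1.0, k)       # light reaching the camera after 1 bounce
photo = np.linalg.solve(np.eye(k) - A, direct)   # sum of all bounces

def n_bounce(photo, A, n):
    """Image of light that bounced exactly n times, via one matrix multiply."""
    C = np.linalg.matrix_power(A, n - 1) @ (np.eye(len(photo)) - A)
    return C @ photo

bounces = [n_bounce(photo, A, n) for n in range(1, 60)]
```

Here `bounces[0]` reproduces the direct image and the bounce images sum back to the photograph, mirroring the decomposition described in the abstract; the matrix `C` plays the role of a cancelation operator only within this simplified model.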

 
Citation:
Steven M. Seitz, Yasuyuki Matsushita and Kiriakos N. Kutulakos. A Theory of Inverse Light Transport, in Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV 2005), Beijing, China, October 2005.

 
On-line documents:
Complete article (PDF, 4.0MB)

Vignette and Exposure Calibration and Compensation

Abstract:
We discuss calibration and removal of "vignetting" (radial falloff) and exposure (gain) variations from sequences of images. Unique solutions for vignetting, exposure and scene radiances are possible when the response curve is known. When the response curve is unknown, an exponential ambiguity prevents us from recovering these parameters uniquely. However, the vignetting and exposure variations can nonetheless be removed from the images without resolving this ambiguity. Applications include panoramic image mosaics, photometry for material reconstruction, image-based rendering, and preprocessing for correlation-based vision algorithms.

 
Citation:
Dan B Goldman and Jiun-Hung Chen. Vignette and Exposure Calibration and Compensation, in Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV 2005), Beijing, China, October 2005.

 
On-line documents:
Complete article (PDF, 6.0MB)

Shape and Spatially-Varying BRDFs From Photometric Stereo

Abstract:
This paper describes a photometric stereo method designed for surfaces with spatially-varying BRDFs, including surfaces with both varying diffuse and specular properties. Our method builds on the observation that most objects are composed of a small number of fundamental materials. This approach recovers not only the shape but also material BRDFs and weight maps, yielding compelling results for a wide variety of objects. We also show examples of interactive lighting and editing operations made possible by our method.

 
Citation:
Dan B Goldman, Brian Curless, Aaron Hertzmann and Steven M. Seitz. Shape and Spatially-Varying BRDFs From Photometric Stereo, in Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV 2005), Beijing, China, October 2005.

 
On-line documents:
Complete article (PDF, 6.0MB)

Parameter Estimation for MRF Stereo

Abstract:
This paper presents a novel approach for estimating parameters for MRF-based stereo algorithms. This approach is based on a new formulation of stereo as a maximum a posteriori (MAP) problem, in which both a disparity map and MRF parameters are estimated from the stereo pair itself. We present an iterative algorithm for the MAP estimation that alternates between estimating the parameters while fixing the disparity map and estimating the disparity map while fixing the parameters. The estimated parameters include robust truncation thresholds, for both data and neighborhood terms, as well as a regularization weight. The regularization weight can be either a constant for the whole image, or spatially-varying, depending on local intensity gradients. In the latter case, the weights for intensity gradients are also estimated. Experiments indicate that our approach, as a wrapper for existing stereo algorithms, moves a baseline belief propagation stereo algorithm up six slots in the Middlebury rankings.

 
Citation:
Li Zhang and Steven M. Seitz. Parameter Estimation for MRF Stereo, in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Diego CA, June 2005.

 
On-line documents:
Complete article (PDF, 1.0MB)
Project Web Page

Interactive Video Cutout

Abstract:
We present an interactive system for efficiently extracting foreground objects from a video. We extend previous min-cut based image segmentation techniques to the domain of video with four new contributions. We provide a novel painting-based user interface that allows users to easily indicate the foreground object across space and time. We introduce a hierarchical mean-shift preprocess in order to minimize the number of nodes that min-cut must operate on. Within the min-cut we also define new local cost functions to augment the global costs defined in earlier work. Finally, we extend 2D alpha matting methods designed for images to work with 3D video volumes. We demonstrate that our matting approach preserves smoothness across both space and time. Our interactive video cutout system allows users to quickly extract foreground objects from video sequences for use in a variety of applications including compositing onto new backgrounds and NPR cartoon style rendering.

 
Citation:
Jue Wang, Pravin Bhat, R. Alex Colburn, Maneesh Agrawala, Michael F. Cohen. ACM Transactions on Graphics 24(3), July 2005.

 
On-line documents:
Complete article (PDF, 60MB)

Animating Pictures with Stochastic Motion Textures

Abstract:
In this paper, we explore the problem of enhancing still pictures with subtly animated motions. We limit our domain to scenes containing passive elements that respond to natural forces in some fashion. We use a semi-automatic approach, in which a human user segments the scene into a series of layers to be individually animated. Then, a "stochastic motion texture" is automatically synthesized using a spectral method, i.e., the inverse Fourier transform of a filtered noise spectrum. The motion texture is a time-varying 2D displacement map, which is applied to each layer. The resulting warped layers are then recomposited to form the animated frames. The result is a looping video texture created from a single still image, which has the advantages of being more controllable and of generally higher image quality and resolution than a video texture created from a video source. We demonstrate the technique on a variety of photographs and paintings.
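
The spectral synthesis step mentioned in this abstract can be sketched in a few lines. This is a minimal illustration, assuming a 1D displacement signal and a Gaussian-shaped band-pass filter; the function name, the filter shape, and all parameters are hypothetical and are not the ones used in the paper. Because the signal comes from an inverse discrete Fourier transform, it is periodic and therefore loops without a seam.

```python
import cmath
import math
import random


def stochastic_motion_signal(num_frames, peak_freq=2.0, bandwidth=1.0, seed=0):
    """Synthesize a looping 1D displacement signal from filtered noise.

    Hypothetical helper: draws a random complex spectrum, shapes it with an
    assumed Gaussian band-pass filter, and applies a naive inverse DFT.
    """
    rng = random.Random(seed)
    spectrum = [0j] * num_frames
    # Fill positive frequencies only; k = 0 (and the Nyquist bin) stay zero,
    # so the resulting displacement has zero mean.
    for k in range(1, (num_frames - 1) // 2 + 1):
        gain = math.exp(-0.5 * ((k - peak_freq) / bandwidth) ** 2)
        coeff = gain * complex(rng.gauss(0, 1), rng.gauss(0, 1))
        spectrum[k] = coeff
        # Hermitian symmetry makes the time-domain signal real-valued.
        spectrum[-k] = coeff.conjugate()

    signal = []
    for t in range(num_frames):
        total = sum(
            spectrum[k] * cmath.exp(2j * math.pi * k * t / num_frames)
            for k in range(num_frames)
        )
        signal.append(total.real / num_frames)
    return signal


# One displacement value per frame; frame num_frames wraps back to frame 0.
displacements = stochastic_motion_signal(16)
```

In a full system a signal like this would drive a 2D displacement map per layer; the sketch covers only the noise-filtering idea.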

 
Citation:
Yung-Yu Chuang, Dan B Goldman, Ke Colin Zheng, Brian Curless, David H. Salesin, Richard Szeliski. ACM Transactions on Graphics 24(3), July 2005.

 
On-line documents:
Complete article (PDF, 1.3MB)
Project web page

Learning Physics-based Motion Style with Nonlinear Inverse Optimization

Abstract:
This paper presents a novel physics-based representation of realistic character motion. The dynamical model incorporates several factors of locomotion derived from the biomechanical literature, including relative preferences for using some muscles more than others, elastic mechanisms at joints due to the mechanical properties of tendons, ligaments, and muscles, and variable stiffness at joints depending on the task. When used in a spacetime optimization framework, the parameters of this model define a wide range of styles of natural human movement.

Due to the complexity of biological motion, these style parameters are too difficult to design by hand. To address this, we introduce Nonlinear Inverse Optimization, a novel algorithm for estimating optimization parameters from motion capture data. Our method can extract the physical parameters from a single short motion sequence. Once captured, this representation of style is extremely flexible: motions can be generated in the same style but performing different tasks, and styles may be edited to change the physical properties of the body.

 
Citation:
C. Karen Liu, Aaron Hertzmann, Zoran Popović. ACM Transactions on Graphics 24(3), July 2005.

 
On-line documents:
Complete article (PDF, 954KB)
Project web page

Panoramic Video Textures

Abstract:
This paper describes a mostly automatic method for taking the output of a single panning video camera and creating a panoramic video texture (PVT): a video that has been stitched into a single, wide field of view and that appears to play continuously and indefinitely. The key problem in creating a PVT is that although only a portion of the scene has been imaged at any given time, the output must simultaneously portray motion throughout the scene. Like previous work in video textures, our method employs min-cut optimization to select fragments of video that can be stitched together both spatially and temporally. However, it differs from earlier work in that the optimization must take place over a much larger set of data. Thus, to create PVTs, we introduce a dynamic programming step, followed by a novel hierarchical min-cut optimization algorithm. We also use gradient-domain compositing to further smooth boundaries between video fragments. We demonstrate our results with an interactive viewer in which users can interactively pan and zoom on high-resolution PVTs.

 
Citation:
Aseem Agarwala, Ke Colin Zheng, Chris Pal, Maneesh Agrawala, Michael Cohen, Brian Curless, David H. Salesin, Richard Szeliski. ACM Transactions on Graphics 24(3), July 2005.

 
On-line documents:
Complete article (PDF, 954KB)
Project web page

Physically Based Rigging for Deformable Characters

Abstract:
In this paper we introduce a framework for instrumenting ("rigging") characters that are modeled as dynamic elastic bodies, so that their shapes can be controlled by an animator. Because the shape of such a character is determined by physical dynamics, the rigging system cannot simply dictate the shape as in traditional animation. For this reason, we introduce forces as the building blocks of rigging. Rigging forces guide the shape of the character, but are combined with other forces during simulation. Forces have other desirable features: they can be combined easily and simulated at any resolution, and since they are not tightly coupled with the surface geometry, they can be more easily transferred from one model to another. Our framework includes a new pose-dependent linearization scheme for elastic dynamics, which ensures a correspondence between forces and deformations, and at the same time produces plausible results at interactive speeds. We also introduce a novel method of handling collisions around creases.

 
Citation:
Steve Capell, Matthew Burkhart, Brian Curless, Tom Duchamp, and Zoran Popović. Proceedings of ACM SIGGRAPH / Eurographics Symposium on Computer Animation, 2005.
Extended version: Steve Capell, Matthew Burkhart, Brian Curless, Tom Duchamp, and Zoran Popović. Graphical Models, vol. 69, p. 71-87, 2007.

 
On-line documents:
Complete article (PDF, 4MB)
Project web page
If you would like an electronic copy of the extended version for non-commercial research and educational use only, please email Steve Capell (see the Grail people page).

Interactive, Image-Based Exploded View Diagrams

Abstract:
We present a system for creating interactive exploded view diagrams using 2D images as input. This image-based approach enables us to directly support arbitrary rendering styles, eliminates the need for building 3D models, and allows us to leverage the abundance of existing static diagrams of complex objects. We have developed a set of semi-automatic authoring tools for quickly creating layered diagrams that allow the user to specify how the parts of an object expand, collapse, and occlude one another. We also present a viewing system that lets users dynamically filter the information presented in the diagram by directly expanding and collapsing the exploded view and searching for individual parts. Our results demonstrate that a simple 2.5D diagram representation is powerful enough to enable a useful set of interactions and that, with the right authoring tools, effective interactive diagrams in this format can be created from existing static illustrations with a small amount of effort.

 
Citation:
Wilmot Li, Maneesh Agrawala, David H. Salesin. Interactive Image-Based Exploded View Diagrams, Graphics Interface 2004, May 2004.

 
On-line documents:
PDF
Project Web Page

Example-Based Stereo with General BRDFs

Abstract:
This paper presents an algorithm for voxel-based reconstruction of objects with general reflectance properties from multiple calibrated views. It is assumed that one or more reference objects with known geometry are imaged under the same lighting and camera conditions as the object being reconstructed. The unknown object is reconstructed using a radiance basis inferred from the reference objects. Each view may have arbitrary, unknown distant lighting. If the lighting is calibrated, our model also takes into account shadows that the object casts upon itself. To our knowledge, this is the first stereo method to handle general, unknown, spatially-varying BRDFs under possibly varying, distant lighting, and shadows. We demonstrate our algorithm by recovering geometry and surface normals for objects with both uniform and spatially-varying BRDFs. The normals reveal fine-scale surface detail, allowing much richer renderings than the voxel geometry alone.

 
Citation:
Treuille, Adrien, Hertzmann, Aaron, Seitz, Steven M. Example-Based Stereo with General BRDFs, 8th European Conference on Computer Vision (ECCV 2004), Prague, Czech Republic, May 2004.

 
On-line documents:
PDF

Video-Based Document Tracking: Unifying Your Physical and Electronic Desktops

Abstract:
This paper presents an approach for tracking paper documents on the desk over time and automatically linking them to the corresponding electronic documents using an overhead video camera. We demonstrate our system in the context of two scenarios, paper tracking and photo sorting. In the paper tracking scenario, the system tracks changes in the stacks of printed documents and books on the desk and builds a complete representation of the spatial structure of the desktop. When users want to find a printed document buried in the stacks, they can query the system based on appearance, keywords, or access time. The system also provides a remote desktop interface for directly browsing the physical desktop from a remote location. In the photo sorting scenario, users sort printed photographs into physical stacks on the desk. The system automatically recognizes the photographs and organizes the corresponding digital photographs into separate folders according to the physical arrangement. Our framework provides a way to unify the physical and electronic desktops without the need for a specialized physical infrastructure except for a video camera.

 
Citation:
Kim, Jiwon, Seitz, Steven M. and Agrawala, Maneesh. Video-Based Document Tracking: Unifying Your Physical and Electronic Desktops, UIST 2004, Santa Fe, New Mexico, USA, October 2004.

 
On-line documents:
PDF
Project Page

Momentum-based Parameterization of Dynamic Character Motion

Abstract:
This paper presents a system for rapid editing of highly dynamic motion capture data. The heart of this system is an optimization algorithm that can transform the captured motion so that it satisfies high-level user constraints while enforcing that the linear and angular momentum of the motion remain physically plausible. Unlike most previous approaches to motion editing, our algorithm does not require pose specification or model reduction, and the user need only specify high-level changes to the input motion. To preserve the dynamic behavior of the input motion, we introduce a spline-based parameterization that matches the linear and angular momentum pattern of the motion capture data. Because our algorithm enables rapid convergence by presenting a good initial state of the optimization, the user can efficiently generate a large family of realistic motions from a single input motion. The algorithm can then populate the dynamic space of motions by simple interpolation, effectively parameterizing the space of realistic motions. We show how this framework can be used to produce an effective interface for rapid creation of dynamic animations, as well as to drive the dynamic motion of a character in real-time.

 
Citation:
Abe, Y., Liu, C. K., Popović, Z.. Momentum-based Parameterization of Dynamic Character Motion, ACM SIGGRAPH / Eurographics Symposium on Computer Animation, August 2004.

 
On-line documents:
PDF
Project page

Flow-based Video Synthesis and Editing

Abstract:
This paper presents a novel algorithm for synthesizing and editing video of natural phenomena that exhibit continuous flow patterns. The algorithm analyzes the motion of textured particles in the input video along user-specified flow lines, and synthesizes seamless video of arbitrary length by enforcing temporal continuity along a second set of user-specified flow lines. The algorithm is simple to implement and use. We used this technique to edit video of waterfalls, rivers, flames, and smoke.

 
Citation:
Bhat, Kiran S., Seitz, Steven M., Hodgins, Jessica K., Khosla, Pradeep K.. Flow-based Video Synthesis and Editing, ACM Transactions on Graphics 23(3), July 2004.

 
On-line documents:
PDF (2.6MB)
Project page

Video Tooning

Abstract:
We describe a system for transforming an input video into a highly abstracted, spatio-temporally coherent cartoon animation with a range of styles. To achieve this, we treat video as a space-time volume of image data. We have developed an anisotropic kernel mean shift technique to segment the video data into contiguous volumes. These provide a simple cartoon style in themselves, but more importantly provide the capability to semi-automatically rotoscope semantically meaningful regions.

In our system, the user simply outlines objects on keyframes. A mean shift guided interpolation algorithm is then employed to create three dimensional semantic regions by interpolation between the keyframes, while maintaining smooth trajectories along the time dimension. These regions provide the basis for creating smooth two dimensional edge sheets and stroke sheets embedded within the spatio-temporal video volume. The regions, edge sheets, and stroke sheets are rendered by slicing them at particular times. A variety of styles of rendering are shown. The temporal coherence provided by the smoothed semantic regions and sheets results in a temporally consistent non-photorealistic appearance.

 
Citation:
Wang, Jue, Xu, Yingqing, Shum, Heung-Yeung, Cohen, Michael F. Video Tooning, ACM Transactions on Graphics 23(3), July 2004.

 
On-line documents:
PDF (4.0MB)

Spacetime Faces: High-Resolution Capture for Modeling and Animation

Abstract:
We present an end-to-end system that goes from video sequences to high resolution, editable, dynamically controllable face models. The capture system employs synchronized video cameras and structured light projectors to record videos of a moving face from multiple viewpoints. A novel spacetime stereo algorithm is introduced to compute depth maps accurately and overcome over-fitting deficiencies in prior work. A new template fitting and tracking procedure fills in missing data and yields point correspondence across the entire sequence without using markers. We demonstrate a data-driven, interactive method for inverse kinematics that draws on the large set of fitted templates and allows for posing new expressions by dragging surface points directly. Finally, we describe new tools that model the dynamics in the input sequence to enable new animations, created via key-framing or texture-synthesis techniques.

 
Citation:
Zhang, Li, Snavely, Noah, Curless, Brian, Seitz, Steven M.. Spacetime Faces: High-Resolution Capture for Modeling and Animation. ACM Transactions on Graphics 23(3), July 2004.

 
On-line documents:
PDF (10.3MB)
Project page

Fluid Control using the Adjoint Method

Abstract:
We describe a novel method for controlling physics-based fluid simulations through gradient-based nonlinear optimization. Using a technique known as the adjoint method, derivatives can be computed efficiently, even for large 3D simulations with millions of control parameters. In addition, we introduce the first method for the full control of free-surface liquids. We show how to compute adjoint derivatives through each step of the simulation, including the fast marching algorithm, and describe a new set of control parameters specifically designed for liquids.
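
The adjoint idea referred to in this abstract can be shown on a toy problem. The sketch below assumes a scalar linear "simulation" x[t+1] = a * x[t] + u[t] with a terminal loss, which is far simpler than a fluid solver; all names and the dynamics are hypothetical. It illustrates the key property: one forward pass plus one backward pass yields the derivative with respect to every control parameter.

```python
def simulate(x0, controls, a=0.9):
    """Run the toy linear simulation x[t+1] = a * x[t] + u[t]."""
    states = [x0]
    for u in controls:
        states.append(a * states[-1] + u)
    return states


def loss(x0, controls, target, a=0.9):
    """Terminal objective: half the squared distance to the target state."""
    return 0.5 * (simulate(x0, controls, a)[-1] - target) ** 2


def adjoint_gradient(x0, controls, target, a=0.9):
    """Gradient of the loss w.r.t. every control via one backward pass."""
    states = simulate(x0, controls, a)
    adjoint = states[-1] - target        # dL/dx at the final step
    grad = [0.0] * len(controls)
    for t in reversed(range(len(controls))):
        grad[t] = adjoint                # dx[t+1]/du[t] = 1
        adjoint = a * adjoint            # propagate dL/dx one step back
    return grad


controls = [0.1, -0.2, 0.3, 0.0]
gradient = adjoint_gradient(1.0, controls, target=2.0)
```

The cost of the backward pass does not grow with the number of controls beyond the simulation length itself, which is what makes the approach attractive when there are very many control parameters.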

 
Citation:
McNamara, Antoine, Treuille, Adrien, Popović, Zoran, Stam, Jos. Fluid Control using the Adjoint Method, ACM Transactions on Graphics 23(3), July 2004.

 
On-line documents:
PDF (4.0MB)
Project page

Interactive Digital Photomontage

Abstract:
We describe an interactive, computer-assisted framework for combining parts of a set of photographs into a single composite picture, a process we call "digital photomontage." Our framework makes use of two techniques primarily: graph-cut optimization, to choose good seams within the constituent images so that they can be combined as seamlessly as possible; and gradient-domain fusion, a process based on Poisson equations, to further reduce any remaining visible artifacts in the composite. Also central to the framework is a suite of interactive tools that allow the user to specify a variety of high-level image objectives, either globally across the image, or locally through a painting-style interface. Image objectives are applied independently at each pixel location and generally involve a function of the pixel values (such as "maximum contrast") drawn from that same location in the set of source images. Typically, a user applies a series of image objectives iteratively in order to create a finished composite. The power of this framework lies in its generality; we show how it can be used for a wide variety of applications, including "selective composites" (for instance, group photos in which everyone looks their best), relighting, extended depth of field, panoramic stitching, clean-plate production, stroboscopic visualization of movement, and time-lapse mosaics.
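
The per-pixel image objectives described in this abstract can be sketched as a simple selection over the source images. This is an illustration only: it assumes small grayscale images stored as nested lists and a hypothetical `score` function, and it deliberately omits the graph-cut seam optimization and gradient-domain fusion that the framework relies on.

```python
def per_pixel_labels(sources, score):
    """For each pixel, pick the index of the source image with the best score.

    sources: list of equally sized 2D grayscale images (lists of rows).
    score:   hypothetical objective mapping a pixel value to a number;
             higher is better.
    """
    height, width = len(sources[0]), len(sources[0][0])
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            best = max(range(len(sources)), key=lambda i: score(sources[i][y][x]))
            row.append(best)
        labels.append(row)
    return labels


def composite(sources, labels):
    """Copy each pixel from the source image selected by its label."""
    return [
        [sources[labels[y][x]][y][x] for x in range(len(labels[0]))]
        for y in range(len(labels))
    ]


image_a = [[10, 200], [30, 40]]
image_b = [[90, 20], [35, 250]]
# Using the pixel value itself as the score acts as a "maximum luminance" objective.
labels = per_pixel_labels([image_a, image_b], score=lambda v: v)
result = composite([image_a, image_b], labels)
```

Choosing labels independently per pixel like this tends to produce visible seams, which is the reason a seam-aware optimization is needed on top of the per-pixel objective.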

 
Citation:
Agarwala, Aseem, Dontcheva, Mira, Agrawala, Maneesh, Drucker, Steven, Colburn, Alex, Curless, Brian, Salesin, David H., Cohen, Michael. Interactive Digital Photomontage, ACM Transactions on Graphics 23(3), July 2004.

 
On-line documents:
PDF (6.0MB)
Project page

Keyframe-Based Tracking for Rotoscoping and Animation

Abstract:
We describe a new approach to rotoscoping --- the process of tracking contours in a video sequence --- that combines computer vision with user interaction. In order to track contours in video, the user specifies curves in two or more frames; these curves are used as keyframes by a computer-vision-based tracking algorithm. The user may interactively refine the curves and then restart the tracking algorithm. Combining computer vision with user interaction allows our system to track any sequence with significantly less effort than interpolation-based systems --- and with better reliability than pure computer vision systems. Our tracking algorithm is cast as a spacetime optimization problem that solves for time-varying curve shapes based on an input video sequence and user-specified constraints. We demonstrate our system with several rotoscoped examples. Additionally, we show how these rotoscoped contours can be used to help create cartoon animation by attaching user-drawn strokes to the tracked contours.

 
Citation:
Agarwala, Aseem, Hertzmann, Aaron, Salesin, David H., Seitz, Steven. Keyframe-Based Tracking for Rotoscoping and Animation, ACM Transactions on Graphics 23(3), July 2004.

 
On-line documents:
PDF (2.4MB)
Project page

Style-based Inverse Kinematics

Abstract:
We present an inverse kinematics system based on a learned model of human poses. Given a set of constraints, our system can produce the most likely pose satisfying those constraints, in realtime. Training the model on different input data leads to different styles of IK. The model is represented as a probability distribution over the space of all possible poses. This means that our IK system can generate any pose, but prefers poses that are most similar to the space of poses in the training data. We represent the probability with a novel model called a Scaled Gaussian Process Latent Variable Model. The parameters of the model are all learned automatically; no manual tuning is required for the learning component of the system. We additionally describe a novel procedure for interpolating between styles.

Our style-based IK can replace conventional IK, wherever it is used in computer animation and computer vision. We demonstrate our system in the context of a number of applications: interactive character posing, trajectory keyframing, real-time motion capture with missing markers, and posing from a 2D image.

 
Citation:
Grochow, Keith, Martin, Steven L., Hertzmann, Aaron, and Popović, Zoran. Style-based Inverse Kinematics, ACM Transactions on Graphics 23(3), July 2004.

 
On-line documents:
PDF (1.4MB)
Project page

On Creating Animated Presentations

Abstract:
Computers are used to display visuals for millions of live presentations each day, and yet only the tiniest fraction of these make any real use of the powerful graphics hardware available on virtually all of today's machines. In this paper, we describe our efforts toward harnessing this power to create better types of presentations: presentations that include meaningful animation as well as at least a limited degree of interactivity. Our approach has been iterative, alternating between creating animated talks using available tools, then improving the tools to better support the kinds of talk we wanted to make. Through this cyclic design process, we have identified a set of common authoring paradigms that we believe a system for building animated presentations should support. We describe these paradigms and present the latest version of our script-based system for creating animated presentations, called SLITHY. We show several examples of actual animated talks that were created and given with versions of SLITHY, including one talk presented at SIGGRAPH 2000 and four talks presented at SIGGRAPH 2002. Finally, we describe a set of design principles that we have found useful for making good use of animation in presentation.

 
Citation:
Zongker, Douglas E. and Salesin, David H.. On Creating Animated Presentations, Eurographics / ACM SIGGRAPH Symposium on Computer Animation, July 2003.

 
On-line documents:
PDF (1.0MB)

Adaptive Grid-Based Document Layout

Abstract:
Grid-based page designs are ubiquitous in commercially printed publications, such as newspapers and magazines. Yet, to date, no one has invented a good way to easily and automatically adapt such designs to arbitrarily-sized electronic displays. The difficulty of generalizing grid-based designs explains the generally inferior nature of on-screen layouts when compared to their printed counterparts, and is arguably one of the greatest remaining impediments to creating on-line reading experiences that rival those of ink on paper. In this work, we present a new approach to adaptive grid-based document layout, which attempts to bridge this gap. In our approach, an adaptive layout style is encoded as a set of grid-based templates that know how to adapt to a range of page sizes and other viewing conditions. These templates include various types of layout elements (such as text, figures, etc.) and define, through constraint-based relationships, just how these elements are to be laid out together as a function of both the properties of the content itself, such as a figure's size and aspect ratio, and the properties of the viewing conditions under which the content is being displayed. We describe an XML-based representation for our templates and content, which maintains a clean separation between the two. We also describe the various parts of our research prototype system: a layout engine for formatting the page; a paginator for determining a globally optimal allocation of content amongst the pages; and a graphical user interface for interactively creating adaptive templates. We also provide numerous examples demonstrating the capabilities of this prototype, including this paper, itself, which has been laid out with our system.

 
Citation:
Jacobs, C., Li, W., Schrier, E., Bargeron, D., and Salesin, D.. Adaptive Grid-Based Document Layout, ACM Transactions on Graphics 22(3) (Proceedings of ACM SIGGRAPH 2003), July 2003, pp. 838-847.

 
On-line documents:
PDF (8.6MB)

Shape and Motion under Varying Illumination: Unifying Structure from Motion, Photometric Stereo, and Multi-view Stereo

Abstract:
This paper presents an algorithm for computing optical flow, shape, motion, lighting, and albedo from an image sequence of a rigidly-moving Lambertian object under distant illumination. The problem is formulated in a manner that subsumes structure from motion, multi-view stereo, and photometric stereo as special cases. The algorithm utilizes both spatial and temporal intensity variation as cues: the former constrains flow and the latter constrains surface orientation; combining both cues enables dense reconstruction of both textured and texture-less surfaces. The algorithm works by iteratively estimating affine camera parameters, illumination, shape, and albedo in an alternating fashion. Results are demonstrated on videos of hand-held objects moving in front of a fixed light and camera.

 
Citation:
Zhang, L., Curless, B., Hertzmann, A. and Seitz, Steven M.. Shape and Motion under Varying Illumination: Unifying Structure from Motion, Photometric Stereo, and Multi-view Stereo, Proceedings of the 9th IEEE International Conference on Computer Vision (ICCV), Nice France, October 2003.

 
On-line documents:
PDF
Project Web Page