CSE P576

Attendance via livestream or by watching posted lecture recordings [link] is possible, but these are the less-preferred modalities.

Instructors

Course Description

A master's course in computer vision, emphasizing fundamentals of geometry and image formation as well as deep learning and image understanding.

Projects

Books

Resources

Course Overview

Date	Lecture	Description	Notes and Resources
9/29	Introduction		[CVA2] Ch. 1
	Image Formation	Geometric and Photometric Image Formation, Pinhole Camera, Lenses, Sensors, Colour, Gamma, DCT, Image Coding	[CVA2] Ch. 2
10/6	Filtering and Pyramids	Linear + Non-Linear Filtering, Correlation, Convolution, Gaussian + Laplacian Pyramids, Sampling and Aliasing	[CVA2] Ch. 3.2, 3.5
	Features and Matching	Detection, Correspondence, Edges, Corners, Regions, Patch Matching, SIFT, Shape Context, Learning Features	[CVA2] Ch. 7 Project 1 start
10/13	Planar Geometry	2D Transforms: Euclidean, Similarity, Affine, Projective, Camera Models: Perspective, Projective, Linear, Viewing planes, Lines and Camera Rotation	[CVA2] Ch. 3.6
	RANSAC	Least Squares 2-view Alignment, Outliers, Robust Line Fitting, RANSAC, Minimal Subsets	[CVA2] Ch. 8.1, 8.2
10/20	Epipolar Geometry	Epipolar Lines, Plane Constraint, Fundamental/Essential Matrix, 8-point algorithm, Triangulation, 2-view SFM	Project 2 start [CVA2] Ch. 11.3
	Multiview Alignment and SFM	Multiview Alignment, Residuals, Error Function, Structure from Motion, Bundle Adjustment, Pose Estimation, Triangulation	[CVA2] Ch. 8.3, 8.4, 11.4 [Panorama stitching by Brown & Lowe] [ORB-SLAM by Mur-Artal et al.]
10/25	Project 1 due
10/27	Stereo	Stereo matching, local + global, multiview stereo, plane sweep, volumetric, depth map merging, photometric stereo	[CVA2] Ch. 12
	Depth + Flow	Depth imaging + fusion, signed distance functions, non-rigid matching, optical flow, Lucas-Kanade algorithm	[CVA2] Ch. 13.[1,2,3,5], Ch. 9.1; PlaneSweep ipynb, LucasKanade ipynb. Notebooks by Steven Lovegrove, Richard Newcombe
11/3	Linear Classification	Visual classification intro, object recognition, instance, category, classification vs detection, linear classification, 2-class, N-class, linear and softmax regression	[CVA2] Ch. 6.1, 6.2; [ESL] Ch. 2.3 Project 3 start
	Visual Classification 2	Fundamentals and Pre-Deep Learning Classification, Bayesian classifiers, Gaussian distributions, PCA, LDA, Decision Forests, Visual words, SVMs	[DL] Ch. 5
11/8	Project 2 due
11/10	Neural Networks	Feature extraction, end to end learning, multiple linear layers, activation functions, biological neurons, space warping, universal approximation, convex optimization	[CVA2] Ch. 5.3, 5.4.0, 5.4.1; [DL] Ch. 6; [Slides for Week 7 by Justin Johnson]
	Backpropagation	Chain rule, computational gradients, forward/reverse mode autodiff, upstream/local gradients, flat backprop, modular design, scalar/vector/tensor backprop, matrix multiplication example
	Convolutional Networks	Convolutional layers, activation maps, dimension mappings, receptive fields, strides, pooling, LeNet5 example
11/17	Advanced CNNs	CNN building blocks, dropout, batch norm, factorized convolutions, residual connections, popular architectures: AlexNet, VGG, GoogLeNet, ResNet, MobileNet, SE-Net	Project 4 start: assignment PDF and starter code
	Object Detection	Motivation + applications, sliding windows, anchor based detection, single-stage and two-stage architectures, evaluation metrics, IoU, precision-recall, mAP, practical tips	[CVA2] Ch. 6.3 [Slides for Week 8 by Jonathan Huang]
11/22	Project 3 due
11/24	NO CLASS
12/1	Tracking, Part 1	Motivation, probabilistic formulation, linear dynamical systems, multiple-hypothesis tracking (MHT), Bayesian filtering with CONDENSATION and RJ MCMC	[Course on Tracking at Linköping University] [Course on SLAM and Tracking at University of Freiburg]
	Tracking, Part 2	Case studies: tracking as online learning, correlation filters, tracking with Siamese networks, graph-theoretic formulations
12/8	Vision and Language	"Visual Tracking and Retrieval by Natural Language Descriptions" Guest lecture by Qi (Fred) Feng, Boston University	[CVA2] Ch. 6.6 [Vision & Language] [Real-time Tracking with NL] [Siamese Natural Language Tracker]
	Deep Learning in 3D	Single-view, 2-view, multi-view depth, deep learning with points, meshes, voxels, SDFs, neural scene representation and rendering
	Project 4 due		[Buried in Syllabus, Prize Remains Unfound]

Computer Vision

CSE P576 // Autumn 2021

Meeting Information