

In the previous section: single views.
This section: multiple views.


Structure and depth are inherently ambiguous from single views

What cues help us to perceive 3D shape and depth?

Images from the same point of view, but with different camera parameters, can also yield 3D shape / depth estimates









If stereo were critical for depth perception, navigation, recognition, etc., then this would be a problem

Structure: given projections of the same 3D point in two or more images, compute the 3D coordinates of that point (see the triangulation sketch after these definitions)

Stereo correspondence: given a point in one of the images, where could its corresponding points be in the other images?

Motion: given a set of corresponding points in two or more images, compute the camera parameters
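To make the "structure" subproblem concrete, here is a minimal numpy sketch of linear (DLT) triangulation, a standard textbook method; the camera matrices and the test point below are made-up values for illustration, not anything specified in these notes.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    # Each observation gives two linear constraints on the homogeneous
    # 3D point X: x * (P[2] @ X) - P[0] @ X = 0, and likewise for y.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # Least-squares null vector of A via SVD
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize

# Made-up example: two unit-focal cameras, the second shifted along x
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), [[-1.0], [0.0], [0.0]]])
X_true = np.array([0.5, 0.3, 4.0, 1.0])
x1 = (P1 @ X_true)[:2] / (P1 @ X_true)[2]
x2 = (P2 @ X_true)[:2] / (P2 @ X_true)[2]
print(triangulate(P1, P2, x1, x2))  # ~[0.5, 0.3, 4.0]
```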

Rough analogy with human visual system:

Pupil/iris: control amount of light passing through lens
Retina: contains sensor cells, where image is formed
Fovea: highest concentration of cones
Human eyes fixate on a point in space: they rotate so that the corresponding images form at the centers of the two foveae

Disparity occurs when the eyes fixate on one object; other objects appear at different visual angles


Béla Julesz, 1960: do we identify local brightness patterns before fusion (a monocular process) or after (a binocular one)?
To test this, he used pairs of synthetic images obtained by randomly spraying black dots on white objects
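A minimal numpy sketch of how such a random-dot pair can be constructed (image size, square position, and shift are arbitrary choices): the left image is pure noise, and the right image is the same noise with a central square shifted horizontally, so the square is only visible after binocular fusion.

```python
import numpy as np

rng = np.random.default_rng(0)
h, w, shift = 256, 256, 8

# Left image: random black/white dots, no monocular structure at all
left = (rng.random((h, w)) > 0.5).astype(float)

# Right image: same dots, but a central square is shifted leftward,
# giving it a disparity that is seen as depth when the pair is fused
right = left.copy()
r0, r1, c0, c1 = 64, 192, 64, 192
right[r0:r1, c0 - shift:c1 - shift] = left[r0:r1, c0:c1]
# Fill the strip uncovered by the shift with fresh random dots
right[r0:r1, c1 - shift:c1] = (rng.random((r1 - r0, shift)) > 0.5).astype(float)
```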


Take two pictures of the same subject from two slightly different viewpoints and display them so that each eye sees only one of the images
Invented by Sir Charles Wheatstone, 1838








Autostereograms exploit disparity as a depth cue using a single image (also called single-image random dot stereograms, or single-image stereograms)
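A minimal sketch of one common autostereogram construction (the period, gain, and depth map below are illustrative assumptions): each row repeats with a depth-dependent period, so fusing adjacent repetitions of the pattern produces disparity from a single image.

```python
import numpy as np

def autostereogram(depth, period=64, gain=16):
    # Each pixel copies the pixel `s` columns to its left, where the
    # repeat distance s shrinks for nearer (larger-valued) surfaces.
    h, w = depth.shape
    rng = np.random.default_rng(0)
    img = rng.random((h, w))
    for y in range(h):
        for x in range(period, w):
            s = period - int(gain * depth[y, x])
            img[y, x] = img[y, x - s]
    return img

# Depth map with a raised square in the middle (values in [0, 1])
depth = np.zeros((256, 512))
depth[96:160, 192:320] = 1.0
img = autostereogram(depth)
```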

Stereo: shape from "motion" between two views
We'll need to consider:
Extrinsic parameters: camera frame 1 \(\leftrightarrow\) camera frame 2
Intrinsic parameters: image coordinates relative to camera \(\leftrightarrow\) pixel coordinates
We'll assume for now that these parameters are given and fixed
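As a small illustration of the two parameter sets (all numbers below are made up): the extrinsics map a point between the two camera frames, and the intrinsics map camera coordinates to pixel coordinates.

```python
import numpy as np

K = np.array([[500.0,   0.0, 320.0],   # intrinsics: focal lengths and
              [  0.0, 500.0, 240.0],   # principal point, in pixels
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                          # extrinsics: rotation frame 2 -> frame 1
t = np.array([0.1, 0.0, 0.0])          # translation (baseline along x), meters

X2 = np.array([0.5, 0.2, 3.0])         # a 3D point in camera frame 2
X1 = R @ X2 + t                        # extrinsic: camera frame 2 -> frame 1
p = K @ X1                             # intrinsic: camera coords -> pixels
u, v = p[:2] / p[2]                    # perspective divide -> pixel coords
```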
Assume parallel optical axes, known camera parameters (i.e., calibrated cameras)
What is the expression for \(Z\)?
Similar triangles \((p_l, P, p_r)\) and \((O_l, P, O_r)\):
\[\frac{T + x_l - x_r}{Z - f} = \frac{T}{Z}\]
\[Z = f \frac{T}{x_r - x_l}\]
Disparity: \(x_r - x_l\)
So if we could find the corresponding points in two images, we could estimate relative depth...
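Plugging illustrative numbers into the formula above (focal length \(f\) in pixels, baseline \(T\) in meters, \(x_l\) and \(x_r\) the matched x-coordinates; all values made up):

```python
f = 500.0                  # focal length, pixels
T = 0.1                    # baseline, meters
x_l, x_r = 312.0, 320.0    # matched x-coordinates of the same 3D point
Z = f * T / (x_r - x_l)    # Z = fT / disparity = 6.25 meters
```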



Stereo correspondence can be summarized by a disparity map \(D(x,y)\): the point corresponding to pixel \((x, y)\) in the other image is
\[(x',y') = (x+D(x,y),\, y)\]
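A brute-force sketch of estimating such a disparity map by SSD window matching along scanlines; the window size, disparity search range, and sign convention (left image as reference, matches searched to the left in the right image) are assumptions rather than anything fixed by these notes.

```python
import numpy as np

def block_match(ref, other, max_disp=64, win=5):
    # For each window in the reference image, search the same scanline
    # in the other image and keep the offset with the smallest SSD.
    h, w = ref.shape
    half = win // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            patch = ref[y - half:y + half + 1, x - half:x + half + 1]
            best, best_d = np.inf, 0
            for d in range(max_disp + 1):
                cand = other[y - half:y + half + 1,
                             x - d - half:x - d + half + 1]
                ssd = float(np.sum((patch - cand) ** 2))
                if ssd < best:
                    best, best_d = ssd, d
            disp[y, x] = best_d
    return disp
```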

