In the previous section: single views
This section: multiple views
Structure and depth are inherently ambiguous from single views
What cues help us perceive 3D shape and depth?
Images from the same point of view but with different camera parameters (e.g., focus settings) \(\rightarrow\) 3D shape / depth estimates
If stereo were critical for depth perception, navigation, recognition, etc., then losing it would be a serious problem
Structure: given projections of the same 3D point in two or more images, compute the 3D coordinates of that point
Stereo correspondence: given a point in one of the images, where could its corresponding points be in the other images?
Motion: given a set of corresponding points in two or more images, compute the camera parameters
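The correspondence problem above can be made concrete with one common (though not the only) approach: for rectified images, search along the same scanline in the other image and pick the patch that best matches under sum-of-squared-differences (SSD). A minimal sketch, with illustrative patch size and search range:

```python
import numpy as np

def best_match_on_scanline(left, right, y, x, half=3, max_disp=32):
    """For a pixel (x, y) in the rectified left image, search the same
    scanline of the right image and return the disparity whose patch
    minimizes the sum of squared differences (SSD).

    `half` (patch half-width) and `max_disp` (search range) are
    illustrative choices, not fixed by the text."""
    patch = left[y - half:y + half + 1, x - half:x + half + 1].astype(float)
    best_d, best_ssd = 0, np.inf
    for d in range(0, max_disp + 1):
        xr = x - d                      # candidate column in the right image
        if xr - half < 0:
            break
        cand = right[y - half:y + half + 1, xr - half:xr + half + 1].astype(float)
        ssd = np.sum((patch - cand) ** 2)
        if ssd < best_ssd:
            best_d, best_ssd = d, ssd
    return best_d

# Toy example: the right image is the left image shifted 4 pixels,
# so the matcher should recover a disparity of 4.
rng = np.random.default_rng(0)
left = rng.random((20, 40))
right = np.roll(left, -4, axis=1)
print(best_match_on_scanline(left, right, y=10, x=20))  # -> 4
```

Real systems add aggregation, smoothness constraints, and occlusion handling on top of this brute-force search.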
Rough analogy with human visual system:
Pupil/iris: control amount of light passing through lens
Retina: contains sensor cells, where image is formed
Fovea: highest concentration of cones
Human eyes fixate on a point in space: they rotate so that the corresponding images form at the centers of the foveas
Disparity: when the eyes fixate on one object, other objects appear at different visual angles
Béla Julesz 1960: Do we identify local brightness patterns before fusion (monocular process) or after (binocular)?
To test this, Julesz used pairs of synthetic images obtained by randomly spraying black dots on white objects
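Julesz's construction can be sketched as follows (image size, square size, and disparity are arbitrary illustrative choices): generate a random dot image, copy it, and shift a central square horizontally, refilling the uncovered strip with fresh dots. Neither image alone shows any shape, but fused binocularly the square appears at a different depth.

```python
import numpy as np

def random_dot_stereogram(h=128, w=128, shift=6, box=40, seed=0):
    """Julesz-style random-dot stereogram pair (a sketch; all sizes
    and the disparity `shift` are illustrative, not from the text)."""
    rng = np.random.default_rng(seed)
    left = (rng.random((h, w)) < 0.5).astype(np.uint8)
    right = left.copy()
    top, bottom = (h - box) // 2, (h + box) // 2
    l, r = (w - box) // 2, (w + box) // 2
    # Shift the central square left by `shift` pixels in the right image.
    right[top:bottom, l - shift:r - shift] = left[top:bottom, l:r]
    # Refill the uncovered strip with fresh random dots.
    right[top:bottom, r - shift:r] = (rng.random((box, shift)) < 0.5)
    return left, right

left, right = random_dot_stereogram()
```

Since each image in isolation is pure noise, any perceived depth must arise after binocular fusion, which was Julesz's point.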
Take two pictures of the same subject from two slightly different viewpoints and display so that each eye sees only one of the images
Invented by Sir Charles Wheatstone, 1838
Autostereograms exploit disparity as a depth cue using a single image (single-image random dot stereogram, or single-image stereogram)
Stereo: shape from "motion" between two views
We'll need to consider:
Extrinsic parameters: camera frame 1 \(\leftrightarrow\) camera frame 2
Intrinsic parameters: image coordinates relative to camera \(\leftrightarrow\) pixel coordinates
We'll assume for now that these parameters are given and fixed
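As a hedged sketch of the intrinsic-parameter mapping, the conversion between pixel coordinates and camera-frame coordinates can be written with a pinhole intrinsic matrix \(K\). The focal lengths and principal point below are illustrative numbers, not values from the text:

```python
import numpy as np

# Hypothetical intrinsic matrix K: focal lengths fx, fy and principal
# point (cx, cy), all in pixels -- illustrative numbers only.
K = np.array([[700.0,   0.0, 320.0],
              [  0.0, 700.0, 240.0],
              [  0.0,   0.0,   1.0]])

def pixel_to_camera(u, v):
    """Pixel coordinates -> normalized ray direction in the camera frame."""
    return np.linalg.inv(K) @ np.array([u, v, 1.0])

def camera_to_pixel(p):
    """3D point in the camera frame -> pixel coordinates (perspective projection)."""
    q = K @ p
    return q[:2] / q[2]

u, v = camera_to_pixel(np.array([0.1, -0.2, 2.0]))
```

The extrinsic parameters would add a rotation and translation between the two camera frames before this projection; here they are taken as given, per the assumption above.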
Assume parallel optical axes, known camera parameters (i.e., calibrated cameras)
What is the expression for \(Z\)?
Similar triangles \((p_l, p, p_r)\) and \((o_l, p, o_r)\): \[\frac{T + x_l - x_r}{Z - f} = \frac{T}{Z}\] \[Z = f\,\frac{T}{x_r - x_l}\] Disparity: \(x_r - x_l\)
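Plugging illustrative numbers into \(Z = fT/(x_r - x_l)\) shows the inverse relation between disparity and depth; the focal length, baseline, and disparities below are made up for the sketch:

```python
# Depth from disparity for a rectified stereo pair: Z = f * T / d.
# f, T, and the disparities are illustrative values, not from the text.
f = 700.0   # focal length, in pixels
T = 0.12    # baseline between the optical centers, in metres
for d in (70.0, 35.0, 14.0):   # disparity in pixels
    Z = f * T / d
    print(f"disparity {d:5.1f} px -> depth {Z:.2f} m")
```

Halving the disparity doubles the depth estimate, so nearby objects (large disparity) are located much more precisely than distant ones.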
So if we could find the corresponding points in two images, we could estimate relative depth...
For rectified images, a point \((x, y)\) in one image corresponds to \[(x',y') = (x+D(x,y),\, y)\] in the other, where \(D\) is the disparity map
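Assuming rectified images, the relation above says a match stays on the same scanline and is offset horizontally by the disparity. A minimal sketch of looking up a correspondence through a disparity map `D` (the map here is a toy constant array):

```python
import numpy as np

def corresponding_point(D, x, y):
    """Given a disparity map D for a rectified pair, return the match of
    pixel (x, y): same row, column shifted by D(x, y), per the relation
    (x', y') = (x + D(x, y), y)."""
    return int(x + D[y, x]), y

# Toy disparity map: a constant disparity of -5 pixels everywhere.
D = np.full((10, 10), -5)
print(corresponding_point(D, x=7, y=3))   # -> (2, 3)
```

A real disparity map varies per pixel, and estimating it densely is exactly the stereo correspondence problem discussed above.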