The goal of this work is to accurately detect and localize boundaries in natural scenes using local image measurements. We formulate features that respond to characteristic changes in brightness, color, and texture associated with natural boundaries. To combine the information from these features in an optimal way, we train a classifier using human-labeled images as ground truth. The output of this classifier provides the posterior probability of a boundary at each image location and orientation. We present precision-recall curves showing that the resulting detector significantly outperforms existing approaches. Our two main results are (1) that cue combination can be performed adequately with a simple linear model, and (2) that a proper treatment of texture is required to detect boundaries in natural images.
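
The cue combination described above can be illustrated with a minimal sketch. It assumes a logistic model over three gradient features (BG, CG, TG); the text only states that a simple linear model suffices, so the logistic link, the function name `boundary_posterior`, and all weight values here are hypothetical and chosen for illustration only.

```python
import math


def boundary_posterior(features, weights, bias):
    """Map a feature vector to a value in (0, 1) via a logistic model.

    Computes sigmoid(bias + sum_i weights[i] * features[i]).
    """
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical weights for (BG, CG, TG). In practice they would be fit
# to human-labeled boundary maps; these numbers are not from the paper.
weights = [2.0, 1.0, 3.0]
bias = -3.0

# A pixel with no gradient response versus one with strong responses.
flat_region = boundary_posterior([0.0, 0.0, 0.0], weights, bias)
strong_edge = boundary_posterior([0.8, 0.5, 0.9], weights, bias)
```

Under these assumed weights, the pixel with strong gradient responses receives a higher boundary probability than the pixel with none. Thresholding that probability at different levels is what traces out a precision-recall curve.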

Figure 1: Two decades of boundary detection. The performance of our
boundary detector compared to classical boundary detection methods and
to the human subjects' performance. A precision-recall curve is shown
for five boundary detectors: (1) Gaussian derivative (GD); (2)
Gaussian derivative with hysteresis thresholding (GD+H), the Canny
detector; (3) a detector based on the second moment matrix (2MM); (4)
our grayscale detector that combines brightness and texture (BG+TG);
and (5) our color detector that combines brightness, color, and
texture (BG+CG+TG). Each detector is represented by its
precision-recall curve, which measures the tradeoff between
accuracy and noise as the detector's threshold varies. Shown in the
caption is each curve's F-measure, valued from zero to one. The
F-measure is a summary statistic for a precision-recall curve. The
points on the plot show the precision and recall of each ground truth
human segmentation when compared to the other humans. The median
F-measure for the human subjects is 0.80. The solid curve shows the
F=0.80 curve, representing the frontier of human performance for this
task.
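
The F-measure mentioned in the caption above can be sketched as follows. This assumes the common definition, the harmonic mean of precision and recall, and assumes that a curve is summarized by its maximum F value over all thresholds; the caption states neither detail. The operating points in `curve` are hypothetical, not measured results.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def best_f_measure(curve):
    """Largest F-measure over the (precision, recall) points of a curve."""
    return max(f_measure(p, r) for p, r in curve)


# Hypothetical operating points from sweeping a detector threshold,
# ordered from a strict threshold (high precision) to a loose one.
curve = [(0.95, 0.20), (0.80, 0.55), (0.70, 0.70), (0.50, 0.85), (0.30, 0.95)]
best = best_f_measure(curve)
```

Under this definition, an iso-F curve such as F=0.80 is the set of (precision, recall) pairs whose harmonic mean equals 0.80. It passes through the point (0.80, 0.80).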

Figure 2: Local image features. In each row, the first panel shows an
image patch. The following panels show feature profiles along the
patch's horizontal diameter. The features are raw image intensity, brightness gradient BG,
color gradient CG, raw texture gradient TG, and
localized texture gradient TG. The vertical red line in
each profile marks the patch center. The scale of each feature has
been chosen to maximize performance on the set of training images: 2%
of the image diagonal (5.7 pixels) for CG and TG, and 1% of the
image diagonal (3 pixels) for BG. The challenge is to combine these
features in order to detect and localize boundaries.

Figure 3: Boundary images for three grayscale detectors. Columns 2-4 show
P_b images for the Canny detector, the second moment matrix (2MM), and our
brightness+texture detector (BG+TG). The human segmentations are
shown for comparison. The BG+TG detector benefits from operating at a
large scale without sacrificing localization, and from suppressing
edges in the interior of textured regions.