The goal of this work is to accurately detect and localize boundaries in natural scenes using local image measurements. We formulate features that respond to characteristic changes in brightness, color, and texture associated with natural boundaries. To combine the information from these features optimally, we train a classifier using human-labeled images as ground truth. The output of this classifier provides the posterior probability of a boundary at each image location and orientation. We present precision-recall curves showing that the resulting detector significantly outperforms existing approaches. Our two main results are (1) that cue combination can be performed adequately with a simple linear model, and (2) that a proper treatment of texture is required to detect boundaries in natural images.
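The abstract states that a simple linear model is adequate for cue combination and that the classifier outputs a boundary posterior. Logistic regression is one such linear model. The sketch below assumes that choice, combining cue responses such as [BG, CG, TG] at one pixel and orientation into a probability. The function name `boundary_posterior`, the feature values, and the weights are hypothetical placeholders, not values learned in this work. In practice the weights would be fit to human-labeled training images.

```python
import math


def boundary_posterior(features, weights, bias):
    """Hypothetical logistic cue combination: P(boundary | features).

    features, weights: equal-length sequences, e.g. [BG, CG, TG] at one
    pixel and orientation. The weights are placeholders, not fitted values.
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    # Linear score: weighted sum of the cue responses plus an offset.
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Logistic function maps the score to (0, 1); the two branches keep
    # math.exp from overflowing for large-magnitude scores.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Made-up cue responses [BG, CG, TG] and made-up weights.
p = boundary_posterior([0.2, 0.1, 0.6], weights=[1.5, 1.0, 3.0], bias=-2.0)
print(round(p, 3))
```

Because the model is linear in the cues, each weight directly expresses how much its cue contributes to the boundary score. This is part of what makes such a model easy to fit and interpret.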
Figure 1: Two decades of boundary detection. The performance of our boundary detector compared to classical boundary detection methods and to human subjects. Precision-recall curves are shown for five boundary detectors: (1) Gaussian derivative (GD); (2) Gaussian derivative with hysteresis thresholding (GD+H), i.e., the Canny detector; (3) a detector based on the second moment matrix (2MM); (4) our grayscale detector, which combines brightness and texture (BG+TG); and (5) our color detector, which combines brightness, color, and texture (BG+CG+TG). Each precision-recall curve measures the tradeoff between accuracy and noise as the detector's threshold varies. Each curve is labeled with its F-measure, a summary statistic for a precision-recall curve that ranges from zero to one. The points on the plot show the precision and recall of each human ground-truth segmentation when compared to the other humans. The median F-measure for the human subjects is 0.80. The solid curve is the F = 0.80 iso-F contour, representing the frontier of human performance for this task.
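In its common equal-weight form, the F-measure is the harmonic mean of precision P and recall R, F = 2PR / (P + R). A whole curve is typically summarized by its best F over all thresholds. The minimal sketch below assumes both conventions and that a curve is given as (precision, recall) pairs. It also traces the iso-F contour, such as the F = 0.80 human frontier. The function names and the example points are hypothetical and made up for illustration.

```python
def f_measure(precision, recall):
    """Equal-weight F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def max_f(curve):
    """Summarize a PR curve (iterable of (precision, recall)) by its best F."""
    return max(f_measure(p, r) for p, r in curve)


def iso_f_precision(recall, f):
    """Precision on the iso-F contour F = f at a given recall.

    Solving 2PR / (P + R) = f for P gives P = f*R / (2R - f),
    which is defined only for recall > f / 2.
    """
    if recall <= f / 2.0:
        raise ValueError("iso-F contour is undefined for recall <= f/2")
    return f * recall / (2.0 * recall - f)


# Made-up (precision, recall) points along one hypothetical detector's curve.
curve = [(0.9, 0.3), (0.75, 0.6), (0.5, 0.8)]
print(round(max_f(curve), 3))
# Precision a detector would need at recall 0.9 to reach F = 0.80.
print(round(iso_f_precision(0.9, 0.80), 3))
```

The harmonic mean penalizes imbalance. A detector with high recall but low precision, or the reverse, therefore scores lower than one that balances the two.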
Figure 2: Local image features. In each row, the first panel shows an image patch. The following panels show feature profiles along the patch's horizontal diameter. The features are raw image intensity, brightness gradient BG, color gradient CG, raw texture gradient TG, and localized texture gradient $\hat{TG}$. The vertical red line in each profile marks the patch center. The scale of each feature has been chosen to maximize performance on the set of training images: 2% of the image diagonal (5.7 pixels) for CG and TG, and 1% of the image diagonal (3 pixels) for BG. The challenge is to combine these features in order to detect and localize boundaries.
Figure 3: Boundary images for three grayscale detectors. Columns 2-4 show $P_b$ images for the Canny detector, the second moment matrix (2MM) detector, and our brightness+texture detector (BG+TG). The human segmentations are shown for comparison. The BG+TG detector benefits from two properties: it operates at a large scale without sacrificing localization, and it suppresses edges in the interior of textured regions.