Lubomir Bourdev and Jitendra Malik
The Human Annotation Tool is a tool that allows one to annotate people - where their arms and legs are, what their 3D pose is, which body parts are occluded, etc. A database of annotated people would be invaluable for creating computer vision algorithms to detect and localize people.
You may run a copy of the tool by clicking on the above image and agreeing to all the disclaimers. You need to have Java and a reasonably good graphics card. The tool has been tested on Mac OS X and on Windows.
The tool supports two kinds of annotation: labeling joints to extract the 3D pose, and labeling the regions of the body (hair, face, upper clothes, etc.).
Here is a video tutorial that shows you how to create the 3D pose:

The picture above shows all the controls. To annotate, please follow these steps.
To jump to a given annotation, put its index in the "Current Entry" box. Use the arrows to go to an image containing a person that is not annotated. Pan and zoom to the person. Press the New Entry button and choose Male, Female or Child.
Move the mouse over the location of each keypoint and press the corresponding key, indicated with a picture of the body part. The right shoulder, elbow and wrist correspond to keys Q, A and Z, and the left ones to W, S and X. The right hip, knee and ankle correspond to E, D and C, and the left ones to R, F and V. Select the ears with T and U, the eyes with G and H, and the nose with Y. You can click and drag keypoints to adjust their locations.
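The shortcut layout above can be summarized as a lookup table. This is a minimal sketch for reference only. The key assignments are copied from the text, but the part names are hypothetical, not HAT's internal identifiers. The text does not say which of T/U (ears) or G/H (eyes) is the left one, so the side is left unspecified for those keys.

```python
# Hypothetical lookup table for HAT's keypoint shortcut keys.
# Key assignments follow the text; part names are illustrative only.
# Side is from the labelled person's point of view. The text does not say
# which of T/U (ears) or G/H (eyes) is left or right, so side is None there.
KEYPOINT_KEYS = {
    "Q": ("shoulder", "right"), "A": ("elbow", "right"), "Z": ("wrist", "right"),
    "W": ("shoulder", "left"),  "S": ("elbow", "left"),  "X": ("wrist", "left"),
    "E": ("hip", "right"),      "D": ("knee", "right"),  "C": ("ankle", "right"),
    "R": ("hip", "left"),       "F": ("knee", "left"),   "V": ("ankle", "left"),
    "T": ("ear", None), "U": ("ear", None),
    "G": ("eye", None), "H": ("eye", None),
    "Y": ("nose", None),
}


def keypoint_for_key(key):
    """Return (part, side) for a shortcut key, or None if it is not a keypoint key."""
    return KEYPOINT_KEYS.get(key.upper())


if __name__ == "__main__":
    print(keypoint_for_key("a"))  # ('elbow', 'right')
```

The table lists 17 keypoints: 12 limb joints, two ears, two eyes and the nose.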
If a keypoint is occluded or falls outside the image but you have a rough guess where it should be, mark it as best as you can. Leave keypoints unmarked if you have no idea where they should lie. Both shoulders, however, must be specified.
Shoulders, Elbows, Wrists, Hips, Knees, Ankles: Approximate limbs as cylinders. The joint location in 3D is the intersection of the axes of adjoining cylinders.
Left vs Right: The keypoint is labelled as left or right from the point of view of the labelled person, not based on its location in the image. For example, if the person is facing the camera, his or her left keypoints lie on the right in image space.
Nose Tip: The location is the tip of the nose, regardless of frontal or profile view.
Eyes: In frontal view it is the midpoint of the two eye corners. The eye location does not depend on the pupils.
In profile view it is the tip of the eye surface.
Even if the eye is closed, we estimate the tip of the eye surface, ignoring the eyelids.
Ears: The tip of the tragus (the small pointed eminence of the external ear).
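The cylinder convention for limb joints defines the joint as the point where the axes of the two adjoining limb cylinders meet. Two axes estimated by eye rarely intersect exactly in 3D, so this sketch assumes a natural relaxation: take the midpoint of the shortest segment between the two axes. When the axes do intersect, this midpoint is the intersection point. This is an illustration of the geometry, not a description of how HAT computes joints internally. The function name and parameters are hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _dot(u: Vec3, v: Vec3) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def joint_from_axes(p1: Vec3, d1: Vec3, p2: Vec3, d2: Vec3, eps: float = 1e-12) -> Vec3:
    """Midpoint of the closest approach between two limb axes (hypothetical helper).

    Axis 1 is p1 + t*d1 (e.g. the upper arm), axis 2 is p2 + s*d2 (e.g. the forearm).
    If the axes truly intersect, the result is their intersection point.
    """
    w0 = (p1[0] - p2[0], p1[1] - p2[1], p1[2] - p2[2])
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < eps:
        raise ValueError("axes are parallel; the joint is undefined")
    # Parameters of the closest points, from setting both partial derivatives
    # of |w0 + t*d1 - s*d2|^2 to zero.
    t = (b * e - c * d) / denom
    s = (a * e - b * d) / denom
    q1 = tuple(p + t * v for p, v in zip(p1, d1))
    q2 = tuple(p + s * v for p, v in zip(p2, d2))
    return tuple((x + y) / 2 for x, y in zip(q1, q2))


if __name__ == "__main__":
    # Upper arm along +X from the origin; forearm along +Y, offset by 1 in Z.
    print(joint_from_axes((0, 0, 0), (1, 0, 0), (1, 1, 1), (0, 1, 0)))  # (1.0, 0.0, 0.5)
```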
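The left/right convention suggests a simple consistency check on shoulder labels. For a person facing the camera, the left shoulder should appear to the right of the right shoulder in the image, where x grows to the right. This sketch uses that rule to guess the facing direction, which can help catch swapped labels. The function name and the `min_gap` threshold (in pixels) are hypothetical. Near-profile views are reported as ambiguous because the shoulders overlap horizontally there.

```python
def apparent_facing(left_shoulder_x: float, right_shoulder_x: float,
                    min_gap: float = 5.0) -> str:
    """Guess facing direction from labelled shoulder x-coordinates (hypothetical helper).

    Left/right are from the person's own point of view, so for a person facing
    the camera the LEFT shoulder appears on the RIGHT side of the image.
    """
    gap = left_shoulder_x - right_shoulder_x
    if abs(gap) < min_gap:
        return "ambiguous"  # near-profile view: shoulders overlap in x
    return "toward camera" if gap > 0 else "away from camera"


if __name__ == "__main__":
    print(apparent_facing(300.0, 200.0))  # toward camera
```

A result of "away from camera" on an image where the face is clearly visible is a hint that the left and right labels were swapped.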
When you hover with the mouse over a keypoint, use the red keys to specify keypoint properties or to delete it.
Press N to mark a keypoint as occluded. Occluded keypoints are shown in green. The general rule is that a keypoint is visible if and only if the ray from the keypoint location reaches the camera, with the following exceptions:
Shoulders, Elbows, Wrists, Hips, Knees, Ankles: Clothing covering the joint does not hide the joint. The joint may, however, be hidden by the torso or limbs of the same person. For example, in right profile view the left joints of the body are often occluded by the torso and/or by the right limbs.
Most keypoints have a reference keypoint, and for each such keypoint we need to specify whether it is closer to the camera than its reference keypoint, roughly equidistant, or farther away. By default, keypoints are considered equidistant. When you hover over a keypoint, you will see a segment connecting it to its reference keypoint. Use the B key to cycle through the three depth states of the keypoint. When a keypoint is marked Far (or Near), it is drawn smaller (or larger) than usual. The segment to its reference keypoint also changes.
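The per-keypoint depth flag can be modelled as a small three-state machine. This is a minimal sketch. The text says only that B toggles among the three states, so the cycle order here (Equidistant, then Near, then Far) is an assumption. The marker scale factors and the reference pairing in the example are also hypothetical.

```python
from enum import Enum


class Depth(Enum):
    """Depth of a keypoint relative to its reference keypoint."""
    EQUIDISTANT = "equidistant"  # default state
    NEAR = "near"                # closer to the camera than the reference
    FAR = "far"                  # farther from the camera than the reference


# Assumed cycle order for the B key; the text only says B toggles among the states.
_NEXT_STATE = {
    Depth.EQUIDISTANT: Depth.NEAR,
    Depth.NEAR: Depth.FAR,
    Depth.FAR: Depth.EQUIDISTANT,
}

# Hypothetical marker scale factors: Near keypoints drawn larger, Far ones smaller.
MARKER_SCALE = {Depth.NEAR: 1.5, Depth.EQUIDISTANT: 1.0, Depth.FAR: 0.6}


class Keypoint:
    def __init__(self, name, reference=None):
        self.name = name
        self.reference = reference      # name of the reference keypoint, if any
        self.depth = Depth.EQUIDISTANT  # keypoints start out equidistant

    def press_b(self):
        """Advance to the next depth state and return it."""
        self.depth = _NEXT_STATE[self.depth]
        return self.depth

    def marker_scale(self):
        return MARKER_SCALE[self.depth]


if __name__ == "__main__":
    # Hypothetical pairing: the elbow's reference is taken to be the shoulder.
    elbow = Keypoint("left_elbow", reference="left_shoulder")
    print(elbow.press_b())  # Depth.NEAR
```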
It is a good idea to press the Save button to record all of your changes. It is now time to adjust the keypoints to approximate the correct 3D pose of the person. Orbit the right window using the left mouse button to see the person from another viewpoint. The 3D pose is usually far from correct initially. You may need to adjust a few keypoints manually and see how that affects the 3D pose. A few tips:
Be sure to press the Save button before going to the next annotation or you will lose all of your changes!
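The Near/Far flags matter because a single image does not determine depth. Taylor's reconstruction method (listed in the references) assumes scaled orthographic projection u = sX, v = sY and known 3D limb lengths. Under those assumptions, the depth difference between a keypoint and its reference is fixed only up to sign: dZ² = L² - ((u₁ - u₂)² + (v₁ - v₂)²) / s². This sketch shows how a Near/Far flag could resolve that sign. It is a simplified illustration, not HAT's actual solver. Treating "equidistant" as exactly zero depth offset is an assumption, and the function names are hypothetical.

```python
import math


def min_scale(image_lengths, limb_lengths):
    """Smallest scale s for which every limb fits: s >= image_length / limb_length."""
    return max(li / L for li, L in zip(image_lengths, limb_lengths))


def relative_depth(p, ref, limb_length, scale, state):
    """Depth of keypoint p minus depth of its reference (Z grows away from the camera).

    Under scaled orthographic projection u = s*X, v = s*Y:
        dZ^2 = L^2 - ((u_p - u_ref)^2 + (v_p - v_ref)^2) / s^2
    The Near/Far flag picks the sign that the image alone leaves ambiguous.
    """
    du = (p[0] - ref[0]) / scale
    dv = (p[1] - ref[1]) / scale
    sq = limb_length ** 2 - du * du - dv * dv
    if sq < -1e-9:
        raise ValueError("scale too small: image segment is longer than the limb")
    dz = math.sqrt(max(sq, 0.0))
    if state == "near":
        return -dz  # closer to the camera than the reference
    if state == "far":
        return dz   # farther from the camera than the reference
    if state == "equidistant":
        return 0.0  # assumption: 'roughly equidistant' treated as zero offset
    raise ValueError(f"unknown depth state: {state!r}")


if __name__ == "__main__":
    # A limb of length 5 whose image projection has length 3 (scale 1).
    print(relative_depth((3, 0), (0, 0), 5, 1, "near"))  # -4.0
```

In practice the scale would be chosen at least as large as `min_scale(...)`, so that no square root becomes negative.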
A pose labelling is acceptable if:
Some images have associated precomputed segmentations. When a segmentation is available, you can switch to region labelling mode using the "Pose3D / Segmentation" radio button.

In this view the left window still shows the image, while the right window shows the image broken down into segments along with the label of each segment. You can pan and zoom the image from either window. The segment under the cursor is highlighted in red, as shown in the picture, and its label is shown in the info bar at the bottom. In the above example the cursor is over a region labelled "LowerClothes". The labelled region that this segment belongs to is shown in the left image (not displayed in this example).
Some files contain thousands of small segments, and labelling each of them would be too time-consuming. HAT therefore lets you hierarchically merge segments into fewer, larger ones and label them all at once. Use the up/down arrow keys to subdivide or merge segments. Holding down the Shift key while pressing up/down takes larger segmentation steps. Here is the above example at three different segmentation levels:

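One common way to support hierarchical segment merging is to record a fixed order of pairwise merges and then apply the first k of them to get segmentation level k. This sketch does that with a union-find structure. It assumes a merge order is given. It does not claim that HAT stores its hierarchy this way, and the function name and data layout are hypothetical.

```python
def segments_at_level(num_base, merges, level):
    """Region id for each base segment after applying the first `level` merges.

    num_base: number of smallest (finest) segments, numbered 0..num_base-1
    merges:   list of (a, b) base-segment pairs, in merge order (assumed given)
    """
    parent = list(range(num_base))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in merges[:level]:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge b's region into a's region
    return [find(i) for i in range(num_base)]


if __name__ == "__main__":
    merges = [(0, 1), (2, 3), (0, 2)]
    for level in range(len(merges) + 1):
        print(level, segments_at_level(4, merges, level))
```

In this model, the down arrow would lower `level` (more, smaller segments) and the up arrow would raise it. A Shift-modified press would change `level` by a larger hypothetical step.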
The preferred way to label images containing thousands of small segments is to start labelling at a coarse segmentation level, then increase the subdivision and refine the labelled regions.
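The coarse-to-fine workflow works naturally if labels are stored per finest-level segment. Labelling a coarse region writes the label to every base segment it contains. Relabelling a smaller region at a finer level overwrites only that subset. This is a minimal sketch under that storage assumption. "LowerClothes" comes from the example above, while "UpperClothes" is an assumed label name formed by analogy.

```python
def label_region(labels, regions, base_segment, label):
    """Assign `label` to every base segment in the same region as `base_segment`.

    labels:  one label per base segment (None means unlabelled); modified in place
    regions: region id per base segment at the current segmentation level
    """
    target = regions[base_segment]
    for i, region in enumerate(regions):
        if region == target:
            labels[i] = label
    return labels


if __name__ == "__main__":
    labels = [None] * 4
    # Coarse level: all four base segments form one region.
    label_region(labels, [0, 0, 0, 0], 0, "UpperClothes")
    # Finer level: segments 2 and 3 form their own region; relabel just that part.
    label_region(labels, [0, 0, 2, 2], 3, "LowerClothes")
    print(labels)  # ['UpperClothes', 'UpperClothes', 'LowerClothes', 'LowerClothes']
```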
Segments are labelled by pressing a key while the mouse is over the segment. Most of the keys are shown on the picture to the right and are as follows:
The following keys are not shown in the illustration on the right:
Other keys:
Note the following:
By default the tool uses demo images and cannot save them. You can also download the H3D dataset and use the tool to browse it. You need to place H3D in C:\hat on Windows or /hat on the Mac. In particular, C:\hat\data\people\image_list.txt (or /hat/data/people/image_list.txt on the Mac) must be a valid file. If you have placed H3D correctly and you restart the annotation tool, it will recognize H3D and allow you to browse its images.
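Before restarting the tool, you can check that H3D is in the expected place. This is a small sketch of that check. The paths come from the text above. Treating every non-Windows system like the Mac (/hat) is an assumption, since the text mentions only Windows and Mac OS X. The function names are hypothetical.

```python
import platform
from pathlib import Path, PurePosixPath, PureWindowsPath


def h3d_image_list_path(system=None):
    """Expected location of H3D's image_list.txt for the given OS name.

    `system` uses platform.system() names ("Windows", "Darwin", ...).
    Non-Windows systems are assumed to use /hat, as on the Mac.
    """
    system = system or platform.system()
    if system == "Windows":
        return PureWindowsPath("C:/hat/data/people/image_list.txt")
    return PurePosixPath("/hat/data/people/image_list.txt")


def h3d_installed(system=None):
    """True if image_list.txt exists where the annotation tool expects it."""
    return Path(str(h3d_image_list_path(system))).is_file()


if __name__ == "__main__":
    print(h3d_image_list_path(), "found" if h3d_installed() else "missing")
```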
The images used in HAT are taken from Flickr under the Creative Commons Attribution license. The license permits redistribution and derivative works, for commercial or non-commercial purposes, as long as the authors are properly attributed. Please see the license for more details.
If you find bugs or have suggestions on how to improve the tool, please email me at lbourdev-at-eecs.berkeley.edu. Your feedback is much appreciated!
Camillo J. Taylor. "Reconstruction of Articulated Objects from Point Correspondences in a Single Image." Computer Vision and Image Understanding, Volume 80, No. 3, pp. 349-363, Dec. 2000.