How to Create TFRecords from Partial Pascal VOC XML Annotation Format for Object Detection

Maryam Bibi
May 19, 2021
Pascal VOC XML Annotation Format

Difference between object detection annotation formats

Difference Between COCO and Pascal VOC Data Format.

Commonly, you will find your dataset annotated in one of the above formats. In that case, it’s straightforward to convert your annotations into TFRecords. For converting Pascal VOC annotations into TFRecords, you can use this script, and for COCO format annotations, use this.

Partial Pascal VOC XML Annotation

In my case, I encountered a dataset that is annotated in partial Pascal VOC format. Here’s what I mean by partial Pascal VOC format annotations.

As we see, if our annotations are in XML file format, then the bounding box values are supposed to be (x-top-left, y-top-left, x-bottom-right, y-bottom-right). But in my case, the XML files stored the boxes in COCO style instead: (x-top-left, y-top-left, width, height).
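The relationship between the two box layouts can be sketched in a few lines of Python. This is only an illustration: the function name coco_to_voc is hypothetical and is not part of any script mentioned in this post.

```python
def coco_to_voc(x, y, width, height):
    """Convert a COCO-style box (x, y, width, height) to Pascal VOC corners.

    (x, y) is the top-left corner in pixels. The name of this helper is
    hypothetical and used only for illustration.
    """
    xmin = x
    ymin = y
    xmax = x + width   # right edge = left edge + box width
    ymax = y + height  # bottom edge = top edge + box height
    return xmin, ymin, xmax, ymax


print(coco_to_voc(50, 30, 200, 100))  # (50, 30, 250, 130)
```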

Dataset 1: Annotation file for one image.
Dataset 2: One annotation file for the whole training dataset.

Partial Pascal VOC Format to Pascal VOC Format

For converting these XML labels, I reused Lyudmil Vladimirov’s script and changed the loop in the create_tf_example function as follows, so that the boxes end up as normalized Pascal VOC style corners.

for index, row in group.rectangle.iterrows():
    xmin.append(row['X']/imgwidth)
    xmax.append((row['X']+row['Width'])/imgwidth)
    ymin.append(row['Y']/imgheight)
    ymax.append((row['Y']+row['Height'])/imgheight)
    classes_text.append(row['class'].encode('utf8'))
    classes.append(class_text_to_int(row['class']))

In the original script, the bounding boxes were normalized directly, because each row already holds the xmin, xmax, ymin, and ymax corner values.

for index, row in group.object.iterrows():
    xmins.append(row['xmin'] / width)
    xmaxs.append(row['xmax'] / width)
    ymins.append(row['ymin'] / height)
    ymaxs.append(row['ymax'] / height)
    classes_text.append(row['class'].encode('utf8'))
    classes.append(class_text_to_int(row['class']))
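To try the modified normalization outside the full script, here is a minimal standalone sketch. It assumes each box is a plain dict with the keys X, Y, Width, and Height (mirroring the column names used above) instead of a pandas row. The function name normalize_partial_voc_boxes and the sample values are hypothetical.

```python
def normalize_partial_voc_boxes(rows, img_width, img_height):
    """Return normalized (xmins, xmaxs, ymins, ymaxs) lists.

    Each row is assumed to be a dict with 'X', 'Y', 'Width' and 'Height'
    in pixels, where (X, Y) is the top-left corner of the box.
    """
    xmins, xmaxs, ymins, ymaxs = [], [], [], []
    for row in rows:
        # Left and top edges are given directly.
        xmins.append(row['X'] / img_width)
        ymins.append(row['Y'] / img_height)
        # Right and bottom edges are derived by adding the box size.
        xmaxs.append((row['X'] + row['Width']) / img_width)
        ymaxs.append((row['Y'] + row['Height']) / img_height)
    return xmins, xmaxs, ymins, ymaxs


# Made-up sample box on a 500x250 image.
rows = [{'X': 50, 'Y': 30, 'Width': 200, 'Height': 100}]
xmins, xmaxs, ymins, ymaxs = normalize_partial_voc_boxes(rows, 500, 250)
print(xmins, xmaxs, ymins, ymaxs)
```

All returned values should fall between 0 and 1 as long as the boxes lie inside the image, which is a quick sanity check before writing the records.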

Summary

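If you have Pascal VOC annotations with COCO-style bounding boxes, add the Width value to X and the Height value to Y to obtain the xmax and ymax values; X and Y themselves serve as xmin and ymin. In this way, you get [xmin, xmax, ymin, ymax] from [X, Y, Width, Height]. Then divide the x values by the image width and the y values by the image height to normalize them.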

Feel free to comment if you have any questions.

Thanks!
