How to Create TFRecords from Partial Pascal VOC XML Annotation Format for Object Detection

Difference between object detection annotation formats

Datasets are commonly annotated in one of the formats above. In that case, converting the annotations into TFRecords is pretty straightforward: ready-made conversion scripts exist for both Pascal VOC and COCO format annotations.
Partial Pascal VOC XML Annotation
In my case, I encountered a dataset that is annotated in partial Pascal VOC format. Here’s what I mean by partial Pascal VOC format annotations.

As we can see, when annotations are stored as Pascal VOC XML files, the bounding box values are supposed to be (x-top-left, y-top-left, x-bottom-right, y-bottom-right). In my case, however, the boxes inside the XML files were in COCO style: (x-top-left, y-top-left, width, height).
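The relationship between the two box conventions can be sketched in a few lines. This is only a minimal illustration and not part of the original conversion script: the function name `coco_to_voc_box` is made up for this example, and it assumes the COCO-style box is given in pixels as (x-top-left, y-top-left, width, height).

```python
def coco_to_voc_box(x, y, width, height):
    """Convert a COCO-style box (x, y, width, height) to
    Pascal VOC corners (xmin, ymin, xmax, ymax), all in pixels."""
    if width < 0 or height < 0:
        raise ValueError("width and height must be non-negative")
    xmin, ymin = x, y      # the top-left corner is shared by both formats
    xmax = x + width       # right edge = left edge + width
    ymax = y + height      # bottom edge = top edge + height
    return xmin, ymin, xmax, ymax


print(coco_to_voc_box(50, 30, 100, 80))  # prints (50, 30, 150, 110)
```

The top-left corner is the same in both formats, so only the second pair of values needs to be recomputed.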


Partial Pascal VOC Format to Pascal VOC Format
To convert these XML labels, I reused Lyudmil Vladimirov's script and changed the loop in the create_tf_example function as follows, so that the boxes end up as Pascal VOC corner coordinates.
for index, row in group.rectangle.iterrows():
    xmin.append(row['X']/imgwidth)
    xmax.append((row['X']+row['Width'])/imgwidth)
    ymin.append(row['Y']/imgheight)
    ymax.append((row['Y']+row['Height'])/imgheight)
    classes_text.append(row['class'].encode('utf8'))
    classes.append(class_text_to_int(row['class']))
In the original script, the XML files already contained the corner coordinates, so the bounding boxes were normalized directly.
for index, row in group.object.iterrows():
    xmins.append(row['xmin'] / width)
    xmaxs.append(row['xmax'] / width)
    ymins.append(row['ymin'] / height)
    ymaxs.append(row['ymax'] / height)
    classes_text.append(row['class'].encode('utf8'))
    classes.append(class_text_to_int(row['class']))
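To make the modified loop easier to try out, here is a small stand-alone sketch that does not depend on pandas or TensorFlow. It assumes each row is a plain dict with the keys `X`, `Y`, `Width`, `Height` and `class` used above. The helper name `normalize_partial_voc_rows` and the sample values are hypothetical and do not come from the original script.

```python
def normalize_partial_voc_rows(rows, img_width, img_height):
    """Turn rows holding X, Y, Width, Height (pixels) into normalized
    xmin/xmax/ymin/ymax lists, mirroring the modified loop above."""
    xmins, xmaxs, ymins, ymaxs, classes_text = [], [], [], [], []
    for row in rows:
        xmins.append(row['X'] / img_width)
        xmaxs.append((row['X'] + row['Width']) / img_width)
        ymins.append(row['Y'] / img_height)
        ymaxs.append((row['Y'] + row['Height']) / img_height)
        classes_text.append(row['class'].encode('utf8'))
    return xmins, xmaxs, ymins, ymaxs, classes_text


# Hypothetical sample: one 200x100 box at (100, 50) in a 400x200 image.
rows = [{'X': 100, 'Y': 50, 'Width': 200, 'Height': 100, 'class': 'cat'}]
xmins, xmaxs, ymins, ymaxs, classes_text = normalize_partial_voc_rows(rows, 400, 200)
print(xmins, xmaxs, ymins, ymaxs)  # prints [0.25] [0.75] [0.25] [0.75]
```

In the real script, the resulting lists would be passed on to the tf.train.Example features in the same way as before. Only the way they are computed changes.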
Summary
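If you have Pascal VOC XML annotations with COCO-style bounding boxes, add the Width value to X to obtain xmax and the Height value to Y to obtain ymax; X and Y already serve as xmin and ymin. In this way, you get [xmin, xmax, ymin, ymax] from [X, Y, Width, Height], and each value is then divided by the image width or height to normalize it.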
Feel free to comment if you have any questions.
Thanks!