AI Goban/Baduk Board Detector

Converting Custom Annotations to YOLO Keypoint Format

The corner annotator outputs a single corner_annotations.json file. Each entry maps an image path to four corner coordinates in pixels.

{
  "boards/game_042.jpg": {
    "corners": [[142, 88], [571, 92], [568, 521], [138, 518]]
  },
  "boards/game_043.jpg": {
    "corners": [[98, 112], [540, 105], [545, 548], [95, 552]]
  }
}

YOLO needs something completely different. One .txt file per image, with normalised coordinates, in a specific directory layout. Getting from one to the other is tedious enough to warrant its own script.

What YOLO pose format looks like

Each label file contains one line per detected object. For keypoint (pose) models, the format is:

<class> <cx> <cy> <w> <h> <kp1_x> <kp1_y> <kp1_v> <kp2_x> <kp2_y> <kp2_v> ...

All coordinates are normalised to 0-1 relative to image dimensions. cx/cy is the bounding box centre, w/h is its size. Each keypoint has x, y, and a visibility flag (0 = not labelled, 1 = labelled but occluded, 2 = visible).

For a Go board with four corner keypoints, one line looks like:

0 0.52 0.49 0.63 0.71 0.21 0.14 2 0.83 0.15 2 0.82 0.85 2 0.20 0.84 2
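To sanity-check a label, the line can be decoded back into pixel coordinates. A small sketch (`decode_label` is a hypothetical helper, and the 640×640 image size is made up for illustration):

```python
def decode_label(line, img_w, img_h):
    """Parse one YOLO pose label line back into pixel coordinates."""
    values = line.split()
    cls = int(values[0])
    cx, cy, w, h = (float(v) for v in values[1:5])
    # Remaining values come in (x, y, visibility) triples.
    keypoints = []
    rest = values[5:]
    for i in range(0, len(rest), 3):
        kx, ky, vis = float(rest[i]), float(rest[i + 1]), int(rest[i + 2])
        keypoints.append((round(kx * img_w), round(ky * img_h), vis))
    box = (round(cx * img_w), round(cy * img_h), round(w * img_w), round(h * img_h))
    return cls, box, keypoints

line = "0 0.52 0.49 0.63 0.71 0.21 0.14 2 0.83 0.15 2 0.82 0.85 2 0.20 0.84 2"
cls, box, keypoints = decode_label(line, 640, 640)
# On a 640x640 image the first keypoint lands at (134, 90), visibility 2.
```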

The conversion function

The core conversion takes pixel corners and image dimensions, returns a YOLO-formatted string.

def to_yolo_format(corners, img_w, img_h):
    xs = [c[0] for c in corners]
    ys = [c[1] for c in corners]

    # Axis-aligned bounding box around the four corners, normalised to 0-1
    cx = (min(xs) + max(xs)) / 2 / img_w
    cy = (min(ys) + max(ys)) / 2 / img_h
    w = (max(xs) - min(xs)) / img_w
    h = (max(ys) - min(ys)) / img_h

    # Each keypoint: normalised x, y, and visibility flag 2 ("visible")
    keypoints = []
    for corner in corners:
        kx = corner[0] / img_w
        ky = corner[1] / img_h
        keypoints.extend([kx, ky, 2])

    return f"0 {cx} {cy} {w} {h} {' '.join(map(str, keypoints))}"

The bounding box is computed from the min/max of the corner coordinates. It’s not a tight fit around the board edges; it’s the axis-aligned rectangle that contains all four corners. Close enough for YOLO’s purposes.
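Plugging game_042’s corners from the JSON above into the same min/max arithmetic shows the numbers involved (the 640×640 image size is assumed for illustration, since the annotation file doesn’t store dimensions):

```python
corners = [[142, 88], [571, 92], [568, 521], [138, 518]]  # game_042 from the JSON above
img_w = img_h = 640  # assumed image size, for illustration only

xs = [c[0] for c in corners]
ys = [c[1] for c in corners]

cx = (min(xs) + max(xs)) / 2 / img_w   # (138 + 571) / 2 / 640
cy = (min(ys) + max(ys)) / 2 / img_h   # (88 + 521) / 2 / 640
w = (max(xs) - min(xs)) / img_w        # (571 - 138) / 640
h = (max(ys) - min(ys)) / img_h        # (521 - 88) / 640
```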

Building the dataset folder

YOLO expects a rigid directory structure: images/train, images/val, labels/train, labels/val. Each label file must have the same name as its image, just with a .txt extension.

import json
import shutil
import cv2
from pathlib import Path
from sklearn.model_selection import train_test_split

with open("corner_annotations.json") as f:
    annotations = json.load(f)

# Filter out entries whose image file no longer exists
valid = {k: v for k, v in annotations.items() if Path(k).exists()}

paths = list(valid.keys())
train_paths, val_paths = train_test_split(paths, test_size=0.15, random_state=42)

output = Path("corner_dataset")
for split, split_paths in [("train", train_paths), ("val", val_paths)]:
    img_dir = output / "images" / split
    lbl_dir = output / "labels" / split
    img_dir.mkdir(parents=True, exist_ok=True)
    lbl_dir.mkdir(parents=True, exist_ok=True)

    for path in split_paths:
        img = cv2.imread(path)
        if img is None:  # unreadable or corrupt file: skip rather than crash
            continue
        h, w = img.shape[:2]
        corners = valid[path]["corners"]

        # Copy the image keeping its real extension; the label shares its stem
        name = Path(path).stem
        shutil.copy(path, img_dir / Path(path).name)

        label = to_yolo_format(corners, w, h)
        (lbl_dir / f"{name}.txt").write_text(label)

The script skips any entries where the image file is missing. This means I can annotate aggressively, delete bad images later, and re-run the conversion without manual cleanup.
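Because the conversion gets re-run often, a quick consistency check on the output folder catches mistakes early. A sketch (`verify_dataset` is a hypothetical helper, not part of the script above):

```python
from pathlib import Path

def verify_dataset(root):
    """Return a list of problems: unpaired image/label files, or
    normalised coordinates that fall outside [0, 1]."""
    root = Path(root)
    problems = []
    for split in ("train", "val"):
        images = {p.stem for p in (root / "images" / split).glob("*.jpg")}
        labels = {p.stem for p in (root / "labels" / split).glob("*.txt")}
        if images != labels:
            problems.append(f"{split}: unpaired stems {sorted(images ^ labels)}")
        for lbl in (root / "labels" / split).glob("*.txt"):
            v = [float(x) for x in lbl.read_text().split()]
            # Box values and keypoint x/y must be in [0, 1]; skip visibility flags.
            coords = v[1:5] + [x for i, x in enumerate(v[5:]) if i % 3 != 2]
            if not all(0.0 <= c <= 1.0 for c in coords):
                problems.append(f"{lbl.name}: coordinate out of range")
    return problems

# e.g. verify_dataset("corner_dataset") returns [] when everything lines up
```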

The dataset config

YOLO also needs a dataset.yaml that describes the dataset.

path: corner_dataset
train: images/train
val: images/val

kpt_shape: [4, 3]  # 4 keypoints, 3 values each
names:
  0: board

kpt_shape: [4, 3] tells YOLO there are 4 keypoints per object, each with 3 values (x, y, visibility). The single class board is all we need.
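Since the script rebuilds everything else, it may as well emit this config too, so the yaml always matches the folder layout. A minimal sketch (`write_dataset_yaml` is a hypothetical helper, not shown in the script above):

```python
from pathlib import Path

def write_dataset_yaml(output_dir):
    """Write the dataset.yaml YOLO expects, next to images/ and labels/."""
    config = (
        f"path: {output_dir}\n"
        "train: images/train\n"
        "val: images/val\n"
        "\n"
        "kpt_shape: [4, 3]  # 4 keypoints, 3 values each\n"
        "names:\n"
        "  0: board\n"
    )
    Path(output_dir, "dataset.yaml").write_text(config)
```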

Running it

python3 prepare_yolo_keypoints.py
# Dataset created: corner_dataset
#     Train: 226 images
#     Val: 41 images
#     Skipped: 0 (missing files)
#     Total: 267
#     Config: corner_dataset/dataset.yaml

The whole point of this script is that it’s re-runnable. Annotate more images, run it again, and the dataset rebuilds. No manual file shuffling, no remembering which images are in which split. The split is deterministic for a given set of images (fixed random seed), so re-running on unchanged annotations reproduces the exact same split. Adding new images does change the permutation, though, so existing images can move between train and val.
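If stable split membership across re-runs matters, a common alternative to train_test_split is to assign each image from a hash of its own filename, so other images never affect where it lands. A sketch of the idea (`stable_split` is illustrative, not part of the script above):

```python
import hashlib

def stable_split(path, val_fraction=0.15):
    """Assign an image to 'train' or 'val' based only on its own name,
    so adding or removing other images never moves it between splits."""
    digest = hashlib.md5(path.encode("utf-8")).hexdigest()
    # Map the first 8 hex digits to a float in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "val" if bucket < val_fraction else "train"

paths = [f"boards/game_{i:03d}.jpg" for i in range(20)]
splits = {p: stable_split(p) for p in paths}
# Re-running, or annotating more images, leaves every assignment unchanged:
assert {p: stable_split(p) for p in paths} == splits
```

The trade-off is that the val fraction is only approximate, since it depends on how the hashes happen to fall.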