AI Goban/Baduk Board Detector

Corner Annotation, Board Scraping, and Training YOLO

SAM 3 segments stones well, but the pipeline still needed to know where the board was. Specifically, it needed the four corners of the playing grid so detected stones could be mapped to intersections. No pre-trained model I tried could do this reliably.

Why corner detection failed

I tried several approaches. Harris corner detection found far too many candidates: a 19×19 board has 361 intersections, each one a potential corner. Filtering down to the four outermost corners meant guessing which of hundreds of points were the right ones.
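
Roughly what that attempt looked like (a sketch; the threshold and Harris parameters here are illustrative, not the originals):

import cv2
import numpy as np

img = cv2.imread("board.jpg")
gray = np.float32(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))
response = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)
# Every grid intersection fires, so thresholding leaves hundreds
# of candidates instead of four corners.
candidates = np.argwhere(response > 0.01 * response.max())
print(f"{len(candidates)} corner candidates")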

Contour detection with cv2.findContours worked on clean, high-contrast images where the board edge was obvious. On real photos with wooden tables, partial board visibility, or boards placed on similar-coloured surfaces, the board contour blended into the background.
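
The contour version, in sketch form (continuing with the same img; the Canny thresholds and polygon tolerance are illustrative):

edges = cv2.Canny(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), 50, 150)
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
largest = max(contours, key=cv2.contourArea)
# On clean images this collapses to exactly four points; on cluttered
# backgrounds the largest contour is often the table, not the board.
approx = cv2.approxPolyDP(largest, 0.02 * cv2.arcLength(largest, True), True)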

SAM 3’s segmentation could isolate the board as a region, but the mask boundary was approximate. The corners of a segmentation mask don’t align precisely with the grid corners, and corner error propagates through the perspective transform, pushing intersections across the board out of alignment.

I needed ground truth data. That meant manual annotation.

Building a corner annotator

The annotator is a simple Python tool using OpenCV’s GUI functions. It loads an image, lets you click the four board corners in order (top-left, top-right, bottom-right, bottom-left), and saves the coordinates to a JSON file.

import cv2
import json

corners = []

def click_handler(event, x, y, flags, param):
    # Record clicks in order: top-left, top-right, bottom-right, bottom-left.
    if event == cv2.EVENT_LBUTTONDOWN and len(corners) < 4:
        corners.append([x, y])
        cv2.circle(param, (x, y), 5, (0, 255, 0), -1)
        if len(corners) > 1:
            cv2.line(param, tuple(corners[-2]), tuple(corners[-1]), (0, 255, 0), 2)
        if len(corners) == 4:
            # Close the outline back to the first corner.
            cv2.line(param, tuple(corners[3]), tuple(corners[0]), (0, 255, 0), 2)
        cv2.imshow("Annotate", param)

img = cv2.imread("board.jpg")
cv2.imshow("Annotate", img)
cv2.setMouseCallback("Annotate", click_handler, img)
cv2.waitKey(0)

with open("board.json", "w") as f:
    json.dump({"image": "board.jpg", "corners": corners}, f, indent=2)

Each annotation takes about 5 seconds. Click four points, press a key, move to the next image. The output is a JSON file per image with the corner coordinates and the image path.

{
  "image": "boards/game_042.jpg",
  "corners": [[142, 88], [571, 92], [568, 521], [138, 518]]
}

I also added a preview step that applies the perspective transform so I could verify each annotation visually. If the warped image showed a clean square grid, the corners were right.

import numpy as np

# Warp the clicked quadrilateral onto a fixed 600x600 square.
pts_src = np.float32(corners)
pts_dst = np.float32([[0, 0], [600, 0], [600, 600], [0, 600]])
M = cv2.getPerspectiveTransform(pts_src, pts_dst)
warped = cv2.warpPerspective(img, M, (600, 600))
cv2.imshow("Warped", warped)
cv2.waitKey(0)

Scraping Go board images

200 manually photographed boards didn’t provide enough variety, so I built a scraper to pull Go board images from the internet.

The sources were Go forums, game review sites, and image search results. The scraper uses requests and BeautifulSoup for HTML pages, plus direct API calls where available.

import requests
from bs4 import BeautifulSoup
from pathlib import Path

def scrape_go_images(url, output_dir):
    resp = requests.get(url, timeout=10)
    soup = BeautifulSoup(resp.text, "html.parser")
    images = soup.find_all("img")

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for i, img in enumerate(images):
        # Lazy-loaded pages often keep the real URL in data-src.
        src = img.get("src") or img.get("data-src")
        if not src or not any(ext in src for ext in [".jpg", ".png", ".jpeg"]):
            continue
        if not src.startswith("http"):
            continue  # skip relative and protocol-relative URLs
        try:
            img_data = requests.get(src, timeout=10).content
            with open(f"{output_dir}/board_{i:04d}.jpg", "wb") as f:
                f.write(img_data)
        except Exception:
            continue  # unreachable or broken image; move on

Not every image was usable. Many were digital board renders (not real photos), cropped close-ups of specific positions, or too low resolution. I filtered manually, keeping only photos of physical boards where all four corners were visible.

After filtering, I had about 1,200 board images from varied sources. Different board types (wooden, plastic, cloth), different stone types (glass, slate, plastic), different lighting, different angles. Combined with my own 200 photos, that gave me roughly 1,400 images to annotate.

Annotating at scale

1,400 images at 5 seconds each is about two hours of clicking. I built a batch mode into the annotator that queues images and tracks progress.

import glob
import os

# Resume from a previous session if one exists.
annotations = {}
if os.path.exists("annotations.json"):
    with open("annotations.json") as f:
        annotations = json.load(f)

images = sorted(glob.glob("boards/*.jpg"))

for path in images:
    if path in annotations:
        continue  # annotated in an earlier session
    corners.clear()
    img = cv2.imread(path)
    cv2.imshow("Annotate", img)
    cv2.setMouseCallback("Annotate", click_handler, img)
    key = cv2.waitKey(0)
    if key == ord("s") and len(corners) == 4:
        annotations[path] = corners.copy()
    elif key == ord("q"):
        break
    # Write after every image so a crash loses at most one annotation.
    with open("annotations.json", "w") as f:
        json.dump(annotations, f)

Skip with n, save with s, quit with q. Progress saves after each image so crashes don’t lose work.

From annotations to YOLO format

The annotator outputs annotations.json. YOLO needs normalised bounding boxes with keypoints in a rigid folder structure. I wrote a conversion script that bridges the two formats, handling train/val splits and missing-file cleanup. It’s re-runnable, so adding more annotations just means running it again.
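
A minimal sketch of that conversion, assuming the annotations.json format shown earlier and YOLO’s pose label format (one line per object: class index, normalised box centre and size, then x, y, visibility for each keypoint). The 80/20 split and output root are illustrative, and the test split is omitted for brevity:

import json
import random
import shutil
from pathlib import Path

import cv2

def convert(ann_path="annotations.json", out_root="/data/go-boards", val_frac=0.2):
    items = sorted(json.load(open(ann_path)).items())
    random.seed(0)
    random.shuffle(items)
    n_val = int(len(items) * val_frac)

    for i, (img_path, corners) in enumerate(items):
        split = "val" if i < n_val else "train"
        img = cv2.imread(img_path)
        if img is None:
            continue  # annotation points at a missing or unreadable image
        h, w = img.shape[:2]

        # Bounding box spans the four corners, normalised to [0, 1].
        xs, ys = [c[0] for c in corners], [c[1] for c in corners]
        cx = (min(xs) + max(xs)) / 2 / w
        cy = (min(ys) + max(ys)) / 2 / h
        bw = (max(xs) - min(xs)) / w
        bh = (max(ys) - min(ys)) / h

        # Keypoints as x, y, visibility (2 = labelled and visible).
        kpts = " ".join(f"{x / w:.6f} {y / h:.6f} 2" for x, y in corners)

        img_dir = Path(out_root) / "images" / split
        lbl_dir = Path(out_root) / "labels" / split
        img_dir.mkdir(parents=True, exist_ok=True)
        lbl_dir.mkdir(parents=True, exist_ok=True)
        stem = Path(img_path).stem
        shutil.copy(img_path, img_dir / f"{stem}.jpg")
        (lbl_dir / f"{stem}.txt").write_text(
            f"0 {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f} {kpts}\n"
        )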

Training

I trained a YOLOv8 pose model, which supports keypoint detection natively. The four board corners are the keypoints.

# dataset.yaml
path: /data/go-boards
train: images/train
val: images/val
test: images/test

kpt_shape: [4, 3]  # 4 keypoints, 3 values each (x, y, visibility)
names:
  0: board

from ultralytics import YOLO

model = YOLO("yolov8m-pose.pt")
results = model.train(
    data="dataset.yaml",
    epochs=100,
    imgsz=640,
    batch=16,
    device=0,
)

Training ran on a rented A100 via Vast.ai. 100 epochs on 1,400 images took about 40 minutes.

Results

The trained model locates board corners with a mean error of about 3 pixels on the test set. In the 600 × 600 warped view, adjacent intersections sit roughly 33 pixels apart (600 / 18 gaps on a 19×19 board), so an error of a few pixels stays well under half a cell. That’s sub-intersection accuracy after perspective correction, which is good enough to reliably map stones to grid positions.

On images similar to the training data (full board visible, reasonable lighting), corner detection is nearly perfect. The remaining failure cases are extreme angles where the board is heavily foreshortened, and images where parts of the board are occluded by hands or bowls.

The full pipeline now runs in two stages: YOLO for corner detection and perspective correction, then SAM 3 for stone segmentation on the warped image. The warped image is always a clean top-down view, which makes the stone detection much more reliable than running SAM on the original angled photo.
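
To make the hand-off concrete, here’s a rough sketch of stage one at inference time. It assumes the weights sit at Ultralytics’ default output path (runs/pose/train/weights/best.pt) and that exactly one board is detected; the SAM 3 stage is left as a comment, since its invocation depends on how it’s wrapped:

import cv2
import numpy as np
from ultralytics import YOLO

model = YOLO("runs/pose/train/weights/best.pt")

def detect_and_warp(image_path, size=600):
    img = cv2.imread(image_path)
    result = model(img)[0]
    # Keypoints come back in annotation order: TL, TR, BR, BL.
    corners = result.keypoints.xy[0].cpu().numpy()
    dst = np.float32([[0, 0], [size, 0], [size, size], [0, size]])
    M = cv2.getPerspectiveTransform(np.float32(corners), dst)
    return cv2.warpPerspective(img, M, (size, size))

warped = detect_and_warp("board.jpg")
# Stage two: run SAM 3 stone segmentation on the warped top-down view.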