Pascal VOC 2012 · VisDrone 2019 · PPO · EfficientNet-B0
Modern object detectors produce accurate class predictions but imprecise bounding boxes. Even small coordinate errors cause disproportionate drops in overlap-based metrics, especially for small or densely packed objects. MARLNet addresses this with a lightweight iterative refinement agent, trained with Proximal Policy Optimisation (PPO), that learns to move detector proposals toward the ground truth through a series of coordinate updates.
A constant-velocity motion prior is incorporated into the 268-dimensional observation state, and an action smoothness penalty regularises the reward function. This combination produces stable training across all regularisation strengths tested on Pascal VOC 2012 and VisDrone 2019, achieving consistent gains in detection success rate at IoU ≥ 0.5 over the base detector while averaging only 1.0–1.2 refinement steps at inference.
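As a rough illustration of what a constant-velocity prior can look like for box refinement, the sketch below extrapolates the next box by repeating the most recent per-coordinate displacement. The `[x1, y1, x2, y2]` box layout and the function name `cv_predict` are assumptions made for this example; the source does not specify how MARLNet parameterises the prediction.

```python
from typing import List, Sequence

Box = List[float]  # assumed layout: [x1, y1, x2, y2]


def cv_predict(prev_box: Sequence[float], curr_box: Sequence[float]) -> Box:
    """Extrapolate the next box under a constant-velocity assumption.

    Each coordinate is assumed to keep the displacement it showed between
    the previous and the current refinement step.
    """
    return [c + (c - p) for p, c in zip(prev_box, curr_box)]


# A box that moved by (+2, +1) in the last step is predicted to do so again.
predicted = cv_predict([10.0, 10.0, 50.0, 50.0], [12.0, 11.0, 52.0, 51.0])
```

When the previous and current boxes coincide (for example at the first step), this prediction reduces to "no motion".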
MARLNet inference pipeline. A Faster R-CNN base detector provides initial proposals. At each refinement step, the current box defines a crop that is processed by a frozen EfficientNet-B0 backbone (256-dim features). The 268-dim state, which encodes the proposal, a constant-velocity kinematic prediction, the previous action, and the crop feature, is passed to a shared actor-critic MLP. The actor outputs a coordinate adjustment and a binary stopping signal. If the trigger fires, the refined box is returned; otherwise the loop repeats. The smoothness penalty λ‖a_t − a_{t−1}‖ encourages directionally consistent refinement trajectories. T_max = 15; average steps at inference: 1.0–1.2.
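The loop described above might be sketched as follows. This is a minimal illustration rather than the released implementation: `encode_crop` and `policy` are hypothetical callables standing in for the frozen backbone and the actor, the 4 + 4 + 4 + 256 split of the 268-dim state is an assumption consistent with the stated sizes, and applying the action as an additive coordinate offset is also assumed.

```python
from typing import Callable, List, Sequence, Tuple

FEATURE_DIM = 256  # crop feature size stated in the text
STATE_DIM = 268    # total state size stated in the text
T_MAX = 15         # step cap stated in the text


def build_state(box: Sequence[float], cv_box: Sequence[float],
                prev_action: Sequence[float],
                crop_feature: Sequence[float]) -> List[float]:
    """Concatenate the four state components (assumed 4 + 4 + 4 + 256)."""
    state = list(box) + list(cv_box) + list(prev_action) + list(crop_feature)
    if len(state) != STATE_DIM:
        raise ValueError(f"expected {STATE_DIM} dims, got {len(state)}")
    return state


def refine(box: Sequence[float],
           encode_crop: Callable[[Sequence[float]], Sequence[float]],
           policy: Callable[[List[float]], Tuple[Sequence[float], bool]],
           t_max: int = T_MAX) -> Tuple[List[float], int]:
    """Iteratively refine one proposal; returns (refined box, steps used)."""
    box = list(box)
    prev_box = list(box)      # no history yet, so the CV prior predicts no motion
    prev_action = [0.0] * 4
    for step in range(1, t_max + 1):
        # Constant-velocity prediction: repeat the last displacement.
        cv_box = [c + (c - p) for p, c in zip(prev_box, box)]
        state = build_state(box, cv_box, prev_action, encode_crop(box))
        action, stop = policy(state)
        # Assumed action semantics: additive offsets on [x1, y1, x2, y2].
        prev_box, box = box, [b + a for b, a in zip(box, action)]
        prev_action = list(action)
        if stop:
            return box, step
    return box, t_max
```

With a dummy encoder such as `lambda b: [0.0] * 256` and a policy that always fires the stop signal, `refine` returns after a single step, which mirrors the reported average of 1.0–1.2 steps.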
PPO-based motion-aware RL agent with a constant-velocity (CV) motion prior in the state and an action smoothness penalty in the reward. Stable training across all λ values on two datasets.
Consistent detection success rate gains over the base detector on both VOC (+0.011) and VisDrone (+0.025) under precision-critical evaluation.
Representational ceiling characterised: crop-feature agents sharing a backbone with their detector cannot exceed it. Confirmed via global+local observation ablation.
Identifies reward interference that causes training collapse when a CV deviation penalty is combined with an absolute IoU term. The smoothness penalty resolves it entirely.
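To make the smoothness term concrete, here is a small sketch of how a penalty of the form λ‖a_t − a_{t−1}‖ could be subtracted from a per-step reward. The choice of the L2 norm, the function names, and treating the rest of the reward as an opaque `base_reward` value are all assumptions; the source does not give the full reward definition.

```python
import math
from typing import Sequence


def smoothness_penalty(action: Sequence[float],
                       prev_action: Sequence[float],
                       lam: float) -> float:
    """lam * ||a_t - a_{t-1}||, using the L2 norm (norm choice is assumed)."""
    sq = sum((a - p) ** 2 for a, p in zip(action, prev_action))
    return lam * math.sqrt(sq)


def shaped_reward(base_reward: float,
                  action: Sequence[float],
                  prev_action: Sequence[float],
                  lam: float = 0.10) -> float:
    """Per-step reward with the smoothness penalty subtracted.

    base_reward stands in for whatever IoU-based term the agent is
    trained on; its exact form is not specified here.
    """
    return base_reward - smoothness_penalty(action, prev_action, lam)
```

Repeating the previous action incurs no penalty, while reversing direction between steps is penalised in proportion to λ, which is one way to read "directionally consistent" refinement.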
All methods use the same Faster R-CNN initial detections. The highlighted rows show MARLNet at the best-performing regularisation strength per dataset. SR@0.5 (success rate at IoU ≥ 0.5) is the primary metric where MARLNet achieves consistent improvements.
Pascal VOC 2012 val — 5,823 images
| Method | mAP | AP@.5 | AP@.75 | AR | SR@.5 | SR@.7 | ΔIoU |
|---|---|---|---|---|---|---|---|
| Baseline | |||||||
| Detector only | 0.490 | 0.748 | 0.544 | 0.593 | 0.504 | 0.389 | — |
| Refinement methods | |||||||
| Heuristic | 0.414 | 0.661 | 0.442 | 0.557 | 0.506 | 0.359 | −0.013 |
| PPO (no motion prior) | 0.452 | 0.745 | 0.498 | 0.555 | 0.501 | 0.377 | −0.012 |
| MARLNet λ=0.10 | 0.437 | 0.750 | 0.491 | 0.544 | 0.515 ↑ | 0.383 | −0.008 |
VisDrone 2019 val — 548 images
| Method | mAP | AP@.5 | AP@.75 | AR | SR@.5 | SR@.7 | ΔIoU |
|---|---|---|---|---|---|---|---|
| Baseline | |||||||
| Detector only | 0.189 | 0.337 | 0.187 | 0.280 | 0.669 | 0.439 | — |
| Refinement methods | |||||||
| Heuristic | 0.140 | 0.257 | 0.134 | 0.266 | 0.671 | 0.418 | −0.007 |
| PPO (no motion prior) | 0.177 | 0.339 | 0.172 | 0.267 | 0.694 ↑ | 0.423 | −0.020 |
| MARLNet λ=0.05 | 0.167 | 0.333 | 0.156 | 0.255 | 0.666 | 0.414 | −0.015 |
↑ denotes improvement over detector baseline. Bold indicates best value per column among refinement methods. SR@.5 = detection success rate at IoU ≥ 0.5.
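For readers unfamiliar with the metric, the following sketch computes IoU for axis-aligned boxes and a success rate at a given threshold. It assumes `[x1, y1, x2, y2]` boxes and that each prediction has already been paired with one ground-truth box; the matching procedure used in the paper is not described here, so `success_rate` is illustrative only.

```python
from typing import Sequence, Tuple

BoxT = Sequence[float]  # assumed layout: [x1, y1, x2, y2]


def area(box: BoxT) -> float:
    """Area of an axis-aligned box; degenerate boxes count as zero."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a: BoxT, b: BoxT) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def success_rate(pairs: Sequence[Tuple[BoxT, BoxT]], thr: float = 0.5) -> float:
    """Fraction of (prediction, ground truth) pairs with IoU >= thr."""
    if not pairs:
        return 0.0
    return sum(iou(p, g) >= thr for p, g in pairs) / len(pairs)


# Same 4-pixel horizontal error on a small and on a large object.
small = iou([4, 0, 14, 10], [0, 0, 10, 10])      # 60 / 140, about 0.429
large = iou([4, 0, 104, 100], [0, 0, 100, 100])  # 9600 / 10400, about 0.923
```

The two example values show why small objects are sensitive to coordinate error: a 4-pixel shift drops a 10×10 box below the 0.5 threshold while a 100×100 box stays well above it.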
The full training and evaluation codebase will be released shortly. It includes the complete MARLNet implementation, reward variant ablations, per-category evaluation scripts, and pretrained checkpoints for both datasets.
PPO training · EfficientNet-B0 feature extraction · VOC & VisDrone evaluation · Pretrained checkpoints
The repository will include setup instructions, training scripts, and all evaluation utilities required to reproduce the results in the paper. Star the repository to be notified on release.
Please consider citing our paper once it is available:
@article{marlnet2025,
  title   = {Analyzing Motion Priors for Reinforcement Learning-Based Bounding Box Refinement},
  author  = {Author One and Author Two},
  journal = {Under Review},
  year    = {2025}
}