Under Review  ·  2026

MARLNet — Motion-Aware Reinforcement Learning for Object Localization

Pascal VOC 2012  ·  VisDrone 2019  ·  PPO  ·  EfficientNet-B0

Prithvi Raj Singh ¹ Satyendra Raj Singh ²
¹ McNeese State University, Lake Charles, LA, USA ² Louisiana Tech University, Ruston, LA, USA
Paper: coming soon  ·  arXiv: coming soon  ·  Code: available by end of June 2026

Overview

Post-detection refinement guided by motion priors

Modern object detectors produce accurate class predictions but imprecise bounding boxes. Even small coordinate errors cause disproportionate drops in overlap-based metrics, especially for small or densely packed objects. MARLNet addresses this with a lightweight iterative refinement agent, trained with Proximal Policy Optimisation (PPO), that learns to move detector proposals toward the ground truth through a series of coordinate updates.
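To make the sensitivity argument concrete, the sketch below computes IoU for the same 2-pixel horizontal error applied to a small and a large box. The box sizes and the 2-pixel shift are illustrative values chosen here, not figures from the paper.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def shifted(box, dx):
    """Return the box translated horizontally by dx pixels."""
    x1, y1, x2, y2 = box
    return (x1 + dx, y1, x2 + dx, y2)


small = (0.0, 0.0, 10.0, 10.0)      # illustrative 10x10 object
large = (0.0, 0.0, 100.0, 100.0)    # illustrative 100x100 object

iou_small = iou(small, shifted(small, 2.0))  # 80 / 120
iou_large = iou(large, shifted(large, 2.0))  # 9800 / 10200
print(f"{iou_small:.3f} {iou_large:.3f}")    # prints "0.667 0.961"
```

The identical 2-pixel error costs the small box about a third of its overlap while the large box loses under 4%, which is why small or densely packed objects sit much closer to the IoU ≥ 0.5 threshold.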

A constant-velocity motion prior is incorporated into the 268-dimensional observation state, and an action smoothness penalty regularises the reward function. This combination produces stable training across all regularisation strengths tested on Pascal VOC 2012 and VisDrone 2019, achieving consistent gains in detection success rate at IoU ≥ 0.5 over the base detector while averaging only 1.0–1.2 refinement steps at inference.
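A minimal sketch of how an action smoothness penalty can regularise a per-step reward is given below. The page only states the penalty form λ‖a_t − a_{t−1}‖; the choice of the L2 norm, the function names, and the idea of passing the unregularised reward in as `base_reward` are assumptions made for illustration, not the authors' implementation.

```python
import math


def smoothness_penalty(action, prev_action, lam):
    """Return lam * ||a_t - a_{t-1}|| using the L2 norm (norm choice is an assumption)."""
    if len(action) != len(prev_action):
        raise ValueError("action and prev_action must have the same length")
    sq_dist = sum((a - p) ** 2 for a, p in zip(action, prev_action))
    return lam * math.sqrt(sq_dist)


def regularised_reward(base_reward, action, prev_action, lam):
    """Subtract the smoothness penalty from an arbitrary base reward."""
    return base_reward - smoothness_penalty(action, prev_action, lam)


# Hypothetical values: the agent reverses direction on the first coordinate.
prev_action = [-0.1, 0.0, 0.0, 0.0]
action = [0.1, 0.0, 0.0, 0.0]
reward = regularised_reward(0.05, action, prev_action, lam=0.10)
print(round(reward, 4))
```

With λ = 0 the base reward is returned unchanged; larger λ penalises step-to-step reversals more heavily, which matches the stated goal of encouraging consistent refinement directions.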

+0.011 SR@0.5 gain on VOC
+0.025 SR@0.5 gain on VisDrone
~12 FPS end-to-end (GPU)
<1ms RL policy cost per image
~1.1 Avg. refinement steps
[Figure: MARLNet inference pipeline (parlnet_pipeline.png)]

MARLNet inference pipeline. A Faster R-CNN base detector provides initial proposals. At each refinement step, the current box defines a crop that is processed by a frozen EfficientNet-B0 backbone (256-dim features). The 268-dim state, which encodes the proposal, a constant-velocity kinematic prediction, the previous action, and the crop feature, is passed to a shared actor-critic MLP. The actor outputs a coordinate adjustment and a binary stopping signal. If the stopping trigger fires, the refined box is returned; otherwise the loop repeats. The smoothness penalty λ‖a_t − a_{t−1}‖ encourages directionally consistent refinement trajectories. T_max = 15; average steps at inference: 1.0–1.2.
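The loop described in the caption can be sketched roughly as follows. Several details are assumptions made for illustration: the 268 dimensions are split as 4 (box) + 4 (constant-velocity prediction) + 4 (previous action) + 256 (crop feature), the adjustment is additive on (x1, y1, x2, y2), the adjustment is applied before the stop signal is checked, and `policy` and `extract_feature` are hypothetical callables standing in for the actor network and the frozen backbone.

```python
def cv_predict(prev_box, curr_box):
    """Constant-velocity extrapolation: next = curr + (curr - prev)."""
    return [c + (c - p) for p, c in zip(prev_box, curr_box)]


def build_state(curr_box, prev_box, prev_action, crop_feature):
    """Concatenate box, CV prediction, previous action, and crop feature (assumed layout)."""
    if len(curr_box) != 4 or len(prev_box) != 4 or len(prev_action) != 4:
        raise ValueError("boxes and actions must have 4 entries")
    if len(crop_feature) != 256:
        raise ValueError("crop feature must have 256 entries")
    return (list(curr_box) + cv_predict(prev_box, curr_box)
            + list(prev_action) + list(crop_feature))


def refine(box, policy, extract_feature, t_max=15):
    """Iteratively refine a box until the policy signals stop or t_max is reached."""
    prev_box, curr_box = list(box), list(box)  # zero initial velocity (assumption)
    prev_action = [0.0] * 4
    for step in range(1, t_max + 1):
        state = build_state(curr_box, prev_box, prev_action, extract_feature(curr_box))
        action, stop = policy(state)
        next_box = [c + a for c, a in zip(curr_box, action)]  # additive update (assumption)
        prev_box, curr_box, prev_action = curr_box, next_box, list(action)
        if stop:
            return curr_box, step
    return curr_box, t_max


def make_toy_policy(stop_after):
    """Hypothetical stand-in for the actor: nudges x1 by +1 and stops after N calls."""
    calls = {"n": 0}

    def policy(state):
        calls["n"] += 1
        return [1.0, 0.0, 0.0, 0.0], calls["n"] >= stop_after

    return policy


refined_box, steps = refine(
    [10.0, 10.0, 50.0, 50.0],
    make_toy_policy(stop_after=2),
    extract_feature=lambda b: [0.0] * 256,  # placeholder for the 256-dim crop feature
)
print(refined_box, steps)  # prints "[12.0, 10.0, 50.0, 50.0] 2"
```

In a real system the toy policy would be replaced by the trained actor and the lambda by the backbone forward pass on the cropped region; the control flow is the part this sketch is meant to illustrate.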

Contributions

01
MARLNet Framework

PPO-based motion-aware RL agent with a constant-velocity (CV) motion prior in the state and an action smoothness penalty in the reward. Stable training across all tested λ values on two datasets.

02
SR@0.5 Improvements

Consistent detection success rate gains over the base detector on both VOC (+0.011) and VisDrone (+0.025) under precision-critical evaluation.

03
Feature Information Ceiling

Characterises a representational ceiling: crop-feature agents that share a backbone with their detector cannot exceed the detector's performance. Confirmed via a global+local observation ablation.

04
Reward Design Analysis

Identifies a reward interference that causes training collapse when a constant-velocity (CV) deviation penalty is combined with an absolute IoU term. The smoothness penalty resolves it entirely.


Results

Quantitative evaluation

All methods use the same Faster R-CNN initial detections. The highlighted rows show MARLNet at the best-performing regularisation strength per dataset. SR@0.5 (success rate at IoU ≥ 0.5) is the primary metric where MARLNet achieves consistent improvements.

Pascal VOC 2012 val — 5,823 images

Method                   mAP     AP@.5   AP@.75  AR      SR@.5     SR@.7   ΔIoU
Baseline
  Detector only          0.490   0.748   0.544   0.593   0.504     0.389
Refinement methods
  Heuristic              0.414   0.661   0.442   0.557   0.506     0.359   −0.013
  PPO (no motion prior)  0.452   0.745   0.498   0.555   0.501     0.377   −0.012
  MARLNet λ=0.10         0.437   0.750   0.491   0.544   0.515 ↑   0.383   −0.008

VisDrone 2019 val — 548 images

Method                   mAP     AP@.5   AP@.75  AR      SR@.5     SR@.7   ΔIoU
Baseline
  Detector only          0.189   0.337   0.187   0.280   0.669     0.439
Refinement methods
  Heuristic              0.140   0.257   0.134   0.266   0.671     0.418   −0.007
  PPO (no motion prior)  0.177   0.339   0.172   0.267   0.694 ↑   0.423   −0.020
  MARLNet λ=0.05         0.167   0.333   0.156   0.255   0.666     0.414   −0.015

↑ denotes improvement over detector baseline. Bold indicates best value per column among refinement methods. SR@.5 = detection success rate at IoU ≥ 0.5.
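As a rough illustration of the SR metric defined above, the sketch below counts the fraction of boxes whose IoU with their matched ground truth meets a threshold. The matching procedure and the per-box IoU list are assumptions; the page does not specify how detections are paired with ground truth, and the IoU values used here are made up.

```python
def success_rate(ious, threshold=0.5):
    """Fraction of boxes whose IoU with the matched ground truth is >= threshold."""
    if not ious:
        return 0.0
    hits = sum(1 for value in ious if value >= threshold)
    return hits / len(ious)


# Hypothetical per-box IoUs after refinement (not results from the paper).
example_ious = [0.82, 0.49, 0.50, 0.71, 0.30]
sr_05 = success_rate(example_ious, threshold=0.5)
sr_07 = success_rate(example_ious, threshold=0.7)
print(sr_05, sr_07)  # prints "0.6 0.4"
```

Because the metric is a hard threshold, a box moving from IoU 0.49 to 0.50 counts as a full success, so SR@.5 can rise even when the mean IoU change is slightly negative.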


Code

Open-source release

The full training and evaluation codebase will be released shortly. It includes the complete MARLNet implementation, reward variant ablations, per-category evaluation scripts, and pretrained checkpoints for both datasets.

github.com / [your-username] / MARLNet

PPO training · EfficientNet-B0 feature extraction · VOC & VisDrone evaluation · Pretrained checkpoints

Code available very soon

The repository will include setup instructions, training scripts, and all evaluation utilities required to reproduce the results in the paper. Star the repository to be notified on release.


Citation

If you find this work useful

Please consider citing our paper once it is available:

@article{marlnet2025,
  title   = {Analyzing Motion Priors for Reinforcement
             Learning-Based Bounding Box Refinement},
  author  = {Singh, Prithvi Raj and Singh, Satyendra Raj},
  journal = {Under Review},
  year    = {2025}
}