Visual Servoing Playbook: IBVS vs PBVS vs 2.5D (Practical Robotics Guide)
Date: 2026-03-21
Category: knowledge
Why this matters
If a robot must align to a target in the real world (grasping, docking, insertion, inspection), open-loop pose plans usually fail in the last centimeters.
Visual servoing closes that gap by using camera feedback continuously.
The hard part is not “use camera feedback.” The hard part is choosing the right servo formulation:
- IBVS (Image-Based Visual Servoing): control from image-feature error directly.
- PBVS (Position/Pose-Based Visual Servoing): estimate 3D pose, then control in Cartesian space.
- 2.5D / hybrid: combine image and partial pose info to avoid both extremes’ failure modes.
Pick wrong, and you get camera retreat, FOV loss, jitter, singularities, or unstable endgame behavior.
1) Quick mental model
IBVS
- Control objective is in image space (pixels/normalized features).
- Typically robust to moderate calibration/model error.
- Often better at keeping target in view.
- Can suffer from interaction-matrix singularities and awkward 3D trajectories.
PBVS
- Control objective is in 3D pose error.
- Intuitive Cartesian trajectories if pose is accurate.
- Sensitive to calibration, pose-estimation noise, and model mismatch.
- Can lose target from camera FOV if trajectory is not visibility-aware.
2.5D / Hybrid
- Uses image features for visibility safety + partial 3D terms for better convergence geometry.
- More engineering complexity, but often best practical tradeoff.
2) Core equations (operator level)
Let image features be s, desired features s*, camera twist v.
IBVS relation
dot(s) = L_s * v
L_s: interaction matrix (image Jacobian).
A typical control law:
v = -lambda * L_s^+ * (s - s*)
where L_s^+ is the pseudo-inverse and lambda > 0 is the gain.
For a normalized point feature (x, y) with depth Z, a common block in L_s is:
[ -1/Z, 0, x/Z, x*y, -(1+x^2), y ]
[ 0, -1/Z, y/Z, 1+y^2, -x*y, -x ]
(Exact sign/frame convention depends on implementation and camera/robot frames.)
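The point-feature blocks and control law above can be sketched in a few lines of NumPy. This is a minimal illustration under the sign/frame convention shown; the helper names (`point_interaction_matrix`, `ibvs_twist`) are ours, not from any particular library.

```python
import numpy as np

def point_interaction_matrix(x, y, Z):
    """2x6 interaction-matrix block for a normalized image point (x, y)
    at depth Z, matching the rows shown above."""
    return np.array([
        [-1.0 / Z, 0.0,      x / Z, x * y,        -(1.0 + x * x),  y],
        [0.0,     -1.0 / Z,  y / Z, 1.0 + y * y,  -x * y,         -x],
    ])

def ibvs_twist(features, desired, depths, lam=0.5):
    """Stack one 2x6 block per point, then v = -lam * pinv(L) @ (s - s*)."""
    L = np.vstack([point_interaction_matrix(x, y, Z)
                   for (x, y), Z in zip(features, depths)])
    err = (np.asarray(features) - np.asarray(desired)).ravel()
    return -lam * np.linalg.pinv(L) @ err

# Example: four points, each offset +0.02 in x from its goal position.
feats   = [(0.10, 0.1), (-0.10, 0.1), (-0.10, -0.1), (0.10, -0.1)]
desired = [(0.12, 0.1), (-0.08, 0.1), (-0.08, -0.1), (0.12, -0.1)]
v = ibvs_twist(feats, desired, depths=[1.0] * 4)  # 6-vector camera twist
```

With three or more non-collinear points the stacked L has full column rank and the plain pseudo-inverse is fine; near-degenerate geometry is where the damped inverse from Section 5 takes over.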
PBVS relation
Estimate target pose in camera frame, compute relative transform error T_err to goal, then control twist from translational + rotational error:
v = -lambda * [ t_err ; theta*u ]
t_err: translation error.
theta*u: axis-angle rotation error.
Practical note: PBVS quality is dominated by pose-estimation quality (PnP, calibration, latency, outlier rejection).
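The PBVS law is equally compact once the pose error is available. The sketch below assumes the rotation error arrives as a rotation matrix and converts it to the theta*u form; `axis_angle` and `pbvs_twist` are illustrative names.

```python
import numpy as np

def axis_angle(R):
    """theta*u rotation vector from a rotation matrix (inverse Rodrigues)."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if theta < 1e-9:
        return np.zeros(3)
    u = np.array([R[2, 1] - R[1, 2],
                  R[0, 2] - R[2, 0],
                  R[1, 0] - R[0, 1]]) / (2.0 * np.sin(theta))
    return theta * u

def pbvs_twist(t_err, R_err, lam=0.5):
    """v = -lam * [t_err ; theta*u], matching the control law above."""
    return -lam * np.concatenate([t_err, axis_angle(R_err)])

# Example: 5 cm translation error in x, 10 deg rotation error about z.
th = np.deg2rad(10.0)
Rz = np.array([[np.cos(th), -np.sin(th), 0.0],
               [np.sin(th),  np.cos(th), 0.0],
               [0.0,         0.0,        1.0]])
v = pbvs_twist(np.array([0.05, 0.0, 0.0]), Rz)
```

In practice R_err and t_err come from PnP against a known model, so everything downstream of this function inherits the PnP error budget, which is the practical note above.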
3) Decision matrix (what to deploy first)
Use IBVS first when:
- Feature tracking is reliable in image.
- Calibration is uncertain or changing.
- Keeping object in FOV is high priority.
- The task tolerates a less direct 3D camera path.
Use PBVS first when:
- You have strong calibration and robust pose estimation.
- 3D path shape matters (clearance/constrained approach).
- CAD/model priors are strong.
- You can enforce visibility constraints explicitly.
Use 2.5D / hybrid when:
- You want IBVS robustness but PBVS convergence geometry.
- Large initial pose errors are common.
- You can afford additional engineering and validation.
4) System architecture that actually works
Perception front-end
- Track features (corners/lines/markers) with quality score.
- Reject outliers (RANSAC, temporal consistency, innovation gating).
State filtering
- Smooth feature/pose estimates (EKF/UKF or simple low-lag filters).
- Keep filter lag bounded; too much smoothing adds phase lag and destabilizes the loop.
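The lag/noise tradeoff is visible even in the simplest smoother. A first-order low-pass with a high measurement weight keeps phase lag low; the `alpha` parameterization below is one common convention, not a prescription.

```python
class LowPass:
    """First-order low-pass on a scalar feature coordinate.

    alpha near 1.0 -> low lag, little smoothing;
    alpha near 0.0 -> heavy smoothing, dangerous phase lag.
    """
    def __init__(self, alpha=0.7):
        self.alpha = alpha   # weight on the new measurement
        self.state = None

    def update(self, z):
        self.state = z if self.state is None else (
            self.alpha * z + (1.0 - self.alpha) * self.state)
        return self.state

# Step response: a feature that jumps from 1.0 to 0.0.
f = LowPass(alpha=0.7)
out = [f.update(z) for z in [1.0, 1.0, 0.0, 0.0]]
```

After the step, the residual (0.3, then 0.09) decays geometrically; that tail is exactly the lag that must stay small relative to the servo loop period.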
Servo controller
- Compute L_s (or the pose-error map) each cycle.
- Use a damped pseudo-inverse near singularities.
- Apply gain scheduling by depth/error magnitude.
Robot interface
- Velocity limits, acceleration/jerk limits, watchdog.
- Hard safety envelope for workspace and joint constraints.
Supervisor
- Mode switching (SEARCH -> ACQUIRE -> SERVO -> INSERT/GRASP -> HOLD/RECOVER).
- Visibility guardrails and fallback actions.
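The supervisor's mode switching can be kept to a small pure transition function, which makes it easy to unit-test. Mode names follow the state machine above; the transition conditions (`target_visible`, `quality_ok`, `error_small`) are illustrative inputs you would wire to your perception gating.

```python
# Supervisor transition sketch. GRASP stands in for INSERT/GRASP,
# RECOVER for HOLD/RECOVER; conditions are assumed booleans from
# perception gating and the servo controller.
SEARCH, ACQUIRE, SERVO, GRASP, RECOVER = (
    "SEARCH", "ACQUIRE", "SERVO", "GRASP", "RECOVER")

def step(mode, target_visible, quality_ok, error_small):
    if not target_visible:
        # Losing the target mid-servo is a fault, not a restart.
        return RECOVER if mode in (SERVO, GRASP) else SEARCH
    if mode in (SEARCH, RECOVER):
        return ACQUIRE
    if mode == ACQUIRE:
        return SERVO if quality_ok else ACQUIRE
    if mode == SERVO:
        return GRASP if error_small else SERVO
    return mode  # GRASP holds until a higher-level task releases it

mode = step(SEARCH, target_visible=True, quality_ok=False, error_small=False)
```

Keeping the function side-effect-free means the fallback logic can be exhaustively tested on the bench before it ever runs on hardware.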
5) Tuning workflow (fastest path to stable behavior)
Phase A: static bench
- Verify camera intrinsics/extrinsics sanity.
- Validate feature stability under lighting and motion blur.
- Measure perception latency and jitter.
Phase B: low-gain closed loop
- Start with a conservative lambda.
- Confirm monotonic error reduction from multiple initial offsets.
- Track FOV margin and singular value floor of Jacobian.
Phase C: raise performance carefully
- Increase gain until overshoot/oscillation appears, then back off.
- Add a damping term in the pseudo-inverse: (L^T L + mu^2 I)^-1 L^T.
- Introduce gain scheduling by depth/condition number.
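The damped inverse and condition-number scheduling from Phase C fit in a few lines. The scheduling heuristic (`scheduled_mu` ramping damping linearly with condition number up to a cap) is one reasonable choice among many, not a standard formula.

```python
import numpy as np

def damped_pinv(L, mu):
    """Levenberg-style damped pseudo-inverse: (L^T L + mu^2 I)^-1 L^T.
    mu = 0 recovers the ordinary pseudo-inverse for full-rank L."""
    n = L.shape[1]
    return np.linalg.solve(L.T @ L + (mu ** 2) * np.eye(n), L.T)

def scheduled_mu(L, mu_max=0.1, cond_limit=100.0):
    """Assumed heuristic: raise damping as the Jacobian ill-conditions,
    saturating at mu_max once cond(L) reaches cond_limit."""
    s = np.linalg.svd(L, compute_uv=False)
    cond = s[0] / max(s[-1], 1e-12)
    return mu_max * min(1.0, cond / cond_limit)
```

Near a singularity the damping trades exact error decrease for bounded velocity commands, which is usually the right trade at the end of an approach.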
Phase D: stress tests
- Illumination changes, partial occlusion, specular surfaces.
- Calibration perturbation tests.
- Sudden target motion and temporary feature dropout.
6) Common failure modes (and fixes)
Camera retreat / weird long path
- Typical in naive IBVS setups.
- Fix: feature selection redesign, hybrid/2.5D strategy, trajectory constraints.
Target leaves field of view (PBVS)
- 3D-optimal path is not visibility-optimal.
- Fix: add visibility constraints or image-space secondary task.
Jitter near goal
- Pose noise + high gain + latency.
- Fix: lower terminal gain, filtered target update, deadband/hysteresis.
Singularity/ill-conditioning
- Feature geometry degenerates.
- Fix: damped inverse, feature set diversification, re-acquire strategy.
False stability in sim, failure on robot
- Unmodeled delay, rolling shutter, actuation saturation.
- Fix: measure end-to-end delay and include it in control tuning.
7) Metrics to run in production/field tests
- Convergence success rate (%) from randomized initial offsets.
- Time-to-converge (median / p95).
- Final image error (pixels or normalized).
- Final pose error (mm/deg) when pose truth exists.
- FOV violation count.
- Jacobian condition-number statistics.
- Recovery rate after temporary feature loss.
- Control saturation ratio (how often velocity commands clip).
Treat these as deployment gates, not “nice to have” charts.
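Treating the metrics as gates is easiest when each gate is an explicit predicate over the logged statistics. The thresholds below are illustrative placeholders, not recommendations; set yours from task tolerances.

```python
# Deployment-gate sketch. Metric names mirror the list above;
# threshold values are assumed examples only.
GATES = {
    "convergence_rate":     lambda v: v >= 0.95,   # fraction of trials
    "time_to_converge_p95": lambda v: v <= 5.0,    # seconds
    "fov_violations":       lambda v: v == 0,      # count per run set
    "saturation_ratio":     lambda v: v <= 0.05,   # fraction of cycles
}

def evaluate(metrics):
    """Return {gate_name: passed} for every defined gate."""
    return {name: check(metrics[name]) for name, check in GATES.items()}

report = evaluate({
    "convergence_rate": 0.97,
    "time_to_converge_p95": 3.2,
    "fov_violations": 0,
    "saturation_ratio": 0.02,
})
```

A release then requires `all(report.values())`, which turns the "nice to have" charts into a binary ship/no-ship decision.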
8) Minimal launch checklist
- Servo mode state machine implemented (not just one loop).
- Damped pseudo-inverse and singularity guardrails enabled.
- Perception quality gating + fallback behavior implemented.
- Velocity/acceleration safety limits verified on hardware.
- FOV-loss and feature-drop stress tests passed.
- Logging includes perception latency + controller timing.
- Bench + real-scene validation both passed.
References
Hutchinson, Hager, Corke (1996), A Tutorial on Visual Servo Control
https://faculty.cc.gatech.edu/~seth/ResPages/pdfs/HutHagCor96.pdf
Chaumette, Hutchinson (2006), Visual servo control, Part I: Basic approaches
https://inria.hal.science/inria-00350283v1/document
Chaumette, Hutchinson (2007), Visual servo control, Part II: Advanced approaches
https://hal.science/inria-00350638
Malis, Chaumette, Boudet (1999), 2 1/2 D Visual Servoing
https://inria.hal.science/inria-00352542/PDF/1999_ieeera_malis.pdf
OpenCV docs, Perspective-n-Point (solvePnP)
https://docs.opencv.org/4.x/d5/d1f/calib3d_solvePnP.html
Wang, Olson (2016), AprilTag 2: Efficient and robust fiducial detection
https://docs.wpilib.org/en/stable/_downloads/cba1039fecb1731ad4e233f7638b9fd0/wang2016iros.pdf