Research Project · Embodied AI

A Physical Agentic Loop for Language-Guided Grasping with Execution-State Monitoring

Australian Institute for Machine Learning (AIML) · The University of Adelaide

Execution-grounded runtime monitoring for robust real-world grasping.

Overview

Execution-grounded decision-making for real-world manipulation

Instead of treating grasp execution as a one-shot black box, this system exposes runtime outcomes as explicit states. A lightweight Watchdog monitors execution, surfaces events like SUCCESS or EMPTY, and enables a bounded policy to finalize, retry, or ask for clarification.

01

Explicit execution-state monitoring

Transforms noisy physical feedback into discrete, decision-ready states for the agent loop.

02

Bounded recovery without retraining

Wraps the learned manipulation primitive instead of changing the underlying grasp model.

03

Robust under ambiguity and distractors

Maintains target consistency across clutter, visually similar targets, and induced empty-grasp scenarios.

Loop Teaser

Observe → Act → Evaluate → Decide

A compact summary of the physical agentic loop and its bounded recovery logic.

Physical agentic loop teaser

Method Overview

Agent-centric architecture

Structured goals, perception conditioning, outcome-aware execution, and a bounded decision policy are organized into a single physical agentic loop.

System architecture diagram

Core Loop

Observe → Act → Evaluate → Decide

01

Observe

Receive the structured task goal and the current RGB-D scene state.

02

Act

Execute the unmodified manipulation primitive on the selected target.

03

Evaluate

Infer discrete outcomes from gripper telemetry and execution traces.

04

Decide

Finalize, retry once, or escalate through clarification when uncertainty persists.
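The four stages above can be sketched as one bounded control loop. This is a minimal illustration, not the project's actual interface: `observe`, `act`, `evaluate`, `decide`, and `MAX_RETRIES` are all hypothetical names.

```python
# Sketch of the Observe → Act → Evaluate → Decide loop.
# All names here are illustrative stand-ins, not the project's real API.

MAX_RETRIES = 1  # bounded recovery: at most one retry before escalation


def agentic_loop(goal, observe, act, evaluate, decide):
    """Run the loop until the decision policy finalizes or escalates."""
    retries = 0
    while True:
        scene = observe(goal)        # structured goal + RGB-D scene state
        trace = act(goal, scene)     # unmodified manipulation primitive
        outcome = evaluate(trace)    # discrete outcome from telemetry
        decision = decide(outcome, retries, MAX_RETRIES)
        if decision == "RETRY":
            retries += 1
            continue
        return decision              # "FINALIZE" or "WAIT_CLARIFY"
```

Because the grasp primitive is called unmodified inside `act`, the loop wraps the learned model rather than changing it, matching the page's "bounded recovery without retraining" framing.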

Watchdog Runtime States

SUCCESS · EMPTY · WEAK · SLIP · STALL · TIMEOUT

Decision → FINALIZE / RETRY / WAIT_CLARIFY
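One plausible mapping from the six runtime states to the three decisions can be sketched as below. Only SUCCESS → FINALIZE and the single-retry budget follow from the page; treating every non-success outcome as retryable is an assumption for illustration.

```python
from enum import Enum, auto


class Outcome(Enum):
    """Watchdog runtime states surfaced during execution."""
    SUCCESS = auto()
    EMPTY = auto()
    WEAK = auto()
    SLIP = auto()
    STALL = auto()
    TIMEOUT = auto()


class Decision(Enum):
    FINALIZE = auto()
    RETRY = auto()
    WAIT_CLARIFY = auto()


def decision_policy(outcome: Outcome, retries_used: int,
                    retry_budget: int = 1) -> Decision:
    """Illustrative bounded policy: one retry, then escalate."""
    if outcome is Outcome.SUCCESS:
        return Decision.FINALIZE
    # Assumption: all non-success states consume the retry budget
    # before escalating to clarification.
    if retries_used < retry_budget:
        return Decision.RETRY
    return Decision.WAIT_CLARIFY
```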
Recovery Example

Outcome-driven recovery timeline

A recoverable empty grasp triggers a single bounded retry before escalation.
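That timeline can be traced in a few lines, assuming the single-retry budget described above; the `recover` helper and its control flow are illustrative, not the system's actual recovery code.

```python
# Illustrative trace of the bounded-recovery timeline: an empty grasp
# triggers exactly one retry; a second failure escalates to clarification.
# State names mirror the page; the control flow is an assumption.


def recover(outcomes, retry_budget=1):
    """Walk a sequence of grasp outcomes and return the decision log."""
    log, retries = [], 0
    for outcome in outcomes:
        if outcome == "SUCCESS":
            log.append((outcome, "FINALIZE"))
            break
        if retries < retry_budget:
            retries += 1
            log.append((outcome, "RETRY"))
        else:
            log.append((outcome, "WAIT_CLARIFY"))
            break
    return log
```

For example, `recover(["EMPTY", "SUCCESS"])` logs a RETRY followed by a FINALIZE, while `recover(["EMPTY", "EMPTY"])` retries once and then escalates to WAIT_CLARIFY.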

Auto retry recovery timeline

Representative Workflows

Real-world behavior traces

From distractor-heavy scenes to color and spatial ambiguity, the system keeps the target grounded while adapting to execution outcomes.

Representative workflow figure

Selecting the target-colored cup from two differently colored cups

Color-conditioned grounding for choosing the intended cup among visually distinct candidates.

Distractor-aware object selection with a nearby non-target object

Maintains the intended target despite a salient distractor placed next to the workspace object.

Spatial ambiguity across similar cups

Grounds the requested target among visually similar cups under spatial ambiguity, using the bounded decision policy.

Toy grasping under distractor presence

Selects the intended toy while ignoring the nearby cup and preserving semantic target consistency.

Citation

BibTeX

@article{wang2026physicalagenticloop,
  title   = {A Physical Agentic Loop for Language-Guided Grasping with Execution-State Monitoring},
  author  = {Wang, Wenze and Hosseinzadeh, Mehdi and Dayoub, Feras},
  journal = {arXiv preprint arXiv:XXXX.XXXXX},
  year    = {2026},
  note    = {Preprint under review; update identifier after announcement}
}