<aside> ❗
This project originally built on my master's dissertation, Planning Fast and Slow: Trajectory Synthesis via Conditional Flow Matchings, but it is now a side project; I'm prioritizing Generative World Models instead.
I've decided to de-prioritize it because my original proposal was ill-posed and, to put it transparently, I didn't conduct an in-depth literature review. The learnings are documented in Retro From My First Three Months.
</aside>
Recent advances in test-time inference (TRM, HRM) have opened new avenues for improving model performance by spending more compute at inference. In parallel, generative models such as diffusion and flow-based models have emerged as effective policy parameterizations in robotics. In this work, we propose that these are complementary components of a unified system. We extend the reasoning process by framing it as an agent interacting with its internal world model. We present Recursive Flow Policy (RFP), a framework that integrates test-time compute into continuous control. To our knowledge, this is the first work to reframe planning as a reasoning process that iteratively refines trajectories.
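To make "iteratively refines trajectories" concrete, here is a minimal sketch of the general idea, not the RFP method itself (this note does not specify it). It assumes a flow-style policy that Euler-integrates a velocity field from noise toward a trajectory, with each integration step acting as one refinement pass. The names `toy_velocity`, `refine`, and `goal_traj` are hypothetical, and the velocity field is a closed-form stand-in that peeks at the goal, which a learned policy would not be able to do.

```python
import numpy as np


def toy_velocity(traj, t, goal_traj):
    # Hypothetical stand-in for a learned velocity field. It uses the
    # straight-line flow matching direction toward a known goal trajectory;
    # a real policy would predict this from observations instead.
    return (goal_traj - traj) / (1.0 - t)


def refine(traj, goal_traj, num_steps):
    # Euler-integrate from t=0 to t=1. Each step is one refinement pass,
    # and we record the distance to the goal after every pass.
    dt = 1.0 / num_steps
    errors = []
    for k in range(num_steps):
        t = k * dt
        traj = traj + dt * toy_velocity(traj, t, goal_traj)
        errors.append(float(np.linalg.norm(traj - goal_traj)))
    return traj, errors


rng = np.random.default_rng(0)
horizon, action_dim = 16, 2

# Assumed toy goal: a straight line from (0, 0) to (1, 1) over the horizon.
goal_traj = np.linspace([0.0, 0.0], [1.0, 1.0], horizon)

# Start from Gaussian noise, as a flow-based policy would at t=0.
noisy = rng.standard_normal((horizon, action_dim))

refined, errors = refine(noisy, goal_traj, num_steps=8)
print(errors)  # distance to the goal shrinks with every refinement pass
```

In this toy setting `num_steps` is the test-time compute knob: it sets how many refinement passes are spent on a single trajectory.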