Physics

Policy-DRIFT: Dynamic Reward-Informed Flow Trajectory Steering

AI Insight

Policy-DRIFT is a hybrid control framework that combines conditional flow matching (CFM), a generative modeling technique, with deep reinforcement learning (DRL) to reduce skin-friction drag in wall-bounded turbulent flows. It does not use a scalar reward signal to train the policy directly. Instead, Terminal Reward Guidance (TRG) steers the generative model toward reward-maximizing flow states at inference time, and the DRL policy then tracks these target states by minimizing the root-mean-squared error (RMSE) to them. In direct numerical simulation of turbulent channel flow at friction Reynolds number Re_τ = 180, the method achieves 49% drag reduction, approximately 16% higher than the DRL benchmark, while using 37 times less actuation energy.
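The tracking objective described above can be sketched as follows. This is a minimal illustration, not the paper's code. It assumes the policy's per-step reward is simply the negative RMSE between the simulated velocity field and the generated target. The names `rmse`, `tracking_reward`, and `drag_reduction` are hypothetical. The drag-reduction helper uses the conventional definition DR = 1 - C_f / C_f,0, which the summary does not spell out.

```python
import numpy as np


def rmse(field: np.ndarray, target: np.ndarray) -> float:
    """Root-mean-squared error between two flow-field snapshots of equal shape."""
    return float(np.sqrt(np.mean((field - target) ** 2)))


def tracking_reward(field: np.ndarray, target: np.ndarray) -> float:
    """Hypothetical per-step DRL reward: the negative RMSE to the generated target.

    Maximizing this reward minimizes the tracking error. It contains no drag term;
    in this sketch, drag information reaches the policy only through how the
    target field was generated.
    """
    return -rmse(field, target)


def drag_reduction(cf_controlled: float, cf_uncontrolled: float) -> float:
    """Conventional drag-reduction ratio DR = 1 - C_f / C_f,0."""
    return 1.0 - cf_controlled / cf_uncontrolled


# Toy usage: random arrays stand in for (u, v, w) velocity snapshots on an 8x8 grid.
rng = np.random.default_rng(0)
target = rng.standard_normal((3, 8, 8))
field = target + 0.1 * rng.standard_normal((3, 8, 8))
print(tracking_reward(field, target))   # roughly -0.1
print(drag_reduction(0.51, 1.0))        # approximately 0.49
```

For example, a controlled skin-friction coefficient equal to 51% of the uncontrolled value corresponds to the 49% drag reduction quoted above.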


Skin-friction drag from turbulent flows accounts for a significant portion of energy expenditure in aviation, shipping, and wind energy, so even incremental improvements in active flow control can translate into substantial real-world energy savings and emissions reductions. The approach also demonstrates a generalizable principle, decoupling reward quality from policy training, that could extend to other complex physical control problems beyond fluid dynamics.


arXiv:2605.14022v1 Announce Type: new
Abstract: Skin-friction drag induced by wall-bounded turbulent flows accounts for a substantial fraction of energy consumption across commercial aerospace, wind energy, and marine transport. Its active reduction is one of the highest-value targets in engineering fluid dynamics. Deep reinforcement learning (DRL) has emerged as the leading approach for real-time flow control. However, its performance ceiling is set not by algorithmic capability but by reward structure: the naive scalar objective does not optimally reflect the underlying physics. Policy-DRIFT bypasses this ceiling by relocating reward information from policy gradients to generative model inference. A conditional flow matching (CFM) model constructs a physically grounded manifold of realisable flow states spanning multiple control regimes. Terminal Reward Guidance (TRG) steers samples toward reward-maximising targets at inference. A lightweight DRL policy, structurally decoupled from reward quality, then tracks these full-field targets via root-mean-squared error (RMSE) minimisation. The test case is turbulent channel flow simulated using direct numerical simulation (DNS) at a friction Reynolds number of $\mathrm{Re}_\tau = 180$, the canonical benchmark for wall-bounded turbulence. Policy-DRIFT achieves $49\%$ drag reduction, approaching the theoretical upper bound and $\approx 16\%$ higher than the DRL benchmark, while consuming 37$\times$ less actuation energy. Our approach combines generative methods with active flow control, marking a paradigm shift towards controlling complex physical systems efficiently.
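The inference-time steering step can be illustrated with a toy problem. The sketch below is not the paper's TRG, and it rests on several assumptions:

- A 1-D, two-mode Gaussian mixture stands in for the learned manifold of flow states.
- The exact marginal velocity of the linear flow-matching path replaces a trained CFM network.
- Guidance takes one generic form: each Euler step is nudged along the gradient of a hypothetical reward R(x) = -(x - 2)^2, evaluated at the posterior-mean estimate of the terminal sample.

All names and parameters (`MEANS`, `STDS`, `velocity`, `reward_grad`, `sample`, `guidance`) are illustrative.

```python
import numpy as np

# Hypothetical 1-D "manifold of flow states": a two-component Gaussian mixture
# standing in for two control regimes. Nothing here comes from the paper.
MEANS = np.array([-2.0, 2.0])
STDS = np.array([0.3, 0.3])
WEIGHTS = np.array([0.5, 0.5])


def velocity(x: np.ndarray, t: float) -> np.ndarray:
    """Exact marginal velocity E[x1 - x0 | x_t = x] for the linear path
    x_t = (1 - t) x0 + t x1, with x0 ~ N(0, 1) and x1 drawn from the mixture.
    It plays the role of a trained CFM velocity network."""
    x = x[:, None]                                   # (n, 1) against (k,) components
    var_t = (1.0 - t) ** 2 + (t * STDS) ** 2         # variance of x_t per component
    # Posterior responsibility of each component given x_t (log-sum-exp for stability).
    log_p = (np.log(WEIGHTS) - 0.5 * np.log(2.0 * np.pi * var_t)
             - 0.5 * (x - t * MEANS) ** 2 / var_t)
    log_p -= log_p.max(axis=1, keepdims=True)
    resp = np.exp(log_p)
    resp /= resp.sum(axis=1, keepdims=True)
    # Conditional velocity E[x1 - x0 | x_t, component k].
    v_k = MEANS + (t * STDS**2 - (1.0 - t)) / var_t * (x - t * MEANS)
    return (resp * v_k).sum(axis=1)


def reward_grad(x: np.ndarray) -> np.ndarray:
    """Gradient of a hypothetical reward R(x) = -(x - 2)^2 (highest at x = 2)."""
    return -2.0 * (x - 2.0)


def sample(n: int, steps: int = 100, guidance: float = 0.0, seed: int = 0) -> np.ndarray:
    """Euler-integrate the flow from noise (t = 0) to data (t = 1), optionally
    nudging each step along the reward gradient at the predicted endpoint."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)                       # x0 ~ N(0, 1)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity(x, t)
        x1_hat = x + (1.0 - t) * v                   # posterior-mean estimate of x1
        # Guidance term: reward gradient at x1_hat; the Jacobian d x1_hat / d x
        # is ignored to keep the sketch short.
        x = x + dt * (v + guidance * reward_grad(x1_hat))
    return x


unguided = sample(2000)
guided = sample(2000, guidance=1.0)
print(f"unguided: mean={unguided.mean():+.2f}, fraction>0={np.mean(unguided > 0):.2f}")
print(f"guided:   mean={guided.mean():+.2f}, fraction>0={np.mean(guided > 0):.2f}")
```

In this toy setup, the unguided run splits roughly evenly between the two modes. The guided run shifts most samples toward the higher-reward mode near x = 2. Guidance enters only at sampling time, so the same velocity field can be reused with a different reward by swapping `reward_grad`. This loosely mirrors the abstract's idea of moving reward information from policy gradients to generative-model inference.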

Source: Policy-DRIFT: Dynamic Reward-Informed Flow Trajectory Steering