Continuum: A Dimension-Agnostic Neural ODE Deep Reinforcement Learning Framework for Physics-Based Environments

University of Toronto

Continuum is a framework that enables effective physics-informed reinforcement learning.
The demo shown above is an agent trained on the HalfCheetah-v5 environment.

Abstract

Deep reinforcement learning (RL) has proven to be a powerful approach in machine learning, in which agents learn to solve tasks in an environment through the application of neural networks. In particular, RL has been applied to physics-based environments, which frequently exhibit non-linear dynamics. However, standard neural networks struggle with physics problems because they have no built-in notion of the underlying mathematics and physics of the system: time dependencies are not represented in the network, and in deep RL the policy learns trajectories based on the probabilities of the best actions rather than the actual dynamics of the system. Neural Ordinary Differential Equations (NODEs) instead represent and learn the dynamics of the system directly, defining the governing differential equation as a neural network whose solution is obtained with an ODE solver, making them a powerful architecture for physics-informed machine learning.

In this work, we propose Continuum, a deep RL framework and neural network architecture for physics-informed reinforcement learning. The architecture combines NODEs, autoencoders, and model-free RL algorithms, where the latent space of the autoencoder is governed by a time-dependent NODE that learns the continuous-time dynamics of the environment. With this architecture, we aim to build a neural network with stronger physics alignment and interpretability, encouraging policies to make predictions from structured latent representations of the learned system dynamics, which promotes stability and performance.
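The latent-dynamics idea above can be written compactly. The notation here is illustrative (the encoder, decoder, and head parameters are our own symbols, not taken from the paper): the encoder maps an observation to an initial latent state, that state is evolved under a learned ODE, and the decoded result feeds the policy and value heads.

```latex
z_0 = \mathrm{Enc}_\phi(o_t), \qquad
\frac{dz(s)}{ds} = f_\theta\big(z(s), s\big), \qquad
z_1 = z_0 + \int_0^1 f_\theta\big(z(s), s\big)\, ds, \qquad
\pi(\cdot \mid o_t),\; V(o_t) = \mathrm{Heads}_\psi\big(\mathrm{Dec}_\omega(z_1)\big)
```

The integral is evaluated numerically, e.g. with a fixed-step RK4 solver, so gradients flow through the solver steps into $f_\theta$.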

Approach



Continuum's proposed model architecture combines Neural ODEs, an autoencoder, and the on-policy PPO algorithm. Observations are first passed through an encoder for feature extraction and dimensionality reduction, enabling faster and more efficient learning. The reduced representation then enters the latent space, where a Neural ODE learns and represents the continuous-time dynamics. The output of the NODE is projected back up in dimensionality by the decoder and fed into the actor and critic heads of the actor-critic network used by PPO, and the full model is optimized with the PPO objective.
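The pipeline described above can be sketched in PyTorch. This is a minimal illustration, not the authors' implementation: the layer sizes, latent dimension, integration horizon, and fixed-step RK4 solver are all assumptions chosen for clarity.

```python
import torch
import torch.nn as nn

class ODEFunc(nn.Module):
    """Learned latent dynamics dz/dt = f_theta(z, t) (illustrative sizes)."""
    def __init__(self, latent_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 1, 64), nn.Tanh(),
            nn.Linear(64, latent_dim),
        )

    def forward(self, t, z):
        # Concatenate the scalar time onto each latent state in the batch.
        t_col = t.expand(z.shape[0], 1)
        return self.net(torch.cat([z, t_col], dim=-1))

def rk4_step(f, z, t, dt):
    """One fixed-step fourth-order Runge-Kutta (RK4) integration step."""
    k1 = f(t, z)
    k2 = f(t + dt / 2, z + dt * k1 / 2)
    k3 = f(t + dt / 2, z + dt * k2 / 2)
    k4 = f(t + dt, z + dt * k3)
    return z + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

class ContinuumActorCritic(nn.Module):
    """Encoder -> Neural ODE latent -> decoder -> actor/critic heads."""
    def __init__(self, obs_dim, act_dim, latent_dim=16, dt=0.05, steps=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, latent_dim))
        self.odefunc = ODEFunc(latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.Tanh())
        self.actor = nn.Linear(64, act_dim)   # action mean for a Gaussian policy
        self.critic = nn.Linear(64, 1)        # state-value estimate
        self.dt, self.steps = dt, steps

    def forward(self, obs):
        z = self.encoder(obs)                 # reduce dimensionality
        t = torch.zeros(1)
        for _ in range(self.steps):           # evolve the latent state with RK4
            z = rk4_step(self.odefunc, z, t, self.dt)
            t = t + self.dt
        h = self.decoder(z)                   # expand back for the heads
        return self.actor(h), self.critic(h)

obs = torch.randn(8, 17)                      # e.g. HalfCheetah-v5 observations
model = ContinuumActorCritic(obs_dim=17, act_dim=6)
action_mean, value = model(obs)
print(action_mean.shape, value.shape)         # torch.Size([8, 6]) torch.Size([8, 1])
```

The actor and critic outputs would then be plugged into a standard PPO update; because RK4 is differentiable, the PPO loss trains the encoder, the ODE function, and the decoder end to end.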

Results

We train models with the proposed architecture on a variety of Gymnasium and MuJoCo environments. Successful results are shown below, where agents learn solutions to these environments; the resulting learning curves are either comparable to standard PPO or 2 to 3 times better in performance.

Conclusion

Continuum advances deep reinforcement learning in stochastic physics-based environments by integrating model-free algorithms such as PPO with Neural ODEs and autoencoders to learn continuous-time feature dynamics, creating a framework that excels at physics-informed reinforcement learning with effective and stable training. Across Gymnasium and MuJoCo benchmarks, we find that the model architecture proposed by Continuum (using PPO and the RK4 solver) performs comparably to or outperforms standard PPO and standard deep learning architectures in performance and stability in many environments. In environments that especially involve continuous dynamics in mechanics, such as HalfCheetah-v5 or Humanoid-v5, Continuum outperforms standard PPO by approximately 300% and 600%, respectively. The architecture introduced in this project has been shown to enable better learning performance and stability, and it opens opportunities for further research in related fields such as model-free reinforcement learning and imitation learning. Future efforts will focus on applying this architecture to these related domains, as well as investigating model interpretability in the current architecture to gain further insight into the latent representations of physics and dynamics in the network. Such insights may inspire the design of more effective neural network architectures in future work.