Solving a class of stochastic optimal control problems by physics-informed neural networks
This research was partially supported by the National Natural Science Foundation of China (12272297).
Abstract
The aim of this work is to develop a deep learning method for solving high-dimensional stochastic control problems based on the Hamilton–Jacobi–Bellman (HJB) equation and physics-informed learning. Our approach is to parameterize the feedback control and the value function using a decoupled neural network with multiple outputs. We train this network by using a loss function with penalty terms that enforce the HJB equation along the sampled trajectories generated by the controlled system. Numerical results on several applications demonstrate that the proposed approach is efficient and applicable.
keywords:
Stochastic optimal control, High dimension, Hamilton–Jacobi–Bellman equation, Physics-informed learning
[inst1] School of Mathematics and Statistics, Northwestern Polytechnical University, Xi’an 710129, China
[inst2] MOE Key Laboratory for Complexity Science in Aerospace, Northwestern Polytechnical University, Xi’an 710129, China
[inst3] State Key Laboratory of Fluid Power and Mechatronic Systems, Department of Mechanics, Zhejiang University, Hangzhou 310027, China
1 Introduction
Stochastic optimal control (SOC) problems arise in a variety of scientific fields such as finance [1], molecular dynamics [2], neuroscience [3] and robotics [4]. There are two prominent frameworks for addressing SOC problems: Pontryagin’s maximum principle (MP) [5] and Bellman’s dynamic programming (DP) [6]. Drawing on these frameworks, many numerical methods have been developed for tackling SOC problems (cf. [7, 8] and references therein).
However, these traditional numerical methods are not applicable when the state dimension is large [9]. In recent years, significant progress has been made in leveraging deep learning (DL) to solve high-dimensional SOC problems [10, 11, 12, 13, 14]. Broadly speaking, deep neural network-based methods for SOC fall into two categories. The first category uses DL to solve the extended Hamiltonian system derived from the stochastic MP (cf. [15, 16, 17]). In the second category, [18, 19] reformulate the SOC problem as a Markov decision process based on DP, which is then solved by DL-based algorithms. Another direction in this category is to solve the SOC problem from the DP viewpoint via the HJB equation [20, 21, 22]. In these papers, the Feynman–Kac formula provides a probabilistic representation of the solution to the HJB equation, which allows neural networks to be used to obtain the optimal policy.
Motivated by previous research, we aim to solve the SOC problem with physics-informed learning [23, 24]. The main issue in our approach is to construct a physics-informed neural network (PINN) for solving the HJB equation, which is a semilinear parabolic partial differential equation (PDE) with a terminal value condition. Since the HJB equation is defined on the whole space, without a boundary condition, a PINN cannot be directly used to compute the value function by solving the HJB equation. Thanks to the stochastic verification theorem (see Theorem 1 in Section 2.2), we can approximate the value function by a neural network along the trajectories of the controlled system rather than on the whole space. This is the key idea of our approach.
Our main contribution is twofold: (i) In contrast to [13], we use the controlled SDE to sample relevant states during PINN training; (ii) We propose a simulation-free algorithm for SOC by physics-informed learning, which means it does not require numerical solutions of the control problem.
The remainder of this paper is organized as follows. In Section 2, we briefly introduce the preliminaries of the SOC problem and the verification theorem on which our DL-based solver is built. This solver, called DeepHJB, is proposed in Section 3. Numerical examples in Section 4 illustrate the application of the proposed solver to some SOC problems. Section 5 provides some conclusions.
2 Stochastic optimal control
2.1 Problem setup
Let and be a -dimensional standard -Brownian motions on a filtered probability space where is the natural filtration generated by . The quadruple also satisfies the usual hypotheses (see Chapter 1.4 in [25]). stands for expectation with respect to the probability measure .
We consider the controlled stochastic differential equation (SDE) as follows
| (1) |
with and the initial data . Here, is the state process, is a control process valued in a given subset of . The cost functional is given by
| (2) |
with the functions and . The goal of our SOC problem is to look for an admissible control (if exists) that minimizes (2) over which is the set of all admissible controls defined by
in which consists of all -adapted functions satisfying .
In this paper, we focus on the SOC problem under the following conditions.
1. The drift term and the diffusion term in (1) have the following linear forms in control
and
with , , and .
2. The random term in which and are mutually independent Brownian motions.
- 3.
4. The terminal cost is linear
or quadratic
with the coefficients and .
Under suitable assumptions (see Chapter 1 in [26]), for any equation (1) has a unique solution and the cost function (2) is well-defined. We call an admissible pair. Any is called an optimal control if it satisfies
The corresponding state process is called an optimal trajectory and the state-control pair called an optimal pair.
2.2 Verification theorem
We define the value function as
The following theorem describes the evolution of the value function along the optimal trajectory and at the final time. It is deduced from the stochastic verification theorem (see Theorem 5.1 in Chapter 5.5 of [26]). The detailed proof is given in Appendix A.
Theorem 1.
An admissible pair , where the feedback control is given by
| (3) |
with
is optimal if and only if the following HJB equation holds
| (4) | ||||
for any , and .
Here, means the first-order derivative of with respect to , and respectively denote the gradient and the Hessian of with respect to , and is the abbreviation of the trace operator.
3 Deep learning approach
In this section we propose our approach to seek an optimal pair that minimizes the cost functional (2) subject to (1) for initial data sampled from a probability distribution in with a density denoted by .
We select a partition of the time interval :
and denote by the th interval of the grid and the -th increment of the Brownian motion. Once the control is computed, the Euler–Maruyama scheme (cf. [27]) of (1) gives
| (5) |
with the initial data . Using the numerical scheme (5), the path can be easily generated. If the value function is known and the discretization of the admissible pair satisfies
| (6) |
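As an illustration of how paths can be generated with an Euler–Maruyama scheme under a feedback control, the following is a minimal NumPy sketch. The function name `euler_maruyama`, the callables `drift`, `diffusion` and `control`, and the OU-type test problem in the usage lines are assumptions made for this example and are not taken from the solver's code.

```python
import numpy as np

def euler_maruyama(drift, diffusion, control, x0, T, n_steps, rng):
    """Simulate dX = drift(t, X, u) dt + diffusion(t, X, u) dW with u = control(t, X).

    x0 has shape (n_paths, dim); diffusion returns a (dim, dim) matrix.
    Returns the time grid and an array of shape (n_steps + 1, n_paths, dim).
    """
    dt = T / n_steps
    times = np.linspace(0.0, T, n_steps + 1)
    path = np.empty((n_steps + 1,) + x0.shape)
    path[0] = x0
    for n in range(n_steps):
        t, x = times[n], path[n]
        u = control(t, x)                                  # feedback control at the current state
        dw = rng.normal(scale=np.sqrt(dt), size=x.shape)   # Brownian increments ~ N(0, dt)
        path[n + 1] = x + drift(t, x, u) * dt + dw @ diffusion(t, x, u).T
    return times, path

# Hypothetical usage: an uncontrolled OU-type system in dimension 3.
rng = np.random.default_rng(0)
dim, n_paths = 3, 1000
times, path = euler_maruyama(
    drift=lambda t, x, u: -x + u,
    diffusion=lambda t, x, u: 0.5 * np.eye(dim),
    control=lambda t, x: np.zeros_like(x),
    x0=np.ones((n_paths, dim)),
    T=1.0, n_steps=100, rng=rng,
)
```

In a learning setting, `control` would be replaced by the network approximation of the feedback control, so that the sampled states concentrate where the controlled system actually travels.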
Our approach parameterizes the functions and by a decoupled neural network (Figure 1), which are given by
We denote by the weights of the neural network, which is trained by minimizing the sum of the expected losses that arise from the following penalty terms.
- 1.
- 2.
Now, we can define the physics-informed learning problem
The coefficients , and are supposed to be fixed.
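To make the idea of penalizing the HJB equation along sampled trajectories concrete, the sketch below evaluates such a loss on a hypothetical scalar toy problem, dX = u dt + σ dW with cost E[∫ ½u² dt + ½κX_T²], whose value function is known in closed form. The toy problem, the finite-difference derivatives (an automatic-differentiation framework would normally be used instead), and the names `value`, `hjb_residual`, `trajectory_loss`, `w_pde` and `w_terminal` are all assumptions of this example; it is not the loss of the DeepHJB solver itself.

```python
import numpy as np

T, kappa, sigma = 1.0, 1.0, 0.5  # hypothetical problem data

def value(t, x):
    """Closed-form value function of the toy problem (stands in for a network output)."""
    s = 1.0 + kappa * (T - t)
    return 0.5 * kappa / s * x**2 + 0.5 * sigma**2 * np.log(s)

def hjb_residual(V, t, x, eps=1e-3):
    """Residual V_t - 0.5*V_x**2 + 0.5*sigma**2*V_xx, using central differences."""
    V_t = (V(t + eps, x) - V(t - eps, x)) / (2 * eps)
    V_x = (V(t, x + eps) - V(t, x - eps)) / (2 * eps)
    V_xx = (V(t, x + eps) - 2 * V(t, x) + V(t, x - eps)) / eps**2
    return V_t - 0.5 * V_x**2 + 0.5 * sigma**2 * V_xx

def trajectory_loss(V, x0, n_steps, rng, w_pde=1.0, w_terminal=1.0):
    """Weighted sum of the HJB penalty along controlled paths and a terminal penalty."""
    dt = T / n_steps
    x, pde = x0.copy(), 0.0
    for n in range(n_steps):
        t = n * dt
        pde += np.mean(hjb_residual(V, t, x) ** 2) * dt
        u = -(V(t, x + 1e-3) - V(t, x - 1e-3)) / 2e-3   # feedback u = -V_x for this toy problem
        x = x + u * dt + sigma * rng.normal(scale=np.sqrt(dt), size=x.shape)
    terminal = np.mean((V(T, x) - 0.5 * kappa * x**2) ** 2)
    return w_pde * pde + w_terminal * terminal

x0 = np.ones(2000)
loss_true = trajectory_loss(value, x0, 50, np.random.default_rng(0))
loss_bad = trajectory_loss(lambda t, x: 0.5 * x**2, x0, 50, np.random.default_rng(0))
```

The exact value function gives a loss close to zero, while the time-independent candidate `0.5 * x**2` matches the terminal condition but violates the HJB equation and therefore incurs a clearly larger loss.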
Finally, we apply an SGD-type algorithm to optimize the parameter . The pseudo-code for implementing the above approach is given in Algorithm 1.
Input: the initial data , parameter of partition, parameters of networks, learning rate , max-step
For to do
For to do
For to do
while do
end while
end for
end for
random set
end for
4 Numerical experiments
In this section, we apply the DeepHJB solver to some SOC problems. In the following subsections, we discuss the controlled Ornstein–Uhlenbeck (OU) dynamics and the controlled metastable dynamics, respectively. To evaluate the proposed solver, we introduce the following error as the performance metric
where is the baseline optimal control. The detailed configurations of these experiments are given in Appendix B.
4.1 Ornstein–Uhlenbeck dynamics
We investigate the controlled system with
where are sampled once at the beginning of the experiments.
For the SOC problem with linear terminal cost, we choose
In this situation, the optimal control can be given analytically by
which has been calculated in [21]. We set the initial value to be zero and the terminal time . In Figure 2, subfigure (a) compares the optimal control calculated by the DeepHJB solver with the baseline , while subfigure (b) shows the evolution of the error against the iteration step. The optimal control approximated by the proposed solver agrees well with the analytical one.
Regarding the case with a quadratic terminal cost, we choose
This type of problems has an analytic optimal control
in which fulfills the Riccati equation
with (see [26, Chapter 6]). We choose the initial value from a pre-specified distribution ? and the terminal time . Figure 3 displays the direct comparison and error between the approximation and the baseline of the solution to this SOC problem, which illustrates the accuracy of our DeepHJB solver.
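A Riccati-based baseline of this kind is typically obtained by integrating the Riccati equation backward from the terminal time. As a minimal sketch, assume a hypothetical scalar problem dX = (aX + u) dt + σ dW with cost E[∫ ½u² dt + ½κX_T²]; then the optimal feedback is u*(t, x) = −F(t)x, where F′ = F² − 2aF and F(T) = κ. This scalar model and the function name `solve_riccati_backward` are assumptions for illustration and do not reproduce the coefficients of the experiment.

```python
import numpy as np

def solve_riccati_backward(a, kappa, T, n_steps):
    """Integrate F'(t) = F(t)**2 - 2*a*F(t), F(T) = kappa, backward in time with RK4."""
    rhs = lambda F: F**2 - 2.0 * a * F
    h = -T / n_steps                      # negative step: march from t = T down to t = 0
    times = np.linspace(T, 0.0, n_steps + 1)
    F = np.empty(n_steps + 1)
    F[0] = kappa                          # terminal condition
    for n in range(n_steps):
        k1 = rhs(F[n])
        k2 = rhs(F[n] + 0.5 * h * k1)
        k3 = rhs(F[n] + 0.5 * h * k2)
        k4 = rhs(F[n] + h * k3)
        F[n + 1] = F[n] + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return times[::-1], F[::-1]           # reorder so that time is increasing

# Hypothetical usage: for a = 0 the exact solution is F(t) = kappa / (1 + kappa * (T - t)).
times, F = solve_riccati_backward(a=0.0, kappa=2.0, T=1.0, n_steps=200)
u_star = lambda t, x: -np.interp(t, times, F) * x   # baseline feedback control
```

In the matrix-valued case the same backward march applies, with the scalar right-hand side replaced by the corresponding matrix expression.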
4.2 Metastable dynamics
We consider the double well
and the controlled system with
The initial states in this experiment are , and the terminal state is set as . As for the cost functional, we choose
and the terminal time .
Firstly, we study the one-dimensional setting, choosing , . Figure 4 displays the approximation of the optimal control computed by the DeepHJB solver and the baseline obtained by a finite difference method. The absolute error between them can be seen in Figure 5. It is clear that the approximation is in close agreement with the baseline. Figure 6 demonstrates the growth of the potential function from an original potential to the optimal potential.
Let us next consider the high-dimensional case, that is, . In particular, we set , for and , for . Figure 7 shows two components of the five-dimensional approximated optimal control as well as the baseline , which indicates a good match and illustrates the efficacy of our DeepHJB solver for solving a high-dimensional nonlinear SOC problem.
5 Conclusion
In this paper, we proposed the DeepHJB solver for finite time horizon SOC problems for a class of dynamical systems. Although the numerical experiments in this work demonstrate the efficacy of the solver, there is still plenty of room for development. On the theoretical side, our future research will be devoted to connecting Lyapunov analysis with the DeepHJB solver and to carrying out an error analysis for the solver. Moreover, we will also apply the present solver to more control problems of high-dimensional nonlinear systems in practical applications.
Appendix A Proof of Theorem 1
By Theorem 5.1 in Chapter 5.5 of [26], an admissible pair is optimal if and only if it satisfies the following HJB equation
| (7) | ||||
and
Due to the specific expression of , and , we have
with
Since we have , that is,
then we have
| (8) |
Plugging the expression of the optimal control (8) into equation (7), we obtain the desired equation (4).
Appendix B Experiment configuration
We introduce the following fully connected feedforward neural network
where we refer to as the activation function, to as the number of layers, and to , , and as the number of neurons in the input, output, and -th hidden layer, respectively. We denote by , , the architecture of the neural network.
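For readers who prefer code, a fully connected feedforward network of this form can be sketched in a few lines of NumPy. The layer sizes, the tanh activation, the initialization and the names `init_mlp` and `mlp_forward` are hypothetical choices for this example and are not the configurations used in the experiments.

```python
import numpy as np

def init_mlp(layer_sizes, rng):
    """Create (weight, bias) pairs for consecutive layers, e.g. [4, 32, 32, 4]."""
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(scale=np.sqrt(1.0 / n_in), size=(n_in, n_out))
        params.append((W, np.zeros(n_out)))
    return params

def mlp_forward(params, x, activation=np.tanh):
    """Apply affine map + activation on hidden layers and a plain affine map on the output."""
    for W, b in params[:-1]:
        x = activation(x @ W + b)
    W, b = params[-1]
    return x @ W + b

# Hypothetical usage: input (t, x) with a 3-dimensional state, two hidden layers of width 32.
rng = np.random.default_rng(0)
params = init_mlp([4, 32, 32, 4], rng)
batch = rng.normal(size=(8, 4))
out = mlp_forward(params, batch)
```

The tuple of layer sizes passed to `init_mlp` plays the role of the architecture description: input width, hidden widths, and output width.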
The numerical examples use the following architectures.
1. Controlled OU dynamics in Section 4.1.
(a) For the linear terminal cost, the architecture is given by
while for the quadratic terminal cost it is given by
2. Controlled metastable dynamics in Section 4.2.
(a) For the one-dimensional case, the architecture is given by
while for the ten-dimensional case it is given by
The experiments are run on a single NVIDIA GeForce RTX 2080Ti GPU with 11GB memory. The code will be made publicly available at https://github.com/zhezhejiao/DeepHJB upon acceptance.
References
- [1] H. Pham, Continuous-time stochastic control and optimization with financial applications, Vol. 61, Springer Science & Business Media, 2009.
- [2] Y. Gao, T. Li, X. Li, J.-G. Liu, Transition path theory for Langevin dynamics on manifolds: Optimal control and data-driven solver, Multiscale Modeling & Simulation 21 (1) (2023) 1–33.
- [3] E. Todorov, Optimality principles in sensorimotor control, Nature Neuroscience 7 (2004) 907–915.
- [4] R. Tedrake, Robotic Manipulation: Perception, Planning, and Control, Draft textbook, 2023.
- [5] L. S. Pontryagin, Mathematical Theory of Optimal Processes, CRC Press, 1987.
- [6] R. Bellman, Dynamic programming and stochastic control processes, Information and Control 1 (3) (1958) 228–239.
- [7] H. J. Kushner, Numerical methods for stochastic control problems in continuous time, SIAM Journal on Control and Optimization 28 (5) (1990) 999–1048.
- [8] Z. Jin, M. Qiu, K. Q. Tran, G. Yin, A survey of numerical solutions for stochastic control problems: Some recent progress, Numerical Algebra, Control and Optimization 12 (2) (2022) 213–253.
- [9] I. Exarchos, E. A. Theodorou, Stochastic optimal control via forward and backward stochastic differential equations and importance sampling, Automatica 87 (2018) 159–165.
- [10] A. Gorodetsky, S. Karaman, Y. Marzouk, Efficient high-dimensional stochastic optimal motion control using tensor-train decomposition, in: Robotics: Science and Systems, 2015.
- [11] J. Han, W. E, Deep learning approximation for stochastic control problems, in: Deep Reinforcement Learning Workshop, 2016.
- [12] Z. Wang, M. Pereira, T. Chen, E. Theodorou, E. Reed, Deep 2FBSDEs for systems with control multiplicative noise, arXiv:1906.04762.
- [13] X. Li, D. Verma, L. Ruthotto, A neural network approach for stochastic optimal control, SIAM Journal on Scientific Computing 46 (5) (2024) C535–C556.
- [14] W. Cai, S. Fang, T. Zhou, SOC-MartNet: A martingale neural network for the Hamilton-Jacobi-Bellman equation without explicit inf H in stochastic optimal controls, arXiv preprint arXiv:2405.03169.
- [15] J.-P. Fouque, Z. Zhang, Deep learning methods for mean field control problems with delay, Frontiers in Applied Mathematics and Statistics 6 (2020) 11.
- [16] R. Carmona, M. Laurière, Convergence analysis of machine learning algorithms for the numerical solution of mean field control and games II: the finite horizon case, Annals of Applied Probability 32 (6) (2022) 4065–4105.
- [17] S. Jin, S. Peng, Y. Peng, X. Zhang, Solving stochastic optimal control problem via stochastic maximum principle with deep learning method, Journal of Scientific Computing 93 (2022) 30.
- [18] C. Huré, H. Pham, A. Bachouch, N. Langrené, Deep neural networks algorithms for stochastic control problems on finite horizon: Convergence analysis, SIAM Journal on Numerical Analysis 59 (1) (2021) 525–557.
- [19] A. Bachouch, C. Huré, N. Langrené, H. Pham, Deep neural networks algorithms for stochastic control problems on finite horizon: Numerical applications, Methodology and Computing in Applied Probability 24 (1) (2022) 143–178.
- [20] M. Pereira, Z. Wang, T. Chen, E. Reed, E. Theodorou, Feynman-Kac neural network architectures for stochastic control using second-order FBSDE theory, in: Proceedings of the 2nd Conference on Learning for Dynamics and Control, 2020.
- [21] N. Nüsken, L. Richter, Solving high-dimensional Hamilton–Jacobi–Bellman PDEs using neural networks: perspectives from the theory of controlled diffusions and measures on path space, Partial Differential Equations and Applications 2 (4) (2021) 48.
- [22] M. Hua, M. Laurière, E. Vanden-Eijnden, A simulation-free deep learning approach to stochastic optimal control, arXiv preprint arXiv:2410.05163.
- [23] M. Raissi, P. Perdikaris, G. E. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Journal of Computational Physics 378 (2019) 686–707.
- [24] G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, L. Yang, Physics-informed machine learning, Nature Reviews Physics 3 (6) (2021) 422–440.
- [25] K. Chung, R. Williams, Introduction to Stochastic Integration, Springer, 2013.
- [26] J. Yong, X. Y. Zhou, Stochastic Controls: Hamiltonian Systems and HJB Equations, Springer, 1999.
- [27] P. E. Kloeden, E. Platen, Numerical Solution of Stochastic Differential Equations, Springer, 1999.