Virginia Tech

ECE 5474 - Mathematical Foundation for Reinforcement Learning (3C)

Course Description

Reinforcement learning for solving Markov decision processes (MDPs). Discounted and average-reward MDPs. Dynamic programming, including policy evaluation and policy iteration. Stochastic approximation and optimization. Temporal-difference learning, Q-learning, policy gradient methods, and actor-critic methods. Function approximation, neural networks, stochastic gradient descent, and backpropagation.
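To make the dynamic-programming topics concrete, here is a minimal pure-Python sketch of policy iteration on a discounted MDP: iterative policy evaluation for a fixed policy, followed by a greedy improvement step, repeated until the policy stops changing. The two-state MDP, the discount factor 0.9, and the names `P`, `q_value`, `policy_evaluation`, and `policy_iteration` are hypothetical, chosen only for illustration and not taken from the course materials.

```python
from typing import Dict, List, Tuple

# P[s][a] is a list of (probability, next_state, reward) triples.
Transitions = Dict[int, Dict[int, List[Tuple[float, int, float]]]]

# Hypothetical 2-state, 2-action MDP used only for illustration.
P: Transitions = {
    0: {0: [(1.0, 0, 0.0)],                   # stay in state 0, no reward
        1: [(0.8, 1, 0.0), (0.2, 0, 0.0)]},   # try to move to state 1
    1: {0: [(1.0, 1, 1.0)],                   # stay in state 1, reward 1
        1: [(1.0, 0, 0.0)]},                  # move back to state 0
}


def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s, then using V as the continuation value."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def policy_evaluation(P, policy, gamma, tol=1e-10):
    """Iteratively solve V = r_pi + gamma * P_pi V for a fixed deterministic policy."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v_new = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new  # in-place (Gauss-Seidel) update
        if delta < tol:
            return V


def policy_iteration(P, gamma=0.9):
    """Alternate policy evaluation and greedy policy improvement until the policy is stable."""
    policy = {s: 0 for s in P}  # start from an arbitrary policy
    while True:
        V = policy_evaluation(P, policy, gamma)
        stable = True
        for s in P:
            # Greedy improvement with respect to the current value function;
            # switch only on a strict improvement so ties cannot cause cycling.
            best = max(P[s], key=lambda a: q_value(P, V, s, a, gamma))
            if q_value(P, V, s, best, gamma) > q_value(P, V, s, policy[s], gamma) + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


policy, V = policy_iteration(P, gamma=0.9)
print(policy)  # {0: 1, 1: 0}
```

In this toy example the optimal values can be checked by hand: staying in state 1 forever gives V(1) = 1 / (1 - 0.9) = 10, and V(0) solves V(0) = 0.9 (0.8 · 10 + 0.2 V(0)), so V(0) = 7.2 / 0.82 ≈ 8.78.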

Why take this course?

Given rapid advances in technology and the huge amount of data being collected, reinforcement learning has become increasingly important for data-driven decision-making problems. In recent years, reinforcement learning has been applied successfully to problems in many areas, including robotics, self-driving cars, health care, power systems, and finance. This course provides a foundation in the mathematical principles, concepts, and tools that graduate students need to learn reinforcement learning and apply it to challenging real-world problems.

Learning Objectives

  • Formulate practical problems (e.g., stochastic control, finance, resource allocation) as Markov decision processes.
  • Apply reinforcement learning algorithms to solve Markov decision processes.
  • Assess reinforcement learning algorithms using software tools such as Python and OpenAI Gym to simulate complex Markov decision processes.
  • Analyze convergence properties of reinforcement learning algorithms, including Q-learning, policy gradient methods, and temporal-difference learning.
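The objectives above (apply an algorithm, simulate an MDP, check convergence) can be illustrated together with a small sketch: tabular Q-learning run on a simulated MDP and compared against the optimal Q-values computed by value iteration. Everything here is an assumption for illustration: the two-state MDP, the hand-written simulator `step` (used instead of OpenAI Gym to keep the block dependency-free), the uniformly random behavior policy, and the step size alpha_n = n^(-0.7), which satisfies the standard stochastic-approximation conditions (sum of alpha_n diverges, sum of alpha_n^2 converges).

```python
import random
from typing import Dict, List, Tuple

# Hypothetical 2-state, 2-action MDP: P[s][a] is a list of (probability, next_state, reward).
P: Dict[int, Dict[int, List[Tuple[float, int, float]]]] = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 0.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 1, 1.0)], 1: [(1.0, 0, 0.0)]},
}
GAMMA = 0.9


def step(rng, s, a):
    """Hypothetical simulator: sample (next_state, reward) from P[s][a]."""
    u, acc = rng.random(), 0.0
    for p, s2, r in P[s][a]:
        acc += p
        if u < acc:
            return s2, r
    return P[s][a][-1][1], P[s][a][-1][2]  # guard against floating-point rounding in acc


def q_learning(num_steps=200_000, seed=0):
    """Tabular Q-learning from one long trajectory under a uniformly random behavior policy."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in P for a in P[s]}
    visits = {sa: 0 for sa in Q}
    s = 0
    for _ in range(num_steps):
        a = rng.choice(list(P[s]))       # off-policy: explore actions uniformly
        s2, r = step(rng, s, a)
        visits[(s, a)] += 1
        alpha = visits[(s, a)] ** -0.7   # per-pair polynomial step size (assumed schedule)
        target = r + GAMMA * max(Q[(s2, b)] for b in P[s2])
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2
    return Q


def value_iteration(tol=1e-10):
    """Model-based reference: iterate the Bellman optimality operator on Q until it stops changing."""
    Q = {(s, a): 0.0 for s in P for a in P[s]}
    while True:
        Q_new = {
            (s, a): sum(p * (r + GAMMA * max(Q[(s2, b)] for b in P[s2]))
                        for p, s2, r in P[s][a])
            for (s, a) in Q
        }
        if max(abs(Q_new[k] - Q[k]) for k in Q) < tol:
            return Q_new
        Q = Q_new


Q_hat, Q_star = q_learning(), value_iteration()
for k in sorted(Q_star):
    print(k, round(Q_hat[k], 3), round(Q_star[k], 3))
```

The polynomial step size is a deliberate choice: with alpha_n = 1/n, Q-learning is known to converge very slowly when the discount factor is close to 1, whereas n^(-0.7) still meets the convergence conditions while shrinking the initial error much faster in this example.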