Program : Dual Degree (Integrated Bachelors and Masters)
Advisor : Dr.Ron Parr, Dr. Balaraman Ravindan
Areas of Interest : Machine Learning, Reinforcement Learning
Research Topic : Learning Lipschitz Continuous Value Functions via Induced MDPs
Non-parametric approximate linear programming (NP-ALP) is a sample-based approach to value function approximation for continuous state and action MDPs, which requires only a distance function and an estimate of the Lipschitz constant of the value function. Using these inputs and a set of training data, NP-ALP finds a value function that is consistent with the specified Lipschitz constant. Previous theoretical work provided appealing sample complexity and approximation error bounds for this technique, but the method required solving a linear program in which the number of constraints scaled quadratically with the number of training samples. We decouple the underlying optimization task from the linear programming solution. In doing so, we open the possibility of new and more efficient approaches to achieve the same objective. We present dynamic programming approaches - a variation on the value iteration and policy iteration algorithms - whose fixed point is the same as the NP-ALP solution. These contributions greatly lower both conceptual and resource barriers to achieving optimal Lipschitz continuous value functions. We show dramatic speedups relative to NP-ALP on synthetic problems over an out-of-the box linear program solver and even a specialized constraint generation approach.
Courses Completed: Machine Learning, Data Mining, Social Network Analysis, Reinforcement Learning, Natural Language Processing, Kernel Methods for Pattern Analysis, Foundations of Data Science.