Convergence results for some temporal difference methods based on least squares H Yu, DP Bertsekas IEEE Transactions on Automatic Control 54 (7), 1515-1531, 2009 | 138 | 2009 |
Projected equation methods for approximate solution of large linear systems DP Bertsekas, H Yu Journal of Computational and Applied Mathematics 227 (1), 27-50, 2009 | 77 | 2009 |
Error bounds for approximations from projected linear equations H Yu, DP Bertsekas Mathematics of Operations Research 35 (2), 306-329, 2010 | 75 | 2010 |
Discretized approximations for POMDP with average cost H Yu, D Bertsekas arXiv preprint arXiv:1207.4154, 2012 | 61 | 2012 |
Q-learning and enhanced policy iteration in discounted dynamic programming DP Bertsekas, H Yu Mathematics of Operations Research 37 (1), 66-94, 2012 | 60 | 2012 |
On convergence of emphatic temporal-difference learning H Yu Conference on Learning Theory, 1724-1751, 2015 | 56 | 2015 |
A unifying polyhedral approximation framework for convex optimization DP Bertsekas, H Yu SIAM Journal on Optimization 21 (1), 333-360, 2011 | 54 | 2011 |
Multi-step off-policy learning without importance sampling ratios AR Mahmood, H Yu, RS Sutton arXiv preprint arXiv:1702.03006, 2017 | 53 | 2017 |
Q-learning and policy iteration algorithms for stochastic shortest path problems H Yu, DP Bertsekas Annals of Operations Research 208 (1), 95-132, 2013 | 53 | 2013 |
On near optimality of the set of finite-state controllers for average cost POMDP H Yu, DP Bertsekas Mathematics of Operations Research 33 (1), 1-11, 2008 | 51 | 2008 |
Basis function adaptation methods for cost approximation in MDP H Yu, DP Bertsekas 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement …, 2009 | 49 | 2009 |
Q-learning algorithms for optimal stopping based on least squares H Yu, DP Bertsekas 2007 European Control Conference (ECC), 2368-2375, 2007 | 45 | 2007 |
On generalized Bellman equations and temporal-difference learning H Yu, AR Mahmood, RS Sutton Journal of Machine Learning Research 19 (48), 1-49, 2018 | 42 | 2018 |
Stochastic shortest path problems under weak conditions DP Bertsekas, H Yu Lab. for Information and Decision Systems Report LIDS-P-2909, MIT, 2013 | 42 | 2013 |
Approximate solution methods for partially observable Markov and semi-Markov decision processes H Yu Massachusetts Institute of Technology, 2006 | 42 | 2006 |
Least squares temporal difference methods: An analysis under general conditions H Yu SIAM Journal on Control and Optimization 50 (6), 3310-3343, 2012 | 41 | 2012 |
Emphatic temporal-difference learning AR Mahmood, H Yu, M White, RS Sutton arXiv preprint arXiv:1507.01569, 2015 | 38 | 2015 |
Convergence of Least Squares Temporal Difference Methods Under General Conditions. H Yu ICML, 1207-1214, 2010 | 38 | 2010 |
On boundedness of Q-learning iterates for stochastic shortest path problems H Yu, DP Bertsekas Mathematics of Operations Research 38 (2), 209-227, 2013 | 34 | 2013 |
On convergence of some gradient-based temporal-differences algorithms for off-policy learning H Yu arXiv preprint arXiv:1712.09652, 2017 | 31 | 2017 |