Huizhen Yu
Title
Cited by
Year
Convergence results for some temporal difference methods based on least squares
H Yu, DP Bertsekas
IEEE Transactions on Automatic Control 54 (7), 1515-1531, 2009
Cited by: 138, Year: 2009
Projected equation methods for approximate solution of large linear systems
DP Bertsekas, H Yu
Journal of Computational and Applied Mathematics 227 (1), 27-50, 2009
Cited by: 77, Year: 2009
Error bounds for approximations from projected linear equations
H Yu, DP Bertsekas
Mathematics of Operations Research 35 (2), 306-329, 2010
Cited by: 75, Year: 2010
Discretized approximations for POMDP with average cost
H Yu, D Bertsekas
arXiv preprint arXiv:1207.4154, 2012
Cited by: 61, Year: 2012
Q-learning and enhanced policy iteration in discounted dynamic programming
DP Bertsekas, H Yu
Mathematics of Operations Research 37 (1), 66-94, 2012
Cited by: 60, Year: 2012
On convergence of emphatic temporal-difference learning
H Yu
Conference on Learning Theory, 1724-1751, 2015
Cited by: 56, Year: 2015
A unifying polyhedral approximation framework for convex optimization
DP Bertsekas, H Yu
SIAM Journal on Optimization 21 (1), 333-360, 2011
Cited by: 54, Year: 2011
Multi-step off-policy learning without importance sampling ratios
AR Mahmood, H Yu, RS Sutton
arXiv preprint arXiv:1702.03006, 2017
Cited by: 53, Year: 2017
Q-learning and policy iteration algorithms for stochastic shortest path problems
H Yu, DP Bertsekas
Annals of Operations Research 208 (1), 95-132, 2013
Cited by: 53, Year: 2013
On near optimality of the set of finite-state controllers for average cost POMDP
H Yu, DP Bertsekas
Mathematics of Operations Research 33 (1), 1-11, 2008
Cited by: 51, Year: 2008
Basis function adaptation methods for cost approximation in MDP
H Yu, DP Bertsekas
2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement …, 2009
Cited by: 49, Year: 2009
Q-learning algorithms for optimal stopping based on least squares
H Yu, DP Bertsekas
2007 European Control Conference (ECC), 2368-2375, 2007
Cited by: 45, Year: 2007
On generalized Bellman equations and temporal-difference learning
H Yu, AR Mahmood, RS Sutton
Journal of Machine Learning Research 19 (48), 1-49, 2018
Cited by: 42, Year: 2018
Stochastic shortest path problems under weak conditions
DP Bertsekas, H Yu
Lab. for Information and Decision Systems Report LIDS-P-2909, MIT, 2013
Cited by: 42, Year: 2013
Approximate solution methods for partially observable Markov and semi-Markov decision processes
H Yu
Massachusetts Institute of Technology, 2006
Cited by: 42, Year: 2006
Least squares temporal difference methods: An analysis under general conditions
H Yu
SIAM Journal on Control and Optimization 50 (6), 3310-3343, 2012
Cited by: 41, Year: 2012
Emphatic temporal-difference learning
AR Mahmood, H Yu, M White, RS Sutton
arXiv preprint arXiv:1507.01569, 2015
Cited by: 38, Year: 2015
Convergence of Least Squares Temporal Difference Methods Under General Conditions
H Yu
ICML, 1207-1214, 2010
Cited by: 38, Year: 2010
On boundedness of Q-learning iterates for stochastic shortest path problems
H Yu, DP Bertsekas
Mathematics of Operations Research 38 (2), 209-227, 2013
Cited by: 34, Year: 2013
On convergence of some gradient-based temporal-difference algorithms for off-policy learning
H Yu
arXiv preprint arXiv:1712.09652, 2017
Cited by: 31, Year: 2017