Philip Thomas
Title | Cited by | Year
Data-efficient off-policy policy evaluation for reinforcement learning
P Thomas, E Brunskill
International Conference on Machine Learning, 2139-2148, 2016
758 | 2016
Value function approximation in reinforcement learning using the Fourier basis
G Konidaris, S Osentoski, P Thomas
Proceedings of the AAAI Conference on Artificial Intelligence 25 (1), 380-385, 2011
556 | 2011
High-confidence off-policy evaluation
P Thomas, G Theocharous, M Ghavamzadeh
Proceedings of the AAAI Conference on Artificial Intelligence 29 (1), 2015
334 | 2015
High confidence policy improvement
P Thomas, G Theocharous, M Ghavamzadeh
International Conference on Machine Learning, 2380-2388, 2015
224 | 2015
Preventing undesirable behavior of intelligent machines
P Thomas, B Castro da Silva, A Barto, S Giguere, Y Brun, E Brunskill
Science 366 (6468), 999-1004, 2019
208 | 2019
Ad recommendation systems for life-time value optimization
G Theocharous, PS Thomas, M Ghavamzadeh
Proceedings of the 24th International Conference on World Wide Web, 1305-1310, 2015
207 | 2015
Learning action representations for reinforcement learning
Y Chandak, G Theocharous, J Kostas, S Jordan, P Thomas
International Conference on Machine Learning, 941-950, 2019
205 | 2019
Increasing the action gap: New operators for reinforcement learning
MG Bellemare, G Ostrovski, A Guez, P Thomas, R Munos
Proceedings of the AAAI Conference on Artificial Intelligence 30 (1), 2016
181 | 2016
Bias in natural actor-critic algorithms
P Thomas
International Conference on Machine Learning, 441-448, 2014
164 | 2014
Safe reinforcement learning
PS Thomas
126 | 2015
Is the policy gradient a gradient?
C Nota, PS Thomas
arXiv preprint arXiv:1906.07073, 2019
80 | 2019
Evaluating the performance of reinforcement learning algorithms
S Jordan, Y Chandak, D Cohen, M Zhang, P Thomas
International Conference on Machine Learning, 4962-4973, 2020
76 | 2020
Optimizing for the future in non-stationary MDPs
Y Chandak, G Theocharous, S Shankar, M White, S Mahadevan, ...
International Conference on Machine Learning, 1414-1425, 2020
74 | 2020
Training an actor-critic reinforcement learning controller for arm movement using human-generated rewards
KM Jagodnik, PS Thomas, AJ van den Bogert, MS Branicky, RF Kirsch
IEEE Transactions on Neural Systems and Rehabilitation Engineering 25 (10 …, 2017
71 | 2017
Proximal reinforcement learning: A new theory of sequential decision making in primal-dual spaces
S Mahadevan, B Liu, P Thomas, W Dabney, S Giguere, N Jacek, I Gemp, ...
arXiv preprint arXiv:1405.6757, 2014
71 | 2014
Predictive off-policy policy evaluation for nonstationary decision problems, with applications to digital marketing
P Thomas, G Theocharous, M Ghavamzadeh, I Durugkar, E Brunskill
Proceedings of the AAAI Conference on Artificial Intelligence 31 (2), 4740-4745, 2017
68 | 2017
Policy gradient methods for reinforcement learning with function approximation and action-dependent baselines
PS Thomas, E Brunskill
arXiv preprint arXiv:1706.06643, 2017
66 | 2017
Importance Sampling for Fair Policy Selection
S Doroudi, PS Thomas, E Brunskill
Grantee Submission, 2017
62 | 2017
Offline contextual bandits with high probability fairness guarantees
B Metevier, S Giguere, S Brockman, A Kobren, Y Brun, E Brunskill, ...
Advances in Neural Information Processing Systems 32, 2019
60 | 2019
Risk Quantification for Policy Deployment
PS Thomas, G Theocharous, M Ghavamzadeh
US Patent App. 14/552,047, 2016
59 | 2016
Articles 1–20