Trust region policy optimisation in multi-agent reinforcement learning JG Kuba, R Chen, M Wen, Y Wen, F Sun, J Wang, Y Yang International Conference on Learning Representations 2022, 2021 | 246 | 2021 |
Multi-agent reinforcement learning is a sequence modeling problem M Wen, J Kuba, R Lin, W Zhang, Y Wen, J Wang, Y Yang Advances in Neural Information Processing Systems 35, 16509-16521, 2022 | 179 | 2022 |
Safe multi-agent reinforcement learning for multi-robot control S Gu, JG Kuba, Y Chen, Y Du, L Yang, A Knoll, Y Yang Artificial Intelligence 319, 103905, 2023 | 104* | 2023 |
IDQL: Implicit Q-learning as an actor-critic method with diffusion policies P Hansen-Estruch, I Kostrikov, M Janner, JG Kuba, S Levine arXiv preprint arXiv:2304.10573, 2023 | 100 | 2023 |
Discovered policy optimisation C Lu, J Kuba, A Letcher, L Metz, C Schroeder de Witt, J Foerster Advances in Neural Information Processing Systems 35, 16455-16468, 2022 | 75 | 2022 |
Settling the variance of multi-agent policy gradients JG Kuba, M Wen, L Meng, H Zhang, D Mguni, J Wang, Y Yang Advances in Neural Information Processing Systems 34, 13458-13470, 2021 | 62 | 2021 |
Heterogeneous-agent mirror learning: A continuum of solutions to cooperative MARL JG Kuba, X Feng, S Ding, H Dong, J Wang, Y Yang arXiv preprint arXiv:2208.01682, 2022 | 47* | 2022 |
Mirror learning: A unifying framework of policy optimisation J Grudzien, CAS De Witt, J Foerster International Conference on Machine Learning, 7825-7844, 2022 | 24* | 2022 |
Understanding value decomposition algorithms in deep cooperative multi-agent reinforcement learning Z Dou, JG Kuba, Y Yang arXiv preprint arXiv:2202.04868, 2022 | 8 | 2022 |
Functional Graphical Models: Structure Enables Offline Data-Driven Optimization K Grudzien, M Uehara, S Levine, P Abbeel International Conference on Artificial Intelligence and Statistics, 2449-2457, 2024 | 3 | 2024 |
Cliqueformer: Model-Based Optimization with Structured Transformers JG Kuba, P Abbeel, S Levine arXiv preprint arXiv:2410.13106, 2024 | | 2024 |
Advantage-Conditioned Diffusion: Offline RL via Generalization JG Kuba, P Abbeel, S Levine | | |