Jan 19, 2024 11:42 AM
Expert-guided reading series
Chinese-language notes on RL papers, written by an expert:
Classic introductory RL papers
Value-based methods
Generally used for problems with discrete action spaces.
- DQN
  - Playing Atari with Deep Reinforcement Learning (2013)
  - Human-level control through deep reinforcement learning (2015, published in Nature)
- Double DQN (DDQN)
  - Deep Reinforcement Learning with Double Q-learning
- Dueling DQN
  - Dueling Network Architectures for Deep Reinforcement Learning
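The note above that value-based methods usually target discrete action spaces can be illustrated with a minimal sketch: with a finite action set, the greedy action is simply the argmax over one Q-value estimate per action. The function name `epsilon_greedy` and the example Q-values are assumptions made for illustration and are not taken from any of the listed papers.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a finite list of Q-value estimates.

    Hypothetical helper for illustration only.
    """
    if rng.random() < epsilon:
        # Explore: choose uniformly among the discrete actions.
        return rng.randrange(len(q_values))
    # Exploit: argmax over the finite action set.
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Made-up Q-value estimates for three discrete actions.
q = [0.1, 0.7, 0.3]
greedy_action = epsilon_greedy(q, epsilon=0.0)
```

With a continuous action space this argmax would require solving an optimization problem at every step, which is one reason the policy-based methods listed next are commonly used there.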
Policy-based methods
Applicable to discrete or continuous action spaces.
stochastic policy:
- A3C
  - Asynchronous Methods for Deep Reinforcement Learning (2016)
- TRPO
  - Trust Region Policy Optimization (2015)
- PPO
  - Proximal Policy Optimization Algorithms (2017/08 v2)
- SAC
  - Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
  - Soft Actor-Critic Algorithms and Applications
  - Soft Actor-Critic for Discrete Action Settings
deterministic policy:
- DDPG
  - Continuous control with deep reinforcement learning (ICLR 2016, DeepMind)
- TD3
  - Addressing Function Approximation Error in Actor-Critic Methods
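The stochastic/deterministic split used above can be sketched for a one-dimensional continuous action: a stochastic policy samples an action from a distribution, while a deterministic policy maps the state to a single action. This is only a toy illustration; the function names and the fixed mean and standard deviation are assumptions, and the listed algorithms use learned networks (and, in practice, extra details such as exploration noise or action squashing) rather than constants.

```python
import random

def stochastic_policy(mean, std, rng):
    """Sample an action from a Gaussian with the given mean and std (illustrative)."""
    return rng.gauss(mean, std)

def deterministic_policy(mean):
    """Return the same action every time for the same input (illustrative)."""
    return mean

rng = random.Random(0)  # seeded so the sketch is reproducible

# Pretend a policy network produced mean=0.5 and std=0.1 for some state.
samples = [stochastic_policy(0.5, 0.1, rng) for _ in range(3)]
fixed = [deterministic_policy(0.5) for _ in range(3)]
```

Here `samples` varies from call to call, while `fixed` repeats the same value.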
Tricks
- GAE
  - High-Dimensional Continuous Control Using Generalized Advantage Estimation
- Retrace
  - Safe and efficient off-policy reinforcement learning (DeepMind 2016)
