Introductory RL Resources
2024-1-19
Jan 19, 2024 11:42 AM

Expert-guided reading series

 
Chinese-language notes on RL papers written by an expert:
 
 

Classic introductory RL papers

Value-based methods

Generally used for problems with discrete action spaces.
  • DQN
    • Playing Atari with Deep Reinforcement Learning (2013)
    • Human-level control through deep reinforcement learning (2015, published in Nature)
  • Double DQN (DDQN)
    • Deep Reinforcement Learning with Double Q-learning
  • Dueling DQN
    • Dueling Network Architectures for Deep Reinforcement Learning
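
The three variants above differ mainly in how the bootstrapped target (or the Q-value itself) is formed. Here is a minimal sketch in plain Python, assuming the next-state Q-values are already available as lists. The function names and arguments are illustrative, not taken from any of the papers' code.

```python
from typing import List, Sequence


def dqn_target(reward: float, next_q_target: Sequence[float],
               gamma: float = 0.99, done: bool = False) -> float:
    """DQN: bootstrap from the max of the target network's next-state Q-values."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward: float, next_q_online: Sequence[float],
                      next_q_target: Sequence[float],
                      gamma: float = 0.99, done: bool = False) -> float:
    """Double DQN: the online network picks the action, the target network scores it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


def dueling_q(state_value: float, advantages: Sequence[float]) -> List[float]:
    """Dueling DQN aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]
```

With the same inputs, the Double DQN target is never larger than the DQN target, which is the point of decoupling action selection from action evaluation.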

Policy-based methods

Applicable to discrete or continuous action spaces.
 
stochastic policy:
  • A3C:
    • Asynchronous Methods for Deep Reinforcement Learning (2016)
  • TRPO:
    • Trust Region Policy Optimization (2015)
  • PPO:
    • Proximal Policy Optimization Algorithms (2017/08 v2)
  • SAC:
    • Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
    • Soft Actor-Critic Algorithms and Applications
    • Soft Actor-Critic for Discrete Action Settings
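
As a rough illustration of the PPO entry above, the clipped surrogate objective can be sketched in a few lines. This assumes the probability ratios pi_new(a|s) / pi_old(a|s) and the advantage estimates have already been computed. The function name and the default `clip_eps=0.2` are assumptions for the example.

```python
from typing import Sequence


def ppo_clip_objective(ratios: Sequence[float], advantages: Sequence[float],
                       clip_eps: float = 0.2) -> float:
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over a batch of samples."""
    total = 0.0
    for r, adv in zip(ratios, advantages):
        # Clip the probability ratio to the interval [1 - eps, 1 + eps].
        clipped = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum makes the objective a pessimistic bound.
        total += min(r * adv, clipped * adv)
    return total / len(ratios)
```

The objective is maximized. In practice it is usually negated and combined with value-function and entropy terms, which this sketch leaves out.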
 
deterministic policy:
  • DDPG
    • Continuous control with deep reinforcement learning (ICLR 2016, DeepMind)
  • TD3
    • Addressing Function Approximation Error in Actor-Critic Methods
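
Two ingredients from the papers above are easy to show in isolation: the soft (Polyak) target-network update used by DDPG, and the clipped double-Q target used by TD3. This is a minimal sketch that treats parameters as flat lists of floats. The function names and default values are assumptions for the example, and TD3's target policy smoothing and delayed updates are not shown.

```python
from typing import List, Sequence


def soft_update(target_params: Sequence[float], online_params: Sequence[float],
                tau: float = 0.005) -> List[float]:
    """Move each target parameter a small step toward the online parameter."""
    return [tau * w + (1.0 - tau) * w_target
            for w_target, w in zip(target_params, online_params)]


def td3_target(reward: float, q1_next: float, q2_next: float,
               gamma: float = 0.99, done: bool = False) -> float:
    """Clipped double-Q target: bootstrap from the smaller of the two critics."""
    if done:
        return reward
    return reward + gamma * min(q1_next, q2_next)
```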
 

Tricks

  • GAE
    • High-Dimensional Continuous Control Using Generalized Advantage Estimation
  • Retrace
    • Safe and Efficient Off-Policy Reinforcement Learning (DeepMind, 2016)
  • ML
  • Partial episode bootstrapping (PEB)
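
For the GAE entry above, the advantage estimate can be computed with a single backward pass over one trajectory. This is a minimal sketch, assuming a single rollout with no termination in the middle. The function name, argument names, and defaults are illustrative. The `last_value` argument is the bootstrap value for the state after the final step: 0 if the episode truly ended, or the critic's estimate if the rollout was only cut off by a time limit, which is the idea behind partial episode bootstrapping.

```python
from typing import List, Sequence


def compute_gae(rewards: Sequence[float], values: Sequence[float],
                last_value: float, gamma: float = 0.99,
                lam: float = 0.95) -> List[float]:
    """Generalized Advantage Estimation for one trajectory.

    rewards[t] and values[t] belong to step t; last_value is V(s_T).
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        # One-step TD error at step t.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of the current and future TD errors.
        gae = delta + gamma * lam * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages
```

Setting `lam=0` reduces this to the one-step TD error, and `lam=1` gives the discounted return minus the value baseline.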