What is Reinforcement Learning?

How does it help in Generative AI or Large Language Models?

What is RLHF (Reinforcement Learning from Human Feedback)?

Key Algorithms Used for RLHF

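  • PPO (Proximal Policy Optimization)
  • DPO (Direct Preference Optimization)
  • GRPO (Group Relative Policy Optimization)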

What is RLAIF (Reinforcement Learning from AI Feedback)?

Key Algorithms Used for RLAIF