Reinforcement Learning Techniques for Large Language Models

What is Reinforcement Learning? How does it help in Generative AI or Large Language Models? What is RLHF? Key Algorithms Used for RLHF: PPO, DPO, GRPO. What is RLAIF (Reinforcement Learning from AI Feedback)? Key Algorithms Used for RLAIF

December 25, 2025 · 1 min · Anil Kumar

Paper Summary: Do LLMs Understand User Preferences? Evaluating LLMs on User Rating Prediction

This is a summary post of this paper: https://arxiv.org/abs/2305.06474 Why this paper? LLMs have the following properties: large-scale knowledge and real-world information; strong generalization ability through effective few-shot learning; strong reasoning capability with chain-of-thought, self-consistency, etc. Key question answered by this paper: Can we use LLMs for recommender systems? RQ1: Do off-the-shelf LLMs perform well for zero-shot and few-shot recommendations? RQ2: How do LLMs compare with traditional recommenders in a fair setting?...

January 26, 2024 · 3 min · Anil Kumar

All About Gemini Models and Training Process

Gemini is a family of large language models (LLMs) developed by Google DeepMind. They are designed to be more powerful and flexible than previous LLMs, capable of handling a wider range of tasks and data formats. Here's a detailed breakdown of the key aspects. Model Architecture: Multimodal Encoders: encoders for specific data types (e.g., text, audio, image) convert data into a common representation that the model can understand. Early/Late Fusion: combines the outputs of the various encoders, allowing the model to learn the relationships between different modalities...

December 3, 2023 · 3 min · Anil Kumar
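The early/late fusion idea from the Gemini summary can be illustrated with a toy sketch. This is not Gemini's actual architecture. The encoders, the 2-dimensional embeddings, and the use of Python's built-in `sum` as a stand-in "model" are all hypothetical, chosen only to show *where* the modalities get combined: early fusion concatenates the per-modality embeddings and runs one joint model, while late fusion runs a separate head per modality and combines their outputs afterwards.

```python
from typing import Callable, Dict, List

Vector = List[float]
Model = Callable[[Vector], float]


def early_fusion(embeddings: Dict[str, Vector], joint_model: Model) -> float:
    """Concatenate per-modality embeddings into one vector, then run one joint model."""
    # Sort modality names so the concatenation order is deterministic.
    fused: Vector = [x for name in sorted(embeddings) for x in embeddings[name]]
    return joint_model(fused)


def late_fusion(embeddings: Dict[str, Vector], heads: Dict[str, Model]) -> float:
    """Run a separate head per modality, then average the heads' outputs."""
    scores = [heads[name](vec) for name, vec in embeddings.items()]
    return sum(scores) / len(scores)


# Hypothetical toy encoders: each maps raw input of one modality to a 2-d vector,
# standing in for the "common representation" a real encoder would produce.
def encode_text(text: str) -> Vector:
    return [float(len(text)), float(text.count(" "))]


def encode_image(pixels: List[float]) -> Vector:
    return [sum(pixels), max(pixels)]


if __name__ == "__main__":
    embeddings = {
        "text": encode_text("a cat"),       # [5.0, 1.0]
        "image": encode_image([0.5, 1.5]),  # [2.0, 1.5]
    }
    print(early_fusion(embeddings, joint_model=sum))                # 9.5
    print(late_fusion(embeddings, heads={"text": sum, "image": sum}))  # 4.75
```

With real models the difference matters because an early-fusion model can learn interactions between features of different modalities, whereas late fusion only combines each modality's independent prediction.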