Learning to Summarize with Human Feedback

We’ve applied reinforcement learning from human feedback to train language models that are better at summarization. Our models generate summaries that are better than summaries from 10x larger models trained only with supervised learning. Even though we train our models on the Reddit TL;DR dataset, the same models transfer to generate good summaries of CNN/DailyMail news articles without any further fine-tuning. Our techniques are not specific to summarization; in the long run, our goal is to make aligning AI systems with human preferences a central component of AI research and deployment in many domains.

…
