IanSteenstra/Task-Specific-Dialogue-RLAIF

Parameter-Efficient RLAIF with User Simulations for Task-Specific Dialogue Agents

This research introduces a novel method for training specialized, task-specific dialogue agents. By using separate Large Language Models (LLMs) for user simulation and reward modeling, Reinforcement Learning from AI Feedback (RLAIF) provides automated, scalable training feedback and removes the reliance on scarce, costly human input. Combined with parameter-efficient fine-tuning (PEFT) via LoRA, this enables efficient training of effective, adaptable, and robust dialogue agents for applications where collecting human feedback is impractical.
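
To make the training loop concrete, the sketch below shows how the described setup (a policy agent with LoRA adapters, a simulated user, and an AI reward model) could be wired together with the Hugging Face TRL and PEFT libraries. This is an illustrative sketch only, not the repository's implementation: it assumes the legacy TRL `PPOTrainer` interface (trl < 0.12), and the `simulate_user_turns` and `score_with_reward_llm` helpers are hypothetical stand-ins for the user-simulator and reward-model LLMs.

```python
# Illustrative sketch: PPO fine-tuning of a dialogue agent with LoRA adapters,
# where rewards come from an AI judge (RLAIF) instead of human labels.
# Assumes the legacy TRL PPOTrainer API (trl < 0.12) and the peft library.
import torch
from transformers import AutoTokenizer
from peft import LoraConfig
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

BASE_MODEL = "gpt2"  # placeholder base model, not necessarily the one used here

# Parameter-efficient fine-tuning: only the small LoRA adapter matrices are trained.
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLMWithValueHead.from_pretrained(BASE_MODEL, peft_config=lora_config)

ppo_trainer = PPOTrainer(
    config=PPOConfig(batch_size=4, mini_batch_size=2, learning_rate=1.4e-5),
    model=model,
    tokenizer=tokenizer,
)

def simulate_user_turns() -> list[str]:
    """Hypothetical stand-in for the user-simulator LLM producing dialogue turns."""
    return ["I have been feeling stressed about work lately."] * 4

def score_with_reward_llm(user_turn: str, agent_reply: str) -> float:
    """Hypothetical stand-in for the reward-model LLM scoring a reply (higher = better)."""
    return 1.0  # a real implementation would prompt a separate LLM for a scalar score

# One RLAIF update: simulate users, let the agent respond, score with the AI judge, PPO step.
user_turns = simulate_user_turns()
query_tensors = [tokenizer.encode(t, return_tensors="pt").squeeze(0) for t in user_turns]
response_tensors = [
    ppo_trainer.generate(
        q, return_prompt=False, max_new_tokens=32, pad_token_id=tokenizer.eos_token_id
    ).squeeze(0)
    for q in query_tensors
]
replies = [tokenizer.decode(r, skip_special_tokens=True) for r in response_tensors]
rewards = [torch.tensor(score_with_reward_llm(t, r)) for t, r in zip(user_turns, replies)]
stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
```

In this scheme the user simulator generates the conversational context, the agent (policy) produces a reply, and the reward LLM scores that reply; PPO then updates only the LoRA adapter weights, keeping the base model frozen.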
