On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification. DFT reframes supervised fine-tuning through reward rectification, achieving better generalization in language model alignment.
Integrated by ms-swift, trl, and llama-factory — three of the most widely-used LLM fine-tuning frameworks.