OPD Study Notes

Notes from studying OPD (on-policy distillation) and reproducing it.

References
https://github.com/david-xinyuwei/david-share/blob/master/DL-Algorithm-Insights/Multi-Expert-OPD-Distillation/README-CN.md
https://github.com/david-xinyuwei/david-share/tree/master/DL-Algorithm-Insights

Some takeaways
For the author's discussion of "why on-policy rather than SFT", see the section "vs SFT (Supervised Fine-Tuning): the Exposure Bias problem" in https://github.com/david-xinyuwei/david-share/blob/master/DL-Algorithm-Insights/Multi-Expert-OPD-Distillation/README-CN.md
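To make the on-policy vs SFT contrast concrete, here is a minimal toy sketch. It is not taken from the referenced README. It assumes a common formulation in which SFT minimizes cross-entropy on teacher-generated sequences, while on-policy distillation minimizes a per-token reverse KL on sequences the student samples itself. The names `TEACHER`, `STUDENT`, `rollout`, `sft_loss`, and `on_policy_loss` are hypothetical, and the "models" are just lookup tables of next-token logits keyed by the previous token.

```python
import math
import random

# Hypothetical toy models: 3-token vocabulary, next-token logits depend
# only on the previous token. Real models would be neural networks.
TEACHER = {0: [2.0, 0.0, -1.0], 1: [0.0, 2.0, 0.0], 2: [-1.0, 0.0, 2.0]}
STUDENT = {0: [1.0, 0.5, 0.0], 1: [0.5, 1.0, 0.5], 2: [0.0, 0.5, 1.0]}


def softmax(logits):
    """Convert logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rollout(policy, length, rng, start=0):
    """Sample `length` tokens from `policy`; return (prev_token, token) pairs."""
    steps, prev = [], start
    for _ in range(length):
        tok = rng.choices(range(3), weights=softmax(policy[prev]), k=1)[0]
        steps.append((prev, tok))
        prev = tok
    return steps


def sft_loss(student, teacher, length, rng):
    """Cross-entropy of the student on a teacher-generated sequence.

    The states (prefixes) come from the teacher, so the student is never
    scored on the states it would reach by itself (exposure bias).
    """
    steps = rollout(teacher, length, rng)
    return -sum(math.log(softmax(student[prev])[tok]) for prev, tok in steps) / length


def on_policy_loss(student, teacher, length, rng):
    """Mean reverse KL(student || teacher) over states the student visits."""
    steps = rollout(student, length, rng)
    total = 0.0
    for prev, _ in steps:
        ps, pt = softmax(student[prev]), softmax(teacher[prev])
        total += sum(s * (math.log(s) - math.log(t)) for s, t in zip(ps, pt))
    return total / length


rng = random.Random(0)
print("SFT loss:", sft_loss(STUDENT, TEACHER, 50, rng))
print("on-policy loss:", on_policy_loss(STUDENT, TEACHER, 50, rng))
```

This sketch only evaluates the two losses. In an actual reproduction the gradient would flow through the student's probabilities, and the exact loss used in the referenced repository may differ from the reverse KL assumed here.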