Rollout prompts
Student prompt (x)
Teacher hindsight prompt (x')
Judge prompts
System prompt
User prompt
All tokens
Only top T
Only top S (judge)