Hello, I wanted to ask a quick question.
From what I understand from the original paper and your blog, we assign a label of +1 to the expert policy and -1 to all the other policies. In the optimization function of your implementation, I assume that h_list holds these label values. My question is: why does the expert policy get the same label as all the other policies (+1, which later becomes -1 in the h matrix)?
Thanks in advance.
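
For reference, here is a minimal sketch of how a labelled margin constraint is usually rewritten into the `G w <= h` form that QP solvers expect. It is not taken from the repository: the function name `build_margin_constraints`, the example feature expectations `mu`, and the assumption that the implementation follows this standard form are all hypothetical. In this form the labels are folded into the rows of `G`, while `h` is only the margin bound and is -1 for every row. If the implementation does something similar, that might explain why the expert and the other policies share the same value in h.

```python
import numpy as np

def build_margin_constraints(features, labels):
    """Rewrite y_i * (w . x_i) >= 1 as G w <= h (hypothetical helper)."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Multiplying both sides by -1 flips the inequality:
    #   y_i * (w . x_i) >= 1   <=>   (-y_i * x_i) . w <= -1
    G = -labels[:, None] * features   # the labels live here, one row per policy
    h = -np.ones(len(labels))         # margin bound: -1 for every row
    return G, h

# Made-up feature expectations: row 0 is the expert, the rest are other policies.
mu = np.array([[1.0, 0.5],
               [0.2, 0.9],
               [0.4, 0.1]])
y = np.array([1.0, -1.0, -1.0])       # +1 for the expert, -1 for the others

G, h = build_margin_constraints(mu, y)
```

Here the expert row of `G` is the negated feature vector and the other rows keep their sign, so the +1/-1 distinction is still present even though every entry of `h` is the same.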