training

Functions

align

Re-order the teacher output tokens so that they align with the student output tokens, using greedy search.
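As an illustration of what a greedy alignment of this kind might look like, the sketch below matches each student token, in order, to the closest teacher token that has not been used yet. It is not the module's implementation: the name `greedy_align`, the helper `squared_l2`, the list-of-vectors representation, and the choice of squared Euclidean distance as the matching cost are all assumptions made for the example.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors (assumed matching cost)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_align(student: List[Vector], teacher: List[Vector]) -> List[Vector]:
    """Hypothetical sketch: return teacher vectors re-ordered so that
    position i holds the not-yet-used teacher vector closest to student[i].
    """
    if len(teacher) < len(student):
        raise ValueError("need at least as many teacher tokens as student tokens")

    remaining = list(range(len(teacher)))  # indices of unused teacher tokens
    aligned: List[Vector] = []
    for s in student:
        # Greedy step: take the locally best match and never revisit it.
        best = min(remaining, key=lambda j: squared_l2(s, teacher[j]))
        remaining.remove(best)
        aligned.append(teacher[best])
    return aligned


student_tokens = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
teacher_tokens = [[5.1, 4.9], [0.1, 0.0], [0.9, 1.2]]
aligned_tokens = greedy_align(student_tokens, teacher_tokens)
```

A greedy match is cheap but not guaranteed to minimize the total cost across all pairs; an optimal assignment would need something like the Hungarian algorithm.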

calculate_distance

Calculate the distance between the student output tokens and the teacher output tokens.
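The page does not say which metric is used. As one plausible reading, the sketch below averages the Euclidean distance between student and teacher vectors at matching positions. The name `mean_token_distance`, the list-of-vectors input, and the metric itself are assumptions for illustration, not the module's definition.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_token_distance(student: List[Vector], teacher: List[Vector]) -> float:
    """Hypothetical sketch: mean Euclidean distance between position-wise
    paired student and teacher vectors. Assumes the two sequences are
    already aligned and have the same length.
    """
    if len(student) != len(teacher):
        raise ValueError("student and teacher must have the same number of tokens")
    if not student:
        return 0.0
    total = sum(math.dist(s, t) for s, t in zip(student, teacher))
    return total / len(student)


example_distance = mean_token_distance(
    [[0.0, 0.0], [1.0, 1.0]],
    [[3.0, 4.0], [1.0, 1.0]],
)
```

In this example the first pair is 5 apart and the second pair is identical, so the mean is 2.5.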

set_bert_grad

train