nq_eval
Functions
Computes F1, precision, recall for a list of answer scores. |
|
Computes overall F1 given long and short answers, ignoring scores. |
|
Computes overall metrics for long and short answers at their respective optimal thresholds. Takes long_answer_stats, a list of long answer scores. |
|
Computes PR curve and returns R@P for specific targets. |
|
|
|
Library version of the end-to-end evaluation. |
|
Generate metrics dict using long and short answer stats. |
|
Pretty prints the R@P table for default targets. |
|
Compute x / y, but return 0 if y is zero. |
|
Scores all answers for all documents. |
|
Scores a long answer as correct or not. |
|
Scores a short answer as correct or not. |
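
To make the summaries above more concrete, here is a minimal sketch of how the zero-safe division, the precision/recall/F1 computation, and an R@P lookup could fit together. It is not the module's actual code: the function names (`safe_divide`, `precision_recall_f1`, `recall_at_precision`) and the `(has_gold, has_pred, is_correct, score)` tuple layout for each answer score are assumptions made for illustration, since this page does not show signatures.

```python
from typing import List, Tuple

# Hypothetical layout of one answer score (an assumption, not from this page):
# (has_gold, has_pred, is_correct, score)
AnswerStat = Tuple[bool, bool, bool, float]


def safe_divide(x: float, y: float) -> float:
    """Compute x / y, but return 0 if y is zero."""
    return 0.0 if y == 0 else x / y


def precision_recall_f1(stats: List[AnswerStat]) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 over a list of answer scores."""
    num_gold = sum(has_gold for has_gold, _, _, _ in stats)
    num_pred = sum(has_pred for _, has_pred, _, _ in stats)
    num_correct = sum(is_correct for _, _, is_correct, _ in stats)

    precision = safe_divide(num_correct, num_pred)
    recall = safe_divide(num_correct, num_gold)
    f1 = safe_divide(2 * precision * recall, precision + recall)
    return precision, recall, f1


def recall_at_precision(stats: List[AnswerStat], target: float) -> float:
    """Return the best recall reachable while precision stays >= target.

    Sweeps a score threshold from the highest score downwards. Tied scores
    are not treated specially in this sketch.
    """
    num_gold = sum(has_gold for has_gold, _, _, _ in stats)
    ranked = sorted(stats, key=lambda s: s[3], reverse=True)

    best_recall = 0.0
    num_pred = 0
    num_correct = 0
    for _, has_pred, is_correct, _ in ranked:
        num_pred += has_pred
        num_correct += is_correct
        precision = safe_divide(num_correct, num_pred)
        recall = safe_divide(num_correct, num_gold)
        if precision >= target:
            best_recall = max(best_recall, recall)
    return best_recall
```

Routing every ratio through the zero-safe helper means an empty prediction set or an empty gold set yields 0 rather than raising `ZeroDivisionError`, which is the behaviour the "return 0 if y is zero" summary describes.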