nq_eval

Functions

compute_f1

Computes F1, precision, and recall for a list of answer scores.
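To make the relationship between the three quantities concrete, here is a minimal sketch. It is not the module's source: the name `f1_sketch` and the `(has_gold, has_pred, is_correct)` tuple layout are assumptions chosen for illustration.

```python
def f1_sketch(answer_stats):
    """Hypothetical helper: answer_stats is a list of
    (has_gold, has_pred, is_correct) booleans (assumed layout)."""
    n_gold = sum(1 for has_gold, _, _ in answer_stats if has_gold)
    n_pred = sum(1 for _, has_pred, _ in answer_stats if has_pred)
    n_correct = sum(1 for _, _, is_correct in answer_stats if is_correct)

    # Guard every division so an empty prediction or gold set yields 0.
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"f1": f1, "precision": precision, "recall": recall}


stats = [
    (True, True, True),    # gold exists, predicted, correct
    (True, True, False),   # gold exists, predicted, wrong
    (True, False, False),  # gold exists, no prediction
    (False, True, False),  # no gold, spurious prediction
]
result = f1_sketch(stats)
```

With these four examples there is one correct answer out of three predictions and three gold answers, so precision, recall, and F1 all come out to one third.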

compute_final_f1

Computes overall F1 given long and short answers, ignoring scores.

compute_optimal_metrics

Computes overall metrics for long and short answers at their respective optimal thresholds. Takes long_answer_stats, a list of long answer scores.

compute_pr_curves

Computes PR curve and returns R@P for specific targets.
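One way to read "R@P" is: sweep a score threshold from high to low and, for each target precision, keep the best recall reached while precision stays at or above the target. The sketch below illustrates that idea only. The function name `recall_at_precision`, the `(has_gold, has_pred, is_correct, score)` tuple layout, and the target values are assumptions, not the module's API.

```python
def recall_at_precision(answer_stats, targets=(0.5, 0.75, 0.9)):
    """Hypothetical helper: answer_stats is a list of
    (has_gold, has_pred, is_correct, score) tuples (assumed layout)."""
    total_gold = sum(1 for has_gold, _, _, _ in answer_stats if has_gold)
    # Highest-scoring predictions first; each position acts as a threshold.
    # Tied scores are treated as separate cut points in this sketch.
    ranked = sorted(answer_stats, key=lambda s: s[3], reverse=True)

    best = {t: 0.0 for t in targets}
    n_pred = n_correct = 0
    for _, has_pred, is_correct, _ in ranked:
        n_pred += int(has_pred)
        n_correct += int(is_correct)
        if n_pred == 0 or total_gold == 0:
            continue
        precision = n_correct / n_pred
        recall = n_correct / total_gold
        for t in targets:
            if precision >= t:
                best[t] = max(best[t], recall)
    return best


stats = [
    (True, True, True, 0.9),
    (True, True, False, 0.8),
    (True, True, True, 0.7),
    (True, True, True, 0.6),
]
table = recall_at_precision(stats)
```

In this example the 0.9 target is only met at the first cut point (recall 0.25), while the 0.5 and 0.75 targets are both still met after all four predictions (recall 0.75).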

extract_metrics_at_optimal_threshold

Takes answer_stats, one of the dictionaries returned from score_answers.

get_metrics_as_dict

Library version of the end-to-end evaluation.

get_metrics_with_answer_stats

Generate metrics dict using long and short answer stats.

load_gt_lookup_as_dict

pretty_print

print_r_at_p_table

Pretty prints the R@P table for default targets.

safe_divide

Compute x / y, but return 0 if y is zero.
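A minimal sketch of the behaviour described above, written from the one-line summary rather than copied from the module, so the body is an assumption:

```python
def safe_divide(x, y):
    """Return x / y, or 0 when y is zero (sketch based on the summary)."""
    if y == 0:
        return 0
    return x / y


ratio = safe_divide(3, 4)
fallback = safe_divide(1, 0)
```

Returning 0 instead of raising `ZeroDivisionError` keeps precision and recall well defined when there are no predictions or no gold answers.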

score_answers

Scores all answers for all documents.

score_long_answer

Scores a long answer as correct or not.

score_short_answer

Scores a short answer as correct or not.