nq_eval

Functions

`compute_f1`	Computes F1, precision, recall for a list of answer scores.
`compute_final_f1`	Computes overall F1 given long and short answers, ignoring scores.
`compute_optimal_metrics`	Computes overall metrics for long and short answers at their respective optimal thresholds.
`compute_pr_curves`	Computes PR curve and returns R@P for specific targets.
`extract_metrics_at_optimal_threshold`	Extracts metrics at the optimal threshold from `answer_stats`, one of the dictionaries returned from `score_answers`.
`get_metrics_as_dict`	Library version of the end-to-end evaluation.
`get_metrics_with_answer_stats`	Generate metrics dict using long and short answer stats.
`load_gt_lookup_as_dict`
`pretty_print`
`print_r_at_p_table`	Pretty prints the R@P table for default targets.
`safe_divide`	Compute x / y, but return 0 if y is zero.
`score_answers`	Scores all answers for all documents.
`score_long_answer`	Scores a long answer as correct or not.
`score_short_answer`	Scores a short answer as correct or not.
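
Example

The summaries for `safe_divide` and `compute_f1` suggest a pattern along the following lines. This is a minimal sketch, not the module's actual implementation: `safe_divide` follows the one-line description above, while `f1_from_counts` and its parameters are hypothetical names introduced here to show how precision, recall, and F1 could be derived from raw counts without dividing by zero.

```python
def safe_divide(x, y):
    """Compute x / y, but return 0 if y is zero (per the summary above)."""
    if y == 0:
        return 0
    return x / y


def f1_from_counts(num_correct, num_predicted, num_gold):
    """Hypothetical helper: F1, precision, recall from raw counts.

    num_correct:   predictions judged correct
    num_predicted: predictions made
    num_gold:      examples that have a gold answer
    """
    precision = safe_divide(num_correct, num_predicted)
    recall = safe_divide(num_correct, num_gold)
    # F1 is the harmonic mean of precision and recall.
    f1 = safe_divide(2 * precision * recall, precision + recall)
    return f1, precision, recall


# 3 correct out of 4 predictions, with 6 gold answers available.
f1, precision, recall = f1_from_counts(3, 4, 6)
print(f"F1={f1:.2f} P={precision:.2f} R={recall:.2f}")
```

Routing every division through `safe_divide` keeps the degenerate cases (no predictions, no gold answers, or zero precision and recall) at 0 rather than raising `ZeroDivisionError`.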