primeqa.calibration.confidence_scorer.ConfidenceScorer#

class primeqa.calibration.confidence_scorer.ConfidenceScorer(confidence_model_path=None)#

Bases: object

Class for confidence scoring.

Methods

make_features

Make confidence features from the predictions (top-k answers) of an example.

make_training_data

Make training data from prediction file and reference file for confidence model training.

model_exists

Check if the confidence model exists.

predict_scores

Compute confidence score for each answer in the top-k predictions.

reference_prediction_overlap

Calculate the F1-style overlap score between ground truth and prediction.

classmethod make_features(example_predictions) list#

Make confidence features from the predictions (top-k answers) of an example.

Parameters
  • example_predictions – Top-k answers generated by the postprocessor ExtractivePostProcessor. Each prediction contains: 'example_id', 'cls_score', 'start_logit', 'end_logit', 'span_answer': {'start_position', 'end_position'}, 'span_answer_score', 'start_index', 'end_index', 'passage_index', 'target_type_logits', 'span_answer_text', 'yes_no_answer', 'start_stdev', 'end_stdev', 'query_passage_similarity'

Returns

List of features used for confidence scoring.
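As an illustration, a prediction dict with the fields listed above could be flattened into a per-answer numeric feature vector along these lines. This is a minimal sketch, not the actual implementation: which fields `make_features` selects, and how it combines them, is internal to PrimeQA; only the field names come from the documented prediction format.

```python
def sketch_make_features(example_predictions):
    """Build one flat numeric feature vector per top-k answer (illustrative only)."""
    features = []
    for pred in example_predictions:
        features.append([
            pred["cls_score"],
            pred["start_logit"],
            pred["end_logit"],
            pred["span_answer_score"],
            pred["start_stdev"],
            pred["end_stdev"],
            pred["query_passage_similarity"],
        ])
    return features
```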

classmethod make_training_data(prediction_file: str, reference_file: str, overlap_threshold: float = 0.5) tuple#

Make training data from prediction file and reference file for confidence model training.

Parameters
  • prediction_file – File containing QA results generated by evaluate() of the MRC trainer (i.e. eval_predictions.json).

  • reference_file – File containing the ground truth generated by evaluate() of MRC trainer (i.e. eval_references.json).

  • overlap_threshold – Threshold to determine if a prediction is accepted as correct answer.

Returns

X – Array of features. Y – Array of class labels (0: incorrect, 1: correct).

Return type

tuple
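The labeling step can be sketched as follows: each prediction whose overlap with the ground truth meets `overlap_threshold` is labeled correct (1), otherwise incorrect (0). This is an assumption-laden illustration of the thresholding logic only; `sketch_make_training_data`, its `overlap_fn` parameter, and the dict-based inputs are hypothetical stand-ins for the real file-based implementation.

```python
def sketch_make_training_data(predictions, references, overlap_fn,
                              overlap_threshold=0.5):
    """Pair each prediction with a 0/1 label based on overlap with references."""
    X, Y = [], []
    for example_id, preds in predictions.items():
        refs = references[example_id]
        for pred in preds:
            # overlap_fn plays the role of reference_prediction_overlap
            overlap = overlap_fn(refs, pred)
            X.append(pred)  # stand-in for the real feature vector
            Y.append(1 if overlap >= overlap_threshold else 0)
    return X, Y
```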

model_exists() bool#

Check if the confidence model exists.

predict_scores(example_predictions) list#

Compute confidence score for each answer in the top-k predictions.

Parameters

example_predictions – Top-k answers generated by the postprocessor ExtractivePostProcessor.

Returns

List of scores for each of the top-k answers.
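To illustrate the input/output shape (one score per top-k answer), the sketch below simply softmax-normalizes the documented `span_answer_score` field. Note this is only a stand-in: the real `predict_scores` applies the trained confidence model, not a softmax.

```python
import math

def sketch_predict_scores(example_predictions):
    """Return one confidence-like score per answer (illustrative softmax only)."""
    raw = [p["span_answer_score"] for p in example_predictions]
    m = max(raw)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in raw]
    total = sum(exps)
    return [e / total for e in exps]
```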

classmethod reference_prediction_overlap(ground_truth, prediction) float#

Calculate the F1-style overlap score between ground truth and prediction.

Parameters
  • ground_truth – List of ground-truth spans, each containing "start_position" and "end_position".

  • prediction – Prediction containing "start_position" and "end_position".

Returns

Overlap score between ground truth and prediction.
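An F1-style span overlap can be sketched as below: precision is the overlap as a fraction of the predicted span, recall as a fraction of the ground-truth span, and the score is the best F1 over all ground-truth spans. This sketch assumes half-open position intervals `[start, end)`; whether PrimeQA's positions are token or character offsets, and inclusive or exclusive, is not stated here.

```python
def sketch_overlap_f1(ground_truth, prediction):
    """Max F1 overlap between a prediction span and any ground-truth span."""
    best = 0.0
    p_start, p_end = prediction["start_position"], prediction["end_position"]
    for gt in ground_truth:
        g_start, g_end = gt["start_position"], gt["end_position"]
        # Length of the intersection of the two half-open intervals.
        overlap = max(0, min(p_end, g_end) - max(p_start, g_start))
        if overlap == 0:
            continue
        precision = overlap / (p_end - p_start)
        recall = overlap / (g_end - g_start)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best
```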