primeqa.calibration.confidence_scorer.ConfidenceScorer#

class primeqa.calibration.confidence_scorer.ConfidenceScorer(confidence_model_path=None)#

Bases: object

Class for confidence scoring.

Methods

make_features

Make confidence features from the predictions (top-k answers) of an example.

make_training_data

Make training data from prediction file and reference file for confidence model training.

model_exists

Check if the confidence model exists.

predict_scores

Compute confidence score for each answer in the top-k predictions.

reference_prediction_overlap

Calculate the F1-style overlap score between ground truth and prediction.

classmethod make_features(example_predictions) list#

Make confidence features from the predictions (top-k answers) of an example.

Parameters
  • example_predictions – Top-k answers generated by the postprocessor ExtractivePostProcessor. Each prediction contains: 'example_id', 'cls_score', 'start_logit', 'end_logit', 'span_answer': {'start_position', 'end_position'}, 'span_answer_score', 'start_index', 'end_index', 'passage_index', 'target_type_logits', 'span_answer_text', 'yes_no_answer', 'start_stdev', 'end_stdev', 'query_passage_similarity'

Returns

List of features used for confidence scoring.
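As an illustration, a prediction dict with the fields listed above could be flattened into a per-answer numeric feature vector along these lines. This is a minimal sketch, not the actual implementation: which fields `make_features` selects, and how it combines them, is internal to PrimeQA; only the field names come from the documented prediction format.

```python
def sketch_make_features(example_predictions):
    """Build one flat numeric feature vector per top-k answer (illustrative only)."""
    features = []
    for pred in example_predictions:
        features.append([
            pred["cls_score"],
            pred["start_logit"],
            pred["end_logit"],
            pred["span_answer_score"],
            pred["start_stdev"],
            pred["end_stdev"],
            pred["query_passage_similarity"],
        ])
    return features
```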

classmethod make_training_data(prediction_file: str, reference_file: str, overlap_threshold: float = 0.5) tuple#

Make training data from prediction file and reference file for confidence model training.

Parameters
  • prediction_file – File containing QA results generated by evaluate() of the MRC trainer (i.e. eval_predictions.json).

  • reference_file – File containing the ground truth generated by evaluate() of MRC trainer (i.e. eval_references.json).

  • overlap_threshold – Threshold to determine if a prediction is accepted as correct answer.

Returns

X – Array of features. Y – Array of class labels (0: incorrect, 1: correct).

Return type

tuple
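The labeling step can be sketched as follows: each prediction whose overlap with the ground truth meets `overlap_threshold` is labeled correct (1), otherwise incorrect (0). This is an assumption-laden illustration of the thresholding logic only; `sketch_make_training_data`, its `overlap_fn` parameter, and the dict-based inputs are hypothetical stand-ins for the real file-based implementation.

```python
def sketch_make_training_data(predictions, references, overlap_fn,
                              overlap_threshold=0.5):
    """Pair each prediction with a 0/1 label based on overlap with references."""
    X, Y = [], []
    for example_id, preds in predictions.items():
        refs = references[example_id]
        for pred in preds:
            # overlap_fn plays the role of reference_prediction_overlap
            overlap = overlap_fn(refs, pred)
            X.append(pred)  # stand-in for the real feature vector
            Y.append(1 if overlap >= overlap_threshold else 0)
    return X, Y
```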

model_exists() bool#

Check if the confidence model exists.

predict_scores(example_predictions) list#

Compute confidence score for each answer in the top-k predictions.

Parameters

example_predictions – Top-k answers generated by the postprocessor ExtractivePostProcessor.

Returns

List of scores for each of the top-k answers.
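To illustrate the input/output shape (one score per top-k answer), the sketch below simply softmax-normalizes the documented `span_answer_score` field. Note this is only a stand-in: the real `predict_scores` applies the trained confidence model, not a softmax.

```python
import math

def sketch_predict_scores(example_predictions):
    """Return one confidence-like score per answer (illustrative softmax only)."""
    raw = [p["span_answer_score"] for p in example_predictions]
    m = max(raw)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in raw]
    total = sum(exps)
    return [e / total for e in exps]
```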

classmethod reference_prediction_overlap(ground_truth, prediction) float#

Calculate the F1-style overlap score between ground truth and prediction.

Parameters
  • ground_truth – List of ground-truth spans, each containing "start_position" and "end_position".

  • prediction – Prediction containing "start_position" and "end_position".

Returns

Overlap score between ground truth and prediction.
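An F1-style span overlap can be sketched as below: precision is the overlap as a fraction of the predicted span, recall as a fraction of the ground-truth span, and the score is the best F1 over all ground-truth spans. This sketch assumes half-open position intervals `[start, end)`; whether PrimeQA's positions are token or character offsets, and inclusive or exclusive, is not stated here.

```python
def sketch_overlap_f1(ground_truth, prediction):
    """Max F1 overlap between a prediction span and any ground-truth span."""
    best = 0.0
    p_start, p_end = prediction["start_position"], prediction["end_position"]
    for gt in ground_truth:
        g_start, g_end = gt["start_position"], gt["end_position"]
        # Length of the intersection of the two half-open intervals.
        overlap = max(0, min(p_end, g_end) - max(p_start, g_start))
        if overlap == 0:
            continue
        precision = overlap / (p_end - p_start)
        recall = overlap / (g_end - g_start)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best
```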