primeqa.mrc.metrics.tydi_f1.tydi_f1.TyDiF1

class primeqa.mrc.metrics.tydi_f1.tydi_f1.TyDiF1(config_name: Optional[str] = None, keep_in_memory: bool = False, cache_dir: Optional[str] = None, num_process: int = 1, process_id: int = 0, seed: Optional[int] = None, experiment_id: Optional[str] = None, max_concurrent_cache_files: int = 10000, timeout: Union[int, float] = 100, **kwargs)

Bases: datasets.metric.Metric

The F1 score is the harmonic mean of precision and recall: F1 = 2 * (precision * recall) / (precision + recall). This implementation follows the F1 scoring used by the TyDi QA leaderboard.

Adapted from https://github.com/google-research-datasets/tydiqa/blob/master/tydi_eval.py.
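As a minimal sketch of the formula above (not the PrimeQA implementation), F1 can be computed from precision and recall as follows. The function name `f1_score` and the convention of returning 0.0 when both inputs are zero are assumptions made for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    # Assumption: return 0.0 when both inputs are zero to avoid dividing by zero.
    if precision + recall == 0:
        return 0.0
    return 2 * (precision * recall) / (precision + recall)


# Example: precision 0.5 and recall 1.0 give 2 * 0.5 / 1.5.
print(f1_score(0.5, 1.0))
```

Because the harmonic mean is dominated by the smaller of the two values, a system cannot reach a high F1 by maximizing only precision or only recall.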

Parameters
  • predictions – Predicted labels.

  • references – Ground truth labels.

  • passage_non_null_threshold – minimum number of non-null annotations required for the passage answer to be treated as non-null (default=2)

  • span_non_null_threshold – minimum number of non-null annotations required for the span answer to be treated as non-null (default=2)

  • verbose – dump reference and prediction for debugging purposes
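The two threshold parameters can be pictured with a small sketch. It assumes the semantics of the TyDi QA evaluation script that this metric is adapted from, where a gold answer counts as present only when at least `threshold` annotators gave a non-null answer. The helper `gold_has_answer` and the use of `None` to mark a null annotation are hypothetical and not part of PrimeQA.

```python
from typing import Optional, Sequence


def gold_has_answer(annotations: Sequence[Optional[str]], threshold: int = 2) -> bool:
    """Return True when enough annotators gave a non-null answer."""
    # Hypothetical representation: None marks a null annotation.
    non_null = sum(1 for a in annotations if a is not None)
    return non_null >= threshold


# Two of three annotators gave an answer, so the gold answer is kept.
print(gold_has_answer(["Paris", None, "Paris"]))
# Only one annotator gave an answer, so the gold answer is treated as null.
print(gold_has_answer(["Paris", None, None]))
```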

Returns: metrics dict comprising:

  • minimal_f1: Minimal Answer F1.

  • minimal_precision: Minimal Answer Precision.

  • minimal_recall: Minimal Answer Recall.

  • passage_f1: Passage Answer F1.

  • passage_precision: Passage Answer Precision.

  • passage_recall: Passage Answer Recall.

Methods

add

Add one prediction and reference for the metric's stack.

add_batch

Add a batch of predictions and references for the metric's stack.

compute

Compute the metrics.

download_and_prepare

Downloads and prepares dataset for reading.

Attributes

citation

codebase_urls

description

experiment_id

features

format

homepage

info

datasets.MetricInfo object containing all the metadata in the metric.

inputs_description

license

name

reference_urls

streamable

add(*, prediction=None, reference=None, **kwargs)

Add one prediction and reference for the metric’s stack.

Parameters
  • prediction (list/array/tensor, optional) – Predictions.

  • reference (list/array/tensor, optional) – References.

add_batch(*, predictions=None, references=None, **kwargs)

Add a batch of predictions and references for the metric’s stack.

Parameters
  • predictions (list/array/tensor, optional) – Predictions.

  • references (list/array/tensor, optional) – References.

compute(*, predictions=None, references=None, **kwargs) → Optional[dict]

Compute the metrics.

Positional arguments are not allowed, to prevent predictions and references from being swapped by mistake; pass both by keyword.

Parameters
  • predictions (list/array/tensor, optional) – Predictions.

  • references (list/array/tensor, optional) – References.

  • **kwargs (optional) – Keyword arguments that will be forwarded to the metric’s _compute() method (see details in the docstring).

Returns

dict or None

  • Dictionary with the metrics if this metric is run on the main process (process_id == 0).

  • None if the metric is not run on the main process (process_id != 0).
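The keyword-only signature of compute can be illustrated with a stand-in function. This is a sketch of the calling convention only: the `compute` function below is hypothetical, returns placeholder counts rather than F1 scores, and does not use PrimeQA or the datasets library.

```python
from typing import Optional


def compute(*, predictions=None, references=None, **kwargs) -> Optional[dict]:
    """Stand-in with the same keyword-only signature; not the real metric."""
    # Placeholder result: counts only, no F1 computation.
    return {"num_predictions": len(predictions), "num_references": len(references)}


# Keyword arguments are accepted.
result = compute(predictions=["a"], references=["a"])

# Positional arguments are rejected by Python itself with a TypeError.
try:
    compute(["a"], ["a"])
    positional_rejected = False
except TypeError:
    positional_rejected = True

print(result, positional_rejected)
```

The bare `*` in the signature is what enforces this: every parameter after it must be passed by name.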

download_and_prepare(download_config: Optional[datasets.utils.file_utils.DownloadConfig] = None, dl_manager: Optional[datasets.utils.download_manager.DownloadManager] = None)

Downloads and prepares dataset for reading.

Parameters
  • download_config (DownloadConfig, optional) – Specific download configuration parameters.

  • dl_manager (DownloadManager, optional) – Specific download manager to use.

property info

datasets.MetricInfo object containing all the metadata in the metric.