primeqa.mrc.metrics.squad.squad.SQUAD#
- class primeqa.mrc.metrics.squad.squad.SQUAD(config_name: Optional[str] = None, keep_in_memory: bool = False, cache_dir: Optional[str] = None, num_process: int = 1, process_id: int = 0, seed: Optional[int] = None, experiment_id: Optional[str] = None, max_concurrent_cache_files: int = 10000, timeout: Union[int, float] = 100, **kwargs)#
Bases: datasets.metric.Metric
This metric wraps the official scoring script for version 1 of the Stanford Question Answering Dataset (SQuAD).
Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.
Computes SQuAD scores (F1 and EM).
- Parameters
predictions –
List of question-answer dictionaries with the following key-values:
‘id’: id of the question-answer pair as given in the references (see below)
‘prediction_text’: the text of the answer
references –
List of question-answer dictionaries with the following key-values:
‘id’: id of the question-answer pair (see above)
‘answers’: a Dict in the SQuAD dataset format {‘text’: list of possible texts for the answer, as a list of strings, ‘answer_start’: list of start positions for the answer, as a list of ints}
Note that answer_start values are not taken into account to compute the metric.
- Returns
‘exact_match’: Exact match (the normalized answer exactly matches the gold answer)
‘f1’: The F-score of predicted tokens versus the gold answer
Examples
>>> predictions = [{'prediction_text': '1976', 'id': '56e10a3be3433e1400422b22'}]
>>> references = [{'answers': {'answer_start': [97], 'text': ['1976']}, 'id': '56e10a3be3433e1400422b22'}]
>>> squad_metric = datasets.load_metric("squad")
>>> results = squad_metric.compute(predictions=predictions, references=references)
>>> print(results)
{'exact_match': 100.0, 'f1': 100.0}
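The example above uses the generic loader; the PrimeQA class can also be instantiated directly. A minimal sketch, assuming the default constructor arguments and the same datasets.Metric interface shown above:
>>> from primeqa.mrc.metrics.squad.squad import SQUAD
>>> squad_metric = SQUAD()  # assumed to behave like datasets.load_metric("squad")
>>> predictions = [{'prediction_text': '1976', 'id': '56e10a3be3433e1400422b22'}]
>>> references = [{'answers': {'answer_start': [97], 'text': ['1976']}, 'id': '56e10a3be3433e1400422b22'}]
>>> results = squad_metric.compute(predictions=predictions, references=references)
>>> print(results)
{'exact_match': 100.0, 'f1': 100.0}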
Methods
add – Add one prediction and reference for the metric's stack.
add_batch – Add a batch of predictions and references for the metric's stack.
compute – Compute the metrics.
download_and_prepare – Downloads and prepares dataset for reading.
Attributes
citation
codebase_urls
description
experiment_id
features
format
homepage
info – datasets.MetricInfo object containing all the metadata in the metric.
inputs_description
license
name
reference_urls
streamable
- add(*, prediction=None, reference=None, **kwargs)#
Add one prediction and reference for the metric’s stack.
- Parameters
prediction (list/array/tensor, optional) – Predictions.
reference (list/array/tensor, optional) – References.
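For incremental evaluation, predictions and references can be accumulated one pair at a time and finalized with compute(). A minimal sketch, with illustrative ids and answers:
>>> from primeqa.mrc.metrics.squad.squad import SQUAD
>>> metric = SQUAD()
>>> # inside an evaluation loop: add one prediction/reference pair per step
>>> metric.add(prediction={'prediction_text': '1976', 'id': 'q1'},
...            reference={'answers': {'answer_start': [97], 'text': ['1976']}, 'id': 'q1'})
>>> metric.compute()  # finalizes the accumulated stack
{'exact_match': 100.0, 'f1': 100.0}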
- add_batch(*, predictions=None, references=None, **kwargs)#
Add a batch of predictions and references for the metric’s stack.
- Parameters
predictions (list/array/tensor, optional) – Predictions.
references (list/array/tensor, optional) – References.
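add_batch() follows the same pattern but takes lists of equal length, which suits batched evaluation loops. A minimal sketch, with illustrative ids and answers:
>>> from primeqa.mrc.metrics.squad.squad import SQUAD
>>> metric = SQUAD()
>>> # per batch: add matching lists of predictions and references
>>> metric.add_batch(
...     predictions=[{'prediction_text': '1976', 'id': 'q1'}],
...     references=[{'answers': {'answer_start': [97], 'text': ['1976']}, 'id': 'q1'}])
>>> metric.compute()
{'exact_match': 100.0, 'f1': 100.0}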
- compute(*, predictions=None, references=None, **kwargs) → Optional[dict]#
Compute the metrics.
Usage of positional arguments is not allowed to prevent mistakes.
- Parameters
predictions (list/array/tensor, optional) – Predictions.
references (list/array/tensor, optional) – References.
**kwargs (optional) – Keyword arguments that will be forwarded to the metric's _compute() method (see details in the docstring).
- Returns
dict or None – Dictionary with the metrics if this metric is run on the main process (process_id == 0); None if the metric is not run on the main process (process_id != 0).
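Because compute() returns None on non-main processes, callers in distributed setups should guard on the return value. A minimal sketch of that pattern:
>>> results = squad_metric.compute(predictions=predictions, references=references)
>>> if results is not None:  # only the main process (process_id == 0) receives the dict
...     print(results['exact_match'], results['f1'])
100.0 100.0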
- download_and_prepare(download_config: Optional[datasets.utils.file_utils.DownloadConfig] = None, dl_manager: Optional[datasets.utils.download_manager.DownloadManager] = None)#
Downloads and prepares dataset for reading.
- Parameters
download_config (DownloadConfig, optional) – Specific download configuration parameters.
dl_manager (DownloadManager, optional) – Specific download manager to use.
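This method is normally invoked internally on first use, but it can be called explicitly, for example to control where resources are cached. A minimal sketch, assuming datasets.DownloadConfig; the cache path is illustrative:
>>> from datasets import DownloadConfig
>>> from primeqa.mrc.metrics.squad.squad import SQUAD
>>> metric = SQUAD()
>>> metric.download_and_prepare(download_config=DownloadConfig(cache_dir="./metric_cache"))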
- property info#
datasets.MetricInfo object containing all the metadata in the metric.
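The metadata can be inspected directly from this property; a minimal sketch, assuming the standard datasets.MetricInfo fields such as description and citation:
>>> from primeqa.mrc.metrics.squad.squad import SQUAD
>>> metric = SQUAD()
>>> metric_info = metric.info
>>> print(metric_info.description)  # human-readable description of the metric
>>> print(metric_info.citation)     # citation entry for SQuAD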