primeqa.mrc.run_mrc_utils.process_raw_datasets

primeqa.mrc.run_mrc_utils.process_raw_datasets(raw_datasets, preprocessors, training_args, split='train', max_samples=None)

Process datasets into features.

Parameters
  • raw_datasets – list of datasets to be processed.

  • preprocessors – list of preprocessors for featurization.

  • training_args – training arguments.

  • split – name of the dataset split (for example 'train'), used for logging.

  • max_samples – maximum number of examples to process from each dataset.

Returns

A pair (example_datasets, feature_datasets): example_datasets is the list of raw datasets truncated to max_samples, and feature_datasets is the list of feature datasets.

Return type

tuple
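
The contract described above can be illustrated with a minimal, self-contained sketch. It is not the PrimeQA implementation: `process_raw_datasets_sketch` and `toy_preprocessor` are hypothetical names, datasets are plain Python lists rather than dataset objects, `training_args` is omitted, and pairing each dataset with the preprocessor at the same position is an assumption.

```python
import logging
from typing import Callable, Dict, List, Optional, Sequence, Tuple

logger = logging.getLogger(__name__)

Example = Dict[str, str]
Feature = Dict[str, int]
Preprocessor = Callable[[List[Example]], List[Feature]]


def process_raw_datasets_sketch(
    raw_datasets: Sequence[List[Example]],
    preprocessors: Sequence[Preprocessor],
    split: str = "train",
    max_samples: Optional[int] = None,
) -> Tuple[List[List[Example]], List[List[Feature]]]:
    """Illustrative stand-in only; not the library's code."""
    example_datasets: List[List[Example]] = []
    feature_datasets: List[List[Feature]] = []
    # Assumption: dataset i is featurized by preprocessor i.
    for raw_dataset, preprocessor in zip(raw_datasets, preprocessors):
        # Keep at most max_samples examples; keep everything when it is None.
        examples = raw_dataset if max_samples is None else raw_dataset[:max_samples]
        # Turn the (possibly truncated) examples into features.
        features = preprocessor(examples)
        # The split name is used only in the log message.
        logger.info("Processed %d %s examples into %d features",
                    len(examples), split, len(features))
        example_datasets.append(examples)
        feature_datasets.append(features)
    return example_datasets, feature_datasets


def toy_preprocessor(examples: List[Example]) -> List[Feature]:
    # Hypothetical featurizer: one feature per example.
    return [{"input_len": len(ex["question"])} for ex in examples]


raw = [[{"question": "who?"}, {"question": "where?"}, {"question": "when?"}]]
examples, features = process_raw_datasets_sketch(
    raw, [toy_preprocessor], split="train", max_samples=2
)
```

Here `examples` holds the truncated raw data and `features` holds the featurized counterpart, mirroring the two lists the function is documented to return.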