primeqa.mrc.run_mrc_utils.process_raw_datasets
- primeqa.mrc.run_mrc_utils.process_raw_datasets(raw_datasets, preprocessors, training_args, split='train', max_samples=None)
Process datasets into features.
- Parameters
raw_datasets – list of datasets to be processed.
preprocessors – list of preprocessors for featurization.
training_args – training arguments.
split – name of the dataset split (for example 'train'), used for logging.
max_samples – maximum number of examples to process from each dataset (default: None).
- Returns
example_datasets – list of raw datasets truncated to max_samples.
feature_datasets – list of feature datasets.
- Return type
tuple (example_datasets, feature_datasets)
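The following is a minimal sketch of the behaviour documented above, written so that it runs without PrimeQA installed. It is not the library's implementation. The names process_raw_datasets_sketch and toy_preprocessor are hypothetical, the datasets are plain Python lists rather than real dataset objects, and training_args is accepted but ignored. It assumes that datasets and preprocessors are paired by position and that max_samples=None means no truncation.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

Example = Dict[str, Any]
Feature = Dict[str, Any]


def process_raw_datasets_sketch(
    raw_datasets: Sequence[List[Example]],
    preprocessors: Sequence[Callable[[List[Example]], List[Feature]]],
    training_args: Any = None,  # ignored in this sketch
    split: str = "train",
    max_samples: Optional[int] = None,
) -> Tuple[List[List[Example]], List[List[Feature]]]:
    """Hypothetical stand-in that mirrors the documented inputs and outputs."""
    example_datasets: List[List[Example]] = []
    feature_datasets: List[List[Feature]] = []
    # Assumption: the i-th preprocessor featurizes the i-th dataset.
    for raw_dataset, preprocess in zip(raw_datasets, preprocessors):
        # Assumption: None keeps every example; otherwise keep the first max_samples.
        examples = raw_dataset if max_samples is None else raw_dataset[:max_samples]
        features = preprocess(examples)
        # The split name only labels the log line.
        print(f"{split}: {len(examples)} examples, {len(features)} features")
        example_datasets.append(examples)
        feature_datasets.append(features)
    return example_datasets, feature_datasets


def toy_preprocessor(examples: List[Example]) -> List[Feature]:
    # Hypothetical featurizer: counts whitespace-separated tokens in each question.
    return [{"id": ex["id"], "n_tokens": len(ex["question"].split())} for ex in examples]


raw = [
    [
        {"id": "a1", "question": "Who wrote Hamlet?"},
        {"id": "a2", "question": "Where is Paris?"},
        {"id": "a3", "question": "Why?"},
    ],
    [{"id": "b1", "question": "What is QA?"}],
]

examples, features = process_raw_datasets_sketch(
    raw, [toy_preprocessor, toy_preprocessor], split="train", max_samples=2
)
```

In this sketch the first dataset is cut from three examples to two, while the second already holds fewer than max_samples and is returned unchanged. Both returned lists have one entry per input dataset.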