primeqa.ir.util.corpus_reader.DocumentCollection
- class primeqa.ir.util.corpus_reader.DocumentCollection(input_files: Union[str, bytes, os.PathLike], fieldnames=None)
Bases: object
Methods
add_document_text_to_hit(hits) – Look up and add document text/title to the hits.
load_corpus() – Load the corpus from TSV/CSV or JSON input.
write_corpus_tsv(output_file) – Write out the corpus in a format ready for indexing.
- add_document_text_to_hit(hits: list)
Look up each hit's document and add its text and title to the hit.
- Parameters
hits – list of (document_id, score) tuples
- Returns
list of dicts of the form {'document': document_dict, 'score': score}
- Return type
list[dict]
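The input and return shapes described above can be illustrated with a minimal, self-contained sketch. It does not call PrimeQA: the `corpus` dictionary, its field names, and the helper `attach_documents` are hypothetical stand-ins, and the way the class actually stores documents may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory corpus keyed by document id.
# This layout is an assumption for illustration, not PrimeQA's internal structure.
corpus: Dict[str, Dict[str, str]] = {
    "d1": {"document_id": "d1", "text": "Paris is the capital of France.", "title": "Paris"},
    "d2": {"document_id": "d2", "text": "Berlin is the capital of Germany.", "title": "Berlin"},
}


def attach_documents(hits: List[Tuple[str, float]]) -> List[dict]:
    """Turn (document_id, score) tuples into {'document': ..., 'score': ...} dicts."""
    return [{"document": corpus[doc_id], "score": score} for doc_id, score in hits]


# Hits as a retriever might return them: (document_id, score) tuples.
hits = [("d2", 12.5), ("d1", 9.75)]
results = attach_documents(hits)
```

Each element of `results` keeps the original score and carries the full document record, so downstream code can read the text or title without a second lookup.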
- load_corpus()
Load the corpus from TSV/CSV or JSON input.
- write_corpus_tsv(output_file: str)
Write out the corpus in a format ready for indexing.
- Parameters
output_file (str) – TSV file where each row has the format 'id\ttext\ttitle' (tab-separated id, text, and title)
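As a rough sketch of the output layout, assuming each row holds three tab-separated fields in the order id, text, title, the standard `csv` module can write and read back such a file. The helper `write_rows_as_tsv` and the sample rows are hypothetical; whether PrimeQA also writes a header row is not stated here, so none is written.

```python
import csv
import os
import tempfile

# Sample rows in the assumed (id, text, title) order.
rows = [
    ("d1", "Paris is the capital of France.", "Paris"),
    ("d2", "Berlin is the capital of Germany.", "Berlin"),
]


def write_rows_as_tsv(rows, output_file):
    """Write one tab-separated line per document (hypothetical helper, not PrimeQA code)."""
    with open(output_file, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerows(rows)


# Write to a temporary location, then read the file back to check the layout.
output_file = os.path.join(tempfile.mkdtemp(), "corpus.tsv")
write_rows_as_tsv(rows, output_file)

with open(output_file, newline="", encoding="utf-8") as f:
    written = list(csv.reader(f, delimiter="\t"))
```

Using `csv.writer` rather than joining strings by hand means that text containing tabs or quotes is escaped consistently.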