primeqa.ir.util.corpus_reader.DocumentCollection

class primeqa.ir.util.corpus_reader.DocumentCollection(input_files: Union[str, bytes, os.PathLike], fieldnames=None)

Bases: object

Methods

`add_document_text_to_hit`	Look up and add document text/title to the hits
`load_corpus`	Load the corpus tsv/csv or json
`write_corpus_tsv`	Write out the corpus in a format ready for indexing.

add_document_text_to_hit(hits: list)

Look up and add document text/title to the hits

Parameters

hits – list of (document_id, score) tuples

Returns

list of dicts, each of the form {'document': document_dict, 'score': score}

Return type

list[dict]
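
The input and output shapes described above can be illustrated with a minimal, self-contained sketch. It does not call PrimeQA: `attach_documents` and the in-memory `corpus` dict are hypothetical stand-ins, and the field names inside each document record (`document_id`, `title`, `text`) are assumptions rather than the library's confirmed schema.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory corpus keyed by document id. The field names
# below are assumptions for illustration, not PrimeQA's confirmed schema.
corpus: Dict[str, Dict[str, str]] = {
    "d1": {"document_id": "d1", "title": "Paris",
           "text": "Paris is the capital of France."},
    "d2": {"document_id": "d2", "title": "Berlin",
           "text": "Berlin is the capital of Germany."},
}


def attach_documents(
    hits: List[Tuple[str, float]],
    corpus: Dict[str, Dict[str, str]],
) -> List[dict]:
    """Stand-in for the lookup step: pair each hit's score with its document record."""
    return [{"document": corpus[doc_id], "score": score} for doc_id, score in hits]


# Hits as (document_id, score) tuples, in ranked order.
hits = [("d2", 12.5), ("d1", 9.75)]
enriched = attach_documents(hits, corpus)

for entry in enriched:
    print(entry["score"], entry["document"]["title"])
```

Each element of `enriched` carries the full document record next to its score, so downstream code can read titles and text without a second lookup.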

write_corpus_tsv(output_file: str)

Write out the corpus in a format ready for indexing.

Parameters: output_file (str) – path of the TSV file to write; each row holds the tab-separated fields id, text, title
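
As a rough illustration of the row layout, the sketch below writes tab-separated id, text, title rows with Python's standard `csv` module. The helper `write_rows_tsv` and the sample rows are hypothetical; whether the real method emits a header row or escapes embedded tabs is not stated here, so this sketch assumes no header and tab-free fields.

```python
import csv
import os
import tempfile
from typing import Iterable, Tuple

# Sample rows in (id, text, title) order; the contents are made up.
rows = [
    ("d1", "Paris is the capital of France.", "Paris"),
    ("d2", "Berlin is the capital of Germany.", "Berlin"),
]


def write_rows_tsv(rows: Iterable[Tuple[str, str, str]], output_file: str) -> None:
    """Hypothetical helper: write one tab-separated line per (id, text, title) row."""
    with open(output_file, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)


# Write to a temporary directory so the sketch leaves no files behind in the cwd.
output_file = os.path.join(tempfile.mkdtemp(), "corpus.tsv")
write_rows_tsv(rows, output_file)

# Read the file back to inspect the layout.
with open(output_file, encoding="utf-8") as f:
    lines = f.read().splitlines()

for line in lines:
    print(line.split("\t"))
```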