primeqa.qg.processors.hybrid_qg.hybridqa_processor.HybridQAProcessor
- class primeqa.qg.processors.hybrid_qg.hybridqa_processor.HybridQAProcessor(tokenizer=None, input_max_len=512, target_max_len=20)
Bases: object
Methods
convert_to_features – Tokenizes and converts the raw hybrid chains to tensors.
get_candidate_hybrid_chains – Extracts reasoning paths from a processed table row.
hybrid_chain_to_t5_sequence – Converts a hybrid chain to T5 format by adding special tokens to separate the different parts of the chain.
hybrid_chains – Extracts reasoning paths from a processed dataset.
link_sents_to_cells – Splits passages into sentences.
preprocess_data
preprocess_hybridqa_data – Identifies passages linked to cells in the rows of a table.
- __call__(dataset) → datasets.arrow_dataset.Dataset
Call self as a function.
- convert_to_features(example_batch: Dict)
Tokenizes and converts the raw hybrid chains to tensors.
- get_candidate_hybrid_chains(row, answer_node, num_hops=3, beam_size=5)
Extracts reasoning paths from a processed table row.
- Parameters
row (dict) – A dictionary of a processed table row.
answer_node (str) – Gold answer.
num_hops (int) – Number of hops to reach an answer from the question over the entity graph.
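The idea of a hop-limited search over an entity graph can be illustrated with a toy sketch. This is not the PrimeQA implementation: the adjacency-dict graph, the `start` argument, and the function name `candidate_chains` are all assumptions made for illustration, and the beam is pruned by simple truncation rather than by any scoring function.

```python
from typing import Dict, List


def candidate_chains(graph: Dict[str, List[str]], start: str, answer_node: str,
                     num_hops: int = 3, beam_size: int = 5) -> List[List[str]]:
    """Collect simple paths of at most `num_hops` edges from `start` to `answer_node`.

    Hypothetical helper: at most `beam_size` partial paths survive each hop,
    chosen by insertion order (a real beam search would rank them by a score).
    """
    beam = [[start]]
    found: List[List[str]] = []
    for _ in range(num_hops):
        next_beam = []
        for path in beam:
            for neighbor in graph.get(path[-1], []):
                if neighbor in path:  # avoid revisiting a node
                    continue
                new_path = path + [neighbor]
                if neighbor == answer_node:
                    found.append(new_path)  # complete chain
                else:
                    next_beam.append(new_path)
        beam = next_beam[:beam_size]  # prune to the beam width
    return found
```

With a toy graph such as `{"q": ["a", "b"], "a": ["ans"], "b": ["c"], "c": ["ans"]}`, a limit of two hops reaches the answer only through `a`, while three hops also admits the longer path through `b` and `c`.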
- hybrid_chain_to_t5_sequence(qdict, chain_id=0)
Converts a hybrid chain to T5 format by adding special tokens to separate the different parts of the chain.
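As a rough illustration of this kind of linearization, the sketch below joins the parts of a chain into one string with marker tokens. The token strings, the chain's dictionary keys (`answer`, `cells`, `sentences`), and the function name `linearize_chain` are hypothetical; this reference does not list the tokens or the chain layout that HybridQAProcessor actually uses.

```python
from typing import Dict, List

# Hypothetical marker tokens, chosen only for this example.
ANSWER_TOKEN = "<answer>"
CELL_TOKEN = "<cell>"
SENT_TOKEN = "<sent>"


def linearize_chain(chain: Dict[str, object]) -> str:
    """Flatten a chain dict into a single marker-separated input string."""
    parts: List[str] = [ANSWER_TOKEN, str(chain["answer"])]
    for cell in chain.get("cells", []):
        parts += [CELL_TOKEN, str(cell)]  # each table cell gets its own marker
    for sent in chain.get("sentences", []):
        parts += [SENT_TOKEN, str(sent)]  # each linked sentence gets its own marker
    return " ".join(parts)
```

In a real setup the marker tokens would normally also be registered with the tokenizer so that they are not split into sub-word pieces.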
- hybrid_chains(data, beam_size=10, num_hops=[3, 4], num_chains_per_hops=4)
Extracts reasoning paths from a processed dataset.
- Parameters
dataset (dict) – A processed dataset where table cell text is linked to its processed passages.
- Returns
chains (list) – A list of sampled hybrid chains.
questions (list) – A list of corresponding questions.
- link_sents_to_cells(qdict)
Splits passages into sentences. Sentences become nodes with ‘text’ and ‘link’.
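A minimal sketch of turning a passage into sentence nodes might look like the following. The regex-based splitter, the function name `passage_to_sentence_nodes`, and the choice to copy the passage link onto every sentence are assumptions; the library may split sentences differently and may store additional fields.

```python
import re
from typing import Dict, List


def passage_to_sentence_nodes(passage: str, link: str) -> List[Dict[str, str]]:
    """Split a passage into sentences and wrap each one as a node dict."""
    # Naive split on whitespace that follows '.', '!' or '?' (an assumption;
    # abbreviations such as "Dr." would be split incorrectly).
    sentences = re.split(r"(?<=[.!?])\s+", passage.strip())
    return [{"text": s, "link": link} for s in sentences if s]
```

Each node keeps the sentence under ‘text’ and the source passage identifier under ‘link’, so a sentence can be traced back to the cell whose passage it came from.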
- preprocess_hybridqa_data(*args)
Identifies passages linked to cells in the rows of a table and stores them in a dictionary representing the “structured fused block”.