primeqa.qg.processors.hybrid_qg.hybridqa_processor.HybridQAProcessor

class primeqa.qg.processors.hybrid_qg.hybridqa_processor.HybridQAProcessor(tokenizer=None, input_max_len=512, target_max_len=20)

Bases: object

Methods

convert_to_features

Tokenizes the raw hybrid chains and converts them to tensors.

get_candidate_hybrid_chains

Extracts reasoning paths from a processed table row.

hybrid_chain_to_t5_sequence

Converts a hybrid chain to T5 format by adding special tokens to separate the different parts of the chain.

hybrid_chains

Extracts reasoning paths from a processed dataset.

link_sents_to_cells

Splits passages into sentences.

preprocess_data

preprocess_hybridqa_data

Identifies passages linked to cells in the rows of a table.

__call__(dataset) → datasets.arrow_dataset.Dataset

Call self as a function.

convert_to_features(example_batch: Dict)

Tokenizes the raw hybrid chains and converts them to tensors.
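As a rough illustration of what tokenizing to a fixed length involves, the sketch below encodes a string with a toy whitespace tokenizer, truncates it to a maximum length, and pads the rest. The `encode` helper, the vocabulary, and the `pad_id`/`unk_id` values are all hypothetical; the actual method presumably relies on the tokenizer passed to the constructor together with `input_max_len` and `target_max_len`.

```python
def encode(text, vocab, max_len, pad_id=0, unk_id=1):
    """Toy encoder (hypothetical): map tokens to ids, truncate, then pad."""
    # Whitespace tokenization stands in for a real subword tokenizer.
    ids = [vocab.get(tok, unk_id) for tok in text.split()][:max_len]
    # The mask marks real tokens with 1 and padding with 0.
    mask = [1] * len(ids) + [0] * (max_len - len(ids))
    ids = ids + [pad_id] * (max_len - len(ids))
    return {"input_ids": ids, "attention_mask": mask}


# Hypothetical vocabulary and input, for illustration only.
vocab = {"<answer>": 2, "1998": 3, "<header>": 4, "Year": 5}
text = "<answer> 1998 <header> Year extra"

padded = encode(text, vocab, max_len=8)     # shorter than max_len: padded
truncated = encode(text, vocab, max_len=3)  # longer than max_len: truncated
```

Fixed-length outputs like these can be stacked directly into tensors, which is why both a source and a target maximum length are configured.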

get_candidate_hybrid_chains(row, answer_node, num_hops=3, beam_size=5)

Extracts reasoning paths from a processed table row.

Parameters
  • row (dict) – A dictionary of a processed table row.

  • answer_node (str) – Gold answer.

  • num_hops (int) – Number of hops to reach an answer from the question over the entity graph.
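To make the roles of `num_hops` and `beam_size` concrete, here is a minimal sketch of a beam-limited path search over a small graph. It is not the library's implementation: the `candidate_chains` function, the adjacency-dict graph format, and the choice to count `num_hops` as edges and to keep candidates in discovery order (rather than by a score) are all assumptions made for illustration.

```python
def candidate_chains(graph, answer_node, num_hops=3, beam_size=5):
    """Hypothetical sketch: grow paths from answer_node, one hop at a time.

    graph maps each node to a list of neighbouring nodes.
    At every hop, at most beam_size partial paths are kept.
    """
    beam = [[answer_node]]
    for _ in range(num_hops):
        expanded = []
        for path in beam:
            for neighbor in graph.get(path[-1], []):
                if neighbor not in path:  # avoid revisiting nodes
                    expanded.append(path + [neighbor])
        if not expanded:
            break  # no path can be extended further
        # Assumption: keep the first beam_size candidates in discovery order.
        beam = expanded[:beam_size]
    return beam


# Toy entity graph, invented for this example.
graph = {
    "1998": ["Titanic", "Year"],
    "Titanic": ["James Cameron"],
    "Year": [],
    "James Cameron": [],
}

two_hop = candidate_chains(graph, "1998", num_hops=2, beam_size=5)
narrow = candidate_chains(graph, "1998", num_hops=1, beam_size=1)
```

In this sketch a path that cannot be extended is dropped once other paths reach the next hop, so a larger `beam_size` widens the search while a larger `num_hops` lengthens the chains.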

hybrid_chain_to_t5_sequence(qdict, chain_id=0)

Converts a hybrid chain to T5 format by adding special tokens to separate the different parts of the chain.
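The idea of separating chain parts with special tokens can be sketched as follows. The token strings, the `(kind, text)` chain layout, and the `chain_to_sequence` helper are hypothetical; the tokens and the structure of `qdict` used by the actual method are not documented here.

```python
# Hypothetical separator tokens; the real processor may use different ones.
SEPARATORS = {
    "answer": "<answer>",
    "header": "<header>",
    "cell": "<cell>",
    "passage": "<passage>",
}


def chain_to_sequence(chain):
    """Flatten a chain, given as (kind, text) pairs, into one tagged string."""
    parts = []
    for kind, text in chain:
        # Prefix each part with the token that identifies its role.
        parts.append(f"{SEPARATORS[kind]} {text.strip()}")
    return " ".join(parts)


# Invented example chain: an answer, a table header, a cell and a passage sentence.
example_chain = [
    ("answer", "1998"),
    ("header", "Year"),
    ("cell", "1998"),
    ("passage", "The film was released in 1998."),
]
sequence = chain_to_sequence(example_chain)
```

Tagging each part lets a sequence-to-sequence model tell table content apart from passage text in a single flat input string.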

hybrid_chains(data, beam_size=10, num_hops=[3, 4], num_chains_per_hops=4)

Extracts reasoning paths from a processed dataset.

Parameters

data (dict) – A processed dataset where table cell text is linked to its processed passages.

Returns

  • chains (list) – A list of sampled hybrid chains.

  • questions (list) – A list of corresponding questions.

Splits passages into sentences. Each sentence becomes a node with ‘text’ and ‘link’ fields.
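A minimal sketch of turning a passage into sentence nodes is shown below. The regular-expression splitter, the `split_passage_into_nodes` helper, and the example link are assumptions; the actual method may use a different sentence splitter and may attach further fields.

```python
import re


def split_passage_into_nodes(passage_text, link):
    """Hypothetical helper: one node per sentence, each with 'text' and 'link'."""
    # Naive split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", passage_text.strip())
    return [{"text": s, "link": link} for s in sentences if s]


nodes = split_passage_into_nodes(
    "Paris is the capital of France. It lies on the Seine.",
    "/wiki/Paris",
)
```

Every node keeps the link of the passage it came from, so a sentence can be traced back to the table cell that references that passage.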

preprocess_hybridqa_data(*args)

Identifies passages linked to cells in the rows of a table and stores them in a dict representing this “structured fused block”.