Tutorial: Generative QA with Retrieval Augmented Generation#

In this tutorial, you’ll learn how to run generative question answering by connecting a retriever to a generative LLM. You’ll also learn how to use prompts with a generative model to tune your answers. The system can also be prompted to respond with “Unanswerable” when no supporting evidence is found.

You can plug and play this tutorial with most models on the HuggingFace model hub as well as OpenAI LLMs. Some supported models include:

- FLAN UL2-20B
- FLAN T5
- OpenAI ChatGPT (gpt-3.5-turbo)
- InstructGPT (text-davinci-003)
- and many more

Step 0: Prepare a Colab Environment to run this tutorial on GPUs#

Make sure to “Enable GPU Runtime”: in Colab, open Runtime > Change runtime type and select a GPU hardware accelerator. This step will make the tutorial run faster.
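
To confirm that a GPU is actually visible before you begin, you can run a quick check. This is a minimal sketch assuming PyTorch is available in the runtime (it is pre-installed on Colab):

[ ]:
import torch

# Check that CUDA sees a GPU; if this prints False, re-check the runtime settings
print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))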

Step 1: Install PrimeQA#

First, we install the primeqa package along with its dependencies.

[ ]:
! pip install --upgrade primeqa

Step 2: Initialize the Retriever#

Pre-process your document collection so it is ready to be stored in your neural search index.#

In this step we download a publicly available .csv file from a Google Drive location and save it as .tsv.

[ ]:
# Save your input document collection as a .tsv file
import pandas as pd

# Convert the Google Drive sharing link into a direct-download URL
url = 'https://drive.google.com/file/d/1LULJRPgN_hfuI2kG-wH4FUwXCCdDh9zh/view?usp=sharing'
url = 'https://drive.google.com/uc?id=' + url.split('/')[-2]

df = pd.read_csv(url)
# Write the text and title columns; the DataFrame index is written as the document id
df.to_csv('input.tsv', sep='\t', columns=['text', 'title'])
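
As an optional sanity check (not part of the original tutorial), you can read the file back and confirm that the text and title columns were written as expected:

[ ]:
# Optional sanity check: read the TSV back and inspect the first few rows
check = pd.read_csv('input.tsv', sep='\t', index_col=0)
print(check.columns.tolist())  # expect ['text', 'title']
check.head()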

Initialize the retriever model. In PrimeQA, the SearchableCorpus class is used for searching through your corpus.#

For DPR, you need to point to question and context encoder models available on the HuggingFace model hub.

[ ]:
from primeqa.components import SearchableCorpus
retriever = SearchableCorpus(context_encoder_name_or_path="PrimeQA/XOR-TyDi_monolingual_DPR_ctx_encoder",  # DPR context (passage) encoder
                             query_encoder_name_or_path="PrimeQA/XOR-TyDi_monolingual_DPR_qry_encoder",  # DPR question encoder
                             batch_size=64, top_k=10)  # top_k: number of passages retrieved per query

Add your documents into the searchable corpus.#

The input.tsv file can now be added to the searchable corpus. DPR expects it to use the following tab-separated format:

id \t text \t title_of_document

Note: since DPR is based on an encoder language model, its maximum sequence length is typically 512 sub-word tokens. Make sure your documents are split into passages of roughly 220 words.
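
If your documents are longer than that, you can split them into shorter passages before indexing. The snippet below is a minimal sketch of one way to do this; the chunk_words helper is hypothetical (not part of PrimeQA) and assumes simple whitespace tokenization:

[ ]:
# Sketch: split long texts into ~220-word passages before writing input.tsv
def chunk_words(text, max_words=220):
    words = text.split()
    return [' '.join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

chunked = []
for _, row in df.iterrows():
    for passage in chunk_words(str(row['text'])):
        chunked.append({'text': passage, 'title': row['title']})

pd.DataFrame(chunked).to_csv('input.tsv', sep='\t', columns=['text', 'title'])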

[ ]:
retriever.add_documents("input.tsv")

Step 3: Initialize the Reader#

In this step you use a generative LLM that can be prompted. The reader can be any of the generative models available on the HuggingFace model hub, or one of the supported OpenAI models.

[ ]:
from primeqa.components import GenerativeReader

reader = GenerativeReader(model_type='HuggingFace', model_name='google/flan-t5-small')
# Set up an OpenAI generative reader instead: gpt-3.5-turbo and text-davinci-003 are supported
# reader = GenerativeReader(model_type='OpenAI', model_name='gpt-3.5-turbo', api_key='API KEY HERE')
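
If you use an OpenAI reader, you may prefer not to hard-code the key in the notebook. A minimal sketch, assuming the key is stored in an OPENAI_API_KEY environment variable:

[ ]:
# Sketch: read the OpenAI API key from the environment instead of hard-coding it
import os

api_key = os.environ.get('OPENAI_API_KEY')
if api_key:
    reader = GenerativeReader(model_type='OpenAI', model_name='gpt-3.5-turbo', api_key=api_key)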

Step 4: Setup the RAG pipeline#

Attach a retriever to a generative LLM. You can then prompt it to answer questions.

[ ]:
from primeqa.pipelines import RAG
pipeline = RAG(retriever, reader)

Step 5: Start asking questions#

We run the pipeline we just created and attach a prompt prefix.

[ ]:
questions = ['When was Idaho split in two?', 'Who was Danny Nozel?']
prompt_prefix = "Answer the following question after looking at the text."

answers = pipeline.run(questions, prefix=prompt_prefix)
[ ]:
import pandas as pd
from IPython.display import display, HTML

output = pd.DataFrame.from_records(answers)
display(HTML(output.to_html()))
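
The tutorial introduction mentions that the system should respond with “Unanswerable” when no evidence is found. One way to encourage this is to make the instruction explicit in the prompt prefix; the exact wording below is an assumption, not a fixed PrimeQA prompt:

[ ]:
# Sketch: ask the model to reply "Unanswerable" when the retrieved text lacks evidence
prompt_prefix = ("Answer the following question using only the text provided. "
                 "If the text does not contain the answer, respond with 'Unanswerable'.")
answers = pipeline.run(questions, prefix=prompt_prefix)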

Congratulations 🎉✨🎊🥳!! You can now perform retrieval augmented generation (RAG) with PrimeQA!