Tutorial: Generative QA with Retrieval Augmented Generation#
In this tutorial, you’ll learn how to run generative question answering by connecting a retriever to a generative LLM. You’ll also learn how to use prompts with a generative model to tune your answers, including having the system respond with something like “Unanswerable” when no supporting evidence is found.
You can use this tutorial plug-and-play with most models on the HuggingFace model hub as well as OpenAI LLMs. Some supported models include:

- FLAN UL2-20B
- FLAN T5
- OpenAI ChatGPT (gpt-3.5-turbo)
- InstructGPT (text-davinci-003)
- and many more
Step 0: Prepare a Colab Environment to run this tutorial on GPUs#
Make sure to enable the GPU runtime in Colab (Runtime → Change runtime type → GPU). This will make the tutorial run much faster.
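Once the GPU runtime is enabled, you can verify that a GPU is visible, for example with PyTorch (pre-installed on Colab):

[ ]:
import torch

# should print True once the GPU runtime is enabled
print(torch.cuda.is_available())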
Step 1: Install PrimeQA#
First, we need to install the required package.
[ ]:
! pip install --upgrade primeqa
Step 2: Initialize the Retriever#
Pre-process your document collection here so it is ready to be stored in your neural search index.#
In this step, we download a publicly available .csv file from Google Drive and save it as a .tsv file.
[ ]:
# save your input documents as a .tsv file
import pandas as pd

# turn the Google Drive share link into a direct-download URL
url='https://drive.google.com/file/d/1LULJRPgN_hfuI2kG-wH4FUwXCCdDh9zh/view?usp=sharing'
url='https://drive.google.com/uc?id=' + url.split('/')[-2]

# write the collection tab-separated; the DataFrame index becomes the
# first column and serves as the document id expected by DPR
df = pd.read_csv(url)
df.to_csv('input.tsv', sep='\t', columns = ['text', 'title'])
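As a quick sanity check (not part of the tutorial proper), you can preview the first rows of input.tsv to confirm that the id, text, and title columns were written as expected:

[ ]:
# preview the first rows; the unnamed first column is the index used as the id
print(pd.read_csv('input.tsv', sep='\t', nrows=3))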
Initialize the retriever. In PrimeQA we use the SearchableCorpus class for searching through your corpus.#
For DPR, you need to point to question and context encoder models available via the HuggingFace model hub.
[ ]:
from primeqa.components import SearchableCorpus
retriever = SearchableCorpus(
    context_encoder_name_or_path="PrimeQA/XOR-TyDi_monolingual_DPR_ctx_encoder",
    query_encoder_name_or_path="PrimeQA/XOR-TyDi_monolingual_DPR_qry_encoder",
    batch_size=64,
    top_k=10,
)
Add your documents to the searchable corpus.#
The input.tsv file can now be added to the searchable corpus. As required by DPR, each line is assumed to follow this format:
id \t text \t title_of_document
Note: since DPR is based on an encoder language model, the maximum sequence length is typically 512 sub-word tokens. Make sure your documents are split into passages of roughly 220 words (a simple splitting sketch follows the next cell).
[ ]:
retriever.add_documents("input.tsv")
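If your source documents are longer than ~220 words, they should be split before being written to input.tsv. Below is a minimal sketch, assuming whitespace-separated words are a reasonable proxy for length; split_into_passages is an illustrative helper, not part of PrimeQA:

[ ]:
def split_into_passages(text, max_words=220):
    """Split a document into passages of at most max_words words."""
    words = text.split()
    return [' '.join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]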
Step 3: Initialize the Reader#
In this step you initialize a generative LLM that can be prompted. This reader can be any of the generative models available on the HuggingFace model hub, or an OpenAI model.
[ ]:
from primeqa.components import GenerativeReader
reader = GenerativeReader(model_type='HuggingFace', model_name='google/flan-t5-small')
# set up an OpenAI generative reader: we support gpt-3.5-turbo and text-davinci-003
# reader = GenerativeReader(model_type='OpenAI', model_name='gpt-3.5-turbo', api_key='API KEY HERE')
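If you use the OpenAI reader, consider reading the key from an environment variable rather than hard-coding it. This assumes you have exported OPENAI_API_KEY beforehand; the variable name is our choice, not a PrimeQA requirement:

[ ]:
import os

# reader = GenerativeReader(model_type='OpenAI', model_name='gpt-3.5-turbo',
#                           api_key=os.environ['OPENAI_API_KEY'])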
Step 4: Setup the RAG pipeline#
Attach the retriever to a generative LLM. You can then prompt the pipeline to answer questions.
[ ]:
from primeqa.pipelines import RAG
pipeline = RAG(retriever, reader)
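Conceptually, the pipeline retrieves passages for each question and feeds them, together with your prompt, to the generative reader. The sketch below illustrates that retrieve-then-generate pattern with plain Python callables; it is not PrimeQA’s internal implementation, and the function names are illustrative only:

[ ]:
# illustrative retrieve-then-generate loop, not PrimeQA internals
def rag_answer(question, retrieve, generate, prefix=""):
    passages = retrieve(question)      # 1. fetch supporting passages
    context = "\n".join(passages)      # 2. join them into a context block
    prompt = f"{prefix}\n{context}\nQuestion: {question}"  # 3. build the prompt
    return generate(prompt)            # 4. generate the answer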
Step 5: Start asking questions#
Now we “run” the pipeline we just created, attaching a prompt prefix to guide the answers.
[ ]:
questions = ['When was Idaho split in two?', 'Who was Danny Nozel?']
prompt_prefix = "Answer the following question after looking at the text."
answers = pipeline.run(questions, prefix=prompt_prefix)
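As noted in the introduction, you can also instruct the model to abstain when the retrieved text offers no evidence. For example, with a stricter prefix (the exact wording is a suggestion, not a fixed template):

[ ]:
prompt_prefix = ("Answer the following question using only the text provided. "
                 "If the text does not contain the answer, respond with 'Unanswerable'.")
answers = pipeline.run(questions, prefix=prompt_prefix)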
[ ]:
# display the answers as an HTML table
import pandas as pd
from IPython.display import display, HTML

output = pd.DataFrame.from_records(answers)
display(HTML(output.to_html()))
Congratulations 🎉✨🎊🥳!! You can now perform Retrieval Augmented Generation (RAG) with PrimeQA!