Application Structure

The app is organized into the following key components:

1. LLM Setup

Ollama LLMs: Three models are offered: Llama3, Mistral, and salmatrafi/acegpt:7b. The user selects one from a dropdown menu. Each option maps to an Ollama instance from the langchain_community.llms module that points at the local Ollama server.

from langchain_community.llms import Ollama

# Dropdown label -> LLM client served by the local Ollama instance
llm_options = {
    "Llama3": Ollama(model="llama3", base_url="http://127.0.0.1:11434"),
    "Mistral": Ollama(model="mistral", base_url="http://127.0.0.1:11434"),
    "salmatrafi/acegpt:7b": Ollama(model="salmatrafi/acegpt:7b", base_url="http://127.0.0.1:11434"),
}

2. Embedding and Vector Store

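Embedding Model: OllamaEmbeddings generates the vector embeddings used to store and retrieve document chunks. The embedding model name is derived from the dropdown label of the selected LLM (first word, lowercased).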

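from langchain_community.embeddings import OllamaEmbeddings

embed_model = OllamaEmbeddings(
    # First word of the dropdown label, lowercased, e.g. "Llama3" -> "llama3"
    model=selected_llm_name.split()[0].lower(),
    base_url="http://127.0.0.1:11434",
)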
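To make the name derivation concrete, here is a minimal sketch that applies the same expression to the three labels from llm_options. The helper name embedding_model_name and the label "Mistral 7B" are hypothetical and only serve the illustration.

```python
def embedding_model_name(selected_llm_name: str) -> str:
    """Mirror the expression above: first whitespace-separated word, lowercased."""
    return selected_llm_name.split()[0].lower()


# Labels taken from the llm_options dictionary.
for label in ("Llama3", "Mistral", "salmatrafi/acegpt:7b"):
    print(f"{label!r} -> {embedding_model_name(label)!r}")

# Hypothetical label with extra words: only the first word is kept.
print(embedding_model_name("Mistral 7B"))
```

Note that an empty label would raise an IndexError, because split() returns an empty list in that case.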

Vector Store: Chroma is used to store processed document embeddings persistently.

persist_directory = r"C:\Users\PC\anaconda3\envs\rag_env\Lib\site-packages\chromadb"
vector_store = Chroma(persist_directory=persist_directory, embedding_function=embed_model)

3. Document Processing

Uploaded PDFs are processed in three steps. PyPDFLoader loads the content of each PDF file. RecursiveCharacterTextSplitter splits the documents into chunks of 500 characters for indexing. The resulting chunks are then added to the Chroma vector store.
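To illustrate what 500-character chunks look like, the sketch below uses a simplified fixed-width splitter. It is not the RecursiveCharacterTextSplitter algorithm, which prefers to break on separators such as paragraphs and lines. The function name split_into_chunks and the overlap parameter are assumptions made for the example.

```python
def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 0) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Simplified stand-in for a real text splitter: it ignores sentence and
    paragraph boundaries and only slices by position.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample_text = "a" * 1200  # stands in for the text extracted from a PDF
chunks = split_into_chunks(sample_text)
print([len(chunk) for chunk in chunks])
```

Each chunk would then be embedded and stored as one entry in the vector store.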


4. Retrieval-Augmented Generation (RAG) Chain

Retriever: Fetches the five most similar document chunks (k=5) from the vector store using similarity search.

retriever = vector_store.as_retriever(search_kwargs={"k": 5})
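The idea behind a top-k similarity search can be sketched in plain Python. This is only an illustration of the ranking step, assuming cosine similarity over toy two-dimensional vectors; the distance metric and index that Chroma actually uses are configurable and may differ. The names cosine_similarity and top_k are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 5) -> list[int]:
    """Return the indices of the k chunks most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy embeddings standing in for real chunk vectors.
chunk_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
print(top_k([1.0, 0.0], chunk_vectors, k=2))
```

In the app, the retriever performs this step against the persisted embeddings and returns the matching chunks rather than their indices.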

QA Chain: Combines retrieved documents with the input query to generate context-aware answers using the selected LLM.

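from langchain_core.prompts import PromptTemplate

def custom_qa_prompt(context, input):
    if context:
        # "Here is the context extracted from the PDF: ... Now answer the question: ..."
        return f"Voici le contexte extrait du PDF : {context}. Maintenant, réponds à la question : {input}"
    else:
        # "I found no relevant context in the PDF. Give a short, direct answer to the following question: ..."
        return f"Je n'ai trouvé aucun contexte pertinent dans le PDF. Donne une réponse courte et directe sur la question suivante : {input}"

# Wrap the custom_qa_prompt in a PromptTemplate.
# Arguments follow the (context, input) order of the signature. The placeholder
# "{context}" is a non-empty string, so the template always uses the first branch.
qa_prompt_template = PromptTemplate(
    input_variables=["input", "context"],
    template=custom_qa_prompt("{context}", "{input}")
)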
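Because the template is built once from non-empty placeholder strings, the no-context wording can never be selected at that point. One possible way to keep the fallback, sketched here under assumptions, is to choose the wording at query time from the chunks that were actually retrieved. The function build_prompt is hypothetical, and the prompt strings are English translations of the French ones above.

```python
def build_prompt(context_chunks: list[str], question: str) -> str:
    """Pick the prompt wording based on whether any non-blank context was retrieved."""
    context = "\n\n".join(chunk.strip() for chunk in context_chunks if chunk.strip())
    if context:
        return (
            f"Here is the context extracted from the PDF: {context}\n"
            f"Now answer the question: {question}"
        )
    return (
        "I found no relevant context in the PDF. "
        f"Give a short, direct answer to the following question: {question}"
    )


print(build_prompt(["RAG combines retrieval with generation."], "What is RAG?"))
print(build_prompt([], "What is RAG?"))
```

The resulting string could then be passed directly to the selected LLM instead of going through a fixed template.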

5. Streamlit Interface

File Uploader: Handles PDF uploads.

Text Input: Accepts user questions.

Buttons: Trigger PDF processing or question answering.