{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# RAG with LlamaIndex" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "LlamaIndex is a very popular framework to help build connections between data sources and LLMs. It is also a top choice when people would like to build an RAG framework. In this tutorial, we will go through how to use LlamaIndex to aggregate bge-base-en-v1.5 and GPT-4o-mini to an RAG application." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 0. Preparation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First install the required packages in the environment." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install llama-index-llms-openai llama-index-embeddings-huggingface llama-index-vector-stores-faiss\n", "%pip install llama_index " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then fill the OpenAI API key below:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# For openai key\n", "import os\n", "os.environ[\"OPENAI_API_KEY\"] = \"YOUR_API_KEY\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "BGE-M3 is a very powerful embedding model, We would like to know what does that 'M3' stands for.\n", "\n", "Let's first ask GPT the question:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "M3-Embedding stands for Multimodal Multiscale Embedding. It is a technique used in machine learning and data analysis to embed high-dimensional data into a lower-dimensional space while preserving the structure and relationships within the data. This technique is particularly useful for analyzing complex datasets that contain multiple modalities or scales of information.\n" ] } ], "source": [ "from llama_index.llms.openai import OpenAI\n", "\n", "# non-streaming\n", "response = OpenAI().complete(\"What does M3-Embedding stands for?\")\n", "print(response)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By checking the description in GitHub [repo](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/BGE_M3) of BGE-M3, we are pretty sure that GPT is giving us hallucination. Let's build an RAG pipeline to solve the problem!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, download BGE-M3 [paper](https://arxiv.org/pdf/2402.03216) to a directory, and load it through `SimpleDirectoryReader`. \n", "\n", "Note that `SimpleDirectoryReader` can read all the documents under that directory and supports a lot of commonly used [file types](https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader/#supported-file-types)." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "from llama_index.core import SimpleDirectoryReader\n", "\n", "reader = SimpleDirectoryReader(\"data\")\n", "# reader = SimpleDirectoryReader(\"DIR_TO_FILE\")\n", "documents = reader.load_data()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `Settings` object is a global settings for the RAG pipeline. Attributes in it have default settings and can be modified by users (OpenAI's GPT and embedding model). 
{ "cell_type": "markdown", "metadata": {}, "source": [ "The `Settings` object holds the global settings for the RAG pipeline. Its attributes come with defaults (OpenAI's GPT and embedding models) and can be overridden by users. Large attributes like models are only loaded when they are actually used.\n", "\n", "Here, we set the `node_parser` to a `SentenceSplitter()` with our chosen parameters, use the open-source `bge-base-en-v1.5` as our embedding model, and `gpt-4o-mini` as our LLM." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "from llama_index.core import Settings\n", "from llama_index.core.node_parser import SentenceSplitter\n", "from llama_index.embeddings.huggingface import HuggingFaceEmbedding\n", "from llama_index.llms.openai import OpenAI\n", "\n", "# set the parser with parameters\n", "Settings.node_parser = SentenceSplitter(\n", "    chunk_size=1000,    # maximum number of tokens per chunk\n", "    chunk_overlap=150,  # number of overlapping tokens between chunks\n", ")\n", "\n", "# set the specific embedding model\n", "Settings.embed_model = HuggingFaceEmbedding(model_name=\"BAAI/bge-base-en-v1.5\")\n", "\n", "# set the LLM we want to use\n", "Settings.llm = OpenAI(model=\"gpt-4o-mini\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Indexing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Indexing is one of the most important parts of RAG. LlamaIndex integrates with a large number of vector databases. Here we will use Faiss as an example." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, check the dimension of the embeddings, which we will need to initialize the Faiss index." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "768\n" ] } ], "source": [ "embedding = Settings.embed_model.get_text_embedding(\"Hello world\")\n", "dim = len(embedding)\n", "print(dim)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then create the index with Faiss and our documents. Here LlamaIndex encapsulates the Faiss function calls for us. If you would like to know more about Faiss, refer to the tutorial on [Faiss and indexing](https://github.com/FlagOpen/FlagEmbedding/tree/master/Tutorials/3_Indexing)." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "import faiss\n", "from llama_index.vector_stores.faiss import FaissVectorStore\n", "from llama_index.core import StorageContext, VectorStoreIndex\n", "\n", "# init Faiss and create a vector store\n", "faiss_index = faiss.IndexFlatL2(dim)\n", "vector_store = FaissVectorStore(faiss_index=faiss_index)\n", "\n", "# customize the storage context using our vector store\n", "storage_context = StorageContext.from_defaults(\n", "    vector_store=vector_store\n", ")\n", "\n", "# use the loaded documents to build the index\n", "index = VectorStoreIndex.from_documents(\n", "    documents, storage_context=storage_context\n", ")" ] },
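{ "cell_type": "markdown", "metadata": {}, "source": [ "(Optional) If you don't want to rebuild the index every time, you can persist it to disk and load it back later. The snippet below is a minimal sketch, assuming a local `./index_storage` directory as the persist location; it uses `FaissVectorStore.from_persist_dir` and `load_index_from_storage` from the packages installed above." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from llama_index.core import StorageContext, load_index_from_storage\n", "from llama_index.vector_stores.faiss import FaissVectorStore\n", "\n", "# persist the index (including the Faiss store) to a local directory\n", "# \"./index_storage\" is just an example path\n", "index.storage_context.persist(persist_dir=\"./index_storage\")\n", "\n", "# later, rebuild the index object from the persisted data\n", "loaded_vector_store = FaissVectorStore.from_persist_dir(\"./index_storage\")\n", "loaded_storage_context = StorageContext.from_defaults(\n", "    vector_store=loaded_vector_store, persist_dir=\"./index_storage\"\n", ")\n", "loaded_index = load_index_from_storage(storage_context=loaded_storage_context)" ] },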
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Retrieve and Generate" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With a well-constructed index, we can now build the query engine to accomplish our task:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "query_engine = index.as_query_engine()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following cell displays the default prompt template for Q&A in our pipeline:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Context information is below.\n", "---------------------\n", "{context_str}\n", "---------------------\n", "Given the context information and not prior knowledge, answer the query.\n", "Query: {query_str}\n", "Answer: \n" ] } ], "source": [ "# check the default prompt template\n", "prompt_template = query_engine.get_prompts()['response_synthesizer:text_qa_template']\n", "print(prompt_template.get_template())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(Optional) You can modify the prompt to match your use case:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "You are a Q&A chat bot.\n", "Using only the given context, answer the question.\n", "\n", "\n", "{context_str}\n", "\n", "\n", "Question: {query_str}\n", "\n" ] } ], "source": [ "from llama_index.core import PromptTemplate\n", "\n", "template = \"\"\"\n", "You are a Q&A chat bot.\n", "Using only the given context, answer the question.\n", "\n", "\n", "{context_str}\n", "\n", "\n", "Question: {query_str}\n", "\"\"\"\n", "\n", "new_template = PromptTemplate(template)\n", "query_engine.update_prompts(\n", "    {\"response_synthesizer:text_qa_template\": new_template}\n", ")\n", "\n", "prompt_template = query_engine.get_prompts()['response_synthesizer:text_qa_template']\n", "print(prompt_template.get_template())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, let's see how the RAG application performs on our query!" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "M3-Embedding stands for Multi-Linguality, Multi-Functionality, and Multi-Granularity.\n" ] } ], "source": [ "response = query_engine.query(\"What does M3-Embedding stand for?\")\n", "print(response)" ] },
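{ "cell_type": "markdown", "metadata": {}, "source": [ "The answer now matches the description of M3 in the paper. As an optional last step, the sketch below shows how to inspect the chunks that were retrieved to ground the answer; `similarity_top_k` controls how many chunks are retrieved. Note that this creates a fresh query engine, so it uses the default prompt rather than the customized one above." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# (sketch) retrieve a few more chunks and inspect what grounded the answer\n", "top3_query_engine = index.as_query_engine(similarity_top_k=3)\n", "response = top3_query_engine.query(\"What does M3-Embedding stand for?\")\n", "print(response)\n", "\n", "# each source node carries a retrieved chunk and its similarity score\n", "for node in response.source_nodes:\n", "    print(node.score, node.node.get_content()[:100])" ] } ], "metadata": { "kernelspec": { "display_name": "test", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.4" } }, "nbformat": 4, "nbformat_minor": 2 }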