{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Faiss GPU" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the last tutorial, we went through the basics of indexing using faiss-cpu. In real use cases in research and industry, however, the dataset to index can be extremely large, and the search frequency can be very high. In this tutorial we'll see how to combine Faiss and GPU almost seamlessly." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Installation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Faiss maintains its latest releases on conda, and its GPU version only supports Linux x86_64.\n", "\n", "Create a conda virtual environment and run:\n", "\n", "```conda install -c pytorch -c nvidia faiss-gpu=1.8.0```\n", "\n", "Make sure you select that conda env as the kernel for this notebook. After installation, restart the kernel." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If your system does not satisfy these requirements, install faiss-cpu and just skip the GPU-related steps." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Data Preparation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First let's create a dataset of \"fake embeddings\" to serve as the corpus:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import faiss\n", "import numpy as np\n", "\n", "dim = 768\n", "corpus_size = 1000\n", "# np.random.seed(111)\n", "\n", "corpus = np.random.random((corpus_size, dim)).astype('float32')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. 
Create Index on CPU" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Option 1:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Faiss provides a large selection of indexes that can be initialized directly:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# first build a flat index (on CPU)\n", "index = faiss.IndexFlatIP(dim)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Option 2:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Besides the basic index classes, we can also use the index_factory function to produce composite Faiss indexes." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "index = faiss.index_factory(dim, \"Flat\", faiss.METRIC_L2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Build GPU Index and Search" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All GPU indexes are built with a `StandardGpuResources` object, which contains all the resources needed for each GPU in use. By default it allocates 18% of the total VRAM as temporary scratch space." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `GpuClonerOptions` and `GpuMultipleClonerOptions` objects are optional when converting an index from CPU to GPU. They are used to adjust the way the objects are stored on the GPU."
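, "\n", "\n", "For example (a minimal sketch, assuming a faiss-gpu build), the cloner options can tell Faiss to store vectors on the GPU in float16, trading some precision for memory:\n", "\n", "```python\n", "import faiss\n", "\n", "co = faiss.GpuClonerOptions()\n", "co.useFloat16 = True  # store vectors as float16 on the GPU to save memory\n", "```"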
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Single GPU:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# use a single GPU\n", "rs = faiss.StandardGpuResources()\n", "co = faiss.GpuClonerOptions()\n", "\n", "# then convert it to a gpu index\n", "index_gpu = faiss.index_cpu_to_gpu(provider=rs, device=0, index=index, options=co)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 5.31 ms, sys: 6.26 ms, total: 11.6 ms\n", "Wall time: 8.94 ms\n" ] } ], "source": [ "%%time\n", "index_gpu.add(corpus)\n", "D, I = index_gpu.search(corpus, 4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### All Available GPUs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If your system contains multiple GPUs, Faiss provides the option to deploy all available GPUs. You can control their usage through `GpuMultipleClonerOptions`, e.g. whether to shard or replicate the index across GPUs."
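, "\n", "\n", "For example (a minimal sketch, assuming a faiss-gpu build), sharding instead of the default replication is enabled through the `shard` attribute:\n", "\n", "```python\n", "import faiss\n", "\n", "co = faiss.GpuMultipleClonerOptions()\n", "co.shard = True  # split the index across GPUs instead of copying it to each one\n", "```"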
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# cloner options for multiple GPUs\n", "co = faiss.GpuMultipleClonerOptions()\n", "\n", "index_gpu = faiss.index_cpu_to_all_gpus(index=index, co=co)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 29.8 ms, sys: 26.8 ms, total: 56.6 ms\n", "Wall time: 33.9 ms\n" ] } ], "source": [ "%%time\n", "index_gpu.add(corpus)\n", "D, I = index_gpu.search(corpus, 4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Multiple GPUs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There's also an option to use multiple, but not all, GPUs:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "ngpu = 4\n", "resources = [faiss.StandardGpuResources() for _ in range(ngpu)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create vectors for the GpuResources and devices, then pass them to the index_cpu_to_gpu_multiple() function." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "vres = faiss.GpuResourcesVector()\n", "vdev = faiss.Int32Vector()\n", "for i, res in zip(range(ngpu), resources):\n", " vdev.push_back(i)\n", " vres.push_back(res)\n", "index_gpu = faiss.index_cpu_to_gpu_multiple(vres, vdev, index)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 3.49 ms, sys: 13.4 ms, total: 16.9 ms\n", "Wall time: 9.03 ms\n" ] } ], "source": [ "%%time\n", "index_gpu.add(corpus)\n", "D, I = index_gpu.search(corpus, 4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All three approaches should lead to identical results. 
Now let's do a quick sanity check:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# The nearest neighbor of each vector in the corpus is itself\n", "assert np.all(corpus[:] == corpus[I[:, 0]])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And the corresponding distance should be 0." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0. 111.30057 113.2251 113.342316]\n", " [ 0. 111.158875 111.742325 112.09038 ]\n", " [ 0. 116.44429 116.849915 117.30502 ]]\n" ] } ], "source": [ "print(D[:3])" ] } ], "metadata": { "kernelspec": { "display_name": "faiss", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.9" } }, "nbformat": 4, "nbformat_minor": 2 }