Faiss GPU#

In the last tutorial, we went through the basics of indexing using faiss-cpu. While for the use cases in research and industry. The size of dataset for indexing will be extremely large, the frequency of searching might also be very high. In this tutorial we’ll see how to combine Faiss and GPU almost seamlessly.

1. Installation#

Faiss maintain the latest updates on conda. And its gpu version only supports Linux x86_64

create a conda virtual environment and run:

conda install -c pytorch -c nvidia faiss-gpu=1.8.0

make sure you select that conda env as the kernel for this notebook. After installation, restart the kernal.

If your system does not satisfy the requirement, install faiss-cpu and just skip the steps with gpu related codes.

2. Data Preparation#

First let’s create two datasets with “fake embeddings” of corpus and queries:

import faiss
import numpy as np

dim = 768
corpus_size = 1000
# np.random.seed(111)

corpus = np.random.random((corpus_size, dim)).astype('float32')

3. Create Index on CPU#

Option 1:#

Faiss provides a great amount of choices of indexes by initializing directly:

# first build a flat index (on CPU)
index = faiss.IndexFlatIP(dim)

Option 2:#

Besides the basic index class, we can also use the index_factory function to produce composite Faiss index.

index = faiss.index_factory(dim, "Flat", faiss.METRIC_L2)

4. Build GPU Index and Search#

All the GPU indexes are built with StandardGpuResources object. It contains all the needed resources for each GPU in use. By default it will allocate 18% of the total VRAM as a temporary scratch space.

The GpuClonerOptions and GpuMultipleClonerOptions objects are optional when creating index from cpu to gpu. They are used to adjust the way the GPUs stores the objects.

Single GPU:#

# use a single GPU
rs = faiss.StandardGpuResources()
co = faiss.GpuClonerOptions()

# then make it to gpu index
index_gpu = faiss.index_cpu_to_gpu(provider=rs, device=0, index=index, options=co)

%%time
index_gpu.add(corpus)
D, I = index_gpu.search(corpus, 4)

CPU times: user 5.31 ms, sys: 6.26 ms, total: 11.6 ms
Wall time: 8.94 ms

All Available GPUs#

If your system contains multiple GPUs, Faiss provides the option to deploy al available GPUs. You can control their usages through GpuMultipleClonerOptions, e.g. whether to shard or replicate the index acrross GPUs.

# cloner options for multiple GPUs
co = faiss.GpuMultipleClonerOptions()

index_gpu = faiss.index_cpu_to_all_gpus(index=index, co=co)

%%time
index_gpu.add(corpus)
D, I = index_gpu.search(corpus, 4)

CPU times: user 29.8 ms, sys: 26.8 ms, total: 56.6 ms
Wall time: 33.9 ms

Multiple GPUs#

There’s also option that use multiple GPUs but not all:

ngpu = 4
resources = [faiss.StandardGpuResources() for _ in range(ngpu)]

Create vectors for the GpuResources and divices, then pass them to the index_cpu_to_gpu_multiple() function.

vres = faiss.GpuResourcesVector()
vdev = faiss.Int32Vector()
for i, res in zip(range(ngpu), resources):
    vdev.push_back(i)
    vres.push_back(res)
index_gpu = faiss.index_cpu_to_gpu_multiple(vres, vdev, index)

%%time
index_gpu.add(corpus)
D, I = index_gpu.search(corpus, 4)

CPU times: user 3.49 ms, sys: 13.4 ms, total: 16.9 ms
Wall time: 9.03 ms

5. Results#

All the three approaches should lead to identical result. Now let’s do a quick sanity check:

# The nearest neighbor of each vector in the corpus is itself
assert np.all(corpus[:] == corpus[I[:, 0]])

And the corresponding distance should be 0.

print(D[:3])

[[  0.       111.30057  113.2251   113.342316]
 [  0.       111.158875 111.742325 112.09038 ]
 [  0.       116.44429  116.849915 117.30502 ]]