Faiss Quantizers#
In this notebook, we will introduce the quantizer objects in Faiss and how to use them.
Preparation#
For CPU usage, run:
%pip install faiss-cpu
For GPU on a Linux x86_64 system, use Conda:
conda install -c pytorch -c nvidia faiss-gpu=1.8.0
import faiss
import numpy as np
np.random.seed(768)
data = np.random.random((1000, 128))
1. Scalar Quantizer#
Vector embeddings are usually stored as 32-bit floats. Scalar quantization transforms each 32-bit float into, for example, an 8-bit integer, giving a 4x reduction in size. It can be seen as distributing the values of each dimension into 256 buckets.
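As a minimal illustration of the idea (plain NumPy, not the Faiss implementation), 8-bit scalar quantization maps each dimension linearly onto the integers 0–255 and decodes by mapping back:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((1000, 128)).astype("float32")

# "Training": learn the per-dimension value range from the data
vmin = x.min(axis=0)
vmax = x.max(axis=0)

# Encode: map each value linearly into one of 256 buckets
scale = (vmax - vmin) / 255
codes = np.round((x - vmin) / scale).astype("uint8")

# Decode: recover an approximation of the original vectors
x_rec = codes.astype("float32") * scale + vmin

print(codes.dtype, codes.shape)        # uint8 codes: 4x smaller than float32
print(np.abs(x - x_rec).max() < 0.01)  # reconstruction error bounded by scale/2
```

The maximum error per value is half a bucket width, which is what makes the 4x memory saving a controlled approximation rather than a lossy guess.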
| Name | Class | Parameters |
|---|---|---|
| Quantizer class | `faiss.ScalarQuantizer` | `d`: vector dimension; `qtype`: quantization type (e.g. `QT_8bit`) |
| Flat index class | `faiss.IndexScalarQuantizer` | `d`, `qtype`, `metric` |
| IVF index class | `faiss.IndexIVFScalarQuantizer` | `quantizer`, `d`, `nlist`, `qtype`, `metric` |
Quantizer class objects are used to compress the data before adding it to an index. Flat index class objects and IVF index class objects can be used directly as indexes; quantization is done automatically.
Scalar Quantizer#
d = 128
qtype = faiss.ScalarQuantizer.QT_8bit
quantizer = faiss.ScalarQuantizer(d, qtype)
quantizer.train(data)
new_data = quantizer.compute_codes(data)
print(new_data[0])
[156 180 46 226 13 130 41 187 63 251 16 199 205 166 117 122 214 2
206 137 71 186 20 131 59 57 68 114 35 45 28 210 27 93 74 245
167 5 32 42 44 128 10 189 10 13 42 162 179 221 241 104 205 21
70 87 52 219 172 138 193 0 228 175 144 34 59 88 170 1 233 220
20 64 245 241 5 161 41 55 30 247 107 8 229 90 201 10 43 158
238 184 187 114 232 90 116 205 14 214 135 158 237 192 205 141 232 176
124 176 163 68 49 91 125 70 6 170 55 44 215 84 46 48 218 56
107 176]
Scalar Quantizer Index#
d = 128
k = 3
qtype = faiss.ScalarQuantizer.QT_8bit
# nlist = 5
index = faiss.IndexScalarQuantizer(d, qtype, faiss.METRIC_L2)
# The IVF variant also needs a coarse quantizer as its first argument:
# coarse = faiss.IndexFlat(d, faiss.METRIC_L2)
# index = faiss.IndexIVFScalarQuantizer(coarse, d, nlist, faiss.ScalarQuantizer.QT_8bit, faiss.METRIC_L2)
index.train(data)
index.add(data)
D, I = index.search(data[:1], k)
print(f"closest elements: {I}")
print(f"distance: {D}")
closest elements: [[ 0 471 188]]
distance: [[1.6511828e-04 1.6252808e+01 1.6658131e+01]]
2. Product Quantizer#
When speed and memory are crucial factors in searching, the product quantizer (PQ) becomes a top choice. It is one of the most effective quantizers for reducing memory usage.
The first step of PQ is dividing each original vector of dimension `d` into `m` smaller, low-dimensional sub-vectors of dimension `d/m`. Here `m` is the number of sub-vectors.
Then a clustering algorithm is used to create a codebook with a fixed number of centroids for each sub-space.
Next, each sub-vector of a vector is replaced by the index of the closest centroid from its corresponding codebook, so each vector is stored with only the indices instead of the full vector.
When computing the distance between a query vector and a database vector, only the distances from the query to the centroids in the codebooks are calculated, which enables quick approximate nearest neighbor search.
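These steps can be sketched in plain NumPy (a toy illustration with made-up sizes `m = 8` sub-vectors and `k = 16` centroids per codebook, not the Faiss implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m, k = 1000, 128, 8, 16
sub_d = d // m                          # each sub-vector has d/m = 16 dimensions
x = rng.random((n, d)).astype("float32")
subs = x.reshape(n, m, sub_d)           # step 1: split into m sub-vectors

# Step 2: one codebook of k centroids per sub-space, via a few naive k-means iterations
codebooks = np.stack([subs[rng.choice(n, k, replace=False), j] for j in range(m)])
for _ in range(5):
    for j in range(m):
        dist = np.linalg.norm(subs[:, j, None, :] - codebooks[j][None], axis=-1)
        assign = dist.argmin(axis=1)
        for c in range(k):
            members = subs[assign == c, j]
            if len(members):
                codebooks[j, c] = members.mean(axis=0)

# Step 3: encode each sub-vector as the index of its nearest centroid
codes = np.stack(
    [np.linalg.norm(subs[:, j, None, :] - codebooks[j][None], axis=-1).argmin(axis=1)
     for j in range(m)], axis=1).astype("uint8")
print(codes.shape)  # 8 small indices per vector instead of 128 floats

# Step 4: asymmetric distance to a query via per-sub-space lookup tables
q = rng.random(d).astype("float32").reshape(m, sub_d)
tables = np.linalg.norm(codebooks - q[:, None, :], axis=-1) ** 2   # (m, k)
approx_d2 = tables[np.arange(m), codes].sum(axis=1)                # (n,) approx squared L2
```

The lookup-table trick in step 4 is why search is fast: the query is compared to `m * k` centroids once, after which each database vector costs only `m` table lookups.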
| Name | Class | Parameters |
|---|---|---|
| Quantizer class | `faiss.ProductQuantizer` | `d`, `M`, `nbits` |
| Flat index class | `faiss.IndexPQ` | `d`, `M`, `nbits`, `metric` |
| IVF index class | `faiss.IndexIVFPQ` | `quantizer`, `d`, `nlist`, `M`, `nbits`, `metric` |
Product Quantizer#
d = 128
M = 8
nbits = 4
quantizer = faiss.ProductQuantizer(d, M, nbits)
quantizer.train(data)
new_data = quantizer.compute_codes(data)
print(new_data.max())
print(new_data[:2])
255
[[ 90 169 226 45]
[ 33 51 34 15]]
Product Quantizer Index#
index = faiss.IndexPQ(d, M, nbits, faiss.METRIC_L2)
index.train(data)
index.add(data)
D, I = index.search(data[:1], k)
print(f"closest elements: {I}")
print(f"distance: {D}")
closest elements: [[ 0 946 330]]
distance: [[ 8.823908 11.602461 11.746731]]
Product Quantizer IVF Index#
nlist = 5
quantizer = faiss.IndexFlat(d, faiss.METRIC_L2)
index = faiss.IndexIVFPQ(quantizer, d, nlist, M, nbits, faiss.METRIC_L2)
index.train(data)
index.add(data)
D, I = index.search(data[:1], k)
print(f"closest elements: {I}")
print(f"distance: {D}")
closest elements: [[ 0 899 521]]
distance: [[ 8.911423 12.088312 12.104569]]