Faiss Quantizers#
In this notebook, we will introduce the quantizer objects in Faiss and how to use them.
Preparation#
For CPU usage, run:
%pip install faiss-cpu
For GPU on a Linux x86_64 system, use Conda:
conda install -c pytorch -c nvidia faiss-gpu=1.8.0
import faiss
import numpy as np
np.random.seed(768)
data = np.random.random((1000, 128))
1. Scalar Quantizer#
Vector embeddings are usually stored as 32-bit floats. Scalar quantization transforms each 32-bit float into, for example, an 8-bit integer, giving a 4x reduction in size. It can be seen as distributing the values of each dimension into 256 buckets.
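As a minimal illustration of the idea (plain NumPy, not the Faiss implementation), 8-bit scalar quantization maps each dimension linearly onto the integers 0–255 and decodes by mapping back:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((1000, 128)).astype("float32")

# "Training": learn the per-dimension value range from the data
vmin = x.min(axis=0)
vmax = x.max(axis=0)

# Encode: map each value linearly into one of 256 buckets
scale = (vmax - vmin) / 255
codes = np.round((x - vmin) / scale).astype("uint8")

# Decode: recover an approximation of the original vectors
x_rec = codes.astype("float32") * scale + vmin

print(codes.dtype, codes.shape)        # uint8 codes: 4x smaller than float32
print(np.abs(x - x_rec).max() < 0.01)  # reconstruction error bounded by scale/2
```

The maximum error per value is half a bucket width, which is what makes the 4x memory saving a controlled approximation rather than a lossy guess.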
| Name | Class | Parameters |
|---|---|---|
| Quantizer class | `faiss.ScalarQuantizer` | `d`: vector dimension; `qtype`: quantization type (e.g. `QT_8bit`) |
| Flat index class | `faiss.IndexScalarQuantizer` | `d`, `qtype`, `metric` |
| IVF index class | `faiss.IndexIVFScalarQuantizer` | `quantizer`, `d`, `nlist`, `qtype`, `metric` |
Quantizer class objects are used to compress the data before adding it to an index. Flat index class objects and IVF index class objects can be used directly as indexes; quantization is done automatically.
Scalar Quantizer#
d = 128
qtype = faiss.ScalarQuantizer.QT_8bit
quantizer = faiss.ScalarQuantizer(d, qtype)
quantizer.train(data)
new_data = quantizer.compute_codes(data)
print(new_data[0])
[156 180 46 226 13 130 41 187 63 251 16 199 205 166 117 122 214 2
206 137 71 186 20 131 59 57 68 114 35 45 28 210 27 93 74 245
167 5 32 42 44 128 10 189 10 13 42 162 179 221 241 104 205 21
70 87 52 219 172 138 193 0 228 175 144 34 59 88 170 1 233 220
20 64 245 241 5 161 41 55 30 247 107 8 229 90 201 10 43 158
238 184 187 114 232 90 116 205 14 214 135 158 237 192 205 141 232 176
124 176 163 68 49 91 125 70 6 170 55 44 215 84 46 48 218 56
107 176]
Scalar Quantizer Index#
d = 128
k = 3
qtype = faiss.ScalarQuantizer.QT_8bit
# nlist = 5
index = faiss.IndexScalarQuantizer(d, qtype, faiss.METRIC_L2)
# The IVF variant also needs a coarse quantizer as its first argument:
# coarse = faiss.IndexFlat(d, faiss.METRIC_L2)
# index = faiss.IndexIVFScalarQuantizer(coarse, d, nlist, faiss.ScalarQuantizer.QT_8bit, faiss.METRIC_L2)
index.train(data)
index.add(data)
D, I = index.search(data[:1], k)
print(f"closest elements: {I}")
print(f"distance: {D}")
closest elements: [[ 0 471 188]]
distance: [[1.6511828e-04 1.6252808e+01 1.6658131e+01]]
2. Product Quantizer#
When speed and memory are crucial factors in searching, the product quantizer (PQ) becomes a top choice. It is one of the most effective quantizers for reducing memory usage.
The first step of PQ is dividing each original vector of dimension `d` into `m` smaller, low-dimensional sub-vectors of dimension `d/m`. Here `m` is the number of sub-vectors.
Then a clustering algorithm is used to create a codebook with a fixed number of centroids for each sub-space.
Next, each sub-vector of a vector is replaced by the index of the closest centroid from its corresponding codebook, so each vector is stored with only the indices instead of the full vector.
When computing the distance between a query vector and a database vector, only the distances from the query to the centroids in the codebooks are calculated, which enables quick approximate nearest neighbor search.
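These steps can be sketched in plain NumPy (a toy illustration with made-up sizes `m = 8` sub-vectors and `k = 16` centroids per codebook, not the Faiss implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m, k = 1000, 128, 8, 16
sub_d = d // m                          # each sub-vector has d/m = 16 dimensions
x = rng.random((n, d)).astype("float32")
subs = x.reshape(n, m, sub_d)           # step 1: split into m sub-vectors

# Step 2: one codebook of k centroids per sub-space, via a few naive k-means iterations
codebooks = np.stack([subs[rng.choice(n, k, replace=False), j] for j in range(m)])
for _ in range(5):
    for j in range(m):
        dist = np.linalg.norm(subs[:, j, None, :] - codebooks[j][None], axis=-1)
        assign = dist.argmin(axis=1)
        for c in range(k):
            members = subs[assign == c, j]
            if len(members):
                codebooks[j, c] = members.mean(axis=0)

# Step 3: encode each sub-vector as the index of its nearest centroid
codes = np.stack(
    [np.linalg.norm(subs[:, j, None, :] - codebooks[j][None], axis=-1).argmin(axis=1)
     for j in range(m)], axis=1).astype("uint8")
print(codes.shape)  # 8 small indices per vector instead of 128 floats

# Step 4: asymmetric distance to a query via per-sub-space lookup tables
q = rng.random(d).astype("float32").reshape(m, sub_d)
tables = np.linalg.norm(codebooks - q[:, None, :], axis=-1) ** 2   # (m, k)
approx_d2 = tables[np.arange(m), codes].sum(axis=1)                # (n,) approx squared L2
```

The lookup-table trick in step 4 is why search is fast: the query is compared to `m * k` centroids once, after which each database vector costs only `m` table lookups.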
| Name | Class | Parameters |
|---|---|---|
| Quantizer class | `faiss.ProductQuantizer` | `d`, `M`, `nbits` |
| Flat index class | `faiss.IndexPQ` | `d`, `M`, `nbits`, `metric` |
| IVF index class | `faiss.IndexIVFPQ` | `quantizer`, `d`, `nlist`, `M`, `nbits`, `metric` |
Product Quantizer#
d = 128
M = 8
nbits = 4
quantizer = faiss.ProductQuantizer(d, M, nbits)
quantizer.train(data)
new_data = quantizer.compute_codes(data)
print(new_data.max())
print(new_data[:2])
255
[[ 90 169 226 45]
[ 33 51 34 15]]
Product Quantizer Index#
index = faiss.IndexPQ(d, M, nbits, faiss.METRIC_L2)
index.train(data)
index.add(data)
D, I = index.search(data[:1], k)
print(f"closest elements: {I}")
print(f"distance: {D}")
closest elements: [[ 0 946 330]]
distance: [[ 8.823908 11.602461 11.746731]]
Product Quantizer IVF Index#
nlist = 5
quantizer = faiss.IndexFlat(d, faiss.METRIC_L2)
index = faiss.IndexIVFPQ(quantizer, d, nlist, M, nbits, faiss.METRIC_L2)
index.train(data)
index.add(data)
D, I = index.search(data[:1], k)
print(f"closest elements: {I}")
print(f"distance: {D}")
closest elements: [[ 0 899 521]]
distance: [[ 8.911423 12.088312 12.104569]]