BGE Reranker#

Like its embedding models, BGE offers a family of rerankers in different sizes and with different functionalities. In this tutorial, we will introduce the BGE reranker series.

0. Installation#

Install the dependencies in your environment.

%pip install -U FlagEmbedding

1. bge-reranker#

The first generation of BGE reranker contains two models:

| Model | Language | Parameters | Description | Base Model |
|:---|:---|:---|:---|:---|
| BAAI/bge-reranker-base | Chinese and English | 278M | a cross-encoder model which is more accurate but less efficient | XLM-RoBERTa-Base |
| BAAI/bge-reranker-large | Chinese and English | 560M | a cross-encoder model which is more accurate but less efficient | XLM-RoBERTa-Large |

from FlagEmbedding import FlagReranker

model = FlagReranker(
    'BAAI/bge-reranker-large',
    use_fp16=True,
    devices=["cuda:0"],   # if you don't have GPUs, you can use "cpu"
)

pairs = [
    ["What is the capital of France?", "Paris is the capital of France."],
    ["What is the capital of France?", "The population of China is over 1.4 billion people."],
    ["What is the population of China?", "Paris is the capital of France."],
    ["What is the population of China?", "The population of China is over 1.4 billion people."]
]

scores = model.compute_score(pairs)
scores
[7.984375, -6.84375, -7.15234375, 5.44921875]
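
In practice, you usually score every candidate passage against a single query and sort by score. Below is a minimal sketch reusing the model loaded above; the query and candidates are illustrative, not part of the original example.

query = "What is the population of China?"
candidates = [
    "Paris is the capital of France.",
    "The population of China is over 1.4 billion people.",
]

# Score each (query, passage) pair, then sort passages by descending relevance.
scores = model.compute_score([[query, p] for p in candidates])
for passage, score in sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True):
    print(f"{score:.4f}\t{passage}")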

2. bge-reranker v2#

| Model | Language | Parameters | Description | Base Model |
|:---|:---|:---|:---|:---|
| BAAI/bge-reranker-v2-m3 | Multilingual | 568M | a lightweight cross-encoder model with strong multilingual capability; easy to deploy, with fast inference | XLM-RoBERTa-Large |
| BAAI/bge-reranker-v2-gemma | Multilingual | 2.51B | a cross-encoder model suitable for multilingual contexts; performs well in both English proficiency and multilingual capability | Gemma-2B |
| BAAI/bge-reranker-v2-minicpm-layerwise | Multilingual | 2.72B | a cross-encoder model suitable for multilingual contexts; performs well in both English and Chinese, and allows selecting layers for output to accelerate inference | MiniCPM |
| BAAI/bge-reranker-v2.5-gemma2-lightweight | Multilingual | 9.24B | a cross-encoder model suitable for multilingual contexts; performs well in both English and Chinese, and allows selecting layers, compress ratio, and compress layers for output to accelerate inference | Gemma2-9B |

bge-reranker-v2-m3#

bge-reranker-v2-m3 is trained based on bge-m3, providing strong multilingual capability while keeping a slim model size.

from FlagEmbedding import FlagReranker

# Setting use_fp16 to True speeds up computation with a slight performance degradation (when using a GPU)
reranker = FlagReranker('BAAI/bge-reranker-v2-m3', devices=["cuda:0"], use_fp16=True)

score = reranker.compute_score(['query', 'passage'])
# or set "normalize=True" to apply a sigmoid function and map the score into the 0-1 range
score = reranker.compute_score(['query', 'passage'], normalize=True)

print(score)
[0.003483424193080668]
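
As the comment above states, normalize=True applies a sigmoid to the raw score. A small sketch verifying that relation with the reranker loaded above:

import math

raw = reranker.compute_score(['query', 'passage'])[0]
norm = reranker.compute_score(['query', 'passage'], normalize=True)[0]

# The normalized score should match the sigmoid of the raw logit.
print(norm, 1 / (1 + math.exp(-raw)))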

bge-reranker-v2-gemma#

bge-reranker-v2-gemma is trained based on gemma-2b. It has excellent English proficiency as well as strong multilingual capability.

from FlagEmbedding import FlagLLMReranker

reranker = FlagLLMReranker('BAAI/bge-reranker-v2-gemma', devices=["cuda:0"], use_fp16=True)

score = reranker.compute_score(['query', 'passage'])
print(score)
[1.974609375]
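
Like the other rerankers, compute_score also accepts a batch of [query, passage] pairs. A quick sketch with illustrative pairs, reusing the reranker loaded above:

pairs = [
    ["What is the capital of France?", "Paris is the capital of France."],
    ["What is the capital of France?", "The population of China is over 1.4 billion people."],
]

# One relevance score is returned per pair.
scores = reranker.compute_score(pairs)
print(scores)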

bge-reranker-v2-minicpm-layerwise#

bge-reranker-v2-minicpm-layerwise is trained based on minicpm-2b-dpo-bf16. It is suitable for multilingual contexts and performs well in both English and Chinese.

Its special layerwise design gives users the freedom to select which layers produce the output, facilitating accelerated inference.

from FlagEmbedding import LayerWiseFlagLLMReranker

reranker = LayerWiseFlagLLMReranker('BAAI/bge-reranker-v2-minicpm-layerwise', devices=["cuda:0"], use_fp16=True)

# Adjust 'cutoff_layers' to pick which layers are used for computing the score.
score = reranker.compute_score(['query', 'passage'], cutoff_layers=[28])
print(score)
[-7.06640625]
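
To explore the speed-quality trade-off, you can score the same pair at several cutoff layers. The layer choices below are illustrative; deeper layers are generally more accurate but slower.

# Compare scores produced by different output layers (illustrative layer choices).
for layer in [20, 24, 28]:
    score = reranker.compute_score(['query', 'passage'], cutoff_layers=[layer])
    print(f"cutoff layer {layer}: {score}")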

bge-reranker-v2.5-gemma2-lightweight#

bge-reranker-v2.5-gemma2-lightweight is trained based on gemma2-9b. It is also suitable for multilingual contexts.

Besides the layerwise reduction functionality, bge-reranker-v2.5-gemma2-lightweight integrates token compression to save further resources while maintaining outstanding performance.

from FlagEmbedding import LightWeightFlagLLMReranker

reranker = LightWeightFlagLLMReranker('BAAI/bge-reranker-v2.5-gemma2-lightweight', devices=["cuda:0"], use_fp16=True)

# Adjust 'cutoff_layers' to pick the output layers; 'compress_ratio' and 'compress_layers' control token compression.
score = reranker.compute_score(['query', 'passage'], cutoff_layers=[28], compress_ratio=2, compress_layers=[24, 40])
print(score)
[14.734375]

Comparison#

The BGE reranker series provides a wide range of choices for different needs. Select a model according to your scenario and resources:

  • For multilingual tasks, use BAAI/bge-reranker-v2-m3, BAAI/bge-reranker-v2-gemma, or BAAI/bge-reranker-v2.5-gemma2-lightweight.

  • For Chinese or English, use BAAI/bge-reranker-v2-m3 or BAAI/bge-reranker-v2-minicpm-layerwise.

  • For efficiency, use BAAI/bge-reranker-v2-m3 or the lower layers of BAAI/bge-reranker-v2-minicpm-layerwise.

  • For saving resources with extreme efficiency, use BAAI/bge-reranker-base or BAAI/bge-reranker-large.

  • For the best performance, we recommend BAAI/bge-reranker-v2-minicpm-layerwise and BAAI/bge-reranker-v2-gemma.

Always test on your real use case and choose the model with the best speed-quality balance!
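
Putting it all together, a common pattern is two-stage retrieval: an embedding model retrieves top-k candidates cheaply, and a reranker rescores only those candidates. Below is a minimal sketch assuming the FlagModel embedding API from the same FlagEmbedding package; the corpus, query, and model choices are illustrative.

import numpy as np
from FlagEmbedding import FlagModel, FlagReranker

corpus = [
    "Paris is the capital of France.",
    "The population of China is over 1.4 billion people.",
    "The Eiffel Tower is located in Paris.",
]
query = "What is the capital of France?"

# Stage 1: fast embedding retrieval narrows the corpus to top-k candidates.
embedder = FlagModel('BAAI/bge-base-en-v1.5', use_fp16=True)
q_emb = embedder.encode_queries([query])
p_emb = embedder.encode(corpus)
top_k = np.argsort(-(q_emb @ p_emb.T)[0])[:2]

# Stage 2: the slower but more accurate cross-encoder reranks only those candidates.
reranker = FlagReranker('BAAI/bge-reranker-v2-m3', use_fp16=True)
scores = reranker.compute_score([[query, corpus[i]] for i in top_k])
print(corpus[top_k[int(np.argmax(scores))]])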