BEIR
====

`BEIR <https://github.com/beir-cellar/beir>`_ (Benchmarking-IR) is a heterogeneous evaluation benchmark for information retrieval.
It is designed for evaluating the performance of NLP-based retrieval models and is widely used in research on modern embedding models.

You can evaluate a model's performance on the BEIR benchmark by running our provided shell script:

.. code:: bash

    chmod +x ./examples/evaluation/beir/eval_beir.sh
    ./examples/evaluation/beir/eval_beir.sh

Or by running:

.. code:: bash

    python -m FlagEmbedding.evaluation.beir \
        --eval_name beir \
        --dataset_dir ./beir/data \
        --dataset_names fiqa arguana cqadupstack \
        --splits test dev \
        --corpus_embd_save_dir ./beir/corpus_embd \
        --output_dir ./beir/search_results \
        --search_top_k 1000 \
        --rerank_top_k 100 \
        --cache_path /root/.cache/huggingface/hub \
        --overwrite False \
        --k_values 10 100 \
        --eval_output_method markdown \
        --eval_output_path ./beir/beir_eval_results.md \
        --eval_metrics ndcg_at_10 recall_at_100 \
        --ignore_identical_ids True \
        --embedder_name_or_path BAAI/bge-large-en-v1.5 \
        --reranker_name_or_path BAAI/bge-reranker-v2-m3 \
        --devices cuda:0 cuda:1 \
        --reranker_max_length 1024

Change the embedder, devices, and cache directory to your preference.

.. toctree::
   :hidden:

   beir/arguments
   beir/data_loader
   beir/evaluator
   beir/runner
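
If you prefer to launch the evaluation from Python rather than the shell, the sketch below shows one possible way to do it with the same flags as the CLI. The class names (``BEIREvalArgs``, ``BEIREvalModelArgs``, ``BEIREvalRunner``) are assumptions inferred from this module's API pages (``beir/arguments``, ``beir/runner``); check those pages for the exact names and fields.

.. code:: python

    from transformers import HfArgumentParser

    # Assumed imports: verify the exact class names against the API reference.
    from FlagEmbedding.evaluation.beir import (
        BEIREvalArgs,
        BEIREvalModelArgs,
        BEIREvalRunner,
    )

    # Parse the same flags the CLI accepts into the two argument dataclasses.
    parser = HfArgumentParser((BEIREvalArgs, BEIREvalModelArgs))
    eval_args, model_args = parser.parse_args_into_dataclasses(args=[
        "--eval_name", "beir",
        "--dataset_dir", "./beir/data",
        "--dataset_names", "fiqa",
        "--splits", "test",
        "--output_dir", "./beir/search_results",
        "--k_values", "10", "100",
        "--eval_metrics", "ndcg_at_10", "recall_at_100",
        "--embedder_name_or_path", "BAAI/bge-large-en-v1.5",
        "--devices", "cuda:0",
    ])

    # Run retrieval and write the metric report to the output directory.
    runner = BEIREvalRunner(eval_args=eval_args, model_args=model_args)
    runner.run()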