# MKQA

MKQA is an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically diverse languages. The queries are sampled from the [Google Natural Questions dataset](https://github.com/google-research-datasets/natural-questions).
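If you just want to browse the raw data, the dataset is mirrored on the Hugging Face Hub. Below is a minimal loading sketch; the `apple/mkqa` dataset id and the single `train` split are assumptions to verify, and this step is not part of the evaluation flow described later.

```python
# Minimal sketch for inspecting MKQA, assuming it is mirrored on the
# Hugging Face Hub as "apple/mkqa" with a single "train" split.
from datasets import load_dataset

mkqa = load_dataset("apple/mkqa", split="train")
print(len(mkqa))                    # ~10k aligned examples
example = mkqa[0]
print(example["query"])             # original English query
print(example["queries"]["zh_cn"])  # aligned Chinese translation
```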

Each example in the dataset has the following structure:

```python
{
    'example_id': 563260143484355911,
    'queries': {
        'en': "who sings i hear you knocking but you can't come in",
        'ru': "кто поет i hear you knocking but you can't come in",
        'ja': '「 I hear you knocking」は誰が歌っていますか',
        'zh_cn': "《i hear you knocking but you can't come in》是谁演唱的",
        ...
    },
    'query': "who sings i hear you knocking but you can't come in",
    'answers': {
        'en': [{
            'type': 'entity',
            'entity': 'Q545186',
            'text': 'Dave Edmunds',
            'aliases': [],
        }],
        'ru': [{
            'type': 'entity',
            'entity': 'Q545186',
            'text': 'Эдмундс, Дэйв',
            'aliases': ['Эдмундс', 'Дэйв Эдмундс', 'Эдмундс Дэйв', 'Dave Edmunds'],
        }],
        'ja': [{
            'type': 'entity',
            'entity': 'Q545186',
            'text': 'デイヴ・エドモンズ',
            'aliases': ['デーブ・エドモンズ', 'デイブ・エドモンズ'],
        }],
        'zh_cn': [{
            'type': 'entity',
            'text': '戴维·埃德蒙兹 ',
            'entity': 'Q545186',
        }],
        ...
    },
}
```
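Answers are typed (`'entity'` answers carry a Wikidata QID), and localized answers may list aliases. When scoring predictions, a common convention is to accept the answer text plus all of its aliases. The helper below is a hypothetical illustration of that convention, not part of MKQA's own tooling:

```python
def gold_answers(example: dict, lang: str) -> set[str]:
    """Collect all acceptable answer strings for one language:
    the localized answer text plus any listed aliases."""
    accepted = set()
    for answer in example["answers"][lang]:
        if answer.get("text"):
            accepted.add(answer["text"].strip())
        accepted.update(alias.strip() for alias in answer.get("aliases", []))
    return accepted

# For the example above:
# gold_answers(example, "ru") ->
#   {'Эдмундс, Дэйв', 'Эдмундс', 'Дэйв Эдмундс', 'Эдмундс Дэйв', 'Dave Edmunds'}
```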

You can evaluate a model's performance on MKQA simply by running our provided shell script:

```bash
chmod +x ./examples/evaluation/mkqa/eval_mkqa.sh
./examples/evaluation/mkqa/eval_mkqa.sh
```

Or by running:

```bash
python -m FlagEmbedding.evaluation.mkqa \
    --eval_name mkqa \
    --dataset_dir ./mkqa/data \
    --dataset_names en zh_cn \
    --splits test \
    --corpus_embd_save_dir ./mkqa/corpus_embd \
    --output_dir ./mkqa/search_results \
    --search_top_k 1000 \
    --rerank_top_k 100 \
    --cache_path /root/.cache/huggingface/hub \
    --overwrite False \
    --k_values 20 \
    --eval_output_method markdown \
    --eval_output_path ./mkqa/mkqa_eval_results.md \
    --eval_metrics qa_recall_at_20 \
    --embedder_name_or_path BAAI/bge-m3 \
    --reranker_name_or_path BAAI/bge-reranker-v2-m3 \
    --devices cuda:0 cuda:1 \
    --cache_dir /root/.cache/huggingface/hub \
    --reranker_max_length 1024
```

Change the embedder, reranker, devices, and cache directory to your preference.
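Here, `qa_recall_at_20` counts a query as answered if any of the top 20 retrieved passages contains one of its accepted answer strings, averaged over all queries. The sketch below illustrates that idea; the function name, inputs, and substring matching rule are illustrative assumptions, not FlagEmbedding's internal implementation:

```python
def qa_recall_at_k(retrieved: list[str], accepted: set[str], k: int = 20) -> float:
    """Return 1.0 if any of the top-k retrieved passages contains an
    acceptable answer string (case-insensitive substring match),
    else 0.0. Averaging over queries gives the reported recall."""
    accepted_lower = [ans.lower() for ans in accepted]
    for passage in retrieved[:k]:
        text = passage.lower()
        if any(ans in text for ans in accepted_lower):
            return 1.0
    return 0.0
```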