data_loader

class FlagEmbedding.evaluation.mkqa.MKQAEvalDataLoader(eval_name: str, dataset_dir: str | None = None, cache_dir: str | None = None, token: str | None = None, force_redownload: bool = False)

Data loader class for MKQA.

Methods

MKQAEvalDataLoader.available_dataset_names() -> List[str]

Get the available dataset names.

Returns:

All the available dataset names.

Return type:

List[str]

MKQAEvalDataLoader.available_splits(dataset_name: str | None = None) -> List[str]

Get the available splits.

Parameters:

dataset_name (Optional[str], optional) – Dataset name. Defaults to None.

Returns:

All the available splits for the dataset.

Return type:

List[str]
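The two listing methods above can be combined to see which splits each dataset offers. The following is a minimal sketch, not part of FlagEmbedding: `list_splits_by_dataset` is a hypothetical helper that relies only on the two documented methods, and the constructor arguments shown in the trailing comment (in particular `eval_name="mkqa"` and the cache path) are assumptions.

```python
from typing import Dict, List, Optional, Protocol


class SupportsDatasetListing(Protocol):
    """Structural type covering only the two documented listing methods."""

    def available_dataset_names(self) -> List[str]: ...

    def available_splits(self, dataset_name: Optional[str] = None) -> List[str]: ...


def list_splits_by_dataset(loader: SupportsDatasetListing) -> Dict[str, List[str]]:
    """Map every available dataset name to its available splits."""
    return {
        name: loader.available_splits(dataset_name=name)
        for name in loader.available_dataset_names()
    }


# Hypothetical usage (argument values are assumptions, not taken from the docs):
# from FlagEmbedding.evaluation.mkqa import MKQAEvalDataLoader
# loader = MKQAEvalDataLoader(eval_name="mkqa", cache_dir="./cache")
# print(list_splits_by_dataset(loader))
```

Typing the helper against a protocol rather than the concrete class keeps it usable with a stub object in tests, where no download should take place.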

MKQAEvalDataLoader.load_corpus(dataset_name: str | None = None) -> DatasetDict

Load the corpus.

Parameters:

dataset_name (Optional[str], optional) – Name of the dataset. Defaults to None.

Returns:

Loaded datasets instance of corpus.

Return type:

datasets.DatasetDict
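Because `datasets.DatasetDict` is a `dict` subclass whose values support `len()`, a quick sanity check on the returned corpus could look like the sketch below. `count_rows_per_split` is a hypothetical helper, not part of FlagEmbedding, and it makes no assumption about which keys the returned object contains.

```python
from typing import Dict, Mapping, Sized


def count_rows_per_split(dataset_dict: Mapping[str, Sized]) -> Dict[str, int]:
    """Return the number of rows stored under each key of a DatasetDict-like mapping."""
    return {key: len(rows) for key, rows in dataset_dict.items()}


# Hypothetical usage with an already constructed loader:
# corpus = loader.load_corpus(dataset_name=None)
# print(count_rows_per_split(corpus))
```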

MKQAEvalDataLoader._load_local_qrels(save_dir: str, dataset_name: str | None = None, split: str = 'test') -> DatasetDict

Try to load qrels from local datasets.

Parameters:
  • save_dir (str) – Directory where the data files are saved.

  • dataset_name (Optional[str], optional) – Name of the dataset. Defaults to None.

  • split (str, optional) – Split of the dataset. Defaults to 'test'.

Raises:

ValueError – No local qrels were found; the loader will then try to download them from remote.

Returns:

Loaded datasets instance of qrels.

Return type:

datasets.DatasetDict
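The `ValueError` described above suggests a local-first pattern: try the local files, and fall back to a remote download only when they are missing. The sketch below illustrates that pattern in isolation and is not the library's implementation. `load_local_then_remote` and its two callables are hypothetical stand-ins for `_load_local_qrels` and `_load_remote_qrels`.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def load_local_then_remote(
    load_local: Callable[[], T],
    load_remote: Callable[[], T],
) -> T:
    """Return local data when available, otherwise fall back to the remote loader.

    Only ValueError is treated as "no local data"; any other exception propagates.
    """
    try:
        return load_local()
    except ValueError:
        # Local files are missing, so use the remote source instead.
        return load_remote()
```

Catching only `ValueError` keeps unrelated failures, such as a permission error on the save directory, visible instead of silently triggering a download.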

MKQAEvalDataLoader._load_remote_corpus(dataset_name: str | None = None, save_dir: str | None = None) -> DatasetDict

Load the corpus from remote. Refer to https://arxiv.org/pdf/2402.03216; the corpus is taken from the BeIR dataset.

MKQAEvalDataLoader._load_remote_qrels(dataset_name: str, split: str = 'test', save_dir: str | None = None) -> DatasetDict

Load remote qrels from HF.

Parameters:
  • dataset_name (str) – Name of the dataset.

  • split (str, optional) – Split of the dataset. Defaults to 'test'.

  • save_dir (Optional[str], optional) – Directory to save the dataset. Defaults to None.

Returns:

Loaded datasets instance of qrels.

Return type:

datasets.DatasetDict

MKQAEvalDataLoader._load_remote_queries(dataset_name: str, split: str = 'test', save_dir: str | None = None) -> DatasetDict

Load the queries from HF.

Parameters:
  • dataset_name (str) – Name of the dataset.

  • split (str, optional) – Split of the dataset. Defaults to 'test'.

  • save_dir (Optional[str], optional) – Directory to save the dataset. Defaults to None.

Returns:

Loaded datasets instance of queries.

Return type:

datasets.DatasetDict