data_loader
- class FlagEmbedding.evaluation.mkqa.MKQAEvalDataLoader(eval_name: str, dataset_dir: str | None = None, cache_dir: str | None = None, token: str | None = None, force_redownload: bool = False)
Data loader class for MKQA.
Methods
- MKQAEvalDataLoader.available_dataset_names() -> List[str]
Get the available dataset names.
- Returns:
All the available dataset names.
- Return type:
List[str]
- MKQAEvalDataLoader.available_splits(dataset_name: str | None = None) -> List[str]
Get the available splits.
- Parameters:
dataset_name (Optional[str], optional) – Name of the dataset. Defaults to None.
- Returns:
All the available splits for the dataset.
- Return type:
List[str]
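Because the loader exposes both `available_dataset_names()` and `available_splits()`, a caller can reject an unsupported request before any download starts. The sketch below is a hypothetical helper, `validate_request`, that checks a dataset name and split against such lists; the name list and the `'test'` split in the example are placeholder values for illustration, not values read from `MKQAEvalDataLoader`.

```python
from typing import List, Optional


def validate_request(
    dataset_name: Optional[str],
    split: str,
    available_names: List[str],
    available_splits: List[str],
) -> None:
    """Raise ValueError if the requested dataset name or split is unsupported.

    Hypothetical helper: in practice the two lists would come from
    available_dataset_names() and available_splits().
    """
    # dataset_name may be None, mirroring the optional argument documented above
    if dataset_name is not None and dataset_name not in available_names:
        raise ValueError(
            f"Dataset name '{dataset_name}' not found. Available: {available_names}"
        )
    if split not in available_splits:
        raise ValueError(f"Split '{split}' not found. Available: {available_splits}")


# Placeholder values for illustration only
names = ["en", "ja", "zh_cn"]
splits = ["test"]
validate_request("ja", "test", names, splits)  # passes silently
```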
- MKQAEvalDataLoader.load_corpus(dataset_name: str | None = None) -> DatasetDict
Load the corpus.
- Parameters:
dataset_name (Optional[str], optional) – Name of the dataset. Defaults to None.
- Returns:
Loaded datasets instance of corpus.
- Return type:
datasets.DatasetDict
- MKQAEvalDataLoader._load_local_qrels(save_dir: str, dataset_name: str | None = None, split: str = 'test') -> DatasetDict
Try to load qrels from local files.
- Parameters:
save_dir (str) – Directory where the data files are saved.
dataset_name (Optional[str], optional) – Name of the dataset. Defaults to None.
split (str, optional) – Split of the dataset. Defaults to 'test'.
- Raises:
ValueError – If no local qrels are found; the loader then tries to download them from remote.
- Returns:
Loaded datasets instance of qrels.
- Return type:
datasets.DatasetDict
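A minimal sketch of what reading local qrels might involve. It assumes, without confirmation from the documentation above, that qrels are stored as a JSON Lines file named `{split}_qrels.jsonl` under `save_dir/dataset_name`, with one `qid`/`docid`/`relevance` record per line. The function `load_local_qrels` is hypothetical; it returns a plain nested dict instead of a `datasets.DatasetDict` so that it runs with the standard library alone, and it raises `ValueError` when the file is missing, in line with the documented contract.

```python
import json
import os
from typing import Dict, Optional


def load_local_qrels(
    save_dir: str, dataset_name: Optional[str] = None, split: str = "test"
) -> Dict[str, Dict[str, int]]:
    """Read qrels into {qid: {docid: relevance}} (hypothetical file layout)."""
    # Assumed layout: <save_dir>/<dataset_name>/<split>_qrels.jsonl
    base = os.path.join(save_dir, dataset_name) if dataset_name else save_dir
    path = os.path.join(base, f"{split}_qrels.jsonl")
    if not os.path.isfile(path):
        # Signal the caller that it should fall back to the remote copy
        raise ValueError(f"No local qrels found at {path}")

    qrels: Dict[str, Dict[str, int]] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            qid, docid = str(record["qid"]), str(record["docid"])
            # Group judgments by query id
            qrels.setdefault(qid, {})[docid] = int(record["relevance"])
    return qrels
```

Normalizing ids to `str` keeps lookups consistent even if some records store numeric ids.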
- MKQAEvalDataLoader._load_remote_corpus(dataset_name: str | None = None, save_dir: str | None = None) -> DatasetDict
Load the corpus from remote. The corpus is taken from the BeIR dataset; refer to https://arxiv.org/pdf/2402.03216 for details.
- MKQAEvalDataLoader._load_remote_qrels(dataset_name: str, split: str = 'test', save_dir: str | None = None) -> DatasetDict
Load remote qrels from Hugging Face (HF).
- Parameters:
dataset_name (str) – Name of the dataset.
split (str, optional) – Split of the dataset. Defaults to 'test'.
save_dir (Optional[str], optional) – Directory to save the dataset. Defaults to None.
- Returns:
Loaded datasets instance of qrels.
- Return type:
datasets.DatasetDict
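The `ValueError` raised by `_load_local_qrels` points to a local-first strategy: try the files under `save_dir`, and fall back to the remote copy when none are found. The sketch below shows that control flow with hypothetical stand-in callables (`load_local`, `load_remote`) whose argument orders mirror the two documented signatures. Treating `force_redownload` as "skip the local attempt" is an assumption based on the constructor parameter's name, not on documented behavior.

```python
from typing import Callable, Dict, Optional

Qrels = Dict[str, Dict[str, int]]


def load_qrels_with_fallback(
    load_local: Callable[[str, Optional[str], str], Qrels],
    load_remote: Callable[[Optional[str], str, Optional[str]], Qrels],
    dataset_name: Optional[str],
    split: str = "test",
    save_dir: Optional[str] = None,
    force_redownload: bool = False,
) -> Qrels:
    """Prefer local qrels; fall back to the remote loader on ValueError."""
    # Assumption: force_redownload bypasses any local copy
    if save_dir is not None and not force_redownload:
        try:
            return load_local(save_dir, dataset_name, split)
        except ValueError:
            pass  # no local qrels: fall through to the remote download
    # The remote loader receives save_dir so it can persist what it downloads
    return load_remote(dataset_name, split, save_dir)


# Hypothetical stand-ins: the local loader always fails
def fake_local(save_dir, name, split):
    raise ValueError("no local qrels")


def fake_remote(name, split, save_dir):
    return {"q1": {"d1": 1}}


print(load_qrels_with_fallback(fake_local, fake_remote, "en", save_dir="/tmp/mkqa"))
# prints {'q1': {'d1': 1}}
```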
- MKQAEvalDataLoader._load_remote_queries(dataset_name: str, split: str = 'test', save_dir: str | None = None) -> DatasetDict
Load the queries from Hugging Face (HF).
- Parameters:
dataset_name (str) – Name of the dataset.
split (str, optional) – Split of the dataset. Defaults to 'test'.
save_dir (Optional[str], optional) – Directory to save the dataset. Defaults to None.
- Returns:
Loaded datasets instance of queries.
- Return type:
datasets.DatasetDict
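The remote loaders accept an optional `save_dir` so downloaded data can be kept for later runs. A minimal sketch of that caching step for queries, assuming downloaded records carry `id` and `text` fields and that the cache is a `{split}_queries.jsonl` file; the field names, file name, and the helper `cache_queries` are all hypothetical and not taken from the library.

```python
import json
import os
from typing import Dict, Iterable, Mapping, Optional


def cache_queries(
    records: Iterable[Mapping[str, str]],
    split: str = "test",
    save_dir: Optional[str] = None,
) -> Dict[str, str]:
    """Convert query records to {qid: text}; optionally write them to save_dir."""
    queries = {str(r["id"]): r["text"] for r in records}
    if save_dir is not None:
        os.makedirs(save_dir, exist_ok=True)
        path = os.path.join(save_dir, f"{split}_queries.jsonl")
        with open(path, "w", encoding="utf-8") as f:
            for qid, text in queries.items():
                # ensure_ascii=False keeps non-English queries readable on disk
                f.write(json.dumps({"id": qid, "text": text}, ensure_ascii=False) + "\n")
    return queries
```

Writing JSON Lines keeps the cache appendable and easy to stream back one record at a time.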