AbsDataset#
AbsRerankerTrainDataset#
- class FlagEmbedding.abc.finetune.reranker.AbsRerankerTrainDataset(args: AbsRerankerDataArguments, tokenizer: PreTrainedTokenizer)[source]#
Abstract class for reranker training dataset.
- Parameters:
args (AbsRerankerDataArguments) – Data arguments.
tokenizer (PreTrainedTokenizer) – Tokenizer to use.
Methods#
- AbsRerankerTrainDataset.create_one_example(qry_encoding: str, doc_encoding: str)[source]#
Creates a single input example by encoding and preparing a query and document pair for the model.
- Parameters:
qry_encoding (str) – Query to be encoded.
doc_encoding (str) – Document to be encoded.
- Returns:
A dictionary containing tokenized and prepared inputs, ready for model consumption.
- Return type:
dict
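Conceptually, `create_one_example` joins a query and a document into one tokenized sequence that the reranker scores as a pair. The sketch below is illustrative only: `toy_tokenize` is a hypothetical stand-in for a real `PreTrainedTokenizer`, and the special-token ids (101/102, BERT-style `[CLS]`/`[SEP]`) are assumptions, not FlagEmbedding's actual implementation.

```python
def toy_tokenize(text: str) -> list:
    """Hypothetical tokenizer stand-in: maps each whitespace word to an id."""
    return [abs(hash(w)) % 30522 for w in text.split()]

def create_one_example(qry_encoding: str, doc_encoding: str) -> dict:
    """Pair a query and a document into a single model-ready input,
    mirroring the shape of the method documented above."""
    input_ids = [101] + toy_tokenize(qry_encoding) + [102] \
                      + toy_tokenize(doc_encoding) + [102]
    return {"input_ids": input_ids, "attention_mask": [1] * len(input_ids)}
```

The returned dict matches the documented contract: tokenized inputs ready for model consumption.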
AbsRerankerCollator#
- class FlagEmbedding.abc.finetune.reranker.AbsRerankerCollator(tokenizer: PreTrainedTokenizerBase, padding: bool | str | PaddingStrategy = True, max_length: int | None = None, pad_to_multiple_of: int | None = None, return_tensors: str = 'pt', query_max_len: int = 32, passage_max_len: int = 128)[source]#
The abstract reranker collator.
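The `query_max_len` and `passage_max_len` parameters bound the tokenized lengths the collator pads or truncates to. A minimal sketch of that padding step, assuming right-padding with a pad id of 0 (the real collator delegates this to the tokenizer's padding logic):

```python
def pad_batch(sequences: list, max_length: int, pad_id: int = 0) -> list:
    """Truncate each id sequence to max_length and right-pad with pad_id,
    illustrating the effect of query_max_len / passage_max_len."""
    padded = []
    for seq in sequences:
        seq = seq[:max_length]
        padded.append(seq + [pad_id] * (max_length - len(seq)))
    return padded
```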
AbsLLMRerankerTrainDataset#
- class FlagEmbedding.abc.finetune.reranker.AbsLLMRerankerTrainDataset(args: AbsRerankerDataArguments, tokenizer: PreTrainedTokenizer)[source]#
Abstract class for LLM reranker training dataset.
- Parameters:
args (AbsRerankerDataArguments) – Data arguments.
tokenizer (PreTrainedTokenizer) – Tokenizer to use.
AbsLLMRerankerCollator#
- class FlagEmbedding.abc.finetune.reranker.AbsLLMRerankerCollator(tokenizer: PreTrainedTokenizerBase, model: Any | None = None, padding: bool | str | PaddingStrategy = True, max_length: int | None = None, pad_to_multiple_of: int | None = None, label_pad_token_id: int = -100, return_tensors: str = 'pt', query_max_len: int = 32, passage_max_len: int = 128)[source]#
Wrapper that converts a List[Tuple[encode_qry, encode_psg]] into a List[qry] and a List[psg], passing each batch separately to the actual collator. It abstracts the data details away from the model.
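The conversion the docstring describes, List[Tuple[encode_qry, encode_psg]] into two parallel lists, can be sketched in a few lines (a simplified illustration, not the library's code):

```python
def split_pairs(pairs: list) -> tuple:
    """Split a list of (encoded_query, encoded_passage) tuples into
    a list of queries and a list of passages, preserving order."""
    if not pairs:
        return [], []
    queries, passages = zip(*pairs)
    return list(queries), list(passages)
```

Each resulting list can then be padded and batched independently, e.g. with different `query_max_len` and `passage_max_len` limits.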