BaseEmbedder
- class FlagEmbedding.inference.embedder.decoder_only.base.BaseLLMEmbedder(model_name_or_path: str, normalize_embeddings: bool = True, use_fp16: bool = True, query_instruction_for_retrieval: str | None = None, query_instruction_format: str = 'Instruct: {}\nQuery: {}', devices: str | List[str] | None = None, trust_remote_code: bool = False, cache_dir: str | None = None, batch_size: int = 256, query_max_length: int = 512, passage_max_length: int = 512, convert_to_numpy: bool = True, **kwargs: Any)
Base embedder class for LLM-like decoder-only models.
- Args:
model_name_or_path (str): If it’s a path to a local model, the model is loaded from that path. Otherwise, the model with this name is downloaded and loaded from the HuggingFace Hub.
normalize_embeddings (bool, optional): If True, normalize the embedding vector. Defaults to True.
use_fp16 (bool, optional): If True, use half-precision floating-point to speed up computation with a slight performance degradation. Defaults to True.
query_instruction_for_retrieval (Optional[str], optional): Query instruction for retrieval tasks, which will be used with query_instruction_format. Defaults to None.
query_instruction_format (str, optional): The template for query_instruction_for_retrieval. Defaults to "Instruct: {}\nQuery: {}".
devices (Optional[Union[str, int, List[str], List[int]]], optional): Devices to use for model inference. Defaults to None.
trust_remote_code (bool, optional): trust_remote_code for HF datasets or models. Defaults to False.
cache_dir (Optional[str], optional): Cache directory for the model. Defaults to None.
batch_size (int, optional): Batch size for inference. Defaults to 256.
query_max_length (int, optional): Maximum length for query. Defaults to 512.
passage_max_length (int, optional): Maximum length for passage. Defaults to 512.
convert_to_numpy (bool, optional): If True, the output embedding will be a NumPy array. Otherwise, it will be a Torch Tensor. Defaults to True.
- Attributes:
DEFAULT_POOLING_METHOD: The default pooling method when running the model.
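To make the effect of normalize_embeddings concrete, here is a minimal sketch, not the library's implementation. It assumes that normalization means scaling each vector to unit L2 norm; the helpers l2_normalize and dot are hypothetical and operate on plain Python lists rather than arrays or tensors.

```python
import math
from typing import List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit L2 norm (hypothetical helper, not part of FlagEmbedding)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)  # leave an all-zero vector unchanged
    return [x / norm for x in vec]


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


query_vec = l2_normalize([3.0, 4.0])
passage_vec = l2_normalize([4.0, 3.0])

# For unit-length vectors, the dot product equals the cosine similarity.
similarity = dot(query_vec, passage_vec)
```

Under that assumption, similarity scores between normalized embeddings can be computed with a dot product alone, without dividing by the vector norms afterwards.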
Methods
- BaseLLMEmbedder.encode_queries(queries: List[str] | str, batch_size: int | None = None, max_length: int | None = None, convert_to_numpy: bool | None = None, **kwargs: Any) → ndarray | Tensor
Encode the queries.
- Parameters:
queries (Union[List[str], str]) – Input queries to encode.
batch_size (Optional[int], optional) – Number of sentences per iteration. Defaults to None.
max_length (Optional[int], optional) – Maximum length of tokens. Defaults to None.
convert_to_numpy (Optional[bool], optional) – If True, the output embedding will be a NumPy array. Otherwise, it will be a Torch Tensor. Defaults to None.
- Returns:
Return the embedding vectors in a numpy array or tensor.
- Return type:
Union[torch.Tensor, np.ndarray]
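As a rough illustration of how query_instruction_for_retrieval and query_instruction_format could combine before queries are encoded, the sketch below fills the default template with str.format. It assumes the first placeholder receives the instruction and the second the query text, and that queries pass through unchanged when no instruction is set. The helper build_query_prompt and the example instruction are hypothetical and not taken from the library.

```python
from typing import List, Optional

DEFAULT_FORMAT = "Instruct: {}\nQuery: {}"  # default value from the class signature


def build_query_prompt(
    query: str,
    instruction: Optional[str],
    instruction_format: str = DEFAULT_FORMAT,
) -> str:
    """Hypothetical helper: prepend the instruction to a query, if one is given."""
    if instruction is None:
        return query  # assumption: without an instruction the query is used as-is
    return instruction_format.format(instruction, query)


instruction = "Given a question, retrieve passages that answer it."  # made-up example
queries: List[str] = ["What is dense retrieval?"]
prompts = [build_query_prompt(q, instruction) for q in queries]
print(prompts[0])
```

A custom template passed as query_instruction_format would need two `{}` placeholders in the same order for this sketch to apply.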
- BaseLLMEmbedder.encode_corpus(corpus: List[str] | str, batch_size: int | None = None, max_length: int | None = None, convert_to_numpy: bool | None = None, **kwargs: Any) → ndarray | Tensor
Encode the corpus.
- Parameters:
corpus (Union[List[str], str]) – Input corpus to encode.
batch_size (Optional[int], optional) – Number of sentences per iteration. Defaults to None.
max_length (Optional[int], optional) – Maximum length of tokens. Defaults to None.
convert_to_numpy (Optional[bool], optional) – If True, the output embedding will be a NumPy array. Otherwise, it will be a Torch Tensor. Defaults to None.
- Returns:
Return the embedding vectors in a numpy array or tensor.
- Return type:
Union[torch.Tensor, np.ndarray]
- BaseLLMEmbedder.encode(sentences: List[str] | str, batch_size: int | None = None, max_length: int | None = None, convert_to_numpy: bool | None = None, **kwargs: Any) → ndarray | Tensor
Encode the input sentences with the embedding model.
- Parameters:
sentences (Union[List[str], str]) – Input sentences to encode.
batch_size (Optional[int], optional) – Number of sentences per iteration. Defaults to None.
max_length (Optional[int], optional) – Maximum length of tokens. Defaults to None.
convert_to_numpy (Optional[bool], optional) – If True, the output embedding will be a NumPy array. Otherwise, it will be a Torch Tensor. Defaults to None.
- Returns:
Return the embedding vectors in a numpy array or tensor.
- Return type:
Union[torch.Tensor, np.ndarray]
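The batch_size parameter is described as the number of sentences per iteration. The following is a minimal sketch of that idea, assuming the input is walked in consecutive slices; the library may order, pad, or distribute inputs differently. The helper iter_batches is hypothetical. A single string is wrapped into a list first, mirroring the Union[List[str], str] input type.

```python
from typing import Iterator, List, Union


def iter_batches(sentences: Union[List[str], str], batch_size: int) -> Iterator[List[str]]:
    """Hypothetical helper: yield consecutive slices of at most batch_size sentences."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if isinstance(sentences, str):
        sentences = [sentences]  # accept a single sentence as well as a list
    for start in range(0, len(sentences), batch_size):
        yield sentences[start:start + batch_size]


corpus = [f"passage {i}" for i in range(5)]
batches = list(iter_batches(corpus, batch_size=2))
```

With five inputs and a batch size of two, the last batch holds the single remaining sentence, so the final iteration is smaller than the others.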
- BaseLLMEmbedder.encode_single_device(sentences: List[str] | str, batch_size: int = 256, max_length: int = 512, convert_to_numpy: bool = True, device: str | None = None, **kwargs: Any)
Encode input sentences on a single device.
- Parameters:
sentences (Union[List[str], str]) – Input sentences to encode.
batch_size (int, optional) – Number of sentences per iteration. Defaults to 256.
max_length (int, optional) – Maximum length of tokens. Defaults to 512.
convert_to_numpy (bool, optional) – If True, the output embedding will be a NumPy array. Otherwise, it will be a Torch Tensor. Defaults to True.
device (Optional[str], optional) – Device to use for encoding. Defaults to None.
- Returns:
Return the embedding vectors in a numpy array or tensor.
- Return type:
Union[torch.Tensor, np.ndarray]