BGE-Code-v1 =========== **`BGE-Code-v1 `_** is an LLM-based code embedding model that supports code retrieval, text retrieval, and multilingual retrieval. It primarily demonstrates the following capabilities: - Superior Code Retrieval Performance: The model demonstrates exceptional code retrieval capabilities, supporting natural language queries in both English and Chinese, as well as 20 programming languages. - Robust Text Retrieval Capabilities: The model maintains strong text retrieval capabilities comparable to text embedding models of similar scale. - Extensive Multilingual Support: BGE-Code-v1 offers comprehensive multilingual retrieval capabilities, excelling in languages such as English, Chinese, Japanese, French, and more. +-------------------------------------------------------------------+-----------------+------------+--------------+----------------------------------------------------------------------------------------------------+ | Model | Language | Parameters | Model Size | Description | +===================================================================+=================+============+==============+====================================================================================================+ | `BAAI/bge-code-v1 `_ | Multilingual | 1.5B | 6.18 GB | SOTA code retrieval model, with exceptional multilingual text retrieval performance as well | +-------------------------------------------------------------------+-----------------+------------+--------------+----------------------------------------------------------------------------------------------------+ .. code:: python from FlagEmbedding import FlagLLMModel queries = [ "Delete the record with ID 4 from the 'Staff' table.", 'Delete all records in the "Livestock" table where age is greater than 5' ] documents = [ "DELETE FROM Staff WHERE StaffID = 4;", "DELETE FROM Livestock WHERE age > 5;" ] model = FlagLLMModel('BAAI/bge-code-v1', query_instruction_format="{}\n{}", query_instruction_for_retrieval="Given a question in text, retrieve SQL queries that are appropriate responses to the question.", trust_remote_code=True, use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation embeddings_1 = model.encode_queries(queries) embeddings_2 = model.encode_corpus(documents) similarity = embeddings_1 @ embeddings_2.T print(similarity)