Here we use Xinference as an example to gradually complete a full vendor integration.
It is important to note that for custom models, each model integration requires a complete vendor credential.
Unlike predefined models, a custom vendor integration always carries the following two parameters, `model_type` and `model_name`, which do not need to be defined in the vendor YAML file.
In addition, the vendor does not need to implement `validate_provider_credential`. The Runtime will automatically call the corresponding model layer's `validate_credentials` for validation, based on the model type and model name selected by the user.
First, determine which model types the vendor supports. The available model types are:

- `llm`: Text Generation Model
- `text_embedding`: Text Embedding Model
- `rerank`: Rerank Model
- `speech2text`: Speech to Text
- `tts`: Text to Speech
- `moderation`: Moderation

Xinference supports LLM, Text Embedding, and Rerank, so we will start writing `xinference.yaml`.
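As a rough sketch (the exact keys and labels should be checked against the provider schema used by your Runtime version, so treat this as illustrative rather than the canonical file), the top of `xinference.yaml` declares the supported model types and marks the provider as supporting only customizable models:

```yaml
provider: xinference
label:
  en_US: Xorbits Inference
supported_model_types:    # Xinference supports LLM / Text Embedding / Rerank
  - llm
  - text-embedding
  - rerank
configurate_methods:      # locally deployed, no predefined models, so custom models only
  - customizable-model
```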
Next, consider which credentials are required to define a model in Xinference:

- It supports three different kinds of models, so we need `model_type` to specify the type of the model. Since there are three types, we express it as a select field (see the sketch after this list).
- Every model has its own name, `model_name`, so we need to define it here.
- Every model also has a unique `model_uid`, so we need to define that here as well.
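Below is a hedged sketch of those credential fields. The surrounding key (`provider_credential_schema` here), the option values such as `text-generation`, `embeddings`, and `reranking`, and the placeholders are assumptions to be checked against the schema your Runtime expects:

```yaml
provider_credential_schema:
  credential_form_schemas:
    - variable: model_type
      type: select
      label:
        en_US: Model type
      required: true
      options:
        - value: text-generation
          label:
            en_US: Text Generation
        - value: embeddings
          label:
            en_US: Text Embedding
        - value: reranking
          label:
            en_US: Rerank
    - variable: model_name
      type: text-input
      label:
        en_US: Model name
      required: true
      placeholder:
        en_US: Input model name
    - variable: model_uid
      type: text-input
      label:
        en_US: Model uid
      required: true
      placeholder:
        en_US: Enter the model uid
```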
With the basic vendor definition in place, we move on to the model code. Taking the `llm` type as an example, we write `xinference.llm.llm.py`.
In `llm.py`, create a Xinference LLM class, which we will name `XinferenceAILargeLanguageModel` (the name is arbitrary), inheriting from the `__base.large_language_model.LargeLanguageModel` base class, and implement the methods described below.
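As a rough orientation, a skeleton of the class might look like the following; the method signatures are abridged and the import paths are assumptions that should be verified against the `LargeLanguageModel` base class in your Runtime version:

```python
from typing import Generator, Optional, Union

from core.model_runtime.entities.llm_entities import LLMResult
from core.model_runtime.entities.message_entities import PromptMessage, PromptMessageTool
from core.model_runtime.model_providers.__base.large_language_model import LargeLanguageModel


class XinferenceAILargeLanguageModel(LargeLanguageModel):
    def _invoke(self, model: str, credentials: dict,
                prompt_messages: list[PromptMessage],
                model_parameters: dict,
                tools: Optional[list[PromptMessageTool]] = None,
                stop: Optional[list[str]] = None,
                stream: bool = True,
                user: Optional[str] = None) -> Union[LLMResult, Generator]:
        """Core LLM invocation, supporting streaming and synchronous returns."""
        ...

    def get_num_tokens(self, model: str, credentials: dict,
                       prompt_messages: list[PromptMessage],
                       tools: Optional[list[PromptMessageTool]] = None) -> int:
        """Precompute tokens for the given prompt messages."""
        ...

    def validate_credentials(self, model: str, credentials: dict) -> None:
        """Validate the credentials for a single model."""
        ...

    # The parameter schema and error-mapping members are discussed further below.
```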
LLM invocation is the core method and must support both streaming and synchronous returns. Because Python marks functions containing the `yield` keyword as generator functions, whose return data type is fixed as `Generator`, synchronous and streaming returns need to be implemented as separate functions, as shown below (note that the example uses simplified parameters; the actual implementation should follow the parameter list above):
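A minimal sketch of that split, assuming `_invoke` as the entry point with simplified parameters; the helper names (`_handle_stream_response`, `_handle_sync_response`, `_call_xinference*`) are illustrative, not part of the base class:

```python
from typing import Generator, Union

from core.model_runtime.entities.llm_entities import LLMResult
from core.model_runtime.model_providers.__base.large_language_model import LargeLanguageModel


class XinferenceAILargeLanguageModel(LargeLanguageModel):
    def _invoke(self, model: str, credentials: dict, prompt_messages: list,
                model_parameters: dict, stream: bool = True, **kwargs) -> Union[LLMResult, Generator]:
        # A single function cannot both `return` a full result and `yield` chunks:
        # as soon as it contains `yield`, Python treats it as a generator function.
        # So dispatch to two separate implementations.
        if stream:
            return self._handle_stream_response(model, credentials, prompt_messages, model_parameters)
        return self._handle_sync_response(model, credentials, prompt_messages, model_parameters)

    def _handle_stream_response(self, model, credentials, prompt_messages, model_parameters) -> Generator:
        # Generator function: yields result chunks as the vendor streams them back.
        for chunk in self._call_xinference_stream(model, credentials, prompt_messages, model_parameters):
            yield chunk

    def _handle_sync_response(self, model, credentials, prompt_messages, model_parameters) -> LLMResult:
        # Plain function: builds and returns the complete LLMResult in one call.
        return self._call_xinference(model, credentials, prompt_messages, model_parameters)

    # _call_xinference_stream / _call_xinference stand in for the real HTTP client code.
```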
To precompute tokens, if the model does not expose its own token-counting interface, you can use `self._get_num_tokens_by_gpt2(text: str)`. This method is located in the `AIModel` base class and uses GPT-2's tokenizer for the calculation. It should only be used as a fallback, however, since it is not completely accurate.
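For example, a `get_num_tokens` implementation could simply join the message contents and fall back to the GPT-2 estimate (the signature follows the common base-class pattern and should be verified against your Runtime version):

```python
from typing import Optional

from core.model_runtime.entities.message_entities import PromptMessage, PromptMessageTool
from core.model_runtime.model_providers.__base.large_language_model import LargeLanguageModel


class XinferenceAILargeLanguageModel(LargeLanguageModel):
    def get_num_tokens(self, model: str, credentials: dict,
                       prompt_messages: list[PromptMessage],
                       tools: Optional[list[PromptMessageTool]] = None) -> int:
        # No reliable token-counting endpoint is assumed here, so approximate
        # with the GPT-2 tokenizer provided by the AIModel base class.
        text = "".join(str(message.content) for message in prompt_messages)
        return self._get_num_tokens_by_gpt2(text)
```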
For the model parameter schema, unlike predefined models, there is no YAML file declaring which parameters a model supports, so the parameter rules have to be generated dynamically. For example, Xinference supports the `max_tokens`, `temperature`, and `top_p` parameters.

However, some vendors support different parameters depending on the model. For instance, the vendor OpenLLM supports `top_k`, but not all models provided by this vendor support `top_k`. Say Model A supports `top_k` while Model B does not. Therefore, we need to dynamically generate the model parameter schema, as shown below:
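A hedged sketch of such dynamic generation, modeled on the `get_customizable_model_schema` hook and the `ParameterRule`/`AIModelEntity` entities; the import paths, entity fields, and the `'model-a'` check are assumptions for illustration and should be checked against your Runtime version:

```python
from typing import Optional

from core.model_runtime.entities.common_entities import I18nObject
from core.model_runtime.entities.model_entities import (
    AIModelEntity, FetchFrom, ModelType, ParameterRule, ParameterType,
)
from core.model_runtime.model_providers.__base.large_language_model import LargeLanguageModel


class XinferenceAILargeLanguageModel(LargeLanguageModel):
    def get_customizable_model_schema(self, model: str, credentials: dict) -> Optional[AIModelEntity]:
        # Parameters every model supports.
        rules = [
            ParameterRule(name='temperature', type=ParameterType.FLOAT,
                          use_template='temperature', label=I18nObject(en_US='Temperature')),
            ParameterRule(name='top_p', type=ParameterType.FLOAT,
                          use_template='top_p', label=I18nObject(en_US='Top P')),
            ParameterRule(name='max_tokens', type=ParameterType.INT,
                          use_template='max_tokens', min=1, default=512,
                          label=I18nObject(en_US='Max Tokens')),
        ]

        # Only add top_k for models that actually support it (illustrative check).
        if model == 'model-a':
            rules.append(
                ParameterRule(name='top_k', type=ParameterType.INT,
                              use_template='top_k', min=1, default=50,
                              label=I18nObject(en_US='Top K'))
            )

        return AIModelEntity(
            model=model,
            label=I18nObject(en_US=model),
            fetch_from=FetchFrom.CUSTOMIZABLE_MODEL,
            model_type=ModelType.LLM,
            model_properties={},   # fill in properties such as mode / context size as needed
            parameter_rules=rules,
        )
```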
Finally, when a model invocation error occurs, it needs to be mapped to the Runtime-specified `InvokeError` type to facilitate Dify's different subsequent processing for different errors.

Runtime Errors:

- `InvokeConnectionError`: invocation connection error
- `InvokeServerUnavailableError`: invocation server unavailable
- `InvokeRateLimitError`: invocation rate limit reached
- `InvokeAuthorizationError`: invocation authorization failed
- `InvokeBadRequestError`: invocation parameter error
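A sketch of that mapping, assuming the `_invoke_error_mapping` property pattern used by other providers; which concrete exceptions belong under which `InvokeError` depends on the HTTP client and on Xinference's behavior, so the entries below are illustrative:

```python
import requests

from core.model_runtime.errors.invoke import (
    InvokeAuthorizationError, InvokeBadRequestError, InvokeConnectionError,
    InvokeError, InvokeRateLimitError, InvokeServerUnavailableError,
)
from core.model_runtime.model_providers.__base.large_language_model import LargeLanguageModel


class XinferenceAILargeLanguageModel(LargeLanguageModel):
    @property
    def _invoke_error_mapping(self) -> dict[type[InvokeError], list[type[Exception]]]:
        # Map vendor/client exceptions to the Runtime's unified InvokeError types
        # so Dify can apply the appropriate retry and error-reporting behavior.
        return {
            InvokeConnectionError: [requests.exceptions.ConnectionError],
            InvokeServerUnavailableError: [requests.exceptions.HTTPError],
            InvokeRateLimitError: [],
            InvokeAuthorizationError: [],
            InvokeBadRequestError: [ValueError, KeyError],
        }
```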