This document details how to integrate custom models into Dify, using the Xinference model as an example. It covers the complete process, including creating model provider files, writing code based on model type, implementing model invocation logic, handling exceptions, debugging, and publishing. It specifically details the implementation of core methods like LLM invocation, token calculation, credential validation, and parameter generation.
Unlike predefined models, a custom model integration does not require implementing `validate_provider_credential` in your provider configuration file. At runtime, based on the user's choice of model type or model name, Dify automatically calls the corresponding model layer's `validate_credentials` method to verify credentials.
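For a custom-model provider like Xinference, the provider-level class can therefore stay essentially empty. The sketch below assumes a `ModelProvider` base class with a `validate_provider_credentials` hook from the core model runtime; the import path may differ in your Dify version:

```python
# Assumed import path; adjust to where ModelProvider lives in your Dify version.
from core.model_runtime.model_providers.__base.model_provider import ModelProvider


class XinferenceProvider(ModelProvider):
    def validate_provider_credentials(self, credentials: dict) -> None:
        # No provider-level validation: Dify verifies credentials per model
        # via the model layer's validate_credentials instead.
        pass
```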
Based on the model types the provider supports (e.g., `llm` or `text_embedding`), create separate code files. Ensure that each model type is organized into a distinct logical layer for easier maintenance and future expansion.

In the corresponding code file (e.g., `llm.py`), define a class that implements the specific model logic, conforming to the system's model interface specifications; a skeleton is sketched below.
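For orientation, such a class for Xinference's LLM type might look like the skeleton below. The class name, import path, and listed hooks are assumptions drawn from the common Dify model-runtime layout and should be checked against your version:

```python
# Assumed import path; adjust to your Dify version's model runtime layout.
from core.model_runtime.model_providers.__base.large_language_model import LargeLanguageModel


class XinferenceAILargeLanguageModel(LargeLanguageModel):
    """
    LLM implementation for models served by Xinference.

    The base class typically expects these hooks to be implemented:
      - _invoke: call the model and return a full or streaming result
      - validate_credentials: verify the credentials for a single model
      - get_num_tokens: estimate prompt token usage
      - get_customizable_model_schema: describe the model's parameter rules
      - _invoke_error_mapping: map runtime errors to Dify invoke errors
    """
    ...
```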
In the `/provider` directory, create a `xinference.yaml` file.
The Xinference family of models supports the LLM, Text Embedding, and Rerank model types, so your `xinference.yaml` must include all three. Example:
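The full file also carries labels, icons, and help text; a minimal sketch of the parts relevant here, assuming the usual Dify provider schema fields, could look like this:

```yaml
provider: xinference
label:
  en_US: Xorbits Inference
supported_model_types:
  - llm
  - text-embedding
  - rerank
configurate_methods:
  - customizable-model
```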
Next, define the `provider_credential_schema`. Since Xinference supports text-generation, embeddings, and reranking models, you can configure it as follows:
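One plausible shape for this schema uses a `model_type` select to expose the three supported types; the field names and labels here are illustrative:

```yaml
provider_credential_schema:
  credential_form_schemas:
    - variable: model_type
      type: select
      label:
        en_US: Model type
      required: true
      options:
        - value: text-generation
          label:
            en_US: Language Model
        - value: embeddings
          label:
            en_US: Text Embedding
        - value: reranking
          label:
            en_US: Rerank
```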
Each model also needs a `model_name` entry:
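For example, it could be appended to the same credential form (placeholder text is illustrative):

```yaml
    # Appended under provider_credential_schema.credential_form_schemas
    - variable: model_name
      type: text-input
      label:
        en_US: Model Name
      required: true
      placeholder:
        en_US: Input model name
```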
Because Python treats any function containing `yield` as a generator whose return type is `Generator`, it's best to split the synchronous and streaming return paths into separate methods:
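A sketch of that split is shown below. The entity imports follow the core model-runtime layout and are assumptions to adjust for your Dify version; the two `_handle_*` helpers are private methods this pattern introduces:

```python
from typing import Generator, Optional, Union

# Assumed import paths; adjust to your Dify version.
from core.model_runtime.entities.llm_entities import LLMResult
from core.model_runtime.entities.message_entities import PromptMessage, PromptMessageTool


# Methods of the LLM class sketched earlier.
def _invoke(self, model: str, credentials: dict,
            prompt_messages: list[PromptMessage], model_parameters: dict,
            tools: Optional[list[PromptMessageTool]] = None,
            stop: Optional[list[str]] = None,
            stream: bool = True, user: Optional[str] = None
            ) -> Union[LLMResult, Generator]:
    # Dispatch so the generator (`yield`) code path never shares a body
    # with the blocking path.
    if stream:
        return self._handle_stream_response(
            model, credentials, prompt_messages, model_parameters, tools, stop, user)
    return self._handle_sync_response(
        model, credentials, prompt_messages, model_parameters, tools, stop, user)


def _handle_stream_response(self, model, credentials, prompt_messages,
                            model_parameters, tools, stop, user) -> Generator:
    # Streaming path: yield chunks as tokens arrive from the model.
    ...


def _handle_sync_response(self, model, credentials, prompt_messages,
                          model_parameters, tools, stop, user) -> LLMResult:
    # Blocking path: return a single LLMResult once generation finishes.
    ...
```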
For token calculation, you can use `self._get_num_tokens_by_gpt2(text: str)` inherited from the `AIModel` base class, which uses a GPT-2 tokenizer. Remember this is an approximation and may not match your model exactly.
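For instance, a `get_num_tokens` implementation could simply join the prompt text and delegate to that helper; the message handling below is deliberately simplified and assumes plain string contents:

```python
from typing import Optional

# Assumed import path; adjust to your Dify version.
from core.model_runtime.entities.message_entities import PromptMessage, PromptMessageTool


# Method of the LLM class sketched earlier.
def get_num_tokens(self, model: str, credentials: dict,
                   prompt_messages: list[PromptMessage],
                   tools: Optional[list[PromptMessageTool]] = None) -> int:
    # GPT-2 based estimate only; swap in the model's own tokenizer if one exists.
    text = "\n".join(str(message.content) for message in prompt_messages)
    return self._get_num_tokens_by_gpt2(text)
```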
Xinference lets you set `max_tokens`, `temperature`, and `top_p`. Some other providers (e.g., OpenLLM) may support parameters like `top_k` only for certain models. This means you need to adapt your parameter schema to each model's capabilities:
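One way to express this is sketched below with the model runtime's `ParameterRule` entities. The import paths and entity fields are assumptions based on the common Dify layout, and `_supports_top_k` is a hypothetical helper standing in for whatever capability check your provider can perform:

```python
# Assumed import paths; adjust to your Dify version.
from core.model_runtime.entities.common_entities import I18nObject
from core.model_runtime.entities.model_entities import (
    AIModelEntity, FetchFrom, ModelType, ParameterRule, ParameterType,
)


# Method of the LLM class sketched earlier.
def get_customizable_model_schema(self, model: str, credentials: dict) -> AIModelEntity:
    # Parameters assumed to be accepted by every Xinference LLM.
    rules = [
        ParameterRule(name='temperature', type=ParameterType.FLOAT,
                      use_template='temperature', label=I18nObject(en_US='Temperature')),
        ParameterRule(name='top_p', type=ParameterType.FLOAT,
                      use_template='top_p', label=I18nObject(en_US='Top P')),
        ParameterRule(name='max_tokens', type=ParameterType.INT,
                      use_template='max_tokens', min=1, default=512,
                      label=I18nObject(en_US='Max Tokens')),
    ]

    # Hypothetical capability check: only expose top_k for models that support it.
    if self._supports_top_k(model, credentials):
        rules.append(ParameterRule(name='top_k', type=ParameterType.INT,
                                   use_template='top_k', label=I18nObject(en_US='Top K')))

    return AIModelEntity(
        model=model,
        label=I18nObject(en_US=model),
        model_type=ModelType.LLM,
        fetch_from=FetchFrom.CUSTOMIZABLE_MODEL,
        model_properties={},
        parameter_rules=rules,
    )
```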