A custom model refers to an LLM that you deploy or configure on your own. This document uses the Xinference model as an example to demonstrate how to integrate a custom model into your model plugin.
By default, a custom model automatically includes two parameters—its model type and model name—and does not require additional definitions in the provider YAML file.
You do not need to implement validate_provider_credential in your provider configuration file. At runtime, based on the user's choice of model type and model name, Dify automatically calls the corresponding model layer's validate_credentials method to verify credentials.
Below are the steps to integrate a custom model:

1. Based on the model type (such as llm or text_embedding), create separate code files. Ensure that each model type is organized into distinct logical layers for easier maintenance and future expansion.
2. Within each model type's module, create a code file named after the model type (for example, llm.py). Define a class in the file that implements the specific model logic, conforming to the system's model interface specifications.

In your plugin's /provider directory, create a xinference.yaml file. The Xinference family of models supports LLM, Text Embedding, and Rerank model types, so your xinference.yaml must include all three.
Example:
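A minimal sketch of what xinference.yaml might contain, using field names from Dify's provider schema (the label and icon values here are illustrative placeholders):

```yaml
provider: xinference
label:
  en_US: Xorbits Inference
icon_small:
  en_US: icon_s_en.svg
icon_large:
  en_US: icon_l_en.svg
supported_model_types:
  - llm
  - text-embedding
  - rerank
configurate_methods:
  - customizable-model
```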
Next, define the provider_credential_schema. Since Xinference supports text-generation, embeddings, and reranking models, you can configure it as follows:
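For instance, a model_type select field covering the three types might look like the sketch below; it follows Dify's credential-form schema, and the labels are illustrative:

```yaml
provider_credential_schema:
  credential_form_schemas:
    - variable: model_type
      type: select
      label:
        en_US: Model type
      required: true
      options:
        - value: text-generation
          label:
            en_US: Text Generation
        - value: embeddings
          label:
            en_US: Embeddings
        - value: reranking
          label:
            en_US: Rerank
```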
Every model in Xinference requires a model_name:
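A text-input field for the model name can be appended to the same credential_form_schemas list, for example:

```yaml
    - variable: model_name
      type: text-input
      label:
        en_US: Model Name
      required: true
      placeholder:
        en_US: Input model name
```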
Because Xinference must be locally deployed, users need to supply the server address (server_url) and model UID. For instance:
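These can be expressed as two more required fields in the same list; the placeholder text below is illustrative:

```yaml
    - variable: server_url
      type: text-input
      label:
        en_US: Server URL
      required: true
      placeholder:
        en_US: Base URL of the Xinference server, e.g. https://your-host:9997
    - variable: model_uid
      type: text-input
      label:
        en_US: Model UID
      required: true
      placeholder:
        en_US: UID of the deployed model
```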
Once you’ve defined these parameters, the YAML configuration for your custom model provider is complete. Next, create the functional code files for each model defined in this config.
Since Xinference supports llm, rerank, speech2text, and tts, you should create corresponding directories under /models, each containing its respective feature code.
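Assuming one subdirectory per model type, the layout under /models might look like this:

```
models
├── llm
│   └── llm.py
├── rerank
│   └── rerank.py
├── speech2text
│   └── speech2text.py
└── tts
    └── tts.py
```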
Below is an example for an llm-type model. You’d create a file named llm.py, then define a class—such as XinferenceAILargeLanguageModel—that extends __base.large_language_model.LargeLanguageModel. This class should include:
The core method for invoking the LLM, supporting both streaming and synchronous responses:
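The signature below follows the LargeLanguageModel interface described in Dify's model documentation. The delegation to _generate and _generate_stream is an illustrative pattern rather than part of the interface; those helpers are sketched next:

```python
from collections.abc import Generator
from typing import Optional, Union

# PromptMessage, PromptMessageTool, LLMResult, etc. are entity classes provided by
# the Dify model runtime / plugin SDK; the exact import path depends on your SDK version.

def _invoke(self, model: str, credentials: dict,
            prompt_messages: list[PromptMessage], model_parameters: dict,
            tools: Optional[list[PromptMessageTool]] = None,
            stop: Optional[list[str]] = None, stream: bool = True,
            user: Optional[str] = None) -> Union[LLMResult, Generator]:
    # Return a full LLMResult for synchronous calls, or a generator of
    # LLMResultChunk objects when stream=True.
    if stream:
        return self._generate_stream(model, credentials, prompt_messages,
                                     model_parameters, tools, stop, user)
    return self._generate(model, credentials, prompt_messages,
                          model_parameters, tools, stop, user)
```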
You'll need two separate functions to handle streaming and synchronous responses. Python treats any function containing yield as a generator (its return type becomes Generator), so it's best to split them:
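For example, the synchronous path can build an LLMResult while the streaming path yields LLMResultChunk objects. The _call_xinference helper used below is hypothetical shorthand for however your plugin talks to the Xinference server:

```python
def _generate(self, model: str, credentials: dict, prompt_messages: list[PromptMessage],
              model_parameters: dict, tools=None, stop=None, user=None) -> LLMResult:
    # Synchronous path: wait for the complete answer, then wrap it once.
    text, usage = self._call_xinference(credentials, prompt_messages,
                                        model_parameters, stream=False)
    return LLMResult(model=model, prompt_messages=prompt_messages,
                     message=AssistantPromptMessage(content=text), usage=usage)

def _generate_stream(self, model: str, credentials: dict, prompt_messages: list[PromptMessage],
                     model_parameters: dict, tools=None, stop=None, user=None) -> Generator:
    # Streaming path: the presence of `yield` makes this whole function a generator.
    for index, piece in enumerate(self._call_xinference(credentials, prompt_messages,
                                                        model_parameters, stream=True)):
        yield LLMResultChunk(
            model=model, prompt_messages=prompt_messages,
            delta=LLMResultChunkDelta(index=index,
                                      message=AssistantPromptMessage(content=piece)))
```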
If your model doesn’t provide a token-counting interface, simply return 0:
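A minimal sketch of that case, using the token-counting signature from the model interface:

```python
def get_num_tokens(self, model: str, credentials: dict,
                   prompt_messages: list[PromptMessage],
                   tools: Optional[list[PromptMessageTool]] = None) -> int:
    # The model provides no token-counting interface, so report zero tokens.
    return 0
```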
Alternatively, you can call self._get_num_tokens_by_gpt2(text: str) from the AIModel base class, which uses a GPT-2 tokenizer. Remember this is an approximation and may not match your model exactly.
Similar to provider-level credential checks, but scoped to a single model:
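One common pattern, sketched below, is to attempt a minimal invocation and re-raise any failure as CredentialsValidateFailedError; the "ping" prompt and parameter values are arbitrary:

```python
def validate_credentials(self, model: str, credentials: dict) -> None:
    # Probe the model once; any exception means the supplied credentials
    # (e.g. server_url / model_uid) cannot be used to reach this model.
    try:
        self._invoke(model=model, credentials=credentials,
                     prompt_messages=[UserPromptMessage(content="ping")],
                     model_parameters={"max_tokens": 5}, stream=False)
    except Exception as ex:
        raise CredentialsValidateFailedError(str(ex))
```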
Unlike predefined models, there is no YAML file declaring which parameters a model supports, so you must generate the parameter schema dynamically.
For example, Xinference supports max_tokens, temperature, and top_p. Some other providers (e.g., OpenLLM) may support parameters like top_k only for certain models. This means you need to adapt your schema to each model's capabilities:
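A sketch of such a schema for the Xinference parameters above, built from the runtime's ParameterRule and AIModelEntity entities (the defaults, labels, and minimum value here are illustrative):

```python
def get_customizable_model_schema(self, model: str, credentials: dict) -> Optional[AIModelEntity]:
    # Declare which parameters this particular model accepts.
    rules = [
        ParameterRule(name="temperature", type=ParameterType.FLOAT,
                      use_template="temperature", label=I18nObject(en_US="Temperature")),
        ParameterRule(name="top_p", type=ParameterType.FLOAT,
                      use_template="top_p", label=I18nObject(en_US="Top P")),
        ParameterRule(name="max_tokens", type=ParameterType.INT,
                      use_template="max_tokens", min=1, default=512,
                      label=I18nObject(en_US="Max Tokens")),
    ]
    return AIModelEntity(
        model=model,
        label=I18nObject(en_US=model),
        model_type=ModelType.LLM,
        fetch_from=FetchFrom.CUSTOMIZABLE_MODEL,
        model_properties={},
        parameter_rules=rules,
    )
```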
When an error occurs during model invocation, map it to the appropriate InvokeError type recognized by the runtime. This lets Dify handle different errors in a standardized manner:
Runtime Errors:
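The mapping is exposed through the _invoke_error_mapping property. The sketch below maps a few generic Python exceptions onto the runtime's InvokeError subclasses (InvokeConnectionError, InvokeServerUnavailableError, InvokeRateLimitError, InvokeAuthorizationError, InvokeBadRequestError); which exception classes you list on the right depends on the client library you actually call:

```python
@property
def _invoke_error_mapping(self) -> dict[type[InvokeError], list[type[Exception]]]:
    # Keys are the runtime's unified error types; values are the exceptions
    # your invocation code may raise for each category.
    return {
        InvokeConnectionError: [ConnectionError, TimeoutError],
        InvokeServerUnavailableError: [ConnectionRefusedError],
        InvokeRateLimitError: [],
        InvokeAuthorizationError: [PermissionError],
        InvokeBadRequestError: [ValueError, KeyError],
    }
```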
For more details on interface methods, see the Model Documentation.
To view the complete code files discussed in this guide, visit the GitHub Repository.
After finishing development, test the plugin to ensure it runs correctly. For more details, refer to:
If you'd like to list this plugin on the Dify Marketplace, see publish-to-dify-marketplace.
Quick Start:
Plugins Endpoint Docs: