LLM Configuration and Usage
Dify supports custom API domains for OpenAI and for any large-model API server compatible with the OpenAI API. In the community edition, you can fill in the target server address via Settings --> Model Providers --> OpenAI --> Edit API.
gpt-3.5-turbo: An upgraded version of the gpt-3 model series. It is more capable than gpt-3 and can handle more complex tasks, with significant improvements in long-text understanding and cross-document reasoning, and it generates more coherent and persuasive text. It has also improved substantially at summarization, translation, and creative writing. Specializes in: long-text understanding, cross-document reasoning, summarization, translation, creative writing.
gpt-4: The most capable model in the GPT series to date. OpenAI has not disclosed its parameter count, but it performs at the top level across language tasks, especially those requiring deep understanding and the generation of long, complex responses. gpt-4 handles all aspects of human language, including understanding abstract concepts and cross-document reasoning, and is a general-purpose model capable of handling a wide range of natural language processing tasks. Specializes in: all NLP tasks, language understanding, long-text generation, cross-document reasoning, abstract concept understanding. For more details, refer to the official OpenAI model documentation.
In natural language processing, longer outputs generally require more computation time and resources, so limiting output length reduces computational cost and latency. For example, setting max_tokens=500 caps the completion at 500 tokens; generation stops once that limit is reached. This keeps the output within the LLM's acceptable range, conserves computational resources, and improves model efficiency. Setting a smaller max_tokens also leaves more room for the prompt, because the prompt and the completion share the model's context window. For instance, gpt-3.5-turbo has a limit of 4097 tokens; with max_tokens=4000, only 97 tokens remain for the prompt, and exceeding that causes an error.
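As a minimal sketch of this parameter in practice (assuming the openai Python SDK v1.x, an OPENAI_API_KEY set in the environment, and a made-up prompt):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# max_tokens caps the completion at 500 tokens; generation stops at the limit.
# The prompt and completion together must fit the model's context window
# (e.g. 4097 tokens for gpt-3.5-turbo).
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Summarize the history of NLP."}],
    max_tokens=500,
)
print(response.choices[0].message.content)
```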
In some natural language processing applications, texts are typically split by paragraphs or sentences to better handle and understand the semantic and structural information in the text. The smallest splitting unit depends on the specific task and technical implementation. For example:
For text classification tasks, texts are usually split by sentences or paragraphs.
For machine translation tasks, entire sentences or paragraphs are used as splitting units.
Finally, experiments and evaluations are needed to determine the most suitable embedding technique and splitting unit. You can compare the performance of different techniques and splitting units on the test set and choose the optimal solution.
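For illustration only (this is not Dify's actual splitting logic), a naive paragraph-and-sentence splitter in Python might look like this:

```python
import re

def split_paragraphs(text: str) -> list[str]:
    # Treat one or more blank lines as a paragraph boundary.
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def split_sentences(paragraph: str) -> list[str]:
    # Rough split on ., !, or ? followed by whitespace; production
    # pipelines typically use a tokenizer (e.g. NLTK or spaCy) instead.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]

text = "First paragraph. It has two sentences.\n\nSecond paragraph."
for paragraph in split_paragraphs(text):
    print(split_sentences(paragraph))
```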
We use cosine similarity. The choice of distance function is generally not critical. OpenAI embeddings are normalized to a length of 1, which means:
Using dot product can slightly speed up the calculation of cosine similarity
Cosine similarity and Euclidean distance will result in the same ranking
If normalized embedding vectors are used to compute cosine similarity or Euclidean distance, ranking the vectors by either measure gives the same order. After normalization, a vector's length no longer affects its relative relationships; only directional information is retained, so different measures yield the same ranking. Normalization scales every vector to length 1, placing them all on the unit sphere; a unit vector describes only a direction, since its magnitude is always 1. For the underlying principles, you can ask ChatGPT.
When embedding vectors are normalized to a length of 1, calculating the cosine similarity between two vectors can be simplified to their dot product. Since the normalized vector lengths are all 1, the dot product result is equivalent to the cosine similarity result. Given that dot product operations are faster than other similarity measures (like Euclidean distance), using normalized vectors for dot product calculations can slightly improve computational efficiency.
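A small numpy sketch (with random stand-in vectors rather than real embeddings) demonstrates both points: for unit-length vectors the dot product equals cosine similarity, and ranking by similarity matches ranking by Euclidean distance:

```python
import numpy as np

rng = np.random.default_rng(0)
query = rng.normal(size=8)
docs = rng.normal(size=(5, 8))

# Normalize every vector to length 1, as OpenAI embeddings already are.
query /= np.linalg.norm(query)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

similarity = docs @ query                        # dot product == cosine similarity here
distance = np.linalg.norm(docs - query, axis=1)  # Euclidean distance

# Highest similarity corresponds to smallest distance: the rankings agree.
assert np.array_equal(np.argsort(-similarity), np.argsort(distance))
```

The equivalence follows from ||a - b||^2 = 2 - 2(a · b) when both vectors have length 1, so Euclidean distance is a monotone function of the dot product.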
We collaborate with major model providers to offer Chinese users a certain amount of free token trial quota. In Dify, go to Settings --> Model Providers --> Show more model providers and click "Get Free" on the Zhipu·AI, iFlytek Spark, or MiniMax icon. If you don't see the entry in the English interface, switch the product language to Chinese:
Zhipu·AI: Get 10 million tokens for free. Click "Get Free", enter your phone number and verification code to receive the quota, regardless of whether you have registered with Zhipu·AI before.
iFlytek Spark (V1.5 and V2.0 models): Get 6 million tokens for free (3 million per model; the quotas are not interchangeable). Enter through Dify and complete registration on the iFlytek Spark open platform (only for phone numbers not previously registered with iFlytek Spark), then return to Dify, wait about 5 minutes, and refresh the page to see the available quota.
MiniMax: Get 1 million tokens for free. Click "Get Free" to receive the quota without manual registration, regardless of whether you have registered with MiniMax before.
Once the trial quota is credited, select the model you need to use in Prompt Arrangement --> Model and Parameters --> Language Model.
This indicates that the account associated with your OpenAI key has run out of funds. Please go to OpenAI and top up your account.
Error one: First, check that the front-end and back-end versions are up to date and consistent. This error can also occur if you are using an Azure OpenAI key but have not successfully deployed the model; check whether the model is deployed in your Azure OpenAI instance. The gpt-3.5-turbo model version must be 0613 or later (versions before 0613 do not support the function call capability used by Zhichat, making it unusable).
Error two: Usually this is because a proxy is set in your environment. Check whether a proxy is configured.
Each model has a different valid range for its parameters. Set parameter values according to the range supported by the current model.
In the parameter settings on the orchestration page, reduce the value of "Max Tokens".
The default model can be configured in Settings --> Model Providers. Currently, it supports text generation models from providers such as OpenAI / Azure OpenAI / Anthropic, and also supports integrating open-source models hosted on Hugging Face / Replicate / Xinference.
Check if the API key for the Embedding model you are using has reached the rate limit.
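As a sketch of how to confirm this from code (assuming the openai Python SDK v1.x, which raises openai.RateLimitError on HTTP 429; the helper name embed_with_retry is hypothetical):

```python
import time

import openai
from openai import OpenAI

client = OpenAI()

def embed_with_retry(text: str, retries: int = 3) -> list[float]:
    for attempt in range(retries):
        try:
            resp = client.embeddings.create(model="text-embedding-ada-002", input=text)
            return resp.data[0].embedding
        except openai.RateLimitError:
            # HTTP 429: the key has hit its rate limit; back off and retry.
            time.sleep(2 ** attempt)
    raise RuntimeError("Embedding request still rate-limited after retries")
```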
If you encounter the error "Invalid token," try the following solutions:
Clear the browser cache (Cookies, Session Storage, and Local Storage). If using a mobile app, clear the corresponding app's cache and re-access it.
Generate a new App URL and re-enter the URL.
Because Claude does not provide an Embedding model, the Embedding process and other dialogue generation default to OpenAI's key, and therefore consume your OpenAI quota. You can also set other default inference models and Embedding models in Settings --> Model Providers.
Put the header in the first row and the content in each subsequent row, without extra title rows or complex table formats. For example, if a table has a title row such as "Table 1" above the actual header row, that title row is an extra header and should be removed, keeping only the real header row.
Dify supports the following providers for use as Embedding models. Simply select the Embeddings type in the configuration box.
Azure
LocalAI
MiniMax
OpenAI
Replicate
Xinference
GPUStack
This feature provides application templates for cloud version users to reference; it does not currently support setting applications you created as templates. If you use the cloud version, you can Add to Workspace or Customize a template to make it your own application. If you use the community version and need to create more application templates for your team, you can contact our commercialization team for paid technical support: business@dify.ai.
Please check whether you have reached the official API rate limit. Refer to OpenAI's rate limit documentation for details.
Currently, the maximum size for a single document upload is 15MB, with a limit of 100 documents in total. If you need to adjust these limits for a locally deployed version, refer to the environment variables documentation.
Whether the dataset is used depends on the dataset's description, so make the description as clear as possible. For specific writing techniques, refer to the documentation on writing dataset descriptions.
OpenAI's GPT-4 model API and ChatGPT Plus are two separate products with separate charges. The model API has its own pricing; refer to OpenAI's pricing page. To apply for paid access, you must first bind a card. Binding a card grants GPT-3.5 API access but not GPT-4 access; GPT-4 API access requires a record of a successful payment. Refer to OpenAI's documentation for details.