# App Toolkit
Source: https://docs.dify.ai/en/use-dify/build/additional-features
Optional features that make your Dify apps more useful
Dify apps come with optional features you can enable to improve the end-user experience. Open the **Features** panel of the builder to see what's available for your app type.
## Conversation Opener
Set an opening message that greets users at the start of each conversation, with optional suggested questions to guide them toward what the app does well.
You can insert variables into the opening message and suggested questions to personalize the experience.
* In the opening message, type `{` or `/` to insert variables from the picker.
* In suggested questions, type variable names manually in `{{variable_name}}` format.
## Follow-up
When enabled, the LLM automatically suggests 3 follow-up questions after each response, helping users continue the conversation. This feature is a simple toggle—no additional configuration is needed.
Follow-up questions are generated by a separate LLM call using your workspace's system reasoning model (set in **Settings** > **Model Provider** > **Default Model Settings**), not the model configured in your app.
## Text to Speech
Convert AI responses to audio. You can configure the language and voice to match your app's audience, and enable **Auto Play** to stream audio automatically as the AI responds.
**Text to Speech** uses your workspace's text-to-speech model (set in **Settings** > **Model Provider** > **Default Model Settings**).
The feature only appears in the **Features** panel when a default TTS model is configured.
## Speech to Text
Enable voice input for the chat interface. When enabled, your end users can dictate messages instead of typing by clicking the microphone button.
**Speech to Text** uses your workspace's speech-to-text model (set in **Settings** > **Model Provider** > **Default Model Settings**).
The feature only appears in the **Features** panel when a default STT model is configured.
## File Upload
Allow end users to send files at any point during a conversation. You can configure which file types to accept, the upload method, and the maximum number of files per message.
For self-hosted deployments, you can adjust file size limits via the following environment variables:
* `UPLOAD_IMAGE_FILE_SIZE_LIMIT` (default: 10 MB)
* `UPLOAD_FILE_SIZE_LIMIT` (default: 15 MB)
* `UPLOAD_AUDIO_FILE_SIZE_LIMIT` (default: 50 MB)
* `UPLOAD_VIDEO_FILE_SIZE_LIMIT` (default: 100 MB)
See [Environment Variables](/en/self-host/configuration/environments) for details.
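As a sketch of how these settings behave, the limits below use the documented defaults and can be overridden through the environment (the resolver function is illustrative, not Dify's actual implementation; values are assumed to be megabytes, as in the defaults above):

```python
import os

# Documented defaults, in MB; each can be overridden via an environment variable.
DEFAULT_LIMITS_MB = {
    "UPLOAD_IMAGE_FILE_SIZE_LIMIT": 10,
    "UPLOAD_FILE_SIZE_LIMIT": 15,
    "UPLOAD_AUDIO_FILE_SIZE_LIMIT": 50,
    "UPLOAD_VIDEO_FILE_SIZE_LIMIT": 100,
}

def limit_bytes(name: str) -> int:
    """Resolve a limit to bytes, honoring an environment override if set."""
    mb = int(os.environ.get(name, DEFAULT_LIMITS_MB[name]))
    return mb * 1024 * 1024
```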
## Citations and Attributions
Show the source documents behind AI responses. When enabled, responses that draw from a connected knowledge base display numbered citations linking back to the original documents and chunks.
## Content Moderation
Filter inappropriate content in user inputs, AI outputs, or both. Choose a moderation provider based on your needs:
* **OpenAI Moderation**: Use OpenAI's dedicated moderation model to detect harmful content across multiple categories.
* **Keywords**: Define a list of blocked terms. Any match triggers the preset response.
* **API Extension**: Connect a custom moderation endpoint for your own filtering logic.
When content is flagged, the app replaces it with a preset response that you define.
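As an illustration of how the **Keywords** option behaves, a minimal filter might look like the following sketch (the blocked terms and preset response are placeholders, not Dify's actual implementation):

```python
# Hypothetical blocked terms and preset response, for illustration only.
BLOCKED_TERMS = {"credit card number", "api key"}
PRESET_RESPONSE = "This content can't be processed. Please rephrase your request."

def moderate(text: str) -> str:
    """Return the preset response if any blocked term appears, else the text."""
    lowered = text.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        return PRESET_RESPONSE
    return text
```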
## Annotation Reply
Define curated Q\&A pairs that take priority over LLM responses. When a user's query **semantically** matches an annotation above the score threshold (the minimum similarity required to count as a match), the curated answer is returned directly without calling the LLM.
You can configure the score threshold and the embedding model used for semantic matching.
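Conceptually, the matching step works like this sketch: embed the query, compare it against stored annotation embeddings, and return the curated answer only when the best similarity clears the threshold (the embedding step is omitted, and the data layout is illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def annotation_reply(query_vec, annotations, threshold=0.9):
    """Return the curated answer if the best match clears the threshold,
    otherwise None so the query falls through to the LLM."""
    best = max(annotations, key=lambda a: cosine(query_vec, a["vec"]))
    if cosine(query_vec, best["vec"]) >= threshold:
        return best["answer"]
    return None
```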
To create and manage your annotations:
* Convert existing conversations into annotations directly from **Debug & Preview** or **Logs** by clicking the **Add Annotation** icon on any LLM response. Once a message is annotated, the icon changes to **Edit**, so you can modify the annotation in place.
* In the **Logs & Annotations** > **Annotations** tab, manually add new Q\&A pairs, manage existing annotations, and view hit history. Click `...` to bulk import or bulk export.
## More Like This
Generate alternative outputs for the same input. Once enabled, each generated result includes a button to produce a variation, so you can explore different responses without re-entering your query.
You can generate up to 2 variations per result. Each variation uses additional tokens.
# Agent
Source: https://docs.dify.ai/en/use-dify/build/agent
Chat-style apps where the model can reason, make decisions, and use tools autonomously
Agents are chat-style apps where the model can reason through a task, decide what to do next, and use tools when needed to complete the user's request.
Use it when you want the model to autonomously decide how to approach a task using available tools, without designing a multi-step workflow. For example, building a data analysis assistant that can fetch live data, generate charts, and summarize findings on its own.
Agents keep up to 500 messages or 2,000 tokens of history per conversation. If either limit is exceeded, the oldest messages will be removed to make room for new ones.
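The trimming rule can be sketched as follows (the token counter is a stand-in; real tokenization depends on the model):

```python
def trim_history(messages, count_tokens, max_messages=500, max_tokens=2000):
    """Drop the oldest messages until both limits are satisfied."""
    trimmed = list(messages)
    while trimmed and (len(trimmed) > max_messages
                       or sum(count_tokens(m) for m in trimmed) > max_tokens):
        trimmed.pop(0)  # remove the oldest message first
    return trimmed
```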
Agents support optional features like conversation openers, follow-up suggestions, and more. See [App Toolkit](/en/use-dify/build/additional-features) for details.
## Configure
### Write the Prompt
The prompt tells the model what to do, how to respond, and what constraints to follow. For an agent, the prompt also guides how the model reasons through tasks and decides when to use tools, so be specific about the workflow you expect.
Here are some tips for writing effective prompts:
* **Define the persona**: Describe who the model should act as and the expertise it should draw on.
* **Specify the output format**: Describe the structure, length, or style you expect.
* **Set constraints**: Tell the model what to avoid or what rules to follow.
* **Guide tool usage**: Mention specific tools by name and describe when they should be used.
* **Outline the workflow**: Break down complex tasks into logical steps the model should follow.
#### Create Dynamic Prompts with Variables
To adapt the agent to different users or contexts without rewriting the prompt each time, add variables to collect the necessary information upfront.
Variables are placeholders in the prompt—each one appears as an input field that users fill in before the conversation starts, and their values are injected into the prompt at runtime. Users can also update variable values mid-conversation, and the prompt will adjust accordingly.
For example, a data analysis agent might use a domain variable so users can specify which area to focus on:
```text wrap theme={null}
You are a data analyst specializing in {{domain}}. Help users explore and understand their data.
When asked a question, use available data tools to fetch the relevant information. If the result suits a visual format, generate a chart. Explain your findings in plain language.
Keep responses concise. If a question is ambiguous, ask for clarification before fetching data.
```
While drafting the prompt, type `/` > **New Variable** to quickly insert a named placeholder. You can configure its details in the **Variables** section later.
Choose the variable type that matches the input you expect:
* **Text**: Accepts up to 256 characters. Use it for names, email addresses, titles, or any brief text input that fits on a single line.
* **Paragraph**: Allows long-form text without length restrictions. It gives users a multi-line text area for detailed descriptions.
* **Select**: Displays a dropdown menu with predefined options.
* **Number**: Restricts input to numerical values only—ideal for quantities, ratings, IDs, or any data requiring mathematical processing.
* **Checkbox**: Provides a simple yes/no option. When a user checks the box, the output is `true`; otherwise, it's `false`. Use it for confirmations or any case that requires a binary choice.
* **API-Based**: Fetches variable values from an external API at runtime instead of collecting them from users. Use it when your prompt needs dynamic data from an external source, such as live weather conditions or database records. See [API Extension](/en/use-dify/workspace/api-extension/api-extension) for details.
**Label Name** is what end users see for each input field.
#### Generate or Improve the Prompt with AI
If you're unsure where to start or want to refine the existing prompt, click **Generate** to let an LLM help you draft it.
Describe what you want from scratch, or reference `current_prompt` and specify what to improve. For more targeted results, add an example in **Ideal Output**.
Each generation is saved as a version, so you can experiment and roll back freely.
### Extend the Agent with Dify Tools
Add [Dify tools](/en/use-dify/workspace/tools) to enable the model to interact with external services and APIs for tasks beyond text generation, such as fetching live data, searching the web, or querying databases.
The model decides when and which tools to use based on each query. To guide this more precisely, mention specific tool names in your prompt and describe when they should be used.
You can disable or remove added tools, and modify their configuration. If a tool requires authentication, select an existing credential or create a new one.
To change the default credential, go to **Tools** or **Plugins**.
#### Maximum Iterations
**Maximum Iterations** in **Agent Settings** limits how many times the model can repeat its reasoning-and-action cycle (think, call a tool, process the result) for a single request.
Increase this value for complex, multi-step tasks that require multiple tool calls. Higher values increase latency and token costs.
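Conceptually, the cycle that **Maximum Iterations** bounds looks like this sketch (the model and tools are stand-ins for the real components):

```python
def run_agent(query, model, tools, max_iterations=5):
    """Think -> act -> observe, at most max_iterations times."""
    context = [query]
    for _ in range(max_iterations):
        step = model(context)  # returns {"answer": ...} or {"tool": ..., "input": ...}
        if "answer" in step:
            return step["answer"]
        observation = tools[step["tool"]](step["input"])
        context.append(observation)  # feed the result back into the next round
    return "Iteration limit reached without a final answer."
```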
### Ground Responses in Your Own Data
To ground the model's responses in your own data rather than general knowledge, add a knowledge base.
The model evaluates each user query against your knowledge base descriptions and decides whether retrieval is needed—you don't need to mention knowledge bases in your prompt.
**The more detailed your knowledge base description, the better the model can determine relevance**, leading to more accurate and targeted retrieval.
#### Configure App-Level Retrieval Settings
To fine-tune how retrieval results are processed, click **Retrieval Setting**.
There are two layers of retrieval settings—the knowledge base level and the app level.
Think of them as two consecutive filters: the knowledge base settings determine the initial pool of results, and the app settings further rerank the results or narrow down the pool.
* **Rerank Settings**
* **Weighted Score**
The relative weight between semantic similarity and keyword matching during reranking. Higher semantic weight favors meaning relevance, while higher keyword weight favors exact matches.
Weighted Score is available only when all added knowledge bases are indexed with **High Quality** mode.
* **Rerank Model**
The rerank model to re-score and reorder all the results based on their relevance to the query.
If any multimodal knowledge bases are added, select a multimodal rerank model (marked with a **Vision** tag) as well. Otherwise, retrieved images will be excluded from reranking and the final output.
* **Top K**
The maximum number of top results to return after reranking.
When a rerank model is selected, this value will be automatically adjusted based on the model's maximum input capacity (how much text the model can process at once).
* **Score Threshold**
The minimum similarity score for returned results. Results scoring below this threshold are excluded. Use higher thresholds for stricter relevance or lower thresholds to include broader matches.
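Put together, the app-level settings behave roughly like this sketch (field names and the weighting formula are illustrative, not Dify's actual implementation):

```python
def rerank(results, semantic_weight=0.7, score_threshold=0.5, top_k=3):
    """Blend semantic and keyword scores, filter by threshold, keep top K."""
    keyword_weight = 1 - semantic_weight
    scored = [
        (semantic_weight * r["semantic"] + keyword_weight * r["keyword"], r)
        for r in results
    ]
    kept = [(score, r) for score, r in scored if score >= score_threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in kept[:top_k]]
```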
#### Search Within Specific Documents
By default, retrieval searches across the entire knowledge base. To restrict retrieval to specific documents, enable manual or automatic metadata filtering.
This improves retrieval precision, especially when your knowledge base is large or contains content for different contexts.
For creating and managing document metadata, see [Metadata](/en/use-dify/knowledge/metadata).
### Process Multimodal Inputs
To allow end users to upload files, select a model with the corresponding multimodal capabilities. The relevant file type toggles—**Vision**, **Audio**, or **Document**—appear once the model supports them, and you can enable each as needed.
You can quickly identify a model's supported modalities by its tags.
Click **Settings** under **Vision** to configure how files are accepted and processed. Upload settings apply across all enabled file types.
* **Resolution**: Controls the detail level for **image** processing only.
* **High**: Better accuracy for complex images but uses more tokens
* **Low**: Faster processing with fewer tokens for simple images
* **Upload Method**: Choose whether users can upload from their device, paste a URL, or both.
* **Upload Limit**: The maximum number of files a user can upload per message.
For self-hosted deployments, you can adjust file size limits via the following environment variables:
* `UPLOAD_IMAGE_FILE_SIZE_LIMIT` (default: 10 MB)
* `UPLOAD_FILE_SIZE_LIMIT` (default: 15 MB)
* `UPLOAD_AUDIO_FILE_SIZE_LIMIT` (default: 50 MB)
See [Environment Variables](/en/self-host/configuration/environments) for details.
## Debug & Preview
In the preview panel on the right, test your agent in real time. Select a model, type a message, and send it to see how the agent responds.
You can adjust a model's parameters to control how it generates responses. Available parameters and presets vary by model.
To compare outputs across different models, click **Debug as Multiple Models** to run up to 4 models simultaneously.
We recommend selecting models that are strong at **reasoning** and **natively support tool calling**.
An agent needs to judge *when* to use a tool, *which tool* fits the task, and *how* to interpret the result—this depends on the model's reasoning ability. Models with built-in tool-call support also execute these decisions more reliably.
You can verify your model's tool-call support in **Agent Settings**, where the system automatically displays the agent mode:
* **Function Calling** for models with native support, meaning they can call tools directly.
* **ReAct** for others, so Dify guides them to use tools through a prompting strategy.
## Publish
When you're happy with the results, click **Publish** to make your app available. See [Publish](/en/use-dify/publish/README) for the full list of publishing options.
# Chatbot
Source: https://docs.dify.ai/en/use-dify/build/chatbot
The simplest way to build a conversational app with a model and a prompt
Chatbots are conversational apps where users interact with the model through a chat interface.
Use it for tasks that benefit from back-and-forth interaction but don't require tool calls or a multi-step workflow—for example, building an internal Q\&A assistant grounded in your team's knowledge base.
Chatbots keep up to 500 messages or 2,000 tokens of history per conversation. If either limit is exceeded, the oldest messages will be removed to make room for new ones.
Chatbots also support optional features like conversation openers, follow-up suggestions, and more. See [App Toolkit](/en/use-dify/build/additional-features) for details.
## Configure
### Write the Prompt
The prompt tells the model what to do, how to respond, and what constraints to follow. It shapes how the model behaves throughout the conversation, so think of it as defining a consistent persona rather than describing a one-off task.
Here are some tips for writing effective prompts:
* **Define the persona**: Describe who the model should act as and the tone it should use.
* **Specify the output format**: Describe the structure, length, or style you expect.
* **Set constraints**: Tell the model what to avoid or what rules to follow.
#### Create Dynamic Prompts with Variables
To adapt your chatbot to different users or contexts without rewriting the prompt each time, add variables to collect the necessary information upfront.
Variables are placeholders in the prompt—each one appears as an input field that users fill in before the conversation starts, and their values are injected into the prompt at runtime. Users can also update variable values mid-conversation, and the prompt will adjust accordingly.
For example, an onboarding assistant might use `role` and `language` to tailor its responses:
```text wrap theme={null}
You are an onboarding assistant for new {{role}} hires. Answer questions about company processes and policies. Keep answers friendly and concise, and respond in {{language}}.
```
While drafting the prompt, type `/` > **New Variable** to quickly insert a named placeholder. You can configure its details in the **Variables** section later.
Choose the variable type that matches the input you expect:
* **Text**: Accepts up to 256 characters. Use it for names, email addresses, titles, or any brief text input that fits on a single line.
* **Paragraph**: Allows long-form text without length restrictions. It gives users a multi-line text area for detailed descriptions.
* **Select**: Displays a dropdown menu with predefined options.
* **Number**: Restricts input to numerical values only—ideal for quantities, ratings, IDs, or any data requiring mathematical processing.
* **Checkbox**: Provides a simple yes/no option. When a user checks the box, the output is `true`; otherwise, it's `false`. Use it for confirmations or any case that requires a binary choice.
* **API-Based**: Fetches variable values from an external API at runtime instead of collecting them from users. Use it when your prompt needs dynamic data from an external source, such as live weather conditions or database records. See [API Extension](/en/use-dify/workspace/api-extension/api-extension) for details.
**Label Name** is what end users see for each input field.
#### Generate or Improve the Prompt with AI
If you're unsure where to start or want to refine the existing prompt, click **Generate** to let an LLM help you draft it.
Describe what you want from scratch, or reference `current_prompt` and specify what to improve. For more targeted results, add an example in **Ideal Output**.
Each generation is saved as a version, so you can experiment and roll back freely.
### Ground Responses in Your Own Data
To ground the model's responses in your own data rather than general knowledge, add a knowledge base.
Each time a user sends a message, it is used as the search query to retrieve relevant content from the knowledge base, which is then injected into the prompt as context for the model.
#### Configure App-Level Retrieval Settings
To fine-tune how retrieval results are processed, click **Retrieval Setting**.
There are two layers of retrieval settings—the knowledge base level and the app level.
Think of them as two consecutive filters: the knowledge base settings determine the initial pool of results, and the app settings further rerank the results or narrow down the pool.
* **Rerank Settings**
* **Weighted Score**
The relative weight between semantic similarity and keyword matching during reranking. Higher semantic weight favors meaning relevance, while higher keyword weight favors exact matches.
Weighted Score is available only when all added knowledge bases are indexed with **High Quality** mode.
* **Rerank Model**
The rerank model to re-score and reorder all the results based on their relevance to the query.
If any multimodal knowledge bases are added, select a multimodal rerank model (marked with a **Vision** tag) as well. Otherwise, retrieved images will be excluded from reranking and the final output.
* **Top K**
The maximum number of top results to return after reranking.
When a rerank model is selected, this value will be automatically adjusted based on the model's maximum input capacity (how much text the model can process at once).
* **Score Threshold**
The minimum similarity score for returned results. Results scoring below this threshold are excluded. Use higher thresholds for stricter relevance or lower thresholds to include broader matches.
#### Search Within Specific Documents
By default, retrieval searches across the entire knowledge base. To restrict retrieval to specific documents, enable manual or automatic metadata filtering.
This improves retrieval precision, especially when your knowledge base is large or contains content for different contexts.
For creating and managing document metadata, see [Metadata](/en/use-dify/knowledge/metadata).
### Process Multimodal Inputs
To allow end users to upload files, select a model with the corresponding multimodal capabilities. The relevant file type toggles—**Vision**, **Audio**, or **Document**—appear once the model supports them, and you can enable each as needed.
You can quickly identify a model's supported modalities by its tags.
Click **Settings** under **Vision** to configure how files are accepted and processed. Upload settings apply across all enabled file types.
* **Resolution**: Controls the detail level for **image** processing only.
* **High**: Better accuracy for complex images but uses more tokens
* **Low**: Faster processing with fewer tokens for simple images
* **Upload Method**: Choose whether users can upload from their device, paste a URL, or both.
* **Upload Limit**: The maximum number of files a user can upload per message.
For self-hosted deployments, you can adjust file size limits via the following environment variables:
* `UPLOAD_IMAGE_FILE_SIZE_LIMIT` (default: 10 MB)
* `UPLOAD_FILE_SIZE_LIMIT` (default: 15 MB)
* `UPLOAD_AUDIO_FILE_SIZE_LIMIT` (default: 50 MB)
See [Environment Variables](/en/self-host/configuration/environments) for details.
## Debug & Preview
In the preview panel on the right, test your chatbot in real time. Select a model that best fits your task, type a message, and send it to see how the model responds.
After selecting a model, you can adjust its parameters to control how it generates responses. Available parameters and presets vary by model.
To compare outputs across different models, click **Debug as Multiple Models** to run up to 4 models simultaneously.
## Publish
When you're happy with the results, click **Publish** to make your app available. See [Publish](/en/use-dify/publish/README) for the full list of publishing options.
# Using MCP Tools
Source: https://docs.dify.ai/en/use-dify/build/mcp
Connect external tools from [MCP servers](https://modelcontextprotocol.io/docs/getting-started/intro) to your Dify apps. Instead of just built-in tools, you can use tools from the growing [MCP ecosystem](https://mcpservers.org/).
This covers using MCP tools in Dify. To publish Dify apps as MCP servers, see [here](/en/use-dify/publish/publish-mcp).
Dify currently supports only MCP servers that use [HTTP transport](https://modelcontextprotocol.io/docs/learn/architecture#transport-layer).
## Adding MCP servers
Go to **Tools** → **MCP** in your workspace.
Click **Add MCP Server (HTTP)**:
**Server URL**: Where the MCP server lives (like `https://api.notion.com/mcp`)
**Name & Icon**: Call it something useful. Dify tries to grab icons automatically.
**Server ID**: Unique identifier (lowercase, numbers, underscores, hyphens, max 24 chars)
Never change the server ID once apps start using it; doing so breaks any apps that use tools from this server.
## What happens next
Dify automatically:
1. Connects to the server
2. Handles any OAuth stuff
3. Gets the list of available tools
4. Makes them available in your app builder
You'll see a server card once it finds tools.
## Managing servers
Click any server card to:
**Update Tools**: Refresh when the external service adds new tools
**Re-authorize**: Fix auth when tokens expire
**Edit Settings**: Change server details (but not the ID!)
**Remove**: Disconnect the server (this breaks apps using its tools)
## Using MCP tools
Once connected, MCP tools show up everywhere you'd expect:
**In agents**: Tools appear grouped by server ("Notion MCP » Create Page")
**In workflows**: MCP tools become available as nodes
**In agent nodes**: Same as regular agents
## Customizing tools
When you add an MCP tool, you can customize it:
**Description**: Override the default description to be more specific
**Parameters**: For each tool parameter, choose:
* **Auto**: Let the AI decide the value
* **Fixed**: Set a specific value that never changes
**Example**: For a search tool, set `numResults` to 5 (fixed) but keep `query` on auto.
## Sharing apps
When you export apps that use MCP tools:
* The export includes server IDs
* To use the app elsewhere, add the same servers with identical IDs
* Document which MCP servers your app needs
## Troubleshooting
**"Unconfigured Server"**: Check the URL and re-authorize
**Missing tools**: Hit "Update Tools"
**Broken apps**: You probably changed a server ID. Add it back with the original ID.
## Tips
* Use permanent, descriptive server IDs like `github-prod` or `crm-system`
* Keep the same MCP setup across dev/staging/production
* Set fixed values for config stuff, auto for dynamic inputs
* Test MCP integrations before deploying
# Flow Logic
Source: https://docs.dify.ai/en/use-dify/build/orchestrate-node
## Serial vs. Parallel execution

Flows execute differently depending on how you connect the nodes.
When you connect nodes one after another, they execute in sequence. Each node waits for the previous one to finish before starting. Each node may use variables from any node that ran before it in the chain.

When you connect multiple nodes to the same starting node, they all run at the same time. Parallel nodes cannot reference each other's outputs.

You can have a maximum of 10 parallel branches from one node, and up to 3 levels of nested parallel structures.
## Variable access
In serial flows, nodes can access variables from any previous node in the chain.
In parallel flows, nodes can access variables from nodes that ran before the parallel split, but they cannot access variables from other parallel nodes since they're running simultaneously.
After parallel branches finish, downstream nodes can access variables from all the parallel outputs.
## Answer node streaming
Answer nodes handle parallel outputs differently. When an Answer node references variables from multiple parallel branches, it streams content progressively:
* Content streams up to the first unresolved variable
* Once that variable's node completes, streaming continues to the next unresolved variable
* The order of variables in the Answer node determines the streaming sequence, not the node execution order
For example, in a flow where `Node A -> Node B -> Answer` with the Answer containing `{{B}}` then `{{A}}`, the Answer will wait for Node B before streaming any content, even if Node A completes first.
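The rule above can be sketched with futures standing in for node outputs (the flat template list alternating literal text and variable names is purely for illustration):

```python
from concurrent.futures import Future

def stream_answer(parts, node_outputs):
    """Yield literal text immediately; block at each variable until its
    node's Future resolves. Order follows the template, not execution."""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            yield part                         # literal text streams right away
        else:
            yield node_outputs[part].result()  # waits if that node isn't done
```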
# Handling Errors
Source: https://docs.dify.ai/en/use-dify/build/predefined-error-handling-logic

[LLM](/en/use-dify/nodes/llm), [HTTP](/en/use-dify/nodes/http-request), [Code](/en/use-dify/nodes/code), and [Tool](/en/use-dify/nodes/tools)
nodes support error handling out of the box. When a node fails, it can take one of the three behaviors below:
**Terminate**: The default behavior. When a node fails, the whole workflow stops, and you get the original error message.
Use this when:
* You're testing and want to see what broke
* The workflow can't continue without this step
**Default Value**: When a node fails, use a backup value instead. The workflow keeps running.

**Requirements**
* The default value must match the node's output type: if it outputs a string, your default must be a string.
**Example**
Your LLM node normally returns analysis, but sometimes it fails due to rate limits. Set a default value like:
```
"Sorry, I'm temporarily unavailable. Please try again in a few minutes."
```
Now users get a helpful message instead of a broken workflow.
**Fail Branch**: When a node fails, trigger a separate flow to handle the error.

The fail branch is highlighted in orange. You can:
* Send error notifications
* Try a different approach
* Log the error for debugging
* Use a backup service
**Example**
Your main API fails, so the fail branch calls a backup API instead. Users never know there was a problem.
## Error in Loop/Iteration Nodes
When child nodes fail inside loops and iterations, these control flow nodes have their own error behaviors.
**Loop nodes** always stop immediately when any child node fails. The entire loop terminates and returns the error, preventing any further iterations from running.
**Iteration nodes** let you choose how to handle child node failures through the error handling mode setting:
* `terminated` - Stops processing immediately when any item fails (default)
* `continue-on-error` - Skips the failed item and continues with the next one
* `remove-abnormal-output` - Continues processing but filters out failed items from the final output
When you set an iteration to `continue-on-error`, failed items return `null` in the output array. When you use `remove-abnormal-output`, the output array only contains successful results, making it shorter than the input array.
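The three modes can be sketched as follows (assuming a per-item function that may raise; this mirrors the behavior described above, not Dify's actual code):

```python
def iterate(items, fn, error_mode="terminated"):
    """Apply fn to each item, handling failures per the selected mode."""
    output = []
    for item in items:
        try:
            output.append(fn(item))
        except Exception:
            if error_mode == "terminated":
                raise                   # stop the whole iteration immediately
            if error_mode == "continue-on-error":
                output.append(None)     # failed item becomes null in the output
            # remove-abnormal-output: skip the failed item entirely
    return output
```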
## Error variables
When using default value or fail branch, you get two special variables:
* `error_type` - What kind of error happened (see [Error Types](/en/use-dify/debug/error-type))
* `error_message` - The actual error details
Use these to:
* Show users helpful messages
* Send alerts to your team
* Choose different recovery strategies
* Log errors for debugging
**Example**
```
{% if error_type == "rate_limit" %}
Too many requests. Please wait a moment and try again.
{% else %}
Something went wrong. Our team has been notified.
{% endif %}
```
# Hotkeys
Source: https://docs.dify.ai/en/use-dify/build/shortcut-key
Speed up your workflow building with keyboard shortcuts.
**[Go to Anything](/en/use-dify/build/goto-anything)**: Press `Cmd+K` (macOS) or `Ctrl+K` (Windows) anywhere in Dify to search and jump to everything—apps, plugins, knowledge bases, even workflow nodes. Use slash commands like `/theme` to change appearance, `/language` to switch languages, or `/help` to access documentation.
## Node operations
With any selected node(s) on canvas:
| Windows | macOS | Action |
| --- | --- | --- |
| Ctrl + C | Cmd + C | Copy nodes |
| Ctrl + V | Cmd + V | Paste nodes |
| Ctrl + D | Cmd + D | Duplicate nodes |
| Delete | Delete | Delete selected nodes |
| Ctrl + O | Cmd + O | Auto-arrange nodes |
| Shift | Shift | Visualize variable dependencies (single node only) |
## Canvas navigation
| Windows | macOS | Action |
| --- | --- | --- |
| Ctrl + 1 | Cmd + 1 | Fit to view |
| Ctrl + - | Cmd + - | Zoom out |
| Ctrl + = | Cmd + = | Zoom in |
| Shift + 1 | Shift + 1 | Reset to 100% |
| Shift + 5 | Shift + 5 | Set to 50% |
| H | H | Hand tool (pan) |
| V | V | Select tool |
## History
| Windows | macOS | Action |
| --- | --- | --- |
| Ctrl + Z | Cmd + Z | Undo |
| Ctrl + Y | Cmd + Y | Redo |
| Ctrl + Shift + Z | Cmd + Shift + Z | Redo |
## Testing
| Windows | macOS | Action |
| --- | --- | --- |
| Alt + R | Option + R | Run workflow |
# Text Generator
Source: https://docs.dify.ai/en/use-dify/build/text-generator
Simple single-turn apps for generating text from a prompt and user inputs
Text Generators are simple single-turn apps: you write a prompt, provide inputs, and the model generates a response.
It's a good fit for tasks that don't require multi-turn conversation, tool calls, or a multi-step workflow. Just a clear input, one model call, and a ready-to-use output.
Text Generators support optional features like generating multiple outputs at once, text to speech, and content moderation. See [App Toolkit](/en/use-dify/build/additional-features) for details.
## Configure
### Write the Prompt
The prompt tells the model what to do, how to respond, and what constraints to follow.
Since a Text Generator runs in a single turn with no conversation history, the prompt is the model's only source of context—include everything it needs to produce the right output in one pass.
Here are some tips for writing effective prompts:
* **Define the task clearly**: State what the model should produce (e.g., a translation, a summary, a SQL statement).
* **Specify the output format**: Describe the structure, length, or style you expect.
* **Set constraints**: Tell the model what to avoid or what rules to follow.
Because a Text Generator always requires user input to run, a paragraph-type `query` variable is automatically inserted into the prompt when you create a new app. You can rename `query` or change its type.
Variables are placeholders—each one becomes an input field that users fill in before running the app, and their values are substituted into the prompt at runtime. For example:
```text wrap theme={null}
You are a professional editor. Summarize the following text into 3 concise bullet points. Use neutral tone and avoid adding information not present in the original text.
{{query}}
```
While drafting the prompt, type `/` > **New Variable** to quickly insert a named placeholder. You can configure its details in the **Variables** section later.
Choose the variable type that matches the input you expect:
* **Short Text**: Accepts up to 256 characters. Use it for names, email addresses, titles, or any brief text input that fits on a single line.
* **Paragraph**: Allows long-form text without length restrictions. It gives users a multi-line text area for detailed descriptions.
* **Select**: Displays a dropdown menu with predefined options.
* **Number**: Restricts input to numerical values only—ideal for quantities, ratings, IDs, or any data requiring mathematical processing.
* **Checkbox**: Provides a simple yes/no option. When a user checks the box, the output is `true`; otherwise, it's `false`. Use it for confirmations or any case that requires a binary choice.
* **API-based**: Fetches variable values from an external API at runtime instead of collecting them from users. Use it when your prompt needs dynamic data from an external source, such as live weather conditions or database records. See [API Extension](/en/use-dify/workspace/api-extension/api-extension) for details.
**Label Name** is what end users see for each input field.
#### Create Dynamic Prompts with Variables
To adapt your app to different users or contexts without rewriting the prompt each time, add more variables.
Each variable collects a specific piece of information upfront and injects it into the prompt at runtime.
For example, an SQL generator might use `database_type` to adapt the output dialect while `query` captures the user's natural language request:
```text wrap theme={null}
You are an SQL generator. Translate the following natural language query into a {{database_type}} SQL statement: {{query}}
```
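Conceptually, variable substitution is plain string templating: each `{{name}}` placeholder is replaced with the value the user entered before the prompt reaches the model. A minimal sketch in Python (illustrative only, not Dify's implementation):

```python
import re

def render_prompt(template: str, inputs: dict) -> str:
    """Replace each {{name}} placeholder with the matching input value."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in inputs:
            raise KeyError(f"missing required input: {name}")
        return str(inputs[name])
    return re.sub(r"\{\{(\w+)\}\}", replace, template)

template = (
    "You are an SQL generator. Translate the following natural language "
    "query into a {{database_type}} SQL statement: {{query}}"
)
prompt = render_prompt(template, {
    "database_type": "PostgreSQL",
    "query": "list all users who signed up last week",
})
```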
#### Generate or Improve the Prompt with AI
If you're unsure where to start or want to refine the existing prompt, click **Generate** to let an LLM help you draft it.
Describe what you want from scratch, or reference `current_prompt` and specify what to improve. For more targeted results, add an example in **Ideal Output**.
Each generation is saved as a version, so you can experiment and roll back freely.
### Ground Responses in Your Own Data
To ground the model's responses in your own data rather than general knowledge, add a knowledge base and select an existing variable as the **Query Variable**.
When a user runs the app and fills in that field, its value is used as the search query to retrieve relevant content from the knowledge base. The retrieved content is then injected into the prompt as context, so the model can generate a more informed response.
For example, suppose your knowledge base contains style guides for different content types—blog posts, social media captions, product descriptions, and so on.
In a content writing app, set `content_type` as the **Query Variable**. When a user selects a content type, the app retrieves the matching style guide and generates copy that follows the corresponding writing standards.
Your prompt might look like this:
```text wrap theme={null}
You are a brand content writer. Write a {{content_type}} based on the following brief: {{brief}}
Follow the style and tone guidelines provided in the context.
```
#### Configure App-Level Retrieval Settings
To fine-tune how retrieval results are processed, click **Retrieval Setting**.
There are two layers of retrieval settings—the knowledge base level and the app level.
Think of them as two consecutive filters: the knowledge base settings determine the initial pool of results, and the app settings further rerank the results or narrow down the pool.
* **Rerank Settings**
  * **Weighted Score**
    The relative weight between semantic similarity and keyword matching during reranking. Higher semantic weight favors meaning relevance, while higher keyword weight favors exact matches.
    Weighted Score is available only when all added knowledge bases are indexed with **High Quality** mode.
  * **Rerank Model**
    Select a rerank model to re-score and reorder all retrieved results based on their relevance to the query.
    If any multimodal knowledge bases are added, select a multimodal rerank model (marked with a **Vision** tag) as well. Otherwise, retrieved images will be excluded from reranking and the final output.
* **Top K**
  The maximum number of top results to return after reranking.
  When a rerank model is selected, this value will be automatically adjusted based on the model's maximum input capacity (how much text the model can process at once).
* **Score Threshold**
  The minimum similarity score for returned results. Results scoring below this threshold are excluded. Use higher thresholds for stricter relevance or lower thresholds to include broader matches.
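To make the app-level pass concrete, a weighted-score rerank can be sketched like this. This is a simplified illustration, assuming each retrieved chunk already carries a semantic and a keyword score (not Dify's actual implementation):

```python
def rerank(results, semantic_weight=0.7, keyword_weight=0.3,
           top_k=3, score_threshold=0.5):
    """Combine the two scores, drop results below the threshold, keep top K."""
    scored = [
        (semantic_weight * r["semantic"] + keyword_weight * r["keyword"], r)
        for r in results
    ]
    kept = [(s, r) for s, r in scored if s >= score_threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in kept[:top_k]]

chunks = [
    {"id": "style-guide-blog", "semantic": 0.9, "keyword": 0.4},
    {"id": "style-guide-social", "semantic": 0.5, "keyword": 0.9},
    {"id": "unrelated-faq", "semantic": 0.3, "keyword": 0.2},
]
top = rerank(chunks)
```

With these hypothetical scores, the unrelated chunk falls below the threshold and is excluded, while the remaining two are ordered by their combined score.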
#### Search Within Specific Documents
By default, retrieval searches across the entire knowledge base. To restrict retrieval to specific documents, enable manual or automatic metadata filtering.
This improves retrieval precision, especially when your knowledge base is large or contains content for different contexts.
For creating and managing document metadata, see [Metadata](/en/use-dify/knowledge/metadata).
### Process Multimodal Inputs
To allow end users to upload files, select a model with the corresponding multimodal capabilities. The relevant file type toggles—**Vision**, **Audio**, or **Document**—appear once the model supports them, and you can enable each as needed.
You can quickly identify a model's supported modalities by its tags.
Click **Settings** under **Vision** to configure how files are accepted and processed. Upload settings apply across all enabled file types.
* **Resolution**: Controls the detail level for **image** processing only.
* **High**: Better accuracy for complex images but uses more tokens
* **Low**: Faster processing with fewer tokens for simple images
* **Upload Method**: Choose whether users can upload from their device, paste a URL, or both.
* **Upload Limit**: The maximum number of files a user can upload per run.
For self-hosted deployments, you can adjust file size limits via the following environment variables:
* `UPLOAD_IMAGE_FILE_SIZE_LIMIT` (default: 10 MB)
* `UPLOAD_FILE_SIZE_LIMIT` (default: 15 MB)
* `UPLOAD_AUDIO_FILE_SIZE_LIMIT` (default: 50 MB)
See [Environment Variables](/en/self-host/configuration/environments) for details.
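For example, in a Docker Compose deployment you could raise the limits in your `.env` file (values are in MB) and restart the services:

```bash
# .env — file upload limits, in MB
UPLOAD_IMAGE_FILE_SIZE_LIMIT=20
UPLOAD_FILE_SIZE_LIMIT=30
UPLOAD_AUDIO_FILE_SIZE_LIMIT=60
```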
## Debug & Preview
In the preview panel on the right, test your app in real time. Select a model that best fits your task, fill in the input fields, and click **Run** to see the output.
After selecting a model, you can adjust its parameters to control how it generates responses. Available parameters and presets vary by model.
To compare outputs across different models, click **Debug as Multiple Models** to run up to 4 models simultaneously.
## Publish
When you're happy with the results, click **Publish** to make your app available. See [Publish](/en/use-dify/publish/README) for the full list of publishing options.
When running the web app, end users can save individual outputs for future reference.
# Version Control
Source: https://docs.dify.ai/en/use-dify/build/version-control
Track changes and manage versions in Chatflow and Workflow apps.
Only available for Chatflow and Workflow apps right now.
## How it works
**Current Draft**: Your working version. This is where you make changes. Not live for users.

**Latest Version**: The live version users see.

**Previous Versions**: Older published versions.

## Publishing versions
Click **Publish** → **Publish Update** to make your draft live.

Your draft becomes the new Latest Version, and you get a fresh draft to work in.

## Viewing versions
Click the history icon to see all versions:

Filter by:
* **All versions** or **only yours**
* **Only named versions** (skip auto-generated names)

## Managing versions
**Name a version**: Give it a proper name instead of the auto-generated one

**Edit version info**: Change the name and add release notes

**Delete old versions**: Clean up versions you don't need

You can't delete the Current Draft or Latest Version.
**Restore a version**: Load an old version back into your draft

This replaces your current draft completely. Make sure you don't have unsaved work.
## Example workflow
Here's how versions work through a typical development cycle:
### 1. Start with a draft

### 2. Publish first version

### 3. Publish second version

### 4. Restore old version to draft

### 5. Publish the restored version


## Tips
* Always test in draft before publishing
* Use descriptive version names for important releases
* Restore versions when you need to roll back quickly
* Keep old versions around for reference
# Workflow & Chatflow
Source: https://docs.dify.ai/en/use-dify/build/workflow-chatflow
Build agentic workflows that combine AI models, tools, and logic into reliable, repeatable processes
## Why Agentic Workflows
AI models are powerful, but on their own they can be unpredictable—they may hallucinate, miss steps, or produce inconsistent outputs. In production environments, especially for teams and enterprises where reliability matters, you need more control over how AI operates.
Agentic workflows solve this by embedding AI capabilities within a structured, repeatable process. Instead of relying on a single model to figure everything out, you design a flow that orchestrates models, tools, and logic step by step—with clear conditions, checkpoints, and fallback paths.
The AI is still doing the heavy lifting, but within boundaries you define.
## Workflow vs. Chatflow
Dify offers two app types for building agentic workflows: **Workflow** and **Chatflow**. Both are built on a shared visual canvas and node system.
To build a flow, connect nodes that each handle a specific step, such as calling a model, retrieving knowledge, running code, or branching on conditions. Most of the work is **drag, connect, and configure**—code is only needed when your logic calls for it.
Their core difference is how users interact with the app:
* A **Workflow** runs once from start to finish.
It takes an input, processes it through the flow, and returns a result. Use it for tasks like automated report generation, data processing pipelines, or batch processing.
* A **Chatflow** adds a conversation layer.
Users interact through a chat interface, and each message triggers the flow you designed before a response is generated. Use it for interactive assistants, guided Q\&A, or any conversational scenario that requires structured processing behind each reply.
Chatflows support optional features like content moderation, text to speech, and more. See [App Toolkit](/en/use-dify/build/additional-features) for details.
# Error Types
Source: https://docs.dify.ai/en/use-dify/debug/error-type
Each node type throws specific error classes that help you understand what went wrong and how to fix it.
## Node-specific errors
`CodeNodeError`
Your Python or JavaScript code threw an exception during execution

`OutputValidationError`
The data type your code returned doesn't match the output variable type you configured
`DepthLimitError`
Your code created nested data structures deeper than 5 levels
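For example, the helper below (hypothetical, not part of Dify) measures nesting depth the same way: a value wrapped in six levels of dicts or lists exceeds the limit.

```python
def depth(value, level=1):
    """Return the nesting depth of dicts/lists (a scalar counts as level 1)."""
    if isinstance(value, dict):
        children = value.values()
    elif isinstance(value, list):
        children = value
    else:
        return level
    return max((depth(child, level + 1) for child in children), default=level)

ok = {"a": {"b": {"c": {"d": 1}}}}               # depth 5: allowed
too_deep = {"a": {"b": {"c": {"d": {"e": 1}}}}}  # depth 6: would trigger DepthLimitError
```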
`CodeExecutionError`
The sandbox service couldn't execute your code, which usually means the service is down

`VariableNotFoundError`
Your prompt template references a variable that doesn't exist in the workflow context

`InvalidContextStructureError`
You passed an array or object to the context field, which only accepts strings
`NoPromptFoundError`
The prompt field is completely empty
`ModelNotExistError`
No model is selected in the LLM node configuration
`LLMModeRequiredError`
The selected model doesn't have valid API credentials configured
`InvalidVariableTypeError`
Your prompt template isn't valid Jinja2 syntax or plain text format

`AuthorizationConfigError`
Missing or invalid authentication configuration for the API endpoint
`InvalidHttpMethodError`
HTTP method must be GET, HEAD, POST, PUT, PATCH, or DELETE
`ResponseSizeError`
API response exceeded the 10MB size limit
`FileFetchError`
Couldn't retrieve a file variable referenced in the request
`InvalidURLError`
The URL format is malformed or unreachable
`ToolParameterError`
Parameters passed to the tool don't match its expected schema
`ToolFileError`
The tool couldn't access required files
`ToolInvokeError`
The external tool API returned an error during execution

`ToolProviderNotFoundError`
The tool provider isn't installed or configured properly
## System-level errors
`InvokeConnectionError`
Network connection failed to the external service
`InvokeServerUnavailableError`
External service returned a 503 status or is temporarily down
`InvokeRateLimitError`
You've hit rate limits on the API or model provider
`QuotaExceededError`
Your usage quota has been exceeded for this service
# Run History
Source: https://docs.dify.ai/en/use-dify/debug/history-and-logs
Dify records detailed Run History every time your workflow runs. You can see what happened at both the application level and for individual nodes.
For Run History from live users after publishing, see [Logs](/en/use-dify/monitor/logs).
## Application Run History
Each workflow run creates a complete log entry. Click any entry to see three sections:

### Result
Shows the final output that users see. If the workflow failed, you'll see error messages here.

Only available for Workflow applications.
### Detail
Shows the original input, final output, and system metadata from the execution.

### Tracing
Shows exactly how your workflow executed, including which nodes ran in what order, how long each took, and where data flowed between them. This is useful for finding bottlenecks and understanding complex workflows with branches or loops.

## Node Run History
You can also check the last execution of any individual node. Click "Last run" in the node's config panel to see its most recent input, output, and timing details.

# Single Node
Source: https://docs.dify.ai/en/use-dify/debug/step-run
Test individual nodes or run through your workflow step-by-step to catch issues before publishing.
## Single node testing
You can test any node individually without running the entire workflow. Select the node, provide test input in its settings panel, and click Run to see the output.

After testing, click "Last run" to see execution details including inputs, outputs, timing, and any error messages.
Answer and End nodes don't support single node testing.
## Step-by-step execution
When you run nodes one at a time, their outputs are cached in the Variable Inspector. You can edit these cached variables to test different scenarios without re-running upstream nodes.

This is useful when you want to test how a node responds to different data without having to modify and re-run all the nodes before it. Just change the variable values in the inspector and run the node again.
## Viewing execution history
Every node execution creates a record. Click "Last run" on any node to see its most recent execution details including what data went in, what came out, and how long it took.

# Variable Inspector
Source: https://docs.dify.ai/en/use-dify/debug/variable-inspect
The Variable Inspector shows you all the data flowing through your workflow. It captures inputs and outputs from each node after they run, so you can see what's happening and test different scenarios.

## Viewing variables
After any node runs, its output variables appear in the inspector panel at the bottom of the screen. Click any variable to see its full content.

## Editing variables
You can edit most variable values by clicking on them. When you run downstream nodes, they'll use your edited values instead of the original ones. This lets you test different scenarios without re-running the entire workflow.
Editing variables here doesn't change the "Last run" record for the node that originally created them.
For example, if an LLM node generates SQL like `SELECT * FROM users`, you can edit it to `SELECT username FROM users` in the inspector and then re-run just the database node to see different results.

## Resetting variables
Click the revert icon next to any variable to restore its original value, or click "Reset all" to clear all cached variables at once.

# Introduction
Source: https://docs.dify.ai/en/use-dify/getting-started/introduction
Dify is an open-source platform for building agentic workflows. It lets you define processes visually, connect your existing tools and data sources, and deploy AI applications that solve real problems.
* Start shipping powerful apps in minutes
* Core Dify building blocks explained
* Deploy Dify on your own laptop / server
* Trade notes with the community
* What's changed over past releases
* Example Dify use case walkthroughs
The name Dify comes from **D**o **I**t **F**or **Y**ou.
# Key Concepts
Source: https://docs.dify.ai/en/use-dify/getting-started/key-concepts
Quick overview of essential Dify concepts
### Dify App
Dify is made for agentic app building. In **Studio**, you can quickly build agentic workflows via a drag & drop interface and publish them as apps. You can access published apps via API, the web, or as an [MCP server](/en/use-dify/publish/publish-mcp). Dify offers two main app types: workflow and chatflow. You will need to choose an app type when creating a new app.
We recommend choosing Workflow or Chatflow as your app type, but Dify also offers three more basic app types: Chatbot, Agent, and Text Generator.
These app types run on the same workflow engine underneath but come with simpler, legacy interfaces.
### Workflow
Build workflow apps to handle single-turn tasks. The web app interface and API make it easy to batch-execute many tasks at once.
Underneath it all, workflow forms the basis for all other app types in Dify.
You can specify how and when to start your workflow. There are two types of Start nodes:
* **[User Input](/en/use-dify/nodes/user-input)**: Direct user interaction or API call invokes the app.
* **[Trigger](/en/use-dify/nodes/trigger/overview)**: The application runs automatically on a schedule or in response to a specific third-party event.
User Input and Trigger Start nodes are mutually exclusive—they cannot be used on the same canvas. To switch between them, right-click the current start node > **Change Node**. Alternatively, delete the current start node and add a new one.
Only workflows started by User Input can be published as standalone web apps or MCP servers, exposed through backend service APIs, or used as tools in other Dify applications.
### Chatflow
Chatflow is a special type of workflow app that runs at every turn of a conversation. In addition to all workflow features, chatflow can store and update custom conversation-specific variables, enable memory in LLM nodes, and stream formatted text, images, and files at different points throughout the run.
Unlike workflow, chatflow can't use [Trigger](/en/use-dify/nodes/trigger/overview) to start.
### Dify DSL
All Dify apps can be exported into a YAML file in Dify's own DSL (Domain-Specific Language) and you may create Dify apps from these DSL files directly. This makes it easy to port apps to other Dify instances and share with others.
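The exact schema depends on your Dify version, but an exported DSL file is ordinary YAML along these lines (fields abbreviated and values illustrative, not a complete schema):

```yaml
app:
  name: Multi-platform content generator
  mode: workflow
  description: Turns drafts into platform-specific posts
kind: app
version: 0.1.5
workflow:
  graph:
    nodes: []   # node definitions (start, LLM, code, ...)
    edges: []   # connections between nodes
```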
### Variables
A variable is a labeled container to store information, so you can find and use that information later by referencing its name. You'll come across different types of variables when building a Dify app:
**Inputs**: You can specify any number of input variables at the [User Input](/en/use-dify/nodes/user-input) node for your app's end users to fill in.
Additionally, the User Input node comes with a set of input variables that you can reference later in the flow. Depending on the app type (workflow or chatflow), different variables are provided.
For workflow apps:

| Variable Name | Data Type | Description | Notes |
| :------------ | :-------- | :---------- | :---- |
| `sys.user_id` | String | User ID: a unique identifier automatically assigned by the system to each user of a workflow application, used to distinguish different users. | |
| `sys.app_id` | String | App ID: a unique identifier automatically assigned by the system to each app, recording the basic information of the current application. | Useful for differentiating and locating distinct workflow applications. |
| `sys.workflow_id` | String | Workflow ID: records information about all nodes in the current workflow application. | Useful for tracking and recording information about the nodes contained within a workflow. |
| `sys.workflow_run_id` | String | Workflow Run ID: records the runtime status and execution logs of a workflow application. | Useful for tracking the application's historical execution records. |
| `sys.timestamp` | Number | The start time of each workflow execution. | |
For chatflow apps:

| Variable Name | Data Type | Description | Notes |
| :------------ | :-------- | :---------- | :---- |
| `sys.conversation_id` | String | A unique ID for the chat session, grouping all related messages into the same conversation so the LLM continues on the same topic and context. | |
| `sys.dialogue_count` | Number | The number of conversation turns in the user's interaction with a Chatflow application. The count increases by one after each round and can be combined with IF/ELSE nodes to create branching logic. For example, the LLM can review the conversation history at turn X and automatically provide an analysis. | |
| `sys.user_id` | String | A unique ID assigned to each application user to distinguish different conversation users. | The Service API does not share conversations created by the web app, so users with the same ID have separate conversation histories between API and web app interfaces. |
| `sys.app_id` | String | App ID: a unique identifier automatically assigned by the system to each app, recording the basic information of the current application. | Useful for differentiating and locating distinct applications. |
| `sys.workflow_id` | String | Workflow ID: records information about all nodes in the current workflow. | Useful for tracking and recording information about the nodes contained within a workflow. |
| `sys.workflow_run_id` | String | Workflow Run ID: records the runtime status and execution logs of a workflow run. | Useful for tracking the application's historical execution records. |
User inputs are set at the start of each workflow run and cannot be updated.
**Outputs**: Each node produces one or more outputs that can be referenced in subsequent nodes. For instance, the LLM node outputs the `text` it generates.
Like inputs, node outputs cannot be updated either.
**Environment Variables**: Use environment variables to store sensitive information like API keys specific to your app. This keeps a clean separation between secrets and the Dify app itself, so you don't risk exposing passwords and keys when sharing your app's DSL. Environment variables are also constants and cannot be updated.
**Conversation Variables (Chatflow only)**: These variables are conversation-specific: they persist across multi-turn chatflow runs within a single conversation, so you can store and access dynamic information like a to-do list or running token cost. You can update the value of a conversation variable via the Variable Assigner node.
### Variable Referencing
You can easily pass variables to any node when configuring its input field by selecting from a dropdown.
You can also insert variable values into complex text inputs by typing `/` and selecting the desired variable from the dropdown.
# 30-Minute Quick Start
Source: https://docs.dify.ai/en/use-dify/getting-started/quick-start
Dive into Dify through an example app
This step-by-step tutorial will walk you through creating a multi-platform content generator from scratch.
Beyond basic LLM integration, you'll discover how to use powerful Dify nodes to orchestrate sophisticated AI applications faster with less effort.
By the end of this tutorial, you'll have a workflow that takes whatever content you throw at it (text, documents, or images), adds your preferred voice and tone, and spits out polished, platform-specific social media posts in your chosen language.
The complete workflow is shown below. Feel free to refer back to this as you build to stay on track and see how all the nodes work together.
## Before You Start
Go to [Dify Cloud](https://cloud.dify.ai) and sign up for free.
New accounts on the Sandbox plan include 200 AI credits for calling models from providers like OpenAI, Anthropic, and Gemini.
AI credits are a one-time allocation and don't renew monthly.
Go to **Settings** > **Model Provider** and install the OpenAI plugin. This tutorial uses `gpt-5.2` for the examples.
If you're using Sandbox credits, no API key is required—the plugin is ready to use once installed. You can also configure your own API key and use it instead.
1. In the top-right corner of the **Model Provider** page, click **Default Model Settings**.
2. Set the **System Reasoning Model** to `gpt-5.2`. This becomes the default model in the workflow.
## Step 1: Create a New Workflow
1. Go to **Studio**, then select **Create from blank** > **Workflow**.
2. Name the workflow `Multi-platform content generator` and click **Create**. You'll automatically land on the workflow canvas to start building.
3. Select the User Input node to start our workflow.
## Step 2: Orchestrate & Configure
Keep any unmentioned settings at their default values.
Give nodes and variables clear, descriptive names to make them easier to identify and reference.
### 1. Collect User Inputs: User Input Node
First, we need to define what information to gather from users for running our content generator, such as the draft text, target platforms, desired tone, and any reference materials.
The User Input node is where we can easily set this up. Each input field we add here becomes a variable that all downstream nodes can reference and use.
Click the User Input node to open its configuration panel, then add the following input fields.
* Field type: `Paragraph`
* Variable Name: `draft`
* Label Name: `Draft`
* Max length: `2048`
* Required: `Yes`
* Field type: `File list`
* Variable Name: `user_file`
* Label Name: `Upload File (≤ 10)`
* Support File Types: `Document`, `Image`
* Upload File Types: `Both`
* Max number of uploads: `10`
* Required: `No`
* Field type: `Paragraph`
* Variable Name: `voice_and_tone`
* Label Name: `Voice & Tone`
* Max length: `2048`
* Required: `No`
* Field type: `Short Text`
* Variable Name: `platform`
* Label Name: `Target Platform (≤ 10)`
* Max length: `256`
* Required: `Yes`
* Field type: `Select`
* Variable Name: `language`
* Label Name: `Language`
* Options:
* `English`
* `日本語`
* `简体中文`
* Required: `Yes`
### 2. Identify Target Platforms: Parameter Extractor Node
Since our platform field accepts free-form text input, users might type in various ways: `x and linkedIn`, `post on Twitter and LinkedIn`, or even `Twitter + LinkedIn please`.
However, we need a clean and structured list, like `["Twitter", "LinkedIn"]`, that downstream nodes can work with reliably.
This is the perfect job for the Parameter Extractor node. In our case, it uses the `gpt-5.2` model to analyze users' natural language, recognize all these variations, and output a standardized array.
After the User Input node, add a Parameter Extractor node and configure it:
1. In the **Input Variable** field, select `User Input/platform`.
2. Add an extract parameter:
* Name: `platform`
* Type: `Array[String]`
* Description: `The platform(s) for which the user wants to create tailored content.`
* Required: `Yes`
3. In the **Instruction** field, paste the following to guide the LLM in parameter extraction:
```markdown INSTRUCTION theme={null}
# TASK DESCRIPTION
Parse platform names from input and output as a JSON array.
## PROCESSING RULES
- Support multiple delimiters: commas, semicolons, spaces, line breaks, "and", "&", "|", etc.
- Standardize common platform name variants (twitter/X→Twitter, insta→Instagram, etc.)
- Remove duplicates and invalid entries
- Preserve unknown but reasonable platform names
- Preserve the original language of platform names
## OUTPUT REQUIREMENTS
- Success: ["Platform1", "Platform2"]
- No platforms found: [No platforms identified. Please enter a valid platform name.]
## EXAMPLES
- Input: "twitter, linkedin" → ["Twitter", "LinkedIn"]
- Input: "x and insta" → ["Twitter", "Instagram"]
- Input: "invalid content" → [No platforms identified. Please enter a valid platform name.]
```
Note that we've instructed the LLM to output a specific error message for invalid inputs, which will serve as the end trigger for our workflow in the next step.
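To see what we're asking the model to do, here's the same normalization written as deterministic Python. This is only an illustration of the expected behavior (the alias table is abbreviated, and unknown-but-reasonable names are omitted); the actual node delegates the task to the LLM:

```python
import re

# Abbreviated alias table for illustration
ALIASES = {"twitter": "Twitter", "x": "Twitter", "insta": "Instagram",
           "instagram": "Instagram", "linkedin": "LinkedIn"}
ERROR = "No platforms identified. Please enter a valid platform name."

def parse_platforms(text: str) -> list:
    """Split on common delimiters, normalize aliases, drop duplicates."""
    tokens = re.split(r"[,;|&\n]+|\s+and\s+|\s+", text.lower())
    seen, platforms = set(), []
    for token in filter(None, tokens):
        name = ALIASES.get(token)
        if name and name not in seen:
            seen.add(name)
            platforms.append(name)
    return platforms or [ERROR]
```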
### 3. Validate Platform Extraction Results: IF/ELSE Node
What if a user enters an invalid platform name, like `ohhhhhh` or `BookFace`? We don't want to waste time and tokens generating useless content.
In such cases, we can use an IF/ELSE node to create a branch that stops the workflow early. We'll set a condition that checks for the error message from the Parameter Extractor node; if that message is detected, the workflow will route directly to an Output node and end.
1. After the Parameter Extractor node, add an IF/ELSE node.
2. On the IF/ELSE node's panel, define the **IF** condition:
**IF** `Parameter Extractor/platform` **contains** `No platforms identified. Please enter a valid platform name.`
3. After the IF/ELSE node, add an Output node to the IF branch.
4. On the Output node's panel, set `Parameter Extractor/platform` as the output variable.
### 4. Separate Uploaded Files by Type: List Operator Node
Our users can upload both images and documents as reference materials, but these two types require different handling with `gpt-5.2`: images can be interpreted directly via its vision capability, while documents must first be converted to text before the model can process them.
To manage this, we'll use two List Operator nodes to filter and split the uploaded files into separate branches—one for images and one for documents.
1. After the IF/ELSE node, add **two** parallel List Operator nodes to the ELSE branch.
2. Rename one node to `Image` and the other to `Document`.
3. Configure the Image node:
1. Set `User Input/user_file` as the input variable.
2. Enable **Filter Condition**: `{x}type` **in** `Image`.
4. Configure the Document node:
1. Set `User Input/user_file` as the input variable.
2. Enable **Filter Condition**: `{x}type` **in** `Doc`.
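Conceptually, the two List Operator nodes perform a simple filter over the uploaded file list. A minimal sketch of that behavior (the file dicts below are hypothetical stand-ins for Dify's file objects, not its actual data model):

```python
# Hypothetical stand-ins for files uploaded via User Input/user_file.
uploaded_files = [
    {"name": "chart.png", "type": "image"},
    {"name": "brief.pdf", "type": "document"},
    {"name": "logo.jpg", "type": "image"},
]

# The Image node keeps only image files; the Document node keeps only documents.
images = [f for f in uploaded_files if f["type"] == "image"]
documents = [f for f in uploaded_files if f["type"] == "document"]

print([f["name"] for f in images])     # → ['chart.png', 'logo.jpg']
print([f["name"] for f in documents])  # → ['brief.pdf']
```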
### 5. Extract Text from Documents: Doc Extractor Node
`gpt-5.2` cannot directly read uploaded documents like PDF or DOCX, so we must first convert them into plain text.
This is exactly what a Doc Extractor node does. It takes document files as input and outputs clean, usable text for the next steps.
1. After the Document node, add a Doc Extractor node.
2. On the Doc Extractor node's panel, set `Document/result` as the input variable.
### 6. Integrate All Reference Materials: LLM Node
When users provide multiple reference types—draft text, documents, and images—simultaneously, we need to consolidate them into a single, coherent summary.
An LLM node will handle this task by analyzing all the scattered pieces to create a comprehensive context that guides subsequent content generation.
1. After the Doc Extractor node, add an LLM node.
2. Connect the Image node to this LLM node as well.
3. Click the LLM node to configure it:
1. Rename it to `Integrate Info`.
2. Enable **VISION** and set `Image/result` as the vision variable.
3. In the system instruction field, paste the following:
```markdown wrap theme={null}
# ROLE & TASK
You are a content strategist. Analyze the provided draft and reference materials (if any), then create a comprehensive content foundation for multi-platform social media optimization.
# ANALYSIS PRINCIPLES
- Work exclusively with provided information—no external assumptions
- Focus on extraction, synthesis, and strategic interpretation
- Identify compelling and actionable elements
- Prepare insights adaptable across different platforms
# REQUIRED ANALYSIS
Deliver structured analysis with:
## 1. CORE MESSAGE
- Central theme, purpose, objective
- Key value or benefit being communicated
## 2. ESSENTIAL CONTENT ELEMENTS
- Primary topics, facts, statistics, data points
- Notable quotes, testimonials, key statements
- Features, benefits, characteristics mentioned
- Dates, locations, contextual details
## 3. STRATEGIC INSIGHTS
- What makes content compelling/unique
- Emotional/rational appeals present
- Credibility factors, proof points
- Competitive advantages highlighted
## 4. ENGAGEMENT OPPORTUNITIES
- Discussion points, questions emerging
- Calls-to-action, next steps suggested
- Interactive/participation opportunities
- Trending themes touched upon
## 5. PLATFORM OPTIMIZATION FOUNDATION
- High-impact: Quick, shareable formats
- Professional: Business-focused discussions
- Community: Interaction and sharing
- Visual: Enhanced with strong visuals
## 6. SUPPORTING DETAILS
- Metrics, numbers, quantifiable results
- Direct quotes, testimonials
- Technical details, specifications
- Background context available
```
4. Click **Add Message** to add a user message, then paste the following. Type `{` or `/` to replace `Doc Extractor/text` and `User Input/draft` with the corresponding variables from the list.
```markdown USER theme={null}
Draft: User Input/draft
Reference material: Doc Extractor/text
```
### 7. Create Customized Content for Each Platform: Iteration Node
Now that the integrated references and target platforms are ready, let's generate a tailored post for each platform using an Iteration node.
The node will loop through the list of platforms and run a sub-workflow for each: first analyze the specific platform's style guidelines and best practices, then generate optimized content based on all available information.
1. After the Integrate Info node, add an Iteration node.
2. Inside the Iteration node, add an LLM node and configure it:
1. Rename it to `Identify Style`.
2. In the system instruction field, paste the following:
```markdown wrap theme={null}
# ROLE & TASK
You are a social media expert. Analyze the platform and provide content creation guidelines.
# ANALYSIS REQUIRED
For the given platform, provide:
## 1. PLATFORM PROFILE
- Platform type and category
- Target audience characteristics
## 2. CONTENT GUIDELINES
- Optimal content length (characters/words)
- Recommended tone (professional/casual/conversational)
- Formatting best practices (line breaks, emojis, etc.)
## 3. ENGAGEMENT STRATEGY
- Hashtag recommendations (quantity and style)
- Call-to-action best practices
- Algorithm optimization tips
## 4. TECHNICAL SPECS
- Character/word limits
- Visual content requirements
- Special formatting needs
## 5. PLATFORM-SPECIFIC NOTES
- Unique features or recent changes
- Industry-specific considerations
- Community engagement approaches
# OUTPUT REQUIREMENTS
- For recognized platforms: Provide specific guidelines
- For unknown platforms: Base recommendations on similar platforms
- Focus on actionable, practical advice
- Be concise but comprehensive
```
3. Click **Add Message** to add a user message, then paste the following. Type `{` or `/` to replace `Current Iteration/item` with the corresponding variable from the list.
```markdown USER theme={null}
Platform: Current Iteration/item
```
3. After the Identify Style node, add another LLM node and configure it:
1. Rename it to `Create Content`.
2. In the system instruction field, paste the following:
```markdown wrap theme={null}
# ROLE & TASK
You are an expert social media content creator. Generate publication-ready content that matches platform guidelines, incorporates source information, and follows specified voice/tone and language requirements.
# LANGUAGE REQUIREMENT
- Generate ALL content exclusively in the target language specified in the user message. You MUST write the entire post in that language, regardless of the language of any source materials.
- No mixing of languages whatsoever
- Adapt platform terminology to the target language
# CONTENT REQUIREMENTS
- Follow platform guidelines exactly (format, length, tone, hashtags)
- Integrate source information effectively (key messages, data, value props)
- Apply voice & tone consistently (if provided)
- Optimize for platform-specific engagement
- Ensure cultural appropriateness for the specified language
# OUTPUT FORMAT
- Generate ONLY the final social media post content. No explanations or meta-commentary. Content must be immediately copy-paste ready.
- Maximum heading level: ## (H2) - never use # (H1)
- No horizontal dividers: avoid ---
# QUALITY CHECKLIST
✅ Platform guidelines followed
✅ Source information integrated
✅ Voice/tone consistent (when provided)
✅ Language consistency maintained
✅ Engagement optimized
✅ Publication ready
```
3. Click **Add Message** to add a user message, then paste the following. Type `{` or `/` to replace all inputs with the corresponding variable from the list.
```markdown USER theme={null}
Platform Name: Current Iteration/item
Target Language: User Input/language
Platform Guidelines: Identify Style/text
Source Information: Integrate Info/text
Voice & Tone: User Input/voice_and_tone
```
4. Enable structured output.
This allows us to extract specific pieces of information from the LLM's response in a more reliable way, which is crucial for the next step where we format the final output.
1. Next to **Output Variables**, toggle **Structured** on. The `structured_output` variable will appear below. Click **Configure**.
2. In the pop-up schema editor, click **Import From JSON** in the top-right corner, and paste the following:
```json theme={null}
{
"platform_name": "string",
"post_content": "string"
}
```
4. Click the Iteration node to configure it:
1. Set `Parameter Extractor/platform` as the input variable.
2. Set `Create Content/structured_output` as the output variable.
3. Enable **Parallel Mode** and set the maximum parallelism to `10`.
This is why we included `(≤10)` in the label name for the target platform field back in the User Input node.
### 8. Format the Final Output: Template Node
The Iteration node generates a post for each platform, but its output is a raw array of data (e.g., `[{"platform_name": "Twitter", "post_content": "..."}]`) that isn't very readable. We need to present the results in a clearer format.
That's where the Template node comes in—it allows us to format this raw data into well-organized text using [Jinja2](https://jinja.palletsprojects.com/en/stable/) templating, ensuring the final output is user-friendly and easy to read.
1. After the Iteration node, add a Template node.
2. On the Template node's panel, set `Iteration/output` as the input variable and name it `output`.
3. Paste the following Jinja2 code:
```
{% for item in output %}
# 📱 {{ item.platform_name }}
{{ item.post_content }}
{% endfor %}
```
* `{% for item in output %}` / `{% endfor %}`: Loops through each platform-content pair in the input array.
* `{{ item.platform_name }}`: Displays the platform name as an H1 heading with a phone emoji.
* `{{ item.post_content }}`: Displays the generated content for that platform.
* The blank line between `{{ item.post_content }}` and `{% endfor %}` adds spacing between platforms in the final output.
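To preview what the template produces outside Dify, you can render it locally with the `jinja2` Python package. The sample posts below are made up for illustration:

```python
from jinja2 import Template  # pip install jinja2

TEMPLATE = """{% for item in output %}
# 📱 {{ item.platform_name }}
{{ item.post_content }}
{% endfor %}"""

# Made-up sample of what Iteration/output might contain.
posts = [
    {"platform_name": "Twitter", "post_content": "Write 10x faster with our new AI assistant!"},
    {"platform_name": "LinkedIn", "post_content": "Today we're announcing our AI writing assistant."},
]

rendered = Template(TEMPLATE).render(output=posts)
print(rendered)
```

Each platform's post comes out under its own heading, in the order the platforms appear in the array.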
While LLMs can handle output formatting as well, their results can be inconsistent and unpredictable. For rule-based formatting that requires no reasoning, the Template node is more stable and reliable, and costs zero tokens.
LLMs are incredibly powerful, but knowing when to use the right tool is key to building more reliable and cost-effective AI applications.
### 9. Return the Results to Users: Output Node
1. After the Template node, add an Output node.
2. On the Output node's panel, set the `Template/output` as the output variable.
## Step 3: Test
Your workflow is now complete! Let's test it out.
1. Make sure your Checklist is clear.
2. Check your workflow against the reference diagram provided at the beginning to ensure all nodes and connections match.
3. Click **Test Run** in the top-right corner, fill in the input fields, then click **Start Run**.
If you're not sure what to enter, try these sample inputs:
* **Draft**: `We just launched a new AI writing assistant that helps teams create content 10x faster.`
* **Upload File**: Leave empty
* **Voice & Tone**: `Friendly and enthusiastic, but professional`
* **Target Platform**: `Twitter and LinkedIn`
* **Language**: `English`
A successful run produces a formatted output with a separate post for each platform, like this:
Your results may vary depending on the model you're using. Higher-capability models generally produce better output quality.
To test how a node reacts to different inputs from previous nodes, you don't need to re-run the entire workflow. Just click **View cached variables** at the bottom of the canvas, find the variable you want to change, and edit its value.
If you encounter any errors, check the **Last Run** logs of the corresponding node to identify the exact cause of the problem.
## Step 4: Publish & Share
Once the workflow runs as expected and you're happy with the results, click **Publish** > **Publish Update** to make it live and shareable.
If you make any changes later, always remember to publish again so the updates take effect.
# Connect to External Knowledge Base
Source: https://docs.dify.ai/en/use-dify/knowledge/connect-external-knowledge-base
Integrate external knowledge sources with Dify applications through API connections to leverage custom RAG systems or third-party knowledge services
If your team maintains its own RAG system or hosts content in a third-party knowledge service like [AWS Bedrock](https://aws.amazon.com/bedrock/), you can connect these external sources to Dify instead of migrating content into Dify's built-in knowledge base.
This lets your AI applications retrieve information directly from your existing infrastructure while you retain full control over the retrieval logic and content management.

**Connecting an external knowledge base involves three steps**:
1. [Build an API service that Dify can query](#step-1-build-the-retrieval-api).
2. [Register the API endpoint in Dify](#step-2-register-an-external-knowledge-api).
3. [Connect a specific knowledge source through the registered API](#step-3-create-an-external-knowledge-base).
When your application runs, Dify sends retrieval requests to your endpoint and uses the returned chunks as context for LLM responses.
If you're connecting to LlamaCloud, install the [LlamaCloud plugin](https://marketplace.dify.ai/plugin/langgenius/llamacloud) instead of building a custom API. See the [video walkthrough](https://www.youtube.com/watch?v=FaOzKZRS-2E) for a complete setup demo.
If you're building a plugin for another knowledge service, the LlamaCloud plugin's [source code](https://github.com/langgenius/dify-official-plugins/tree/main/extensions/llamacloud) is available for reference.
Dify only has retrieval access to external knowledge bases—it cannot modify or manage your external content. You maintain the knowledge base and its retrieval logic independently.
## Step 1: Build the Retrieval API
Build an API service that implements the [External Knowledge API specification](/en/use-dify/knowledge/external-knowledge-api). Your service needs a single `POST` endpoint that accepts a search query and returns matching text chunks with similarity scores.
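As a sketch of the expected exchange: the request and response shapes below follow the spec linked above (`knowledge_id`, `query`, and `retrieval_setting` in; a `records` array out), while everything else (the in-memory corpus, the word-overlap scoring) is a made-up stand-in for your real retrieval backend:

```python
import json

# Hypothetical in-memory corpus standing in for your real retrieval backend.
CORPUS = [
    {"title": "Pricing FAQ", "content": "The Pro plan costs $59 per month."},
    {"title": "Refund policy", "content": "Refunds are issued within 14 days."},
]

def naive_score(query: str, text: str) -> float:
    """Toy relevance score: fraction of query words found in the text."""
    words = query.lower().split()
    if not words:
        return 0.0
    return sum(1 for w in words if w in text.lower()) / len(words)

def handle_retrieval(request_body: dict) -> dict:
    """Handler logic for the POST /retrieval endpoint (framework-free sketch)."""
    query = request_body["query"]
    settings = request_body.get("retrieval_setting", {})
    top_k = settings.get("top_k", 5)
    threshold = settings.get("score_threshold", 0.0)

    records = [
        {
            "content": doc["content"],
            "score": naive_score(query, doc["content"]),
            "title": doc["title"],
            "metadata": {},  # must be an object, never null
        }
        for doc in CORPUS
    ]
    records = [r for r in records if r["score"] >= threshold]
    records.sort(key=lambda r: r["score"], reverse=True)
    return {"records": records[:top_k]}

response = handle_retrieval({
    "knowledge_id": "my-kb-001",
    "query": "refund within how many days",
    "retrieval_setting": {"top_k": 2, "score_threshold": 0.1},
})
print(json.dumps(response, indent=2))
```

In a real service, you would wrap this logic in your web framework of choice and verify the Bearer token before handling the request.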
## Step 2: Register an External Knowledge API
An External Knowledge API stores your endpoint URL and authentication credentials. Multiple knowledge bases can share one API connection.
1. Go to **Knowledge**, click **External Knowledge API** in the upper-right corner, then click **Add an External Knowledge API**.
2. Fill in the following fields:
* **Name**: A label to distinguish this API connection from others.
* **API Endpoint**: The base URL of your external knowledge service. Dify appends `/retrieval` automatically when sending requests.
* **API Key**: The authentication credential for your service. Dify sends this as a Bearer token in the `Authorization` header.
Dify validates the connection by sending a test request to your endpoint when you save.
## Step 3: Create an External Knowledge Base
With the API registered, connect an external knowledge source to Dify. This creates a knowledge base in Dify that is linked to your external system.
1. Go to **Knowledge** and click **Connect to an External Knowledge Base**.
2. Fill in the following fields:
* **External Knowledge Name** and **Knowledge Description** (optional).
* **External Knowledge API**: Select the API connection you registered.
* **External Knowledge ID**: The identifier of the specific knowledge source within your external system, passed to your API as the `knowledge_id` field.
This is whatever ID your external service uses to distinguish between different knowledge bases. For example, a Bedrock knowledge base ARN or an ID you defined in your own system.
The **External Knowledge API** and **External Knowledge ID** cannot be changed after creation. To use a different API or knowledge source, create a new external knowledge base.
* **Retrieval Settings**:
* **Top K**: Maximum number of chunks to retrieve per query. Higher values return more results but may include less relevant content.
* **Score Threshold**: Minimum similarity score for returned chunks. Enable this to filter out low-relevance results. Use a higher value for stricter relevance or a lower value to include broader matches.
When disabled, all results up to the Top K limit are returned regardless of score.
Once created, the external knowledge base is available for use in your applications just like any built-in knowledge base. See [Integrate Knowledge Within Application](/en/use-dify/knowledge/integrate-knowledge-within-application) for details.
## Troubleshoot
### Connection Refused or Timeout (Self-Hosted)
Dify routes outbound HTTP requests through a Squid-based SSRF proxy. If your external knowledge service runs on the same host as Dify or its domain is not allowlisted, the proxy blocks the request.
To allow connections, add your service's domain to the `allowed_domains` ACL in `docker/ssrf_proxy/squid.conf.template`:
```text theme={null}
acl allowed_domains dstdomain .marketplace.dify.ai .your-kb-service.com
```
Restart the SSRF proxy container after editing.
### API Response Format Issues
If retrieval fails or returns unexpected results, verify your API response against the [External Knowledge API specification](/en/use-dify/knowledge/external-knowledge-api#response).
Common issues:
* The `metadata` field in each record must be an object (`{}`), not `null`. A `null` value causes errors in the retrieval pipeline.
* The `content` and `score` fields must be present in every record.
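A quick way to catch these issues is to sanity-check each record before returning it. A minimal sketch mirroring the points above (the checks are illustrative, not exhaustive):

```python
def validate_record(record: dict) -> list[str]:
    """Return a list of problems with one retrieval record (empty = OK)."""
    problems = []
    for field in ("content", "score"):
        if field not in record:
            problems.append(f"missing '{field}'")
    if not isinstance(record.get("metadata"), dict):
        problems.append("'metadata' must be an object ({}), not null")
    return problems

# A record with a null metadata field is flagged:
print(validate_record({"content": "Refunds within 14 days.", "score": 0.92, "metadata": None}))
```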
# Configure the Chunk Settings
Source: https://docs.dify.ai/en/use-dify/knowledge/create-knowledge/chunking-and-cleaning-text
## What is Chunking?
Documents imported into knowledge bases are split into smaller segments called **chunks**. Think of chunking like organizing a large book into chapters and paragraphs—you can't quickly find specific information in one massive block of text, but well-organized sections make retrieval efficient.
When users ask questions, the system searches through these chunks for relevant information and provides it to the LLM as context. Without chunking, processing entire documents for every query would be slow and inefficient.
**Key Chunk Parameters**
* **Delimiter**: The character or sequence where text is split. For example, `\n\n` splits at paragraph breaks, `\n` at line breaks.
Delimiters are removed during chunking. For example, using `A` as the delimiter splits `CBACD` into `CB` and `CD`.
To avoid information loss, use non-content characters that don't naturally appear in your documents.
* **Maximum chunk length**: The maximum size of each chunk in characters. Text exceeding this limit is force-split regardless of delimiter settings.
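The two parameters interact like the following sketch, which is a simplified model of the real chunker, not Dify's implementation:

```python
def chunk(text: str, delimiter: str, max_len: int) -> list[str]:
    """Split on the delimiter (which is discarded), then force-split
    any piece that still exceeds max_len."""
    chunks = []
    for piece in text.split(delimiter):
        if not piece:
            continue  # drop empty pieces left where the delimiter was removed
        while len(piece) > max_len:
            chunks.append(piece[:max_len])
            piece = piece[max_len:]
        chunks.append(piece)
    return chunks

print(chunk("CBACD", delimiter="A", max_len=500))  # → ['CB', 'CD']
print(chunk("abcdef", delimiter="|", max_len=4))   # → ['abcd', 'ef']
```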
## Choose a Chunk Mode
The chunk mode cannot be changed once the knowledge base is created. However, chunk settings like the delimiter and maximum chunk length can be adjusted at any time.
### Mode Overview
In General mode, all chunks share the same settings. Matched chunks are returned directly as retrieval results.
**Chunk Settings**
Beyond delimiter and maximum chunk length, you can also configure **Chunk overlap** to specify how many characters overlap between adjacent chunks. This helps preserve semantic connections and prevents important information from being split across chunk boundaries.
For example, with a 50-character overlap, the last 50 characters of one chunk will also appear as the first 50 characters of the next chunk.
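The overlap behaves like a sliding window. A minimal sketch on plain characters (a simplified model, ignoring delimiters):

```python
def overlap_chunks(text: str, size: int, overlap: int) -> list[str]:
    """Fixed-size chunks where each chunk repeats the last `overlap`
    characters of the previous one."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

# With size=4 and overlap=2, every chunk shares 2 characters with its neighbor.
print(overlap_chunks("0123456789", size=4, overlap=2))  # → ['0123', '2345', '4567', '6789']
```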
In Parent-child mode, text is split into two tiers: smaller **child chunks** and larger **parent chunks**. When a query matches a child chunk, its entire parent chunk is returned as the retrieval result.
This solves a common retrieval dilemma: smaller chunks enable precise query matching but lack context, while larger chunks provide rich context but reduce retrieval accuracy.
Parent-child mode balances both—retrieving with precision and responding with context.
**Parent Chunk Settings**
Parent chunks can be created in **Paragraph** or **Full Doc** mode.
The document is split into multiple parent chunks based on the specified delimiter and maximum chunk length.
Suitable for lengthy documents with well-structured sections where each section provides meaningful context independently.
The entire document serves as a single parent chunk.
Suitable for small, cohesive documents where the full context is essential for understanding any specific detail.
In **Full Doc** mode:
* Only the first 10,000 tokens are processed. Content beyond this limit will be truncated.
* The parent chunk cannot be edited once created. To modify it, you must upload a new document.
**Child Chunk Settings**
Each parent chunk is further split into child chunks using their own delimiter and maximum chunk length settings.
### Quick Comparison
| Dimension | General Mode | Parent-child Mode |
| :------------------------------------------------------------------------------------------ | :----------------------------------------------------- | :------------------------------------------------------------------------------------------------ |
| Chunking Strategy | Single-tier: all chunks use the same settings | Two-tier: separate settings for parent and child chunks |
| Retrieval Workflow | Matched chunks are directly returned | Child chunks are used for matching queries; parent chunks are returned to provide broader context |
| Compatible [Index Method](/en/use-dify/knowledge/create-knowledge/setting-indexing-methods) | High Quality, Economical | High Quality only |
| Best For | Simple, self-contained content like glossaries or FAQs | Information-dense documents like technical manuals or research papers where context matters |
## Pre-process Text Before Chunking
Before splitting text into chunks, you can clean up irrelevant content to improve retrieval quality.
* **Replace consecutive spaces, newlines, and tabs**
* Three or more consecutive newlines → two newlines
* Multiple spaces → single space
* Tabs, form feeds, and special Unicode spaces → regular space
* **Remove all URLs and email addresses**
This setting is ignored in **Full Doc** mode.
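The cleanup options above amount to a few text substitutions. The regexes below are illustrative approximations, not Dify's exact implementation:

```python
import re

def clean(text: str) -> str:
    """Illustrative approximation of the pre-processing options above."""
    text = re.sub(r"https?://\S+|\S+@\S+\.\S+", "", text)  # drop URLs and emails
    text = re.sub(r"\n{3,}", "\n\n", text)                 # 3+ newlines -> two
    text = re.sub(r"[^\S\n]+", " ", text)                  # space/tab runs -> one space
    return text

print(clean("Contact:   sales@example.com\n\n\n\nSee https://example.com\tnow"))
```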
## Enable Summary Auto-Gen
Available for self-hosted deployments only.
Automatically generate summaries for all chunks to enhance their retrievability.
Summaries are embedded and indexed for retrieval as well. When a summary matches a query, its corresponding chunk is also returned.
You can manually edit auto-generated summaries or regenerate them for specific documents later. See [Manage Knowledge Content](/en/use-dify/knowledge/manage-knowledge/maintain-knowledge-documents) for details.
If you select a vision-capable LLM, summaries will be generated based on both the chunk text and any attached images.
## Preview Chunks
Click **Preview** to see how your content will be chunked. A limited number of chunks will be displayed for a quick review.
If the results don't perfectly match your expectations, choose the closest configuration—you can manually fine-tune chunks later. See [Manage Knowledge Content](/en/use-dify/knowledge/manage-knowledge/maintain-knowledge-documents) for details.
For multiple documents, click the file name at the top of the preview panel to switch between them.
# Upload Local Files
Source: https://docs.dify.ai/en/use-dify/knowledge/create-knowledge/import-text-data/readme
Once a knowledge base is created, its data source cannot be changed later.
When quick-creating a knowledge base, you can upload local files as its data source:
1. Click **Knowledge** > **Create Knowledge**.
2. Select **Import from file** as the data source, then upload your files.
* Maximum number of files per upload: 5
On Dify Cloud, **batch uploading** (up to 50 files per upload) is only available on [paid plans](https://dify.ai/pricing).
* Maximum file size: 15 MB
For self-hosted deployments, you can adjust these two limits via the environment variables `UPLOAD_FILE_SIZE_LIMIT` and `UPLOAD_FILE_BATCH_LIMIT`.
***
**For Images in Uploaded Files**
JPG, JPEG, PNG, and GIF images under 2 MB are automatically extracted as attachments to their corresponding chunks. These images can be managed independently and are returned alongside their chunks during retrieval.
URLs of extracted images remain in the chunk text, but you can safely remove these URLs to keep the text clean—this won't affect the extracted images.
If you select a multimodal embedding model (marked with a **Vision** icon) in index settings, the extracted images will also be embedded and indexed for retrieval.
Each chunk supports up to 10 image attachments; images beyond this limit will not be extracted.
For self-hosted deployments, you can adjust the following limits via environment variables:
* Maximum image size: `ATTACHMENT_IMAGE_FILE_SIZE_LIMIT`
* Maximum number of attachments per chunk: `SINGLE_CHUNK_ATTACHMENT_LIMIT`
The above extraction rule applies to:
* Images embedded in DOCX files
  Images embedded in other file types (e.g., PDF) can be extracted by using appropriate document extraction plugins in [knowledge pipelines](/en/use-dify/knowledge/knowledge-pipeline/readme).
* Images referenced via accessible URLs using the Markdown image syntax (``) in any file type
# Sync Data from Notion
Source: https://docs.dify.ai/en/use-dify/knowledge/create-knowledge/import-text-data/sync-from-notion
Dify datasets support importing from Notion and setting up **synchronization** so that data updates in Notion are automatically synced to Dify.
### Authorization Verification
1. When creating a dataset and selecting the data source, click **Sync from Notion Content -- Bind Now** and follow the prompts to complete the authorization verification.
2. Alternatively, you can go to **Settings -- Data Sources -- Add Data Source**, click on the Notion source **Bind**, and complete the authorization verification.

### Importing Notion Data
After completing the authorization verification, go to the create dataset page, click **Sync from Notion Content**, and select the authorized pages you need to import.

### Chunking and Cleaning
Next, choose a [chunking mode](/en/use-dify/knowledge/create-knowledge/chunking-and-cleaning-text) and [indexing method](/en/use-dify/knowledge/create-knowledge/setting-indexing-methods) for your knowledge base, then save it and wait for the processing to complete automatically. Dify not only supports importing standard Notion pages but can also consolidate and save page attributes from database-type pages.
***Note: images and files cannot be imported, and data from tables will be converted to text.***

### Synchronizing Notion Data
If your Notion content has been updated, you can sync the changes by clicking the **Sync** button for the corresponding page in the document list of your knowledge base. Syncing involves an embedding process, which will consume tokens from your embedding model.

### Integration Configuration Method for Community Edition Notion
Notion offers two integration options: **internal integration** and **public integration**. For more details on the differences between these two methods, please refer to the [official Notion documentation](https://developers.notion.com/guides/get-started/authorization).
#### 1. Using Internal Integration
First, create an integration on the [Create Integration](https://www.notion.so/my-integrations) settings page. All integrations start as internal integrations by default; an internal integration is associated with the workspace you choose, so you must be the workspace owner to create one.
Specific steps:
Click the **New integration** button. The type is **Internal** by default and cannot be modified. Select the associated workspace, enter the integration name, upload a logo, and click **Submit** to create the integration.

After creating the integration, you can update its settings as needed under the **Capabilities** tab, then click the **Show** button under **Secrets** and copy the secret.

After copying, go back to the Dify source code, and configure the relevant environment variables in the **.env** file. The environment variables are as follows:
```
NOTION_INTEGRATION_TYPE=internal
NOTION_INTERNAL_SECRET=your-internal-secret
```
#### 2. Using Public Integration
**You need to upgrade the internal integration to a public integration.** Navigate to the Distribution page of the integration, and toggle the switch to make the integration public. When switching to the public setting, you need to fill in additional information in the Organization Information form below, including your company name, website, and redirect URL, then click the **Submit** button.

After successfully making the integration public on the integration settings page, you will be able to access the integration key in the Keys tab:

Go back to the Dify source code, and configure the relevant environment variables in the **.env** file. The environment variables are as follows:
```
NOTION_INTEGRATION_TYPE=public
NOTION_CLIENT_SECRET=your-client-secret
NOTION_CLIENT_ID=your-client-id
```
After configuration, you can import and synchronize Notion data in your datasets.
# Import Data from Website
Source: https://docs.dify.ai/en/use-dify/knowledge/create-knowledge/import-text-data/sync-from-website
The knowledge base supports crawling content from public web pages using third-party tools such as [Jina Reader](https://jina.ai/reader/) and [Firecrawl](https://www.firecrawl.dev/), parsing it into Markdown content, and importing it into the knowledge base.
[Firecrawl](https://www.firecrawl.dev/) and [Jina Reader](https://jina.ai/reader/) are both open-source web parsing tools that can convert web pages into clean Markdown format text that is easy for LLMs to recognize, while providing easy-to-use API services.
The following sections will introduce the usage methods for Firecrawl and Jina Reader respectively.
## Firecrawl
### 1. Configure Firecrawl API credentials
Click on the avatar in the upper right corner, then go to the **DataSource** page, and click the **Configure** button next to Firecrawl.

Log in to the [Firecrawl website](https://www.firecrawl.dev/) to complete registration, get your API Key, and then enter and save it in Dify.

### 2. Scrape target webpage
On the knowledge base creation page, select **Sync from website**, choose Firecrawl as the provider, and enter the target URL to be crawled.
Configuration options include whether to crawl sub-pages, the page crawling limit, the maximum scraping depth, excluded paths, included-only paths, and the content extraction scope. After completing the configuration, click **Run** to preview the parsed pages.

### 3. Review import results
Once imported, the parsed text from the webpage is stored in the knowledge base documents. Review the import results, then click **Add URL** to continue importing new web pages.
***
## Jina Reader
### 1. Configuring Jina Reader Credentials
Click on the avatar in the upper right corner, then go to the **DataSource** page, and click the **Configure** button next to Jina Reader.

Log in to the [Jina Reader website](https://jina.ai/reader/), complete registration, obtain the API Key, then fill it in and save.
### 2. Using Jina Reader to Crawl Web Content
On the knowledge base creation page, select **Sync from website**, choose Jina Reader as the provider, and enter the target URL to be crawled.

Configuration options include: whether to crawl subpages, maximum number of pages to crawl, and whether to use sitemap for crawling. After completing the configuration, click the **Run** button to preview the page links to be crawled.

After importing the parsed text from web pages into the knowledge base, you can review the imported results in the documents section. To add more web pages, click the **Add URL** button on the right to continue importing new pages.

After crawling is complete, the content from the web pages will be incorporated into the knowledge base.
# Quick Create Knowledge
Source: https://docs.dify.ai/en/use-dify/knowledge/create-knowledge/introduction
To quick-create and configure a knowledge base:
1. Click **Knowledge** > **Create Knowledge**, then [upload local files](/en/use-dify/knowledge/create-knowledge/import-text-data/readme), [sync data from Notion](/en/use-dify/knowledge/create-knowledge/import-text-data/sync-from-notion), or [webpages](/en/use-dify/knowledge/create-knowledge/import-text-data/sync-from-website), or create an empty knowledge base.
2. [Configure the chunk settings](/en/use-dify/knowledge/create-knowledge/chunking-and-cleaning-text) and preview the chunking results. This stage involves content preprocessing and structuring, where long texts are divided into multiple smaller chunks.
3. [Specify the index method and retrieval settings](/en/use-dify/knowledge/create-knowledge/setting-indexing-methods). Once the knowledge base receives a user query, it searches existing documents according to preset retrieval methods and extracts highly relevant content chunks.
4. Wait for the data processing to complete.

# Specify the Index Method and Retrieval Settings
Source: https://docs.dify.ai/en/use-dify/knowledge/create-knowledge/setting-indexing-methods
After selecting the chunking mode, the next step is to define the index method for structured content.
## Select the Index Method
Just as search engines use efficient indexing algorithms to surface the results most relevant to user queries, the selected index method directly impacts the retrieval efficiency of the LLM and the accuracy of its responses based on knowledge base content.
The knowledge base offers two index methods: **High-Quality** and **Economical**, each with different retrieval setting options.
Once a knowledge base is created with the High-Quality index method, it cannot be switched to Economical later.
### High-Quality
The High-Quality index method uses an embedding model to convert content chunks into vector representations. This process is called embedding.
Think of these vectors as coordinates in a multi-dimensional space—the closer two points are, the more similar their meanings. This allows the system to find relevant information based on semantic similarity, not just exact keyword matches.
To enable cross-modal retrieval—retrieving both text and images based on semantic relevance—select a multimodal embedding model (marked with a **Vision** icon). Images extracted from documents will then be embedded and indexed for retrieval.
Knowledge bases using such embedding models are labeled **Multimodal** on their cards.
The High-Quality index method supports three retrieval strategies: vector search, full-text search, or hybrid search. Learn more in [Configure the Retrieval Settings](#configure-the-retrieval-settings).
### Q\&A Mode
Q\&A mode is available for self-hosted deployments only.
When this mode is enabled, the system segments the uploaded text and automatically generates Q\&A pairs for each segment after summarizing its content.
Compared with the common **Q to P** strategy (user questions matched with text paragraphs), the Q\&A mode uses a **Q to Q** strategy (questions matched with questions).
This approach is particularly effective because the text in FAQ documents **is often written in natural language with complete grammatical structures**.
> The **Q to Q** strategy makes the matching between questions and answers clearer and better supports scenarios with high-frequency or highly similar questions.

When a user asks a question, the system identifies the most similar question and returns the corresponding chunk as the answer. This approach is more precise, as it directly matches the user’s query, helping them retrieve the exact information they need.
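As a toy illustration of the **Q to Q** strategy (using `difflib` string similarity as a stand-in for the embedding similarity a real deployment uses):

```python
from difflib import SequenceMatcher

# Auto-generated question -> answer chunk pairs (illustrative data)
qa_pairs = {
    "How do I reset my password?": "Go to Settings > Account > Reset Password.",
    "How do I delete my account?": "Contact support to delete your account.",
}

def answer(user_question):
    """Match the user's question against stored questions (Q to Q),
    then return the answer chunk paired with the closest one."""
    best_q = max(qa_pairs,
                 key=lambda q: SequenceMatcher(None, user_question.lower(),
                                               q.lower()).ratio())
    return qa_pairs[best_q]

print(answer("how can I reset the password"))
# Go to Settings > Account > Reset Password.
```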

### Economical
The Economical index method uses 10 keywords per chunk for retrieval, so no tokens are consumed, at the expense of reduced retrieval accuracy. For the retrieved chunks, only the inverted index method is available to select the most relevant ones.
If the performance of the economical indexing method does not meet your expectations, you can upgrade to the High-Quality indexing method in the Knowledge settings page.

## Configure the Retrieval Settings
Once the knowledge base receives a user query, it searches existing documents according to preset retrieval methods and extracts highly relevant content chunks. These chunks provide essential context for the LLM, ultimately affecting the accuracy and credibility of its answers.
Common retrieval methods include:
1. Semantic Retrieval based on vector similarity—where text chunks and queries are converted into vectors and matched via similarity scoring.
2. Keyword Matching using an inverted index (a standard search engine technique).
Both retrieval methods are supported in Dify’s knowledge base. The specific retrieval options available depend on the chosen indexing method.
**High Quality**
In the **High-Quality** Indexing Method, Dify offers three retrieval settings: **Vector Search, Full-Text Search, and Hybrid Search**.

**Vector Search**
**Definition**: Vectorize the user’s question to generate a query vector, then compare it with the corresponding text vectors in the knowledge base to find the nearest chunks.

**Vector Search Settings:**
**Rerank Model**: Disabled by default. When enabled, a third-party Rerank model will sort the text chunks returned by Vector Search to optimize results. This helps the LLM access more precise information and improve output quality. Before enabling this option, go to **Settings** → **Model Providers** and configure the Rerank model’s API key.
If the selected embedding model is multimodal, select a multimodal rerank model (marked with a **Vision** icon) as well. Otherwise, retrieved images will be excluded from reranking and the retrieval results.
> Enabling this feature will consume tokens from the Rerank model. For more details, refer to the associated model’s pricing page.
**TopK**: Determines how many text chunks, deemed most similar to the user’s query, are retrieved. It also automatically adjusts the number of chunks based on the chosen model’s context window. The default value is **3**, and higher numbers will recall more text chunks.
**Score Threshold**: Sets the minimum similarity score required for a chunk to be retrieved. Only chunks exceeding this score are retrieved. The default value is **0.5**. Higher thresholds demand greater similarity and thus result in fewer chunks being retrieved.
> The TopK and Score configurations are only effective during the Rerank phase. Therefore, to apply either of these settings, it is necessary to add and enable a Rerank model.
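Together, the two settings act as a filter-then-truncate step. A minimal sketch of the behavior described above (illustrative only, not Dify's internal implementation):

```python
def apply_retrieval_settings(chunks, top_k=3, score_threshold=0.5):
    """Keep chunks scoring above the threshold, then truncate to top_k.

    `chunks` is a list of (text, score) pairs, e.g. rerank output.
    Illustrative only -- not Dify's internal code.
    """
    kept = [c for c in chunks if c[1] > score_threshold]   # Score Threshold
    kept.sort(key=lambda c: c[1], reverse=True)            # best match first
    return kept[:top_k]                                    # TopK

results = apply_retrieval_settings(
    [("a", 0.9), ("b", 0.4), ("c", 0.7), ("d", 0.6), ("e", 0.55)]
)
print([text for text, _ in results])  # ['a', 'c', 'd']
```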
***
**Full-Text Search**
**Definition:** Indexes all terms in the document, allowing users to query any term and returning text fragments containing those terms.

**Rerank Model**: Disabled by default. When enabled, a third-party Rerank model will sort the text chunks returned by Full-Text Search to optimize results. This helps the LLM access more precise information and improve output quality. Before enabling this option, go to **Settings** → **Model Providers** and configure the Rerank model’s API key.
If the selected embedding model is multimodal, select a multimodal rerank model (marked with a **Vision** icon) as well. Otherwise, retrieved images will be excluded from reranking and the retrieval results.
> Enabling this feature will consume tokens from the Rerank model. For more details, refer to the associated model’s pricing page.
**TopK**: Determines how many text chunks, deemed most similar to the user’s query, are retrieved. It also automatically adjusts the number of chunks based on the chosen model’s context window. The default value is **3**, and higher numbers will recall more text chunks.
**Score Threshold**: Sets the minimum similarity score required for a chunk to be retrieved. Only chunks exceeding this score are retrieved. The default value is **0.5**. Higher thresholds demand greater similarity and thus result in fewer chunks being retrieved.
> The TopK and Score configurations are only effective during the Rerank phase. Therefore, to apply either of these settings, it is necessary to add and enable a Rerank model.
***
**Hybrid Search**
**Definition**: This process combines full-text search and vector search, performing both simultaneously. It includes a reordering step to select the best-matching results from both search outcomes based on the user’s query.

In this mode, you can specify **Weight Settings** without configuring a Rerank model API, or enable a **Rerank model** for retrieval.
* **Weight Settings**
This feature enables users to set custom weights for semantic priority and keyword priority. Keyword search refers to performing a full-text search within the knowledge base, while semantic search involves vector search within the knowledge base.
* **Semantic Value of 1**
This activates only the semantic search mode. Utilizing embedding models, even if the exact terms from the query do not appear in the knowledge base, the search can delve deeper by calculating vector distances, thus returning relevant content. Additionally, when dealing with multilingual content, semantic search can capture meaning across different languages, providing more accurate cross-language search results.
* **Keyword Value of 1**
This activates only the keyword search mode. It performs a full match against the input text in the knowledge base, suitable for scenarios where the user knows the exact information or terminology. This approach consumes fewer computational resources and is ideal for quick searches within a large document knowledge base.
* **Custom Keyword and Semantic Weights**
In addition to enabling only semantic search or keyword search, we provide flexible custom weight settings. You can continuously adjust the weights of the two methods to identify the optimal weight ratio that suits your business scenario.
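Conceptually, the weighted mode blends the two normalized scores linearly. A sketch of the idea (an illustration, not Dify's exact formula):

```python
def hybrid_score(semantic_score, keyword_score, semantic_weight=0.7):
    """Weighted combination of the two retrieval scores (both assumed
    normalized to 0-1). semantic_weight=1.0 is pure semantic search,
    0.0 is pure keyword search. Illustrative, not Dify's exact formula."""
    keyword_weight = 1.0 - semantic_weight
    return semantic_weight * semantic_score + keyword_weight * keyword_score

# A chunk with strong semantic similarity but weak keyword overlap:
print(round(hybrid_score(0.9, 0.2, semantic_weight=1.0), 2))  # 0.9
print(round(hybrid_score(0.9, 0.2, semantic_weight=0.0), 2))  # 0.2
print(round(hybrid_score(0.9, 0.2, semantic_weight=0.7), 2))  # 0.69
```

Adjusting `semantic_weight` between 0 and 1 is exactly the slider tuning described above.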
***
**Rerank Model**
Disabled by default. When enabled, a third-party Rerank model will sort the text chunks returned by Hybrid Search to optimize results. This helps the LLM access more precise information and improve output quality. Before enabling this option, go to **Settings** → **Model Providers** and configure the Rerank model’s API key.
If the selected embedding model is multimodal, select a multimodal rerank model (marked with a **Vision** icon) as well. Otherwise, retrieved images will be excluded from reranking and the retrieval results.
> Enabling this feature will consume tokens from the Rerank model. For more details, refer to the associated model’s pricing page.
The **"Weight Settings"** and **"Rerank Model"** settings support the following options:
**TopK**: Determines how many text chunks, deemed most similar to the user’s query, are retrieved. It also automatically adjusts the number of chunks based on the chosen model’s context window. The default value is **3**, and higher numbers will recall more text chunks.
**Score Threshold**: Sets the minimum similarity score required for a chunk to be retrieved. Only chunks exceeding this score are retrieved. The default value is **0.5**. Higher thresholds demand greater similarity and thus result in fewer chunks being retrieved.
**Economical**
In **Economical Indexing** mode, only the inverted index approach is available. An inverted index is a data structure designed for fast keyword retrieval within documents, commonly used in online search engines. Inverted indexing supports only the **TopK** setting.
**TopK:** Determines how many text chunks, deemed most similar to the user’s query, are retrieved. It also automatically adjusts the number of chunks based on the chosen model’s context window. The default value is **3**, and higher numbers will recall more text chunks.
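An inverted index can be sketched in a few lines; this toy version (illustrative, not Dify's implementation) shows why keyword lookup is cheap:

```python
from collections import defaultdict

def build_inverted_index(chunks):
    """Map each keyword to the set of chunk ids containing it --
    a toy version of the structure Economical indexing relies on."""
    index = defaultdict(set)
    for chunk_id, text in enumerate(chunks):
        for word in text.lower().split():
            index[word].add(chunk_id)
    return index

def keyword_search(index, query, top_k=3):
    """Rank chunks by how many query keywords they contain."""
    hits = defaultdict(int)
    for word in query.lower().split():
        for chunk_id in index.get(word, ()):
            hits[chunk_id] += 1
    ranked = sorted(hits, key=lambda cid: hits[cid], reverse=True)
    return ranked[:top_k]

chunks = ["Dify supports knowledge bases",
          "Economical indexing uses keywords",
          "Vector search uses embeddings"]
index = build_inverted_index(chunks)
print(keyword_search(index, "economical keywords"))  # [1]
```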

### Reference
After specifying the retrieval settings, you can refer to the following documentation to review how keywords match with content chunks in different scenarios.
Learn how to test and cite your knowledge base retrieval
# External Knowledge API
Source: https://docs.dify.ai/en/use-dify/knowledge/external-knowledge-api
API specification that your external knowledge service must implement to integrate with Dify
This page defines the API contract your external knowledge service must implement for Dify to retrieve content from it. Once your API is ready, see [Connect to External Knowledge Base](/en/use-dify/knowledge/connect-external-knowledge-base) to register it in Dify.
## Authentication
Dify sends the API Key you configured as a Bearer token in every request:
```text theme={null}
Authorization: Bearer {API_KEY}
```
You define the authentication logic on your side. Dify only passes the key—it does not validate it.
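A minimal sketch of the server-side check (the rules are yours to define; the error codes here reuse the suggested conventions from the Error Handling section below):

```python
def check_authorization(header, valid_keys):
    """Validate the Authorization header Dify sends.

    Returns None when authorized, or a JSON-serializable error body.
    Illustrative logic -- your service defines its own rules.
    """
    if not header or not header.startswith("Bearer "):
        return {"error_code": 1001,
                "error_msg": "Invalid Authorization header format."}
    api_key = header[len("Bearer "):]
    if api_key not in valid_keys:
        return {"error_code": 1002, "error_msg": "Authorization failed."}
    return None  # authorized

print(check_authorization("Bearer secret-key", {"secret-key"}))      # None
print(check_authorization("Basic abc", {"secret-key"})["error_code"])  # 1001
```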
## Request
```text theme={null}
POST {your-endpoint}/retrieval
Content-Type: application/json
Authorization: Bearer {API_KEY}
```
Dify appends `/retrieval` to the endpoint URL you configured. If you registered `https://your-service.com`, Dify sends requests to `https://your-service.com/retrieval`.
### Body
| Property | Required | Type | Description |
| :------------------- | :------- | :----- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `knowledge_id` | Yes | string | The identifier of the knowledge source in your external system. This is the value you entered in the **External Knowledge ID** field when connecting. Use it to route queries to the correct knowledge source. |
| `query` | Yes | string | The user's search query. |
| `retrieval_setting` | Yes | object | Retrieval parameters. See [below](#retrieval_setting). |
| `metadata_condition` | No | object | Metadata filtering conditions. See [below](#metadata_condition). |
#### `retrieval_setting`
| Property | Required | Type | Description |
| :---------------- | :------- | :---- | :--------------------------------------------------------------------------------------------- |
| `top_k` | Yes | int | Maximum number of results to return. |
| `score_threshold` | Yes | float | Minimum similarity score (0-1). When score threshold is disabled in Dify, this value is `0.0`. |
#### `metadata_condition`
Dify passes metadata conditions to your API but does not currently provide a UI for users to configure them. This parameter is available for programmatic use only.
| Property | Required | Type | Description |
| :----------------- | :------- | :------------- | :----------------------------- |
| `logical_operator` | No | string | `and` or `or`. Default: `and`. |
| `conditions` | Yes | array\[object] | List of filter conditions. |
Each object in `conditions`:
| Property | Required | Type | Description |
| :-------------------- | :------- | :-------------------------------- | :-------------------------------------------------------- |
| `name` | Yes | string | Metadata field name to filter on. |
| `comparison_operator` | Yes | string | Comparison operator. See supported values below. |
| `value` | No | string, number, or array\[string] | Comparison value. Omit when using `empty` or `not empty`. |
| Operator | Description |
| :------------- | :--------------------------------- |
| `contains` | Contains a value |
| `not contains` | Does not contain a value |
| `start with` | Starts with a value |
| `end with` | Ends with a value |
| `is` | Equals a value |
| `is not` | Does not equal a value |
| `in` | Matches any value in a list |
| `not in` | Does not match any value in a list |
| `empty` | Is empty |
| `not empty` | Is not empty |
| `=` | Equals (numeric) |
| `≠` | Not equal (numeric) |
| `>` | Greater than |
| `<` | Less than |
| `≥` | Greater than or equal to |
| `≤` | Less than or equal to |
| `before` | Before a date |
| `after` | After a date |
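A sketch of how a service might evaluate `metadata_condition` against a record's metadata. Only a representative subset of the operators above is implemented; a real service must cover the full table:

```python
def matches(metadata, condition):
    """Evaluate one condition object against a record's metadata."""
    actual = metadata.get(condition["name"])
    op = condition["comparison_operator"]
    value = condition.get("value")        # omitted for empty / not empty
    if op == "empty":
        return actual in (None, "", [])
    if op == "not empty":
        return actual not in (None, "", [])
    if actual is None:
        return False
    ops = {                               # subset of the operator table
        "contains": lambda a, v: v in a,
        "is":       lambda a, v: a == v,
        "is not":   lambda a, v: a != v,
        "in":       lambda a, v: a in v,
        "=":        lambda a, v: a == v,
        ">":        lambda a, v: a > v,
        "<":        lambda a, v: a < v,
    }
    return ops[op](actual, value)

def evaluate(metadata, metadata_condition):
    results = [matches(metadata, c) for c in metadata_condition["conditions"]]
    if metadata_condition.get("logical_operator", "and") == "or":
        return any(results)
    return all(results)                   # default: and

cond = {"logical_operator": "and",
        "conditions": [
            {"name": "category", "comparison_operator": "is", "value": "faq"},
            {"name": "views", "comparison_operator": ">", "value": 100},
        ]}
print(evaluate({"category": "faq", "views": 250}, cond))  # True
print(evaluate({"category": "faq", "views": 10}, cond))   # False
```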
### Example Request
```json theme={null}
{
"knowledge_id": "your-knowledge-id",
"query": "What is Dify?",
"retrieval_setting": {
"top_k": 3,
"score_threshold": 0.5
}
}
```
## Response
Return HTTP 200 with a JSON body containing a `records` array. If no results match the query, return an empty array: `{"records": []}`.
### `records`
| Property | Type | Description |
| :--------- | :----- | :----------------------------------------------------------------------------- |
| `content` | string | The retrieved text chunk. Dify uses this as the context passed to the LLM. |
| `score` | float | Similarity score (0–1). Used for score threshold filtering and result ranking. |
| `title` | string | Source document title. |
| `metadata` | object | Arbitrary key-value pairs preserved by Dify. |
Dify does not reject records with missing fields, but omitting `content` or `score` will produce incomplete or unranked results.
If you include `metadata` in a record, it must be an object (`{}`), not `null`. A `null` metadata value causes errors in Dify's retrieval pipeline.
### Example Response
```json theme={null}
{
"records": [
{
"content": "This is the document for external knowledge.",
"score": 0.98,
"title": "knowledge.txt",
"metadata": {
"path": "s3://dify/knowledge.txt",
"description": "dify knowledge document"
}
},
{
"content": "The Innovation Engine for GenAI Applications",
"score": 0.66,
"title": "introduce.txt",
"metadata": {}
}
]
}
```
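The request/response contract can be exercised with a small framework-agnostic handler. A sketch, where `search` is a hypothetical backend lookup you supply (wrap the function with Flask, FastAPI, or any HTTP framework):

```python
def handle_retrieval(body, search):
    """Build the {"records": [...]} response Dify expects.

    `search(knowledge_id, query)` is your backend lookup, returning
    candidate records with content, score, title, and metadata.
    """
    settings = body["retrieval_setting"]
    candidates = search(body["knowledge_id"], body["query"])
    records = [r for r in candidates
               if r["score"] >= settings["score_threshold"]]   # min score
    records.sort(key=lambda r: r["score"], reverse=True)
    return {"records": records[:settings["top_k"]]}            # [] if no match

def fake_search(knowledge_id, query):          # stand-in backend for the demo
    return [{"content": "Dify is an LLM app platform.", "score": 0.98,
             "title": "intro.txt", "metadata": {}},
            {"content": "Unrelated text.", "score": 0.12,
             "title": "misc.txt", "metadata": {}}]

resp = handle_retrieval(
    {"knowledge_id": "kb-1", "query": "What is Dify?",
     "retrieval_setting": {"top_k": 3, "score_threshold": 0.5}},
    fake_search)
print(len(resp["records"]), resp["records"][0]["title"])  # 1 intro.txt
```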
## Error Handling
Dify checks the HTTP status code of your response. A non-200 status raises an error that surfaces to the user.
You can optionally return structured error information in JSON:
| Property | Type | Description |
| :----------- | :----- | :------------------------------------------ |
| `error_code` | int | An application-level error code you define. |
| `error_msg` | string | A human-readable error description. |
The following are suggested error codes. These are conventions, not enforced by Dify:
| Code | Suggested Usage |
| :--- | :---------------------------------- |
| 1001 | Invalid Authorization header format |
| 1002 | Authorization failed |
| 2001 | Knowledge base not found |
### Example Error Response
```json theme={null}
{
"error_code": 1002,
"error_msg": "Authorization failed. Please check your API key."
}
```
# Integrate Knowledge within Apps
Source: https://docs.dify.ai/en/use-dify/knowledge/integrate-knowledge-within-application
### Creating an Application Integrated with Knowledge Base
A **"Knowledge Base"** can be used as an external information source to provide precise answers to user questions via LLM. You can associate an existing knowledge base with any [application type](/en/use-dify/getting-started/key-concepts#dify-app) in Dify.
Taking a chat assistant as an example, the process is as follows:
1. Go to **Knowledge -- Create Knowledge -- Upload file**.
2. Go to **Studio -- Create Application -- Select Chatbot**.
3. Enter **Context**, click **Add**, and select one of the knowledge bases you created.
4. Use **Metadata Filtering** to refine document search in your knowledge base.
5. In **Context Settings**, configure the **Retrieval Setting**.
6. Enable **Citation and Attribution** in **Add Features**.
7. In **Debug and Preview**, input user questions related to the knowledge base for debugging.
8. After debugging, click the **Publish** button to create an AI application based on your own knowledge!
***
### Connecting Knowledge and Setting Retrieval Mode
In applications that utilize multiple knowledge bases, it is essential to configure the retrieval mode to enhance the precision of retrieved content. To set the retrieval mode for the knowledge bases, navigate to **Context -- Retrieval Settings -- Rerank Setting**.
#### Retrieval Setting
The retriever scans all knowledge bases linked to the application for text content relevant to the user's question. The results are then consolidated. Below is the technical flowchart for the Multi-path Retrieval mode:

This method simultaneously queries all knowledge bases connected in **"Context"**, seeking relevant text chunks across them, collecting all content that aligns with the user's question, and finally applying the Rerank strategy to identify the most appropriate content for the response. By leveraging multiple knowledge bases at once, this retrieval approach offers more comprehensive and accurate results.

For instance, application A references three knowledge bases: K1, K2, and K3. When a user sends a question, multiple relevant pieces of content are retrieved and combined from these knowledge bases. To ensure the most pertinent content is identified, the Rerank strategy is employed to find the content that best relates to the user's query, enhancing the precision and reliability of the results.
In practical Q\&A scenarios, the sources of content and retrieval methods for each knowledge base may differ. To manage the mixed content returned from retrieval, the Rerank strategy acts as a refined sorting mechanism. It ensures that the candidate content aligns well with the user's question, optimizing the ranking of results across multiple knowledge bases to identify the most suitable content, thereby improving answer quality and overall user experience.
Considering the costs associated with using Rerank and the needs of the business, the multi-path retrieval mode provides two Rerank settings:
**Weighted Score**
This setting uses internal scoring mechanisms and does not require an external Rerank model, thus **avoiding any additional processing costs**. You can select the most appropriate content matching strategy by adjusting the weight ratio sliders for semantics or keywords.
* **Semantic Value of 1**
This mode activates semantic retrieval only. By utilizing the Embedding model, the search depth can be enhanced even if the exact words from the query do not appear in the knowledge base, as it calculates vector distances to return the relevant content. Furthermore, when dealing with multilingual content, semantic retrieval can capture meanings across different languages, yielding more accurate cross-language search results.
* **Keyword Value of 1**
This mode activates keyword retrieval only. It matches the user's input text against the full text of the knowledge base, making it ideal for scenarios where the user knows the exact information or terminology. This method is resource-efficient, making it suitable for quickly retrieving information from large document repositories.
* **Custom Keyword and Semantic Weights**
In addition to enabling only semantic or keyword retrieval modes, we offer flexible custom Weight Score. You can determine the best weight ratio for your business scenario by continuously adjusting the weights of both.
**Rerank Model**
The Rerank model is an external scoring system that calculates the similarity score between the user's question and each candidate document provided, improving the results of semantic ranking and returning a list of documents sorted by similarity score from high to low.
While this method incurs some additional costs, it is more adept at handling complex knowledge base content, such as content that combines semantic queries and keyword matches, or cases involving multilingual returned content.
Dify currently supports multiple Rerank models. To use external Rerank models, you'll need to provide an API Key. Enter the API Key for the Rerank model (such as Cohere, Jina AI, etc.) on the "Model Provider" page.

**Adjustable Parameters**
* **TopK**: Determines how many text chunks, deemed most similar to the user’s query, are retrieved. It also automatically adjusts the number of chunks based on the chosen model’s context window. The default value is **3**, and higher numbers will recall more text chunks.
* **Score Threshold**: Sets the minimum similarity score required for a chunk to be retrieved. Only chunks exceeding this score are retrieved. The default value is **0.5**. Higher thresholds demand greater similarity and thus result in fewer chunks being retrieved.
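The Multi-path Retrieval flow above can be sketched as pool-then-rerank. Both the toy knowledge bases and the word-overlap reranker below are stand-ins for real components (an external model such as Cohere or Jina would do the scoring):

```python
def multi_path_retrieve(query, knowledge_bases, rerank,
                        top_k=3, score_threshold=0.5):
    """Query every linked knowledge base, pool the candidates,
    rerank, then apply Score Threshold and TopK. Illustrative only."""
    pooled = []
    for kb in knowledge_bases:                       # K1, K2, K3, ...
        pooled.extend(kb(query))
    scored = [(chunk, rerank(query, chunk)) for chunk in pooled]
    scored = [(c, s) for c, s in scored if s > score_threshold]
    scored.sort(key=lambda cs: cs[1], reverse=True)
    return [c for c, _ in scored[:top_k]]

# Toy knowledge bases and a word-overlap reranker for the demo:
def k1(q): return ["Dify pricing plans", "Release notes"]
def k2(q): return ["Dify knowledge base guide"]
def rerank(q, c):
    q_words = set(q.lower().split())
    return len(q_words & set(c.lower().split())) / len(q_words)

print(multi_path_retrieve("Dify knowledge base", [k1, k2], rerank))
# ['Dify knowledge base guide']
```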
### Metadata Filtering
#### Chatflow/Workflow
The **Knowledge Retrieval** node allows you to filter documents using metadata fields.
#### Steps
1. Select Filter Mode:
* **Disabled (Default):** No metadata filtering.
* **Automatic:** Filters auto-configure from query variables in the **Knowledge Retrieval** node.
> Note: Automatic Mode requires model selection for document retrieval.

* **Manual:** Configure filters manually.

2. For Manual Mode, follow these steps:
1. Click **Conditions** to open the configuration panel.

2. Click **+Add Condition**:
* Select metadata fields within your chosen knowledge base from the dropdown list.
> Note: When multiple knowledge bases are selected, only common metadata fields are shown in the list.
* Use the search box to find specific fields.

3. Click **+Add Condition** to add more fields.

4. Configure filter conditions:
| Field Type | Operator | Description and Examples |
| ---------- | ------------ | ----------------------------------------------------------------------------------------------------------------------------- |
| String | is | Exact match required. Example: `is "Published"` returns only documents marked exactly as "Published". |
| | is not | Excludes exact matches. Example: `is not "Draft"` returns all documents except those marked as "Draft". |
| | is empty | Returns documents where the field has no value. |
| | is not empty | Returns documents where the field has any value. |
| | contains | Matches partial text. Example: `contains "Report"` returns "Monthly Report", "Annual Report", etc. |
| | not contains | Excludes documents containing specified text. Example: `not contains "Draft"` returns documents without "Draft" in the field. |
| | starts with | Matches text at beginning. Example: `starts with "Doc"` returns "Doc1", "Document", etc. |
| | ends with | Matches text at end. Example: `ends with "2024"` returns "Report 2024", "Summary 2024", etc. |
| Number | = | Exact number match. Example: `= 10` returns documents marked with exactly 10. |
| | ≠ | Excludes specific number. Example: `≠ 5` returns all documents except those marked with 5. |
| | > | Greater than. Example: `> 100` returns documents with values above 100. |
| | \< | Less than. Example: `< 50` returns documents with values below 50. |
| | ≥ | Greater than or equal to. Example: `≥ 20` returns documents with values 20 or higher. |
| | ≤ | Less than or equal to. Example: `≤ 200` returns documents with values 200 or lower. |
| | is empty | Field has no value assigned. For example, `is empty` returns all documents where this field has no number assigned. |
| | is not empty | Field has a value assigned. For example, `is not empty` returns all documents where this field has a number assigned. |
| Date | is | Exact date match. Example: `is "2024-01-01"` returns documents dated January 1, 2024. |
| | before | Prior to date. Example: `before "2024-01-01"` returns documents dated before January 1, 2024. |
| | after | After date. Example: `after "2024-01-01"` returns documents dated after January 1, 2024. |
| | is empty | Returns documents with no date value. |
| | is not empty | Returns documents with any date value. |
5. Add filter values:
* **Variable:** Select from existing **Chatflow/Workflow** variables.

* **Constant:** Enter specific values.
> Time-type fields can only be filtered by constants, selected with the date picker.

Filter values are case-sensitive and require exact matches. Example: a filter `starts with "App"` or `contains "App"` will match "Apple" but not "apple" or "APPLE".
6. Set logic operators:
* `AND`: Match all conditions
* `OR`: Match any condition

7. Click outside the panel to save your settings.
#### Chatbot
Access **Metadata Filtering** below **Knowledge** (bottom-left). Configuration steps are the same as in **Chatflow/Workflow**.

### View Linked Applications in the Knowledge Base
On the left side of the knowledge base, you can see all linked Apps. Hover over the circular icon to view the list of all linked apps. Click the jump button on the right to quickly browse them.

### Frequently Asked Questions
1. **How should I choose Rerank settings in multi-recall mode?**
If users know the exact information or terminology, you can use keyword search for precise matching. In that case, set **"Keywords" to 1** under Weight Settings.
If the knowledge base doesn't contain the exact terms or if a cross-lingual query is involved, we recommend setting **"Semantic" to 1** under Weight Settings.
If you are familiar with real user queries and want to adjust the ratio of semantics to keywords, you can manually tweak the ratio under **Weight Settings**.
If the knowledge base is complex, making simple semantic or keyword matches insufficient—and you need highly accurate answers and are willing to pay more—consider using a **Rerank Model** for content retrieval.
2. **What should I do if I encounter issues finding the "Weight Score" or the requirement to configure a Rerank model?**
Here's how the knowledge base retrieval method affects Multi-path Retrieval:

3. **What should I do if I cannot adjust the "Weight Score" when referencing multiple knowledge bases and an error message appears?**
This issue occurs because the embedding models used in the multiple referenced knowledge bases are inconsistent, prompting this notification to avoid conflicts in retrieval content. It is advisable to set and enable the Rerank model in the "Model Provider" or unify the retrieval settings of the knowledge bases.
4. **Why can't I find the "Weight Score" option in multi-recall mode, and only see the Rerank model?**
Please check whether your knowledge base is using the "Economical" index mode. If so, switch it to the "High Quality" index mode.
# Authorize Data Source
Source: https://docs.dify.ai/en/use-dify/knowledge/knowledge-pipeline/authorize-data-source
Dify supports connections to various external data sources. To ensure data security and access control, different data sources require appropriate authorization configurations. Dify provides two main authorization methods: **API Key** and **OAuth**.
## Accessing Data Source Authorization
In Dify, you can access data source authorization through the following two methods:
### I. Knowledge Pipeline Orchestration
When orchestrating a knowledge pipeline, select the data source node that requires authorization. Click **Connect** on the right panel.
### II. Settings
Click your avatar in the upper right corner and select **Settings**. Navigate to **Data Sources** and find the data source you wish to authorize.
## Supported Data Source Authorization
| Data Source | API Key | OAuth |
| ------------ | ------- | ----- |
| Notion | ✅ | ✅ |
| Jina Reader | ✅ | |
| Firecrawl | ✅ | |
| Google Drive | | ✅ |
| Dropbox | | ✅ |
| OneDrive | | ✅ |
## Authorization Processes
### API Key Authorization
API Key authorization is a key-based authentication method suitable for enterprise-level services and developer tools. You need to generate API Keys from the corresponding service providers and configure them in Dify.
#### Process
1. On the **Data Source** page, navigate to the corresponding data source. Click **Configure** and then **Add API Key**.
2. In the pop-up window, fill in the **Authorization Name** and **API Key**. Click **Save** to complete the setup.
The API key will be securely encrypted. Once completed, you can start using the data source (e.g., Jina Reader) for knowledge pipeline orchestration.
### OAuth Authorization
OAuth is an open standard authorization protocol that allows users to authorize third-party applications to access their resources on specific service providers without exposing passwords.
#### Process
1. On the **Data Source** page, select an OAuth-supported data source. Click **Configure** and then **Add OAuth**.
2. Review the permission scope and click **Allow Access**.
#### OAuth Client Settings
Dify provides two OAuth client configuration methods: **Default** and **Custom**.
The default client is primarily supported in the SaaS version, using OAuth client parameters that are pre-configured and maintained by Dify. Users can add OAuth credentials with one click without additional configuration.
Custom client is supported across all versions of Dify. Users need to register OAuth applications on third-party platforms and obtain client parameters themselves. This is mainly suitable for data sources that don't have default configuration in the SaaS version, or when enterprises have special security compliance requirements.
**Process for Custom OAuth**
1. On the **Data Source** page, select an OAuth-supported data source. Click **Configure** and then the **Setting icon** on the right side of **Add OAuth**.
2. Choose **Custom**, enter the **Client ID** and **Client Secret**. Click **Save and Authorize** to complete the authorization.
# Step 1: Create Knowledge Pipeline
Source: https://docs.dify.ai/en/use-dify/knowledge/knowledge-pipeline/create-knowledge-pipeline
Navigate to **Knowledge** at the top, then click **Create from Knowledge Pipeline** on the left. There are three ways to get started.
### Build from Scratch
Click Blank Knowledge Pipeline to build a custom pipeline from scratch. Choose this option when you need custom processing strategies based on specific data source and business requirements.
### Templates
Dify offers two types of templates: **Built-in Pipeline** and **Customized**. Both template card types display the knowledge base name, description, and tags (including chunk structure).
#### Built-in Pipeline
Built-in pipelines are official knowledge base templates pre-configured by Dify. These templates are optimized for common document structures and use cases. Simply click **Choose** to get started.
**Types**
| Name | Chunk Structure | Index Method | Retrieval Setting | Description |
| ------------------- | ----------------- | ------------ | ------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| General Mode-ECO | General | Economical | Inverted Index | Divide document content into smaller paragraphs, directly used for matching user queries and retrieval. |
| Parent-child-HQ | Parent-Child | High Quality | Hybrid Search | Adopts an advanced chunking strategy, dividing document text into larger parent chunks and smaller child chunks. Child chunks ensure retrieval precision, while their parent chunks preserve contextual integrity. |
| Simple Q\&A | Question & Answer | High Quality | Vector Search | Convert tabular data into question-answer format, using question matching to quickly hit corresponding answer information. |
| LLM Generated Q\&A | Question & Answer | High Quality | Vector Search | Generate structured question-answer pairs with large language models based on original text paragraphs. Find relevant answer by using question matching mechanism. |
| Convert to Markdown | Parent-child | High Quality | Hybrid Search - Weighted Score | Designed for Office native file formats such as DOCX, XLSX, and PPTX, converting them to Markdown format for better information processing. ⚠️ Note: PDF files are not recommended. |
To preview the selected built-in pipeline, click **Details** on any template card. Then, check information in the popup window, including: orchestration structure, pipeline description, and chunk structure. Click **Use this Knowledge Pipeline** for orchestration.
#### Customized
Customized templates are knowledge pipelines created and published by users. You can choose a template to start, export its DSL, or view detailed information for any template.
To create a knowledge base from a template, click **Choose** on the template card. You can also create a knowledge base by clicking **Use this Knowledge Pipeline** when previewing a template. Click **More** to edit pipeline information, export the pipeline, or delete the template.
### Import Pipeline
Import a previously exported knowledge pipeline to quickly reuse existing configurations and adapt them for different scenarios or requirements. Navigate to the bottom left of the page and click **Import from a DSL File**. Dify DSL is a YAML-based standard that defines AI application configurations, including model parameters, prompt design, and workflow orchestration. Similar to the workflow DSL, knowledge pipelines use the same YAML format standard to define processing workflows and configurations within a knowledge base.
What's in a knowledge pipeline DSL:
| Name | Description |
| ----------------------- | ------------------------------------------------------------------ |
| Data Sources | Local files, websites, online documents, online drive, web crawler |
| Data Processing | Document extraction, content chunking, cleaning strategies |
| Knowledge Configuration | Indexing methods, retrieval settings, storage parameters |
| Node Orchestration | Arrangement and sequence |
| User Input Form | Custom parameter fields (if configured) |
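As a rough illustration, a knowledge pipeline DSL is a YAML document along these lines. The field names below are illustrative assumptions, not the actual schema; export a real pipeline to see the authoritative structure.

```yaml
# Illustrative sketch only -- not the actual Dify DSL schema.
kind: knowledge-pipeline
version: 0.1.0
pipeline:
  nodes:
    - type: datasource      # e.g., local file upload
    - type: extractor       # document parsing
    - type: chunker         # content segmentation
    - type: knowledge-base  # chunk structure, index method, retrieval settings
  user_input_form: []       # custom parameter fields, if configured
```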
# Step 2: Orchestrate Knowledge Pipeline
Source: https://docs.dify.ai/en/use-dify/knowledge/knowledge-pipeline/knowledge-pipeline-orchestration
Imagine setting up a factory production line where each station (node) performs a specific task, and you connect them to assemble widgets into a final product. This is knowledge pipeline orchestration—a visual workflow builder that allows you to configure data processing sequences through a drag-and-drop interface. It provides control over document ingestion, processing, chunking, indexing, and retrieval strategies.
In this section, you'll learn about the knowledge pipeline process, understand different nodes, how to configure them, and customize your own data processing workflows to efficiently manage and optimize your knowledge base.
### Interface Status
When entering the knowledge pipeline orchestration canvas, you’ll see:
* **Tab Status**: The Documents, Retrieval Test, and Settings tabs are grayed out and unavailable at this stage
* **Essential Steps**: You must complete knowledge pipeline orchestration and publishing before uploading files
Your starting point depends on the template choice you made previously. If you chose **Blank Knowledge Pipeline**, you'll see a canvas containing only the Knowledge Base node. A guide note next to the node walks you through the general steps of pipeline creation.
If you selected a specific pipeline template, there'll be a ready-to-use workflow that you can use or modify on the canvas right away.
## The Complete Knowledge Pipeline Process
Before we get started, let's break down the knowledge pipeline process to understand how your documents are transformed into a searchable knowledge base.
The knowledge pipeline includes these key steps:
Data Source → Data Processing (Extractor + Chunker) → Knowledge Base Node (Chunk Structure + Retrieval Setting) → User Input Field → Test & Publish
1. **Data Source**: Content from various data sources (local files, Notion, web pages, etc.)
2. **Data Processing**: Process and transform data content
* Extractor: Parse and structure document content
* Chunker: Split structured content into manageable segments
3. **Knowledge Base**: Set up chunk structure and retrieval settings
4. **User Input Field**: Define parameters that pipeline users need to input for data processing
5. **Test & Publish**: Validate and officially activate the knowledge base
***
## Step 1: Data Source
In a knowledge base, you can choose single or multiple data sources. Currently, Dify supports 4 types of data sources: **file upload, online drive, online documents, and web crawler**.
Visit the [Dify Marketplace](https://marketplace.dify.ai) for more data sources.
### File Upload
Upload local files through drag-and-drop or file selection.
**Configuration Options**
| Item | Description |
| ------------- | ------------------------------------------------------------------------------------------------- |
| File Format | Supports PDF, XLSX, DOCX, and more. Users can customize their selection |
| Upload Method | Upload local files or folders through drag-and-drop or file selection. Batch upload is supported. |
**Limitations**
| Item | Description |
| ------------- | ------------------------------------------------------------------------------------------------- |
| File Quantity | Maximum 50 files per upload |
| File Size | Each file must not exceed 15MB |
| Storage | Limits on total document uploads and storage space may vary for different SaaS subscription plans |
**Output Variables**
| Output Variable | Format |
| --------------- | --------------- |
| `{x} Document` | Single document |
***
### Online Document
#### Notion
Integrate with your Notion workspace to seamlessly import pages and databases, always keeping your knowledge base automatically updated.
**Configuration Options**
| Item | Option | Output Variable | Description |
| --------- | -------- | --------------- | ------------------------------------ |
| Extractor | Enabled | `{x} Content` | Structured and processed information |
| | Disabled | `{x} Document` | Original text |
***
### Web Crawler
Transform web content into formats that can be easily read by large language models. The knowledge base supports Jina Reader and Firecrawl.
#### Jina Reader
An open-source web parsing tool providing simple and easy-to-use API services, suitable for fast crawling and processing web content.
**Parameter Configuration**
| Parameter | Type | Description |
| ---------------- | -------- | ------------------------------------ |
| URL | Required | Target webpage address |
| Crawl sub-page | Optional | Whether to crawl linked pages |
| Use sitemap | Optional | Crawl by using website sitemap |
| Limit | Required | Set maximum number of pages to crawl |
| Enable Extractor | Optional | Choose data extraction method |
#### Firecrawl
An open-source web parsing tool that provides more refined crawling control options and API services. It supports deep crawling of complex website structures, recommended for batch processing and precise control.
**Parameter Configuration**
| Parameter | Type | Description |
| ------------------------- | -------- | -------------------------------------------------------------------------- |
| URL | Required | Target webpage address |
| Limit | Required | Set maximum number of pages to crawl |
| Crawl sub-page | Optional | Whether to crawl linked pages |
| Max depth | Optional | How many levels deep the crawler will traverse from the starting URL |
| Exclude paths | Optional | Specify URL patterns that should not be crawled |
| Include only paths | Optional | Crawl specified paths only |
| Extractor | Optional | Choose data processing method |
| Extract Only Main Content | Optional | Isolate and retrieve the primary, meaningful text and media from a webpage |
***
### Online Drive
Connect your online cloud storage services (e.g., Google Drive, Dropbox, OneDrive) and let Dify automatically retrieve your files. Simply select and import the documents you need for processing, without manually downloading and re-uploading files.
Need help with authorization? Please check [Authorize Data Source](/en/use-dify/knowledge/knowledge-pipeline/authorize-data-source) for detailed guidance on authorizing different data sources.
***
## Step 2: Set Up Data Processing Tools
In this stage, data processing tools extract, chunk, and transform your content for optimal knowledge base storage and retrieval. Think of this step like meal preparation: we clean the raw ingredients, chop them into bite-sized pieces, and organize everything so the dish can be cooked up quickly when someone orders it.
To develop a custom data processing plugin that extracts multimodal data for multimodal embedding and retrieval, see [Build Tool Plugins for Multimodal Data Processing in Knowledge Pipelines](/en/develop-plugin/dev-guides-and-walkthroughs/develop-multimodal-data-processing-tool).
### Doc Processor
Documents come in different formats - PDF, XLSX, DOCX. However, LLMs can't read these files directly. That's where extractors come in. They support multiple formats and handle the conversion, so your content is ready for the LLM in later steps.
You can choose Dify's Doc Extractor to process files, or select tools from the Marketplace based on your needs, which offers the Dify Extractor and third-party tools such as Unstructured.
Images in documents can be extracted using appropriate document processors. Extracted images are attached to their corresponding chunks, can be managed independently, and are returned alongside those chunks during retrieval.
URLs of extracted images remain in the chunk text, but you can safely remove these URLs to keep the text clean—this won't affect the extracted images.
Each chunk supports up to 10 image attachments; images beyond this limit will not be extracted.
If no images are extracted by the selected processor, Dify will automatically extract JPG, JPEG, PNG, and GIF images under 2 MB that are referenced via accessible URLs using the following Markdown syntax:
* `![alt text](image-url)`
* `![](image-url)`
For self-hosted deployments, you can adjust these limits via environment variables:
* Maximum image size: `ATTACHMENT_IMAGE_FILE_SIZE_LIMIT`
* Maximum number of attachments per chunk: `SINGLE_CHUNK_ATTACHMENT_LIMIT`
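For a Docker Compose deployment, for instance, these might go in your `.env` file. The values below are illustrative, and the size unit should be confirmed for your Dify version:

```shell
# Illustrative .env entries for a self-hosted deployment.
# Confirm the size unit and defaults for your Dify version.
ATTACHMENT_IMAGE_FILE_SIZE_LIMIT=5    # max size per extracted image (default: 2 MB)
SINGLE_CHUNK_ATTACHMENT_LIMIT=20      # max image attachments per chunk (default: 10)
```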
If you select a multimodal embedding model (marked with a **Vision** icon) in index settings, the extracted images will also be embedded and indexed for retrieval.
#### Doc Extractor
As an information processing center, the document extractor node identifies and reads files from input variables, extracts their information, and converts it into a format that works with the next node.
For more information, please refer to the [Document Extractor](/en/use-dify/nodes/doc-extractor).
#### Dify Extractor
Dify Extractor is a built-in document parser provided by Dify. It supports multiple common file formats and is specially optimized for DOC files. It can extract and store images from documents and return image URLs.
#### Unstructured
[Unstructured](https://marketplace.dify.ai/plugin/langgenius/unstructured) transforms documents into structured, machine-readable formats with highly customizable processing strategies. It offers multiple extraction strategies (auto, hi\_res, fast, OCR-only) and chunking methods (by\_title, by\_page, by\_similarity) to handle diverse document types, offering detailed element-level metadata including coordinates, confidence scores, and layout information. It's recommended for enterprise document workflows, processing of mixed file types, and cases that require precise control over document processing parameters.
Explore more tools in the [Dify Marketplace](https://marketplace.dify.ai).
***
### Chunker
Similar to humans' limited attention span, large language models cannot process huge amounts of information simultaneously. Therefore, after information extraction, the chunker splits large document content into smaller, manageable segments (called "chunks").
Different documents require different chunking strategies. A product manual works best when split by product features, while research papers should be divided by logical sections. Dify offers 3 types of chunkers for various document types and use cases.
#### Overview of Different Chunkers
| Chunker Type | Highlights | Best for |
| -------------------- | ----------------------------------------------------- | ----------------------------------------------------- |
| General Chunker | Fixed-size chunks with customizable delimiters | Simple documents with basic structure |
| Parent-child Chunker | Dual-layer structure: precise matching + rich context | Complex documents requiring rich context preservation |
| Q\&A Processor | Processes question-answer pairs from spreadsheets | Structured Q\&A data from CSV/Excel files |
#### Common Text Pre-processing Rules
All chunkers support these text cleaning options:
| Preprocessing Option | Description |
| --------------------------------------------- | ---------------------------------------------------------------------------------- |
| Replace consecutive spaces, newlines and tabs | Clean up formatting by replacing multiple whitespace characters with single spaces |
| Remove all URLs and email addresses | Automatically detect and remove web links and email addresses from text |
#### General Chunker
Basic document chunking processing, suitable for documents with relatively simple structures. You can configure text chunking and text preprocessing rules according to the following configuration.
**Input and Output Variable**
| Type | Variable | Description |
| --------------- | ------------------ | --------------------------------------------------------------------------- |
| Input Variable | `{x} Content` | Complete document content that the chunker will split into smaller segments |
| Output Variable | `{x} Array[Chunk]` | Array of chunked content, each segment optimized for retrieval and analysis |
**Chunk Settings**
| Configuration Item | Description |
| -------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Delimiter | The default value is `\n` (line breaks for paragraph segmentation). You can customize chunking rules using regular expressions; the system automatically segments the text wherever the delimiter appears. |
| Maximum Chunk Length | Specifies the maximum character limit within a segment. When this length is exceeded, forced segmentation will occur. |
| Chunk Overlap | When segmenting data, there is some overlap between segments. This overlap helps improve information retention and analysis accuracy, enhancing recall effectiveness. |
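To make these settings concrete, here is a minimal, illustrative sketch of delimiter-based chunking with a maximum length and overlap. This is a conceptual example, not Dify's internal implementation:

```python
import re

def chunk_text(text, delimiter=r"\n", max_length=500, overlap=50):
    """Delimiter-based chunking with forced segmentation and overlap.

    A conceptual sketch only, not Dify's internal implementation.
    """
    # Split wherever the delimiter (a regular expression) appears.
    pieces = [p for p in re.split(delimiter, text) if p.strip()]

    chunks = []
    for piece in pieces:
        # Force-split any piece that exceeds the maximum chunk length,
        # stepping by (max_length - overlap) so consecutive chunks
        # share `overlap` characters of context.
        start = 0
        while start < len(piece):
            chunks.append(piece[start:start + max_length])
            start += max_length - overlap
    return chunks

doc = "Paragraph one.\nParagraph two is quite a bit longer than the limit."
print(chunk_text(doc, max_length=30, overlap=5))
```

The overlap keeps a small window of shared text between consecutive chunks, so information that straddles a forced split point is still retrievable from at least one chunk.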
#### Parent-child Chunker
By using a dual-layer segmentation structure to resolve the tension between context and accuracy, the parent-child chunker balances precise matching with comprehensive contextual information in Retrieval-Augmented Generation (RAG) systems.
**How Parent-child Chunker Works**
Child Chunks for query matching: Small, precise information segments (usually single sentences) to match user queries with high accuracy.
Parent Chunks provide rich context: Larger content blocks (paragraphs, sections, or entire documents) that contain the matching child chunks, giving the large language model (LLM) comprehensive background information.
| Type | Variable | Description |
| --------------- | ------------------------ | --------------------------------------------------------------------------- |
| Input Variable | `{x} Content` | Complete document content that the chunker will split into smaller segments |
| Output Variable | `{x} Array[ParentChunk]` | Array of parent chunks |
**Chunk Settings**
| Configuration Item | Description |
| --------------------------- | ----------------------------------------------------------------------------------------------------------------------------------- |
| Parent Delimiter | Set delimiter for parent chunk splitting |
| Parent Maximum Chunk Length | Set maximum character count for parent chunks |
| Child Delimiter | Set delimiter for child chunk splitting |
| Child Maximum Chunk Length | Set maximum character count for child chunks |
| Parent Mode | Choose between **Paragraph** (split text into paragraphs) or **Full Document** (use the entire document as the parent chunk) for direct retrieval |
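The dual-layer idea can be sketched as follows, assuming simple string delimiters (paragraphs as parents, sentences as children). This is illustrative only, not Dify's actual implementation:

```python
def parent_child_chunks(text, parent_delimiter="\n\n", child_delimiter="."):
    """Dual-layer chunking sketch: small child chunks for precise query
    matching, larger parent chunks for context. Illustrative only."""
    parents = []
    for block in text.split(parent_delimiter):
        block = block.strip()
        if not block:
            continue
        # Child chunks are small units (here: sentences) within the parent.
        children = [s.strip() for s in block.split(child_delimiter) if s.strip()]
        parents.append({"parent": block, "children": children})
    return parents

doc = "Dify is an LLM platform. It supports pipelines.\n\nChunks are indexed."
for p in parent_child_chunks(doc):
    print(p["parent"], "->", p["children"])
```

At retrieval time, a matched child chunk returns its parent chunk to the LLM; the sketch represents that relationship by keeping each child grouped under its parent.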
#### Q\&A Processor
Combining extraction and chunking in one node, Q\&A Processor is specifically designed for structured Q\&A datasets from CSV and Excel files. Perfect for FAQ lists, shift schedules, and any spreadsheet data with clear question-answer pairs.
**Input and Output Variable**
| Type | Variable | Description |
| --------------- | -------------------- | ------------- |
| Input Variable | `{x} Document` | A single file |
| Output Variable | `{x} Array[QAChunk]` | Array of Q\&A chunks |
**Variable Configuration**
| Configuration Item | Description |
| -------------------------- | ------------------------------ |
| Column Number for Question | Specify which column contains the questions |
| Column Number for Answer | Specify which column contains the answers |
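Conceptually, the processor maps the configured columns to question-answer pairs. A minimal sketch using Python's `csv` module (the zero-based column numbering and the output shape here are illustrative assumptions):

```python
import csv
import io

def qa_chunks(csv_text, question_col=0, answer_col=1):
    """Map spreadsheet columns to question-answer chunks.

    Illustrative sketch of what a Q&A processor produces; column
    numbers are zero-based here for simplicity.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    return [
        {"question": row[question_col], "answer": row[answer_col]}
        for row in reader
        if len(row) > max(question_col, answer_col)
    ]

faq = "Question,Answer\nHow do I reset my password?,Use the account settings page.\n"
print(qa_chunks(faq))
```

Because only the question portion is indexed, a user query is matched against the questions and the paired answer is returned.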
***
## Step 3: Configure Knowledge Base Node
Now that your documents are processed and chunked, it's time to set up how they'll be stored and retrieved. Here, you can select different indexing methods and retrieval strategies based on your specific needs.
Knowledge base node configuration includes: Input Variable, Chunk Structure, Index Method, and Retrieval Settings.
### Chunk Structure
Chunk structure determines how the knowledge base organizes and indexes your document content. Choose the structure mode that best fits your document type, use case, and cost.
The knowledge base supports three chunk modes: **General Mode, Parent-child Mode, and Q\&A Mode**. If you're creating a knowledge base for the first time, we recommend choosing Parent-child Mode.
**Important Reminder**: Chunk structure cannot be modified once saved and published. Please choose carefully.
#### General Mode
Suitable for most standard document processing scenarios. It provides flexible indexing options—you can choose appropriate indexing methods based on different quality and cost requirements.
General mode supports both high-quality and economical indexing methods, as well as various retrieval settings.
#### Parent-child Mode
It provides precise matching and corresponding contextual information during retrieval, suitable for professional documents that need to maintain complete context.
Parent-child mode supports HQ (High Quality) mode only, offering child chunks for query matching and parent chunks for contextual information during retrieval.
#### Q\&A Mode
Create documents that pair questions with answers when using structured question-answer data. These documents are indexed based on the question portion, enabling the system to retrieve relevant answers based on query similarity.
Q\&A Mode supports HQ (High Quality) mode only.
### Input Variable
Input variables receive processing results from data processing nodes as the data source for the knowledge base. You need to connect the chunker's output to the knowledge base as input.
The node supports different types of standard inputs based on the selected chunk structure:
* **General Mode**: `{x} Array[Chunk]` - General chunk array
* **Parent-child Mode**: `{x} Array[ParentChunk]` - Parent chunk array
* **Q\&A Mode**: `{x} Array[QAChunk]` - Q\&A chunk array
### Index Method & Retrieval Settings
The index method determines how your knowledge base builds content indexes, while retrieval settings provide corresponding retrieval strategies based on the selected index method.
Think of it in this way: the index method determines how to organize your documents, while retrieval settings tell users what methods they can use to find documents.
The knowledge base provides two index methods: **High Quality** and **Economical**, each offering different retrieval setting options.
The High Quality method uses embedding models to convert chunks into numerical vectors, helping to compress and store large amounts of information more effectively. This enables the system to find semantically relevant accurate answers even when the user's question wording doesn't exactly match the document.
To enable cross-modal retrieval—retrieving both text and images based on semantic relevance—select a multimodal embedding model (marked with a **Vision** icon). Images extracted from documents will then be embedded and indexed for retrieval.
Knowledge bases using such embedding models are labeled **Multimodal** on their cards.
In the Economical method, each chunk uses 10 keywords for retrieval without calling embedding models, so no costs are incurred.
For more details, see [Specify the Index Method and Retrieval Settings](/en/use-dify/knowledge/create-knowledge/setting-indexing-methods).
| Index Method | Available Retrieval Settings | Description |
| ------------ | ---------------------------- | ----------------------------------------------------------------------- |
| High Quality | Vector Retrieval | Understand deeper meaning of queries based on semantic similarity |
| | Full-text Retrieval | Keyword-based retrieval providing comprehensive search capabilities |
| | Hybrid Retrieval | Combine both semantic and keywords |
| Economical | Inverted Index | Common search engine retrieval method, matches queries with key content |
If the selected embedding model is multimodal, select a multimodal rerank model (marked with a **Vision** icon) as well. Otherwise, retrieved images will be excluded from reranking and the retrieval results.
You can also refer to the table below for information on configuring chunk structure, index methods, parameters, and retrieval settings.
| Chunk Structure | Index Methods | Parameters | Retrieval Settings |
| ----------------- | ---------------------------- | --------------------------------------- | ----------------------------------------------------------------------------------- |
| General mode | High Quality<br />Economical | Embedding Model<br />Number of Keywords | Vector Retrieval<br />Full-text Retrieval<br />Hybrid Retrieval<br />Inverted Index |
| Parent-child Mode | High Quality Only | Embedding Model | Vector Retrieval<br />Full-text Retrieval<br />Hybrid Retrieval |
| Q\&A Mode | High Quality Only | Embedding Model | Vector Retrieval<br />Full-text Retrieval<br />Hybrid Retrieval |
### Summary Auto-Gen
Available for self-hosted deployments only.
Automatically generate summaries for all chunks to enhance their retrievability.
Summaries are embedded and indexed for retrieval as well. When a summary matches a query, its corresponding chunk is also returned.
You can manually edit auto-generated summaries or regenerate them for specific documents later. See [Manage Knowledge Content](/en/use-dify/knowledge/manage-knowledge/maintain-knowledge-documents) for details.
If you select a vision-capable LLM, summaries will be generated based on both the chunk text and any attached images.
***
## Step 4: Create User Input Form
User input forms are essential for collecting the initial information your pipeline needs to run effectively. Similar to [the User Input node](/en/use-dify/nodes/user-input) in workflow, this form gathers necessary details from users - such as files to upload, specific parameters for document processing - ensuring your pipeline has all the information it needs to deliver accurate results.
This way, you can create specialized input forms for different use scenarios, improving pipeline flexibility and usability for various data sources or document processing steps.
### Create User Input Form
There are two ways to create user input fields:
1. **Pipeline Orchestration Interface**\
   Click **Input field** to start creating and configuring input forms.
2. **Node Parameter Panel**\
   Select a node. Then, in the parameter input area on the right-side panel, click **+ Create user input** to add new input items. New input items will also be collected in the Input Field. ![Node Parameter Panel](/images/use-dify/knowledge/knowledge-pipeline-orchestration-10.png)
### Add User Input Fields
#### Unique Inputs for Each Entrance
These inputs are specific to each data source and its downstream nodes. Users only need to fill out these fields when selecting the corresponding data source, such as different URLs for different data sources.
**How to create**: Click the `+` button on the right side of a data source to add fields for that specific data source. These fields can only be referenced by that data source and its subsequently connected nodes.
#### Global Inputs for All Entrances
Global shared inputs can be referenced by all nodes. These inputs are suitable for universal processing parameters, such as delimiters, maximum chunk length, document processing configurations, etc. Users need to fill out these fields regardless of which data source they choose.
**How to create**: Click the `+` button on the right side of Global Inputs to add fields that can be referenced by any node.
### Supported Input Field Types
The knowledge pipeline supports seven types of input variables:
| Field Type | Description |
| ---------- | --------------------------------------------------------------------------------------------------- |
| Text | Short text input by knowledge base users, maximum length 256 characters |
| Paragraph | Long text input for longer character strings |
| Select | Fixed options preset by the orchestrator for users to choose from, users cannot add custom content |
| Boolean | Only true/false values |
| Number | Only accepts numerical input |
| Single File | Upload a single file, supports multiple file types (documents, images, audio, and other file types) |
| File List | Batch file upload, supports multiple file types (documents, images, audio, and other file types) |
For more information about supported field types, see [User Input](/en/use-dify/nodes/user-input).
### Field Configuration Options
All input field types share required settings, type-specific settings, and additional settings. You can mark a field as required by checking the appropriate option.
| Setting | Name | Description | Example |
| ------------------------- | ------------- | ----------------------------------------------------------------------- | -------------------------------------------------------- |
| Required Settings | Variable Name | Internal system identifier, usually named using English and underscores | `user_email` |
| | Display Name | Interface display name, usually concise and readable text | User Email |
| Type-specific Settings | | Special requirements for different field types | Text field max length 100 characters |
| Additional Settings | Default Value | Default value when user hasn't provided input | Number field defaults to 0, text field defaults to empty |
| | Placeholder | Hint text displayed when input box is empty | "Please enter your email" |
| | Tooltip | Explanatory text to guide user input, usually displayed on mouse hover | "Please enter a valid email address" |
| Special Optional Settings | | Additional setting options based on different field types | Validation of email format |
After completing configuration, click the preview button in the upper right corner to browse the form preview interface. You can drag fields to adjust their groupings. If an exclamation mark appears, a variable reference became invalid after the move.
***
## Step 5: Name the Knowledge Base
By default, the knowledge base name will be "Untitled + number", permissions are set to "Only me", and the icon will be an orange book. If you import it from a DSL file, it will use the saved icon.
Edit knowledge base information by clicking **Settings** in the left panel and fill in the information below:
* **Name & Icon**\
Pick a name for your knowledge base.\
Choose an emoji, upload an image, or paste an image URL as the icon of this knowledge base.
* **Knowledge Description**\
Provide a brief description of your knowledge base. This helps the AI better understand and retrieve your data. If left empty, Dify will apply the default retrieval strategy.
* **Permissions**\
Select the appropriate access permissions from the dropdown menu.
***
## Step 6: Testing
You're almost there! This is the final step of the knowledge pipeline orchestration.
After completing the orchestration, validate the configuration first. Then, perform test runs to confirm all the settings. Finally, publish the knowledge pipeline.
### Configuration Completeness Check
Before testing, it's recommended to check the completeness of your configuration to avoid test failures due to missing configurations.
Click the checklist button in the upper right corner, and the system will display any missing parts.
After completing all configurations, you can preview the knowledge base pipeline's operation through test runs, confirm that all settings are accurate, and then proceed with publishing.
### Test Run
1. **Start Test**: Click the "Test Run" button in the upper right corner
2. **Import Test File**: Import files in the data source window that pops up on the right
   **Important Note**: For better debugging and observation, only one file upload is allowed per test run.
3. **Fill Parameters**: After successful import, fill in corresponding parameters according to the user input form you configured earlier
4. **Start Test Run**: Click next step to start testing the entire pipeline
During testing, you can access [History Logs](/en/use-dify/monitor/logs) (track all run records with timestamps, execution status, and input/output summaries) and [Variable Inspector](/en/use-dify/debug/variable-inspect) (a dashboard at the bottom showing input/output data for each node to help identify issues and verify data flow) for efficient troubleshooting and error fixing.
# Step 5: Manage and Use Knowledge Base
Source: https://docs.dify.ai/en/use-dify/knowledge/knowledge-pipeline/manage-knowledge-base
After creating your knowledge base, continuous management and optimization will provide accurate contextual information for your applications. The following options are available for ongoing maintenance.
### Knowledge Pipeline
View and modify your orchestrated pipeline nodes and configurations.
Find more information in [Manage Knowledge](/en/use-dify/knowledge/manage-knowledge/maintain-knowledge-documents).
# Step 3: Publish Knowledge Pipeline
Source: https://docs.dify.ai/en/use-dify/knowledge/knowledge-pipeline/publish-knowledge-pipeline
After completing pipeline orchestration and debugging, click **Publish** and **Confirm** in the pop-up window.
Important reminder: Once published, the chunk structure cannot be modified.
Once it is published, you can:
**Add Documents (Go to add documents)**\
Click this option to jump to the knowledge base data source selection interface, where you can directly upload documents.
**Access API (Access API Reference)**\
Go to the API documentation page where you can get the knowledge base API calling methods and instructions.
**Publish as a Knowledge Pipeline**\
Save the pipeline as a reusable template. It will appear in the **Customized** section for future use.
Limitation: **Publish as a Knowledge Pipeline** is not available on the Sandbox plan. To save and publish a knowledge pipeline, please [upgrade](https://dify.ai/pricing) to the Professional or Team plan.
# Create Knowledge from a Knowledge Pipeline
Source: https://docs.dify.ai/en/use-dify/knowledge/knowledge-pipeline/readme
A knowledge pipeline is a document processing workflow that transforms raw data into searchable knowledge bases. Much like orchestrating a workflow, you can visually combine and configure different processing nodes and tools to optimize data processing for better accuracy and relevance.
Every knowledge pipeline normally follows a structured flow through four key steps:
**Data Sources → Data Extraction → Data Processing → Knowledge Storage**
Each step serves a specific purpose: gathering content from various sources, converting it to processable text, refining it for search, and storing it in a format that enables fast, accurate retrieval.
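The four-step flow can be pictured as a chain of functions. The sketch below is purely illustrative; the function names and chunking logic are assumptions for demonstration, not Dify's implementation:

```python
def extract(source: dict) -> str:
    """Data Extraction: convert a raw source into processable text."""
    return source["content"]

def process(text: str, chunk_size: int = 50) -> list[str]:
    """Data Processing: refine text into retrieval-ready chunks."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def store(chunks: list[str], knowledge_base: list[str]) -> None:
    """Knowledge Storage: persist chunks for fast, accurate retrieval."""
    knowledge_base.extend(chunks)

# Data Sources -> Data Extraction -> Data Processing -> Knowledge Storage
kb: list[str] = []
doc = {"type": "file", "content": "Dify pipelines turn raw data into searchable knowledge."}
store(process(extract(doc)), kb)
print(len(kb))  # number of stored chunks
```

In the real pipeline, each stage is a configurable node (or plugin) rather than a fixed function, which is what makes the flow worth orchestrating visually.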
Dify provides built-in pipeline templates optimized for common use cases, or you can create knowledge pipelines from scratch. In this section, we will go through the creation options, the general process of building knowledge pipelines, and how to manage them.
1. **Create**: Start from built-in templates, a blank knowledge pipeline, or import an existing pipeline.
2. **Orchestrate**: Get to know how the knowledge pipeline works, orchestrate different nodes, and make sure it's ready to use.
3. **Publish**: Make the pipeline ready for document processing.
4. **Upload Files**: Add documents and process them into the searchable knowledge base.
5. **Manage**: Maintain documents, test retrieval, modify settings, and more.
# Step 4: Upload Files
Source: https://docs.dify.ai/en/use-dify/knowledge/knowledge-pipeline/upload-files
After publishing the knowledge pipeline, there are two ways to upload files:
A: Click **Go to Documents** in the success notification to add or manage documents. On the Documents page, click **Add File** to upload.
B: Click **Go to Add Documents** to add documents.
### Upload Process
1. **Select Data Source**\
Choose from the data source types configured in your pipeline. Dify currently supports 4 types of data sources: File Upload (pdf, docx, etc.), Online Drive (Google Drive, OneDrive, etc.), Online Doc (Notion), and Web Crawler (Jina Reader, Firecrawl).
Please visit [Dify Marketplace](https://marketplace.dify.ai/) to install additional data sources.
2. **Fill in Processing Parameters and Preview**\
If you configured user input fields during pipeline orchestration, users will need to fill in the required parameters and variables at this step. After completing the form, click **Preview** to see chunking results. Click **Save & Process** to complete knowledge base creation and start data processing.
Important reminder: Chunk structure remains consistent with the pipeline configuration and won't change with user input parameters.
3. **Process Documents**\
Track the progress of document processing. After embedding is completed, click **Go to Document**.
4. **Access Documents List**\
Click **Go to Documents** to view the Documents page, where you can browse all uploaded files, processing status, and more.
# Knowledge Request Rate Limit
Source: https://docs.dify.ai/en/use-dify/knowledge/knowledge-request-rate-limit
## What is Knowledge Request Rate Limit?
On Dify Cloud, the knowledge request rate limit refers to the maximum number of actions that a workspace can perform in the knowledge base within one minute. These actions include creating datasets, managing documents, and running queries in apps or workflows.
## Limitations of Different Subscription Versions
The knowledge request rate limit varies by subscription plan:
* **Sandbox**: 10/min
* **Professional**: 100/min
* **Team**: 1,000/min
For example, if a Sandbox user performs 10 hit tests within one minute, their workspace will be temporarily unable to perform the restricted actions during the following minute.
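When calling the knowledge base programmatically, a client-side sliding-window throttle can keep you under your plan's limit. This is a generic sketch, not a Dify feature; set `limit` to match your plan:

```python
import time
from collections import deque

class SlidingWindowThrottle:
    """Allow at most `limit` actions per `window` seconds (client side)."""

    def __init__(self, limit: int = 10, window: float = 60.0):  # Sandbox: 10/min
        self.limit = limit
        self.window = window
        self.timestamps: deque = deque()

    def allow(self, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have left the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False

throttle = SlidingWindowThrottle(limit=10, window=60.0)
results = [throttle.allow(now=float(i)) for i in range(12)]  # 12 actions in 12 seconds
print(results.count(True))  # only the first 10 are allowed
```

When `allow()` returns `False`, back off and retry after the oldest timestamp expires instead of hammering the API.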
## Which Actions Are Limited by the Knowledge Request Rate Limit?
The following actions count toward the knowledge request rate limit:
1. Creating empty datasets
2. Deleting datasets
3. Updating dataset settings
4. Uploading documents
5. Deleting documents
6. Updating documents
7. Disabling documents
8. Enabling documents
9. Archiving documents
10. Restoring archived documents
11. Pausing document processing
12. Resuming document processing
13. Adding segments
14. Deleting segments
15. Updating segments
16. Bulk importing segments
17. Performing hit tests
18. Querying the knowledge in apps or workflows (*Note: Multi-path recalls count as a single request.*)
# Manage Knowledge Settings
Source: https://docs.dify.ai/en/use-dify/knowledge/manage-knowledge/introduction
Only the workspace owner, administrators, and editors can modify the knowledge base settings.
In a knowledge base, click the **Settings** icon in the left sidebar to enter its settings page.
| Settings | Description |
| :----------------- | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Name & Icon | Identifies the knowledge base. |
| Description | Indicates the knowledge base's purpose and content. |
| Permissions | Defines which workspace members can access the knowledge base. Members granted access to a knowledge base have all the permissions listed in [Manage Knowledge Content](/en/use-dify/knowledge/manage-knowledge/maintain-knowledge-documents). |
| Index Method | Defines how document chunks are processed and organized for retrieval. For more details, see [Select the Index Method](/en/use-dify/knowledge/create-knowledge/setting-indexing-methods#select-the-index-method). |
| Embedding Model | Specifies the embedding model used to convert document chunks into vector representations. Changing the embedding model will re-embed all chunks. |
| Summary Auto-Gen | Automatically generates summaries for document chunks. Once enabled, this only applies to newly added documents and chunks. For existing chunks, select the document(s) in the document list and click **Generate summary**. |
| Retrieval Settings | Defines how the knowledge base retrieves relevant content. For more details, see [Configure the Retrieval Settings](/en/use-dify/knowledge/create-knowledge/setting-indexing-methods#configure-the-retrieval-settings). |
# Manage Knowledge via API
Source: https://docs.dify.ai/en/use-dify/knowledge/manage-knowledge/maintain-dataset-via-api
Manage your knowledge bases programmatically using the Dify Knowledge Base API
Dify provides a complete set of Knowledge Base APIs that let you manage knowledge bases, documents, and chunks programmatically. This is useful for automating data synchronization or integrating knowledge base operations into CI/CD pipelines.
API access is enabled by default when you create a knowledge base. To start making API calls, all you need is your API credentials: an endpoint and a key.
A single Knowledge Base API key has access to **all visible knowledge bases** under the same account. Handle your credentials carefully to avoid unintended data exposure.
## Get Your API Endpoint and Key
Navigate to **Knowledge** in Dify. In the top-right corner, click **Service API** to open the API configuration panel. From here you can:
* Get the Service API endpoint. This is the base URL for all Knowledge Base API requests.
* Click **API Key** to create new keys and manage existing ones.
Store your API key securely on the server side. Never expose it in client-side code or public repositories.
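A Knowledge Base API call is an ordinary HTTPS request with a Bearer token. The sketch below only builds such a request without sending it; the `/datasets` listing path follows the API reference at the time of writing, but verify it against your own Service API endpoint:

```python
import os
import urllib.request

API_ENDPOINT = "https://api.dify.ai/v1"  # replace with your Service API endpoint
# Read the key from the environment; never hard-code real keys.
API_KEY = os.environ.get("DIFY_KB_API_KEY", "dataset-xxxx")

def build_list_request(page: int = 1, limit: int = 20) -> urllib.request.Request:
    """Build (but do not send) a request to list knowledge bases."""
    url = f"{API_ENDPOINT}/datasets?page={page}&limit={limit}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {API_KEY}"})

req = build_list_request()
print(req.full_url)
# With valid credentials, send it via urllib.request.urlopen(req).
```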
## Manage API Access for a Knowledge Base
Every knowledge base is accessible via the Service API by default.
If you want to restrict API access to a specific knowledge base, open it, then click **API Access** in the bottom-left corner and toggle it off.
## API Reference
See the [Knowledge Base API Reference](https://docs.dify.ai/api-reference/knowledge-bases/list-knowledge-bases) for the complete list of endpoints, request/response schemas, error codes, and interactive examples.
# Manage Knowledge Content
Source: https://docs.dify.ai/en/use-dify/knowledge/manage-knowledge/maintain-knowledge-documents
## Manage Documents
In a knowledge base, each imported item—whether a local file, a Notion page, or a web page—becomes a document.
From the document list, you can view and manage all these documents to keep your knowledge accurate, relevant, and up-to-date.
Click the knowledge base name at the top to quickly switch between knowledge bases.
| Action | Description |
| :-------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Add | Import a new document. |
| Modify Chunk Settings | Modify a document's chunking settings (excluding the chunk structure). Each document can have its own chunking settings, while the chunk structure is shared across the knowledge base and cannot be changed once set. |
| Delete | Permanently remove a document. **Deletion cannot be undone**. |
| Enable / Disable | Temporarily include or exclude a document from retrieval. On Dify Cloud, documents that have not been updated or retrieved for a certain period are automatically disabled to optimize performance. The inactivity period varies by subscription plan: 7 days for Sandbox; 30 days for Professional & Team. For Professional and Team plans, these documents can be re-enabled **with one click**. |
| Generate Summary | Automatically generates summaries for all chunks in a document. Only available for self-hosted deployments when **Summary Auto-Gen** is enabled. Existing summaries will be overwritten. |
| Archive / Unarchive | Archive a document that you no longer need for retrieval but still want to keep. Archived documents are read-only and can be unarchived at any time. |
| Edit | Modify the content of a document by editing its chunks. See [Manage Chunks](#manage-chunks) for details. |
| Rename | Change the name of a document. |
## Manage Chunks
According to its chunk settings, every document is split into content chunks—the basic units for retrieval.
From the chunk list within a document, you can view and manage all its chunks to improve the retrieval efficiency and accuracy.
Click the document name in the upper-left corner to quickly switch between documents.
| Action | Description |
| :----------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Add | Add a single chunk or batch add multiple new chunks. For documents chunked with Parent-child mode, both new parent and child chunks can be added. *Add chunks* is a paid feature on Dify Cloud. [Upgrade to Professional or Team](https://dify.ai/pricing) to use it. |
| Delete | Permanently remove a chunk. **Deletion cannot be undone**. |
| Enable / Disable | Temporarily include or exclude a chunk from retrieval. Disabled chunks cannot be edited. |
| Edit | Modify the content of a chunk. Edited chunks are marked **Edited**. For knowledge bases using the Parent-child chunk mode: when editing a parent chunk, you can choose to regenerate its child chunks or keep them unchanged; editing a child chunk does not update its parent chunk. |
| Add / Edit / Delete Keywords | Add or modify keywords (up to 10) for a chunk to improve its retrievability. Only available for knowledge bases using the Economical index method. |
| Add / Delete Image Attachments | Remove images extracted from documents or upload new ones within their corresponding chunk. URLs of extracted images remain in the chunk text, but you can safely remove these URLs to keep the text clean—this won't affect the extracted images. Each chunk can have up to 10 image attachments, which are returned alongside it during retrieval; images beyond this limit will not be extracted. For self-hosted deployments, you can adjust this limit via the environment variable `SINGLE_CHUNK_ATTACHMENT_LIMIT`. If you select a multimodal embedding model (marked with a **Vision** icon), the extracted images will also be embedded and indexed for retrieval. |
| Add / Edit / Delete Summary | Add, modify, or remove a summary for a chunk. Summaries are embedded and indexed for retrieval as well. When a summary matches a query, its corresponding chunk is also returned. Add identical summaries to multiple chunks to enable grouped retrieval, allowing related chunks to be returned together (subject to the Top K limit). |
## Best Practices
### Check Chunk Quality
After a document is chunked, carefully review each chunk to ensure it's semantically complete and appropriately sized for optimal retrieval accuracy and response relevance.
Common issues to watch for:
* Chunks are **too short**—may lack sufficient context, leading to semantic loss and inaccurate answers.
* Chunks are **too long**—may include irrelevant information, introducing semantic noise and lowering retrieval precision.
* Chunks are **semantically incomplete**—caused by forced chunking that cuts through sentences or paragraphs, resulting in missing or misleading content during retrieval.
### Use Child Chunks as Retrieval Hooks for Parent Chunks
For documents chunked with Parent-child mode, the system searches across child chunks but returns the parent chunks. Since editing a child chunk does not update its parent, you can treat child chunks as semantic tags or retrieval hints for their parent chunks.
To do this, rewrite child chunks into **keywords**, **summaries**, or **common user queries**. For example, if a parent chunk covers technical "LED Status Indicators", you could rephrase its child chunks as:
* *blinking light, won't turn on, red light, connection error, frozen* (keywords)
* *Guide to interpreting LED colors and troubleshooting hardware power or pairing issues* (summaries)
* *What does a solid red light mean?* (queries)
### Use Summaries to Bridge Query-Content Gaps
While high-quality indexing enables semantic search, raw chunks can still be hard to retrieve when they are too specific, noisy, or structurally complex to align well with user queries.
Summaries bridge this gap by providing a condensed semantic layer that makes the chunk's core intent explicit.
Use summaries when:
* **User queries differ from document language**: For technical documentation written formally, add summaries in the way users actually ask questions.
* **Concepts are implicit or buried in details**: Add high-level summaries that surface the core concepts and intent, so the chunk can be matched without relying on small details scattered across the text.
* **Raw text is non-textual**: When a chunk is primarily code, tables, logs, transcripts, or otherwise hard to match semantically, add descriptive summaries that clearly label what the chunk contains.
* **Related chunks should be retrieved together**: Apply identical summaries to a series of related chunks to enable grouped retrieval. This semantic glue allows multiple parts of a topic to be retrieved together, providing richer context.
The number of returned related chunks is subject to the Top K limit defined in the retrieval settings.
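Grouped retrieval via shared summaries can be pictured with a small model. This is an illustrative approximation, not Dify's internal retrieval code; the matching here is naive word overlap standing in for semantic search:

```python
def grouped_retrieval(query_terms: set, chunks: list, top_k: int = 3) -> list:
    """If a chunk's summary matches the query, return all chunks sharing
    that summary, subject to the Top K limit (illustrative model only)."""
    matched_summaries = {
        c["summary"] for c in chunks
        if query_terms & set(c["summary"].lower().split())
    }
    related = [c["text"] for c in chunks if c["summary"] in matched_summaries]
    return related[:top_k]

chunks = [
    {"text": "Step 1: unplug the router.", "summary": "router troubleshooting guide"},
    {"text": "Step 2: wait 30 seconds.",   "summary": "router troubleshooting guide"},
    {"text": "Warranty terms and dates.",  "summary": "warranty policy"},
]
print(grouped_retrieval({"troubleshooting"}, chunks))
# Both troubleshooting steps come back together via their shared summary.
```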
# Manage Document Metadata
Source: https://docs.dify.ai/en/use-dify/knowledge/metadata
## What is Metadata?
### Overview
Metadata is information that describes your data - essentially "data about data". Just as a book has a table of contents to help you understand its structure, metadata provides context about your data's content, origin, purpose, etc., making it easier for you to find and manage information in your knowledge base.
This guide aims to help you understand metadata and effectively manage your knowledge base.
### Core Concepts
* **Field:** The label of a metadata field (e.g., "author", "language").
* **Value:** The information stored in a metadata field (e.g., "Jack", "English").

* **Value Count:** The number of values contained in a metadata field, including duplicates (e.g., "3").

* **Value Type:** The type of value a field can contain.
* Dify supports three value types:
* String: For text-based information
* Number: For numerical data
* Time: For dates/timestamps

## How to Manage My Metadata?
### Manage Metadata Fields in the Knowledge Base
You can create, modify, and delete metadata fields in the knowledge base.
> Any changes you make to metadata fields here affect your knowledge base globally.
#### Get Started with the Metadata Panel
**Access the Metadata Panel**
To access the Metadata Panel, go to the **Knowledge Base** page and click **Metadata**.


**Built-in vs Custom Metadata**
| | Built-in Metadata | Custom Metadata |
| :--- | :--- | :--- |
| Location | Lower section of the Metadata panel | Upper section of the Metadata panel |
| Activation | Disabled by default; requires manual activation | Add as needed |
| Generation | System automatically extracts and generates field values | User-defined and manually added |
| Editing | Fields and values cannot be modified once generated | Fields and values can be edited or deleted |
| Scope | Applies to all existing and new documents when enabled | Stored in the metadata list; requires manual assignment to documents |
| Fields | System-defined fields: `document_name` (string), `uploader` (string), `upload_date` (time), `last_update_date` (time), `source` (string) | No default fields; all fields must be manually created |
| Value Types | String (text), Number (numerical), Time (dates/timestamps) | String (text), Number (numerical), Time (dates/timestamps) |
#### Create New Metadata Fields
To create a new metadata field:
1. Click **+Add Metadata** to open the **New Metadata** dialog.

2. Choose the value type.
3. Name the field.
> Naming rules: Use lowercase letters, numbers, and underscores only.

4. Click **Save** to apply changes.

#### Edit Metadata Fields
To edit a metadata field:
1. Click the edit icon next to a field to open the **Rename** dialog.

2. Enter the new name in the **Name** field.
> Note: You can only modify the field name, not the value type.

3. Click **Save** to apply changes.
> Note: Field changes update across all related documents in your knowledge base.

#### Delete Metadata Fields
To delete a metadata field, click the delete icon next to it.
> Note: Deleting a field deletes it and all its values from all documents in your knowledge base.

### Edit Metadata
#### Bulk Edit Metadata in the Metadata Editor
You can edit metadata in bulk in the knowledge base.
**Access the Metadata Editor**
To access the Metadata Editor:
1. In the knowledge base, select documents using the checkboxes on the left.

2. Click **Metadata** in the bottom action bar to open the Metadata Editor.

**Bulk Add Metadata**
To add metadata in bulk:
1. Click **+Add Metadata** in the editor to:

* Add existing fields from the dropdown or from the search box.

* Create new fields via **+New Metadata**.
> New fields are automatically added to the knowledge base.

* Access the Metadata Panel to manage metadata fields via **Manage**.

2. *(Optional)* Enter values for new fields.

> The date picker is for time-type fields.

3. Click **Save** to apply changes.
**Bulk Update Metadata**
To update metadata in bulk:
1. In the editor:
* **Add Values:** Type directly in the field boxes.
* **Reset Values:** Click the blue dot that appears on hover.

* **Delete Values:** Clear the field or delete the **Multiple Value** card.

* **Delete fields:** Click the delete icon (fields appear struck through and grayed out).
> Note: This only deletes the field from this document, not from your knowledge base.

2. Click **Save** to apply changes.
**Set Update Scope**
Use **Apply to All Documents** to control changes:
* **Unchecked (Default)**: Updates only documents that already have the field.
* **Checked**: Adds or updates fields across all selected documents.

#### Edit Metadata on the Document Details Page
You can edit a single document's metadata on its details page.
**Access Metadata Edit Mode**
To edit a single document's metadata:
On the document details page, click **Start labeling** to begin editing.


**Add Metadata**
To add a single document's metadata fields and values:
1. Click **+Add Metadata** to:

* Create new fields via **+New Metadata**.
> New fields are automatically added to the knowledge base.

* Add existing fields from the dropdown or from the search box.

* Access the Metadata Panel via **Manage**.

2. *(Optional)* Enter values for new fields.

3. Click **Save** to apply changes.
**Edit Metadata**
To update a single document's metadata fields and values:
1. Click **Edit** in the top right to begin editing.

2. Edit metadata:
* **Update Values:** Type directly in value fields, or delete them.
> Note: You can only modify the value, not the field name.
* **Delete Fields:** Click the delete icon.
> Note: This only deletes the field from this document, not from your knowledge base.

3. Click **Save** to apply changes.
## How to Filter Documents with Metadata?
See **Metadata Filtering** in *[Integrate Knowledge Base within Application](/en/use-dify/knowledge/integrate-knowledge-within-application)*.
## FAQ
* **What can I do with metadata?**
* Find information faster with smart filtering.
* Control access to sensitive content.
* Organize data more effectively.
* Automate workflows based on metadata rules.
* **Fields vs Values: What is the difference?**
| | Definition | Characteristics | Examples |
| ------------------------------------------ | -------------------------------------------------------------------- | --------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- |
| Metadata Fields in the Metadata Panel | System-defined attributes that describe document properties | Global fields accessible across all documents in the knowledge base | Author, Type, Date, etc. |
| Metadata Value on a document's detail page | Custom metadata tagged according to individual document requirements | Unique metadata values assigned based on document content and context | The "Author" field in Document A is set to "Mary", while in Document B it is set to "John". |
* **How do different delete options work?**
| Action | Steps | Impact | Outcome |
| ---------------------------------------- | ------------------------------------------------------- | ------------------------------ | -------------------------------------------------------------------- |
| Delete field in the Metadata Panel | In the Metadata Panel, click delete icon next to field | Global - affects all documents | Field and all values permanently deleted from the knowledge base |
| Delete field in the Metadata Editor | In the Metadata Editor, click delete icon next to field | Selected documents only | Field deleted from selected documents; remains in the knowledge base |
| Delete field on the document detail page | In the Edit Mode, click delete icon next to field | Current document only | Field deleted from current document; remains in the knowledge base |
# Knowledge
Source: https://docs.dify.ai/en/use-dify/knowledge/readme
## Introduction
Knowledge in Dify is a collection of your own data that can be integrated into your AI apps. It allows you to provide LLMs with domain-specific information as context, ensuring their responses are more accurate, relevant, and less prone to hallucinations.
This is made possible through Retrieval-Augmented Generation (RAG). It means that instead of relying solely on its pre-trained public data, the LLM uses your custom knowledge as an additional source of truth:
1. (Retrieval) When a user asks a question, the system first **retrieves the most relevant** information from the incorporated knowledge.
2. (Augmented) This retrieved information is then combined with the user's original query and sent to the LLM as **augmented context**.
3. (Generation) The LLM uses this context to generate a **more precise** answer.
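The three steps above can be sketched end to end. Here a toy word-overlap scorer stands in for real semantic retrieval, and the prompt layout is an assumption for illustration:

```python
def retrieve(query: str, knowledge: list, top_k: int = 2) -> list:
    """Retrieval: rank chunks by word overlap with the query (a toy
    stand-in for semantic search) and keep the best matches."""
    q = set(query.lower().split())
    ranked = sorted(knowledge, key=lambda c: len(q & set(c.lower().split())), reverse=True)
    return ranked[:top_k]

def augment(query: str, context: list) -> str:
    """Augmentation: combine retrieved chunks with the user's original query."""
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"

knowledge = [
    "Dify supports hybrid search combining vector and keyword retrieval.",
    "Annual leave requests must be submitted two weeks in advance.",
    "The office is closed on public holidays.",
]
query = "annual leave policy"
prompt = augment(query, retrieve(query, knowledge))
print(prompt)
# Generation: `prompt` would now be sent to the LLM as augmented context.
```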
Knowledge is stored and managed in knowledge bases. You can create multiple knowledge bases, each tailored to different domains, use cases, or data sources, and selectively integrate them into your application as needed.
## Build with Knowledge
With Dify knowledge, you can build AI apps that are grounded in your own data and domain-specific expertise. Here are some common use cases:
* **Customer support chatbots**: Build smarter support bots that provide accurate answers from your up-to-date product documentation, FAQs, and troubleshooting guides.
* **Internal knowledge portals**: Build AI-powered search and Q\&A systems for employees to quickly access company policies and procedures.
* **Content generation tools**: Build intelligent writing tools that generate reports, articles, or emails based on specific background materials.
* **Research & analysis applications**: Build applications that assist in research by retrieving and summarizing information from specific knowledge repositories like academic papers, market reports, or legal documents.
## Create Knowledge
* **[Quick create](/en/use-dify/knowledge/create-knowledge/introduction)**: Import data, define processing rules, and let Dify handle the rest. Fast and beginner-friendly.
* **[Create from a knowledge pipeline](/en/use-dify/knowledge/knowledge-pipeline/readme)**: Orchestrate more complex, flexible data processing workflows with custom steps and various plugins.
* **[Connect to an external knowledge base](/en/use-dify/knowledge/connect-external-knowledge-base)**: Sync directly from external knowledge bases via APIs to leverage existing data without migration.
## Manage & Optimize Knowledge
* **[Manage content](/en/use-dify/knowledge/manage-knowledge/maintain-knowledge-documents)**: View, add, modify, or delete documents and chunks to keep your knowledge current, accurate, and retrieval-ready.
* **[Test and validate retrieval](/en/use-dify/knowledge/test-retrieval)**: Simulate user queries to test how well your knowledge base retrieves relevant information.
* **[Enhance retrieval with metadata](/en/use-dify/knowledge/metadata)**: Add metadata to documents to enable filter-based searches and further improve retrieval precision.
* **[Adjust knowledge base settings](/en/use-dify/knowledge/manage-knowledge/introduction)**: Modify the index method, embedding model, and retrieval strategy at any time.
## Use Knowledge
**[Integrate into applications](/en/use-dify/knowledge/integrate-knowledge-within-application)**: Ground your AI app in your own knowledge.
***
**Read More**:
* [Dify v1.1.0: Filtering Knowledge Retrieval with Customized Metadata](https://dify.ai/blog/dify-v1-1-0-filtering-knowledge-retrieval-with-customized-metadata)
* [Dify v0.15.0: Introducing Parent-child Retrieval for Enhanced Knowledge](https://dify.ai/blog/introducing-parent-child-retrieval-for-enhanced-knowledge)
* [Introducing Hybrid Search and Rerank to Improve the Retrieval Accuracy of the RAG System](https://dify.ai/blog/hybrid-search-rerank-rag-improvement)
* [Dify.AI's New Dataset Feature Enhancements: Citations and Attributions](https://dify.ai/blog/difyai-new-dataset-features)
* [Text Embedding: Basic Concepts and Implementation Principles](https://dify.ai/blog/text-embedding-basic-concepts-and-implementation-principles)
* [Enhance Dify RAG with InfraNodus: Expand Your LLM's Context](https://dify.ai/blog/enhance-dify-rag-with-infranodus-expand-your-llm-s-context)
* [Dify.AI x Jina AI: Dify now Integrates Jina Embedding Model](https://dify.ai/blog/integrating-jina-embeddings-v2-dify-enhancing-rag-applications)
# Test Knowledge Retrieval
Source: https://docs.dify.ai/en/use-dify/knowledge/test-retrieval
In a knowledge base, click the **Retrieval Testing** icon in the left sidebar to enter the testing page.
Here, you can simulate user queries to test how well the knowledge base retrieves relevant information and experiment with different retrieval settings for optimal performance.
Retrieval settings adjusted here are temporary and only apply to the current test session.
For more about retrieval settings, see [Configure the Retrieval Settings](/en/use-dify/knowledge/create-knowledge/setting-indexing-methods#configure-the-retrieval-settings).
The **Records** section logs all retrieval events associated with this knowledge base, including:
* Queries tested directly on the **Retrieval Testing** page
* Retrieval requests made by any linked app—whether during test runs or in production
Test retrievals and regular retrievals share the same API endpoint.
# Dashboard
Source: https://docs.dify.ai/en/use-dify/monitor/analysis
Monitor performance, costs, and user engagement through Dify's built-in analytics dashboard
The dashboard tracks four metrics over time to show how your application performs:
**Total Messages:** Conversation volume\
**Active Users:** Users with meaningful interactions (more than one exchange)\
**Average User Interactions:** Engagement depth per session\
**Token Usage:** Resource consumption and costs
Use the time selector to view trends over different periods. Click **"Tracing app performance"** to connect external observability platforms like Langfuse or LangSmith for deeper analytics.
# Annotation System
Source: https://docs.dify.ai/en/use-dify/monitor/annotation-reply
Build a curated library of high-quality responses to improve consistency and bypass AI generation
Annotations let you create a curated library of perfect responses for specific questions. When users ask similar questions, Dify returns your pre-written answers instead of generating new responses, ensuring consistency and eliminating AI hallucinations for critical topics.
## When to Use Annotations
**Enterprise Standards**
Create definitive answers for policy questions, product information, or customer service scenarios where consistency is critical.
**Rapid Prototyping**
Quickly improve demo applications by curating high-quality responses without retraining models or complex prompt engineering.
**Quality Assurance**
Ensure certain sensitive or important questions always receive your approved responses rather than potentially variable AI-generated content.
## How Annotations Work
When annotation reply is enabled:
1. User asks a question
2. System searches existing annotations for semantic matches
3. If a match above the similarity threshold is found, returns the curated response
4. If no match, proceeds with normal AI generation
5. Hit tracking records which annotations are used and how often
This creates a "fast path" for known good answers while maintaining AI flexibility for new questions.
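In code, the fast path amounts to a nearest-neighbor lookup over annotation embeddings. A minimal sketch of the idea, not Dify's implementation; the 0.9 threshold and two-dimensional embeddings are illustrative only:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def answer(query_embedding, annotations, threshold=0.9, generate=None):
    """Return a curated annotation if one matches closely enough,
    otherwise fall back to normal AI generation."""
    best_score, best_answer = 0.0, None
    for ann in annotations:
        score = cosine_similarity(query_embedding, ann["embedding"])
        if score > best_score:
            best_score, best_answer = score, ann["answer"]
    if best_score >= threshold:
        return best_answer                       # fast path: curated response
    return generate() if generate else None      # normal AI generation

annotations = [{"embedding": [1.0, 0.0], "answer": "Our refund window is 30 days."}]
print(answer([0.99, 0.05], annotations, threshold=0.9))
# → Our refund window is 30 days.
```

Raising the threshold narrows the fast path; this is exactly the trade-off behind the **Similarity Threshold** setting described below.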
## Setting Up Annotations
**Enable in App Configuration**
Navigate to **Orchestrate → Add Features** and enable annotation reply. Configure the similarity threshold and embedding model for matching.
**Similarity Threshold:** Higher values require closer matches. Start with moderate settings and adjust based on hit rates.
**Embedding Model:** Used to vectorize questions for semantic matching. Changing the model regenerates all embeddings.
## Creating Annotations
**From Conversations**
In debug mode or logs, click on AI responses and edit them into the perfect answer. Save as an annotation for future use.
**Bulk Import**
Download the template, create Q\&A pairs in the specified format, and upload for batch annotation creation.
**Manual Entry**
Add annotations directly in the Logs & Annotations interface with custom questions and responses.
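The **Bulk Import** path above can be scripted if your Q\&A pairs live elsewhere. A sketch assuming a two-column question/answer layout; the authoritative column names come from the template you download in Dify, so adjust the header row to match it:

```python
import csv
import io

# Hypothetical Q&A pairs to import as annotations.
pairs = [
    ("What is your refund policy?", "Refunds are available within 30 days of purchase."),
    ("How do I reset my password?", "Use the 'Forgot password' link on the login page."),
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["question", "answer"])  # assumed header names -- match the template
writer.writerows(pairs)
print(buf.getvalue())
```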
## Managing Annotation Quality
**Hit Tracking**
Monitor which annotations are matched, how often they're used, and the similarity scores of matches. This shows which annotations provide value.
**Continuous Refinement**
Review hit history to improve annotation coverage and accuracy. Questions that consistently miss your annotations indicate gaps in coverage.
**A/B Testing**
Compare user satisfaction rates before and after annotation implementation to measure impact.
## Annotation Analytics
**Hit Rate Analysis**
Track which annotations are frequently matched and which are never used. Remove unused annotations and expand successful patterns.
**Question Patterns**
Identify common user question types that would benefit from annotation coverage.
**Match Quality**
Review similarity scores to ensure annotations are triggering for appropriate questions without false matches.
# Integrate with Alibaba Cloud Monitor
Source: https://docs.dify.ai/en/use-dify/monitor/integrations/integrate-aliyun
## What is Alibaba Cloud Monitor
Alibaba Cloud provides a fully managed, maintenance-free observability platform that enables one-click monitoring, tracing, and evaluation of Dify applications.
Alibaba Cloud Monitor natively supports Python/Golang/Java applications through [LoongSuite](https://github.com/alibaba/loongsuite-python-agent) agents and open-source OpenTelemetry agents. In addition to one-click monitoring of Dify LLM applications, it also supports end-to-end observability of Dify components and their upstream and downstream dependencies through non-invasive agents.
For more details, please refer to the [Cloud Monitor documentation](https://www.alibabacloud.com/help/en/cms/cloudmonitor-1-0/product-overview/what-is-cloudmonitor?spm=a3c0i.63551.2277339270.1.76c7112eeKEvSr).
***
## How to Configure Alibaba Cloud Monitor
### 1. Get Alibaba Cloud Endpoint and License Key
1. Log in to the [ARMS console](https://account.alibabacloud.com/login/login.htm?spm=5176.12901015-2.0.0.68d74b84XRatpU), and click **Integration Center** in the left navigation bar.
2. In the **Server-side Applications** area, click the **OpenTelemetry** card.
3. In the **OpenTelemetry** panel that appears, select **gRPC** as the export protocol, and select the connection method and region according to your actual deployment.

4. Save the **Public Endpoint** and **Authentication Token (License Key)**.
The Endpoint does not include a port number, for example `http://tracing-cn-heyuan.arms.aliyun.com`.
### 2. Configure Cloud Monitor in Dify
**Prerequisites**: Dify Cloud or Community Edition version must be ≥ v1.6.0
1. Log in to the Dify console and navigate to the application you want to monitor.
2. Open **Monitoring** in the left navigation bar.
3. Click **Tracing app performance**, then click **Configure** in the **Cloud Monitor** area.

4. In the dialog that appears, enter the **License Key** and **Endpoint** obtained in step 1, and customize the **App Name** (the application name displayed in the ARMS console), then click **Save & Enable**.
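Under the hood, the two values map onto standard OpenTelemetry OTLP exporter settings. For reference only, since Dify wires this up for you on **Save & Enable**, the equivalent environment-variable configuration would look roughly like this; the `Authentication` header name is an assumption, so confirm the exact header against the ARMS console:

```python
import os

# Values from step 1 (hypothetical); Dify configures these for you.
os.environ["OTEL_EXPORTER_OTLP_PROTOCOL"] = "grpc"
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "http://tracing-cn-heyuan.arms.aliyun.com"
# Header name is an assumption -- verify it in the ARMS console.
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = "Authentication=<your-license-key>"
os.environ["OTEL_SERVICE_NAME"] = "my-dify-app"  # shown as App Name in ARMS

print(os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"])
```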
***
## View Monitoring Data in Alibaba Cloud Monitor
After configuration, debug or production data from applications in Dify can be monitored in Cloud Monitor.
### Method 1: Jump to ARMS Console from Dify Application
In the Dify console, select an application with tracing enabled, go to **Tracing Configuration**, and click **View** in the **Cloud Monitor** area.
### Method 2: View Directly in ARMS Console
Go to the corresponding Dify application in the **LLM Application Monitoring > Application List** page of the ARMS console.
***
## Access More Data
Cloud Monitor provides multi-language, non-invasive agents that can instrument the various components of a Dify cluster for end-to-end tracing.
| Dify Component | Agent | Details |
| -------------- | ----------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Nginx | OpenTelemetry Agent | [Use OpenTelemetry for Nginx Tracing](https://www.alibabacloud.com/help/en/opentelemetry/user-guide/use-opentelemetry-to-perform-tracing-analysis-on-nginx?spm=a2c63.l28256.help-menu-search-90275.d_1) |
| API | LoongSuite-Python Agent | [loongsuite-python-agent](https://github.com/alibaba/loongsuite-python-agent/blob/main/README.md) |
| Sandbox | LoongSuite-Go Agent | [loongsuite-go-agent](https://github.com/alibaba/loongsuite-go-agent/blob/main/README.md) |
| Worker | OpenTelemetry Agent | [Submit Python Application Data via OpenTelemetry](https://www.alibabacloud.com/help/en/opentelemetry/user-guide/use-managed-service-for-opentelemetry-to-submit-the-trace-data-of-python-applications?spm=a2c63.p38356.help-menu-90275.d_2_0_5_0.18ee53a4EGoGuS) |
| Plugin-Daemon | LoongSuite-Go Agent | [loongsuite-go-agent](https://github.com/alibaba/loongsuite-go-agent/blob/main/README.md) |
***
## Monitoring Data List
Cloud Monitor collects data from Dify's Workflow, Chatflow, Chat, and Agent applications: execution details of workflows and their nodes, covering model calls, tool calls, and knowledge retrieval, plus metadata such as conversation and user information.
### Workflow/Chatflow Trace Information
| Workflow | Alibaba Cloud Monitor Trace |
| --- | --- |
| workflow\_id | Unique identifier of the Workflow |
| conversation\_id | Conversation ID |
| workflow\_run\_id | ID of this run |
| tenant\_id | Tenant ID |
| elapsed\_time | Duration of this run |
| status | Run status |
| version | Workflow version |
| total\_tokens | Total tokens used in this run |
| file\_list | List of processed files |
| triggered\_from | Source that triggered this run |
| workflow\_run\_inputs | Input data for this run |
| workflow\_run\_outputs | Output data for this run |
| error | Errors that occurred during this run |
| query | Query used during runtime |
| workflow\_app\_log\_id | Workflow application log ID |
| message\_id | Associated message ID |
| start\_time | Run start time |
| end\_time | Run end time |
**Workflow Trace Metadata**
* workflow\_id - Unique identifier of the Workflow
* conversation\_id - Conversation ID
* workflow\_run\_id - ID of this run
* tenant\_id - Tenant ID
* elapsed\_time - Duration of this run
* status - Run status
* version - Workflow version
* total\_tokens - Total tokens used in this run
* file\_list - List of processed files
* triggered\_from - Trigger source
### Message Trace Information
| Message | Alibaba Cloud Monitor Trace |
| --- | --- |
| message\_id | Message ID |
| message\_data | Message data |
| user\_session\_id | User's session\_id |
| conversation\_model | Conversation model |
| message\_tokens | Number of tokens in the message |
| answer\_tokens | Number of tokens in the answer |
| total\_tokens | Total tokens in message and answer |
| error | Error information |
| inputs | Input data |
| outputs | Output data |
| file\_list | List of processed files |
| start\_time | Start time |
| end\_time | End time |
| message\_file\_data | File data associated with the message |
| conversation\_mode | Conversation mode |
**Message Trace Metadata**
* conversation\_id - ID of the conversation to which the message belongs
* ls\_provider - Model provider
* ls\_model\_name - Model ID
* status - Message status
* from\_end\_user\_id - ID of the sending user
* from\_account\_id - ID of the sending account
* agent\_based - Whether it is agent-based
* workflow\_run\_id - Workflow run ID
* from\_source - Message source
* message\_id - Message ID
### Dataset Retrieval Trace Information
| Dataset Retrieval | Alibaba Cloud Monitor Trace |
| --- | --- |
| message\_id | Message ID |
| inputs | Input content |
| documents | Document data |
| start\_time | Start time |
| end\_time | End time |
| message\_data | Message data |
**Dataset Retrieval Trace Metadata**
* message\_id - Message ID
* ls\_provider - Model provider
* ls\_model\_name - Model ID
* status - Message status
* from\_end\_user\_id - ID of the sending user
* from\_account\_id - ID of the sending account
* agent\_based - Whether it is agent-based
* workflow\_run\_id - Workflow run ID
* from\_source - Message source
### Tool Trace Information
| Tool | Alibaba Cloud Monitor Trace |
| --- | --- |
| message\_id | Message ID |
| tool\_name | Tool name |
| start\_time | Start time |
| end\_time | End time |
| tool\_inputs | Tool inputs |
| tool\_outputs | Tool outputs |
| message\_data | Message data |
| error | Error information (if any) |
| inputs | Input content of the message |
| outputs | Answer content of the message |
| tool\_config | Tool configuration |
| time\_cost | Time cost |
| tool\_parameters | Tool parameters |
| file\_url | URL of associated file |
**Tool Trace Metadata**
* message\_id - Message ID
* tool\_name - Tool name
* tool\_inputs - Tool inputs
* tool\_outputs - Tool outputs
* tool\_config - Tool configuration
* time\_cost - Time cost
* error - Error information
* tool\_parameters - Tool parameters
* message\_file\_id - Message file ID
* created\_by\_role - Creator role
* created\_user\_id - Creator user ID
# Integrate with Arize
Source: https://docs.dify.ai/en/use-dify/monitor/integrations/integrate-arize
### What is Arize
Arize provides enterprise-grade LLM observability, online and offline evaluation, monitoring, and experimentation, powered by OpenTelemetry and purpose-built for LLM and agent-driven applications.
For more details, please refer to [Arize](https://arize.com).
***
### How to Configure Arize
#### 1. Register/Login to [Arize](https://app.arize.com/auth/join)
#### 2. Get your Arize API Key
Retrieve your Arize API Key from the user menu at the top right: click **API Key**, then click the key to copy it:

#### 3. Integrating Arize with Dify
Configure Arize in the Dify application. Open the application you need to monitor, open **Monitoring** in the side menu, and select **Tracing app performance** on the page.

After clicking configure, paste the **API Key**, **Space ID** and **project name** created in Arize into the configuration and save.

Once successfully saved, you can view the monitoring status on the current page.

### Monitoring Data List
#### **Workflow/Chatflow Trace Information**
**Used to track workflows and chatflows**
| Workflow | Arize Trace |
| --- | --- |
| workflow\_app\_log\_id/workflow\_run\_id | id |
| user\_session\_id | - placed in metadata |
| name |  |
| start\_time | start\_time |
| end\_time | end\_time |
| inputs | inputs |
| outputs | outputs |
| Model token consumption | usage\_metadata |
| metadata | metadata |
| error | error |
| \[workflow] | tags |
| "conversation\_id/none for workflow" | conversation\_id in metadata |
**Workflow Trace Info**
* workflow\_id - Unique identifier of the workflow
* conversation\_id - Conversation ID
* workflow\_run\_id - ID of the current run
* tenant\_id - Tenant ID
* elapsed\_time - Time taken for the current run
* status - Run status
* version - Workflow version
* total\_tokens - Total tokens used in the current run
* file\_list - List of processed files
* triggered\_from - Source that triggered the current run
* workflow\_run\_inputs - Input data for the current run
* workflow\_run\_outputs - Output data for the current run
* error - Errors encountered during the current run
* query - Query used during the run
* workflow\_app\_log\_id - Workflow application log ID
* message\_id - Associated message ID
* start\_time - Start time of the run
* end\_time - End time of the run
* workflow node executions - Information about workflow node executions
* Metadata
* workflow\_id - Unique identifier of the workflow
* conversation\_id - Conversation ID
* workflow\_run\_id - ID of the current run
* tenant\_id - Tenant ID
* elapsed\_time - Time taken for the current run
* status - Run status
* version - Workflow version
* total\_tokens - Total tokens used in the current run
* file\_list - List of processed files
* triggered\_from - Source that triggered the current run
#### **Message Trace Information**
**Used to track LLM-related conversations**
| Chat | Arize LLM |
| --- | --- |
| message\_id | id |
| user\_session\_id | - placed in metadata |
| "llm" | name |
| start\_time | start\_time |
| end\_time | end\_time |
| inputs | inputs |
| outputs | outputs |
| Model token consumption | usage\_metadata |
| metadata | metadata |
| \["message", conversation\_mode] | tags |
| conversation\_id | conversation\_id in metadata |
**Message Trace Info**
* message\_id - Message ID
* message\_data - Message data
* user\_session\_id - User session ID
* conversation\_model - Conversation mode
* message\_tokens - Number of tokens in the message
* answer\_tokens - Number of tokens in the answer
* total\_tokens - Total number of tokens in the message and answer
* error - Error information
* inputs - Input data
* outputs - Output data
* file\_list - List of processed files
* start\_time - Start time
* end\_time - End time
* message\_file\_data - File data associated with the message
* conversation\_mode - Conversation mode
* Metadata
* conversation\_id - Conversation ID
* ls\_provider - Model provider
* ls\_model\_name - Model ID
* status - Message status
* from\_end\_user\_id - ID of the sending user
* from\_account\_id - ID of the sending account
* agent\_based - Whether the message is agent-based
* workflow\_run\_id - Workflow run ID
* from\_source - Message source
#### **Moderation Trace Information**
**Used to track conversation moderation**
| Moderation | Arize Tool |
| --- | --- |
| user\_id | - placed in metadata |
| "moderation" | name |
| start\_time | start\_time |
| end\_time | end\_time |
| inputs | inputs |
| outputs | outputs |
| metadata | metadata |
| \["moderation"] | tags |
**Moderation Trace Info**
* message\_id - Message ID
* user\_id: User ID
* workflow\_app\_log\_id - Workflow application log ID
* inputs - Moderation input data
* message\_data - Message data
* flagged - Whether the content is flagged for attention
* action - Specific actions taken
* preset\_response - Preset response
* start\_time - Moderation start time
* end\_time - Moderation end time
* Metadata
* message\_id - Message ID
* action - Specific actions taken
* preset\_response - Preset response
#### **Suggested Question Trace Information**
**Used to track suggested questions**
| Suggested Question | Arize LLM |
| --- | --- |
| user\_id | - placed in metadata |
| "suggested\_question" | name |
| start\_time | start\_time |
| end\_time | end\_time |
| inputs | inputs |
| outputs | outputs |
| metadata | metadata |
| \["suggested\_question"] | tags |
**Suggested Question Trace Info**
* message\_id - Message ID
* message\_data - Message data
* inputs - Input content
* outputs - Output content
* start\_time - Start time
* end\_time - End time
* total\_tokens - Number of tokens
* status - Message status
* error - Error information
* from\_account\_id - ID of the sending account
* agent\_based - Whether the message is agent-based
* from\_source - Message source
* model\_provider - Model provider
* model\_id - Model ID
* suggested\_question - Suggested question
* level - Status level
* status\_message - Status message
* Metadata
* message\_id - Message ID
* ls\_provider - Model provider
* ls\_model\_name - Model ID
* status - Message status
* from\_end\_user\_id - ID of the sending user
* from\_account\_id - ID of the sending account
* workflow\_run\_id - Workflow run ID
* from\_source - Message source
#### **Dataset Retrieval Trace Information**
**Used to track knowledge base retrieval**
| Dataset Retrieval | Arize Retriever |
| --- | --- |
| user\_id | - placed in metadata |
| "dataset\_retrieval" | name |
| start\_time | start\_time |
| end\_time | end\_time |
| inputs | inputs |
| outputs | outputs |
| metadata | metadata |
| \["dataset\_retrieval"] | tags |
| message\_id | parent\_run\_id |
**Dataset Retrieval Trace Info**
* message\_id - Message ID
* inputs - Input content
* documents - Document data
* start\_time - Start time
* end\_time - End time
* message\_data - Message data
* Metadata
* message\_id - Message ID
* ls\_provider - Model provider
* ls\_model\_name - Model ID
* status - Message status
* from\_end\_user\_id - ID of the sending user
* from\_account\_id - ID of the sending account
* agent\_based - Whether the message is agent-based
* workflow\_run\_id - Workflow run ID
* from\_source - Message source
#### **Tool Trace Information**
**Used to track tool invocation**
| Tool | Arize Tool |
| --- | --- |
| user\_id | - placed in metadata |
| tool\_name | name |
| start\_time | start\_time |
| end\_time | end\_time |
| inputs | inputs |
| outputs | outputs |
| metadata | metadata |
| \["tool", tool\_name] | tags |
**Tool Trace Info**
* message\_id - Message ID
* tool\_name - Tool name
* start\_time - Start time
* end\_time - End time
* tool\_inputs - Tool inputs
* tool\_outputs - Tool outputs
* message\_data - Message data
* error - Error information, if any
* inputs - Inputs for the message
* outputs - Outputs of the message
* tool\_config - Tool configuration
* time\_cost - Time cost
* tool\_parameters - Tool parameters
* file\_url - URL of the associated file
* Metadata
* message\_id - Message ID
* tool\_name - Tool name
* tool\_inputs - Tool inputs
* tool\_outputs - Tool outputs
* tool\_config - Tool configuration
* time\_cost - Time cost
* error - Error information, if any
* tool\_parameters - Tool parameters
* message\_file\_id - Message file ID
* created\_by\_role - Role of the creator
* created\_user\_id - User ID of the creator
#### **Generate Name Trace Information**
**Used to track conversation title generation**
| Generate Name | Arize Tool |
| --- | --- |
| user\_id | - placed in metadata |
| "generate\_conversation\_name" | name |
| start\_time | start\_time |
| end\_time | end\_time |
| inputs | inputs |
| outputs | outputs |
| metadata | metadata |
| \["generate\_name"] | tags |
**Generate Name Trace Info**
* conversation\_id - Conversation ID
* inputs - Input data
* outputs - Generated conversation name
* start\_time - Start time
* end\_time - End time
* tenant\_id - Tenant ID
* Metadata
* conversation\_id - Conversation ID
* tenant\_id - Tenant ID
# Integrate with Langfuse
Source: https://docs.dify.ai/en/use-dify/monitor/integrations/integrate-langfuse
### What is Langfuse
Langfuse is an open-source LLM engineering platform that helps teams collaborate on debugging, analyzing, and iterating their applications.
Introduction to Langfuse: [https://langfuse.com/](https://langfuse.com/)
***
### How to Configure Langfuse
1. Register and log in to Langfuse on the [official website](https://langfuse.com/)
2. Create a project in Langfuse. After logging in, click **New** on the homepage to create your own project. The **project** will be used to associate with **applications** in Dify for data monitoring.

Enter a name for the project.

3. Create project API credentials. In the left sidebar of the project, click **Settings** to open the settings.

In Settings, click **Create API Keys** to create project API credentials.

Copy and save the **Secret Key**, **Public Key**, and **Host**.

4. Configure Langfuse in Dify. Open the application you need to monitor, open **Monitoring** in the side menu, and select **Tracing app performance** on the page.

After clicking configure, paste the **Secret Key, Public Key, Host** created in Langfuse into the configuration and save.

Once successfully saved, you can view the status on the current page. If the status shows as started, the app is being monitored.

***
### Viewing Monitoring Data in Langfuse
After configuration, debug or production data from applications in Dify can be viewed in Langfuse.


***
### Monitoring Data List
#### Workflow/Chatflow Trace Information
**Used to track workflows and chatflows**
| Workflow | LangFuse Trace |
| --- | --- |
| workflow\_app\_log\_id/workflow\_run\_id | id |
| user\_session\_id | user\_id |
| name |  |
| start\_time | start\_time |
| end\_time | end\_time |
| inputs | input |
| outputs | output |
| Model token consumption | usage |
| metadata | metadata |
| error | level |
| error | status\_message |
| \[workflow] | tags |
| \["message", conversation\_mode] | session\_id |
| conversation\_id | parent\_observation\_id |
**Workflow Trace Info**
* workflow\_id - Unique identifier of the workflow
* conversation\_id - Conversation ID
* workflow\_run\_id - ID of the current run
* tenant\_id - Tenant ID
* elapsed\_time - Time taken for the current run
* status - Run status
* version - Workflow version
* total\_tokens - Total tokens used in the current run
* file\_list - List of processed files
* triggered\_from - Source that triggered the current run
* workflow\_run\_inputs - Input data for the current run
* workflow\_run\_outputs - Output data for the current run
* error - Errors encountered during the current run
* query - Query used during the run
* workflow\_app\_log\_id - Workflow application log ID
* message\_id - Associated message ID
* start\_time - Start time of the run
* end\_time - End time of the run
* workflow node executions - Information about workflow node executions
* Metadata
* workflow\_id - Unique identifier of the workflow
* conversation\_id - Conversation ID
* workflow\_run\_id - ID of the current run
* tenant\_id - Tenant ID
* elapsed\_time - Time taken for the current run
* status - Run status
* version - Workflow version
* total\_tokens - Total tokens used in the current run
* file\_list - List of processed files
* triggered\_from - Source that triggered the current run
#### Message Trace Information
**Used to track LLM conversations**
| Message | LangFuse Generation/Trace |
| --- | --- |
| message\_id | id |
| user\_session\_id | user\_id |
| name |  |
| start\_time | start\_time |
| end\_time | end\_time |
| inputs | input |
| outputs | output |
| Model token consumption | usage |
| metadata | metadata |
| error | level |
| error | status\_message |
| \["message", conversation\_mode] | tags |
| conversation\_id | session\_id |
| conversation\_id | parent\_observation\_id |
**Message Trace Info**
* message\_id - Message ID
* message\_data - Message data
* user\_session\_id - User session ID
* conversation\_model - Conversation model
* message\_tokens - Number of tokens in the message
* answer\_tokens - Number of tokens in the answer
* total\_tokens - Total tokens in the message and answer
* error - Error information
* inputs - Input data
* outputs - Output data
* file\_list - List of processed files
* start\_time - Start time
* end\_time - End time
* message\_file\_data - File data associated with the message
* conversation\_mode - Conversation mode
* Metadata
* conversation\_id - Conversation ID
* ls\_provider - Model provider
* ls\_model\_name - Model ID
* status - Message status
* from\_end\_user\_id - ID of the sending user
* from\_account\_id - ID of the sending account
* agent\_based - Whether the message is agent-based
* workflow\_run\_id - Workflow run ID
* from\_source - Message source
* message\_id - Message ID
#### Moderation Trace Information
**Used to track conversation moderation**
| Moderation | LangFuse Generation/Trace |
| --- | --- |
| user\_id | user\_id |
| moderation | name |
| start\_time | start\_time |
| end\_time | end\_time |
| inputs | input |
| outputs | output |
| metadata | metadata |
| \[moderation] | tags |
| message\_id | parent\_observation\_id |
**Moderation Trace Info**
* message\_id - Message ID
* user\_id - User ID
* workflow\_app\_log\_id - Workflow application log ID
* inputs - Moderation input data
* message\_data - Message data
* flagged - Whether the content is flagged for attention
* action - Specific actions taken
* preset\_response - Preset response
* start\_time - Moderation start time
* end\_time - Moderation end time
* Metadata
* message\_id - Message ID
* action - Specific actions taken
* preset\_response - Preset response
#### Suggested Question Trace Information
**Used to track suggested questions**
| Suggested Question | LangFuse Generation/Trace |
| --- | --- |
| user\_id | user\_id |
| suggested\_question | name |
| start\_time | start\_time |
| end\_time | end\_time |
| inputs | input |
| outputs | output |
| metadata | metadata |
| \[suggested\_question] | tags |
| message\_id | parent\_observation\_id |
**Suggested Question Trace Info**
* message\_id - Message ID
* message\_data - Message data
* inputs - Input data
* outputs - Output data
* start\_time - Start time
* end\_time - End time
* total\_tokens - Total tokens
* status - Message Status
* error - Error Message
* from\_account\_id - Sending account ID
* agent\_based - Whether agent based
* from\_source - Message source
* model\_provider - Model provider
* model\_id - Model ID
* suggested\_question - Suggested question
* level - Status level
* status\_message - Status message
* Metadata
* message\_id - Message ID
* ls\_provider - Model Provider
* ls\_model\_name - Model ID
* status - Message status
* from\_end\_user\_id - Sending user's ID
* from\_account\_id - Sending Account ID
* workflow\_run\_id - Workflow ID of this runtime
* from\_source - Message source
#### Dataset Retrieval Trace Information
**Used to track knowledge base retrieval**
| Dataset Retrieval | LangFuse Generation/Trace |
| --- | --- |
| user\_id | user\_id |
| dataset\_retrieval | name |
| start\_time | start\_time |
| end\_time | end\_time |
| inputs | input |
| outputs | output |
| metadata | metadata |
| \[dataset\_retrieval] | tags |
| message\_id | parent\_observation\_id |
**Dataset Retrieval Trace Info**
* message\_id - Message ID
* inputs - Input content
* documents - Document data
* start\_time - Start time
* end\_time - End time
* message\_data - Message data
* Metadata
* message\_id - Message ID
* ls\_provider - Model Provider
* ls\_model\_name - Model ID
* status - Message status
* from\_end\_user\_id - Sending user's ID
* from\_account\_id - Sending account's ID
* agent\_based - Whether agent based
* workflow\_run\_id - Workflow ID of this runtime
* from\_source - Message Source
#### Tool Trace Information
**Used to track tool invocation**
| Tool | LangFuse Generation/Trace |
| --- | --- |
| user\_id | user\_id |
| tool\_name | name |
| start\_time | start\_time |
| end\_time | end\_time |
| inputs | input |
| outputs | output |
| metadata | metadata |
| \["tool", tool\_name] | tags |
| message\_id | parent\_observation\_id |
**Tool Trace Info**
* message\_id - Message ID
* tool\_name - Tool Name
* start\_time - Start time
* end\_time - End time
* tool\_inputs - Tool inputs
* tool\_outputs - Tool outputs
* message\_data - Message data
* error - Error information, if any
* inputs - Input of Message
* outputs - Output of Message
* tool\_config - Tool config
* time\_cost - Time cost
* tool\_parameters - Tool Parameters
* file\_url - URL of relevant files
* Metadata
* message\_id - Message ID
* tool\_name - Tool Name
* tool\_inputs - Tool inputs
* tool\_outputs - Tool outputs
* tool\_config - Tool config
* time\_cost - Time cost
* error - Error Message
* tool\_parameters - Tool parameters
* message\_file\_id - Message file ID
* created\_by\_role - Created by role
* created\_user\_id - Created user ID
#### Generate Name Trace Information
**Used to track conversation title generation**
| Generate Name | LangFuse Generation/Trace |
| --- | --- |
| user\_id | user\_id |
| generate\_name | name |
| start\_time | start\_time |
| end\_time | end\_time |
| inputs | input |
| outputs | output |
| metadata | metadata |
| \[generate\_name] | tags |
**Generate Name Trace Info**
* conversation\_id - Conversation ID
* inputs - Input data
* outputs - Generated session name
* start\_time - Start time
* end\_time - End time
* tenant\_id - Tenant ID
* Metadata
* conversation\_id - Conversation ID
* tenant\_id - Tenant ID
### Langfuse Prompt Management
The [Langfuse Prompt Management Plugin](https://github.com/gao-ai-com/dify-plugin-langfuse) (community maintained) lets you use prompts that are [managed and versioned in Langfuse](https://langfuse.com/docs/prompt-management/get-started) in your Dify applications, enhancing your LLM application development workflow. Key features include:
* **Get Prompt:** Fetch specific prompts managed in Langfuse.
* **Search Prompts:** Search for prompts in Langfuse using various filters.
* **Update Prompt:** Create new versions of prompts in Langfuse and set tags/labels.
This integration streamlines the process of managing and versioning your prompts, contributing to more efficient development and iteration cycles. You can find the plugin and installation instructions [here](https://github.com/gao-ai-com/dify-plugin-langfuse).
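Langfuse-managed prompts use `{{variable}}` placeholders. The compile step the plugin relies on looks roughly like this stdlib sketch (the template content is hypothetical; the real plugin fetches the versioned prompt from Langfuse and uses the SDK's compile functionality):

```python
import re

def compile_prompt(template, **variables):
    """Substitute {{name}} placeholders with supplied values,
    leaving unknown placeholders untouched."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(variables.get(m.group(1), m.group(0))),
        template,
    )

# A prompt as it might be versioned in Langfuse (hypothetical content).
template = "You are a support agent for {{product}}. Answer in {{language}}."
print(compile_prompt(template, product="Dify", language="English"))
# → You are a support agent for Dify. Answer in English.
```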
# Integrate with LangSmith
Source: https://docs.dify.ai/en/use-dify/monitor/integrations/integrate-langsmith
### What is LangSmith
LangSmith is a platform for building production-grade LLM applications. It is used for developing, collaborating, testing, deploying, and monitoring LLM applications.
For more details, please refer to [LangSmith](https://www.langchain.com/langsmith).
***
### How to Configure LangSmith
#### 1. Register/Login to [LangSmith](https://www.langchain.com/langsmith)
#### 2. Create a Project
Create a project in LangSmith. After logging in, click **New Project** on the homepage to create your own project. The **project** will be used to associate with **applications** in Dify for data monitoring.

Once created, you can view all created projects in the Projects section.

#### 3. Create Project Credentials
Open **Settings** in the left sidebar to find the project settings.

Click **Create API Key** to create project credentials.

Select **Personal Access Token** for subsequent API authentication.

Copy and save the created API key.

#### 4. Integrating LangSmith with Dify
Configure LangSmith in the Dify application. Open the application you need to monitor, open **Monitoring** in the side menu, and select **Tracing app performance** on the page.

After clicking configure, paste the **API Key** and **project name** created in LangSmith into the configuration and save.

The configured project name needs to match the project set in LangSmith. If the project names do not match, LangSmith will automatically create a new project during data synchronization.
Once successfully saved, you can view the monitoring status on the current page.

### Viewing Monitoring Data in LangSmith
Once configured, the debug or production data from applications within Dify can be monitored in LangSmith.

When you switch to LangSmith, you can view detailed operation logs of Dify applications in the dashboard.

Detailed LLM operation logs through LangSmith will help you optimize the performance of your Dify application.

### Monitoring Data List
#### **Workflow/Chatflow Trace Information**
**Used to track workflows and chatflows**
| Workflow                                 | LangSmith Chain              |
| ---------------------------------------- | ---------------------------- |
| workflow\_app\_log\_id/workflow\_run\_id | id                           |
| user\_session\_id                        | placed in metadata           |
| workflow\_\{id}                          | name                         |
| start\_time                              | start\_time                  |
| end\_time                                | end\_time                    |
| inputs                                   | inputs                       |
| outputs                                  | outputs                      |
| Model token consumption                  | usage\_metadata              |
| metadata                                 | extra                        |
| error                                    | error                        |
| \[workflow]                              | tags                         |
| "conversation\_id/none for workflow"     | conversation\_id in metadata |
| conversion\_id                           | parent\_run\_id              |
**Workflow Trace Info**
* workflow\_id - Unique identifier of the workflow
* conversation\_id - Conversation ID
* workflow\_run\_id - ID of the current run
* tenant\_id - Tenant ID
* elapsed\_time - Time taken for the current run
* status - Run status
* version - Workflow version
* total\_tokens - Total tokens used in the current run
* file\_list - List of processed files
* triggered\_from - Source that triggered the current run
* workflow\_run\_inputs - Input data for the current run
* workflow\_run\_outputs - Output data for the current run
* error - Errors encountered during the current run
* query - Query used during the run
* workflow\_app\_log\_id - Workflow application log ID
* message\_id - Associated message ID
* start\_time - Start time of the run
* end\_time - End time of the run
* workflow node executions - Information about workflow node executions
* Metadata
* workflow\_id - Unique identifier of the workflow
* conversation\_id - Conversation ID
* workflow\_run\_id - ID of the current run
* tenant\_id - Tenant ID
* elapsed\_time - Time taken for the current run
* status - Run status
* version - Workflow version
* total\_tokens - Total tokens used in the current run
* file\_list - List of processed files
* triggered\_from - Source that triggered the current run
#### **Message Trace Information**
**Used to track LLM-related conversations**
| Chat                             | LangSmith LLM                |
| -------------------------------- | ---------------------------- |
| message\_id                      | id                           |
| user\_session\_id                | placed in metadata           |
| "message\_\{id}"                 | name                         |
| start\_time                      | start\_time                  |
| end\_time                        | end\_time                    |
| inputs                           | inputs                       |
| outputs                          | outputs                      |
| Model token consumption          | usage\_metadata              |
| metadata                         | extra                        |
| error                            | error                        |
| \["message", conversation\_mode] | tags                         |
| conversation\_id                 | conversation\_id in metadata |
| conversion\_id                   | parent\_run\_id              |
**Message Trace Info**
* message\_id - Message ID
* message\_data - Message data
* user\_session\_id - User session ID
* conversation\_model - Conversation mode
* message\_tokens - Number of tokens in the message
* answer\_tokens - Number of tokens in the answer
* total\_tokens - Total number of tokens in the message and answer
* error - Error information
* inputs - Input data
* outputs - Output data
* file\_list - List of processed files
* start\_time - Start time
* end\_time - End time
* message\_file\_data - File data associated with the message
* conversation\_mode - Conversation mode
* Metadata
* conversation\_id - Conversation ID
* ls\_provider - Model provider
* ls\_model\_name - Model ID
* status - Message status
* from\_end\_user\_id - ID of the sending user
* from\_account\_id - ID of the sending account
* agent\_based - Whether the message is agent-based
* workflow\_run\_id - Workflow run ID
* from\_source - Message source
#### **Moderation Trace Information**
**Used to track conversation moderation**
| Moderation    | LangSmith Tool     |
| ------------- | ------------------ |
| user\_id      | placed in metadata |
| "moderation"  | name               |
| start\_time   | start\_time        |
| end\_time     | end\_time          |
| inputs        | inputs             |
| outputs       | outputs            |
| metadata      | extra              |
| \[moderation] | tags               |
| message\_id   | parent\_run\_id    |
**Moderation Trace Info**
* message\_id - Message ID
* user\_id - User ID
* workflow\_app\_log\_id - Workflow application log ID
* inputs - Moderation input data
* message\_data - Message data
* flagged - Whether the content is flagged for attention
* action - Specific actions taken
* preset\_response - Preset response
* start\_time - Moderation start time
* end\_time - Moderation end time
* Metadata
* message\_id - Message ID
* action - Specific actions taken
* preset\_response - Preset response
#### **Suggested Question Trace Information**
**Used to track suggested questions**
| Suggested Question      | LangSmith LLM      |
| ----------------------- | ------------------ |
| user\_id                | placed in metadata |
| suggested\_question     | name               |
| start\_time             | start\_time        |
| end\_time               | end\_time          |
| inputs                  | inputs             |
| outputs                 | outputs            |
| metadata                | extra              |
| \[suggested\_question]  | tags               |
| message\_id             | parent\_run\_id    |
**Suggested Question Trace Info**
* message\_id - Message ID
* message\_data - Message data
* inputs - Input content
* outputs - Output content
* start\_time - Start time
* end\_time - End time
* total\_tokens - Number of tokens
* status - Message status
* error - Error information
* from\_account\_id - ID of the sending account
* agent\_based - Whether the message is agent-based
* from\_source - Message source
* model\_provider - Model provider
* model\_id - Model ID
* suggested\_question - Suggested question
* level - Status level
* status\_message - Status message
* Metadata
* message\_id - Message ID
* ls\_provider - Model provider
* ls\_model\_name - Model ID
* status - Message status
* from\_end\_user\_id - ID of the sending user
* from\_account\_id - ID of the sending account
* workflow\_run\_id - Workflow run ID
* from\_source - Message source
#### **Dataset Retrieval Trace Information**
**Used to track knowledge base retrieval**
| Dataset Retrieval     | LangSmith Retriever |
| --------------------- | ------------------- |
| user\_id              | placed in metadata  |
| dataset\_retrieval    | name                |
| start\_time           | start\_time         |
| end\_time             | end\_time           |
| inputs                | inputs              |
| outputs               | outputs             |
| metadata              | extra               |
| \[dataset\_retrieval] | tags                |
| message\_id           | parent\_run\_id     |
**Dataset Retrieval Trace Info**
* message\_id - Message ID
* inputs - Input content
* documents - Document data
* start\_time - Start time
* end\_time - End time
* message\_data - Message data
* Metadata
* message\_id - Message ID
* ls\_provider - Model provider
* ls\_model\_name - Model ID
* status - Message status
* from\_end\_user\_id - ID of the sending user
* from\_account\_id - ID of the sending account
* agent\_based - Whether the message is agent-based
* workflow\_run\_id - Workflow run ID
* from\_source - Message source
#### **Tool Trace Information**
**Used to track tool invocation**
| Tool                  | LangSmith Tool     |
| --------------------- | ------------------ |
| user\_id              | placed in metadata |
| tool\_name            | name               |
| start\_time           | start\_time        |
| end\_time             | end\_time          |
| inputs                | inputs             |
| outputs               | outputs            |
| metadata              | extra              |
| \["tool", tool\_name] | tags               |
| message\_id           | parent\_run\_id    |
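Per this mapping, a tool invocation surfaces in LangSmith as a Tool run named after the tool and tagged with both the literal `tool` tag and the tool's name. A minimal sketch with a hypothetical tool name and values:

```python
tool_name = "web_search"  # hypothetical Dify tool

tool_run = {
    "name": tool_name,            # tool_name -> name
    "tags": ["tool", tool_name],  # ["tool", tool_name] -> tags
    "parent_run_id": "msg-1",     # message_id -> parent_run_id (nests under the chat run)
}

# The double tag lets you filter either all tool runs or one specific tool:
all_tools = "tool" in tool_run["tags"]
just_search = tool_name in tool_run["tags"]
```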
**Tool Trace Info**
* message\_id - Message ID
* tool\_name - Tool name
* start\_time - Start time
* end\_time - End time
* tool\_inputs - Tool inputs
* tool\_outputs - Tool outputs
* message\_data - Message data
* error - Error information, if any
* inputs - Inputs for the message
* outputs - Outputs of the message
* tool\_config - Tool configuration
* time\_cost - Time cost
* tool\_parameters - Tool parameters
* file\_url - URL of the associated file
* Metadata
* message\_id - Message ID
* tool\_name - Tool name
* tool\_inputs - Tool inputs
* tool\_outputs - Tool outputs
* tool\_config - Tool configuration
* time\_cost - Time cost
* error - Error information, if any
* tool\_parameters - Tool parameters
* message\_file\_id - Message file ID
* created\_by\_role - Role of the creator
* created\_user\_id - User ID of the creator
#### **Generate Name Trace Information**
**Used to track conversation title generation**
| Generate Name    | LangSmith Tool     |
| ---------------- | ------------------ |
| user\_id         | placed in metadata |
| generate\_name   | name               |
| start\_time      | start\_time        |
| end\_time        | end\_time          |
| inputs           | inputs             |
| outputs          | outputs            |
| metadata         | extra              |
| \[generate\_name] | tags              |
**Generate Name Trace Info**
* conversation\_id - Conversation ID
* inputs - Input data
* outputs - Generated conversation name
* start\_time - Start time
* end\_time - End time
* tenant\_id - Tenant ID
* Metadata
* conversation\_id - Conversation ID
* tenant\_id - Tenant ID
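Across the tables above, child spans (moderation, dataset retrieval, tool calls) link back to their chat message via `message_id` mapped to `parent_run_id`; this is what nests them under the LLM run in the LangSmith UI. A minimal sketch of that hierarchy with hypothetical IDs:

```python
chat_run = {"id": "msg-1", "name": "llm"}

# Children reference the chat run's id as their parent_run_id:
child_runs = [
    {"id": "mod-1", "name": "moderation", "parent_run_id": chat_run["id"]},
    {"id": "ret-1", "name": "dataset_retrieval", "parent_run_id": chat_run["id"]},
    {"id": "tool-1", "name": "web_search", "parent_run_id": chat_run["id"]},
]

parents = {run["parent_run_id"] for run in child_runs}  # all resolve to "msg-1"
```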
# Integrate with Opik
Source: https://docs.dify.ai/en/use-dify/monitor/integrations/integrate-opik
### What is Opik
Opik is an open-source platform designed for evaluating, testing, and monitoring large language model (LLM) applications. Developed by Comet, it aims to facilitate more intuitive collaboration, testing, and monitoring of LLM-based applications.
For more details, please refer to [Opik](https://www.comet.com/site/products/opik/).
***
### How to Configure Opik
#### 1. Register/Login to [Opik](https://www.comet.com/signup?from=llm)
#### 2. Get your Opik API Key
Retrieve your Opik API key from the user menu at the top right. Click **API Key**, then click the key to copy it:

#### 3. Integrating Opik with Dify
Configure Opik in the Dify application. Open the application you want to monitor, click **Monitoring** in the side menu, and select **Tracing app performance** on the page.

After clicking configure, paste the **API Key** and **project name** created in Opik into the configuration and save.

Once successfully saved, you can view the monitoring status on the current page.
### Viewing Monitoring Data in Opik
Once configured, you can debug or use the Dify application as usual. All usage history can be monitored in Opik.

When you switch to Opik, you can view detailed operation logs of Dify applications in the dashboard.

Detailed LLM operation logs through Opik will help you optimize the performance of your Dify application.

### Monitoring Data List
#### **Workflow/Chatflow Trace Information**
**Used to track workflows and chatflows**
| Workflow                                 | Opik Trace                   |
| ---------------------------------------- | ---------------------------- |
| workflow\_app\_log\_id/workflow\_run\_id | id                           |
| user\_session\_id                        | placed in metadata           |
| "workflow"                               | name                         |
| start\_time                              | start\_time                  |
| end\_time                                | end\_time                    |
| inputs                                   | inputs                       |
| outputs                                  | outputs                      |
| Model token consumption                  | usage\_metadata              |
| metadata                                 | metadata                     |
| error                                    | error                        |
| \[workflow]                              | tags                         |
| "conversation\_id/none for workflow"     | conversation\_id in metadata |
**Workflow Trace Info**
* workflow\_id - Unique identifier of the workflow
* conversation\_id - Conversation ID
* workflow\_run\_id - ID of the current run
* tenant\_id - Tenant ID
* elapsed\_time - Time taken for the current run
* status - Run status
* version - Workflow version
* total\_tokens - Total tokens used in the current run
* file\_list - List of processed files
* triggered\_from - Source that triggered the current run
* workflow\_run\_inputs - Input data for the current run
* workflow\_run\_outputs - Output data for the current run
* error - Errors encountered during the current run
* query - Query used during the run
* workflow\_app\_log\_id - Workflow application log ID
* message\_id - Associated message ID
* start\_time - Start time of the run
* end\_time - End time of the run
* workflow node executions - Information about workflow node executions
* Metadata
* workflow\_id - Unique identifier of the workflow
* conversation\_id - Conversation ID
* workflow\_run\_id - ID of the current run
* tenant\_id - Tenant ID
* elapsed\_time - Time taken for the current run
* status - Run status
* version - Workflow version
* total\_tokens - Total tokens used in the current run
* file\_list - List of processed files
* triggered\_from - Source that triggered the current run
#### **Message Trace Information**
**Used to track LLM-related conversations**
| Chat                             | Opik LLM                     |
| -------------------------------- | ---------------------------- |
| message\_id                      | id                           |
| user\_session\_id                | placed in metadata           |
| "llm"                            | name                         |
| start\_time                      | start\_time                  |
| end\_time                        | end\_time                    |
| inputs                           | inputs                       |
| outputs                          | outputs                      |
| Model token consumption          | usage\_metadata              |
| metadata                         | metadata                     |
| \["message", conversation\_mode] | tags                         |
| conversation\_id                 | conversation\_id in metadata |
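One difference from the LangSmith mapping worth noting: Opik keeps Dify's metadata under a `metadata` key rather than `extra`. A sketch with hypothetical values:

```python
dify_message = {
    "message_id": "msg-1",
    "metadata": {"conversation_id": "conv-9", "status": "normal"},
    "conversation_mode": "chat",
}

opik_llm_span = {
    "id": dify_message["message_id"],                        # message_id -> id
    "name": "llm",                                           # fixed span name per the table
    "metadata": dify_message["metadata"],                    # metadata -> metadata (not `extra`)
    "tags": ["message", dify_message["conversation_mode"]],  # ["message", conversation_mode] -> tags
}
```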
**Message Trace Info**
* message\_id - Message ID
* message\_data - Message data
* user\_session\_id - User session ID
* conversation\_model - Conversation mode
* message\_tokens - Number of tokens in the message
* answer\_tokens - Number of tokens in the answer
* total\_tokens - Total number of tokens in the message and answer
* error - Error information
* inputs - Input data
* outputs - Output data
* file\_list - List of processed files
* start\_time - Start time
* end\_time - End time
* message\_file\_data - File data associated with the message
* conversation\_mode - Conversation mode
* Metadata
* conversation\_id - Conversation ID
* ls\_provider - Model provider
* ls\_model\_name - Model ID
* status - Message status
* from\_end\_user\_id - ID of the sending user
* from\_account\_id - ID of the sending account
* agent\_based - Whether the message is agent-based
* workflow\_run\_id - Workflow run ID
* from\_source - Message source
#### **Moderation Trace Information**
**Used to track conversation moderation**
| Moderation     | Opik Tool          |
| -------------- | ------------------ |
| user\_id       | placed in metadata |
| "moderation"   | name               |
| start\_time    | start\_time        |
| end\_time      | end\_time          |
| inputs         | inputs             |
| outputs        | outputs            |
| metadata       | metadata           |
| \["moderation"] | tags              |
**Moderation Trace Info**
* message\_id - Message ID
* user\_id - User ID
* workflow\_app\_log\_id - Workflow application log ID
* inputs - Moderation input data
* message\_data - Message data
* flagged - Whether the content is flagged for attention
* action - Specific actions taken
* preset\_response - Preset response
* start\_time - Moderation start time
* end\_time - Moderation end time
* Metadata
* message\_id - Message ID
* action - Specific actions taken
* preset\_response - Preset response
#### **Suggested Question Trace Information**
**Used to track suggested questions**
| Suggested Question       | Opik LLM           |
| ------------------------ | ------------------ |
| user\_id                 | placed in metadata |
| "suggested\_question"    | name               |
| start\_time              | start\_time        |
| end\_time                | end\_time          |
| inputs                   | inputs             |
| outputs                  | outputs            |
| metadata                 | metadata           |
| \["suggested\_question"] | tags               |
**Suggested Question Trace Info**
* message\_id - Message ID
* message\_data - Message data
* inputs - Input content
* outputs - Output content
* start\_time - Start time
* end\_time - End time
* total\_tokens - Number of tokens
* status - Message status
* error - Error information
* from\_account\_id - ID of the sending account
* agent\_based - Whether the message is agent-based
* from\_source - Message source
* model\_provider - Model provider
* model\_id - Model ID
* suggested\_question - Suggested question
* level - Status level
* status\_message - Status message
* Metadata
* message\_id - Message ID
* ls\_provider - Model provider
* ls\_model\_name - Model ID
* status - Message status
* from\_end\_user\_id - ID of the sending user
* from\_account\_id - ID of the sending account
* workflow\_run\_id - Workflow run ID
* from\_source - Message source
#### **Dataset Retrieval Trace Information**
**Used to track knowledge base retrieval**
| Dataset Retrieval       | Opik Retriever     |
| ----------------------- | ------------------ |
| user\_id                | placed in metadata |
| "dataset\_retrieval"    | name               |
| start\_time             | start\_time        |
| end\_time               | end\_time          |
| inputs                  | inputs             |
| outputs                 | outputs            |
| metadata                | metadata           |
| \["dataset\_retrieval"] | tags               |
| message\_id             | parent\_run\_id    |
**Dataset Retrieval Trace Info**
* message\_id - Message ID
* inputs - Input content
* documents - Document data
* start\_time - Start time
* end\_time - End time
* message\_data - Message data
* Metadata
* message\_id - Message ID
* ls\_provider - Model provider
* ls\_model\_name - Model ID
* status - Message status
* from\_end\_user\_id - ID of the sending user
* from\_account\_id - ID of the sending account
* agent\_based - Whether the message is agent-based
* workflow\_run\_id - Workflow run ID
* from\_source - Message source
#### **Tool Trace Information**
**Used to track tool invocation**
| Tool                  | Opik Tool          |
| --------------------- | ------------------ |
| user\_id              | placed in metadata |
| tool\_name            | name               |
| start\_time           | start\_time        |
| end\_time             | end\_time          |
| inputs                | inputs             |
| outputs               | outputs            |
| metadata              | metadata           |
| \["tool", tool\_name] | tags               |
**Tool Trace Info**
* message\_id - Message ID
* tool\_name - Tool name
* start\_time - Start time
* end\_time - End time
* tool\_inputs - Tool inputs
* tool\_outputs - Tool outputs
* message\_data - Message data
* error - Error information, if any
* inputs - Inputs for the message
* outputs - Outputs of the message
* tool\_config - Tool configuration
* time\_cost - Time cost
* tool\_parameters - Tool parameters
* file\_url - URL of the associated file
* Metadata
* message\_id - Message ID
* tool\_name - Tool name
* tool\_inputs - Tool inputs
* tool\_outputs - Tool outputs
* tool\_config - Tool configuration
* time\_cost - Time cost
* error - Error information, if any
* tool\_parameters - Tool parameters
* message\_file\_id - Message file ID
* created\_by\_role - Role of the creator
* created\_user\_id - User ID of the creator
#### **Generate Name Trace Information**
**Used to track conversation title generation**
| Generate Name                 | Opik Tool          |
| ----------------------------- | ------------------ |
| user\_id                      | placed in metadata |
| "generate\_conversation\_name" | name              |
| start\_time                   | start\_time        |
| end\_time                     | end\_time          |
| inputs                        | inputs             |
| outputs                       | outputs            |
| metadata                      | metadata           |
| \["generate\_name"]           | tags               |
**Generate Name Trace Info**
* conversation\_id - Conversation ID
* inputs - Input data
* outputs - Generated conversation name
* start\_time - Start time
* end\_time - End time
* tenant\_id - Tenant ID
* Metadata
* conversation\_id - Conversation ID
* tenant\_id - Tenant ID
# Integrate with Phoenix
Source: https://docs.dify.ai/en/use-dify/monitor/integrations/integrate-phoenix
### What is Phoenix
Phoenix is an open-source, OpenTelemetry-based observability, evaluation, prompt engineering, and experimentation platform for your LLM workflows and agents.
For more details, please refer to [Phoenix](https://phoenix.arize.com).
***
### How to Configure Phoenix
#### 1. Register/Login to [Phoenix](https://app.arize.com/auth/phoenix/signup)
#### 2. Get your Phoenix API Key
Retrieve your Phoenix API key from the user menu at the top right. Click **API Key**, then click the key to copy it:

#### 3. Integrating Phoenix with Dify
Configure Phoenix in the Dify application. Open the application you want to monitor, click **Monitoring** in the side menu, and select **Tracing app performance** on the page.

After clicking configure, paste the **API Key** and **project name** created in Phoenix into the configuration and save.

Once successfully saved, you can view the monitoring status on the current page.

### How to Configure Phoenix Cloud
#### 1. Register/Login to [Phoenix Cloud](https://app.arize.com/auth/phoenix/signup)
#### 2. Create your Phoenix Space
You can create your Phoenix Space from the user menu at the top-right. Click on **Create Space**, then provide a unique URL identifier for your space:

Once successfully saved, you can view the space status on the overview page.

#### 3. Create your Phoenix API Key
After launching your space, you can create your Phoenix API Key from the **Settings** option in the user menu at the bottom-left. Click on **System Key**, then provide a name for your Phoenix API Key:

#### 4. Integrating Phoenix Cloud with Dify
Configure Phoenix in the Dify application. Open the application you want to monitor, click **Monitoring** in the side menu, and select **Tracing app performance** on the page.

After clicking configure, paste the **API Key**, **project name**, and **Space Hostname** from Phoenix Cloud into the configuration and save.

Once successfully saved, you can view the monitoring status on the current page.

### Monitoring Data List
#### **Workflow/Chatflow Trace Information**
**Used to track workflows and chatflows**
| Workflow                                 | Phoenix Trace                |
| ---------------------------------------- | ---------------------------- |
| workflow\_app\_log\_id/workflow\_run\_id | id                           |
| user\_session\_id                        | placed in metadata           |
| "workflow"                               | name                         |
| start\_time                              | start\_time                  |
| end\_time                                | end\_time                    |
| inputs                                   | inputs                       |
| outputs                                  | outputs                      |
| Model token consumption                  | usage\_metadata              |
| metadata                                 | metadata                     |
| error                                    | error                        |
| \[workflow]                              | tags                         |
| "conversation\_id/none for workflow"     | conversation\_id in metadata |
**Workflow Trace Info**
* workflow\_id - Unique identifier of the workflow
* conversation\_id - Conversation ID
* workflow\_run\_id - ID of the current run
* tenant\_id - Tenant ID
* elapsed\_time - Time taken for the current run
* status - Run status
* version - Workflow version
* total\_tokens - Total tokens used in the current run
* file\_list - List of processed files
* triggered\_from - Source that triggered the current run
* workflow\_run\_inputs - Input data for the current run
* workflow\_run\_outputs - Output data for the current run
* error - Errors encountered during the current run
* query - Query used during the run
* workflow\_app\_log\_id - Workflow application log ID
* message\_id - Associated message ID
* start\_time - Start time of the run
* end\_time - End time of the run
* workflow node executions - Information about workflow node executions
* Metadata
* workflow\_id - Unique identifier of the workflow
* conversation\_id - Conversation ID
* workflow\_run\_id - ID of the current run
* tenant\_id - Tenant ID
* elapsed\_time - Time taken for the current run
* status - Run status
* version - Workflow version
* total\_tokens - Total tokens used in the current run
* file\_list - List of processed files
* triggered\_from - Source that triggered the current run
#### **Message Trace Information**
**Used to track LLM-related conversations**
| Chat                             | Phoenix LLM                  |
| -------------------------------- | ---------------------------- |
| message\_id                      | id                           |
| user\_session\_id                | placed in metadata           |
| "llm"                            | name                         |
| start\_time                      | start\_time                  |
| end\_time                        | end\_time                    |
| inputs                           | inputs                       |
| outputs                          | outputs                      |
| Model token consumption          | usage\_metadata              |
| metadata                         | metadata                     |
| \["message", conversation\_mode] | tags                         |
| conversation\_id                 | conversation\_id in metadata |
**Message Trace Info**
* message\_id - Message ID
* message\_data - Message data
* user\_session\_id - User session ID
* conversation\_model - Conversation mode
* message\_tokens - Number of tokens in the message
* answer\_tokens - Number of tokens in the answer
* total\_tokens - Total number of tokens in the message and answer
* error - Error information
* inputs - Input data
* outputs - Output data
* file\_list - List of processed files
* start\_time - Start time
* end\_time - End time
* message\_file\_data - File data associated with the message
* conversation\_mode - Conversation mode
* Metadata
* conversation\_id - Conversation ID
* ls\_provider - Model provider
* ls\_model\_name - Model ID
* status - Message status
* from\_end\_user\_id - ID of the sending user
* from\_account\_id - ID of the sending account
* agent\_based - Whether the message is agent-based
* workflow\_run\_id - Workflow run ID
* from\_source - Message source
#### **Moderation Trace Information**
**Used to track conversation moderation**
| Moderation      | Phoenix Tool       |
| --------------- | ------------------ |
| user\_id        | placed in metadata |
| "moderation"    | name               |
| start\_time     | start\_time        |
| end\_time       | end\_time          |
| inputs          | inputs             |
| outputs         | outputs            |
| metadata        | metadata           |
| \["moderation"] | tags               |
**Moderation Trace Info**
* message\_id - Message ID
* user\_id - User ID
* workflow\_app\_log\_id - Workflow application log ID
* inputs - Moderation input data
* message\_data - Message data
* flagged - Whether the content is flagged for attention
* action - Specific actions taken
* preset\_response - Preset response
* start\_time - Moderation start time
* end\_time - Moderation end time
* Metadata
* message\_id - Message ID
* action - Specific actions taken
* preset\_response - Preset response
#### **Suggested Question Trace Information**
**Used to track suggested questions**
| Suggested Question       | Phoenix LLM        |
| ------------------------ | ------------------ |
| user\_id                 | placed in metadata |
| "suggested\_question"    | name               |
| start\_time              | start\_time        |
| end\_time                | end\_time          |
| inputs                   | inputs             |
| outputs                  | outputs            |
| metadata                 | metadata           |
| \["suggested\_question"] | tags               |
**Suggested Question Trace Info**
* message\_id - Message ID
* message\_data - Message data
* inputs - Input content
* outputs - Output content
* start\_time - Start time
* end\_time - End time
* total\_tokens - Number of tokens
* status - Message status
* error - Error information
* from\_account\_id - ID of the sending account
* agent\_based - Whether the message is agent-based
* from\_source - Message source
* model\_provider - Model provider
* model\_id - Model ID
* suggested\_question - Suggested question
* level - Status level
* status\_message - Status message
* Metadata
* message\_id - Message ID
* ls\_provider - Model provider
* ls\_model\_name - Model ID
* status - Message status
* from\_end\_user\_id - ID of the sending user
* from\_account\_id - ID of the sending account
* workflow\_run\_id - Workflow run ID
* from\_source - Message source
#### **Dataset Retrieval Trace Information**
**Used to track knowledge base retrieval**
| Dataset Retrieval       | Phoenix Retriever  |
| ----------------------- | ------------------ |
| user\_id                | placed in metadata |
| "dataset\_retrieval"    | name               |
| start\_time             | start\_time        |
| end\_time               | end\_time          |
| inputs                  | inputs             |
| outputs                 | outputs            |
| metadata                | metadata           |
| \["dataset\_retrieval"] | tags               |
| message\_id             | parent\_run\_id    |
**Dataset Retrieval Trace Info**
* message\_id - Message ID
* inputs - Input content
* documents - Document data
* start\_time - Start time
* end\_time - End time
* message\_data - Message data
* Metadata
* message\_id - Message ID
* ls\_provider - Model provider
* ls\_model\_name - Model ID
* status - Message status
* from\_end\_user\_id - ID of the sending user
* from\_account\_id - ID of the sending account
* agent\_based - Whether the message is agent-based
* workflow\_run\_id - Workflow run ID
* from\_source - Message source
#### **Tool Trace Information**
**Used to track tool invocation**
| Tool                  | Phoenix Tool       |
| --------------------- | ------------------ |
| user\_id              | placed in metadata |
| tool\_name            | name               |
| start\_time           | start\_time        |
| end\_time             | end\_time          |
| inputs                | inputs             |
| outputs               | outputs            |
| metadata              | metadata           |
| \["tool", tool\_name] | tags               |
**Tool Trace Info**
* message\_id - Message ID
* tool\_name - Tool name
* start\_time - Start time
* end\_time - End time
* tool\_inputs - Tool inputs
* tool\_outputs - Tool outputs
* message\_data - Message data
* error - Error information, if any
* inputs - Inputs for the message
* outputs - Outputs of the message
* tool\_config - Tool configuration
* time\_cost - Time cost
* tool\_parameters - Tool parameters
* file\_url - URL of the associated file
* Metadata
* message\_id - Message ID
* tool\_name - Tool name
* tool\_inputs - Tool inputs
* tool\_outputs - Tool outputs
* tool\_config - Tool configuration
* time\_cost - Time cost
* error - Error information, if any
* tool\_parameters - Tool parameters
* message\_file\_id - Message file ID
* created\_by\_role - Role of the creator
* created\_user\_id - User ID of the creator
#### **Generate Name Trace Information**
**Used to track conversation title generation**
| Generate Name                 | Phoenix Tool       |
| ----------------------------- | ------------------ |
| user\_id                      | placed in metadata |
| "generate\_conversation\_name" | name              |
| start\_time                   | start\_time        |
| end\_time                     | end\_time          |
| inputs                        | inputs             |
| outputs                       | outputs            |
| metadata                      | metadata           |
| \["generate\_name"]           | tags               |
**Generate Name Trace Info**
* conversation\_id - Conversation ID
* inputs - Input data
* outputs - Generated conversation name
* start\_time - Start time
* end\_time - End time
* tenant\_id - Tenant ID
* Metadata
* conversation\_id - Conversation ID
* tenant\_id - Tenant ID
# Integrate with W&B Weave
Source: https://docs.dify.ai/en/use-dify/monitor/integrations/integrate-weave
Dify Cloud | Community version ≥ v1.3.1
### What is W\&B Weave
Weights & Biases (W\&B) Weave is a framework for tracking, experimenting with, evaluating, deploying, and improving LLM-based applications. Designed for flexibility and scalability, Weave supports every stage of your LLM application development workflow.
For more details, please refer to [Weave](https://docs.wandb.ai/weave).
***
### How to Configure Weave
#### 1. Register/Login
Register or log in to [W\&B Weave](https://wandb.ai/signup), then copy your API key from [here](https://wandb.ai/authorize).
#### 2. Integrating W\&B Weave with Dify
Configure Weave in the Dify application. Open the application you want to monitor, click **Monitoring** in the side menu, and select **Tracing app performance** on the page.

After clicking configure, paste the **API Key** and **project name** into the configuration, optionally specify the **W\&B entity** (defaults to your username), and save.

Once successfully saved, you can view the monitoring status on the current page.

### Viewing Monitoring Data in Weave
Once configured, the debug or production data from applications within Dify can be monitored in Weave.

When you switch to Weave, you can view detailed operation logs of Dify applications in the dashboard.

Detailed LLM operation logs through Weave will help you optimize the performance of your Dify application.
### Monitoring Data List
#### **Workflow/Chatflow Trace Information**
**Used to track workflows and chatflows**
| Workflow | Weave Trace |
| ---------------------------------------- | ---------------------------- |
| workflow\_app\_log\_id/workflow\_run\_id | id |
| user\_session\_id | placed in metadata |
| workflow\_\{id} | name |
| start\_time | start\_time |
| end\_time | end\_time |
| inputs | inputs |
| outputs | outputs |
| Model token consumption | usage\_metadata |
| metadata | extra |
| error | error |
| workflow | tags |
| "conversation\_id/none for workflow" | conversation\_id in metadata |
| conversion\_id | parent\_run\_id |
**Workflow Trace Info**
* workflow\_id - Unique identifier of the workflow
* conversation\_id - Conversation ID
* workflow\_run\_id - ID of the current run
* tenant\_id - Tenant ID
* elapsed\_time - Time taken for the current run
* status - Run status
* version - Workflow version
* total\_tokens - Total tokens used in the current run
* file\_list - List of processed files
* triggered\_from - Source that triggered the current run
* workflow\_run\_inputs - Input data for the current run
* workflow\_run\_outputs - Output data for the current run
* error - Errors encountered during the current run
* query - Query used during the run
* workflow\_app\_log\_id - Workflow application log ID
* message\_id - Associated message ID
* start\_time - Start time of the run
* end\_time - End time of the run
* workflow node executions - Information about workflow node executions
* Metadata
* workflow\_id - Unique identifier of the workflow
* conversation\_id - Conversation ID
* workflow\_run\_id - ID of the current run
* tenant\_id - Tenant ID
* elapsed\_time - Time taken for the current run
* status - Run status
* version - Workflow version
* total\_tokens - Total tokens used in the current run
* file\_list - List of processed files
* triggered\_from - Source that triggered the current run
#### **Message Trace Information**
**Used to track LLM-related conversations**
| Chat | Weave Trace |
| ----------------------------- | ---------------------------- |
| message\_id | id |
| user\_session\_id | placed in metadata |
| "message\_\{id}" | name |
| start\_time | start\_time |
| end\_time | end\_time |
| inputs | inputs |
| outputs | outputs |
| Model token consumption | usage\_metadata |
| metadata | extra |
| error | error |
| "message", conversation\_mode | tags |
| conversation\_id | conversation\_id in metadata |
| conversion\_id | parent\_run\_id |
**Message Trace Info**
* message\_id - Message ID
* message\_data - Message data
* user\_session\_id - User session ID
* conversation\_model - Conversation mode
* message\_tokens - Number of tokens in the message
* answer\_tokens - Number of tokens in the answer
* total\_tokens - Total number of tokens in the message and answer
* error - Error information
* inputs - Input data
* outputs - Output data
* file\_list - List of processed files
* start\_time - Start time
* end\_time - End time
* message\_file\_data - File data associated with the message
* conversation\_mode - Conversation mode
* Metadata
* conversation\_id - Conversation ID
* ls\_provider - Model provider
* ls\_model\_name - Model ID
* status - Message status
* from\_end\_user\_id - ID of the sending user
* from\_account\_id - ID of the sending account
* agent\_based - Whether the message is agent-based
* workflow\_run\_id - Workflow run ID
* from\_source - Message source
#### **Moderation Trace Information**
**Used to track conversation moderation**
| Moderation | Weave Trace |
| ------------ | ------------------ |
| user\_id | placed in metadata |
| "moderation" | name |
| start\_time | start\_time |
| end\_time | end\_time |
| inputs | inputs |
| outputs | outputs |
| metadata | extra |
| moderation | tags |
| message\_id | parent\_run\_id |
**Moderation Trace Info**
* message\_id - Message ID
* user\_id - User ID
* workflow\_app\_log\_id - Workflow application log ID
* inputs - Moderation input data
* message\_data - Message data
* flagged - Whether the content is flagged for attention
* action - Specific actions taken
* preset\_response - Preset response
* start\_time - Moderation start time
* end\_time - Moderation end time
* Metadata
* message\_id - Message ID
* action - Specific actions taken
* preset\_response - Preset response
#### **Suggested Question Trace Information**
**Used to track suggested questions**
| Suggested Question | Weave Trace |
| ------------------- | ------------------ |
| user\_id | placed in metadata |
| suggested\_question | name |
| start\_time | start\_time |
| end\_time | end\_time |
| inputs | inputs |
| outputs | outputs |
| metadata | extra |
| suggested\_question | tags |
| message\_id | parent\_run\_id |
**Suggested Question Trace Info**
* message\_id - Message ID
* message\_data - Message data
* inputs - Input content
* outputs - Output content
* start\_time - Start time
* end\_time - End time
* total\_tokens - Number of tokens
* status - Message status
* error - Error information
* from\_account\_id - ID of the sending account
* agent\_based - Whether the message is agent-based
* from\_source - Message source
* model\_provider - Model provider
* model\_id - Model ID
* suggested\_question - Suggested question
* level - Status level
* status\_message - Status message
* Metadata
* message\_id - Message ID
* ls\_provider - Model provider
* ls\_model\_name - Model ID
* status - Message status
* from\_end\_user\_id - ID of the sending user
* from\_account\_id - ID of the sending account
* workflow\_run\_id - Workflow run ID
* from\_source - Message source
#### **Dataset Retrieval Trace Information**
**Used to track knowledge base retrieval**
| Dataset Retrieval | Weave Trace |
| ------------------ | ------------------ |
| user\_id | placed in metadata |
| dataset\_retrieval | name |
| start\_time | start\_time |
| end\_time | end\_time |
| inputs | inputs |
| outputs | outputs |
| metadata | extra |
| dataset\_retrieval | tags |
| message\_id | parent\_run\_id |
**Dataset Retrieval Trace Info**
* message\_id - Message ID
* inputs - Input content
* documents - Document data
* start\_time - Start time
* end\_time - End time
* message\_data - Message data
* Metadata
* message\_id - Message ID
* ls\_provider - Model provider
* ls\_model\_name - Model ID
* status - Message status
* from\_end\_user\_id - ID of the sending user
* from\_account\_id - ID of the sending account
* agent\_based - Whether the message is agent-based
* workflow\_run\_id - Workflow run ID
* from\_source - Message source
#### **Tool Trace Information**
**Used to track tool invocation**
| Tool | Weave Trace |
| ------------------ | ------------------ |
| user\_id | placed in metadata |
| tool\_name | name |
| start\_time | start\_time |
| end\_time | end\_time |
| inputs | inputs |
| outputs | outputs |
| metadata | extra |
| "tool", tool\_name | tags |
| message\_id | parent\_run\_id |
**Tool Trace Info**
* message\_id - Message ID
* tool\_name - Tool name
* start\_time - Start time
* end\_time - End time
* tool\_inputs - Tool inputs
* tool\_outputs - Tool outputs
* message\_data - Message data
* error - Error information, if any
* inputs - Inputs for the message
* outputs - Outputs of the message
* tool\_config - Tool configuration
* time\_cost - Time cost
* tool\_parameters - Tool parameters
* file\_url - URL of the associated file
* Metadata
* message\_id - Message ID
* tool\_name - Tool name
* tool\_inputs - Tool inputs
* tool\_outputs - Tool outputs
* tool\_config - Tool configuration
* time\_cost - Time cost
* error - Error information, if any
* tool\_parameters - Tool parameters
* message\_file\_id - Message file ID
* created\_by\_role - Role of the creator
* created\_user\_id - User ID of the creator
#### **Generate Name Trace Information**
**Used to track conversation title generation**
| Generate Name | Weave Trace |
| -------------- | ------------------ |
| user\_id | placed in metadata |
| generate\_name | name |
| start\_time | start\_time |
| end\_time | end\_time |
| inputs | inputs |
| outputs | outputs |
| metadata | extra |
| generate\_name | tags |
**Generate Name Trace Info**
* conversation\_id - Conversation ID
* inputs - Input data
* outputs - Generated conversation name
* start\_time - Start time
* end\_time - End time
* tenant\_id - Tenant ID
* Metadata
* conversation\_id - Conversation ID
* tenant\_id - Tenant ID
# Logs
Source: https://docs.dify.ai/en/use-dify/monitor/logs
Monitor real-time conversations, debug issues, and collect user feedback
Conversation logs provide detailed visibility into every interaction with your AI application. Use them to debug specific issues, understand user behavior patterns, and collect feedback for continuous improvement.
## What Gets Logged
**All User Interactions**
Every conversation through your web app or API is logged with complete input/output history, timing data, and system metadata.
**User Feedback**
Thumbs up/down ratings and user comments are captured alongside the conversations they reference.
**System Context**
Model used, token consumption, response times, and any errors or warnings during processing.
**Exclusions:** Debugging sessions and prompt testing are not included in logs.
## Using the Logs Console
Access logs from your application's navigation menu. The interface shows:
* **Conversation Timeline:** Chronological list of user interactions
* **Message Details:** Full conversation context with AI responses
* **Performance Data:** Response times and token usage per interaction
* **User Feedback:** Ratings and comments from users and team members
## Debugging with Logs
**Failed Interactions**
Quickly identify conversations where the AI provided poor responses, failed to understand user intent, or encountered errors.
**Performance Issues**
Spot slow responses, high token usage, or system errors that affect user experience.
**User Journey Analysis**
Follow individual users through multiple conversations to understand usage patterns and pain points.
## Feedback Collection
**User Ratings**
Users can provide thumbs up/down feedback on AI responses. Track satisfaction trends over time.
**Team Annotations**
Team members can add internal notes and improved responses directly in the log interface.
**Feedback Analysis**
Identify common complaint patterns, successful interaction types, and areas needing improvement.
## Log Retention
Ensure your application complies with local data privacy regulations. Publish a privacy policy and obtain user consent where required.
* **Sandbox**: Logs are retained for 30 days.
* **Professional & Team**: Unlimited log retention during active subscription.
* **Self-hosted**: Unlimited by default; configurable via environment variables `WORKFLOW_LOG_CLEANUP_ENABLED`, `WORKFLOW_LOG_RETENTION_DAYS`, and `WORKFLOW_LOG_CLEANUP_BATCH_SIZE`.
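For self-hosted deployments, these variables go in the environment file used by Docker Compose. The values below are illustrative examples, not recommended defaults:

```bash theme={null}
# Illustrative retention settings for a self-hosted deployment
WORKFLOW_LOG_CLEANUP_ENABLED=true     # enable the periodic cleanup job
WORKFLOW_LOG_RETENTION_DAYS=90        # delete workflow logs older than 90 days
WORKFLOW_LOG_CLEANUP_BATCH_SIZE=100   # rows deleted per cleanup batch
```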
## Improving Applications with Logs
**Pattern Recognition**
Look for recurring user questions that your application handles poorly. These indicate opportunities for prompt improvements or knowledge base updates.
**Response Quality**
Use feedback patterns to identify which types of responses work well and which need refinement.
**Performance Optimization**
Track response times and token usage to identify inefficient prompts or model configurations.
**Content Gaps**
Spot topics or question types where your application consistently struggles, indicating areas for knowledge base expansion.
## Privacy Considerations
Logs contain complete user conversations and may include sensitive information. Implement appropriate access controls and ensure compliance with applicable data protection regulations.
Consider configuring shorter retention periods for applications handling sensitive data or implement log anonymization where appropriate.
# Agent
Source: https://docs.dify.ai/en/use-dify/nodes/agent
Give LLMs autonomous control over tools for complex task execution
The Agent node gives your LLM autonomous control over tools, enabling it to iteratively decide which tools to use and when to use them. Instead of pre-planning every step, the Agent reasons through problems dynamically, calling tools as needed to complete complex tasks.

## Agent Strategies
Agent strategies define how your Agent thinks and acts. Choose the approach that best matches your model's capabilities and task requirements.

Uses the LLM's native function calling capabilities to directly pass tool definitions through the tools parameter. The LLM decides when and how to call tools using its built-in mechanism.
Best for models like GPT-4, Claude 3.5, and other models with robust function calling support.
Uses structured prompts that guide the LLM through explicit reasoning steps. Follows a **Thought → Action → Observation** cycle for transparent decision-making.
Works well with models that may not have native function calling or when you need explicit reasoning traces.
Install additional strategies from **Marketplace → Agent Strategies** or contribute custom strategies to the [community repository](https://github.com/langgenius/dify-plugins).

## Configuration
### Model Selection
Choose an LLM that supports your selected agent strategy. More capable models handle complex reasoning better but cost more per iteration. Ensure your model supports function calling if using that strategy.
### Tool Configuration
Configure the tools your Agent can access. Each tool requires:
**Authorization** - API keys and credentials for external services configured in your workspace
**Description** - Clear explanation of what the tool does and when to use it (this guides the Agent's decision-making)
**Parameters** - Required and optional inputs the tool accepts with proper validation
### Instructions and Context
Define the Agent's role, goals, and context using natural language instructions. Use Jinja2 syntax to reference variables from upstream workflow nodes.
**Query** specifies the user input or task the Agent should work on. This can be dynamic content from previous workflow nodes.

### Execution Controls
**Maximum Iterations** sets a safety limit to prevent infinite loops. Configure based on task complexity - simple tasks need 3-5 iterations, while complex research might require 10-15.
**Memory** controls how many previous messages the Agent remembers using TokenBufferMemory. Larger memory windows provide more context but increase token costs. This enables conversational continuity where users can reference previous actions.
### Tool Parameter Auto-Generation
Tools can have parameters configured as **auto-generated** or **manual input**. Auto-generated parameters (`auto: false`) are automatically populated by the Agent, while manual input parameters require explicit values that become part of the tool's permanent configuration.
## Output Variables
Agent nodes provide comprehensive output including:
**Final Answer** - The Agent's ultimate response to the query
**Tool Outputs** - Results from each tool invocation during execution
**Reasoning Trace** - Step-by-step decision process (especially detailed with ReAct strategy) available in the JSON output
**Iteration Count** - Number of reasoning cycles used
**Success Status** - Whether the Agent completed the task successfully
**Agent Logs** - Structured log events with metadata for debugging and monitoring tool invocations
## Use Cases
**Research and Analysis** - Agents can autonomously search multiple sources, synthesize information, and provide comprehensive answers.
**Troubleshooting** - Diagnostic tasks where the Agent needs to gather information, test hypotheses, and adapt its approach based on findings.
**Multi-step Data Processing** - Complex workflows where the next action depends on intermediate results.
**Dynamic API Integration** - Scenarios where the sequence of API calls depends on responses and conditions that can't be predetermined.
## Best Practices
**Clear Tool Descriptions** help the Agent understand when and how to use each tool effectively.
**Appropriate Iteration Limits** prevent runaway costs while allowing sufficient flexibility for complex tasks.
**Detailed Instructions** provide context about the Agent's role, goals, and any constraints or preferences.
**Memory Management** balance context retention with token efficiency based on your use case requirements.
# Answer
Source: https://docs.dify.ai/en/use-dify/nodes/answer
Define response content in chatflow applications
The Answer node defines what content gets delivered to users in chatflow applications. Use it to format responses, combine text with variables, and stream multimodal content including text, images, and files.
The Answer node is only available for Chatflow applications. Workflow applications use the End node instead.
## Content Configuration
The Answer node provides a flexible text editor where you can craft responses using fixed text, variables from previous nodes, or combinations of both.
Reference variables from any previous workflow node using the `{{variable_name}}` syntax. The editor supports rich content formatting and variable insertion to create dynamic, contextual responses.

## Multimodal Responses
Answer nodes support rich content delivery including text, images, and files in a single response stream.

**Text Content** can include variable substitution, markdown formatting, and dynamic content based on workflow processing results.
**Image Content** displays images generated by tools, uploaded by users, or processed by workflow nodes. Images stream alongside text for rich user experiences.
**File Content** delivers documents, spreadsheets, or other files generated or processed during the workflow execution.

## Streaming Behavior
Answer nodes stream content progressively based on variable availability. The node outputs all content up to the first unresolved variable, then waits for that variable to resolve before continuing.
**Variable Order Matters** - The sequence of variables in your Answer node determines streaming behavior, not the execution order of upstream nodes.
For example, with nodes executing as `Node A -> Node B -> Answer`:
* If the Answer contains `{{A}}` then `{{B}}`, it streams A's content immediately when available, then waits for B
* If the Answer contains `{{B}}` then `{{A}}`, it waits for B to complete before streaming any content
This streaming behavior enables responsive user experiences while maintaining content coherence.
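The rule above can be modeled with a small generator — a conceptual sketch, not Dify's implementation — that emits template text until it reaches the first unresolved variable and then stops to wait:

```python theme={null}
import re

def stream_answer(template: str, resolved: dict):
    """Yield answer chunks; stop at the first variable not yet resolved."""
    # Split the template while keeping {{name}} tokens as separate parts
    parts = re.split(r"(\{\{\w+\}\})", template)
    for part in parts:
        match = re.fullmatch(r"\{\{(\w+)\}\}", part)
        if match:
            name = match.group(1)
            if name not in resolved:
                return                # wait here until this variable resolves
            yield resolved[name]
        elif part:
            yield part
```

With `"A: {{A}} B: {{B}}"` and only `A` resolved, this yields `"A: "` and A's content, then pauses before anything that follows `{{B}}` — mirroring the variable-order behavior described above.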
## Multiple Answer Nodes
You can place multiple Answer nodes throughout your chatflow to deliver content at different stages of processing.
## Variable Integration
Answer nodes seamlessly integrate with outputs from all workflow node types. Common variable sources include:
**LLM Responses** - Display generated text, analysis results, or structured outputs from language models
**Knowledge Retrieval** - Show relevant information found in knowledge bases with automatic citation tracking
**Tool Results** - Present data from external APIs, calculations, or service integrations
**File Processing** - Display extracted text, analysis results, or processed document content
The variable system maintains type safety and automatically handles different content types for optimal display in the chat interface.
# Code
Source: https://docs.dify.ai/en/use-dify/nodes/code
Execute custom Python or JavaScript for data processing
The Code node executes custom Python or JavaScript to handle complex data transformations, calculations, and logic within your workflow. Use it when preset nodes aren't sufficient for your specific processing needs.

## Configuration
Define **Input Variables** to access data from other nodes in your workflow, then reference these variables in your code. Your function must return a dictionary containing the **Output Variables** you've declared.
```python theme={null}
def main(input_variable: str) -> dict:
    # Process the input
    result = input_variable.upper()
    return {
        'output_variable': result
    }
```
## Language Support
Choose between **Python** and **JavaScript** based on your needs and familiarity. Both languages run in secure sandboxes with access to common libraries for data processing.
Python includes standard libraries like `json`, `math`, `datetime`, and `re`. Ideal for data analysis, mathematical operations, and text processing.
```python theme={null}
def main(data: list) -> dict:
    import json
    import math
    average = sum(data) / len(data)
    return {'result': math.ceil(average)}
```
JavaScript provides standard built-in objects and methods. Good for JSON manipulation and string operations.
```javascript theme={null}
function main(data) {
    const processed = data.map(item => item.toUpperCase());
    return { result: processed };
}
```
## Error Handling and Retries
Configure automatic retry behavior for failed code executions and define fallback strategies when code encounters errors.

**Retry Settings** allow up to 10 automatic retries with configurable intervals (maximum 5000ms). Enable this for handling temporary processing issues.
**Error Handling** lets you define fallback paths when code execution fails, allowing your workflow to continue running even when the code encounters problems.

## Output Validation and Limits
Code outputs are automatically validated with strict limits:
* **Strings**: Maximum length of 80,000 characters, null bytes are removed
* **Numbers**: Range from -999999999 to 999999999, floats limited to 10 decimal places
* **Objects/Arrays**: Maximum depth of 5 levels to prevent complex nested structures
These limits ensure performance and prevent memory issues in workflows.
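These checks can be sketched as follows; the function and error messages are illustrative, not Dify's actual validator:

```python theme={null}
# Documented limits on Code node outputs
MAX_STRING_LENGTH = 80_000
MAX_NUMBER = 999_999_999
MAX_DEPTH = 5

def validate_output(value, depth=1):
    """Apply the documented output limits to a returned value."""
    if isinstance(value, str):
        cleaned = value.replace("\x00", "")       # null bytes are removed
        if len(cleaned) > MAX_STRING_LENGTH:
            raise ValueError("string exceeds 80,000 characters")
        return cleaned
    if isinstance(value, bool):                   # bool before int: bool is an int subtype
        return value
    if isinstance(value, (int, float)):
        if not -MAX_NUMBER <= value <= MAX_NUMBER:
            raise ValueError("number out of range")
        return round(value, 10) if isinstance(value, float) else value
    if isinstance(value, (dict, list)):
        if depth > MAX_DEPTH:
            raise ValueError("nesting deeper than 5 levels")
        if isinstance(value, dict):
            return {k: validate_output(v, depth + 1) for k, v in value.items()}
        return [validate_output(v, depth + 1) for v in value]
    return value
```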
## Security Considerations
Code executes in a strict sandbox that prevents file system access, network requests, and system commands. This maintains security while providing programming flexibility.
Some operations are automatically blocked for security reasons. Avoid attempting to access system files or execute potentially dangerous operations.

If your code isn't saving, check your browser's Network tab - security filters may be blocking potentially dangerous operations.
## Dependencies Support
Code nodes support external dependencies for both Python and JavaScript:
```python theme={null}
# Python: Import numpy, pandas, requests, etc.
import numpy as np
import pandas as pd

def main(data: list) -> dict:
    df = pd.DataFrame(data)
    return {'mean': float(np.mean(df['values']))}
```
```javascript theme={null}
// JavaScript: Import lodash, moment, etc.
const _ = require('lodash');

function main(data) {
    return { unique: _.uniq(data) };
}
```
Dependencies are pre-installed in the sandbox environment. Check the available packages list in your Dify installation.
## Self-Hosted Setup
For self-hosted Dify installations, start the sandbox service for secure code execution:
```bash theme={null}
docker-compose -f docker-compose.middleware.yaml up -d
```
The sandbox service requires Docker and isolates code execution from your main system for security.
# Document Extractor
Source: https://docs.dify.ai/en/use-dify/nodes/doc-extractor
Extract text content from uploaded documents for AI processing
The Document Extractor node converts uploaded files into text that LLMs can process. Since language models can't directly read document formats like PDF or DOCX, this node serves as the essential bridge between file uploads and AI analysis.

## Supported File Types
The node handles most text-based document formats:
**Text Documents** - TXT, Markdown, HTML files with direct text content
**Office Documents** - DOCX files from Microsoft Word and compatible applications
**PDF Documents** - Text-based PDFs using pypdfium2 for accurate text extraction
**Office Files** - DOC files require Unstructured API, DOCX files support direct parsing with table extraction converted to Markdown format
**Spreadsheets** - Excel (.xls/.xlsx) and CSV files converted to Markdown tables
**Presentations** - PowerPoint (.ppt/.pptx) files processed via Unstructured API
**Email Formats** - EML and MSG files for email content extraction
**Specialized Formats** - EPUB books, VTT subtitles, JSON/YAML data, and Properties files
Files containing primarily binary content like images, audio, or video require specialized processing tools or external services.
## Input and Output
### Input Configuration
Configure the node to accept either:
**Single File** input from a file variable (typically from the Start node)
**Multiple Files** as an array for batch document processing
### Output Structure
The node outputs extracted text content:
* Single file input produces a `string` containing the extracted text
* Multiple file input produces an `array[string]` with each file's content
The output variable is named `text` and contains the raw text content ready for downstream processing.
## Implementation Example
Here's a complete document Q\&A workflow using the Document Extractor:

### Workflow Setup
**File Upload Configuration** - Enable file input in your Start node to accept document uploads from users.
**Text Extraction** - Connect the Document Extractor to process uploaded files and extract their text content.
**AI Processing** - Use the extracted text in LLM prompts for analysis, summarization, or question answering.


## Processing Considerations
The Document Extractor uses specialized parsing libraries optimized for different file formats. It preserves text structure and formatting where possible, making extracted content more useful for LLM processing.
### File Format Processing
**Encoding Detection** - Uses chardet library to automatically detect file encoding with UTF-8 fallback for text-based files
**Table Conversion** - Excel and CSV data becomes Markdown tables for better LLM comprehension
**Document Structure** - DOCX files maintain paragraph and table ordering with proper table-to-Markdown conversion
**Multi-line Content** - VTT subtitle files merge consecutive utterances by the same speaker
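The encoding-detection step can be approximated in a few lines; `chardet` is treated as optional here, with the documented UTF-8 fallback (a sketch, not the extractor's actual code):

```python theme={null}
def detect_text(data: bytes) -> str:
    """Decode file bytes, preferring a detected encoding with UTF-8 fallback."""
    try:
        import chardet                      # optional dependency used for detection
        guess = chardet.detect(data).get("encoding")
    except ImportError:
        guess = None
    for encoding in filter(None, (guess, "utf-8")):
        try:
            return data.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue
    # Last resort: decode lossily rather than fail the extraction
    return data.decode("utf-8", errors="replace")
```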
### External Dependencies
Some file formats require the **Unstructured API** service configured via `UNSTRUCTURED_API_URL` and `UNSTRUCTURED_API_KEY`:
* DOC files (legacy Word documents)
* PowerPoint presentations (if using API processing)
* EPUB books (if using API processing)
For very large documents, consider the LLM's context limits and implement chunking strategies if needed. The extracted text maintains the original document's logical structure to preserve meaning and context.
# HTTP Request
Source: https://docs.dify.ai/en/use-dify/nodes/http-request
Connect to external APIs and web services
The HTTP Request node connects your workflow to external APIs and web services. Use it to fetch data, send webhooks, upload files, or integrate with any service that accepts HTTP requests.

## HTTP Methods
The node supports all standard HTTP methods for different types of operations:
**GET** retrieves data from servers without modifying anything. Use for fetching user profiles, searching databases, or getting current status.
**HEAD** gets response headers without the full response body. Useful for checking if resources exist or getting metadata.
**POST** sends data to servers, typically for creating new resources. Use for form submissions, file uploads, or sending JSON payloads.
**PUT** creates or completely replaces resources. Use when you want to set the entire state of a resource.
**PATCH** makes partial updates to existing resources. Use when you only need to modify specific fields.
**DELETE** removes resources from servers. Use for deleting files, user accounts, or any resource that should be removed.
## Configuration
Configure every aspect of your HTTP request including URL, headers, query parameters, request body, and authentication. Variables from previous workflow nodes can be dynamically inserted anywhere in your request configuration.
### Variable Substitution
Reference workflow variables using double curly braces: `{{variable_name}}`. Dify supports deep object access, so you can extract nested values like `{{api_response.data.items[0].id}}` from previous HTTP responses.
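A minimal resolver for this syntax might look like the following — a hypothetical sketch for illustration, not Dify's template engine:

```python theme={null}
import re

TOKEN = re.compile(r"\{\{\s*([^}]+?)\s*\}\}")

def resolve(path: str, variables: dict):
    """Walk dotted keys and [index] segments through nested dicts and lists."""
    value = variables
    for part in re.findall(r"[^.\[\]]+", path):
        value = value[int(part)] if isinstance(value, list) else value[part]
    return value

def substitute(template: str, variables: dict) -> str:
    """Replace every {{path}} token with its resolved value."""
    return TOKEN.sub(lambda m: str(resolve(m.group(1), variables)), template)
```

For example, `substitute("id={{api_response.data.items[0].id}}", vars)` digs through the nested response to produce `"id=7"` when that field holds `7`.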
### Timeout Configuration
HTTP requests have configurable timeouts to prevent hanging:
* **Connect timeout**: Maximum time to establish connection (default varies by deployment)
* **Read timeout**: Maximum time to read response data
* **Write timeout**: Maximum time to send request data
Timeouts are enforced to maintain workflow performance and prevent resource exhaustion.
### Authentication
The node supports multiple authentication types:
**No Auth** (`type: "no-auth"`) - No authentication headers added
**API Key** (`type: "api-key"`) with three subtypes:
* **Basic** (`type: "basic"`) - Adds Basic Auth header with base64 encoding
* **Bearer** (`type: "bearer"`) - Adds an `Authorization: Bearer <token>` header
* **Custom** (`type: "custom"`) - Adds custom header with specified name and value
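The header each type produces can be sketched as follows; the `config` keys (`api_key`, `header`) are assumptions for illustration, not Dify's exact schema:

```python theme={null}
import base64

def auth_headers(auth: dict) -> dict:
    """Build the request header implied by each documented auth type."""
    if auth.get("type") == "no-auth":
        return {}
    config = auth.get("config", {})
    subtype = config.get("type")
    if subtype == "basic":
        # Basic Auth: base64-encode the credential string
        token = base64.b64encode(config["api_key"].encode()).decode()
        return {"Authorization": f"Basic {token}"}
    if subtype == "bearer":
        return {"Authorization": f"Bearer {config['api_key']}"}
    if subtype == "custom":
        # Custom: caller supplies both header name and value
        return {config.get("header", "Authorization"): config["api_key"]}
    raise ValueError("unknown auth type")
```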
### Request Body
Choose the appropriate body type based on your API requirements:
* **JSON** for structured data
* **Form Data** for traditional web forms
* **Binary** for file uploads
* **Raw Text** for custom content types
## File Detection
The HTTP Request node automatically detects file responses using sophisticated logic:
1. **Content-Disposition analysis** - Checks for `attachment` disposition or filename parameters
2. **MIME type evaluation** - Analyzes content types to distinguish text from binary
3. **Content sampling** - For ambiguous types, samples first 1024 bytes to detect text patterns
Text-based responses (JSON, XML, HTML, etc.) are treated as regular data, while binary content becomes file variables.
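A simplified version of this decision logic (the MIME prefixes and null-byte heuristic are illustrative, not the node's exact rules):

```python theme={null}
def looks_like_text(content_type: str, disposition: str, body: bytes) -> bool:
    """Decide whether a response is regular data or file content."""
    # 1. Content-Disposition: an attachment or filename implies a file
    if "attachment" in disposition or "filename=" in disposition:
        return False
    # 2. MIME type: clearly textual types are treated as data
    if content_type.startswith(("text/", "application/json",
                                "application/xml")):
        return True
    # 3. Sample the first 1024 bytes for text patterns
    sample = body[:1024]
    if b"\x00" in sample:
        return False                 # null bytes imply binary content
    try:
        sample.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```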
## File Operations
The HTTP Request node handles file uploads and downloads seamlessly:

**File Uploads** use the binary request body option. Select file variables from previous nodes to send files to external services for document storage, media processing, or backup.
**File Downloads** are automatically handled when responses contain file content. Downloaded files become available as file variables for use in downstream nodes.
## Error Handling and Retries
Configure robust error handling for production workflows that depend on external services:

**Retry Settings** automatically retry failed requests up to 10 times with configurable intervals (maximum 5000ms). This handles temporary network issues or service unavailability.

**Error Handling** defines alternative workflow paths when HTTP requests fail, ensuring your workflow continues executing even when external APIs are unavailable.
## Response Processing
HTTP responses become structured variables in subsequent nodes with separate access to:
* **Response Body** - The main content returned by the API
* **Status Code** - HTTP status for conditional logic
* **Headers** - Response metadata as key-value pairs
* **Files** - Any file content returned by the API
* **Size Information** - Content size in bytes with readable formatting (KB/MB)
### SSL Verification
SSL certificate verification is configurable per node (`ssl_verify` parameter). This allows connections to internal services with self-signed certificates while maintaining security for external APIs.

# Human Input
Source: https://docs.dify.ai/en/use-dify/nodes/human-input
Pause workflows to request human input, review, or decisions
The Human Input node pauses workflows at key points to request human input before execution continues.
When execution reaches this node, a customizable request form is delivered through specific channels. Recipients can provide input, review data, and choose from predefined decisions that determine how the workflow proceeds.
By embedding human judgement directly where it matters, you can **balance automated efficiency with human oversight**.
## Configuration
Configure the following to define how the node requests and processes human input:
* **Delivery method**: How the request form reaches recipients.
* **Form content**: What information recipients will see and what they can input.
* **User action**: What decisions recipients can make and how the workflow proceeds accordingly.
* **Timeout strategy**: How long to wait and what happens if no recipient responds.
### Delivery Method
Choose the channel through which the request is delivered. Currently available methods:
* **Web app**: Displays the request form directly in the web app interface for the current user to respond.
* **Email**: Sends an email containing the request link to one or more recipients.
The request closes after the first response regardless of delivery method.
### Form Content
Customize what appears in the request form:
* **Format and structure with Markdown**
Use headings, lists, bold text, links, and other Markdown elements to present information clearly.
* **Display dynamic data with variables**
Reference workflow variables to show dynamic content, such as AI-generated text for review or any needed contextual information from upstream nodes.
If you reference the `text` output variable from a reasoning model, the form will display the model's thinking process along with the final answer by default.
To show only the answer, toggle on **Enable Reasoning Tag Separation** for the corresponding LLM node.
* **Collect input with input fields**
Input fields can start empty or pre-filled with variables (e.g., LLM output to refine) or static text (e.g., example or default values) that recipients can edit.
Each input field becomes a variable for downstream use. For instance, pass edited content for further processing or send feedback to an LLM for regeneration.
### User Action
Define the decision buttons that recipients can click. Each button routes the workflow to a different execution path.
For example, a `Post` branch might lead to nodes that trigger content publishing, while a `Regenerate` branch might loop back to an LLM node to revise the content.
Use preset button styles to visually distinguish actions.
For example, use a prominent style for key actions like `Approve` and a subtler one for secondary options.
### Timeout Strategy
Configure how long to wait for a response before the request expires.
If no recipient responds within the set time, the workflow automatically ends unless you define a fallback branch to handle the timeout—for example, send a notification or retry the request.
# If-Else
Source: https://docs.dify.ai/en/use-dify/nodes/ifelse
Add conditional logic and branching to workflows
The If-Else node adds decision-making logic to your workflows by routing execution down different paths based on conditions you define. It evaluates variables and determines which branch your workflow should follow.

## Branching Logic
The node supports multiple branching paths to handle complex decision trees:
**IF Path** executes when the primary condition evaluates to true.
**ELIF Paths** provide additional conditions to check in sequence when the IF condition is false. You can add multiple ELIF branches for complex logic.
**ELSE Path** serves as the fallback when no conditions match, ensuring your workflow always has a path to follow.
## Condition Types
Configure conditions to test variables using various comparison operators:
**Contains** / **Not contains** - Check if the value includes specific words or phrases
**Starts with** / **Ends with** - Test text beginnings or endings for pattern matching
**Is** / **Is not** - Exact value matching
**Is empty** / **Is not empty** - Check for blank, null, or missing values
**Greater than** / **Less than** - Numerical comparisons for numbers and dates
**Equals** / **Not equals** - Exact matching for any data type
## Complex Conditions
Combine multiple conditions using logical operators for sophisticated decision-making:

**AND Logic** requires all conditions to be true. Use this when you need multiple criteria to be met simultaneously.
**OR Logic** requires any condition to be true. Use this when you want to trigger the same action for different scenarios.
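Conceptually, each branch evaluates its conditions and combines them with the chosen logical operator. A minimal Python sketch (the operator names and condition format are illustrative, not Dify's internal representation):

```python theme={null}
def evaluate(value, operator, target):
    """Evaluate a single condition (a subset of the available operators)."""
    if operator == "contains":
        return target in value
    if operator == "starts with":
        return value.startswith(target)
    if operator == "is":
        return value == target
    if operator == "is empty":
        return value is None or value == ""
    raise ValueError(f"unknown operator: {operator}")

def select_branch(conditions, logical_operator="and"):
    """Combine condition results with AND (all must hold) or OR (any suffices)."""
    results = [evaluate(value, op, target) for value, op, target in conditions]
    return all(results) if logical_operator == "and" else any(results)

# AND: both criteria must be met simultaneously
print(select_branch([("hello world", "contains", "hello"),
                     ("hello world", "starts with", "hello")], "and"))  # True
# OR: any one scenario triggers the branch
print(select_branch([("hello world", "is", "goodbye"),
                     ("hello world", "contains", "world")], "or"))  # True
```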
## Variable References
Reference any variable from previous workflow nodes in your conditions. Variables can come from user input, LLM responses, API calls, or any other workflow node output.
Use the variable selector to choose from available variables, or type variable names directly using the `{{variable_name}}` syntax.
# Iteration
Source: https://docs.dify.ai/en/use-dify/nodes/iteration
Process arrays by applying workflows to each element
The Iteration node processes arrays by running the same workflow steps on each element sequentially or in parallel. Use it for batch processing tasks that would otherwise hit limits or be inefficient as single operations.

## How Iteration Works
The node takes an array input and creates a sub-workflow that runs once for each array element. During each iteration, the current item and its index are available as variables that internal nodes can reference.
**Core Components:**
* **Input Variables** - Array data from upstream nodes
* **Internal Workflow** - The processing steps to perform on each element
* **Output Variables** - Collected results from all iterations (also an array)
## Configuration
### Array Input
Connect an array variable from upstream nodes such as Parameter Extractor, Code nodes, Knowledge Retrieval, or HTTP Request responses.
### Built-in Variables
Each iteration provides access to:
* `items[object]` - The current array element being processed
* `index[number]` - The current iteration index (starting from 0)
### Processing Mode
**Sequential Processing** - Items are processed one after another in order.
* **Streaming Support** - Results can be output progressively using Answer nodes
* **Resource Management** - Lower memory usage, predictable execution order
* **Best For** - When order matters or when using streaming output

**Concurrent Processing** - Up to 10 items are processed simultaneously.
* **Improved Performance** - Faster execution for independent operations
* **Batch Processing** - Handles large arrays efficiently
* **Best For** - Independent operations where order doesn't matter


## Error Handling
Configure how to handle processing failures for individual array elements:
**Terminate** - Stop processing when any error occurs and return the error message
**Continue on Error** - Skip failed items and continue processing, outputting null for failed elements
**Remove Failed Results** - Skip failed items and return only successful results
Input-output correspondence examples:
* Input: `[1, 2, 3]`
* Output with Continue on Error: `[result-1, null, result-3]`
* Output with Remove Failed: `[result-1, result-3]`
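The strategies above can be sketched in Python, using a hypothetical `process` step that fails on one element:

```python theme={null}
def process(item):
    """Hypothetical per-element step that fails on the value 2."""
    if item == 2:
        raise ValueError("cannot process 2")
    return f"result-{item}"

def iterate(items, on_error="terminate"):
    """Apply `process` to each element under one of the three error strategies."""
    results = []
    for item in items:
        try:
            results.append(process(item))
        except ValueError:
            if on_error == "terminate":
                raise                     # Terminate: stop and surface the error
            if on_error == "continue":
                results.append(None)      # Continue on Error: null placeholder
            # on_error == "remove": Remove Failed Results -- skip the element
    return results

print(iterate([1, 2, 3], "continue"))  # ['result-1', None, 'result-3']
print(iterate([1, 2, 3], "remove"))    # ['result-1', 'result-3']
```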
## Long Article Generation Example
Generate lengthy content by processing chapter outlines individually:

**Workflow Steps:**
1. **Start Node** - User provides story title and outline
2. **LLM Node** - Generate detailed chapter breakdown
3. **Parameter Extractor** - Convert chapter list to structured array
4. **Iteration Node** - Process each chapter with internal LLM
5. **Answer Node** - Stream chapter content as it's generated



Parameter extraction effectiveness depends on model capabilities and instruction quality. Use stronger models and provide examples in instructions to improve results.
## Output Processing
Iteration nodes output arrays that often need conversion for final use:
### Convert Array to Text
```python theme={null}
def main(articleSections: list):
    return {
        "result": "\n".join(articleSections)
    }
```

```jinja theme={null}
{{ articleSections | join("\n") }}
```

# Knowledge Retrieval
Source: https://docs.dify.ai/en/use-dify/nodes/knowledge-retrieval
Retrieve relevant content from knowledge bases and use it as context for downstream nodes
Use the Knowledge Retrieval node to integrate existing knowledge bases into your workflows. The node searches specific knowledge for information relevant to queries and outputs results as contextual content for use in downstream nodes (e.g., LLMs).
Below is an example of using the Knowledge Retrieval node in a Chatflow:
1. The **User Input** node collects the user query.
2. The **Knowledge Retrieval** node searches the selected knowledge base(s) for content related to the user query and outputs the retrieval results.
3. The **LLM** node generates a response based on both the user query and the retrieved knowledge.
4. The **Answer** node returns the LLM's response to the user.
Before using a Knowledge Retrieval node, ensure that you have at least one available knowledge base. To learn about creating knowledge bases, see [Knowledge](/en/use-dify/knowledge/readme#create-knowledge).
On Dify Cloud, knowledge retrieval operations are subject to rate limits based on the subscription plan. For more information, see [Knowledge Request Rate Limit](/en/use-dify/knowledge/knowledge-request-rate-limit).
## Configure the Node
To make the Knowledge Retrieval node work properly, you need to specify:
* *What* it should search for (the query)
* *Where* it should search (the knowledge base)
* *How* to process the retrieval results (the node-level retrieval settings)
You can also use document metadata to enable filter-based searches and further improve retrieval precision.
### Specify the Query
Provide the query content that the node should search for in the selected knowledge base(s).
* **Query Text**: Select a text variable. For example, use `userinput.query` to reference user input in Chatflows, or a custom text-type user input variable in Workflows.
* **Query Images**: Select an image variable, e.g., the image(s) uploaded by the user through a User Input node, to search by image. The image size limit is 2 MB.
For self-hosted deployments, you can adjust the image size limit via the environment variable `ATTACHMENT_IMAGE_FILE_SIZE_LIMIT`.
The **Query Images** option is available only when at least one multimodal knowledge base is added.
Such knowledge bases are marked with the **Vision** tag, indicating that they are using a multimodal embedding model.
### Select Knowledge to Search
Add one or more existing knowledge bases for the node to search for content relevant to the query.
When multiple knowledge bases are added, knowledge is first retrieved from all of them simultaneously, then combined and processed according to the [node-level retrieval settings](#configure-node-level-retrieval-settings).
Knowledge bases marked with the **Vision** tag support cross-modal retrieval—retrieving both text and images based on semantic relevance.
You can click the **Edit** icon next to any added knowledge base to modify its [settings](/en/use-dify/knowledge/manage-knowledge/introduction).
### Configure Node-Level Retrieval Settings
To fine-tune how the node processes retrieval results after they are fetched from the knowledge base(s), click **Retrieval Setting**.
There are two layers of retrieval settings—the knowledge base level and the knowledge retrieval node level.
Think of them as two consecutive filters: the knowledge base settings determine the initial pool of results, and the node settings further rerank the results or narrow down the pool.
* **Rerank Settings**
* **Weighted Score**
The relative weight between semantic similarity and keyword matching during reranking. Higher semantic weight favors meaning relevance, while higher keyword weight favors exact matches.
Weighted Score is available only when all added knowledge bases are indexed with **High Quality** mode.
* **Rerank Model**
The rerank model to re-score and reorder all the results based on their relevance to the query.
If any multimodal knowledge bases are added, select a multimodal rerank model (marked with a **Vision** tag) as well. Otherwise, retrieved images will be excluded from reranking and the final output.
* **Top K**
The maximum number of top results to return after reranking.
When a rerank model is selected, this value will be automatically adjusted based on the model's maximum input capacity (how much text the model can process at once).
* **Score Threshold**
The minimum similarity score for returned results. Results scoring below this threshold are excluded. Use higher thresholds for stricter relevance or lower thresholds to include broader matches.
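Conceptually, Weighted Score reranking blends the two scores before Top K and Score Threshold are applied. A simplified sketch (not Dify's exact internal formula):

```python theme={null}
def weighted_score(semantic, keyword, semantic_weight=0.7):
    """Blend semantic-similarity and keyword-match scores (both in [0, 1])."""
    return semantic_weight * semantic + (1 - semantic_weight) * keyword

def rerank(chunks, top_k=3, score_threshold=0.5, semantic_weight=0.7):
    """Score every chunk, drop those below the threshold, keep the top k."""
    scored = [(weighted_score(c["semantic"], c["keyword"], semantic_weight), c)
              for c in chunks]
    kept = sorted((pair for pair in scored if pair[0] >= score_threshold),
                  key=lambda pair: pair[0], reverse=True)
    return [c for _, c in kept[:top_k]]

chunks = [
    {"id": "a", "semantic": 0.9, "keyword": 0.2},
    {"id": "b", "semantic": 0.4, "keyword": 0.9},
    {"id": "c", "semantic": 0.2, "keyword": 0.1},
]
print([c["id"] for c in rerank(chunks)])  # ['a', 'b'] -- 'c' falls below the threshold
```

Raising `semantic_weight` favors meaning relevance; lowering it favors exact keyword matches.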
### Enable Metadata Filtering
By default, retrieval searches across the entire knowledge base. To restrict retrieval to specific documents, enable manual or automatic metadata filtering.
This improves retrieval precision, especially when your knowledge base is large or contains content for different contexts.
For creating and managing document metadata, see [Metadata](/en/use-dify/knowledge/metadata).
## Output
The Knowledge Retrieval node outputs the retrieval results as a variable named `result`, which is an array of retrieved document chunks containing their content, metadata, title, and other attributes.
When the retrieval results contain image attachments, the `result` variable also includes a field named `files` containing image details.
## Use with LLM Nodes
To use the retrieval results as context in an LLM node:
1. In **Context**, select the Knowledge Retrieval node's `result` variable.
2. In the system instruction, reference the `Context` variable.
3. Optional: If the LLM is vision-capable, enable **Vision** so it can process image attachments in the retrieval results.
You don't need to specify the retrieval results as the vision input. Once **Vision** is enabled, the LLM will automatically access any retrieved images.
In chatflows, citations are shown alongside responses that reference knowledge by default. You can turn this off by disabling **[Citation and Attributions](/en/use-dify/build/additional-features#citations-and-attributions)** in **Features** at the top right corner of the canvas.
# List Operator
Source: https://docs.dify.ai/en/use-dify/nodes/list-operator
Filter, sort, and select elements from arrays
The List Operator node processes arrays by filtering, sorting, and selecting specific elements. Use it when you need to work with mixed file uploads, large datasets, or any array data that requires separation or organization before downstream processing.
Supported input data types include `array[string]`, `array[number]`, `array[file]`, and `array[boolean]`.

## The Array Processing Problem
Most workflow nodes expect single values, not arrays. When you have mixed content like `[image.png, document.pdf, audio.mp3]` in one variable, you need to separate this into focused streams that downstream nodes can process effectively.
The List Operator acts as an intelligent router, using filters to separate mixed arrays and prepare them for specialized processing.

## Operations
### Filtering
Extract specific items based on their attributes. For file arrays, filter by:
**Type** - Filter by content category: image, document, audio, video
**MIME Type** - Precise content type identification (image/jpeg, application/pdf, etc.)
**Extension** - File extensions (.pdf, .jpg, .mp3, .docx, etc.)
**Size** - File size constraints for processing limits
**Name** - Filename patterns or specific names
**Transfer Method** - Distinguish between local uploads and URL-based files
### Sorting
Organize filtered results by any attribute:
**Ascending (ASC)** - Smallest to largest values, A-Z alphabetical order
**Descending (DESC)** - Largest to smallest values, Z-A reverse order
### Selection
Choose specific elements from the processed array:
**Take First N** - Select the first 1-20 items after filtering and sorting
**First Record** - Return only the first matching element as a single value
**Last Record** - Return only the last matching element as a single value
## Output Variables
**result** - Complete filtered and sorted array for bulk processing
**first\_record** - Single element from the beginning, perfect for "primary" or "latest" item selection
**last\_record** - Single element from the end, useful for "most recent" or "final" selection
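The three operations compose as a filter → sort → select pipeline. A Python sketch over a hypothetical file-array structure:

```python theme={null}
def list_operator(files, extension=None, sort_by="size", order="asc", take_first=None):
    """Filter by extension, sort by an attribute, then select elements,
    mirroring the node's filter -> sort -> select pipeline (data model is illustrative)."""
    result = [f for f in files if extension is None or f["extension"] == extension]
    result.sort(key=lambda f: f[sort_by], reverse=(order == "desc"))
    if take_first is not None:
        result = result[:take_first]          # Take First N
    return {
        "result": result,
        "first_record": result[0] if result else None,
        "last_record": result[-1] if result else None,
    }

files = [
    {"name": "photo.jpg", "extension": ".jpg", "size": 300},
    {"name": "scan.pdf", "extension": ".pdf", "size": 900},
    {"name": "notes.pdf", "extension": ".pdf", "size": 100},
]
out = list_operator(files, extension=".pdf", sort_by="size", order="desc")
print([f["name"] for f in out["result"]])  # ['scan.pdf', 'notes.pdf']
print(out["first_record"]["name"])         # 'scan.pdf'
```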
## Mixed File Processing Example
Handle workflows where users upload both documents and images:

**Implementation Steps:**
1. **Configure Mixed Uploads** - Enable file upload features to accept multiple file types
2. **Split by Type** - Use separate List Operator nodes with different filters:
* Filter for `type = "image"` → route to LLM with vision capabilities
* Filter for `type = "document"` → route to Document Extractor
3. **Process Appropriately** - Images get analyzed directly, documents get text extraction
4. **Combine Results** - Merge processed outputs into unified responses
This pattern automatically routes different file types to appropriate processors, creating seamless multi-modal user experiences.
# LLM
Source: https://docs.dify.ai/en/use-dify/nodes/llm
Invoke language models for text generation and analysis
The LLM node invokes language models to process text, images, and documents. It sends prompts to your configured models and captures their responses, supporting structured outputs, context management, and multimodal inputs.

Configure at least one model provider in **Settings** > **Model Provider** before using LLM nodes.
## Model Selection and Parameters
Choose from any model provider you've configured. Different models excel at different tasks - GPT-4 and Claude 3.5 handle complex reasoning well but cost more, while GPT-3.5 Turbo balances capability with affordability. For local deployment, use Ollama, LocalAI, or Xinference.

Model parameters control response generation. **Temperature** ranges from 0 (deterministic) to 1 (creative). **Top P** limits word choices by probability. **Frequency Penalty** reduces repetition. **Presence Penalty** encourages new topics. You can also use presets: **Precise**, **Balanced**, or **Creative**.
## Prompt Configuration
Your interface adapts based on model type. Chat models use message roles (**System** for behavior, **User** for input, **Assistant** for examples), while completion models use simple text continuation.
Reference workflow variables in prompts using double curly braces: `{{variable_name}}`. Variables are replaced with actual values before reaching the model.
```text theme={null}
System: You are a technical documentation expert.
User: {{user_input}}
```
## Context Variables
Context variables inject external knowledge while preserving source attribution. This enables RAG applications where LLMs answer questions using your specific documents.

Connect a Knowledge Retrieval node's output to your LLM node's context input, then reference it:
```text theme={null}
Answer using only this context:
{{knowledge_retrieval.result}}
Question: {{user_question}}
```
When using context variables from knowledge retrieval, Dify automatically tracks citations so users see information sources.
## Structured Outputs
Force models to return specific data formats like JSON for programmatic use. Configure through three methods:
**Visual editor** - User-friendly interface for simple structures. Add fields with names and types, mark required fields, set descriptions. The editor generates JSON Schema automatically.
**JSON Schema** - Write schemas directly for complex structures with nested objects, arrays, and validation rules:
```json theme={null}
{
  "type": "object",
  "properties": {
    "sentiment": {
      "type": "string",
      "enum": ["positive", "negative", "neutral"]
    }
  },
  "required": ["sentiment"]
}
```
**AI generation** - Describe needs in plain language and let AI generate the schema.
Models with native JSON support handle structured outputs reliably. For others, Dify includes the schema in prompts, but results may vary.
## Memory and File Processing
Enable Memory to maintain context across multiple LLM calls within a chatflow conversation. When enabled, previous interactions are included in subsequent prompts as formatted user and assistant messages. You can customize what goes into the user prompt by editing the `USER` template. Memory is node-specific and doesn't persist between different conversations.
For **File Processing**, add file variables to prompts for multimodal models. GPT-4V handles images, Claude processes PDFs directly, while other models might need preprocessing.
### Vision Configuration
When processing images, you can control the detail level:
* **High detail** - Better accuracy for complex images but uses more tokens
* **Low detail** - Faster processing with fewer tokens for simple images
The default variable selector for vision is `userinput.files`, which automatically picks up files from the User Input node.

## Jinja2 Template Support
LLM prompts support Jinja2 templating for advanced variable handling. In Jinja2 mode (`edition_type: "jinja2"`), you can use loops, conditionals, and filters directly in the prompt:
```jinja theme={null}
{% for item in search_results %}
{{ loop.index }}. {{ item.title }}: {{ item.content }}
{% endfor %}
```
Jinja2 variables are processed separately from regular variable substitution, allowing for loops, conditionals, and complex data transformations within prompts.
## Streaming Output
LLM nodes support streaming output by default. Each text chunk is yielded as a `RunStreamChunkEvent`, enabling real-time response display. File outputs (images, documents) are processed and saved automatically during streaming.
## Error Handling
Configure retry behavior for failed LLM calls. Set maximum retry attempts, intervals between retries, and backoff multipliers. Define fallback strategies like default values, error routing, or alternative models when retries aren't sufficient.
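The retry settings above can be sketched as a simple wrapper (parameter names are illustrative, not Dify's exact configuration keys):

```python theme={null}
import time

def call_with_retry(call, max_retries=3, interval=1.0, backoff=2.0, fallback=None):
    """Retry a failing call with exponential backoff; fall back to a default value
    when retries are exhausted (illustrative sketch of the retry/fallback options)."""
    delay = interval
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception:
            if attempt == max_retries:
                if fallback is not None:
                    return fallback       # default-value fallback strategy
                raise                     # or surface the error for error routing
            time.sleep(delay)
            delay *= backoff              # back off before the next attempt

attempts = []
def flaky():
    """Simulated LLM call that fails twice, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("transient error")
    return "ok"

print(call_with_retry(flaky, max_retries=3, interval=0.01))  # ok
```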
# Loop
Source: https://docs.dify.ai/en/use-dify/nodes/loop
Execute repetitive workflows with progressive refinement
The Loop node executes repetitive workflows where each cycle builds on the results of the previous one. Unlike iteration, which processes array elements independently, loops create progressive workflows that evolve with each repetition.
## Loop vs Iteration
Understanding when to use each repetition pattern:
**Sequential Processing** - Each cycle depends on previous results
**Progressive Refinement** - Outputs improve or evolve over iterations
**State Management** - Variables persist and accumulate across cycles
**Use Cases** - Content refinement, problem solving, quality assurance
**Independent Processing** - Each item processed separately
**Parallel Execution** - Items can be processed simultaneously
**Batch Operations** - Same operation applied to multiple data points
**Use Cases** - Data transformation, bulk processing, parallel analysis
## Configuration
### Loop Variables
Define variables that persist across loop iterations and remain accessible after the loop completes. These variables maintain state and enable progressive workflows.
### Termination Conditions
Configure when the loop should stop executing:
**Loop Termination Condition** - Expression that determines when to exit (e.g., `quality_score > 0.9`)
**Maximum Loop Count** - Safety limit to prevent infinite loops
**Exit Loop Node** - Immediate termination when this node is reached
The loop terminates as soon as the termination condition is met, the maximum count is reached, or an Exit Loop node executes. If no termination condition is specified, the loop runs until the maximum count is reached.
## Basic Loop Example
Generate random numbers until finding one less than 50:

**Workflow Steps:**
1. **Code node** generates random integers between 1-100
2. **If-Else node** checks if number is less than 50
3. **Template node** returns "done" for numbers \< 50 to trigger loop termination
4. Loop continues until termination condition is met
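The four steps above can be sketched in plain Python (the threshold and safety limit mirror the termination rules; node wiring is simplified away):

```python theme={null}
import random

def generate_until_small(threshold=50, max_count=100):
    """Draw random integers 1-100 until one falls below the threshold,
    honoring a maximum loop count as a safety limit against infinite loops."""
    for count in range(1, max_count + 1):
        value = random.randint(1, 100)   # Code node: generate a candidate
        if value < threshold:            # If-Else node: termination condition met
            return value, count          # Template node would return "done" here
    return None, max_count               # maximum loop count reached

value, iterations = generate_until_small()
print(f"found {value} after {iterations} iteration(s)")
```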

## Advanced Loop Example
Create a poem through iterative refinement, with each version building upon the previous one:
**Loop Variables:**
* `num` - Counter starting at 0, incrementing each iteration
* `verse` - Text variable holding the current poem version
**Workflow Logic:**
1. **If-Else node** checks if `num > 3` to determine when to exit
2. **LLM node** generates improved poem based on previous version
3. **Variable Assigner** updates both counter and poem content
4. **Exit Loop node** terminates after 4 refinement cycles
The LLM prompt references both the current verse and iteration context:
```text theme={null}
You are a European literary figure creating poetic verses.
Current verse: {{verse}}
Refine and improve this poem based on your previous work.
```
# Output
Source: https://docs.dify.ai/en/use-dify/nodes/output
Return workflow results to the end user or API caller
Use the Output node to deliver specific variable values from your workflow to the end user or API caller. Add it where you need to surface results.
The Output node was previously named *End* and was required in every workflow.
It is now optional—workflows run successfully without one, but any workflow or branch without an Output node returns no data to the caller.
Output nodes are only available in Workflows. Chatflows use the [Answer](/en/use-dify/nodes/answer) node instead.
## Configure Output Variables
Each Output node requires at least one output variable. To add a variable, assign a name and select the source from any upstream node's output.
The variable name you assign becomes the key in API responses.
You can add multiple output variables to a single Output node and reorder them by dragging.
## Supported Variable Types
Output variables support the following types:
`string`, `number`, `integer`, `boolean`, `object`, `file`, `array[string]`, `array[number]`, `array[object]`, `array[boolean]`, `array[file]`
## Multiple Output Nodes
A workflow can contain more than one Output node. The Output node does not stop workflow execution—other parallel branches (if any) continue running after it completes.
All output variables from every executed Output node are combined into one final result. Each Output node adds its variables to the result as the workflow reaches it:
* On the **same branch**, variables are added in the order the Output nodes are placed.
* On **parallel branches**, whichever Output node executes first adds its variables first.
Always use unique variable names across all Output nodes in a workflow.
When two Output nodes use the same output variable name, the later one overwrites the earlier value.
## API Response Structure
When you call a workflow through the API, output variables appear in the `outputs` object of the response.
In blocking mode, all outputs return in a single response once the workflow completes:
```json theme={null}
{
  "workflow_run_id": "...",
  "status": "succeeded",
  "outputs": {
    "result_text": "The processed output...",
    "score": 95
  }
}
```
In streaming mode, outputs arrive in the final `workflow_finished` event:
```json theme={null}
{
  "event": "workflow_finished",
  "data": {
    "outputs": {
      "result_text": "The processed output...",
      "score": 95
    }
  }
}
```
Each output variable name maps directly to a key in the `outputs` object.
## For Workflow Tools
When you [publish a workflow as a tool](/en/use-dify/workspace/tools#workflow-tool), the Output node defines the tool's return schema. Each output variable name becomes a key in the tool's result, accessible to the parent workflow that invokes the tool.
# Parameter Extractor
Source: https://docs.dify.ai/en/use-dify/nodes/parameter-extractor
Convert natural language to structured data using LLM intelligence
The Parameter Extractor node converts unstructured text into structured data using LLM intelligence. It bridges the gap between natural language input and the structured parameters that tools, APIs, and other workflow nodes require.
## Configuration
### Input and Model Selection
Select the **Input Variable** containing the text you want to extract parameters from. This typically comes from user input, LLM responses, or other workflow nodes.
Choose a **Model** with strong structured output capabilities. The Parameter Extractor relies on the LLM's ability to understand context and generate structured JSON responses.
### Parameter Definition
Define the parameters you want to extract by specifying:
* **Parameter Name** - The key that will appear in the output JSON
* **Data Type** - String, number, boolean, array, or object
* **Description** - Helps the LLM understand what to extract
* **Required Status** - Whether the parameter must be present
You can manually define parameters or **quickly import from existing tools** to match the parameter requirements of downstream nodes.
### Extraction Instructions
Write clear instructions describing what information to extract and how to format it. Providing examples in your instructions improves extraction accuracy and consistency for complex parameters.

## Advanced Configuration
### Inference Mode
Choose between two extraction approaches based on your model's capabilities:
**Function Call/Tool Call** uses the model's structured output features for reliable parameter extraction with strong type compliance.
**Prompt-based** relies on pure prompting for models that may not support function calling or when prompt-based extraction performs better.
### Memory
Enable memory to include conversation history when extracting parameters. This helps the LLM understand context in interactive dialogues and improves extraction accuracy for conversational workflows.
## Output Variables
The node provides both extracted parameters and built-in status variables:
**Extracted Parameters** appear as individual variables matching your parameter definitions, ready for use in downstream nodes.
**Built-in Variables** include status information:
* `__is_success` - Extraction success status (1 for success, 0 for failure)
* `__reason` - Error description when extraction fails
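A downstream Code node can branch on these status variables. A minimal sketch, where `destination` stands in for a hypothetical extracted parameter:

```python theme={null}
def main(__is_success: int, __reason: str, destination: str = "") -> dict:
    """Handle the Parameter Extractor's output, falling back when extraction fails."""
    if __is_success == 1:
        return {"result": f"Booking trip to {destination}"}
    return {"result": f"Could not extract parameters: {__reason}"}

print(main(1, "", "Tokyo"))     # {'result': 'Booking trip to Tokyo'}
print(main(0, "missing city"))  # {'result': 'Could not extract parameters: missing city'}
```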

# Question Classifier
Source: https://docs.dify.ai/en/use-dify/nodes/question-classifier
Intelligently categorize user input to route workflow paths
The Question Classifier node intelligently categorizes user input to route conversations down different workflow paths. Instead of building complex conditional logic, you define categories and let the LLM determine which one fits best based on semantic understanding.
## Configuration
### Input and Model Setup
**Input Variable** - Select what to classify, typically `sys.query` for user questions, but can be any text variable from previous workflow nodes.
**Model Selection** - Choose an LLM for classification. Faster models work well for simple categories, while more powerful models handle nuanced distinctions better.

### Category Definition
Create clear, descriptive labels for each category with specific descriptions of what belongs in each. Be precise about boundaries between categories to help the LLM make accurate decisions.
Each category becomes a potential output path that you can connect to different downstream nodes like specialized knowledge bases, response templates, or processing workflows.
## Classification Example
Here's how the Question Classifier works in a customer service scenario:

**Categories Defined:**
* **After-sales service** - Warranty claims, returns, repairs, and post-purchase support
* **Product usage** - Setup instructions, troubleshooting, feature explanations
* **Other questions** - General inquiries not covered by specific categories
**Classification Results:**
* "How to set up contacts on iPhone 14?" → **Product usage**
* "What is the warranty period for my purchase?" → **After-sales service**
* "What's the weather like today?" → **Other questions**
Each classification result routes to different knowledge bases and response strategies, ensuring users receive relevant, specialized assistance.
## Advanced Configuration
### Instructions and Guidelines
Add detailed classification guidelines in the **Instructions** field to handle edge cases, ambiguous scenarios, or specific business rules. This helps the LLM understand nuanced distinctions between categories.
# Template
Source: https://docs.dify.ai/en/use-dify/nodes/template
Transform and format data using Jinja2 templating
The Template node transforms and formats data from multiple sources into structured text using Jinja2 templating. Use it to combine variables, format outputs, and prepare data for downstream nodes or end users.

## Jinja2 Templating
Template nodes use Jinja2 templating syntax to create dynamic content that adapts based on workflow data. This provides programming-like capabilities including loops, conditionals, and filters for sophisticated text generation.
### Variable Substitution
Reference workflow variables using double curly braces: `{{ variable_name }}`. You can access nested object properties and array elements using dot notation and bracket syntax.
```jinja theme={null}
{{ user.name }}
{{ items[0].title }}
{{ data.metrics.score }}
```
### Conditional Logic
Show different content based on data values using if-else statements:
```jinja theme={null}
{% if user.subscription == 'premium' %}
Welcome back, Premium Member! You have access to all features.
{% else %}
Consider upgrading to Premium for additional capabilities.
{% endif %}
```
### Loops and Iteration
Process arrays and objects with for loops to generate repetitive content:
```jinja theme={null}
{% for item in search_results %}
### Result {{ loop.index }}
**Score**: {{ item.score | round(2) }}
{{ item.content }}
---
{% endfor %}
```

## Data Formatting
### Filters
Jinja2 filters transform data during template rendering:
```jinja theme={null}
{{ name | upper }}
{{ price | round(2) }}
{{ content | replace('\n', '<br>') }}
{{ timestamp | strftime('%B %d, %Y') }}
{{ score | default('No score available') }}
```
### Error Handling
Handle missing or invalid data gracefully using default values and conditional checks:
```jinja theme={null}
{{ user.email | default('No email provided') }}
{{ metrics.accuracy | round(2) if metrics.accuracy else 'Not calculated' }}
```
## Interactive Forms
Templates can generate interactive HTML forms for structured data collection in chat interfaces.

When users submit forms, responses become structured JSON data available for immediate processing in downstream workflow nodes.
## Output Limits
Template output is limited to **80,000 characters** (configurable via `TEMPLATE_TRANSFORM_MAX_LENGTH`). This prevents memory issues and ensures reasonable processing times for large template outputs.
# Tool Node
Source: https://docs.dify.ai/en/use-dify/nodes/tools
Connect to external services and APIs
Add [Dify tools](/en/use-dify/workspace/tools) to your workflows as standalone nodes.
This lets your workflows interact with external services and APIs to access real-time data and perform actions, like web searches, database queries, or content processing.
**To add and configure a tool node**:
1. On the canvas, click **Add Node** > **Tools**, then select an action from an available tool.
2. Optional: If a tool requires authentication, select an existing credential or create a new one.
To change the default credential, go to **Tools** or **Plugins**.
3. Complete any other required tool settings.
# Trigger
Source: https://docs.dify.ai/en/use-dify/nodes/trigger/overview
## Introduction
Triggers are available for workflow applications only.
A trigger is a type of Start node that enables your workflow to run automatically—on a schedule or in response to events from external systems (e.g., GitHub, Gmail, or your own internal systems)—rather than waiting for active initiation from a user or an API call.
Triggers are ideal for automating repetitive tasks or integrating your workflow with third-party applications to achieve automatic data synchronization and processing.
A workflow can have multiple triggers running in parallel. You can also build several independent workflows on the same canvas, each starting with its own triggers.
The Sandbox plan supports up to 2 triggers per workflow. [Upgrade](https://dify.ai/pricing) to add more.
The trigger source for each workflow execution is displayed in the **Logs** section.
On Dify Cloud, trigger events (workflow executions initiated by triggers) are subject to a quota that varies by plan. For details, see the [Plan Comparison](https://dify.ai/pricing).
The workspace owner and admins can check the remaining quota in **Settings** > **Billing**.
## Trigger Types
* [Schedule Trigger](/en/use-dify/nodes/trigger/schedule-trigger)
* Runs your workflow at specified times or intervals.
* Example: Automatically generate a daily sales report every morning at 9 AM and email it to your team.
Each workflow can have at most one schedule trigger.
* [Plugin Trigger](/en/use-dify/nodes/trigger/plugin-trigger)
* Runs your workflow when a specific event occurs in an external system, via an event subscription through a trigger plugin.
* Example: Automatically analyze and archive new messages in a specific Slack channel via a subscription to the `New Message in Channel` event through a Slack trigger plugin.
* [Webhook Trigger](/en/use-dify/nodes/trigger/webhook-trigger)
* Runs your workflow when a specific event occurs in an external system via a custom webhook.
* Example: Automatically process new orders in response to an HTTP request containing the order details from your e-commerce platform.
Both plugin triggers and webhook triggers make your workflow *event-driven*. Here's how to choose:
1. Use a **plugin trigger** when a trigger plugin is available for your target external system. You can simply subscribe to the supported events.
2. Use a **webhook trigger** when no corresponding plugin exists or when you need to capture events not supported by existing plugins. In such cases, you'll need to set up custom webhooks in the external system.
## Enable or Disable Triggers
In the **Quick Settings** side menu, you can enable or disable published triggers. Disabled triggers do not initiate workflow execution.
Only published triggers appear in **Quick Settings**. If you don't see an added trigger listed, ensure it has been published first.
## Test Multiple Triggers
When a workflow has multiple triggers, you can click **Test Run** > **Run all triggers** to test them at once. The first trigger that activates will initiate the workflow, and the others will then be ignored.
After you click **Run all triggers**:
* Schedule triggers will run at the next scheduled execution time.
* Plugin triggers will listen for subscribed events.
* Webhook triggers will listen for external HTTP requests.
# Plugin Trigger
Source: https://docs.dify.ai/en/use-dify/nodes/trigger/plugin-trigger
## Introduction
Triggers are available for workflow applications only.
A plugin trigger automatically initiates your workflow when a specific event occurs in an external system. All you need to do is subscribe to these events through a trigger plugin and add the corresponding plugin trigger to your workflow.
For example, suppose you have installed a GitHub trigger plugin. It provides a list of GitHub events you can subscribe to, including `Pull Request`, `Push`, and `Issue`. If you subscribe to the `Pull Request` event and add the `Pull Request` plugin trigger to your workflow, it will automatically run whenever someone opens a pull request in the specified repository.
## Add and Configure a Plugin Trigger
1. On the workflow canvas, right-click and select **Add Node** > **Start**, then choose from the available plugin triggers or search for more in [Dify Marketplace](https://marketplace.dify.ai/?language=en-US\&category=trigger).
* If there's no suitable trigger plugin for your target external system, you can [request one from the community](https://github.com/langgenius/dify-plugins/issues/new?template=plugin_request.yaml), [develop one yourself](/en/develop-plugin/dev-guides-and-walkthroughs/trigger-plugin), or use a [webhook trigger](/en/use-dify/nodes/trigger/webhook-trigger) instead.
* A workflow can have multiple plugin triggers. If these trigger branches share identical downstream nodes, add a [Variable Aggregator](/en/use-dify/nodes/variable-aggregator) to converge them and avoid duplicating those nodes on each branch.
2. Select an existing subscription or [create a new one](#create-a-new-subscription).
View how many workflows are using a specific subscription from the plugin's details panel under **Plugins**.
3. Configure any other required settings.
The output variables of a plugin trigger are defined by its trigger plugin and cannot be modified.
## Create a New Subscription
A trigger plugin supports creating up to 10 subscriptions per workspace.
Each subscription is built on a webhook. When you create a subscription, you're essentially setting up a webhook that listens for events from an external system.
A webhook allows one system to automatically send real-time data to another. When a certain event occurs, the source system packages the event details into an HTTP request and sends it to a designated URL provided by the destination system.
Dify supports the following two methods for creating subscriptions (webhooks), but the options available in each plugin depend on how that plugin was designed.
* **Automatic Creation**: You select the events you want to subscribe to, and Dify automatically creates the corresponding webhook in the external system. This requires prior authorization via **OAuth** or **API keys** so Dify can handle the webhook setup on your behalf.
* **Manual Creation**: You create the webhook yourself using the webhook callback URL provided by Dify. No authorization is needed.
It's recommended to select all available events when creating a subscription.
A plugin trigger works only when its corresponding event is included in the linked subscription. Selecting all available events ensures that any plugin trigger you add to the workflow later can use the same subscription—no need to update or create new ones.
On Dify Cloud, many popular trigger plugins are pre-configured with default OAuth clients so you can authorize Dify with a single click.
For self-hosted deployments, only the custom OAuth client option is available, meaning that you need to create the OAuth application yourself in the external system.
**Default OAuth client**
1. Select **Create with OAuth** > **Default** > **Save and Authorize**.
**Save** means the selected option is set as the default OAuth method for future subscriptions.
To switch methods later, click the **OAuth Client Settings** icon.
2. On the external system's authorization page that pops up, click **Next** to grant Dify access.
3. Specify the subscription name, select the events you want to subscribe to, and configure any other required settings.
We recommend selecting all available events, but you can always change your selection later from the plugin's details panel under **Plugins**.
4. Click **Create**.
**Custom OAuth client**
1. Select **Create with OAuth** > **Custom**.
2. In the external system, create an OAuth application using the callback URL provided by Dify.
3. Back in Dify, enter the client ID and client secret of the newly created OAuth application, then click **Save and Authorize**.
Once saved, the same client credentials can be reused for future subscriptions.
4. Specify the subscription name, select the events you want to subscribe to, and configure any other required settings.
We recommend selecting all available events, but you can always change your selection later from the plugin's details panel under **Plugins**.
5. Click **Create**.
The displayed **Callback URL** is used internally by Dify to create the webhook in the external system on your behalf, so you don't need to take any action with this URL.
For self-hosted deployments, you can change the URL's base prefix via the `TRIGGER_URL` environment variable. Ensure it points to a public domain or IP address accessible to external systems.
**API key**
1. Select **Create with API Key**.
2. Enter the required authentication information, then click **Verify**.
3. Specify the subscription name, select the events you want to subscribe to, and configure any other required settings.
We recommend selecting all available events, but you can always change your selection later from the plugin's details panel under **Plugins**.
4. Click **Create**.
The displayed **Callback URL** is used internally by Dify to create the webhook in the external system on your behalf, so you don't need to take any action with this URL.
For self-hosted deployments, you can change the URL's base prefix via the `TRIGGER_URL` environment variable. Ensure it points to a public domain or IP address accessible to external systems.
**Manual creation**
1. Select **Paste URL to create a new subscription**.
2. Specify the subscription name and use the provided callback URL to manually create a webhook in the external system.
For self-hosted deployments, you can change the callback URL's base prefix via the `TRIGGER_URL` environment variable.
Ensure the prefix points to a public domain or IP address accessible to external systems.
3. (Optional) Test the created webhook.
   Most external systems automatically test a new webhook by sending a ping request to Dify upon creation.
   1. Trigger a subscribed event so the external system sends an HTTP request to the callback URL.
   2. Return to the **Manual Setup** page and check the **Request Logs** section at the bottom. If the webhook works properly, you'll see the received request and Dify's response.
4. Click **Create**.
## Test a Plugin Trigger
To test an unpublished plugin trigger, you must first click **Run this step** or test-run the entire workflow. This puts the trigger into a listening state so that it can monitor external events. Otherwise, the trigger will not capture subscribed events even when they occur.
# Schedule Trigger
Source: https://docs.dify.ai/en/use-dify/nodes/trigger/schedule-trigger
## Introduction
* Triggers are available for workflow applications only.
* Each workflow can have at most one schedule trigger.
Schedule triggers enable your workflow to run at specified times or intervals. They are ideal for recurring tasks like generating daily reports or sending scheduled notifications.
## Add a Schedule Trigger
On the workflow canvas, right-click and select **Add Node** > **Start** > **Schedule Trigger**.
## Configure a Schedule Trigger
You can configure the schedule using either the default visual picker or a cron expression.
After configuration, you can see the next 5 scheduled execution times.
Schedule triggers do not produce output variables, but they update the system variable `sys.timestamp` (the start time of each workflow execution) each time they initiate the workflow.
### With the Visual Picker
Use this for simple hourly, daily, weekly, or monthly schedules. For weekly and monthly frequencies, you can select multiple days or dates.
### With a Cron Expression
Use this for more complex and precise timing patterns, such as every 15 minutes from 9 AM to 5 PM on weekdays.
You can use LLMs to generate cron expressions.
#### Standard Format
A cron expression is a string that defines the schedule for executing your workflow. It consists of five fields separated by spaces, each representing a different time unit.
Ensure that there is a single space between each field.
```
* * * * *
| | | | |
| | | | |── Day of week (0-7 or SUN-SAT, where both 0 and 7 = Sunday)
| | | |──── Month (1-12 or JAN-DEC)
| | |────── Day of month (1-31)
| |──────── Hour (0-23)
|────────── Minute (0-59)
```
When both the **day-of-month** and **day-of-week** fields are specified, the trigger activates on dates that match *either* field.
For example, `1 2 3 4 4` will trigger your workflow on the 3rd of April *and* every Thursday in April, not just on Thursdays that fall on the 3rd.
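The union rule can be sketched in Python. This is a minimal, illustrative matcher that handles only plain numeric fields and `*` (no ranges, lists, steps, or names), not Dify's actual implementation:

```python theme={null}
from datetime import datetime

def field_matches(field: str, value: int) -> bool:
    """Match a single cron field against a value ('*' or a plain number only)."""
    return field == "*" or int(field) == value

def cron_matches(expr: str, dt: datetime) -> bool:
    """Check a 5-field cron expression against a datetime.

    When both day-of-month and day-of-week are restricted (not '*'),
    the date matches if EITHER field matches (the union rule).
    """
    minute, hour, dom, month, dow = expr.split()
    if not (field_matches(minute, dt.minute)
            and field_matches(hour, dt.hour)
            and field_matches(month, dt.month)):
        return False
    # Python: Monday=0..Sunday=6; cron: Sunday=0..Saturday=6
    cron_dow = (dt.weekday() + 1) % 7
    dom_ok = field_matches(dom, dt.day)
    dow_ok = field_matches(dow, cron_dow)
    if dom != "*" and dow != "*":
        return dom_ok or dow_ok  # union, not intersection
    return dom_ok and dow_ok

# `1 2 3 4 4` fires on April 3 (a Wednesday in 2024) AND on every Thursday in April:
print(cron_matches("1 2 3 4 4", datetime(2024, 4, 3, 2, 1)))   # True (day-of-month matches)
print(cron_matches("1 2 3 4 4", datetime(2024, 4, 11, 2, 1)))  # True (a Thursday)
print(cron_matches("1 2 3 4 4", datetime(2024, 4, 5, 2, 1)))   # False (neither matches)
```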
#### Special Characters
| Character | Description | Example |
| :-------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `*` | Means "every". | `*` in the **hour** field means "every hour". |
| `,` | Separates multiple values. | `1,3,5` in the **day-of-week** field means "Monday, Wednesday, and Friday". |
| `-` | Defines a range of values. | `9-17` in the **hour** field means "from 9 AM to 5 PM". |
| `/` | Specifies step values. | `*/15` in the **minute** field means "every 15 minutes". |
| `L` | Means "the last".<br />In the **day-of-month** field, it means "the last day of the month".<br />In the **day-of-week** field, used alone it means "the last day of the week"; combined with a number, it means "the last occurrence of that weekday in the month". | `L` in the **day-of-month** field means "Jan 31, Apr 30, or Feb 28 in a non-leap year".<br />`L` in the **day-of-week** field means "Sunday".<br />`5L` in the **day-of-week** field means "the last Friday of the month". |
| `?` | Means "any" or "no specific value".<br />If you specify a value for the **day-of-week** field, you can use `?` for the **day-of-month** field to ignore it, and vice versa.<br />Not required, because `*` works as well. | To run a task every Monday, it's more precise to set the **day-of-month** field to `?` instead of `*`. |
#### Predefined Expressions
* `@yearly`: Run once a year at 12 AM on January 1.
* `@monthly`: Run once a month at 12 AM on the first day of the month.
* `@weekly`: Run once a week at 12 AM on Sunday.
* `@daily`: Run once a day at 12 AM.
* `@hourly`: Run at the beginning of every hour.
#### Examples
| Schedule | Cron Expression |
| :-------------------------------------- | :--------------------------------- |
| Weekdays at 9 AM | `0 9 * * MON-FRI` or `0 9 * * 1-5` |
| Every Wednesday at 2:30 PM | `30 14 * * WED` |
| Every Sunday at 12 AM | `0 0 * * 0` |
| Every 2 hours on Tuesday | `0 */2 * * 2` |
| The first day of every month at 12 AM | `0 0 1 * *` |
| At 12 PM on January 1 and June 1 | `0 12 1 JAN,JUN *` |
| The last day of every month at 5 PM | `0 17 L * *` |
| The last Friday of every month at 10 PM | `0 22 * * 5L` |
## Test a Schedule Trigger
* **Run this step**: The schedule trigger runs immediately, ignoring the configured schedule.
* **Test Run**: The schedule trigger waits for its next scheduled execution time.
# Webhook Trigger
Source: https://docs.dify.ai/en/use-dify/nodes/trigger/webhook-trigger
## Introduction
Triggers are available for workflow applications only.
A webhook allows one system to automatically send real-time data to another. When a certain event occurs, the source system packages the event details into an HTTP request and sends it to a designated URL provided by the destination system.
Following the same mechanism, webhook triggers enable your workflow to run in response to third-party events. Here's how you work with it:
1. When you add a webhook trigger to your workflow, a unique webhook URL is generated—a dedicated endpoint that listens for external HTTP requests.
For self-hosted deployments, you can change the base prefix of this URL via the `TRIGGER_URL` environment variable.
Ensure it points to a public domain or IP address accessible to external systems.
2. You use this URL to create a webhook subscribing to the events you want to monitor in an external system. Then you configure the webhook trigger to define how it processes incoming requests and extracts request data.
For testing purposes, always use the test webhook URL to keep test data separate from production data.
3. When a subscribed event occurs, the external system sends an HTTP request with the event data to that provided webhook URL. Once the request is received and processed successfully, your workflow is triggered, and the specified event data is extracted into variables that can be referenced by downstream nodes.
If there's a ready-made trigger plugin for your target external system, we recommend using the [plugin trigger](/en/use-dify/nodes/trigger/plugin-trigger) instead.
## Add a Webhook Trigger
On the workflow canvas, right-click and select **Add Node** > **Start** > **Webhook Trigger**.
A workflow can have multiple webhook triggers. If these trigger branches share identical downstream nodes, add a [Variable Aggregator](/en/use-dify/nodes/variable-aggregator) to converge them and avoid duplicating those nodes on each branch.
## Configure a Webhook Trigger
You can define how a webhook trigger handles incoming HTTP requests, including:
* The expected HTTP method for the webhook URL
* The request's content-type
* The data you wish to extract from the request
* The response sent back to the external system when your workflow is successfully triggered
To test an unpublished webhook trigger, make sure to click **Run this step** or test-run the entire workflow first. This puts the trigger into a listening state so that it can receive external requests. Otherwise, no request will be captured.
### HTTP Method
To ensure the incoming request can be received successfully, you need to specify which HTTP method the webhook URL accepts.
The method you select here must match the one used by the external system to send requests; otherwise, the requests will be rejected.
You can typically find this information in the external system's webhook documentation or setup interface.
### Content-Type
To ensure the request body can be properly parsed and the data you need extracted, you need to specify the expected content type of the incoming request.
The content-type you select here must match the content type of the request sent from the external system; otherwise, the request will be rejected.
### Query Parameters, Header Parameters, and Request Body Parameters
You can extract specific data from the query parameters, headers, and body of the incoming request. **Each extracted parameter becomes an output variable that can be used in your workflow.**
Some external systems provide a delivery log for each request, where you can view all the data included in the request and decide which parameters to extract.
Alternatively, you can send a test request to the webhook trigger and check the received request data in its last run logs:
1. Create a webhook in the external system using the provided test webhook URL.
2. Set the correct HTTP method and content-type in the trigger.
3. Click the **Run this step** icon. The trigger will start listening for external requests.
4. Trigger the subscribed event in the external system so it sends an HTTP request to the provided webhook URL.
5. Go to the trigger's **Last Run** tab and check the received request data in **Input**.
The variable name you define in the trigger must match the key name of the corresponding parameter in the request.
**Query Parameters**
* Key-value pairs appended to the webhook URL (after `?`) by the external system when sending requests, each pair separated by `&`.
* Typically simple, non-sensitive identifiers or filter data about the event.
* Example: From the URL `{webhook url}?userID=u-456&source=email`, you can extract the `userID` (`u-456`) or the `source` (`email`).
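Query-parameter extraction follows ordinary URL query-string semantics, which can be sketched with Python's standard library (an illustrative sketch, not Dify's internals; the URL is hypothetical):

```python theme={null}
from urllib.parse import urlsplit, parse_qs

# Hypothetical webhook URL with query parameters appended by the external system
url = "https://example.com/webhook?userID=u-456&source=email"
params = parse_qs(urlsplit(url).query)

user_id = params["userID"][0]  # "u-456"
source = params["source"][0]   # "email"
print(user_id, source)
```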
**Header Parameters**
* Request metadata included in the request headers.
* Technical information needed to process the request, such as an authentication token or the request body's data format.
* Example: From headers like `Authorization: Bearer sk-abc...` and `Content-Type: application/json`, you can extract the authorization information (`Bearer sk-abc...`) or the content-type (`application/json`).
**Request Body Parameters**
* The main payload carrying the core event data, such as a customer profile, order details, or the content of a Slack message.
* Example: From the following request body, you can extract the `customerName` (`Alex`), the list of items, or the `isPriority` status (`true`).
```json theme={null}
{
  "customerName": "Alex",
  "items": [
    { "sku": "A42", "quantity": 2 },
    { "sku": "B12", "quantity": 1 }
  ],
  "isPriority": true
}
```
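Extracting parameters from a JSON body amounts to standard JSON key lookup, sketched here with Python's `json` module (an illustrative sketch, not Dify's internals):

```python theme={null}
import json

# The example request body from above
raw_body = """
{
  "customerName": "Alex",
  "items": [
    { "sku": "A42", "quantity": 2 },
    { "sku": "B12", "quantity": 1 }
  ],
  "isPriority": true
}
"""
body = json.loads(raw_body)

customer_name = body["customerName"]  # String: "Alex"
items = body["items"]                 # Array[Object]: the list of items
is_priority = body["isPriority"]      # Boolean: True
print(customer_name, len(items), is_priority)
```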
The content-type determines which data types can be extracted from the request body.
| Content-Type | `String` | `Number` | `Boolean` | `Object` | `File` | `Array[String]` | `Array[Number]` | `Array[Boolean]` | `Array[Object]` | `Array[File]` |
| :-------------------------------- | :------: | :------: | :-------: | :------: | :----: | :-------------: | :-------------: | :--------------: | :-------------: | :-----------: |
| application/json | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ |
| application/x-www-form-urlencoded | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ |
| multipart/form-data | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
| text/plain | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
**Parameter Settings**
For each parameter to be extracted, you can specify the following:
* **Variable Name**: The key name of the parameter in the incoming request (e.g., `userID` in `userID=u-456`).
For header parameters, any hyphen (`-`) in the variable name will be automatically converted to an underscore (`_`) in the output variable.
* **Data Type**: The expected data format. Available for query and request body parameters only, as header parameters are always treated as strings.
* **Required**: Whether the parameter is required for your workflow to execute properly. If any required parameter is missing from an incoming request, your workflow will not be triggered.
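The header-name conversion and the required-parameter check described above can be sketched as follows (an illustrative sketch under the stated rules, not Dify's implementation; the parameter names are hypothetical):

```python theme={null}
def to_variable_name(header_name: str) -> str:
    """Hyphens in header names become underscores in output variable names."""
    return header_name.replace("-", "_")

def check_required(extracted: dict, required: list) -> bool:
    """The workflow is only triggered when every required parameter is present."""
    return all(name in extracted for name in required)

print(to_variable_name("X-Request-Id"))                 # X_Request_Id
print(check_required({"userID": "u-456"}, ["userID"]))  # True
print(check_required({}, ["userID"]))                   # False: workflow not triggered
```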
### Response
When your workflow is successfully triggered by an external HTTP request, a default `200 OK` response is sent back to the external system.
If the external system requires a specific success response format, you can customize the status code and response body. The default one will be overridden.
* **Status Code**: Supports any status code in the range \[200, 399].
* **Response Body**: Supports JSON or plain text.
In the returned response body, non-JSON content will be automatically wrapped in a JSON object.
For example, `OK` will be wrapped as `{"message": "OK"}`.
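This wrapping behavior can be sketched as follows (an illustrative sketch of the rule stated above, not Dify's implementation):

```python theme={null}
import json

def wrap_response_body(body: str) -> str:
    """Return the body unchanged if it is valid JSON; otherwise wrap it in a JSON object."""
    try:
        json.loads(body)
        return body
    except json.JSONDecodeError:
        return json.dumps({"message": body})

print(wrap_response_body("OK"))                  # {"message": "OK"}
print(wrap_response_body('{"status": "done"}'))  # already JSON, returned unchanged
```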
The following error responses are system-defined and cannot be customized. Error details can be found in the response body.
* 400 Bad Request
* 404 Not Found
* 413 Payload Too Large
* 500 Internal Server Error
## Test a Webhook Trigger
To test an unpublished webhook trigger, you must first click **Run this step** or test-run the entire workflow. This puts the trigger into a listening state so that it can receive external requests. Otherwise, incoming requests will not be captured.
# User Input
Source: https://docs.dify.ai/en/use-dify/nodes/user-input
Collects user inputs to start workflow and chatflow applications
## Introduction
The User Input node allows you to define what to collect from end users as inputs for your applications.
Applications that start with this node run *on demand* and can be initiated by direct user interaction or API calls.
You can also publish these applications as standalone web apps or MCP servers, expose them through backend service APIs, or use them as tools in other Dify applications.
Each application canvas can contain only one User Input node.
## Input Variable
### Preset
Preset input variables are system-defined and available by default.
* `userinput.files`: Files uploaded by end users when they run the application.
For workflow applications, this preset variable is considered *legacy* and kept only for backward compatibility.
We recommend using a [custom file input field](#file-input) instead to collect user files.
* `userinput.query` (for chatflows only): The text message automatically captured from the user's latest chat turn.
### Custom
You can configure custom input fields in a User Input node to collect different kinds of user input. Each field becomes a variable that can be referenced by downstream nodes.
**Label Name** is displayed to your end users.
In a chatflow application, you can **Hide** any user input field to make it invisible to end users while keeping it available for reference within the chatflow.
Note that a **Required** field cannot be hidden.
#### Text Input
**Short Text**: Accepts up to 256 characters. Use it for names, email addresses, titles, or any brief text input that fits on a single line.
**Paragraph**: Allows long-form text without length restrictions. It gives users a multi-line text area for detailed responses or descriptions.
#### Structured Input
**Select**: Displays a dropdown menu with predefined options. Users can choose only from listed options, ensuring data consistency and preventing invalid inputs.
**Number**: Restricts input to numerical values only—ideal for quantities, ratings, IDs, or any data requiring mathematical processing.
**Checkbox**: Provides a simple yes/no option. When a user checks the box, the output is `true`; otherwise, it's `false`. Use it for confirmations or any case that requires a binary choice.
**JSON Object**: Accepts data in JSON object format, ideal for passing complex, nested data structures into your application.
You can optionally define a JSON schema to validate the input and guide end users on the expected structure and validation requirements. This also allows you to reference individual properties of the object in other nodes.
#### File Input
**Single File**: Allows users to upload one file of any supported type, either from their device or via a file URL. The uploaded file is available as a variable containing file metadata (name, size, type, etc.).
**File List**: Supports multiple file uploads at once. It's useful for handling batches of documents, images, or other files together.
Use a List Operator node to filter, sort, or extract specific files from the uploaded file list for further processing.
**File Processing**
Since the User Input node only collects files—it does not read or parse their content—uploaded files must be processed appropriately by subsequent nodes. For example:
* Document files can be routed to a Doc Extractor node for text extraction so that LLMs can understand their content.
* Images can be sent to LLM nodes with vision capabilities or specialized image processing tool nodes.
* Structured data files such as CSV or JSON can be processed with Code nodes to parse and transform the data.
When users upload multiple files with mixed types (e.g., images and documents), you can use a List Operator node to separate them by file type before routing them to different processing branches.
## What's Next
After setting up a User Input node, you can connect it to other nodes to process the collected data. Common patterns include:
* Send the input to an LLM node for processing.
* Use a Knowledge Retrieval node to find information relevant to the input.
* Create conditional branches based on the input with an If/Else node.
# Variable Aggregator
Source: https://docs.dify.ai/en/use-dify/nodes/variable-aggregator
Converge exclusive workflow branches into a single output
Use the Variable Aggregator node to converge **exclusive** workflow branches into a single output, so you only need to define downstream processing once.
Nodes like If/Else and Question Classifier create exclusive branches—only one path executes per run. When these branches produce the same type of output, you would normally duplicate downstream nodes on every branch.
The Variable Aggregator eliminates this duplication. It provides a single output variable for downstream nodes to reference, regardless of which branch ran.
The Variable Aggregator is designed for exclusive branches where **only one path runs at a time**. It does not combine outputs from multiple branches that execute in parallel.
To merge results from parallel branches, use a [Code](/en/use-dify/nodes/code) or [Template](/en/use-dify/nodes/template) node.
## Select the Variables to Converge
From each branch, add variables that need the same downstream processing. All variables must share the same data type.
Supported types: `string`, `number`, `object`, `boolean`, `array`, `file`.
The node outputs whichever variable has a value at runtime. Since only one branch executes, only one variable will have a value, and that value becomes the node's output.
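The runtime behavior amounts to picking the one variable that received a value, roughly like this (an illustrative sketch, not the actual implementation):

```python theme={null}
def aggregate(branch_values):
    """Return the value from the single branch that ran (the one non-None entry)."""
    for value in branch_values:
        if value is not None:
            return value
    return None

# Only one exclusive branch executed, so only its variable holds a value:
print(aggregate([None, "summary from the else branch"]))  # summary from the else branch
```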
## Converge Multiple Sets of Variables
When you have multiple sets of variables that each need to be converged separately, enable **Aggregation Group** to create groups within a single Variable Aggregator.
Each group converges its own set of variables and produces a separate output.
# Variable Assigner
Source: https://docs.dify.ai/en/use-dify/nodes/variable-assigner
Manage persistent conversation variables in chatflow applications
The Variable Assigner node manages persistent data in chatflow applications by writing to conversation variables (Understand the different types of variables [here](/en/use-dify/getting-started/key-concepts#variables)). Unlike regular workflow variables that reset with each execution, conversation variables persist throughout an entire chat session.

## Conversation Variables vs Workflow Variables
**Workflow Variables** exist only during a single workflow execution and reset when the workflow completes.
**Conversation Variables** persist across multiple conversation turns within the same chat session, enabling stateful interactions and contextual memory.
This persistence enables contextual conversations, user personalization, stateful workflows, and progress tracking across multiple user interactions.
## Configuration
Configure which conversation variables to update and specify their source data. You can assign multiple variables in a single node.

**Variable** - Select the conversation variable to write to
**Set Variable** - Choose the source data from upstream workflow nodes
**Operation Mode** - Determine how to update the variable (overwrite, append, clear, etc.)
## Operation Modes
Different variable types support different operations based on their data structure:
**String**
* **Overwrite** - Replace with another string variable
* **Clear** - Remove the current value
* **Set** - Manually assign a fixed value

**Number**
* **Overwrite** - Replace with another number variable
* **Clear** - Remove the current value
* **Set** - Manually assign a fixed value
* **Arithmetic** - Add, subtract, multiply, or divide the current value by another number

**Boolean**
* **Overwrite** - Replace with another boolean variable
* **Clear** - Remove the current value
* **Set** - Manually assign a fixed value

**Object**
* **Overwrite** - Replace with another object variable
* **Clear** - Remove the current value
* **Set** - Manually define the object structure and values

**Array**
* **Overwrite** - Replace with another array variable of the same type
* **Clear** - Remove all elements from the array
* **Append** - Add a single element to the end of the array
* **Extend** - Add all elements from another array of the same type
* **Remove First/Last** - Remove the first or last element from the array
Array operations are particularly powerful for building memory systems, checklists, and conversation histories that grow over time.
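The array operation modes map onto familiar list semantics, sketched here with a hypothetical checklist variable (an illustration of the modes, not Dify's internals):

```python theme={null}
checklist = ["profile created"]  # an Array[String] conversation variable

checklist.append("email verified")                  # Append: add one element to the end
checklist.extend(["plan chosen", "payment added"])  # Extend: add all elements of another array
checklist.pop(0)                                    # Remove First
checklist.pop()                                     # Remove Last
after_ops = list(checklist)                         # snapshot of the state so far
checklist.clear()                                   # Clear: remove all elements
print(after_ops, checklist)
```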
## Common Implementation Patterns
### Smart Memory System
Build chatbots that automatically detect and store important information from conversations:

The system analyzes user input for memorable facts, extracts structured information, and appends it to a persistent memories array for future reference in conversations.
### User Preferences Storage
Store user preferences like language settings, notification preferences, or display options:

Use **Overwrite** mode to capture initial preferences from user input, then reference them in all subsequent LLM responses for personalized interactions.
### Progressive Checklists
Build guided workflows that track completion status across multiple conversation turns:

Use array conversation variables to track completed items. The Variable Assigner updates the checklist each turn, while the LLM references it to guide users through remaining tasks.
# Overview
Source: https://docs.dify.ai/en/use-dify/publish/README
Get your Dify applications into users' hands with web apps, APIs, embeds, and integrations
You've built something great in Dify. Now let's get it to your users. Every Dify application becomes available in multiple ways automatically—choose what works best for your situation.
## Start with Web Apps
Your fastest path to sharing is through web apps. These are generated automatically when you create any application and work immediately without setup.
1. Click "Publish" in your app to activate the latest version.
2. Find your web app link in the publish section.
3. Send the link to users—they can start using your app right away.
Web apps work on any device and automatically adapt to screen sizes. No app store approvals or installation required.
## Publishing Options
* **Web Apps** - Instant, shareable applications. Perfect for testing ideas or serving end users directly.
* **API** - Build AI into your existing products. Full control over user experience and data flow.
* **Embed on Websites** - Deploy your web app as chat widgets or inline frames on any website.
* **MCP Server** - Connect to AI tools like Claude Desktop and Cursor. Great for development workflows.
## How Publishing Works
When you publish an app, Dify creates a web app and API endpoint with your latest configuration:
* **Web apps** update immediately with new features and responses
* **API endpoints** serve the latest model and workflow configurations
* **Website embeds** (which display your web app) automatically reflect all changes
* **MCP servers** provide access to current app capabilities
Publishing replaces your live app with the current configuration. Users will immediately see changes in their next interaction.
## Choose Your Approach
Use **Web Apps**. Share a link and start collecting feedback within minutes. Perfect for validating ideas or serving non-technical users.
Use **API Integration**. You control the interface, user authentication, and data handling. Your app becomes part of your product ecosystem.
Use **Embed on Websites**. Display your web app as a chat widget or inline frame on your current site. Works with any website technology.
Use **MCP Server**. Make your app available to Claude Desktop, Cursor, and other AI development environments as a native tool.
## Publishing Best Practices
Before you share your app, ensure you've configured these settings:
* **App description** - Helps users understand what your app does
* **Icon and branding** - Makes your app recognizable and professional
* **Access controls** - Decide if your app should be public or require authentication
* **Rate limits** - Protect your app from overuse (especially important for API access)
All publishing methods use the same app configuration. Set it once, publish everywhere.
# API
Source: https://docs.dify.ai/en/use-dify/publish/developing-with-apis
Integrate your Dify workflows anywhere
You can use your Dify app as a backend API service out of the box.
## How API Integration Works
1. **Build your app** in Dify Studio with the AI capabilities you need
2. **Generate API credentials** to securely access your app's functionality
3. **Call the API** from your application to get AI-powered responses
4. **Users interact** with your custom interface while Dify handles the AI processing
## Getting Started
1. In your app, navigate to **API Access** in the left sidebar.
2. Generate new credentials for your integration. You can create multiple keys for different environments or users.
3. Review the complete API documentation Dify generates for your app's configuration.
4. Use the provided examples to integrate API calls into your application.
Never expose API keys in frontend code or client-side requests. Always call Dify APIs from your backend to prevent abuse and maintain security.
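As a sketch of that pattern, the helper below keeps the secret key in a server-side environment variable (the variable name `DIFY_API_KEY` is an illustration, not a Dify convention) and builds the request your backend would send on the browser's behalf:

```python
# Sketch: keep the Dify secret key server-side. The browser calls your
# backend; only the backend attaches the Authorization header. Endpoint
# and payload fields follow the chat-messages examples in this doc.
import os

DIFY_ENDPOINT = "https://api.dify.ai/v1/chat-messages"

def build_dify_request(query, user_id, conversation_id=""):
    """Build the request your backend sends to Dify on a user's behalf."""
    api_key = os.environ.get("DIFY_API_KEY", "ENTER-YOUR-SECRET-KEY")
    headers = {
        "Authorization": f"Bearer {api_key}",  # never shipped to the browser
        "Content-Type": "application/json",
    }
    payload = {
        "inputs": {},
        "query": query,
        "response_mode": "streaming",
        "conversation_id": conversation_id,
        "user": user_id,
    }
    return DIFY_ENDPOINT, headers, payload
```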
### Text-generation application
These applications generate high-quality text such as articles, summaries, and translations. You call the `completion-messages` API with user input and receive the generated text in response. The model parameters and prompt templates used for generation depend on the developer's settings in the Dify Prompt Arrangement page.
You can find the API documentation and example requests for this application in **Applications -> Access API**.
For example, here is a sample API call for text generation:
```bash theme={null}
curl --location --request POST 'https://api.dify.ai/v1/completion-messages' \
--header 'Authorization: Bearer ENTER-YOUR-SECRET-KEY' \
--header 'Content-Type: application/json' \
--data-raw '{
    "inputs": {},
    "response_mode": "streaming",
    "user": "abc-123"
}'
```
```python theme={null}
import requests

url = "https://api.dify.ai/v1/completion-messages"
headers = {
    "Authorization": "Bearer ENTER-YOUR-SECRET-KEY",
    "Content-Type": "application/json",
}
data = {
    "inputs": {"text": "Hello, how are you?"},
    "response_mode": "streaming",
    "user": "abc-123",
}

response = requests.post(url, headers=headers, json=data)
print(response.text)
```
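With `response_mode` set to `streaming`, the body arrives as server-sent events: lines of the form `data: {...}`. A minimal sketch of extracting the generated text, assuming the events carry an `answer` field as in the message events (check your app's generated API docs for the exact event shapes):

```python
# Sketch: parse an SSE response body that has already been received as
# text. Each event line looks like `data: {"answer": "..."}`; lines
# without that prefix (blank lines, pings) are skipped.
import json

def extract_answer(sse_text):
    """Concatenate the "answer" fragments from an SSE response body."""
    parts = []
    for line in sse_text.splitlines():
        if not line.startswith("data: "):
            continue
        event = json.loads(line[len("data: "):])
        parts.append(event.get("answer", ""))
    return "".join(parts)

sample = 'data: {"answer": "Hel"}\n\ndata: {"answer": "lo"}'
print(extract_answer(sample))  # Hello
```

In production you would read the stream chunk by chunk (for example with `requests.post(..., stream=True)`) rather than waiting for the full body.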
### Conversational Applications
Conversational applications facilitate ongoing dialogue with users through a question-and-answer format. To initiate a conversation, you will call the `chat-messages` API. A `conversation_id` is generated for each session and must be included in subsequent API calls to maintain the conversation flow.
> **Important Note**: The Service API does not share conversations created by the WebApp. Conversations created through the API are isolated from those created in the WebApp interface.
#### Key Considerations for `conversation_id`:
* **Generating the `conversation_id`:** When starting a new conversation, leave the `conversation_id` field empty. The system will generate and return a new `conversation_id`, which you will use in future interactions to continue the dialogue.
* **Handling `conversation_id` in Existing Sessions:** Once a `conversation_id` is generated, include it in all future API calls to maintain conversation continuity with the Dify bot. When a previous `conversation_id` is passed, any new `inputs` are ignored; only the `query` is processed for the ongoing conversation.
* **Managing Dynamic Variables:** If there is a need to modify logic or variables during the session, you can use conversation variables (session-specific variables) to adjust the bot's behavior or responses.
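The handshake described above can be sketched as follows. `make_payload` is a hypothetical helper standing in for the body of your HTTP call to `chat-messages`:

```python
# Sketch of the conversation_id handshake: empty conversation_id on the
# first turn, then the server-issued id on every later turn.

def make_payload(query, user, conversation_id=""):
    return {
        "inputs": {},                        # ignored once conversation_id is set
        "query": query,
        "response_mode": "streaming",
        "conversation_id": conversation_id,  # "" starts a new conversation
        "user": user,
    }

first = make_payload("Hello!", "abc-123")             # new conversation
# ...the response to `first` contains a generated conversation_id...
returned_id = "1c7e55fb-1ba2-4e10-81b5-30addcea2276"  # example id from this doc
second = make_payload("Tell me more.", "abc-123", conversation_id=returned_id)
```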
You can access the API documentation and example requests for this application in **Applications -> Access API**.
Here is an example of calling the `chat-messages` API:
```bash theme={null}
curl --location --request POST 'https://api.dify.ai/v1/chat-messages' \
--header 'Authorization: Bearer ENTER-YOUR-SECRET-KEY' \
--header 'Content-Type: application/json' \
--data-raw '{
    "inputs": {},
    "query": "eh",
    "response_mode": "streaming",
    "conversation_id": "1c7e55fb-1ba2-4e10-81b5-30addcea2276",
    "user": "abc-123"
}'
```
```python theme={null}
import requests

url = "https://api.dify.ai/v1/chat-messages"
headers = {
    "Authorization": "Bearer ENTER-YOUR-SECRET-KEY",
    "Content-Type": "application/json",
}
data = {
    "inputs": {},
    "query": "eh",
    "response_mode": "streaming",
    "conversation_id": "1c7e55fb-1ba2-4e10-81b5-30addcea2276",
    "user": "abc-123",
}

response = requests.post(url, headers=headers, json=data)
print(response.text)
```
# MCP Server
Source: https://docs.dify.ai/en/use-dify/publish/publish-mcp
Expose your Dify applications as MCP servers for integration with Claude Desktop, Cursor, and other AI development tools
Dify now supports exposing your applications as [MCP](https://modelcontextprotocol.io/docs/getting-started/intro) (Model Context Protocol) servers, enabling seamless integration with AI assistants like Claude Desktop and development environments like Cursor. This allows these tools to directly interact with your Dify apps as if they were native extensions.
If you're looking to use MCP tools within Dify workflows & agents, see [here](/en/use-dify/build/mcp).
## Configuring Your Dify App as an MCP Server
Navigate to your application's configuration interface in Dify; you'll find an MCP Server configuration module. The feature is disabled by default. When you toggle it on, Dify generates a unique MCP Server address for your application. This address serves as the connection point for external tools.
Your MCP Server URL contains authentication credentials, so treat it like an API key. If you suspect it's been compromised, use the regenerate button to create a new URL. The old one will immediately stop working.
## Integration with Claude Desktop
To connect your Dify app to Claude Desktop, you'll need to add a Claude integration. Go to your Claude Profile > Settings > Integrations > Add integration. Replace the Integration URL with your Dify app's Server URL.
## Integration with Cursor
For Cursor, create or edit the `.cursor/mcp.json` file in your project root:
```json theme={null}
{
  "mcpServers": {
    "your-server-name": {
      "url": "your-server-url"
    }
  }
}
```
Simply replace the URL with your Dify app's MCP Server address. Cursor will automatically detect this configuration and make your Dify app available as a tool. You can add multiple Dify apps by including additional entries in the `mcpServers` object.
## Practical Considerations
* Descriptiveness
When designing descriptions for your tool and its input parameters, think about how an AI would interpret them. Clear, specific descriptions lead to better invocations. Instead of "input data," specify "JSON object containing user profile with required fields: name, email, preferences."
* Latency
The MCP protocol handles the communication layer, but your Dify app's performance still matters. If your app typically takes 30 seconds to process, that latency will be felt in the client application. Consider adding progress indicators or breaking complex workflows into smaller, faster operations.
# Publish Apps to Marketplace
Source: https://docs.dify.ai/en/use-dify/publish/publish-to-marketplace
Publish your apps to Dify Marketplace and share them with the world
Publish your apps as templates to Dify Marketplace, where other Dify users can discover and use them.
## Submit Templates
To publish a template, submit it through the [Creator Center](https://creators.dify.ai) for review. Once approved, it will be listed on Marketplace.
While your template is pending review, you can withdraw it anytime to make changes and resubmit.
Before submission:
* Make sure all plugins used in the app are **installed directly from Marketplace**.
* Run the app at least once in Dify Cloud or the latest Community Edition to confirm it works as expected.
### Submit as Individual or Organization
You can submit templates under your personal account or an organization.
* **Individual**: For independent creators. When you first log in to the Creator Center, you're signed in with your personal account by default.
* **Organization**: For teams that want to build and manage templates together. To get started, click your avatar in the top-left corner and click **Create an organization**, then invite members to collaborate.
You can switch between your personal account and organizations anytime.
### Submission Methods
Available in [beta](https://github.com/langgenius/dify/releases/tag/1.14.0-rc1) environment only.
In your app, click **Publish** > **Publish to Marketplace**.
This takes you to the Creator Center with your app file automatically uploaded — just fill in the template details and submit for review.
Export your app, then go to the Creator Center and upload the export file. Fill in the template details and submit for review.
## Template Writing Guidelines
### Language Requirements
Keep the template library consistent and searchable.
**The following fields must be written in English**:
* Template name
* Overview
* Setup steps
**Inside the app, you can use any language (e.g. Chinese) for**:
* Node names
* Prompts / system messages
* Messages shown to end-users
If your template mainly targets non-English users, you can add a tag in the title. For example,
`Stock Investment Analysis Copilot [ZH]`.
### Template Name & Icon
From the name alone, users should know where it runs and what it does.
* Use a short English phrase, typically 3-7 words.
* Recommended pattern: \[Channel / target] + \[core task], for example:
* WeChat Customer Support Bot
* CSV Data Analyzer with Natural Language
* Internal Docs Q\&A Assistant
* GitHub Issue Triage Agent
* Include keywords users might search for: channel names (Slack, WeChat, Email, Notion) and task names (Summarizer, Assistant, Generator, Bot).
* Use an icon that clearly reflects the template's theme and purpose, rather than the default avatar.
### Categories
Help users discover your template when browsing or filtering by category.
* Select only **1-3** categories that best describe your template.
* Do not check every category just for exposure.
### Language
Help users discover your template via language filters.
* Select the language(s) your template is designed for in real usage.
* This refers to the language of the template's use case, input, or output — **not** the title or overview (which must be in English).
### Overview
In 2-4 English sentences, explain what it does and who it is for.
You don't need to list prerequisites, inputs, or outputs here.
**Recommended structure**
1. Sentence 1: **What it does**
A one-sentence summary of the main function.
2. Sentence 2-3: **Who and when**
Typical user roles or scenarios (support team, marketers, founders, individual knowledge workers, etc.).
This template creates a stock investment analysis copilot that uses Yahoo Finance tools to fetch news, analytics, and ticker data for any listed company.
It helps investors and analysts quickly generate structured research summaries, compare companies, and prepare reports without manually switching between multiple finance websites.
### Setup Steps
Write Setup steps as a numbered Markdown list (1., 2., 3.), with one short sentence per step, starting with a verb.
A new user should be able to get the template running in a few minutes just by following these steps.
**Writing principles**
1. Follow the real setup order, usually:
1. Use/import the template
2. Connect accounts / add API keys
3. Connect data sources (docs, databases, sheets, etc.)
4. Optional customization (assistant name, tone, filters)
5. Activate the workflow and run a test
2. Each step should answer:
* Where to click in the UI
* What to configure or fill in
3. Aim for 3-8 steps. Too few feels incomplete; too many feels overwhelming.
1. Click **Use template** to copy the "Investment Analysis Copilot (Yahoo Finance)" agent into your workspace.
2. Go to **Settings → Model provider** and add your LLM API key. For example, OpenAI, Anthropic, or another supported provider.
3. Open the agent's **Orchestrate** page and make sure the Yahoo Finance tools are enabled in the **Tools** section:
   * `yahoo Analytics`
   * `yahoo News`
   * `yahoo Ticker`
4. (Optional) Customize the analysis style:
   * In the **INSTRUCTIONS** area, adjust the system prompt to match your target users. For example, tone, report length, preferred language, or risk preference.
   * Update the suggested questions in the **Debug & Preview** panel if you want different example queries.
5. Click **Publish** to make the agent available, then use the preview panel to test it:
   * Enter a company name or ticker (e.g., `Nvidia`, `AAPL`, `TSLA`).
   * Confirm that the copilot calls the Yahoo Finance tools and returns a structured investment analysis report.
### Quick Checklist Before You Submit
* The name is a short English phrase that clearly shows where it runs and what it does.
* The overview uses 2-4 English sentences to explain the value and typical use cases.
* Only 1-3 relevant categories are selected.
* Setup steps are a clear numbered list.
* Internal workflow texts and prompts are written in appropriate languages for your target users.
# Chat Web Apps
Source: https://docs.dify.ai/en/use-dify/publish/webapp/chatflow-webapp
Turn your chatflow into a fully-featured conversation interface with persistent history and interactive features
Chat web apps transform your chatflow into a complete conversation experience. Users get persistent chat sessions, smart interactions, and all the features you've configured—without installing anything.
## How Chat Apps Work
Your chatflow automatically becomes a web app when you publish it. The system creates a responsive interface that:
* **Maintains conversation context** across user sessions
* **Inherits all orchestration settings** from your chatflow configuration
* **Adapts to any screen size** from mobile to desktop
* **Handles user authentication** if you've enabled access controls
Unlike single-use text generators, chat apps maintain conversation memory and let users build on previous exchanges.
## Interactive Features
Your web app automatically includes these capabilities based on your chatflow settings:
* **Pre-conversation forms** - Collect context before chatting starts—better than asking mid-conversation
* **Conversation openers** - Eliminate the blank page problem with helpful opening messages
* **Follow-up suggestions** - System generates 3 contextual next questions after each response
* **Voice input** - Speech-to-text lets users talk instead of type
* **Citations** - References show exactly where information comes from
* **Feedback** - Users can rate responses to help improve your app
## Pre-conversation Setup
When your chatflow uses variables, users complete a form before chatting starts. This front-loads context gathering instead of interrupting the conversation flow.

Here's how the user experience works:
1. They see a clean form requesting necessary context information.
2. The "Start Conversation" button activates only after required fields are filled.
3. The conversation begins with all the background information it needs.
Every form field adds friction. Only ask for information that meaningfully improves responses.
## Conversation Experience
Once chatting begins, users get an interface designed for natural interaction:

Every AI response includes these actions:
* **Copy button** - One-click copying for easy sharing or note-taking
* **Feedback buttons** - Like/dislike ratings to improve your app over time
* **Follow-up suggestions** - AI generates 3 contextually relevant next questions
## Session Management
Users can manage multiple conversation threads like modern messaging apps:

**Conversation Controls:**
* **Start new** - Begin fresh conversations without losing context from previous ones
* **Pin important** - Keep crucial conversations accessible at the top of the list
* **Delete finished** - Clean up conversations that are no longer relevant
Each conversation thread maintains its own memory and context. Users can seamlessly switch between different topics or projects.
## Conversation Openers
Enable conversation openers to eliminate the intimidating blank chat screen:

When users start new conversations, the AI proactively introduces itself and explains its capabilities. This immediately shows users what they can accomplish and increases engagement.
Conversation openers work especially well for specialized apps where users might not know all available features.
## Follow-up Questions
The system automatically generates contextual follow-up questions after each AI response:

These suggestions are:
* **Contextually relevant** to the current conversation topic
* **Dynamically generated** based on the AI's response
* **Clickable shortcuts** that help users explore deeper or pivot to related topics
## Voice Input
Speech-to-text transforms your chat app into a voice-first experience:

**How it works:**
1. The microphone button appears when you enable speech-to-text in your chatflow
2. Users click to start recording their question
3. Speech converts to text in real-time as they speak
4. They can edit the text before sending or send immediately
Users must grant microphone permissions in their browser. The app will prompt for this permission when they first try to use voice input.
## Citations and Attributions
When this feature is enabled, if the AI references content from the knowledge base while answering a user question, the specific knowledge sources will be displayed below the response.
Citations build user trust by providing transparency about information sources. Users can click through to verify details or explore source materials further.
# Embedding Your Web App
Source: https://docs.dify.ai/en/use-dify/publish/webapp/embedding-in-websites
Deploy your published web app on any website through iframes, chat widgets, or custom integrations
Your published web app can be embedded directly into any website. This isn't a separate publishing method—it's how you deploy the same web app you've already created, just presented within your existing website instead of as a standalone page.
## How Web App Embedding Works
When you publish an app in Dify, you get a web app URL. You can share this URL directly, or embed the same app into your website using these methods:
Your web app as a floating button—visitors click to open the full interface
Your web app embedded directly in page content—always visible and ready
Advanced embedding with custom styling and behavior control
Same web app adapts automatically to any presentation format
All embedding methods use your published web app. Changes to your app configuration automatically apply everywhere it's embedded.
## Chat Bubble Widget
The chat bubble presents your web app as a floating button. Visitors click it to open your app in an overlay—keeping them on your page while accessing your AI features.
### Configuration Options
The chat bubble can be customized through the `difyChatbotConfig` object:
```javascript theme={null}
window.difyChatbotConfig = {
  // Required: Your app's token from Dify
  token: 'YOUR_TOKEN',

  // Optional: Environment settings
  isDev: false,
  baseUrl: 'https://udify.app', // Auto-set based on isDev

  // Optional: Visual customization
  containerProps: {
    style: {
      right: '20px',
      bottom: '20px'
    },
    className: 'custom-chat-button'
  },

  // Optional: Interactive behavior
  draggable: false, // Allow users to drag the button
  dragAxis: 'both', // 'x', 'y', or 'both'

  // Optional: Pre-fill user context
  inputs: {
    name: 'John Doe', // Variable names from your Dify app
    department: 'Support'
  },

  // Optional: System variables for tracking
  systemVariables: {
    user_id: 'USER_123',
    conversation_id: 'CONV_456'
  },

  // Optional: User profile information
  userVariables: {
    avatar_url: 'https://example.com/avatar.jpg',
    name: 'John Doe'
  }
}
```
1. In your Dify app, go to **Publish → Embed** to find your unique token.
2. Include the configuration and Dify's embed script in your website's HTML.
3. Adjust the `containerProps` to match your website's design.
4. Open your website and try the chat button to ensure everything works correctly.
## Iframe Integration
Embed your web app directly into your page content. This displays your app as an integral part of your website:
```html theme={null}
<!-- Replace YOUR_TOKEN with the token from Publish → Embed;
     the exact snippet for your app is shown in that panel. -->
<iframe
  src="https://udify.app/chatbot/YOUR_TOKEN"
  style="width: 100%; height: 100%; min-height: 700px"
  frameborder="0"
  allow="microphone">
</iframe>
```
### Why Use Iframe Embedding
* **Always visible** - Your web app is immediately accessible, not hidden behind a button
* **Full functionality** - Everything from your web app works identically in the iframe
* **Page integration** - Appears as native content, not an overlay
* **Simple setup** - Just HTML, no JavaScript configuration needed
### Customization Options
**Size and Position:**
```html theme={null}
<!-- Fixed-size example; adjust the style values to fit your layout. -->
<iframe
  src="https://udify.app/chatbot/YOUR_TOKEN"
  style="width: 400px; height: 600px"
  frameborder="0"
  allow="microphone">
</iframe>
```
**Responsive Design:**
```html theme={null}
<!-- Responsive example: the iframe fills a container that scales with the page. -->
<div style="position: relative; width: 100%; height: 80vh">
  <iframe
    src="https://udify.app/chatbot/YOUR_TOKEN"
    style="position: absolute; top: 0; left: 0; width: 100%; height: 100%"
    frameborder="0"
    allow="microphone">
  </iframe>
</div>
```
## Choosing Your Embedding Method
**Chat bubble** works best—stays out of the way until needed. The floating button lets visitors continue browsing while having quick access to help.
**Iframe embed** for dedicated pages where the app is the main content. Visitors see and use your app immediately without extra clicks.
**Iframe embed** on landing pages to let visitors try your AI capabilities instantly. No barriers between interest and engagement.
**Chat bubble** when you want the same app accessible across your entire site. One embed code provides access from every page.
## Troubleshooting
**Widget not appearing:**
* Verify your app token matches what's shown in Dify's Publish → Embed section
* Check that configuration loads before the embed script
* Look for JavaScript errors in browser console
**Iframe not loading:**
* Confirm the web app URL includes your correct token
* Ensure your site allows iframe content (check Content Security Policy)
* Both your site and Dify app should use HTTPS
Your web app must be published before embedding. If you update your app configuration, republish to see changes in embedded versions.
You can override the default button style using CSS variables or the `containerProps` option. Both methods follow normal CSS specificity rules, so use whichever achieves your desired customization.
### 1. Modifying CSS Variables
The following CSS variables are supported for customization:
```css theme={null}
/* Button distance to bottom, default is `1rem` */
--dify-chatbot-bubble-button-bottom
/* Button distance to right, default is `1rem` */
--dify-chatbot-bubble-button-right
/* Button distance to left, default is `unset` */
--dify-chatbot-bubble-button-left
/* Button distance to top, default is `unset` */
--dify-chatbot-bubble-button-top
/* Button background color, default is `#155EEF` */
--dify-chatbot-bubble-button-bg-color
/* Button width, default is `50px` */
--dify-chatbot-bubble-button-width
/* Button height, default is `50px` */
--dify-chatbot-bubble-button-height
/* Button border radius, default is `25px` */
--dify-chatbot-bubble-button-border-radius
/* Button box shadow, default is `rgba(0, 0, 0, 0.2) 0px 4px 8px 0px` */
--dify-chatbot-bubble-button-box-shadow
/* Button hover transform, default is `scale(1.1)` */
--dify-chatbot-bubble-button-hover-transform
```
To change the background color to #ABCDEF, add this CSS:
```css theme={null}
#dify-chatbot-bubble-button {
  --dify-chatbot-bubble-button-bg-color: #ABCDEF;
}
```
### 2. Using `containerProps`
Set inline styles using the `style` attribute:
```javascript theme={null}
window.difyChatbotConfig = {
  // ... other configurations
  containerProps: {
    style: {
      backgroundColor: '#ABCDEF',
      width: '60px',
      height: '60px',
      borderRadius: '30px',
    },
    // For minor style overrides, you can also use a string value for the `style` attribute:
    // style: 'background-color: #ABCDEF; width: 60px;',
  },
}
```
Apply CSS classes using the `className` attribute:
```javascript theme={null}
window.difyChatbotConfig = {
  // ... other configurations
  containerProps: {
    className: 'dify-chatbot-bubble-button-custom my-custom-class',
  },
}
```
### 3. Passing `inputs`
There are four types of inputs supported:
1. **`text-input`**: Accepts any value. The input string will be truncated if its length exceeds the maximum allowed length.
2. **`paragraph`**: Similar to `text-input`, it accepts any value and truncates the string if it's longer than the maximum length.
3. **`number`**: Accepts a number or a numerical string. If a string is provided, it will be converted to a number using the `Number` function.
4. **`options`**: Accepts any value, provided it matches one of the pre-configured options.
Example configuration:
```javascript theme={null}
window.difyChatbotConfig = {
  // Other configuration settings...
  inputs: {
    name: 'apple',
  },
}
```
Note: When using the embed.js script to create an iframe, each input value will be processed—compressed using GZIP and encoded in base64—before being appended to the URL.
For example, the URL with processed input values will look like this:
`http://localhost/chatbot/{token}?name=H4sIAKUlmWYA%2FwWAIQ0AAACDsl7gLuiv2PQEUNAuqQUAAAA%3D`
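A rough Python equivalent of that processing is shown below, useful for sanity-checking a generated URL. Treat it as an illustration: the exact bytes depend on the gzip implementation embed.js uses, so encoded strings may not match character for character.

```python
# Sketch of the embed.js input processing: GZIP-compress each value,
# base64-encode it, then URL-encode the result before appending it to
# the iframe URL as a query parameter.
import base64
import gzip
from urllib.parse import quote, unquote

def encode_input_value(value):
    """GZIP-compress, base64-encode, then URL-encode one input value."""
    compressed = gzip.compress(value.encode("utf-8"))
    return quote(base64.b64encode(compressed).decode("ascii"))

def decode_input_value(encoded):
    """Reverse the encoding, e.g. when debugging a generated URL."""
    raw = base64.b64decode(unquote(encoded))
    return gzip.decompress(raw).decode("utf-8")

token = "YOUR_TOKEN"  # placeholder, as in the examples above
url = f"http://localhost/chatbot/{token}?name={encode_input_value('apple')}"
```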
# Access Control
Source: https://docs.dify.ai/en/use-dify/publish/webapp/web-app-access
Web app access controls who can use your published applications. By default, new apps are restricted to specific team members—you choose exactly who gets access.
Only Workspace Owner, Admin, and Editor roles can create and publish web apps.
## Access Permission Types
Configure access from the Studio → Web App Access Permissions, or from the Publish panel when editing your app.
Dify Enterprise offers four access levels:

### All Members Within Platform
Any member of your Dify Enterprise workspace can access the app. Users must authenticate with their workspace credentials—password, verification code, or SSO.
Members can access the app through the direct URL or the workspace Explorer page.
If you upgraded from Dify Enterprise v2.7.x or earlier with Web App SSO enabled, your apps automatically switched to **Authenticated External Users** permission during the v2.8.x upgrade.
### Specific Members Within Platform
**Default setting for new apps.** Restricts access to chosen groups or individual members within your workspace. Perfect for department-specific tools or sensitive data applications.
Without any groups or members selected, nobody can access your app—including you.
Configure access by groups or individuals:
Add entire groups for automatic permission management. When someone joins the group, they get app access. When they leave, access is revoked.

Grant access to specific people. They keep access even if removed from related groups. Other group members cannot access the app.
Workspace Owners, Admins, and Editors can always edit any app in the workspace. However, they still need to be explicitly added to the access list to use the published web app.
### Authenticated External Users
Users outside your Dify Enterprise workspace can access the app through SSO authentication. Admins manage external users through third-party identity providers, keeping them separate from internal workspace data.
* IT builds apps, other departments use them without joining Dify
* Provide AI services to suppliers, contractors, or clients
* Public-facing tools for product help and consultation
If this option is disabled, ask your Dify administrator to configure Web App External User Authentication.
### Anyone
**No authentication required.** Anyone with the URL can access your app immediately. Use for public demos, customer tools, or open resources.
## Finding Your Apps
Team members see all accessible apps in the workspace Explorer page:

## Common Questions
No. Changes apply immediately. However, users with active sessions may need to wait for their session to expire before new restrictions take effect.
View current permissions in the **Who can access web app** section of your app's publish settings.
* **All Members**: Internal collaboration tools
* **Specific Members**: Department-specific or sensitive apps
* **External Users**: Customer service and partner tools
* **Anyone**: Public demos (use carefully)
No. API access is controlled separately by API keys. Changing web app permissions doesn't affect existing API functionality.
# Settings
Source: https://docs.dify.ai/en/use-dify/publish/webapp/web-app-settings
Configure branding, basic access controls, and user experience settings for your published web applications
Web app settings control how your published applications look and behave for end users. Every Dify application automatically generates a web interface that adapts to different devices and screen sizes.
## How Settings Work
Your web app reflects your application's current configuration. When you modify settings and hit "Publish", changes flow immediately to the live web application that users are accessing.
Web apps work without user accounts or logins by default, making them perfect for public tools and demos.
## Branding and Appearance
Make your web app look professional and recognizable:
Set your app name, description, and icon to create a clear first impression
Choose colors, themes, and layout options that match your brand
Configure interface language for your target audience
Add copyright information and privacy policy links for compliance
### Essential Branding Elements
**App Icon and Name**
* Your icon appears in browser tabs and when users bookmark your app
* Choose a clear, recognizable name that explains what your app does
* Icons can be images or emojis—pick what fits your app's personality
**Description and Messaging**
* Write a concise description that helps users understand your app's purpose
* This text appears on the app's landing page and in search results
* Keep it under 160 characters for best results
**Visual Consistency**
* Choose colors that align with your brand
* Consider your audience when selecting light or dark themes
* Test your app on different devices to ensure it looks good everywhere
## Access Controls
**Enterprise Feature:** Control who can access your apps with authentication, team restrictions, and external user management.
**Dify Community:** Web apps are public by default—anyone with the URL can access them. This works great for demos, public tools, and customer-facing applications.
**Privacy and Legal:**
* Add copyright information and privacy policy links
* Configure data handling preferences
* Set terms of service for compliance
## Feature Inheritance
Your web app automatically includes features you've enabled in your application configuration:
**Interactive Features:**
* Conversation openers and suggested follow-ups
* Pre-conversation forms for context gathering
* Voice input and speech-to-text capabilities
* Source citations and reference links
* Feedback collection and rating systems
**Functionality:**
* All workflow steps and AI model configurations
* Knowledge base integrations and tool connections
* Custom prompts and response formatting
* Rate limiting and usage controls
Disabling features in your app configuration immediately removes them from the published web app.
## App Types and Behavior
Your web app automatically adapts its interface based on your application type:
**Chat Applications**

* **Interface:** Conversation-style with message history
* **Features:** Persistent sessions, conversation management, real-time responses
* **Best for:** Customer support, consulting tools, interactive assistants

**Workflow Applications**

* **Interface:** Form-based with result display
* **Features:** Single runs, batch processing, result saving
* **Best for:** Content generation, data processing, analysis tools
## Deployment Options
Once published, your web app can be accessed in multiple ways:
Share the web app URL for immediate access
Deploy as chat widgets or iframes on existing websites
## Publishing Best Practices
Before sharing your web app:
1. Try your app on different devices and browsers to ensure it works smoothly.
2. Set up your app name, icon, description, and any required legal information.
3. Decide if your app should be public or require authentication.
4. Determine whether to share directly or embed on your website.
Your web app configuration applies everywhere it's deployed. Changes to settings automatically update embedded versions too.
# Workflow Web Apps
Source: https://docs.dify.ai/en/use-dify/publish/webapp/workflow-webapp
Turn your workflows into powerful web applications with batch processing, result management, and streamlined user experiences
Workflow web apps transform your Dify workflows into production-ready applications that handle everything from single runs to large-scale batch operations. Users get a clean interface for input, real-time processing feedback, and comprehensive result management.
## How Workflow Apps Work
When you publish a workflow, Dify automatically creates a web interface that:
* **Collects input parameters** through forms based on your workflow's start variables
* **Processes requests** using your complete workflow logic
* **Handles results** with built-in saving, copying, and management features
* **Scales automatically** from single runs to batch processing hundreds of items
Unlike chat apps that maintain conversation context, workflow apps are designed for discrete tasks that produce specific outputs.
* **Single execution** - Run workflows one at a time with immediate results and feedback
* **Batch processing** - Process hundreds of inputs simultaneously with CSV upload/download
* **Result management** - Save, organize, and export outputs with built-in storage
* **More like this** - Generate variations of successful outputs automatically
## Single Execution
The default mode for workflow apps handles individual requests with real-time processing:

**User Experience:**
1. **Fill input form** - Users provide parameters based on your workflow's start variables
2. **Click run** - The workflow executes with real-time progress indication
3. **View results** - Output appears with immediate access to copy, save, and feedback options
4. **Take actions** - Users can save important results, provide feedback, or generate similar outputs
Each result includes built-in actions:
* **Copy** - One-click copying to clipboard for easy sharing
* **Save** - Store results in the app's saved items for later access
* **Feedback** - Like/dislike ratings to help improve your workflow
* **More like this** - Generate variations based on the current result (if enabled)
## Batch Processing
When you need to run the same workflow on multiple inputs, batch processing handles hundreds of executions simultaneously:
Perfect for tasks like generating content for multiple topics, processing customer data, or analyzing large datasets.
### Setting Up Batch Runs
1. Click the **Run Batch** tab to access batch processing features.

2. Download the template file to see the required column structure for your workflow's input variables.

3. Fill the template with your input data. Each row becomes one workflow execution.
4. Upload your completed CSV file and start batch processing.

### Batch Processing Benefits
* **Parallel execution** - Multiple workflow runs happen simultaneously
* **Progress tracking** - Real-time updates on completion status
* **Bulk export** - Download all results as a CSV file when finished
* **Error handling** - Failed items are clearly marked with error details
CSV files must use Unicode encoding to prevent import failures. When saving from Excel or similar tools, explicitly select "Unicode (UTF-8)" encoding.
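If you generate the batch file programmatically, writing it with an explicit UTF-8 encoding avoids this problem entirely. A minimal sketch (the `topic` column is a hypothetical workflow input variable):

```python
import csv

def write_batch_csv(path, fieldnames, rows):
    # Explicit UTF-8 encoding prevents the import failures described above;
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()    # header row must match the template columns
        writer.writerows(rows)  # each row becomes one workflow execution

# Example: one workflow run per topic, including non-ASCII text.
write_batch_csv("batch.csv", ["topic"], [{"topic": "café menus"}, {"topic": "AI 写作"}])
```

When exporting from Excel instead, pick "CSV UTF-8" in the Save As dialog to get the same result.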
## Result Management
Workflow apps include comprehensive result management to help users organize and reuse outputs:
### Saving Results

**How saving works:**
* Users click "Save" on any result they want to keep
* Saved items appear in the dedicated "Saved" tab
* Each saved result includes the original inputs and full outputs
* Users can organize saved results and access them anytime
Saved results persist across user sessions, making workflow apps useful for building personal libraries of outputs.
### Generating Variations
When you enable "More like this" in your workflow settings, users can generate variations of successful results:

**How it works:**
1. User gets a result they like
2. They click "More like this" to generate similar outputs
3. The workflow runs again with slight variations to produce different but related results
4. Users can iterate until they find the perfect output
"More like this" works especially well for creative workflows like content generation, where users want to explore different approaches to the same topic.
# Lesson 1: What is a Workflow?
Source: https://docs.dify.ai/en/use-dify/tutorials/workflow-101/lesson-01
## 👋 Welcome to Dify 101
We are going to take you from Zero to Hero. By the end of this course, you will build your very own Advanced AI Email Assistant.
Let's leave coding behind for a second and talk about cooking.
Imagine you want to cook a dish that you haven't made before. To make that happen, you need a **Recipe**. A recipe is just like a workflow! It tells you exactly what to do, in what order, to get the dish you want.
## Meet Workflow
In Dify, you are the head chef who writes a Recipe for the AI to follow. Here are the things you need to prepare beforehand:
1. Input (Ingredients): The information you give the AI. This could be a user's question, a PDF document, or a messy email draft.
2. Process (Instructions): The steps you force the AI to take. For example: First, summarize this text. Next, translate it into Spanish. Finally, format it as a LinkedIn post.
3. Output (The Dish): The final result the AI hands back to you.
To sum up, a workflow is a flowchart that asks AI to complete tasks in a specific order.
This is a Smart ID Scanner workflow. Its job is to extract the information on the front and back of an ID card, then send the text back to you.
### Node
Let's have a closer look at the workflow above. That whole process is simply made up of a few connected steps: **Uploading the image, Extracting the information, and Combining the results**.
Each of these steps is called a **Node**.
Think of them like runners in a relay race: each node has a specific task. Once it finishes its turn, it passes the baton to the next node in line.
Dify offers you a box of ready-to-use nodes, such as the LLM, Knowledge Retrieval, If/Else, Tools, etc.
You can connect these nodes just by dragging and dropping—it's like building with Lego blocks! You can easily snap them together to create a powerful automated workflow.
## It's Your Turn
1. Go to [Dify](https://dify.ai/) and click **Get Started** in the upper right corner.
2. Click on **Explore**. This is a library of workflow templates covering different scenarios.
3. Pick a template that looks like the right fit for you. Don't worry if you don't understand every setting yet—just look at how the nodes are connected.
# Lesson 2: Head and Tail (Start & Output Node)
Source: https://docs.dify.ai/en/use-dify/tutorials/workflow-101/lesson-02
In the last lesson, we compared a Workflow to a Recipe. Today, we are stepping into the professional kitchen to prep our ingredients (Start) and get our serving plates ready (Output).
## Create the App
Click on **Studio** at the top of the screen. Under Create App on the left, click **Create from Blank**.
Select **Workflow** as the app type, fill in the **App Name & Icon**, then click **Create**.
Click User Input, and you'll see a new popup window. There are two options here that decide how your app starts running:
* **User Input**
This is **Manual Mode**. The workflow only starts working when you (the user) type something into the chat box.
Best for: Most AI apps. For example, chatbots, writing assistants, translation, etc.
* **Trigger**
This is **Automatic Mode**. It runs automatically based on a signal (like 8:00 AM every morning, or a specific event).
Best for: Repetitive tasks that run at a specific time, or workflows that run after a task completes elsewhere. For example, a daily news summary.
## Meet the Orchestration Canvas
After selecting the Start node, you will see a large blank area. This is your orchestration canvas where you will design, build, and test your workflow.
Remember the Nodes we learned in Lesson 1? The user input node you see on the canvas now is where everything begins.
Every complete workflow relies on a basic skeleton: the Start Node (The Head) and the Output Node (The Tail).
## The Start Node
The Start Node is the only entrance to your entire workflow. It's like the Prep Ingredients step in cooking. Its job is to define what information the workflow needs to receive from the user to get started.
We just selected **User Input** as our Start Node.
### Core Concept: Variables
Inside the Start Node, you will see the word **Variable**. Don't panic! You can think of a variable as a **Storage Box with a Label**.
Each box is designed to hold a specific type of information:
For example, if you are building a Travel Planner, you need the user to provide two pieces of information: `Destination` and `Travel Days`.
User A might want to go to Japan for 5 days. User B might want to go to Paris for 3 days.
Every user provides different content, so every time the app runs, the stuff inside these boxes changes.
This is the meaning of a Variable—leaving a blank for the user to fill, helping your workflow handle different requests flexibly every time.
## The End Node (Output)
This is the finish line of the workflow. Think of it as Serving the Dish and it defines what the user actually sees at the very end.
For example, remember that Travel Planner we talked about? If the user inputs Destination: Paris and Duration: 5 Days in the User Input Node, the Output Node is where the system finally hands over the result: Here is your complete 5-Day Itinerary for Paris.
To sum up, the Start Node and End Node define the basic input and output, shaping the skeleton of your app.
## Hands-On Practice: Start Building an AI Email Assistant
Let's build the basic framework for an AI Assistant that helps you write emails.
You can either:
* Continue on the canvas you just opened, or
* Go back to Studio → Create Blank App → select Workflow, and name it Email Assistant (Remember to select **User Input** in the popup!)
If you need AI to help you with an email reply, what information do you need to give it?
That's right: usually the Customer's Name and the Original Email Content.
1. Click on the **Start** node. In the panel on the right, look for **Input Field** and click the **+** button.
2. In the popup, we will create two variables (two storage boxes):
**Variable 1 (For the Customer Name)**
* Field Type: Text (Short Text)
* Variable Name: `customer_name`
* Label Name: Customer Name
* Keep other options as default
**Variable 2 (For the Email Content)**
* Field Type: Click the dropdown and select **Paragraph** (Since emails are usually long, a Paragraph box is bigger and holds more text)
* Variable Name: `email_content`
* Label Name: Original Email
* Max Length: Manually change this to **2000** to ensure it fits long emails
**Variable Name vs. Label Name**
You might notice we had to fill in two names. What's the difference?
* **Variable Name**: This is the ID card for the system. It must be unique, use English letters, and cannot have spaces.
* **Label Name**: This is the Label for the users. You can name it with any language (English, Chinese, etc.). It will be shown on the screen.
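If you like thinking in code, the naming rule for Variable Names can be sketched as a quick validity check. This is purely illustrative; Dify's exact validation rules may differ slightly:

```python
import re

def is_valid_variable_name(name: str) -> bool:
    # Variable names act as system IDs: English letters, digits, and
    # underscores only, starting with a letter, and never any spaces.
    return re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", name) is not None

# "customer_name" passes; "Customer Name" (spaces) does not.
```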
Right-click anywhere on the blank white space of the canvas. Select **Add Node** and select **Output** from the list.
Here's everything on your canvas: a **Start Node** ready to receive a name and an email, and an **Output Node** waiting to send the final result.
We have successfully built the basic frame of the workflow. The empty space in the middle is where we will place the LLM (AI Brain) Node in the next lesson to process this information.
## Mini Challenge
**Task**: If you needed to create a Travel Plan Generator, what variables should the Start Node include?
Try exploring the Field Types in **Add Variable**.
# Lesson 3: The Brain of the Workflow (LLM Node)
Source: https://docs.dify.ai/en/use-dify/tutorials/workflow-101/lesson-03
In Lesson 2, we set up the Ingredients (Start Node) and the Serving Plate (Output Node).
If the Start Node is the prep cook, the LLM Node is the Master Chef. It is the brain and core of your workflow.
It handles all the thinking, analyzing, and creative writing. Whether you want to summarize an article, write code, or draft an email, this is the node that does the heavy lifting.
## Configure the Model
Before getting started, we need to connect to a model provider.
Click on your avatar in the top right corner and select **Settings**.
On the left menu, click **Model Provider**. Find OpenAI, and click **Install**.
Once installed, you are ready to go! Press **ESC** (or click the **X** in the upper right corner) to return to your canvas.
## Understand the Tags
A pastry chef is great at cakes but terrible at sushi. Similarly, different AI models have different strengths.
When selecting a model in Dify, you will see tags next to their names. Here's how to read them so you can pick the right one for you.
**CHAT**

This is the bread and butter of AI. It's best for:
* Dialogue
* Writing articles
* Summarizing text
* Answering questions
**128K**

This number represents the **Context Window**. You can think of it as short-term memory.
Here, K stands for thousand. **128K** means the model can hold 128,000 tokens (a token roughly equals a word or a syllable). The bigger the number, the better its memory.
If you need to analyze a massive PDF report or a whole book, you need a model with a big number here.
**MODAL**

Modal just means **Type of Information**. Most early AI models could only read text. Multi-modal models have evolved—they have senses like eyes and ears.
**VISION (The Eyes)**
Models with this tag can do more than read; they can see! You can upload a photo of a sunset and ask, What colors are in this? or upload a picture of your fridge ingredients and ask, What can I cook with this?
**AUDIO (The Ears)**
Models with this tag can hear. You can upload an audio recording of a meeting or a lecture, and the model can transcribe it into text or write a summary for you.
**VIDEO (The Movie Analyst)**
These models can watch and understand video content. They can analyze what is happening in a video clip, just like a human watching a movie.
**DOCUMENT (The Reader)**
These models are expert readers. Instead of copying and pasting text, you can just upload a file (like a PDF or Word document). The model will read the file directly and answer questions based on what is written inside.
For our Email Assistant, the LLM with the **CHAT** tag is exactly what we need.
## Hands-On 1: Add the LLM Node
Let's put the brain into our workflow.
Go back to the **AI Email Assistant** workflow we created in Lesson 2.
Right-click in the empty space between the Start and Output nodes and add a new **LLM** node. Click the new **LLM** block. In the right-side panel, look for **Model** and select **gpt-4o-mini**.
Drag a line from the Start node to the LLM node. Drag a line from the LLM node to the Output node. Your flow should look like this: **Start → LLM → Output**.
Now we need to tell the LLM exactly what to do by sending it instructions, known as a **Prompt**.
### Key Concept: The Prompt (The Instructions)
**What is a Prompt?** Think of the Prompt as the specific note you attach to the order ticket. It tells the AI exactly **what to do** and **how to do it**.
The most critical part is the ability to use **Variables** from the Start Node directly within your Prompt. This allows the AI to adapt its output based on the different raw materials you provide each time.
In Dify, when you insert a variable like `customer_name` into the prompt, you are telling the AI: Go and look in the box labeled Customer Name and use the text inside.
## Hands-On 2: Write the Prompt
Now, let's apply this. We are going to write a prompt that mixes instructions with our variables.
Click the LLM Node to open the panel and find the **system** box. **System instructions** set the rules for how the model should respond—its role, tone, and behavioral guidelines.
Let's start by writing out the instructions. You can copy and paste the text below.
```plaintext wrap theme={null}
You are a professional customer service manager. Based on the customer's email, please draft a professional reply.
Requirements:
1. Start by addressing the customer name with a friendly tone.
2. Thank them for their email.
3. Let them know we have received it.
4. Sign off as Anne.
```
User messages are what you send to the model—a question, request, or task for the model to work on.
In this workflow, the customer's name and the email content change every single time. Instead of typing them out manually, we add Variables in user messages.
1. Click the **Add Message** button below the System box.
2. In the User Message box, type **customer name:**.
3. Press `/` on your keyboard.
4. The Variable Selection menu pops out, and click `customer_name`.
5. Press Enter to start a new line, and type **email content:**. Then press the `/` key again and click `email_content`.
You don't need to type out those curly brackets manually! Just hit `/`, then pick your variable from the menu.
Finally, your prompt will look like this:
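Assembled from the steps above, the User Message combines your labels with the two variables, roughly like this (the `{{...}}` placeholders are illustrative; in the editor each variable appears as a selectable pill):

```plaintext wrap theme={null}
customer name: {{customer_name}}
email content: {{email_content}}
```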
**Hooray!** You've finished your first AI workflow in Dify!
## Run and Test
The ingredients are prepared, the chef is on standby, and the instructions are ready. But does the dish taste good? Before we serve it to the customer, let's do some recipe testing.
Testing is the secret sauce of a stable workflow. It helps us catch those sneaky little issues before the app goes live.
### Quick Concept: The Checklist
Think of the **Checklist** as your workflow's personal Health Check Doctor.
It monitors your work in real-time, automatically spotting incomplete settings or mistakes (like a node that isn't connected to anything).
Glancing at the Checklist before you hit the **Publish** button is the best way to catch avoidable errors early.
### Hands-On 3: Test & Debug
Look at the top right corner of your canvas. Do you see the **Checklist** icon with a little number **1** on it? This is Dify telling you: Wait a second! There's one small thing missing here.
Click on it, and you will see a warning: **output variable is required**. It means that the output node receives nothing.
Imagine your Head Chef (the LLM) has finished cooking the food, but the Waiter (the Output Node) has empty hands.
1. Click on the **Output Node**
2. Look for **Output Variable** and click the **Plus (+)** icon next to it
3. Type `email_reply` in the **Variable Name** field
4. Select the value: Click the variable selector and choose `{x} text` from the LLM Node
Now there's no pop-up number on the Checklist. Let's do a test run.
Click **Test Run** at the top right corner of the canvas. Enter the customer's name and the email, then click **Start Run**.
```text Sample Email for Testing theme={null}
Customer Name: Amanda
Original Email:
Hi there,
I'm writing to ask for more information about Dify. Could you please tell me more on it?
Best regards,
Amanda
```
This time, you will see green checkmarks ✅ on each of the nodes and the generated reply from AI.
**Great job!**
You didn't just build a workflow; you also learned how to use the Checklist to catch issues before your app goes live.
## Mini Challenge
Use the same structure to build a travel planner.
Explore the **Prompt Generator** to help you craft better prompts!
# Lesson 4: The Cheat Sheet (Knowledge Retrieval)
Source: https://docs.dify.ai/en/use-dify/tutorials/workflow-101/lesson-04
In the previous lessons, we built an AI email assistant that can draft basic emails. But if a customer asks about specific pricing plans or a refund policy, the AI might start **Hallucinating**—which is a fancy way of saying it's confidently making things up.
How do we stop the AI from hallucinating? We give it a Cheat Sheet.
## What is Retrieval Augmented Generation (RAG)
The technical name for this is RAG (Retrieval-Augmented Generation). Think of it as turning the AI from a chef who memorizes general recipes into a chef who has a Specific Cookbook right on the counter.
It happens in three simple steps:
**1. Retrieval (Find the Recipe)**
When a user asks a question, the AI flips through your Cookbook (the files you uploaded) to find the most relevant pages.
Example: Someone asks for Grandma's Special Apple Pie. You go find that specific recipe page.
**2. Augmentation (Prepare the Ingredients)**
The AI takes that specific recipe and puts it right in front of its eyes so it doesn't have to rely on memory.
Example: You lay the recipe on the counter and get the exact apples and cinnamon ready.
**3. Generation (The Baking)**
The AI writes the answer based only on the facts it just found.
Example: You bake the pie exactly as the recipe says, ensuring it tastes like Grandma's, not a generic store-bought version.
## The Knowledge Retrieval Node
Think of this as placing a stack of reference materials right next to your AI Assistant. When a user asks a question, the AI first flips through this Cheat Sheet to find the most relevant pages. Then, it combines those findings with the user's original question to think of the best answer.
In this practice, we will use the Knowledge Retrieval node to provide our AI Assistant with official Cheat Sheets, ensuring its answers are always backed by facts!
### Hands-On 1: Create the Knowledge Base
Click **Knowledge** in the top navigation bar and click **Create Knowledge**.
In Dify, you can sync from Notion or a website, but for today, let's upload a file from your device. Click [here](https://drive.google.com/file/d/1imExB0-rtwASbmKjg3zdu-FAqSSI7-7K/view) to download Dify Intro for the upload later.
Click **Import from file**. Then, select the file we just downloaded for upload.
High-relevance chunks are crucial for AI applications to provide precise and comprehensive responses. Imagine a long book. It's hard to find one sentence in 500 pages. Dify chops the book into different Knowledge Cards so it can find the right answer faster.
**Chunk Structure**
Here, Dify automatically splits your long text into smaller, easier-to-retrieve chunks. We'll just stick with the General Mode here.
**Index Method**
* **High Quality**: Uses a model to process documents for more precise retrieval, which helps the LLM generate higher-quality answers
* **Economical**: Uses 10 keywords per chunk for retrieval; no tokens are consumed, at the cost of reduced retrieval accuracy
After the document has been processed, we need to do one final check on the retrieval settings. Here, you can configure how Dify looks up the information.
In Economical mode, only the inverted index approach is available.
* **Inverted Index**
This is the default structure Dify uses. Think of it like the Index at the back of a book—it lists key terms and tells Dify exactly which pages they appear on.
This allows Dify to instantly jump to the right knowledge card based on keywords, rather than reading the whole book from the start.
* **Top K**
You'll see a slider set to 3. This tells Dify: When the user asks a question, find the top 3 most relevant Knowledge Cards from the cookbook to show the AI.
If you set it higher, the AI gets more context to read, but if it's too high, it might get overwhelmed with too much information.
For now, let's just keep the default settings—they are already perfectly suited for our needs.
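To build intuition for how an inverted index and Top K work together, here is a toy sketch in Python. It is purely illustrative and not how Dify's retrieval is actually implemented:

```python
from collections import defaultdict

def build_inverted_index(chunks):
    # Like the index at the back of a book: map each keyword
    # to the set of chunk ids it appears in.
    index = defaultdict(set)
    for chunk_id, text in enumerate(chunks):
        for word in set(text.lower().split()):
            index[word].add(chunk_id)
    return index

def retrieve(query, chunks, index, top_k=3):
    # Score each chunk by how many query keywords it contains,
    # then return the top_k highest-scoring chunks.
    scores = defaultdict(int)
    for word in query.lower().split():
        for chunk_id in index.get(word, ()):
            scores[chunk_id] += 1
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [chunks[i] for i in ranked[:top_k]]
```

Raising `top_k` hands the model more chunks to read; past a point, the extra context adds noise rather than signal, which is exactly the trade-off the slider controls.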
Click **Save and Process**. Your knowledge base is ready!
**Awesome!**
You have successfully created your first Knowledge Base. Next, we'll use this Knowledge Base to upgrade our AI Email Assistant.
### Hands-On 2: Add the Knowledge Retrieval Node
1. Go back to your Email Assistant Workflow.
2. Hover over the line between the Start and LLM nodes.
3. Click the **Plus (+)** icon and select the **Knowledge Retrieval** node.
1. Click the node, and head to the right panel.
2. Click the **plus (+)** button next to **Knowledge** to add knowledge.
3. Choose **What's Dify**, and click **Add**.
Now the knowledge base is connected, but how can we make sure the AI actually searches it using the customer's email?
Stay at the panel, navigate to **Query text** above, and select `email_content`.
By doing this, we are telling AI: Take the customer's message and use it as a search keyword to flip through our cookbook and find the matching info. Without a query, the AI is just staring at a closed book.
In this way, the Email Assistant will use the customer's original email as a search keyword to find the most relevant answers in the Knowledge Base.
### Hands-On 3: Upgrade the Email Assistant
Now, the knowledge base is ready. We need to tell the LLM node to actually read the knowledge as context before generating the reply.
1. Click the **LLM Node**. You'll see a new section called **Context**.
2. Click it and select **result** from the Knowledge Retrieval node.
We need to tell the AI to generate its reply based on the context.
In **System**, add an additional requirement: type **Generate response based on**, press `/`, and select **Context**.
**Whoo!** You've just completed the most challenging part. Now, your email assistant has a knowledge base to check when generating responses. Let's see how it works.
Feel free to use the sample texts below to do the testing.
```text Sample Email for Testing theme={null}
Customer Name: Amanda
Original Email:
Hi,
What does the name 'Dify' actually stand for, and what can it do for my business?
Best regards,
Amanda
```
Check the result and you'll find that, instead of a generic guess, the AI looks at the knowledge base and explains what Dify stands for.
## Mini Challenge
1. What happens if a customer asks a question that isn't in the knowledge base?
2. What kind of information could you upload as a knowledge base?
3. Explore Chunk Structure, Index Method, and Retrieval Setting.
# Lesson 5: The Crossroads of Your Workflow (Sorting and Executing)
Source: https://docs.dify.ai/en/use-dify/tutorials/workflow-101/lesson-05
Right now, our Email Assistant treats every message the same, sending each one down a single path of the workflow. That's not smart enough. An email asking about Dify's pricing should be handled differently than an email reporting a bug.
To make our assistant truly intelligent, we need to teach it how to Read the Room. We're going to set up a Crossroads that sends different types of emails down different tracks.
## The If/Else Node
The If/Else node is just like a traffic light. It checks a condition (like: Does this email mention pricing?) and sends the flow left or right based on the result.
### Hands-On 1: Set up the Crossroads
Let's upgrade our assistant so it can tell the difference between Dify-related emails and Everything else.
Hover over the line between the Start and Knowledge Retrieval nodes. Click the **Plus (+)** icon and select the **If/Else** node.
1. Click the node to open the panel
2. Click **+ Add Condition** in the IF section. Choose the variable: `{x} email_content`
3. The Logic: Keep it as **Contains**. Type **Dify** in the input box
Now, the complete logic for the IF branch is: `If the email content contains the word Dify`.
**Understanding the Traffic Light**
When setting conditions, Dify offers several ways to judge information, much like the different signals at a crossroads:
* **Is / Is Not**
Like a perfect key for a lock. The content must match your value exactly.
* **Contains / Not Contains**
Like a magnifying glass. It checks if a specific keyword exists anywhere in the text. This is what we are using today.
* **Starts with / Ends with**
Check if the text begins or ends with specific characters.
* **Is Empty / Is Not Empty**
Check if the variable has any content. For example: checking whether a user actually uploaded an attachment.

Understanding these operators helps you set accurate and flexible rules, building a much smarter workflow!
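The IF branch we just configured boils down to a single keyword check. In plain Python it would look like the sketch below (the node's own string comparison may treat case differently; we lowercase here for robustness):

```python
def route_email(email_content: str) -> str:
    # Mirrors the If/Else node: "Contains" checks whether the
    # keyword appears anywhere in the text.
    if "dify" in email_content.lower():
        return "IF"    # Dify-related track: knowledge retrieval + LLM reply
    return "ELSE"      # everything else: generic polite reply
```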
### Hands-On 2: Plan Different Paths
Now that we have the crossroad here, we need to decide what happens on each road.
#### A. The Dify-Related Email Track (IF Branch)
Click the **plus (+)** icon on the right side of the IF branch, drag out a line, and connect it to **Knowledge Retrieval** node.
What this means: When the email contains the word Dify, the flow will execute the professional reply process we built in the last lesson (which looks up information in the Knowledge Base).
#### B. The Unrelated Email Track (ELSE Branch)
For emails that don't mention Dify, we want to create a simple, polite, and general reply process.
Click the **(+)** next to ELSE and select a new **LLM Node (LLM 2)**
Copy and paste the prompt below
```plaintext wrap theme={null}
You are a professional customer service manager. Based on the customer's email, kindly inform the user that no relevant information was found and provide relevant guidance.
Requirements:
1. Address the customer name in a friendly tone.
2. Thank them for their letter.
3. Keep the tone professional and friendly.
4. Sign off as "Anne."
```
1. Click the **Add Message** button below the System box.
2. In the User Message box, type **customer name:**.
3. Press `/` on your keyboard.
4. You can see the Variable Selection menu pops out, and click `customer_name`.
5. Press Enter to start a new line, and type **email content:**
6. Press the / key again and click on `email_content`.
Now we have two tracks generating two different replies. Imagine if we had 10 tracks: our workflow would look like a messy plate of spaghetti.
To keep things clean, we use a **Variable Aggregator**.
## Variable Aggregator
Variable Aggregator is like a traffic hub where all the different roads merge back into one main highway.
### Hands-On 3: Add Variable Aggregator
1. Select the connection line between the End Node and the LLM node and delete it.
2. Right-click on the canvas, select **Add Node**, and choose the **Variable Aggregator** node.
Connect the LLM and LLM 2 nodes to the Variable Aggregator.
1. Click the Variable Aggregator node.
2. Click the **plus (+)** icon next to **Assign Variables**.
3. Select the **text** output from LLM 1 and the **text** output from LLM 2.
Now, no matter which LLM node generates the response, the Variable Aggregator gathers the content and hands it to the End node.
1. Connect the Variable Aggregator to the End node.
2. Update the output variable to reference the Variable Aggregator's result instead of the individual LLM outputs.
Here's how the workflow looks:
Click **Test Run**, enter a customer name, and try testing with inputs that both include and exclude the keyword Dify to see the different results.
## Mini Challenge
For business inquiry emails, how would you edit this workflow to generate a proper response?
Don't forget to update the Knowledge Base with business-related files.
# API Extensions
Source: https://docs.dify.ai/en/use-dify/workspace/api-extension/api-extension
You can extend module capabilities through API extensions. Currently, the following extension types are supported:
* `moderation`: sensitive content moderation
* `external_data_tool`: external data tools
Before extending module capabilities, you need to prepare an API and an API Key for authentication.
In addition to developing the corresponding module capabilities, you also need to follow the specifications below to ensure Dify correctly calls the API.
## API Specification
Dify will call your interface with the following specification:
```
POST {Your-API-Endpoint}
```
### Header
| Header | Value | Desc |
| --------------- | ------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
| `Content-Type` | application/json | The request content is in JSON format. |
| `Authorization` | Bearer \{api\_key} | The API Key is transmitted as a Token. You need to parse the `api_key` and verify that it matches the provided API Key to ensure interface security. |
### Request Body
```
{
    "point": string, // Extension point, different modules may contain multiple extension points
    "params": {
        ... // Parameters passed to each module extension point
    }
}
```
### API Response
```
{
    ... // Content returned by the API; see each module's specification for the return format of its extension points
}
```
## Validation
When you configure an API-based extension, Dify sends a request to the API Endpoint to verify its availability.
When the API Endpoint receives `point=ping`, the interface should return `result=pong`, as follows:
### Header
```
Content-Type: application/json
Authorization: Bearer {api_key}
```
### Request Body
```
{
    "point": "ping"
}
```
### Expected API Response
```
{
    "result": "pong"
}
```
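A minimal handler for this handshake, sketched in plain Python (the FastAPI example later on this page is the full version):

```python
def handle_extension_request(body: dict) -> dict:
    """Dispatch a Dify API-extension request on its 'point' field."""
    if body.get("point") == "ping":
        # Dify's availability check: point=ping must answer result=pong
        return {"result": "pong"}
    raise ValueError(f"Unsupported extension point: {body.get('point')}")

print(handle_extension_request({"point": "ping"}))  # {'result': 'pong'}
```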
## Example
Here we use an external data tool as an example, where the scenario is to retrieve external weather information by region as context.
### API Example
```
POST https://fake-domain.com/api/dify/receive
```
**Header**
```
Content-Type: application/json
Authorization: Bearer 123456
```
**Request Body**
```
{
    "point": "app.external_data_tool.query",
    "params": {
        "app_id": "61248ab4-1125-45be-ae32-0ce91334d021",
        "tool_variable": "weather_retrieve",
        "inputs": {
            "location": "London"
        },
        "query": "How's the weather today?"
    }
}
```
**API Response**
```
{
    "result": "City: London\nTemperature: 10°C\nRealFeel®: 8°C\nAir Quality: Poor\nWind Direction: ENE\nWind Speed: 8 km/h\nWind Gusts: 14 km/h\nPrecipitation: Light rain"
}
```
### Code Example
The code is based on the Python FastAPI framework.
1. Install dependencies
```
pip install fastapi[all] uvicorn
```
2. Write code according to the interface specification
```
from fastapi import FastAPI, Body, HTTPException, Header
from pydantic import BaseModel

app = FastAPI()


class InputData(BaseModel):
    point: str
    params: dict = {}


@app.post("/api/dify/receive")
async def dify_receive(data: InputData = Body(...), authorization: str = Header(None)):
    """
    Receive API query data from Dify.
    """
    expected_api_key = "123456"  # TODO: your API key for this API
    auth_scheme, _, api_key = (authorization or "").partition(' ')

    if auth_scheme.lower() != "bearer" or api_key != expected_api_key:
        raise HTTPException(status_code=401, detail="Unauthorized")

    point = data.point

    # for debug
    print(f"point: {point}")

    if point == "ping":
        return {
            "result": "pong"
        }
    if point == "app.external_data_tool.query":
        return handle_app_external_data_tool_query(params=data.params)
    # elif point == "{point name}":
    #     TODO: other point implementations here

    raise HTTPException(status_code=400, detail="Not implemented")


def handle_app_external_data_tool_query(params: dict):
    app_id = params.get("app_id")
    tool_variable = params.get("tool_variable")
    inputs = params.get("inputs")
    query = params.get("query")

    # for debug
    print(f"app_id: {app_id}")
    print(f"tool_variable: {tool_variable}")
    print(f"inputs: {inputs}")
    print(f"query: {query}")

    # TODO: your external data tool query implementation here;
    # the return value must be a dict with key "result", whose value is the query result
    if inputs.get("location") == "London":
        return {
            "result": "City: London\nTemperature: 10°C\nRealFeel®: 8°C\nAir Quality: Poor\nWind Direction: ENE\nWind "
                      "Speed: 8 km/h\nWind Gusts: 14 km/h\nPrecipitation: Light rain"
        }
    else:
        return {"result": "Unknown city"}
```
3. Start the API service. The default port is 8000, so the complete API address is `http://127.0.0.1:8000/api/dify/receive` and the configured API Key is `123456`.
```
uvicorn main:app --reload --host 0.0.0.0
```
4. Configure this API in Dify.
5. Select this API extension in the App.
When debugging the App, Dify will request the configured API and send the following content (example):
```
{
    "point": "app.external_data_tool.query",
    "params": {
        "app_id": "61248ab4-1125-45be-ae32-0ce91334d021",
        "tool_variable": "weather_retrieve",
        "inputs": {
            "location": "London"
        },
        "query": "How's the weather today?"
    }
}
```
The API response is:
```
{
    "result": "City: London\nTemperature: 10°C\nRealFeel®: 8°C\nAir Quality: Poor\nWind Direction: ENE\nWind Speed: 8 km/h\nWind Gusts: 14 km/h\nPrecipitation: Light rain"
}
```
## Local Debugging
Since the Dify cloud version cannot access internal network API services, you can use [Ngrok](https://ngrok.com) to expose the API service endpoint to the public network to enable cloud debugging of local code. Steps:
1. Go to [https://ngrok.com](https://ngrok.com), register and download the Ngrok file.

2. After downloading, go to the download directory, extract the archive, and run the initialization command from Ngrok's setup instructions:
```shell theme={null}
unzip /path/to/ngrok.zip
./ngrok config add-authtoken your-token
```
3. Check the port of your local API service:

And run the following command to start:
```shell theme={null}
./ngrok http port-number
```
A successful startup example looks like this:

4. Find the Forwarding address, as shown above: `https://177e-159-223-41-52.ngrok-free.app` (an example domain; replace it with your own). This is your public domain.
Following the example above, the locally started service endpoint `http://127.0.0.1:8000/api/dify/receive` becomes `https://177e-159-223-41-52.ngrok-free.app/api/dify/receive`.
This API endpoint can now be accessed publicly. At this point, we can configure this API endpoint in Dify for local code debugging. For configuration steps, please refer to [External Data Tool](/en/use-dify/workspace/api-extension/external-data-tool-api-extension).
## Deploy API Extensions Using Cloudflare Workers
We recommend deploying API extensions with Cloudflare Workers: it provides a public address out of the box and offers a free tier.
For detailed instructions, see [Deploy API Extensions Using Cloudflare Workers](/en/use-dify/workspace/api-extension/cloudflare-worker).
# Deploy API Extensions Using Cloudflare Workers
Source: https://docs.dify.ai/en/use-dify/workspace/api-extension/cloudflare-worker
## Procedure
Since Dify API extensions require a publicly accessible address as the API Endpoint, the API extension needs to be deployed to a public address.
Here we use Cloudflare Workers to deploy the API extension.
Clone the [Example GitHub Repository](https://github.com/crazywoola/dify-extension-workers). This repository contains a simple API extension that can be modified as a starting point.
```bash theme={null}
git clone https://github.com/crazywoola/dify-extension-workers.git
cp wrangler.toml.example wrangler.toml
```
Open the `wrangler.toml` file and modify `name` and `compatibility_date` to your application name and compatibility date.
The configuration to note here is the `TOKEN` in `vars`. When adding an API extension in Dify, you will fill in this token. For security, use a random string, and never hard-code the token in your source code; pass it in through environment variables. For the same reason, do not commit `wrangler.toml` to your code repository.
```toml theme={null}
name = "dify-extension-example"
compatibility_date = "2023-01-01"
[vars]
TOKEN = "bananaiscool"
```
This API extension will return a random Breaking Bad quote. You can modify the logic of this API extension in `src/index.ts`. This example demonstrates how to interact with third-party APIs.
```typescript theme={null}
// ⬇️ implement your logic here ⬇️
// point === "app.external_data_tool.query"
// https://api.breakingbadquotes.xyz/v1/quotes
const count = params?.inputs?.count ?? 1;
const url = `https://api.breakingbadquotes.xyz/v1/quotes/${count}`;
const result = await fetch(url).then(res => res.text())
// ⬆️ implement your logic here ⬆️
```
The repository wires up everything except the business logic, so you can deploy your API extension directly with `npm` commands.
```bash theme={null}
npm install
npm run deploy
```
After a successful deployment, you will get a public address that you can add in Dify as the API Endpoint. Be careful not to omit the endpoint path; its exact definition can be found in `src/index.ts`.

Alternatively, you can run the worker locally for testing with `npm run dev`.
```bash theme={null}
npm install
npm run dev
```
Related output:
```bash theme={null}
$ npm run dev
> dev
> wrangler dev src/index.ts
⛅️ wrangler 3.99.0
-------------------
Your worker has access to the following bindings:
- Vars:
- TOKEN: "ban****ool"
⎔ Starting local server...
[wrangler:inf] Ready on http://localhost:58445
```
After that, you can use tools like Postman for local interface debugging.
## About Bearer Auth
```typescript theme={null}
import { bearerAuth } from "hono/bearer-auth";

(c, next) => {
  const auth = bearerAuth({ token: c.env.TOKEN });
  return auth(c, next);
},
```
Our Bearer validation logic is in the above code. We use the `hono/bearer-auth` package to implement Bearer validation. You can use `c.env.TOKEN` in `src/index.ts` to get the Token.
## About Parameter Validation
```typescript theme={null}
import { z } from "zod";
import { zValidator } from "@hono/zod-validator";

const schema = z.object({
  point: z.union([
    z.literal("ping"),
    z.literal("app.external_data_tool.query"),
  ]), // Restricts 'point' to two specific values
  params: z
    .object({
      app_id: z.string().optional(),
      tool_variable: z.string().optional(),
      inputs: z.record(z.any()).optional(),
      query: z.any().optional(), // string or null
    })
    .optional(),
});
```
We use `zod` to define parameter types here. You can use `zValidator` in `src/index.ts` to validate parameters. Use `const { point, params } = c.req.valid("json");` to get the validated parameters.
The `point` field has only two allowed values, so we define it with `z.union`. `params` is optional, so we wrap it in `.optional()`. Its `inputs` field is declared as a `Record` (string keys, any values), which can represent any JSON object; for example, you can read `params?.inputs?.count` in `src/index.ts`.
## Get Cloudflare Workers Logs
```bash theme={null}
wrangler tail
```
***
**Reference**:
* [Cloudflare Workers](https://workers.cloudflare.com/)
* [Cloudflare Workers CLI](https://developers.cloudflare.com/workers/cli-wrangler/install-update)
* [Example GitHub Repository](https://github.com/crazywoola/dify-extension-workers)
# External Data Tool
Source: https://docs.dify.ai/en/use-dify/workspace/api-extension/external-data-tool-api-extension
When creating AI applications, you can use external tools to obtain additional data through [API Extensions](/en/use-dify/workspace/api-extension/api-extension) and assemble it into the prompt as additional information for the LLM.
## Extension Point
`app.external_data_tool.query`: Application external data tool query extension point.
This extension point sends the variable values entered by the end user, together with the current conversation input (a fixed parameter for conversational applications), to your API.
You need to implement the corresponding tool query logic and return the query result as a string.
### Request Body
```
{
    "point": "app.external_data_tool.query", // Extension point type, fixed as app.external_data_tool.query
    "params": {
        "app_id": string, // Application ID
        "tool_variable": string, // External data tool variable name, indicating the source of the corresponding variable tool call
        "inputs": { // Variable values passed by end user, key is variable name, value is variable value
            "var_1": "value_1",
            "var_2": "value_2",
            ...
        },
        "query": string | null // Current conversation input content from end user, fixed parameter for conversational applications.
    }
}
```
**Example**:
```
{
    "point": "app.external_data_tool.query",
    "params": {
        "app_id": "61248ab4-1125-45be-ae32-0ce91334d021",
        "tool_variable": "weather_retrieve",
        "inputs": {
            "location": "London"
        },
        "query": "How's the weather today?"
    }
}
```
### API Response
```
{
    "result": string
}
```
**Example**:
```
{
    "result": "City: London\nTemperature: 10°C\nRealFeel®: 8°C\nAir Quality: Poor\nWind Direction: ENE\nWind Speed: 8 km/h\nWind Gusts: 14 km/h\nPrecipitation: Light rain"
}
```
# Sensitive Content Moderation
Source: https://docs.dify.ai/en/use-dify/workspace/api-extension/moderation-api-extension
This module reviews the content that end users input and the content that the LLM outputs in applications. It is divided into two extension point types.
## Extension Points
* `app.moderation.input`: End user input content review extension point
* Used to review variable content passed in by end users and conversation input content in conversational applications.
* `app.moderation.output`: LLM output content review extension point
* Used to review content output by LLM.
* When the LLM output is streaming, the output content will be segmented into 100-character chunks for API requests to avoid delayed reviews when output content is lengthy.
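The 100-character segmentation described above can be illustrated with a short sketch (our own illustration, not Dify's internal code):

```python
def chunk_stream(text: str, size: int = 100) -> list[str]:
    """Split streamed LLM output into fixed-size chunks, one moderation request each."""
    return [text[i:i + size] for i in range(0, len(text), size)]

print([len(c) for c in chunk_stream("x" * 250)])  # [100, 100, 50]
```

Each chunk is sent to the moderation API as soon as it is complete, so a long streamed answer is reviewed incrementally rather than only at the end.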
### app.moderation.input
When **Content Moderation > Review Input Content** is enabled in applications such as Chatflow, Agent, or Chatbot, Dify will send the following HTTP POST request to the corresponding API extension:
#### Request Body
```
{
    "point": "app.moderation.input", // Extension point type, fixed as app.moderation.input
    "params": {
        "app_id": string, // Application ID
        "inputs": { // Variable values passed by end user, key is variable name, value is variable value
            "var_1": "value_1",
            "var_2": "value_2",
            ...
        },
        "query": string | null // Current conversation input content from end user, fixed parameter for conversational applications.
    }
}
```
**Example**:
```
{
    "point": "app.moderation.input",
    "params": {
        "app_id": "61248ab4-1125-45be-ae32-0ce91334d021",
        "inputs": {
            "var_1": "I will kill you.",
            "var_2": "I will fuck you."
        },
        "query": "Happy everydays."
    }
}
```
#### API Response
```
{
    "flagged": bool, // Whether it violates validation rules
    "action": string, // Action: direct_output returns the preset response; overridden overwrites the input variable values
    "preset_response": string, // Preset response (returned only when action=direct_output)
    "inputs": { // Overwritten variable values, key is variable name, value is variable value (returned only when action=overridden)
        "var_1": "value_1",
        "var_2": "value_2",
        ...
    },
    "query": string | null // Overwritten conversation input content from end user, fixed parameter for conversational applications (returned only when action=overridden)
}
```
**Example**:
* `action=direct_output`
```
{
    "flagged": true,
    "action": "direct_output",
    "preset_response": "Your content violates our usage policy."
}
```
* `action=overridden`
```
{
    "flagged": true,
    "action": "overridden",
    "inputs": {
        "var_1": "I will *** you.",
        "var_2": "I will *** you."
    },
    "query": "Happy everydays."
}
```
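Server-side, the `overridden` action might be produced by keyword masking along these lines (the keyword list and masking rule are illustrative, not part of the spec):

```python
KEYWORDS = ["kill", "fuck"]  # illustrative; substitute your own policy list


def moderate_input(inputs: dict, query: str) -> dict:
    """Build an app.moderation.input response, masking flagged keywords with ***."""
    flagged = False

    def mask(text: str) -> str:
        nonlocal flagged
        for word in KEYWORDS:
            if word in text:
                flagged = True
                text = text.replace(word, "***")
        return text

    masked_inputs = {name: mask(value) for name, value in inputs.items()}
    masked_query = mask(query)

    if flagged:
        return {
            "flagged": True,
            "action": "overridden",
            "inputs": masked_inputs,
            "query": masked_query,
        }
    # Nothing matched: report clean content (the other fields are then irrelevant).
    return {"flagged": False, "action": "direct_output", "preset_response": ""}


result = moderate_input({"var_1": "I will kill you."}, "Happy everydays.")
print(result["inputs"]["var_1"])  # I will *** you.
```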
### app.moderation.output
When **Content Moderation > Review Output Content** is enabled in applications such as Chatflow, Agent, or Chat Assistant, Dify will send the following HTTP POST request to the corresponding API extension:
#### Request Body
```
{
    "point": "app.moderation.output", // Extension point type, fixed as app.moderation.output
    "params": {
        "app_id": string, // Application ID
        "text": string // LLM response content. When LLM output is streaming, this is content segmented into 100-character chunks.
    }
}
```
**Example**:
```
{
    "point": "app.moderation.output",
    "params": {
        "app_id": "61248ab4-1125-45be-ae32-0ce91334d021",
        "text": "I will kill you."
    }
}
```
#### API Response
```
{
    "flagged": bool, // Whether it violates validation rules
    "action": string, // Action: direct_output returns the preset response; overridden overwrites the LLM response content
    "preset_response": string, // Preset response (returned only when action=direct_output)
    "text": string // Overwritten LLM response content (returned only when action=overridden)
}
```
**Example**:
* `action=direct_output`
```
{
    "flagged": true,
    "action": "direct_output",
    "preset_response": "Your content violates our usage policy."
}
```
* `action=overridden`
```
{
    "flagged": true,
    "action": "overridden",
    "text": "I will *** you."
}
```
## Code Example
Below is an example `src/index.ts` that can be deployed on Cloudflare Workers. (For complete Cloudflare Workers usage, please refer to [this documentation](/en/use-dify/workspace/api-extension/cloudflare-worker).)
The code works by performing keyword matching to filter both Input (content entered by users) and Output (content returned by the model). Users can modify the matching logic according to their needs.
```
import { Hono } from "hono";
import { bearerAuth } from "hono/bearer-auth";
import { z } from "zod";
import { zValidator } from "@hono/zod-validator";
import { generateSchema } from '@anatine/zod-openapi';

type Bindings = {
  TOKEN: string;
};

const app = new Hono<{ Bindings: Bindings }>();

// API format validation ⬇️
const schema = z.object({
  point: z.union([
    z.literal("ping"),
    z.literal("app.external_data_tool.query"),
    z.literal("app.moderation.input"),
    z.literal("app.moderation.output"),
  ]), // Restricts 'point' to the supported extension points
  params: z
    .object({
      app_id: z.string().optional(),
      tool_variable: z.string().optional(),
      inputs: z.record(z.any()).optional(),
      query: z.any(),
      text: z.any(),
    })
    .optional(),
});

// Generate OpenAPI schema
app.get("/", (c) => {
  return c.json(generateSchema(schema));
});

app.post(
  "/",
  (c, next) => {
    const auth = bearerAuth({ token: c.env.TOKEN });
    return auth(c, next);
  },
  zValidator("json", schema),
  async (c) => {
    const { point, params } = c.req.valid("json");
    if (point === "ping") {
      return c.json({
        result: "pong",
      });
    }
    // ⬇️ implement your logic here ⬇️
    else if (point === "app.moderation.input") {
      // Input check ⬇️
      const inputkeywords = ["input filter test 1", "input filter test 2", "input filter test 3"];
      if (inputkeywords.some(keyword => params?.query?.includes(keyword))) {
        return c.json({
          "flagged": true,
          "action": "direct_output",
          "preset_response": "The input contains illegal content. Please try a different question!"
        });
      } else {
        return c.json({
          "flagged": false,
          "action": "direct_output",
          "preset_response": "Input is normal"
        });
      }
      // Input check complete
    } else {
      // Output check ⬇️ (any other point is treated as app.moderation.output here)
      const outputkeywords = ["output filter test 1", "output filter test 2", "output filter test 3"];
      if (outputkeywords.some(keyword => params?.text?.includes(keyword))) {
        return c.json({
          "flagged": true,
          "action": "direct_output",
          "preset_response": "The output contains sensitive content and has been filtered by the system. Please try a different question!"
        });
      } else {
        return c.json({
          "flagged": false,
          "action": "direct_output",
          "preset_response": "Output is normal"
        });
      }
      // Output check complete
    }
  }
);

export default app;
```
# Manage Apps
Source: https://docs.dify.ai/en/use-dify/workspace/app-management
Organize, maintain, and share your AI applications with powerful management tools and best practices
Managing your apps well is crucial for productive AI development. Dify provides comprehensive tools to organize, share, and maintain your applications throughout their lifecycle.
## App Organization
Update names, descriptions, icons, and branding for better organization
Create variations or use existing apps as templates for new projects
Share apps between workspaces using Dify's DSL format
Safely delete apps when no longer needed
## Editing Application Information
Keep your apps organized with clear, descriptive information:

Click "Edit info" in the upper left corner of your application.
Modify the icon, name, or description to better reflect the app's purpose.
Use names and descriptions that help team members understand what the app does.
Use consistent naming conventions across your workspace. Consider prefixes like "Draft-", "Test-", or "Prod-" to indicate app status.
## Creating App Variations
Duplication is perfect for creating variations or starting new projects from existing work:
**When to duplicate:**
* Creating A/B test versions with different prompts or models
* Adapting an app for different audiences or use cases
* Starting a new project based on successful patterns
* Creating backups before major changes
**How duplication works:**
* All configuration, prompts, and workflows are copied
* The new app gets a default name you can customize
* Original app remains unchanged
* Both apps run independently
## App Export and Import
Dify's DSL (Domain Specific Language) format lets you share apps between workspaces and teams:

### Exporting Applications
**Two ways to export:**
1. **From Studio page** - Click "Export DSL" in the application menu
2. **From orchestration** - Click "Export DSL" in the upper left corner
**What gets exported:**
* App configuration and metadata
* Workflow orchestration and node settings
* Model parameters and prompt templates
* Knowledge base connections (not the data itself)
**What doesn't get exported:**
* API keys for third-party tools (security measure)
* Actual knowledge base content
* Usage logs and analytics data
If your app uses Secret-type environment variables, you'll be asked whether to include them in the export. Be careful with sensitive information.

### Importing Applications

**Import process:**
1. Upload your DSL file (YAML format)
2. System checks version compatibility
3. Warning appears if DSL version is older than current platform
4. App is created with all configurations from the file
**Version compatibility:**
* **SaaS users**: DSL files are always the latest version
* **Community users**: May need to upgrade to avoid compatibility issues
Dify DSL is the AI application engineering standard (v0.6+) that captures complete app configurations in YAML format.
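For orientation only, here is a heavily simplified sketch of what a DSL file's top level can look like; the field names and values are illustrative and vary by app type and platform version:

```yaml
app:
  name: Customer Support Bot
  mode: chat                    # app type
  description: Answers product questions
kind: app
version: 0.1.5                  # DSL version, checked on import
```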
## Safe App Deletion
Before deleting apps, understand the impact:
**What gets deleted:**
* All app configurations and prompts
* Workflow orchestration and settings
* Usage logs and analytics
* Published web apps and API access
* All user conversations and data
**Impact on users:**
* Published web apps stop working immediately
* API calls start returning errors
* All existing user sessions are terminated
Consider duplicating the app as a backup, or simply unpublishing it instead of deleting.
Let team members and users know about planned deletions.
Create DSL backups of valuable configurations before deletion.
Click "Delete" and confirm—this action cannot be undone.
App deletion is permanent and cannot be undone. All associated data, logs, and user access will be lost immediately.
# Model Providers
Source: https://docs.dify.ai/en/use-dify/workspace/model-providers
Configure AI model access for your workspace—the foundation that powers all your applications
Model providers give your workspace access to AI models. Every application you build needs models to function, and configuring providers at the workspace level means all team members can use them across all projects.
## System vs Custom Providers
**System Providers** are managed by Dify. You get immediate access to models without setup, billing through your Dify subscription, and automatic updates when new models become available. Best for getting started quickly.
**Custom Providers** use your own API keys for direct access to model providers like OpenAI, Anthropic, or Google. You get full control, direct billing, and often higher rate limits. Best for production applications.
You can use both simultaneously—system providers for prototyping, custom providers for production.
## Configure Custom Providers
Only workspace admins and owners can configure model providers. The process is consistent across providers:
Access the model provider configuration in your workspace settings.
Choose from OpenAI, Anthropic, Google, Cohere, or other supported providers.
Enter your API key and any additional configuration required by the provider.
Dify validates your credentials before making the provider available to your workspace.
## Supported Providers
**Large Language Models:**
* OpenAI (GPT-4, GPT-3.5-turbo)
* Anthropic (Claude)
* Google (Gemini)
* Cohere
* Local models via Ollama
**Embedding Models:**
* OpenAI Embeddings
* Cohere Embeddings
* Azure OpenAI
* Local embedding models
**Specialized Models:**
* Image generation (DALL-E, Stable Diffusion)
* Speech (Whisper, ElevenLabs)
* Moderation APIs
## Provider Configuration Examples
**Required:** API Key from OpenAI Platform
**Optional:** Custom base URL for Azure OpenAI or proxies, Organization ID for organization-scoped usage
**Available Models:** GPT-4, GPT-3.5-turbo, DALL-E, Whisper, Text embeddings
**Required:** API Key from Anthropic Console
**Available Models:** Claude 3 (Opus, Sonnet, Haiku), Claude 2.1, Claude Instant
**Required:** Ollama server URL (typically [http://localhost:11434](http://localhost:11434))
**Setup:** Install Ollama, pull models (`ollama pull llama2`), configure Dify connection
**Benefits:** Complete data privacy, no external API costs, custom model fine-tuning
## Manage Model Credentials
Add multiple credentials for a model provider's predefined and custom models, and easily switch between, delete, or modify these credentials.
Here are some scenarios where adding multiple credentials is particularly helpful:
* **Environment Isolation**: Configure separate model credentials for different environments, such as development, testing, and production. For example, use a rate-limited credential in the development environment for debugging, and a paid credential with stable performance and a sufficient quota in the production environment to ensure service quality.
* **Cost Optimization**: Add and switch between multiple credentials from different accounts or model providers to maximize the use of free or low-cost quotas, thereby reducing application development and operational costs.
* **Model Testing**: During model fine-tuning or iteration, you may create multiple model versions. By adding credentials for these different versions, you can quickly switch between them to test and evaluate their performance.
Use multiple credentials to configure load balancing for a model.
After installing a model provider and configuring the first credential, click **Config** in the upper-right corner to perform the following actions:
* Add a new credential
* Select a credential as the default for all predefined models
* Edit a credential
* Delete a credential
If the default credential is deleted, you must manually specify a new one.
### Manage Credentials for a Single Custom Model
After installing a model provider and adding a custom model, follow these steps:
1. In the model list, click the corresponding **Config**.
2. In the **Specify model credential** panel, click the default credential to open the credential list, then perform the following actions:
* Add a new credential
* Select a credential as the default for that custom model
* Edit a credential
* Delete a credential
If you delete the only credential for a custom model, the model will also be deleted.
When you add a new custom model with a name and type identical to an existing custom model, the system will add the new credential to that existing model rather than creating a duplicate.
### Manage Credentials for All Custom Models
Click **Manage Credentials** to view, edit, or delete the credentials for all custom models.
After a custom model is removed, its credentials will remain in the **Manage Credentials** list. When you click **Add Model**, the system will display all removed custom models whose credentials still exist, allowing you to quickly re-add them.
If you delete all credentials for a removed custom model from the **Manage Credentials** list, that model will no longer appear when you click **Add Model**.
## Configure Model Load Balancing
Load balancing is a paid feature. You can enable it through [a paid SaaS subscription or an Enterprise license](https://dify.ai/pricing).
Model providers typically enforce rate limits on API access within a specific timeframe to ensure stability and fair use. For enterprise applications, a high volume of concurrent requests from a single credential can easily trigger these limits, disrupting user access.
An effective solution is load balancing, which distributes request traffic across multiple model credentials. This prevents rate limit issues and single points of failure, ensuring business continuity and faster response times for all users.
Dify employs a round-robin strategy for load balancing, sequentially routing model requests to each credential in the load balancing pool. If a credential hits a rate limit, it is temporarily removed from rotation for one minute to avoid futile retries.
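The round-robin-with-cooldown behavior described above can be sketched as follows (our own illustration of the strategy, not Dify's implementation):

```python
import itertools
import time


class CredentialPool:
    """Rotate credentials round-robin, skipping any that recently hit a rate limit."""

    def __init__(self, credentials, cooldown_seconds=60):
        self.credentials = credentials
        self.cooldown = cooldown_seconds
        self.banned_until = {}  # credential -> time when it rejoins the rotation
        self._cycle = itertools.cycle(credentials)

    def next_credential(self):
        # Try each slot once before giving up
        for _ in range(len(self.credentials)):
            cred = next(self._cycle)
            if time.monotonic() >= self.banned_until.get(cred, 0.0):
                return cred
        raise RuntimeError("All credentials are rate-limited")

    def report_rate_limited(self, cred):
        # Temporarily remove the credential from rotation
        self.banned_until[cred] = time.monotonic() + self.cooldown


pool = CredentialPool(["key-A", "key-B"])
print(pool.next_credential())  # key-A
pool.report_rate_limited("key-A")
print(pool.next_credential())  # key-B
print(pool.next_credential())  # key-B (key-A is cooling down)
```

Note that adding the same credential to the pool more than once simply gives it more slots in the cycle, which is how weighting by repeated entries works.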
To configure load balancing for a model, follow these steps:
1. In the model list, find the target model, click the corresponding **Config**, and select **Load balancing**.
2. In the load balancing pool, click **Add credential** to select from existing credentials or add a new one.
**Default Config** refers to the default credential currently specified for that model.
If a credential has a higher quota or better performance, you can add it multiple times to increase its weight in the load balancing rotation, allowing it to handle a larger share of the request load.
3. Enable at least two credentials in the load balancing pool, then click **Save**. Models with load balancing enabled will be marked with a special icon.
When you switch from load-balancing mode back to the default single-credential mode, your load-balancing configuration is preserved for future use.
## Access and Billing
System providers are billed through your Dify subscription with usage limits based on your plan. Custom providers bill you directly through the provider (OpenAI, Anthropic, etc.) and often provide higher rate limits.
Team access follows workspace permissions:
* **Owners/Admins** can configure, modify, and remove providers
* **Editors/Members** can view available providers and use them in applications
API keys are stored securely but grant workspace-wide model access. Only give admin privileges to trusted team members who should have billing responsibility.
## Troubleshooting
**Authentication Failed:** Verify API key accuracy, check expiration, ensure sufficient credits, confirm key permissions.
**Model Not Available:** Check provider configuration includes the model, verify API key tier access, refresh provider settings.
**Rate Limits:** Upgrade provider account, implement request queuing, consider custom providers for higher limits.
# Personal Settings
Source: https://docs.dify.ai/en/use-dify/workspace/personal-account-management
Manage your profile and preferences across all workspaces
Your personal account spans all workspaces you belong to. Profile settings, language preferences, and login credentials follow you everywhere, while each workspace maintains its own team dynamics and permissions.
## Account Setup
**Dify Cloud** creates your account automatically on first login. You can use GitHub, Google, or email verification. Accounts with matching email addresses are automatically linked.
**Community Edition** requires email and password setup during installation. The administrator account is configured when the system is first deployed.
## Multi-Workspace Access
Your personal account can belong to multiple workspaces. Each workspace has its own team, applications, and billing, but your profile remains consistent across all of them.
**Switching Workspaces:** Use the workspace selector in the top-left corner to switch between workspaces you have access to.
**Workspace Independence:** Your role and permissions are set per-workspace. You might be an Owner in one workspace and a Member in another.
## Profile Management
Update your profile information in Settings → Account → Profile. Changes apply across all workspaces you belong to.
**Profile Picture:** Upload a custom avatar. This replaces the default initials-based avatar and appears in all workspaces.
**Display Name:** How you appear to team members across all workspaces. Choose something that helps teammates identify you.
**Email Address:** Your primary login credential and unique identifier. Changing your email affects all workspaces using the old address.
## Language and Interface
**Display Language:** Available languages include English, Simplified Chinese, and Traditional Chinese. This setting affects interface elements but not your application content.
**Change Language:** Click your avatar → Language, then select your preferred language.
## Login Methods by Edition
| Edition   | Login Methods                                |
| --------- | -------------------------------------------- |
| Community | Email and password only                      |
| Cloud     | GitHub, Google, Email with verification code |
**Account Linking:** Dify Cloud automatically links accounts with the same email address. If you sign up with GitHub using [email@company.com](mailto:email@company.com), then later use email verification with the same address, the system recognizes them as the same account.
## Security
**Cloud Users:** Leverage social login (GitHub/Google) for enhanced security. Review your chosen provider's security settings and monitor login activity through their platform.
**Community Edition Users:** Use strong, unique passwords and change them regularly. Don't share login credentials with others.
**Cross-Workspace:** Your personal settings follow you everywhere, but remember that each workspace has its own security model and permissions.
Changing your email address affects all workspaces. If you want to change the email for just one workspace, consider having that workspace invite the new email as a separate user account.
# Plugins
Source: https://docs.dify.ai/en/use-dify/workspace/plugins
Extend Dify with custom models, tools, and integrations through modular components
Plugins are how Dify connects to everything: model providers, external APIs, custom tools. They're modular components that extend your workspace's capabilities; you install them once and can then use them in every application.
Access plugin management through the Plugins tab in your workspace.
## How Plugins Work
Plugins are workspace-scoped. When you install a plugin, every application in your workspace can use it. Team members access plugins based on their roles:
* **Owners/Admins:** Install, configure, and remove plugins for the entire workspace
* **Editors/Members:** Use installed plugins in applications they create or edit
## Installing Plugins
* **Marketplace:** Official and partner plugins, tested and maintained
* **GitHub:** Install from any public repository with URL + version
* **Local File:** Custom .zip packages for private or internal plugins
## What Plugins Really Are
Think of plugins as the bridge between Dify and the outside world:
* **Models:** Every LLM in Dify (OpenAI, Anthropic, etc.) is actually a plugin
* **Tools:** API calls, data processing, calculations—all plugin-based
* **Endpoints:** Expose your Dify apps as APIs that external systems can call
* **Reverse invocation:** Plugins can call back into Dify to use models, tools, or workflows
## Workspace Plugin Settings
Control plugin permissions in your workspace settings:
**Install permission:**

* **Everyone** - Any member can install plugins
* **Admin Only** - Only workspace admins can install (recommended)

**Debug permission:**

* **Everyone** - All members can debug plugin issues
* **Admin Only** - Restrict debugging to admins

**Auto-update:** Choose update strategy (security only vs. all updates) and specify which plugins to include or exclude.
After installing, most plugins need configuration—API keys, endpoints, or service settings. These apply workspace-wide.
## Plugin Installation Restrictions
Enterprise Only
In enterprise workspaces, you might see installation restrictions when browsing the plugin marketplace:
**What you'll encounter:**
* The "Install Plugin" dropdown in Plugins → Explore Marketplace may show limited options
* Installation confirmation dialogs will indicate if a plugin is blocked by policy
* When importing apps with plugins (DSL files), you'll see notices about restricted plugins
**Plugin badges in marketplace:**

Look for these badges to identify plugin types—your workspace may only allow certain types based on admin settings.
If you can't install a needed plugin, contact your workspace admin. They control which plugin sources (marketplace, GitHub, local files) and types (official, partner, third-party) are allowed.
## Building Custom Plugins
Develop plugins using Dify's SDK when you need custom functionality:
1. Get a debugging key from Settings → Plugins → Debugging
2. Build and test your plugin locally
3. Package as a .zip with manifest and dependencies
4. Distribute privately or publish to the marketplace
# Overview
Source: https://docs.dify.ai/en/use-dify/workspace/readme
Workspaces are the foundational organizational unit in Dify—everything your team builds, configures, and manages exists within a workspace
A workspace is your team's complete AI environment in Dify. It contains and isolates everything your organization needs: applications, knowledge bases, team members, model configurations, plugins, and billing.
## The Workspace Mental Model
Every resource in Dify belongs to a workspace. When you create an app, it inherits the workspace's model configurations. When you add team members, they get access to workspace resources based on their role. When you configure models or install plugins, they become available to the entire workspace.
```
🏢 Your Organization
└── 📁 Workspace
├── 🤖 Apps (chatbots, workflows, agents)
├── 📊 Knowledge Bases (documents, embeddings)
├── 👥 Team Members (roles & permissions)
├── 🧠 Model Providers (API keys, configurations)
├── 🔧 Tools & Plugins (integrations, custom code)
└── 💳 Billing (subscription, limits, usage)
```
This workspace-first design means your resources are completely isolated from other organizations, team members can only access what they're permitted to see, and you configure models and billing once for the entire workspace.
## Workspace Creation
**Dify Cloud** automatically creates a workspace on first login. You become the owner with full permissions.
**Community Edition** creates one workspace during installation. The administrator email and password are set during setup.
**Multiple workspaces** are supported when you need complete isolation between different legal entities, regulatory environments, or client projects. Most organizations use a single workspace.
Your personal account can belong to multiple workspaces. Switch between them using the workspace selector in the top-left corner.
## How Resources Connect
Applications you build can use any model providers configured in the workspace, access all workspace knowledge bases, and utilize installed plugins. Team members see applications based on their workspace permissions.
Workspace roles determine access across all resources:
* **Owners** control billing, model providers, and workspace settings
* **Admins** manage team members and configure models/plugins
* **Editors** build applications and manage knowledge bases
* **Members** use published applications
* **Dataset Operators** focus on knowledge base management
## Workspace Navigation
Dify organizes everything around the workspace concept. The main navigation shows Apps, Knowledge, and Tools available in your workspace. Settings contains workspace-wide configurations: Members, Model Providers, Plugins, Billing (Cloud only), and personal Account settings.
# Billing
Source: https://docs.dify.ai/en/use-dify/workspace/subscription-management
Manage workspace subscriptions and billing to control team size and feature access
Billing in Dify is workspace-scoped. Your subscription determines team member limits, feature availability, and usage quotas for the entire workspace.
## Subscription Plans
* **Free:** 1 team member, basic features, community support
* **Professional:** Up to 3 team members, advanced features, priority support
* **Team:** Up to 50 team members, full feature set, dedicated support
## Plan Comparison
| Resource | Free | Professional | Team |
| ------------ | ---------------- | -------------- | ---------------------- |
| Team Members | 1 | 3 | 50 |
| Applications | 5 | 50 | 200 |
| API Calls | 5000 calls / day | Unlimited | Unlimited |
| Support | Community | Priority Email | Priority Email & Slack |
## Managing Subscriptions
Only workspace owners and admins can access billing settings and change subscriptions.
**Upgrading:** Navigate to Settings → Billing and click the Upgrade button. Select your desired plan and complete payment. Changes take effect immediately.
**Plan Changes:**
* **Upgrades** are prorated for the current billing period
* **Downgrades** take effect immediately with billing adjustments on the next invoice
* **Cancellations** continue until the end of the current cycle, then revert to Free plan
When downgrading, team members exceeding the new plan limits will lose workspace access immediately.
## Dify for Education
Current students and teachers can use Dify's Professional plan for free through Dify for Education.
### Getting the Education Discount
**Prerequisites:**
* Be 18 or older
* Current student, teacher, or educational staff member
* Valid educational email address ending in `.edu`
**Steps:**
1. **Create account** - Register at [cloud.dify.ai](https://cloud.dify.ai/signin) with your educational email
2. **Apply for verification** - Go to Settings → Billing → Get Education Verified. Enter your school's full name and select your role.
3. **Activate subscription** - After approval, upgrade to Professional plan with annual billing. The education coupon applies automatically, making it free.
Your subscription will show as `Pro(Edu)` once activated. Renewal is required annually.
### Education FAQ
**What if I already have a paid subscription?**
The education coupon cannot be applied to paid subscriptions at this time. We suggest you activate the free Professional plan after your current billing cycle ends.
**Why was my verification rejected?**
Common reasons: non-educational email, fraudulent information, or misuse of privileges. Appeal via [support@dify.ai](mailto:support@dify.ai).
**Can I use a personal email?**
No, institutional email required for verification.
# Manage Members
Source: https://docs.dify.ai/en/use-dify/workspace/team-members-management
Manage workspace members, roles, and permissions to build effective AI teams
Team management in Dify is workspace-centric. When you add members to your workspace, they get access to workspace resources based on their assigned role. Understanding these roles helps you build secure, productive AI teams.
## Team Size Limits
Your workspace can include different numbers of team members based on your Dify edition:
* **Free:** 1 member (solo development)
* **Professional:** 3 members (small teams)
* **Team:** Up to 50 members (growing companies)
* **Community/Enterprise:** Unlimited members (self-hosted)
## Workspace Roles
**Full workspace control.** Only one owner per workspace. Controls all team members, billing, model providers, and can delete the workspace. Cannot transfer ownership to another member.
**Team and resource management.** Can add/remove team members, configure model providers, manage all applications, and install plugins. Cannot change member roles or manage billing.
**Application development.** Can create, edit, and delete applications, manage knowledge bases, and use all workspace tools. Cannot manage team members or configure providers.
**Application usage only.** Can use published applications and tools they have access to. Cannot create or modify applications.
**Knowledge base specialist.** Focused role for managing datasets and knowledge bases. Can create and manage knowledge bases but has limited application access.
## Adding Team Members
Only workspace owners can invite new team members:
1. Navigate to Settings → Members in your workspace.
2. Enter email addresses and select the appropriate role for each new member.
3. New users receive registration emails. Existing Dify users are added immediately and can access the workspace through the workspace switcher.
Community Edition requires email service configuration before invitations work properly.
## Member Management
**Removing Members:** Only workspace owners can remove team members. When removed, members immediately lose workspace access, but applications they created remain in the workspace.
**Role Changes:** Only workspace owners can modify member roles. Role changes take effect immediately and alter what the member can access across the workspace.
**Multiple Workspaces:** Team members can belong to multiple workspaces. They switch between workspaces using the selector in the top-left corner.
## Access Patterns
**Resource Inheritance:** All workspace resources (model providers, plugins, knowledge bases) are available to team members based on their role permissions.
**Application Access:** Members see applications based on sharing settings and their role. Owners and Admins see all applications. Editors see applications they can modify. Members see only published applications they're permitted to use.
**Configuration Access:** Model providers and plugins configured at the workspace level become available to all applications created by team members with appropriate permissions.
# Dify Tools
Source: https://docs.dify.ai/en/use-dify/workspace/tools
Manage tools that enable LLMs to interact with external services and APIs
Dify tools enable LLMs to interact with external services and APIs, so they can access real-time data and perform actions (e.g., web searches, database queries, or content processing).
Each tool has a clear interface: what inputs it accepts, what action it performs, and what output it returns. This helps LLMs decide when and how to call a tool based on user requests.
Use tools in:
* Workflow / Chatflow apps (as standalone [Tool nodes](/en/use-dify/nodes/tools), or within [Agent nodes](/en/use-dify/nodes/agent#tool-configuration))
* [Agent apps](/en/use-dify/build/agent#extend-the-agent-with-dify-tools)
All your tools can be managed on the **Tools** page.
## Tool Types
**Plugin Tools**

[Plugin](/en/use-dify/workspace/plugins) tools are ready-to-use integrations provided by Dify and the community for common utilities and popular third-party services.
In addition to built-in plugin tools (like CurrentTime) available out of the box, you can explore and install more from [Dify Marketplace](https://marketplace.dify.ai/).
**Manage Authorization**
Some plugin tools (e.g., Google and GitHub) require authentication—such as API keys or OAuth—before use.
You can manage workspace-level credentials for these tools from the **Tools** or **Plugins** page, or directly within the tool settings inside an app or node.
**Custom Tools**

Integrate external services as custom tools using a standard OpenAPI (Swagger) specification. This is ideal for connecting Dify to internal systems or third-party services not available as plugins.
Paste your OpenAPI schema, import it from a URL, or start from a provided example. Dify will parse the spec and generate the tool interface automatically.
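For example, here is a minimal OpenAPI 3.1 spec for a hypothetical weather lookup service (the service name, URL, and parameter are illustrative) that would be parsed into a single tool with one required `city` input:

```json
{
  "openapi": "3.1.0",
  "info": { "title": "Weather lookup", "version": "1.0.0" },
  "servers": [{ "url": "https://api.example.com" }],
  "paths": {
    "/weather": {
      "get": {
        "operationId": "getCurrentWeather",
        "summary": "Get current weather for a city",
        "parameters": [
          {
            "name": "city",
            "in": "query",
            "required": true,
            "schema": { "type": "string" }
          }
        ],
        "responses": { "200": { "description": "Weather data" } }
      }
    }
  }
}
```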
**Workflow as Tool**

Turn any workflow that starts with a User Input node into a reusable tool. **Chatflows are not supported**.
This allows you to encapsulate complex, multi-step logic into a single function that can be easily reused across different Dify apps.
**MCP Tools**

[Model Context Protocol (MCP)](https://modelcontextprotocol.io/) lets AI apps connect to external data and tools through a standard interface. An MCP server wraps external resources—like databases, file systems, or APIs—and makes them accessible to AI apps via this protocol.
By connecting to an MCP server, you can import these external resources as tools in Dify and refresh the list anytime to pull the latest updates.
# Configure Annotation Reply
Source: https://docs.dify.ai/api-reference/annotations/configure-annotation-reply
/en/api-reference/openapi_chatflow.json post /apps/annotation-reply/{action}
Enables or disables the annotation reply feature. Requires embedding model configuration when enabling. Executes asynchronously — use [Get Annotation Reply Job Status](/api-reference/annotations/get-annotation-reply-job-status) to track progress.
# Create Annotation
Source: https://docs.dify.ai/api-reference/annotations/create-annotation
/en/api-reference/openapi_chatflow.json post /apps/annotations
Creates a new annotation. Annotations provide predefined question-answer pairs that the app can match and return directly instead of generating a response.
# Delete Annotation
Source: https://docs.dify.ai/api-reference/annotations/delete-annotation
/en/api-reference/openapi_chatflow.json delete /apps/annotations/{annotation_id}
Deletes an annotation and its associated hit history.
# Get Annotation Reply Job Status
Source: https://docs.dify.ai/api-reference/annotations/get-annotation-reply-job-status
/en/api-reference/openapi_chatflow.json get /apps/annotation-reply/{action}/status/{job_id}
Retrieves the status of an asynchronous annotation reply configuration job started by [Configure Annotation Reply](/api-reference/annotations/configure-annotation-reply).
# List Annotations
Source: https://docs.dify.ai/api-reference/annotations/list-annotations
/en/api-reference/openapi_chatflow.json get /apps/annotations
Retrieves a paginated list of annotations for the application. Supports keyword search filtering.
# Update Annotation
Source: https://docs.dify.ai/api-reference/annotations/update-annotation
/en/api-reference/openapi_chatflow.json put /apps/annotations/{annotation_id}
Updates the question and answer of an existing annotation.
# Get App Info
Source: https://docs.dify.ai/api-reference/applications/get-app-info
/en/api-reference/openapi_completion.json get /info
Retrieve basic information about this application, including name, description, tags, and mode.
# Get App Meta
Source: https://docs.dify.ai/api-reference/applications/get-app-meta
/en/api-reference/openapi_completion.json get /meta
Retrieve metadata about this application, including tool icons and other configuration details.
# Get App Parameters
Source: https://docs.dify.ai/api-reference/applications/get-app-parameters
/en/api-reference/openapi_completion.json get /parameters
Retrieve the application's input form configuration, including feature switches, input parameter names, types, and default values.
# Get App WebApp Settings
Source: https://docs.dify.ai/api-reference/applications/get-app-webapp-settings
/en/api-reference/openapi_completion.json get /site
Retrieve the WebApp settings of this application, including site configuration, theme, and customization options.
# Get Next Suggested Questions
Source: https://docs.dify.ai/api-reference/chats/get-next-suggested-questions
/en/api-reference/openapi_chatflow.json get /messages/{message_id}/suggested
Get suggested follow-up questions for the current message.
# Send Chat Message
Source: https://docs.dify.ai/api-reference/chats/send-chat-message
/en/api-reference/openapi_chatflow.json post /chat-messages
Send a request to the chat application.
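As an illustration, a minimal stdlib-only client for this endpoint. The base URL and API key are placeholders; the body fields (`inputs`, `query`, `response_mode`, `conversation_id`, `user`) are the standard ones for `/chat-messages`:

```python
import json
import urllib.request

API_BASE = "https://api.dify.ai/v1"  # or your self-hosted API URL
API_KEY = "app-..."                  # your app's API key (placeholder)

def build_chat_payload(query, user, inputs=None,
                       conversation_id=None, response_mode="blocking"):
    """Assemble the request body for POST /chat-messages."""
    payload = {
        "inputs": inputs or {},          # values for app input variables
        "query": query,                  # the end user's message
        "response_mode": response_mode,  # "streaming" or "blocking"
        "user": user,                    # stable end-user identifier
    }
    if conversation_id:                  # omit to start a new conversation
        payload["conversation_id"] = conversation_id
    return payload

def send_chat_message(query, user):
    req = urllib.request.Request(
        f"{API_BASE}/chat-messages",
        data=json.dumps(build_chat_payload(query, user)).encode(),
        headers={"Authorization": f"Bearer {API_KEY}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```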
# Stop Chat Message Generation
Source: https://docs.dify.ai/api-reference/chats/stop-chat-message-generation
/en/api-reference/openapi_chatflow.json post /chat-messages/{task_id}/stop
Stops a chat message generation task. Only supported in `streaming` mode.
# Delete Conversation
Source: https://docs.dify.ai/api-reference/conversations/delete-conversation
/en/api-reference/openapi_chatflow.json delete /conversations/{conversation_id}
Delete a conversation.
# List Conversation Messages
Source: https://docs.dify.ai/api-reference/conversations/list-conversation-messages
/en/api-reference/openapi_chatflow.json get /messages
Returns historical chat records with scrolling-load pagination: the first page returns the latest `limit` messages, in reverse chronological order.
# List Conversation Variables
Source: https://docs.dify.ai/api-reference/conversations/list-conversation-variables
/en/api-reference/openapi_chatflow.json get /conversations/{conversation_id}/variables
Retrieve variables from a specific conversation.
# List Conversations
Source: https://docs.dify.ai/api-reference/conversations/list-conversations
/en/api-reference/openapi_chatflow.json get /conversations
Retrieve the conversation list for the current user, ordered by most recently active.
# Rename Conversation
Source: https://docs.dify.ai/api-reference/conversations/rename-conversation
/en/api-reference/openapi_chatflow.json post /conversations/{conversation_id}/name
Rename a conversation or auto-generate a name. The conversation name is used for display on clients that support multiple conversations.
# Update Conversation Variable
Source: https://docs.dify.ai/api-reference/conversations/update-conversation-variable
/en/api-reference/openapi_chatflow.json put /conversations/{conversation_id}/variables/{variable_id}
Update the value of a specific conversation variable. The value must match the expected type.
# Create Document by File
Source: https://docs.dify.ai/api-reference/documents/create-document-by-file
/en/api-reference/openapi_knowledge.json post /datasets/{dataset_id}/document/create-by-file
Create a document by uploading a file. Supports common document formats (PDF, TXT, DOCX, etc.). Processing is asynchronous — use the returned `batch` ID with [Get Document Indexing Status](/api-reference/documents/get-document-indexing-status) to track progress.
# Create Document by Text
Source: https://docs.dify.ai/api-reference/documents/create-document-by-text
/en/api-reference/openapi_knowledge.json post /datasets/{dataset_id}/document/create-by-text
Create a document from raw text content. The document is processed asynchronously — use the returned `batch` ID with [Get Document Indexing Status](/api-reference/documents/get-document-indexing-status) to track progress.
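A minimal stdlib-only sketch of this call. The base URL and Knowledge API key are placeholders; the body fields (`name`, `text`, `indexing_technique`, `process_rule`) are the standard ones for this endpoint:

```python
import json
import urllib.request

API_BASE = "https://api.dify.ai/v1"  # or your self-hosted API URL
DATASET_API_KEY = "dataset-..."      # Knowledge API key (placeholder)

def build_document_payload(name, text, indexing_technique="high_quality"):
    """Assemble the request body for create-by-text."""
    return {
        "name": name,
        "text": text,
        "indexing_technique": indexing_technique,  # or "economy"
        "process_rule": {"mode": "automatic"},     # default chunking rules
    }

def create_document_by_text(dataset_id, name, text):
    req = urllib.request.Request(
        f"{API_BASE}/datasets/{dataset_id}/document/create-by-text",
        data=json.dumps(build_document_payload(name, text)).encode(),
        headers={"Authorization": f"Bearer {DATASET_API_KEY}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # includes the document and a `batch` ID
```

Keep the `batch` ID from the response to poll the document's indexing status.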
# Get Document
Source: https://docs.dify.ai/api-reference/documents/get-document
/en/api-reference/openapi_knowledge.json get /datasets/{dataset_id}/documents/{document_id}
Retrieve detailed information about a specific document, including its indexing status, metadata, and processing statistics.
# List Documents
Source: https://docs.dify.ai/api-reference/documents/list-documents
/en/api-reference/openapi_knowledge.json get /datasets/{dataset_id}/documents
Returns a paginated list of documents in the knowledge base. Supports filtering by keyword and indexing status.
# Get End User Info
Source: https://docs.dify.ai/api-reference/end-users/get-end-user-info
/en/api-reference/openapi_completion.json get /end-users/{end_user_id}
Retrieve an end user by ID. Useful when other APIs return an end-user ID (e.g., `created_by` from [Upload File](/api-reference/files/upload-file)).
# List App Feedbacks
Source: https://docs.dify.ai/api-reference/feedback/list-app-feedbacks
/en/api-reference/openapi_completion.json get /app/feedbacks
Retrieve a paginated list of all feedback submitted for messages in this application, including both end-user and admin feedback.
# Submit Message Feedback
Source: https://docs.dify.ai/api-reference/feedback/submit-message-feedback
/en/api-reference/openapi_completion.json post /messages/{message_id}/feedbacks
Submit feedback for a message. End users can rate messages as `like` or `dislike`, and optionally provide text feedback. Pass `null` for `rating` to revoke previously submitted feedback.
# Download File
Source: https://docs.dify.ai/api-reference/files/download-file
/en/api-reference/openapi_completion.json get /files/{file_id}/preview
Preview or download files previously uploaded via the [Upload File](/api-reference/files/upload-file) API. Files can only be accessed if they belong to messages within the requesting application.
# Upload File
Source: https://docs.dify.ai/api-reference/files/upload-file
/en/api-reference/openapi_completion.json post /files/upload
Upload a file for use when sending messages, enabling multimodal understanding of images, documents, audio, and video. Uploaded files are for use by the current end-user only.
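The endpoint expects a `multipart/form-data` request with a `file` part and a `user` field. As a stdlib-only sketch (base URL and API key are placeholders), including a small multipart encoder since `urllib` has none built in:

```python
import json
import mimetypes
import urllib.request
import uuid

API_BASE = "https://api.dify.ai/v1"  # or your self-hosted API URL
API_KEY = "app-..."                  # your app's API key (placeholder)

def build_multipart(fields, file_field, filename, file_bytes):
    """Encode plain form fields plus one file as multipart/form-data."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [f"--{boundary}",
                  f'Content-Disposition: form-data; name="{name}"',
                  "", value]
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    lines += [f"--{boundary}",
              f'Content-Disposition: form-data; '
              f'name="{file_field}"; filename="{filename}"',
              f"Content-Type: {ctype}", ""]
    body = ("\r\n".join(lines).encode() + b"\r\n" + file_bytes
            + f"\r\n--{boundary}--\r\n".encode())
    return body, f"multipart/form-data; boundary={boundary}"

def upload_file(path, user):
    with open(path, "rb") as f:
        data = f.read()
    body, content_type = build_multipart({"user": user}, "file", path, data)
    req = urllib.request.Request(
        f"{API_BASE}/files/upload", data=body,
        headers={"Authorization": f"Bearer {API_KEY}",
                 "Content-Type": content_type},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # includes the file ID for later messages
```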
# Create an Empty Knowledge Base
Source: https://docs.dify.ai/api-reference/knowledge-bases/create-an-empty-knowledge-base
/en/api-reference/openapi_knowledge.json post /datasets
Create a new empty knowledge base. After creation, use [Create Document by Text](/api-reference/documents/create-document-by-text) or [Create Document by File](/api-reference/documents/create-document-by-file) to add documents.
# Delete Knowledge Base
Source: https://docs.dify.ai/api-reference/knowledge-bases/delete-knowledge-base
/en/api-reference/openapi_knowledge.json delete /datasets/{dataset_id}
Permanently delete a knowledge base and all its documents. The knowledge base must not be in use by any application.
# Get Knowledge Base
Source: https://docs.dify.ai/api-reference/knowledge-bases/get-knowledge-base
/en/api-reference/openapi_knowledge.json get /datasets/{dataset_id}
Retrieve detailed information about a specific knowledge base, including its embedding model, retrieval configuration, and document statistics.
# List Knowledge Bases
Source: https://docs.dify.ai/api-reference/knowledge-bases/list-knowledge-bases
/en/api-reference/openapi_knowledge.json get /datasets
Returns a paginated list of knowledge bases. Supports filtering by keyword and tags.
# Retrieve Chunks from a Knowledge Base / Test Retrieval
Source: https://docs.dify.ai/api-reference/knowledge-bases/retrieve-chunks-from-a-knowledge-base-test-retrieval
/en/api-reference/openapi_knowledge.json post /datasets/{dataset_id}/retrieve
Performs a search query against a knowledge base to retrieve the most relevant chunks. This endpoint can be used for both production retrieval and test retrieval.
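A stdlib-only sketch of a test retrieval, plus a helper for ranking the results. The base URL and key are placeholders, and the response shape assumed here (`records[]` each carrying a `score` and a `segment` with `content`) follows this endpoint's documented output:

```python
import json
import urllib.request

API_BASE = "https://api.dify.ai/v1"  # or your self-hosted API URL
DATASET_API_KEY = "dataset-..."      # Knowledge API key (placeholder)

def retrieve_chunks(dataset_id, query):
    """POST a query against the knowledge base's retrieve endpoint."""
    req = urllib.request.Request(
        f"{API_BASE}/datasets/{dataset_id}/retrieve",
        data=json.dumps({"query": query}).encode(),
        headers={"Authorization": f"Bearer {DATASET_API_KEY}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("records", [])

def top_contents(records, n=3):
    """Sort retrieval records by score and return the top-n chunk texts."""
    ranked = sorted(records, key=lambda r: r.get("score") or 0, reverse=True)
    return [r["segment"]["content"] for r in ranked[:n]]
```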
# Update Knowledge Base
Source: https://docs.dify.ai/api-reference/knowledge-bases/update-knowledge-base
/en/api-reference/openapi_knowledge.json patch /datasets/{dataset_id}
Update the name, description, permissions, or retrieval settings of an existing knowledge base. Only the fields provided in the request body are updated.
# Convert Audio to Text
Source: https://docs.dify.ai/api-reference/tts/convert-audio-to-text
/en/api-reference/openapi_completion.json post /audio-to-text
Convert audio file to text. Supported formats: `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `wav`, `webm`. File size limit is `30 MB`.
# Convert Text to Audio
Source: https://docs.dify.ai/api-reference/tts/convert-text-to-audio
/en/api-reference/openapi_completion.json post /text-to-audio
Convert text to speech.
# Get Workflow Run Detail
Source: https://docs.dify.ai/api-reference/workflow-runs/get-workflow-run-detail
/en/api-reference/openapi_chatflow.json get /workflows/run/{workflow_run_id}
Retrieve the current execution results of a workflow task based on the workflow execution ID.
# List Workflow Logs
Source: https://docs.dify.ai/api-reference/workflow-runs/list-workflow-logs
/en/api-reference/openapi_chatflow.json get /workflows/logs
Retrieve paginated workflow execution logs with filtering options.
# Get Workflow Run Detail
Source: https://docs.dify.ai/api-reference/workflows/get-workflow-run-detail
/en/api-reference/openapi_workflow.json get /workflows/run/{workflow_run_id}
Retrieve the current execution results of a workflow task based on the workflow execution ID.
# List Workflow Logs
Source: https://docs.dify.ai/api-reference/workflows/list-workflow-logs
/en/api-reference/openapi_workflow.json get /workflows/logs
Retrieve paginated workflow execution logs with filtering options.
# Run Workflow
Source: https://docs.dify.ai/api-reference/workflows/run-workflow
/en/api-reference/openapi_workflow.json post /workflows/run
Execute a workflow. The workflow must be published before it can be run.
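A minimal stdlib-only sketch of this call. The base URL and API key are placeholders; the body fields (`inputs`, `response_mode`, `user`) are the standard ones for `/workflows/run`:

```python
import json
import urllib.request

API_BASE = "https://api.dify.ai/v1"  # or your self-hosted API URL
API_KEY = "app-..."                  # the workflow app's API key (placeholder)

def build_workflow_payload(inputs, user, response_mode="blocking"):
    """Assemble the request body for POST /workflows/run."""
    return {
        "inputs": inputs,                # values for the Start node variables
        "response_mode": response_mode,  # "streaming" or "blocking"
        "user": user,                    # stable end-user identifier
    }

def run_workflow(inputs, user):
    req = urllib.request.Request(
        f"{API_BASE}/workflows/run",
        data=json.dumps(build_workflow_payload(inputs, user)).encode(),
        headers={"Authorization": f"Bearer {API_KEY}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```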
# Run Workflow by ID
Source: https://docs.dify.ai/api-reference/workflows/run-workflow-by-id
/en/api-reference/openapi_workflow.json post /workflows/{workflow_id}/run
Execute a specific workflow version identified by its ID. Useful for running a particular published version of the workflow.
# Stop Workflow Task
Source: https://docs.dify.ai/api-reference/workflows/stop-workflow-task
/en/api-reference/openapi_workflow.json post /workflows/tasks/{task_id}/stop
Stop a running workflow task. Only supported in `streaming` mode.
# Local Source Code Start
Source: https://docs.dify.ai/en/self-host/advanced-deployments/local-source-code
## Prerequisites
### Setup Docker and Docker Compose
> Before installing Dify, make sure your machine meets the following minimum system requirements:
>
> * CPU >= 2 Core
> * RAM >= 4 GiB
| Operating System | Software | Explanation |
| -------------------------- | ---------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| macOS 10.14 or later | Docker Desktop | Set the Docker virtual machine (VM) to use a minimum of 2 virtual CPUs (vCPUs) and 8 GB of initial memory. Otherwise, the installation may fail. For more information, please refer to the [Docker Desktop installation guide for Mac](https://docs.docker.com/desktop/mac/install/). |
| Linux platforms | Docker 19.03 or later; Docker Compose 1.25.1 or later | Please refer to the [Docker installation guide](https://docs.docker.com/engine/install/) and [the Docker Compose installation guide](https://docs.docker.com/compose/install/) for more information on how to install Docker and Docker Compose, respectively. |
| Windows with WSL 2 enabled | Docker Desktop | We recommend storing the source code and other data that is bound to Linux containers in the Linux file system rather than the Windows file system. For more information, please refer to the [Docker Desktop installation guide for using the WSL 2 backend on Windows.](https://docs.docker.com/desktop/windows/install/#wsl-2-backend) |
> If you need to use OpenAI TTS, `FFmpeg` must be installed on the system for it to function properly. For more details, refer to: [Link](https://docs.dify.ai/en/self-host/troubleshooting/integrations#text-to-speech-tts).
### Clone Dify Repository
Run the git command to clone the [Dify repository](https://github.com/langgenius/dify).
```bash theme={null}
git clone https://github.com/langgenius/dify.git
```
### Start Middlewares with Docker Compose
Dify's backend services depend on a set of middlewares: storage services such as PostgreSQL, Redis, and Weaviate (if not locally available), plus extended capabilities such as Dify's [sandbox](https://github.com/langgenius/dify-sandbox) and [plugin-daemon](https://github.com/langgenius/dify-plugin-daemon) services. Start the middlewares with Docker Compose by running these commands:
```bash theme={null}
cd docker
cp middleware.env.example middleware.env
# change the profile to mysql if you are not using postgresql
# change the profile to other vector database if you are not using weaviate
docker compose -f docker-compose.middleware.yaml --profile postgresql --profile weaviate -p dify up -d
```
***
## Setup Backend Services
The backend services include:
1. API Service: serves API requests for the frontend service and external API access
2. Worker Service: serves asynchronous tasks such as dataset processing, workspace management, and clean-ups
### Start API Service
1. Navigate to the `api` directory:
```
cd api
```
2. Prepare the environment variable config file:
```
cp .env.example .env
```
When the frontend and backend run on different subdomains, set `COOKIE_DOMAIN` to the site's top-level domain (e.g., `example.com`) in the `.env` file.
The frontend and backend must be under the same top-level domain to share authentication cookies.
3. Generate a random secret key and replace the value of SECRET\_KEY in the `.env` file:
```
awk -v key="$(openssl rand -base64 42)" '/^SECRET_KEY=/ {sub(/=.*/, "=" key)} 1' .env > temp_env && mv temp_env .env
```
4. Install dependencies:
[uv](https://docs.astral.sh/uv/getting-started/installation/) is used to manage dependencies.
Install the required dependencies with `uv` by running:
```
uv sync --dev
```
> For macOS: install libmagic with `brew install libmagic`.
5. Perform the database migration:
Perform database migrations to the latest version:
```
uv run flask db upgrade
```
6. Start the API service:
```
uv run flask run --host 0.0.0.0 --port=5001 --debug
```
Expected output:
```
* Debug mode: on
INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:5001
INFO:werkzeug:Press CTRL+C to quit
INFO:werkzeug: * Restarting with stat
WARNING:werkzeug: * Debugger is active!
INFO:werkzeug: * Debugger PIN: 695-801-919
```
### Start the Worker Service
To consume asynchronous tasks from the queue, such as dataset file imports and dataset document updates, follow these steps to start the Worker service:
* for macOS or Linux
```
uv run celery -A app.celery worker -P gevent -c 1 --loglevel INFO -Q dataset,dataset_summary,priority_dataset,priority_pipeline,pipeline,mail,ops_trace,app_deletion,plugin,workflow_storage,conversation,workflow,schedule_poller,schedule_executor,triggered_workflow_dispatcher,trigger_refresh_executor,retention,workflow_based_app_execution
```
If you are using a Windows system to start the Worker service, please use the following command instead:
* for Windows
```
uv run celery -A app.celery worker -P solo --without-gossip --without-mingle --loglevel INFO -Q dataset,dataset_summary,priority_dataset,priority_pipeline,pipeline,mail,ops_trace,app_deletion,plugin,workflow_storage,conversation,workflow,schedule_poller,schedule_executor,triggered_workflow_dispatcher,trigger_refresh_executor,retention,workflow_based_app_execution
```
Expected output:
```
-------------- celery@bwdeMacBook-Pro-2.local v5.4.0 (opalescent)
--- ***** -----
-- ******* ---- macOS-15.4.1-arm64-arm-64bit 2025-04-28 17:07:14
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app: app_factory:0x1439e8590
- ** ---------- .> transport: redis://:**@localhost:6379/1
- ** ---------- .> results: postgresql://postgres:**@localhost:5432/dify
- *** --- * --- .> concurrency: 1 (gevent)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
-------------- [queues]
.> dataset exchange=dataset(direct) key=dataset
.> generation exchange=generation(direct) key=generation
.> mail exchange=mail(direct) key=mail
.> ops_trace exchange=ops_trace(direct) key=ops_trace
[tasks]
. schedule.clean_embedding_cache_task.clean_embedding_cache_task
. schedule.clean_messages.clean_messages
. schedule.clean_unused_datasets_task.clean_unused_datasets_task
. schedule.create_tidb_serverless_task.create_tidb_serverless_task
. schedule.mail_clean_document_notify_task.mail_clean_document_notify_task
. schedule.update_tidb_serverless_status_task.update_tidb_serverless_status_task
. tasks.add_document_to_index_task.add_document_to_index_task
. tasks.annotation.add_annotation_to_index_task.add_annotation_to_index_task
. tasks.annotation.batch_import_annotations_task.batch_import_annotations_task
. tasks.annotation.delete_annotation_index_task.delete_annotation_index_task
. tasks.annotation.disable_annotation_reply_task.disable_annotation_reply_task
. tasks.annotation.enable_annotation_reply_task.enable_annotation_reply_task
. tasks.annotation.update_annotation_to_index_task.update_annotation_to_index_task
. tasks.batch_clean_document_task.batch_clean_document_task
. tasks.batch_create_segment_to_index_task.batch_create_segment_to_index_task
. tasks.clean_dataset_task.clean_dataset_task
. tasks.clean_document_task.clean_document_task
. tasks.clean_notion_document_task.clean_notion_document_task
. tasks.deal_dataset_vector_index_task.deal_dataset_vector_index_task
. tasks.delete_account_task.delete_account_task
. tasks.delete_segment_from_index_task.delete_segment_from_index_task
. tasks.disable_segment_from_index_task.disable_segment_from_index_task
. tasks.disable_segments_from_index_task.disable_segments_from_index_task
. tasks.document_indexing_sync_task.document_indexing_sync_task
. tasks.document_indexing_task.document_indexing_task
. tasks.document_indexing_update_task.document_indexing_update_task
. tasks.duplicate_document_indexing_task.duplicate_document_indexing_task
. tasks.enable_segments_to_index_task.enable_segments_to_index_task
. tasks.mail_account_deletion_task.send_account_deletion_verification_code
. tasks.mail_account_deletion_task.send_deletion_success_task
. tasks.mail_email_code_login.send_email_code_login_mail_task
. tasks.mail_invite_member_task.send_invite_member_mail_task
. tasks.mail_reset_password_task.send_reset_password_mail_task
. tasks.ops_trace_task.process_trace_tasks
. tasks.recover_document_indexing_task.recover_document_indexing_task
. tasks.remove_app_and_related_data_task.remove_app_and_related_data_task
. tasks.remove_document_from_index_task.remove_document_from_index_task
. tasks.retry_document_indexing_task.retry_document_indexing_task
. tasks.sync_website_document_indexing_task.sync_website_document_indexing_task
2025-04-28 17:07:14,681 INFO [connection.py:22] Connected to redis://:**@localhost:6379/1
2025-04-28 17:07:14,684 INFO [mingle.py:40] mingle: searching for neighbors
2025-04-28 17:07:15,704 INFO [mingle.py:49] mingle: all alone
2025-04-28 17:07:15,733 INFO [worker.py:175] celery@bwdeMacBook-Pro-2.local ready.
2025-04-28 17:07:15,742 INFO [pidbox.py:111] pidbox: Connected to redis://:**@localhost:6379/1.
```
### Start the Beat Service
Additionally, if you want to debug the Celery scheduled tasks or run the Schedule Trigger node, run the following command in another terminal to start the beat service:
```bash theme={null}
uv run celery -A app.celery beat
```
***
## Setup Web Service
The web service serves Dify's frontend pages.
### Environment Preparation
To start the web frontend service, [Node.js v22 (LTS)](https://nodejs.org/en) and [PNPM v10](https://pnpm.io/) are required.
* Install NodeJS
Please visit [https://nodejs.org/en/download](https://nodejs.org/en/download) and choose the v22.x (LTS) installation package for your operating system. The LTS version is recommended for common usage.
* Install PNPM
Follow [the installation guide](https://pnpm.io/installation) to install PNPM, or run this command to install `pnpm` with `npm`:
```
npm i -g pnpm
```
### Start Web Service
1. Enter the web directory:
```
cd web
```
2. Install dependencies:
```
pnpm install --frozen-lockfile
```
3. Prepare the environment variable configuration file\
Create a file named `.env.local` in the current directory and copy the contents from `.env.example`. Modify the values of these environment variables according to your requirements:
```
# For production release, change this to PRODUCTION
NEXT_PUBLIC_DEPLOY_ENV=DEVELOPMENT
# The deployment edition, SELF_HOSTED or CLOUD
NEXT_PUBLIC_EDITION=SELF_HOSTED
# The base URL of console application, refers to the Console base URL of WEB service if console domain is different from api or web app domain.
# example: http://cloud.dify.ai/console/api
NEXT_PUBLIC_API_PREFIX=http://localhost:5001/console/api
# The URL for Web APP, refers to the Web App base URL of WEB service if web app domain is different from console or api domain.
# example: http://udify.app/api
NEXT_PUBLIC_PUBLIC_API_PREFIX=http://localhost:5001/api
# When the frontend and backend run on different subdomains, set NEXT_PUBLIC_COOKIE_DOMAIN=1.
NEXT_PUBLIC_COOKIE_DOMAIN=
# SENTRY
NEXT_PUBLIC_SENTRY_DSN=
NEXT_PUBLIC_SENTRY_ORG=
NEXT_PUBLIC_SENTRY_PROJECT=
```
4. Build the web service:
```
pnpm build
```
5. Start the web service:
```
pnpm start
```
Expected output:
```
▲ Next.js 15
- Local: http://localhost:3000
- Network: http://0.0.0.0:3000
✓ Starting...
✓ Ready in 73ms
```
### Access Dify
Access [http://localhost:3000](http://localhost:3000/) via browsers to enjoy all the exciting features of Dify.
Cheers ! 🍻
# Start Frontend Docker Container Separately
Source: https://docs.dify.ai/en/self-host/advanced-deployments/start-the-frontend-docker-container
When developing the backend separately, you may only need to start the backend service from source code without building and launching the frontend locally. In this case, you can directly start the frontend service by pulling the Docker image and running the container. Here are the specific steps:
#### Pull and run the Docker image for the frontend service from DockerHub:
```bash theme={null}
docker run -it -p 3000:3000 -e CONSOLE_URL=http://127.0.0.1:5001 -e APP_URL=http://127.0.0.1:5001 langgenius/dify-web:latest
```
#### Build Docker Image from Source Code
1. Build the frontend image
```
cd web && docker build . -t dify-web
```
2. Start the frontend image
```
docker run -it -p 3000:3000 -e CONSOLE_URL=http://127.0.0.1:5001 -e APP_URL=http://127.0.0.1:5001 dify-web
```
3. When the console domain and web app domain are different, you can set the CONSOLE\_URL and APP\_URL separately
4. To access it locally, you can visit [http://127.0.0.1:3000](http://127.0.0.1:3000/)
# Environment Variables
Source: https://docs.dify.ai/en/self-host/configuration/environments
Reference for all environment variables used by Dify self-hosted deployments.
Dify works out of the box with default settings. You can customize your deployment by modifying the environment variables in the `.env` file.
After upgrading Dify, run `diff .env .env.example` in the `docker` directory to check for newly added or changed variables, then update your `.env` file accordingly.
## Common Variables
These URL variables configure the addresses of Dify's various services.
For single-domain deployments behind Nginx (the default Docker Compose setup), these can be left empty—the system auto-detects from the incoming request. Configure them when using custom domains, split-domain deployments, or a reverse proxy.
### CONSOLE\_API\_URL
Default: (empty)
The public URL of Dify's backend API. Set this if you use OAuth login (GitHub, Google), Notion integration, or any plugin that requires OAuth—these features need an absolute callback URL to redirect users back after authorization. Also determines whether secure (HTTPS-only) cookies are used.
Example: `https://api.console.dify.ai`
### CONSOLE\_WEB\_URL
Default: (empty)
The public URL of Dify's console frontend. Used to build links in all system emails (invitations, password resets, notifications) and to redirect users back to the console after OAuth login. Also serves as the default CORS allowed origin if `CONSOLE_CORS_ALLOW_ORIGINS` is not set.
If empty, email links will be broken—even in single-domain setups, set this if you use email features.
Example: `https://console.dify.ai`
### SERVICE\_API\_URL
Default: (empty)
The API Base URL shown to developers in the Dify console—the URL they copy into their code to call the Dify API. If empty, auto-detects from the current request (e.g., `http://localhost/v1`). Set this to ensure a consistent URL when your server is accessible via multiple addresses.
Example: `https://api.dify.ai`
### APP\_API\_URL
Default: (empty)
The backend API URL for the WebApp frontend (published apps). This variable is only used by the web frontend container, not the Python backend. If empty, the Docker image defaults to `http://127.0.0.1:5001`.
Example: `https://api.app.dify.ai`
### APP\_WEB\_URL
Default: (empty)
The public URL where published WebApps are accessible. Required for the **Human Input node** in workflows—form links in email notifications are built as `{APP_WEB_URL}/form/{token}`. If empty, Human Input email delivery will not include valid form links.
Example: `https://app.dify.ai`
### TRIGGER\_URL
Default: `http://localhost`
The publicly accessible URL for webhook and plugin trigger endpoints. External systems use this address to invoke your workflows. Dify builds trigger callback URLs like `{TRIGGER_URL}/triggers/webhook/{id}` and displays them in the console.
For triggers to work from external systems, this must point to a public domain or IP address they can reach.
### FILES\_URL
Default: (empty; falls back to `CONSOLE_API_URL`)
The base URL for file preview and download links. Dify generates signed, time-limited URLs for all files (uploaded documents, tool outputs, workspace logos) and serves them to the frontend and multi-modal models.
Set this if you use file processing plugins, or if you want file URLs on a dedicated domain. If both `FILES_URL` and `CONSOLE_API_URL` are empty, file previews will not work.
Example: `https://upload.example.com` or `http://<your-ip>:5001`
### INTERNAL\_FILES\_URL
Default: (empty; falls back to `FILES_URL`)
The file access URL used for communication between services inside the Docker network (e.g., plugin daemon, PDF/Word extractors). These internal services may not be able to reach the external `FILES_URL` if it routes through Nginx or a public domain.
If empty, internal services use `FILES_URL`. Set this when internal services can't reach the external URL.
Example: `http://api:5001`
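As an illustration, a split setup could look like this in `.env`. The hostnames are hypothetical; `api` is the typical Docker Compose service name for the backend container:

```bash
# Public file links go out through a dedicated domain (via Nginx),
# while in-network services hit the api container directly.
FILES_URL=https://upload.example.com
INTERNAL_FILES_URL=http://api:5001
```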
### FILES\_ACCESS\_TIMEOUT
Default: `300` (5 minutes)
How long signed file URLs remain valid, in seconds. After this time, the URL is rejected and the file must be re-requested. Increase for long-running processes; decrease for tighter security.
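To make the timeout concrete, here is a minimal sketch of how a time-limited, HMAC-SHA256-signed URL can be produced. The payload layout and query parameter names are invented for illustration; this is not Dify's actual URL scheme:

```bash
# Conceptual sketch of a signed, time-limited URL (illustrative only).
secret="change-me"                 # in Dify, SECRET_KEY plays this role
path="/files/123/preview"         # hypothetical file path
expires=$(( $(date +%s) + 300 ))  # valid for FILES_ACCESS_TIMEOUT seconds
# HMAC-SHA256 over the path and expiry, hex-encoded
sig=$(printf '%s:%s' "$path" "$expires" | openssl dgst -sha256 -hmac "$secret" -hex | awk '{print $NF}')
url="${path}?expires=${expires}&sig=${sig}"
echo "$url"
```

The server only accepts the URL if the signature verifies and the current time is before `expires`, which is why a stale link must be re-requested.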
### System Encoding
| Variable | Default | Description |
| ------------------ | ---------------- | ------------------------------------------------------------------------------------------------ |
| `LANG` | `C.UTF-8` | System locale setting. Ensures UTF-8 encoding. |
| `LC_ALL` | `C.UTF-8` | Locale override for all categories. |
| `PYTHONIOENCODING` | `utf-8` | Python I/O encoding. |
| `UV_CACHE_DIR` | `/tmp/.uv-cache` | UV package manager cache directory. Avoids permission issues with non-existent home directories. |
## Server Configuration
### Logging
| Variable | Default | Description |
| ----------------------- | ---------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `LOG_LEVEL` | `INFO` | Minimum log severity. Controls what gets logged across all handlers (file + console). Levels from least to most severe: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`. |
| `LOG_OUTPUT_FORMAT` | `text` | `text` produces human-readable lines with timestamp, level, thread, and trace ID. `json` produces structured JSON for log aggregation tools (ELK, Datadog, etc.). |
| `LOG_FILE` | `/app/logs/server.log` | Log file path. When set, enables file-based logging with automatic rotation. The directory is created automatically. When empty, logs only go to console. |
| `LOG_FILE_MAX_SIZE` | `20` | Maximum log file size in MB before rotation. When exceeded, the active file is renamed to `.1` and a new file is started. |
| `LOG_FILE_BACKUP_COUNT` | `5` | Number of rotated log files to keep. With defaults, at most 6 files exist: the active file plus 5 backups. |
| `LOG_DATEFORMAT` | `%Y-%m-%d %H:%M:%S` | Timestamp format for text-format logs (strftime codes). Ignored by JSON format. |
| `LOG_TZ` | `UTC` | Timezone for log timestamps (pytz format, e.g., `Asia/Shanghai`). Only applies to text format—JSON always uses UTC. Also sets Celery's task scheduling timezone. |
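For example, a log-aggregation-friendly setup might combine several of these variables in `.env` (the values below are illustrative, not recommendations):

```bash
# Structured JSON logs, rotated at 50 MB, keeping 10 backups
LOG_OUTPUT_FORMAT=json
LOG_LEVEL=WARNING
LOG_FILE=/app/logs/server.log
LOG_FILE_MAX_SIZE=50
LOG_FILE_BACKUP_COUNT=10
```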
### General
| Variable | Default | Description |
| ------------------------ | --------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `DEBUG` | `false` | Enables verbose logging: workflow node inputs/outputs, tool execution details, full LLM prompts and responses, and app startup timing. Useful for local development; not recommended for production as it may expose sensitive data in logs. |
| `FLASK_DEBUG` | `false` | Standard Flask debug mode flag. Not actively used by Dify—`DEBUG` is the primary control. |
| `ENABLE_REQUEST_LOGGING` | `false` | Logs a compact access line (`METHOD PATH STATUS DURATION TRACE_ID`) for every HTTP request. When `LOG_LEVEL` is also set to `DEBUG`, additionally logs full request and response bodies as JSON. |
| `DEPLOY_ENV` | `PRODUCTION` | Tags monitoring data in Sentry and OpenTelemetry so you can filter errors and traces by environment. Also sent as the `X-Env` response header. Does not change application behavior. |
| `MIGRATION_ENABLED` | `true` | When `true`, runs database schema migrations (`flask upgrade-db`) automatically on container startup. Docker only. Set to `false` if you run migrations separately. For source code launches, run `flask db upgrade` manually. |
| `CHECK_UPDATE_URL` | `https://updates.dify.ai` | The console checks this URL for newer Dify versions. Set to empty to disable—useful for air-gapped environments or to prevent external HTTP calls. |
| `OPENAI_API_BASE` | `https://api.openai.com/v1` | Legacy variable. Not actively used by Dify's own code. May be picked up by the OpenAI Python SDK if present in the environment. |
### SECRET\_KEY
Default: (pre-filled in `.env.example`; must be replaced for production)
Used for session cookie signing, JWT authentication tokens, file URL signatures (HMAC-SHA256), and encrypting third-party OAuth credentials (AES-256). Generate a strong key before first launch:
```bash theme={null}
openssl rand -base64 42
```
Changing this key after deployment will immediately log out all users, invalidate all file URLs, and break any plugin integrations that use OAuth—their encrypted credentials become unrecoverable.
### INIT\_PASSWORD
Default: (empty)
Optional security gate for first-time setup. When set, the `/install` page requires this password before the admin account can be created—preventing unauthorized setup if your server is exposed. Once setup is complete, this variable has no further effect. Maximum length: 30 characters.
### Token & Request Limits
| Variable | Default | Description |
| ----------------------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `ACCESS_TOKEN_EXPIRE_MINUTES` | `60` | How long a login session's access token stays valid (in minutes). When it expires, the browser silently refreshes it using the refresh token—users are not logged out. |
| `REFRESH_TOKEN_EXPIRE_DAYS` | `30` | How long a user can stay logged in without re-entering credentials (in days). If the user doesn't visit within this period, they must log in again. |
| `APP_MAX_EXECUTION_TIME` | `1200` | Maximum time (in seconds) an app execution can run before being terminated. Works alongside `WORKFLOW_MAX_EXECUTION_TIME`—both enforce the same default of 20 minutes, but this one applies at the app queue level while the other applies at the workflow engine level. Increase both if your workflows need more time. |
| `APP_DEFAULT_ACTIVE_REQUESTS` | `0` | Default concurrent request limit per app, used when an app doesn't have a custom limit set in the UI. `0` means unlimited. The effective limit is the smaller of this and `APP_MAX_ACTIVE_REQUESTS`. |
| `APP_MAX_ACTIVE_REQUESTS` | `0` | Global ceiling for concurrent requests per app. Overrides per-app settings if they exceed this value. `0` means unlimited. |
### Container Startup Configuration
Only effective when starting with Docker image or Docker Compose.
| Variable | Default | Description |
| --------------------------- | ------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `DIFY_BIND_ADDRESS` | `0.0.0.0` | Network interface the API server binds to. `0.0.0.0` listens on all interfaces; set to `127.0.0.1` to restrict to localhost only. |
| `DIFY_PORT` | `5001` | Port the API server listens on. |
| `SERVER_WORKER_AMOUNT` | `1` | Number of Gunicorn worker processes. With gevent (default), each worker handles multiple concurrent connections via greenlets, so 1 is usually sufficient. For sync workers, use `(2 x CPU cores) + 1`. [Reference](https://gunicorn.org/design/#how-many-workers). |
| `SERVER_WORKER_CLASS` | `gevent` | Gunicorn worker type. Gevent provides lightweight async concurrency. Changing this breaks psycopg2 and gRPC patching—it is strongly discouraged. |
| `SERVER_WORKER_CONNECTIONS` | `10` | Maximum concurrent connections per worker. Only applies to async workers (gevent). If you experience connection rejections or slow responses under load, try increasing this value. |
| `GUNICORN_TIMEOUT` | `360` | If a worker doesn't respond within this many seconds, Gunicorn kills and restarts it. Set to 360 (6 minutes) to support long-lived SSE connections used for streaming LLM responses. |
| `CELERY_WORKER_CLASS` | (empty; defaults to gevent) | Celery worker type. Same gevent patching requirements as `SERVER_WORKER_CLASS`—it is strongly discouraged to change. |
| `CELERY_WORKER_AMOUNT` | (empty; defaults to 1) | Number of Celery worker processes. Only used when autoscaling is disabled. |
| `CELERY_AUTO_SCALE` | `false` | Enable dynamic autoscaling. When enabled, Celery monitors queue depth and spawns/kills workers between `CELERY_MIN_WORKERS` and `CELERY_MAX_WORKERS`. |
| `CELERY_MAX_WORKERS` | (empty; defaults to CPU count) | Maximum workers when autoscaling is enabled. |
| `CELERY_MIN_WORKERS` | (empty; defaults to 1) | Minimum workers when autoscaling is enabled. |
### API Tool Configuration
| Variable | Default | Description |
| ---------------------------------- | ------- | ----------------------------------------------------------------------------------------------------------- |
| `API_TOOL_DEFAULT_CONNECT_TIMEOUT` | `10` | Maximum time (in seconds) to wait for establishing a TCP connection when API Tool nodes call external APIs. |
| `API_TOOL_DEFAULT_READ_TIMEOUT` | `60` | Maximum time (in seconds) to wait for receiving response data from external APIs called by API Tool nodes. |
### Database Configuration
The database uses PostgreSQL by default. OceanBase, MySQL, and seekdb are also supported.
| Variable | Default | Description |
| ------------- | -------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
| `DB_TYPE` | `postgresql` | Database type. Supported values: `postgresql`, `mysql`, `oceanbase`, `seekdb`. MySQL-compatible databases like TiDB can use `mysql`. |
| `DB_USERNAME` | `postgres` | Database username. URL-encoded in the connection string, so special characters are safe to use. |
| `DB_PASSWORD` | `difyai123456` | Database password. URL-encoded in the connection string, so characters like `@`, `:`, `%` are safe to use. |
| `DB_HOST` | `db_postgres` | Database server hostname. |
| `DB_PORT` | `5432` | Database server port. If using MySQL, set this to `3306`. |
| `DB_DATABASE` | `dify` | Database name. |
#### Connection Pool
These control how Dify manages its pool of database connections. The defaults work well for most deployments.
| Variable | Default | Description |
| -------------------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `SQLALCHEMY_POOL_SIZE` | `30` | Number of persistent connections kept in the pool. |
| `SQLALCHEMY_MAX_OVERFLOW` | `10` | Additional temporary connections allowed when the pool is full. With default settings, up to 40 connections (30 + 10) can exist simultaneously. |
| `SQLALCHEMY_POOL_RECYCLE` | `3600` | Recycle connections after this many seconds to prevent stale connections. |
| `SQLALCHEMY_POOL_TIMEOUT` | `30` | How long to wait for a connection when the pool is exhausted. Requests fail with a timeout error if no connection frees up in time. |
| `SQLALCHEMY_POOL_PRE_PING` | `false` | Test each connection with a lightweight query before using it. Prevents "connection lost" errors but adds slight latency. Recommended for production with unreliable networks. |
| `SQLALCHEMY_POOL_USE_LIFO` | `false` | Reuse the most recently returned connection (LIFO) instead of rotating evenly (FIFO). LIFO keeps fewer connections "warm" and can reduce overhead. |
| `SQLALCHEMY_ECHO` | `false` | Print all SQL statements to logs. Useful for debugging query issues. |
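A quick back-of-the-envelope check: each Python process can open up to `SQLALCHEMY_POOL_SIZE + SQLALCHEMY_MAX_OVERFLOW` connections, and the total across all processes should stay below the database server's own connection limit (`POSTGRES_MAX_CONNECTIONS`, default 100). The process count below is an assumption for illustration:

```bash
POOL_SIZE=30        # SQLALCHEMY_POOL_SIZE default
MAX_OVERFLOW=10     # SQLALCHEMY_MAX_OVERFLOW default
PROCESSES=2         # assumption: one API process plus one Celery worker
PEAK=$(( (POOL_SIZE + MAX_OVERFLOW) * PROCESSES ))
echo "$PEAK"        # 80, under the default POSTGRES_MAX_CONNECTIONS of 100
```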
#### PostgreSQL Performance Tuning
These are passed as startup arguments to the PostgreSQL container—they configure the database server, not the Dify application.
| Variable | Default | Description |
| ---------------------------------------------- | -------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `POSTGRES_MAX_CONNECTIONS` | `100` | Maximum number of database connections. [Reference](https://www.postgresql.org/docs/current/runtime-config-connection.html#GUC-MAX-CONNECTIONS) |
| `POSTGRES_SHARED_BUFFERS` | `128MB` | Shared memory for buffers. Recommended: 25% of available memory. [Reference](https://www.postgresql.org/docs/current/runtime-config-resource.html#GUC-SHARED-BUFFERS) |
| `POSTGRES_WORK_MEM` | `4MB` | Memory per database worker for working space. [Reference](https://www.postgresql.org/docs/current/runtime-config-resource.html#GUC-WORK-MEM) |
| `POSTGRES_MAINTENANCE_WORK_MEM` | `64MB` | Memory reserved for maintenance activities. [Reference](https://www.postgresql.org/docs/current/runtime-config-resource.html#GUC-MAINTENANCE-WORK-MEM) |
| `POSTGRES_EFFECTIVE_CACHE_SIZE` | `4096MB` | Planner's assumption about effective cache size. [Reference](https://www.postgresql.org/docs/current/runtime-config-query.html#GUC-EFFECTIVE-CACHE-SIZE) |
| `POSTGRES_STATEMENT_TIMEOUT` | `0` | Max statement duration before termination (ms). `0` means no timeout. [Reference](https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-STATEMENT-TIMEOUT) |
| `POSTGRES_IDLE_IN_TRANSACTION_SESSION_TIMEOUT` | `0` | Max idle-in-transaction session duration (ms). `0` means no timeout. [Reference](https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-IDLE-IN-TRANSACTION-SESSION-TIMEOUT) |
#### MySQL Performance Tuning
These are passed as startup arguments to the MySQL container—they configure the database server, not the Dify application.
| Variable | Default | Description |
| -------------------------------------- | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `MYSQL_MAX_CONNECTIONS` | `1000` | Maximum number of MySQL connections. |
| `MYSQL_INNODB_BUFFER_POOL_SIZE` | `512M` | InnoDB buffer pool size. Recommended: 70-80% of available memory for dedicated MySQL server. [Reference](https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_buffer_pool_size) |
| `MYSQL_INNODB_LOG_FILE_SIZE` | `128M` | InnoDB log file size. [Reference](https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_log_file_size) |
| `MYSQL_INNODB_FLUSH_LOG_AT_TRX_COMMIT` | `2` | InnoDB flush log at transaction commit. Options: `0` (no flush), `1` (flush and sync), `2` (flush to OS cache). [Reference](https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit) |
### Redis Configuration
Configure these to connect Dify to your Redis instance. Dify supports three deployment modes: standalone (default), Sentinel, and Cluster.
| Variable | Default | Description |
| ----------------------- | -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `REDIS_HOST` | `redis` | Redis server hostname. Only used in standalone mode; ignored when Sentinel or Cluster mode is enabled. |
| `REDIS_PORT` | `6379` | Redis server port. Only used in standalone mode. |
| `REDIS_USERNAME` | (empty) | Redis 6.0+ ACL username. Applies to all modes (standalone, Sentinel, Cluster). |
| `REDIS_PASSWORD` | `difyai123456` | Redis authentication password. For Cluster mode, use `REDIS_CLUSTERS_PASSWORD` instead. |
| `REDIS_DB` | `0` | Redis database number (0–15). Only applies to standalone and Sentinel modes. Make sure this doesn't collide with Celery's database (configured in `CELERY_BROKER_URL`; default is DB 1). |
| `REDIS_USE_SSL` | `false` | Enable SSL/TLS for the Redis connection. Does not automatically apply to Sentinel protocol. |
| `REDIS_MAX_CONNECTIONS` | (empty) | Maximum connections in the Redis pool. Leave unset for the library default. Set this to match your Redis server's `maxclients` if needed. |
#### Redis SSL Configuration
Only applies when `REDIS_USE_SSL=true`. These same settings are also used by the Celery broker when its URL uses the `rediss://` scheme.
| Variable | Default | Description |
| --------------------- | ----------- | ----------------------------------------------------------------------------------------------------------------------- |
| `REDIS_SSL_CERT_REQS` | `CERT_NONE` | Certificate verification level: `CERT_NONE` (no verification), `CERT_OPTIONAL`, or `CERT_REQUIRED` (full verification). |
| `REDIS_SSL_CA_CERTS` | (empty) | Path to CA certificate file for verifying the Redis server. |
| `REDIS_SSL_CERTFILE` | (empty) | Path to client certificate for mutual TLS authentication. |
| `REDIS_SSL_KEYFILE` | (empty) | Path to client private key for mutual TLS authentication. |
#### Redis Sentinel Mode
Sentinel provides automatic master discovery and failover for high availability. Mutually exclusive with Cluster mode.
| Variable | Default | Description |
| ------------------------------- | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `REDIS_USE_SENTINEL` | `false` | Enable Redis Sentinel mode. When enabled, `REDIS_HOST`/`REDIS_PORT` are ignored; Dify connects to Sentinel nodes instead and asks for the current master. |
| `REDIS_SENTINELS` | (empty) | Sentinel node addresses. Format: `host1:port1,host2:port2,host3:port3`. These are the Sentinel instances, not the Redis servers. |
| `REDIS_SENTINEL_SERVICE_NAME` | (empty) | The logical service name Sentinel monitors (configured in `sentinel.conf`). Dify calls `master_for(service_name)` to discover the current master. |
| `REDIS_SENTINEL_USERNAME` | (empty) | Username for authenticating with Sentinel nodes. Separate from `REDIS_USERNAME`, which authenticates with the Redis master/replicas. |
| `REDIS_SENTINEL_PASSWORD` | (empty) | Password for authenticating with Sentinel nodes. Separate from `REDIS_PASSWORD`. |
| `REDIS_SENTINEL_SOCKET_TIMEOUT` | `0.1` | Socket timeout (in seconds) for communicating with Sentinel nodes. Default 0.1s assumes fast local network. For cloud/WAN deployments, increase to 1.0–5.0s to prevent intermittent timeouts. |
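Putting the table together, a minimal Sentinel setup might look like this (hostnames, service name, and passwords are illustrative):

```
REDIS_USE_SENTINEL=true
REDIS_SENTINELS=sentinel-1:26379,sentinel-2:26379,sentinel-3:26379
REDIS_SENTINEL_SERVICE_NAME=mymaster
REDIS_SENTINEL_PASSWORD=sentinel-secret
REDIS_PASSWORD=redis-secret
REDIS_SENTINEL_SOCKET_TIMEOUT=1.0
```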
#### Redis Cluster Mode
Cluster mode provides automatic sharding across multiple Redis nodes. Mutually exclusive with Sentinel mode.
| Variable | Default | Description |
| ------------------------- | ------- | ------------------------------------------------------------------ |
| `REDIS_USE_CLUSTERS` | `false` | Enable Redis Cluster mode. |
| `REDIS_CLUSTERS` | (empty) | Cluster node addresses. Format: `host1:port1,host2:port2,host3:port3`. |
| `REDIS_CLUSTERS_PASSWORD` | (empty) | Password for the Redis Cluster. |
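A minimal Cluster setup might look like this (node addresses and password are illustrative):

```
REDIS_USE_CLUSTERS=true
REDIS_CLUSTERS=redis-node-1:6379,redis-node-2:6379,redis-node-3:6379
REDIS_CLUSTERS_PASSWORD=cluster-secret
```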
### Celery Configuration
Configure the background task queue used for dataset indexing, email sending, and scheduled jobs.
#### CELERY\_BROKER\_URL
Default: `redis://:difyai123456@redis:6379/1`
Redis connection URL for the Celery message broker.
Direct connection format:
```
redis://:<password>@<host>:<port>/<db>
```
Sentinel mode format (separate multiple nodes with semicolons):
```
sentinel://:<password>@<sentinel_host>:<sentinel_port>/<db>
```
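For example (hostnames and credentials are illustrative):

```
# Direct connection, using Redis DB 1
CELERY_BROKER_URL=redis://:difyai123456@redis:6379/1

# Sentinel mode: list every Sentinel node, separated by semicolons
CELERY_BROKER_URL=sentinel://:difyai123456@sentinel-1:26379/1;sentinel://:difyai123456@sentinel-2:26379/1
```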
| Variable | Default | Description |
| -------------------------------- | ------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `CELERY_BACKEND` | `redis` | Where Celery stores task results. Options: `redis` (fast, in-memory) or `database` (stores in your main database). |
| `BROKER_USE_SSL` | `false` | Auto-enabled when `CELERY_BROKER_URL` uses `rediss://` scheme. Applies the Redis SSL certificate settings to the broker connection. |
| `CELERY_USE_SENTINEL` | `false` | Enable Redis Sentinel mode for the Celery broker. |
| `CELERY_SENTINEL_MASTER_NAME` | (empty) | Sentinel service name (Master Name). |
| `CELERY_SENTINEL_PASSWORD` | (empty) | Password for Sentinel authentication. Separate from `REDIS_SENTINEL_PASSWORD`—they can differ if you use different Sentinel clusters for caching vs task queuing. |
| `CELERY_SENTINEL_SOCKET_TIMEOUT` | `0.1` | Timeout for connecting to Sentinel in seconds. |
| `CELERY_TASK_ANNOTATIONS` | `null` | Apply runtime settings to specific tasks (e.g., rate limits). Format: JSON dictionary. Example: `{"tasks.add": {"rate_limit": "10/s"}}`. Most users don't need this. |
### CORS Configuration
Controls cross-domain access policies for the frontend.
| Variable | Default | Description |
| ------------------------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `WEB_API_CORS_ALLOW_ORIGINS` | `*` | Allowed origins for cross-origin requests to the Web API. Example: `https://dify.app` |
| `CONSOLE_CORS_ALLOW_ORIGINS` | `*` | Allowed origins for cross-origin requests to the console API. If not set, falls back to `CONSOLE_WEB_URL`. |
| `COOKIE_DOMAIN` | (empty) | Set to the shared top-level domain (e.g., `example.com`) when frontend and backend run on different subdomains. This allows authentication cookies to be shared across subdomains. When empty, cookies use the most secure `__Host-` prefix and are locked to a single domain. |
| `NEXT_PUBLIC_COOKIE_DOMAIN` | (empty) | Frontend flag for cross-subdomain cookies. Set to `1` (or any non-empty value) to enable—the actual domain is read from `COOKIE_DOMAIN` on the backend. |
| `NEXT_PUBLIC_BATCH_CONCURRENCY` | `5` | Frontend-only. Controls how many concurrent API calls the UI makes during batch operations. |
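For example, to serve the console at `console.example.com` against an API at `api.example.com` (domains are illustrative), share cookies across the parent domain:

```
CONSOLE_CORS_ALLOW_ORIGINS=https://console.example.com
COOKIE_DOMAIN=example.com
NEXT_PUBLIC_COOKIE_DOMAIN=1
```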
### File Storage Configuration
Configure where Dify stores uploaded files, dataset documents, and encryption keys. Each storage type has its own credential variables—configure only the one you're using.
#### STORAGE\_TYPE
Default: `opendal`
Selects the file storage backend. Supported values: `opendal`, `s3`, `azure-blob`, `aliyun-oss`, `google-storage`, `huawei-obs`, `volcengine-tos`, `tencent-cos`, `baidu-obs`, `oci-storage`, `supabase`, `clickzetta-volume`, `local` (deprecated; internally uses OpenDAL with filesystem scheme).
#### OpenDAL
The default storage backend, built on [Apache OpenDAL](https://opendal.apache.org/), a unified interface supporting many storage services. Dify automatically scans environment variables matching `OPENDAL_<SCHEME>_*` and passes them to OpenDAL. For example, with `OPENDAL_SCHEME=s3`, set `OPENDAL_S3_ACCESS_KEY_ID`, `OPENDAL_S3_SECRET_ACCESS_KEY`, etc.
| Variable | Default | Description |
| ---------------- | ------- | --------------------------------------------------------------------------------- |
| `OPENDAL_SCHEME` | `fs` | Storage service to use. Examples: `fs` (local filesystem), `s3`, `gcs`, `azblob`. |
For the default `fs` scheme:
| Variable | Default | Description |
| ----------------- | --------- | --------------------------------------------------------------------------------------- |
| `OPENDAL_FS_ROOT` | `storage` | Root directory for local filesystem storage. Created automatically if it doesn't exist. |
For all available schemes and their configuration options, see the [OpenDAL services documentation](https://github.com/apache/opendal/tree/main/core/services).
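As a sketch, pointing OpenDAL at S3-compatible storage might look like this (bucket name and credentials are illustrative; consult the OpenDAL S3 service docs for the full option list):

```
STORAGE_TYPE=opendal
OPENDAL_SCHEME=s3
OPENDAL_S3_BUCKET=my-dify-bucket
OPENDAL_S3_REGION=us-east-1
OPENDAL_S3_ACCESS_KEY_ID=your-access-key
OPENDAL_S3_SECRET_ACCESS_KEY=your-secret-key
```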
#### Amazon S3
| Variable | Default | Description |
| ------------------------ | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `S3_ENDPOINT` | (empty) | S3 endpoint address. Required for non-AWS S3-compatible services (MinIO, etc.). |
| `S3_REGION` | `us-east-1` | S3 region. |
| `S3_BUCKET_NAME` | `difyai` | S3 bucket name. |
| `S3_ACCESS_KEY` | (empty) | S3 Access Key. Not needed when using IAM roles. |
| `S3_SECRET_KEY` | (empty) | S3 Secret Key. Not needed when using IAM roles. |
| `S3_ADDRESS_STYLE` | `auto` | S3 addressing style: `auto`, `path`, or `virtual`. Controls whether bucket names appear in the URL path (`path`) or as a subdomain (`virtual`). Only applies when `S3_USE_AWS_MANAGED_IAM` is `false`. |
| `S3_USE_AWS_MANAGED_IAM` | `false` | Use AWS IAM roles (EC2 instance profile, ECS task role) instead of explicit access key/secret key. When enabled, credentials are auto-discovered from the instance metadata. |
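For instance, a self-hosted MinIO backend (endpoint and credentials are illustrative) typically needs path-style addressing:

```
STORAGE_TYPE=s3
S3_ENDPOINT=http://minio:9000
S3_BUCKET_NAME=difyai
S3_ACCESS_KEY=minioadmin
S3_SECRET_KEY=minioadmin
S3_ADDRESS_STYLE=path
```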
#### Azure Blob Storage
| Variable | Default | Description |
| --------------------------- | --------------------------------------------------- | --------------------------- |
| `AZURE_BLOB_ACCOUNT_NAME` | `difyai` | Azure storage account name. |
| `AZURE_BLOB_ACCOUNT_KEY` | `difyai` | Azure storage account key. |
| `AZURE_BLOB_CONTAINER_NAME` | `difyai-container` | Azure Blob container name. |
| `AZURE_BLOB_ACCOUNT_URL` | `https://<account-name>.blob.core.windows.net` | Azure Blob account URL. |
#### Google Cloud Storage
| Variable | Default | Description |
| -------------------------------------------- | ------- | ---------------------------------------- |
| `GOOGLE_STORAGE_BUCKET_NAME` | (empty) | Google Cloud Storage bucket name. |
| `GOOGLE_STORAGE_SERVICE_ACCOUNT_JSON_BASE64` | (empty) | Base64-encoded service account JSON key. |
#### Aliyun OSS
| Variable | Default | Description |
| ------------------------- | -------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------- |
| `ALIYUN_OSS_BUCKET_NAME` | (empty) | OSS bucket name. |
| `ALIYUN_OSS_ACCESS_KEY` | (empty) | OSS access key. |
| `ALIYUN_OSS_SECRET_KEY` | (empty) | OSS secret key. |
| `ALIYUN_OSS_ENDPOINT` | `https://oss-ap-southeast-1-internal.aliyuncs.com` | OSS endpoint. [Regions and endpoints reference](https://www.alibabacloud.com/help/en/oss/user-guide/regions-and-endpoints). |
| `ALIYUN_OSS_REGION` | `ap-southeast-1` | OSS region. |
| `ALIYUN_OSS_AUTH_VERSION` | `v4` | OSS authentication version. |
| `ALIYUN_OSS_PATH` | (empty) | Object path prefix. Don't start with `/`. [Reference](https://www.alibabacloud.com/help/en/oss/support/0016-00000005). |
| `ALIYUN_CLOUDBOX_ID` | (empty) | CloudBox ID for CloudBox-based OSS deployments. |
#### Tencent COS
| Variable | Default | Description |
| --------------------------- | ------- | --------------------------------------------------------------------------------------------------- |
| `TENCENT_COS_BUCKET_NAME` | (empty) | COS bucket name. |
| `TENCENT_COS_SECRET_KEY` | (empty) | COS secret key. |
| `TENCENT_COS_SECRET_ID` | (empty) | COS secret ID. |
| `TENCENT_COS_REGION` | (empty) | COS region, e.g., `ap-guangzhou`. [Reference](https://cloud.tencent.com/document/product/436/6224). |
| `TENCENT_COS_SCHEME` | (empty) | Protocol to access COS (`http` or `https`). |
| `TENCENT_COS_CUSTOM_DOMAIN` | (empty) | Custom domain for COS access. |
#### OCI Object Storage
| Variable | Default | Description |
| ----------------- | -------------- | ----------------- |
| `OCI_ENDPOINT` | (empty) | OCI endpoint URL. |
| `OCI_BUCKET_NAME` | (empty) | OCI bucket name. |
| `OCI_ACCESS_KEY` | (empty) | OCI access key. |
| `OCI_SECRET_KEY` | (empty) | OCI secret key. |
| `OCI_REGION` | `us-ashburn-1` | OCI region. |
#### Huawei OBS
| Variable | Default | Description |
| ------------------------ | ------- | -------------------------------------------------------------------------------------------------- |
| `HUAWEI_OBS_BUCKET_NAME` | (empty) | OBS bucket name. |
| `HUAWEI_OBS_ACCESS_KEY` | (empty) | OBS access key. |
| `HUAWEI_OBS_SECRET_KEY` | (empty) | OBS secret key. |
| `HUAWEI_OBS_SERVER` | (empty) | OBS server URL. [Reference](https://support.huaweicloud.com/sdk-python-devg-obs/obs_22_0500.html). |
| `HUAWEI_OBS_PATH_STYLE` | `false` | Use path-style URLs instead of virtual-hosted-style. |
#### Volcengine TOS
| Variable | Default | Description |
| ---------------------------- | ------- | --------------------------------------------------------------------------- |
| `VOLCENGINE_TOS_BUCKET_NAME` | (empty) | TOS bucket name. |
| `VOLCENGINE_TOS_ACCESS_KEY` | (empty) | TOS access key. |
| `VOLCENGINE_TOS_SECRET_KEY` | (empty) | TOS secret key. |
| `VOLCENGINE_TOS_ENDPOINT` | (empty) | TOS endpoint URL. [Reference](https://www.volcengine.com/docs/6349/107356). |
| `VOLCENGINE_TOS_REGION` | (empty) | TOS region, e.g., `cn-guangzhou`. |
#### Baidu OBS
| Variable | Default | Description |
| ----------------------- | ------- | ---------------------- |
| `BAIDU_OBS_BUCKET_NAME` | (empty) | Baidu OBS bucket name. |
| `BAIDU_OBS_ACCESS_KEY` | (empty) | Baidu OBS access key. |
| `BAIDU_OBS_SECRET_KEY` | (empty) | Baidu OBS secret key. |
| `BAIDU_OBS_ENDPOINT` | (empty) | Baidu OBS server URL. |
#### Supabase
| Variable | Default | Description |
| ---------------------- | ------- | ----------------------------- |
| `SUPABASE_BUCKET_NAME` | (empty) | Supabase storage bucket name. |
| `SUPABASE_API_KEY` | (empty) | Supabase API key. |
| `SUPABASE_URL` | (empty) | Supabase server URL. |
#### ClickZetta Volume
| Variable | Default | Description |
| -------------------------------- | ---------- | -------------------------------------------------------------------------------------------------------------------------- |
| `CLICKZETTA_VOLUME_TYPE` | `user` | Volume type. Options: `user` (personal/small team), `table` (enterprise multi-tenant), `external` (data lake integration). |
| `CLICKZETTA_VOLUME_NAME` | (empty) | External volume name (required only when `TYPE=external`). |
| `CLICKZETTA_VOLUME_TABLE_PREFIX` | `dataset_` | Table volume table prefix (used only when `TYPE=table`). |
| `CLICKZETTA_VOLUME_DIFY_PREFIX` | `dify_km` | Dify file directory prefix for isolation from other apps. |
ClickZetta Volume reuses the `CLICKZETTA_*` connection parameters configured in the Vector Database section.
#### Archive Storage
Separate S3-compatible storage for archiving workflow run logs. The paid plan retention system uses it to archive workflow runs older than the retention period in JSONL format. Requires `BILLING_ENABLED=true`.
| Variable | Default | Description |
| -------------------------------- | ------- | ------------------------------------------------- |
| `ARCHIVE_STORAGE_ENABLED` | `false` | Enable archive storage for workflow log archival. |
| `ARCHIVE_STORAGE_ENDPOINT` | (empty) | S3-compatible endpoint URL. |
| `ARCHIVE_STORAGE_ARCHIVE_BUCKET` | (empty) | Bucket for archived workflow run logs. |
| `ARCHIVE_STORAGE_EXPORT_BUCKET` | (empty) | Bucket for workflow run exports. |
| `ARCHIVE_STORAGE_ACCESS_KEY` | (empty) | Access key. |
| `ARCHIVE_STORAGE_SECRET_KEY` | (empty) | Secret key. |
| `ARCHIVE_STORAGE_REGION` | `auto` | Storage region. |
### Vector Database Configuration
Configure the vector database used for knowledge base embedding storage and similarity search. Each provider has its own set of credential variables—configure only the one you're using.
#### VECTOR\_STORE
Default: `weaviate`
Selects the vector database backend. If a dataset already has an index, the dataset's stored type takes precedence over this setting. When switching providers in Docker Compose, `COMPOSE_PROFILES` automatically starts the matching container based on this value.
Supported values: `weaviate`, `oceanbase`, `seekdb`, `qdrant`, `milvus`, `myscale`, `relyt`, `pgvector`, `pgvecto-rs`, `chroma`, `opensearch`, `oracle`, `tencent`, `elasticsearch`, `elasticsearch-ja`, `analyticdb`, `couchbase`, `vikingdb`, `opengauss`, `tablestore`, `vastbase`, `tidb`, `tidb_on_qdrant`, `baidu`, `lindorm`, `huawei_cloud`, `upstash`, `matrixone`, `clickzetta`, `alibabacloud_mysql`, `iris`, `hologres`.
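For example, switching a Docker Compose deployment from the default Weaviate to Qdrant might look like this (API key illustrative; existing datasets keep the store they were created with):

```
VECTOR_STORE=qdrant
QDRANT_URL=http://qdrant:6333
QDRANT_API_KEY=difyai123456
```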
| Variable | Default | Description |
| -------------------------- | -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| `VECTOR_INDEX_NAME_PREFIX` | `Vector_index` | Prefix added to collection names in the vector database. Change this if you share a vector database instance across multiple Dify deployments. |
#### Weaviate
| Variable | Default | Description |
| ------------------------ | ----------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `WEAVIATE_ENDPOINT` | `http://weaviate:8080` | Weaviate REST API endpoint. |
| `WEAVIATE_API_KEY` | (empty) | API key for Weaviate authentication. |
| `WEAVIATE_GRPC_ENDPOINT` | `grpc://weaviate:50051` | Separate gRPC endpoint for high-performance binary protocol. Significantly faster for batch operations. Falls back to inferring from HTTP endpoint if not set. |
| `WEAVIATE_TOKENIZATION` | `word` | Tokenization method for text fields. Options: `word` (splits on whitespace and punctuation), `whitespace` (splits on whitespace only), `character` (character-level, better for CJK languages). |
#### OceanBase / seekdb
seekdb is the lite version of OceanBase and shares the same connection configuration.
| Variable | Default | Description |
| -------------------------------- | -------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
| `OCEANBASE_VECTOR_HOST` | `oceanbase` | Hostname or IP address. |
| `OCEANBASE_VECTOR_PORT` | `2881` | Port number. |
| `OCEANBASE_VECTOR_USER` | `root@test` | Database username. |
| `OCEANBASE_VECTOR_PASSWORD` | `difyai123456` | Database password. |
| `OCEANBASE_VECTOR_DATABASE` | `test` | Database name. |
| `OCEANBASE_CLUSTER_NAME` | `difyai` | Cluster name (Docker deployment only). |
| `OCEANBASE_MEMORY_LIMIT` | `6G` | Memory limit for OceanBase (Docker deployment only). |
| `SEEKDB_MEMORY_LIMIT` | `2G` | Memory limit for seekdb (Docker deployment only). |
| `OCEANBASE_ENABLE_HYBRID_SEARCH` | `false` | Enable fulltext index for BM25 queries alongside vector search. Requires OceanBase >= 4.3.5.1. Collections must be recreated after enabling. |
| `OCEANBASE_FULLTEXT_PARSER` | `ik` | Fulltext parser. Built-in: `ngram`, `beng`, `space`, `ngram2`, `ik`. External (require plugin): `japanese_ftparser`, `thai_ftparser`. |
#### Qdrant
| Variable | Default | Description |
| --------------------------- | -------------------- | ----------------------------- |
| `QDRANT_URL` | `http://qdrant:6333` | Qdrant endpoint address. |
| `QDRANT_API_KEY` | `difyai123456` | API key for Qdrant. |
| `QDRANT_CLIENT_TIMEOUT` | `20` | Client timeout in seconds. |
| `QDRANT_GRPC_ENABLED` | `false` | Enable gRPC communication. |
| `QDRANT_GRPC_PORT` | `6334` | gRPC port. |
| `QDRANT_REPLICATION_FACTOR` | `1` | Number of replicas per shard. |
#### Milvus
| Variable | Default | Description |
| ----------------------------- | ----------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `MILVUS_URI` | `http://host.docker.internal:19530` | Milvus URI. For [Zilliz Cloud](https://docs.zilliz.com/docs/free-trials), use the Public Endpoint. |
| `MILVUS_DATABASE` | (empty) | Database name. |
| `MILVUS_TOKEN` | (empty) | Authentication token. For Zilliz Cloud, use the API Key. |
| `MILVUS_USER` | (empty) | Username. |
| `MILVUS_PASSWORD` | (empty) | Password. |
| `MILVUS_ENABLE_HYBRID_SEARCH` | `false` | Enable BM25 sparse index for full-text search alongside vector similarity. Requires Milvus >= 2.5.0. If the collection was created without this enabled, it must be recreated. |
| `MILVUS_ANALYZER_PARAMS` | (empty) | Analyzer parameters for text fields. |
#### MyScale
| Variable | Default | Description |
| -------------------- | --------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
| `MYSCALE_HOST` | `myscale` | MyScale host. |
| `MYSCALE_PORT` | `8123` | MyScale port. |
| `MYSCALE_USER` | `default` | Username. |
| `MYSCALE_PASSWORD` | (empty) | Password. |
| `MYSCALE_DATABASE` | `dify` | Database name. |
| `MYSCALE_FTS_PARAMS` | (empty) | Full-text search params. [Multi-language support reference](https://myscale.com/docs/en/text-search/#understanding-fts-index-parameters). |
#### Couchbase
| Variable | Default | Description |
| ----------------------------- | ------------------------------ | -------------------------------------------- |
| `COUCHBASE_CONNECTION_STRING` | `couchbase://couchbase-server` | Connection string for the Couchbase cluster. |
| `COUCHBASE_USER` | `Administrator` | Username. |
| `COUCHBASE_PASSWORD` | `password` | Password. |
| `COUCHBASE_BUCKET_NAME` | `Embeddings` | Bucket name. |
| `COUCHBASE_SCOPE_NAME` | `_default` | Scope name. |
#### Hologres
| Variable | Default | Description |
| --------------------------------- | -------- | ---------------------------------------- |
| `HOLOGRES_HOST` | (empty) | Hostname. |
| `HOLOGRES_PORT` | `80` | Port number. |
| `HOLOGRES_DATABASE` | (empty) | Database name. |
| `HOLOGRES_ACCESS_KEY_ID` | (empty) | Access key ID (used as PG username). |
| `HOLOGRES_ACCESS_KEY_SECRET` | (empty) | Access key secret (used as PG password). |
| `HOLOGRES_SCHEMA` | `public` | Schema name. |
| `HOLOGRES_TOKENIZER` | `jieba` | Tokenizer for text fields. |
| `HOLOGRES_DISTANCE_METHOD` | `Cosine` | Distance method. |
| `HOLOGRES_BASE_QUANTIZATION_TYPE` | `rabitq` | Quantization type. |
| `HOLOGRES_MAX_DEGREE` | `64` | HNSW max degree. |
| `HOLOGRES_EF_CONSTRUCTION` | `400` | HNSW ef\_construction parameter. |
#### PGVector
| Variable | Default | Description |
| ------------------------- | -------------- | ----------------------------------------------- |
| `PGVECTOR_HOST` | `pgvector` | Hostname. |
| `PGVECTOR_PORT` | `5432` | Port number. |
| `PGVECTOR_USER` | `postgres` | Username. |
| `PGVECTOR_PASSWORD` | `difyai123456` | Password. |
| `PGVECTOR_DATABASE` | `dify` | Database name. |
| `PGVECTOR_MIN_CONNECTION` | `1` | Minimum pool connections. |
| `PGVECTOR_MAX_CONNECTION` | `5` | Maximum pool connections. |
| `PGVECTOR_PG_BIGM` | `false` | Enable pg\_bigm extension for full-text search. |
#### Vastbase
| Variable | Default | Description |
| ------------------------- | -------------- | ------------------------- |
| `VASTBASE_HOST` | `vastbase` | Hostname. |
| `VASTBASE_PORT` | `5432` | Port number. |
| `VASTBASE_USER` | `dify` | Username. |
| `VASTBASE_PASSWORD` | `Difyai123456` | Password. |
| `VASTBASE_DATABASE` | `dify` | Database name. |
| `VASTBASE_MIN_CONNECTION` | `1` | Minimum pool connections. |
| `VASTBASE_MAX_CONNECTION` | `5` | Maximum pool connections. |
#### pgvecto-rs
| Variable | Default | Description |
| --------------------- | -------------- | -------------- |
| `PGVECTO_RS_HOST` | `pgvecto-rs` | Hostname. |
| `PGVECTO_RS_PORT` | `5432` | Port number. |
| `PGVECTO_RS_USER` | `postgres` | Username. |
| `PGVECTO_RS_PASSWORD` | `difyai123456` | Password. |
| `PGVECTO_RS_DATABASE` | `dify` | Database name. |
#### AnalyticDB
| Variable | Default | Description |
| ------------------------------- | ------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `ANALYTICDB_KEY_ID` | (empty) | Aliyun access key ID. [Create AccessKey](https://help.aliyun.com/zh/analyticdb/analyticdb-for-postgresql/support/create-an-accesskey-pair). |
| `ANALYTICDB_KEY_SECRET` | (empty) | Aliyun access key secret. |
| `ANALYTICDB_REGION_ID` | `cn-hangzhou` | Region identifier. |
| `ANALYTICDB_INSTANCE_ID` | (empty) | Instance ID, e.g., `gp-xxxxxx`. [Create instance](https://help.aliyun.com/zh/analyticdb/analyticdb-for-postgresql/getting-started/create-an-instance-1). |
| `ANALYTICDB_ACCOUNT` | (empty) | Account name. [Create account](https://help.aliyun.com/zh/analyticdb/analyticdb-for-postgresql/getting-started/createa-a-privileged-account). |
| `ANALYTICDB_PASSWORD` | (empty) | Account password. |
| `ANALYTICDB_NAMESPACE` | `dify` | Namespace (schema). Created automatically if it doesn't exist. |
| `ANALYTICDB_NAMESPACE_PASSWORD` | (empty) | Namespace password. Used when creating a new namespace. |
| `ANALYTICDB_HOST` | (empty) | Direct connection host (alternative to API-based access). |
| `ANALYTICDB_PORT` | `5432` | Direct connection port. |
| `ANALYTICDB_MIN_CONNECTION` | `1` | Minimum pool connections. |
| `ANALYTICDB_MAX_CONNECTION` | `5` | Maximum pool connections. |
#### TiDB
| Variable | Default | Description |
| ---------------------- | ------- | -------------- |
| `TIDB_VECTOR_HOST` | `tidb` | Hostname. |
| `TIDB_VECTOR_PORT` | `4000` | Port number. |
| `TIDB_VECTOR_USER` | (empty) | Username. |
| `TIDB_VECTOR_PASSWORD` | (empty) | Password. |
| `TIDB_VECTOR_DATABASE` | `dify` | Database name. |
#### Matrixone
| Variable | Default | Description |
| -------------------- | ----------- | -------------- |
| `MATRIXONE_HOST` | `matrixone` | Hostname. |
| `MATRIXONE_PORT` | `6001` | Port number. |
| `MATRIXONE_USER` | `dump` | Username. |
| `MATRIXONE_PASSWORD` | `111` | Password. |
| `MATRIXONE_DATABASE` | `dify` | Database name. |
#### Chroma
| Variable | Default | Description |
| ------------------------- | --------------------------------------------------- | -------------------- |
| `CHROMA_HOST` | `127.0.0.1` | Chroma server host. |
| `CHROMA_PORT` | `8000` | Chroma server port. |
| `CHROMA_TENANT` | `default_tenant` | Tenant name. |
| `CHROMA_DATABASE` | `default_database` | Database name. |
| `CHROMA_AUTH_PROVIDER` | `chromadb.auth.token_authn.TokenAuthClientProvider` | Auth provider class. |
| `CHROMA_AUTH_CREDENTIALS` | (empty) | Auth credentials. |
#### Oracle
| Variable | Default | Description |
| ------------------------ | ------------------------- | ----------------------------------------- |
| `ORACLE_USER` | `dify` | Oracle username. |
| `ORACLE_PASSWORD` | `dify` | Oracle password. |
| `ORACLE_DSN` | `oracle:1521/FREEPDB1` | Data source name. |
| `ORACLE_CONFIG_DIR` | `/app/api/storage/wallet` | Oracle configuration directory. |
| `ORACLE_WALLET_LOCATION` | `/app/api/storage/wallet` | Wallet location for Autonomous DB. |
| `ORACLE_WALLET_PASSWORD` | `dify` | Wallet password. |
| `ORACLE_IS_AUTONOMOUS` | `false` | Whether using Oracle Autonomous Database. |
#### Alibaba Cloud MySQL
| Variable | Default | Description |
| ----------------------------------- | -------------- | ------------------------- |
| `ALIBABACLOUD_MYSQL_HOST` | `127.0.0.1` | Hostname. |
| `ALIBABACLOUD_MYSQL_PORT` | `3306` | Port number. |
| `ALIBABACLOUD_MYSQL_USER` | `root` | Username. |
| `ALIBABACLOUD_MYSQL_PASSWORD` | `difyai123456` | Password. |
| `ALIBABACLOUD_MYSQL_DATABASE` | `dify` | Database name. |
| `ALIBABACLOUD_MYSQL_MAX_CONNECTION` | `5` | Maximum pool connections. |
| `ALIBABACLOUD_MYSQL_HNSW_M` | `6` | HNSW M parameter. |
#### Relyt
| Variable | Default | Description |
| ---------------- | -------------- | -------------- |
| `RELYT_HOST` | `db` | Hostname. |
| `RELYT_PORT` | `5432` | Port number. |
| `RELYT_USER` | `postgres` | Username. |
| `RELYT_PASSWORD` | `difyai123456` | Password. |
| `RELYT_DATABASE` | `postgres` | Database name. |
#### OpenSearch
| Variable | Default | Description |
| ------------------------- | ---------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
| `OPENSEARCH_HOST` | `opensearch` | Hostname. |
| `OPENSEARCH_PORT` | `9200` | Port number. |
| `OPENSEARCH_SECURE` | `true` | Use HTTPS. |
| `OPENSEARCH_VERIFY_CERTS` | `true` | Verify SSL certificates. |
| `OPENSEARCH_AUTH_METHOD` | `basic` | `basic` uses username/password. `aws_managed_iam` uses AWS SigV4 request signing via Boto3 credentials (for AWS Managed OpenSearch or Serverless). |
| `OPENSEARCH_USER` | `admin` | Username. Only used with `basic` auth. |
| `OPENSEARCH_PASSWORD` | `admin` | Password. Only used with `basic` auth. |
| `OPENSEARCH_AWS_REGION` | `ap-southeast-1` | AWS region. Only used with `aws_managed_iam` auth. |
| `OPENSEARCH_AWS_SERVICE` | `aoss` | AWS service type: `es` (Managed Cluster) or `aoss` (OpenSearch Serverless). Only used with `aws_managed_iam` auth. |
#### Tencent Cloud VectorDB
| Variable | Default | Description |
| ---------------------------------------- | ------------------ | --------------------------------------------------------------------------------------------------- |
| `TENCENT_VECTOR_DB_URL` | `http://127.0.0.1` | Access address. [Console](https://console.cloud.tencent.com/vdb). |
| `TENCENT_VECTOR_DB_API_KEY` | `dify` | API key. [Key Management](https://cloud.tencent.com/document/product/1709/95108). |
| `TENCENT_VECTOR_DB_TIMEOUT` | `30` | Request timeout in seconds. |
| `TENCENT_VECTOR_DB_USERNAME` | `dify` | Account name. [Account Management](https://cloud.tencent.com/document/product/1709/115833). |
| `TENCENT_VECTOR_DB_DATABASE` | `dify` | Database name. [Create Database](https://cloud.tencent.com/document/product/1709/95822). |
| `TENCENT_VECTOR_DB_SHARD` | `1` | Number of shards. |
| `TENCENT_VECTOR_DB_REPLICAS` | `2` | Number of replicas. |
| `TENCENT_VECTOR_DB_ENABLE_HYBRID_SEARCH` | `false` | Enable hybrid search. [Sparse Vector docs](https://cloud.tencent.com/document/product/1709/110110). |
#### Elasticsearch
| Variable | Default | Description |
| -------------------------------- | --------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
| `ELASTICSEARCH_HOST` | `0.0.0.0` | Hostname. |
| `ELASTICSEARCH_PORT` | `9200` | Port number. |
| `ELASTICSEARCH_USERNAME` | `elastic` | Username. |
| `ELASTICSEARCH_PASSWORD` | `elastic` | Password. |
| `ELASTICSEARCH_USE_CLOUD` | `false` | Switch to Elastic Cloud mode. When `true`, uses `ELASTICSEARCH_CLOUD_URL` and `ELASTICSEARCH_API_KEY` instead of host/port/username/password. |
| `ELASTICSEARCH_CLOUD_URL` | (empty) | Elastic Cloud endpoint URL. Required when `ELASTICSEARCH_USE_CLOUD=true`. |
| `ELASTICSEARCH_API_KEY` | (empty) | Elastic Cloud API key. Required when `ELASTICSEARCH_USE_CLOUD=true`. |
| `ELASTICSEARCH_VERIFY_CERTS` | `false` | Verify SSL certificates. |
| `ELASTICSEARCH_CA_CERTS` | (empty) | Path to CA certificates. |
| `ELASTICSEARCH_REQUEST_TIMEOUT` | `100000` | Request timeout in milliseconds. |
| `ELASTICSEARCH_RETRY_ON_TIMEOUT` | `true` | Retry on timeout. |
| `ELASTICSEARCH_MAX_RETRIES` | `10` | Maximum retry attempts. |
#### Baidu VectorDB
| Variable | Default | Description |
| -------------------------------------------- | ----------------------- | ----------------------------------- |
| `BAIDU_VECTOR_DB_ENDPOINT` | `http://127.0.0.1:5287` | Endpoint URL. |
| `BAIDU_VECTOR_DB_CONNECTION_TIMEOUT_MS` | `30000` | Connection timeout in milliseconds. |
| `BAIDU_VECTOR_DB_ACCOUNT` | `root` | Account name. |
| `BAIDU_VECTOR_DB_API_KEY` | `dify` | API key. |
| `BAIDU_VECTOR_DB_DATABASE` | `dify` | Database name. |
| `BAIDU_VECTOR_DB_SHARD` | `1` | Number of shards. |
| `BAIDU_VECTOR_DB_REPLICAS` | `3` | Number of replicas. |
| `BAIDU_VECTOR_DB_INVERTED_INDEX_ANALYZER` | `DEFAULT_ANALYZER` | Inverted index analyzer. |
| `BAIDU_VECTOR_DB_INVERTED_INDEX_PARSER_MODE` | `COARSE_MODE` | Inverted index parser mode. |
#### VikingDB
| Variable | Default | Description |
| ----------------------------- | ----------------------------- | ----------------------------------------------------- |
| `VIKINGDB_ACCESS_KEY` | (empty) | Access key. |
| `VIKINGDB_SECRET_KEY` | (empty) | Secret key. |
| `VIKINGDB_REGION` | `cn-shanghai` | Region. |
| `VIKINGDB_HOST` | `api-vikingdb.xxx.volces.com` | API host. Replace with your region-specific endpoint. |
| `VIKINGDB_SCHEMA` | `http` | Protocol scheme (`http` or `https`). |
| `VIKINGDB_CONNECTION_TIMEOUT` | `30` | Connection timeout in seconds. |
| `VIKINGDB_SOCKET_TIMEOUT` | `30` | Socket timeout in seconds. |
#### Lindorm
| Variable | Default | Description |
| ----------------------- | ------------------------ | -------------------------------------------------------------------------- |
| `LINDORM_URL` | `http://localhost:30070` | Lindorm search engine URL. [Console](https://lindorm.console.aliyun.com/). |
| `LINDORM_USERNAME` | `admin` | Username. |
| `LINDORM_PASSWORD` | `admin` | Password. |
| `LINDORM_USING_UGC` | `true` | Use UGC mode. |
| `LINDORM_QUERY_TIMEOUT` | `1` | Query timeout in seconds. |
#### OpenGauss
| Variable | Default | Description |
| -------------------------- | ----------- | ------------------------- |
| `OPENGAUSS_HOST` | `opengauss` | Hostname. |
| `OPENGAUSS_PORT` | `6600` | Port number. |
| `OPENGAUSS_USER` | `postgres` | Username. |
| `OPENGAUSS_PASSWORD` | `Dify@123` | Password. |
| `OPENGAUSS_DATABASE` | `dify` | Database name. |
| `OPENGAUSS_MIN_CONNECTION` | `1` | Minimum pool connections. |
| `OPENGAUSS_MAX_CONNECTION` | `5` | Maximum pool connections. |
| `OPENGAUSS_ENABLE_PQ` | `false` | Enable PQ acceleration. |
#### Huawei Cloud
| Variable | Default | Description |
| ----------------------- | ------------------------ | --------------------- |
| `HUAWEI_CLOUD_HOSTS` | `https://127.0.0.1:9200` | Cluster endpoint URL. |
| `HUAWEI_CLOUD_USER` | `admin` | Username. |
| `HUAWEI_CLOUD_PASSWORD` | `admin` | Password. |
#### Upstash Vector
| Variable | Default | Description |
| ---------------------- | ------- | ---------------------------- |
| `UPSTASH_VECTOR_URL` | (empty) | Upstash Vector endpoint URL. |
| `UPSTASH_VECTOR_TOKEN` | (empty) | Upstash Vector API token. |
#### Tablestore
| Variable | Default | Description |
| ------------------------------------------ | ---------------------------------------------------- | ------------------------------------------------------------- |
| `TABLESTORE_ENDPOINT` | `https://instance-name.cn-hangzhou.ots.aliyuncs.com` | Endpoint address. Replace `instance-name` with your instance. |
| `TABLESTORE_INSTANCE_NAME` | (empty) | Instance name. |
| `TABLESTORE_ACCESS_KEY_ID` | (empty) | Access key ID. |
| `TABLESTORE_ACCESS_KEY_SECRET` | (empty) | Access key secret. |
| `TABLESTORE_NORMALIZE_FULLTEXT_BM25_SCORE` | `false` | Normalize fulltext BM25 scores. |
#### Clickzetta
| Variable | Default | Description |
| ------------------------------------- | -------------------- | -------------------------- |
| `CLICKZETTA_USERNAME` | (empty) | Username. |
| `CLICKZETTA_PASSWORD` | (empty) | Password. |
| `CLICKZETTA_INSTANCE` | (empty) | Instance name. |
| `CLICKZETTA_SERVICE` | `api.clickzetta.com` | Service endpoint. |
| `CLICKZETTA_WORKSPACE` | `quick_start` | Workspace name. |
| `CLICKZETTA_VCLUSTER` | `default_ap` | Virtual cluster. |
| `CLICKZETTA_SCHEMA` | `dify` | Schema name. |
| `CLICKZETTA_BATCH_SIZE` | `100` | Batch size for operations. |
| `CLICKZETTA_ENABLE_INVERTED_INDEX` | `true` | Enable inverted index. |
| `CLICKZETTA_ANALYZER_TYPE` | `chinese` | Analyzer type. |
| `CLICKZETTA_ANALYZER_MODE` | `smart` | Analyzer mode. |
| `CLICKZETTA_VECTOR_DISTANCE_FUNCTION` | `cosine_distance` | Distance function. |
#### IRIS
| Variable | Default | Description |
| -------------------------- | ----------- | ---------------------------------------------------- |
| `IRIS_HOST` | `iris` | Hostname. |
| `IRIS_SUPER_SERVER_PORT` | `1972` | Super server port. |
| `IRIS_USER` | `_SYSTEM` | Username. |
| `IRIS_PASSWORD` | `Dify@1234` | Password. |
| `IRIS_DATABASE` | `USER` | Database name. |
| `IRIS_SCHEMA` | `dify` | Schema name. |
| `IRIS_CONNECTION_URL` | (empty) | Full connection URL (overrides individual settings). |
| `IRIS_MIN_CONNECTION` | `1` | Minimum pool connections. |
| `IRIS_MAX_CONNECTION` | `3` | Maximum pool connections. |
| `IRIS_TEXT_INDEX` | `true` | Enable text indexing. |
| `IRIS_TEXT_INDEX_LANGUAGE` | `en` | Text index language. |
### Knowledge Configuration
| Variable | Default | Description |
| ----------------------------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `UPLOAD_FILE_SIZE_LIMIT` | `15` | Maximum file size in MB for document uploads (PDFs, Word docs, etc.). Users see a "file too large" error when exceeded. Does not apply to images, videos, or audio—they have separate limits below. |
| `UPLOAD_FILE_BATCH_LIMIT` | `5` | Maximum number of files the frontend allows per upload batch. |
| `UPLOAD_FILE_EXTENSION_BLACKLIST` | (empty) | Security blocklist of file extensions that cannot be uploaded. Comma-separated, lowercase, no dots. Example: `exe,bat,cmd,com,scr,vbs,ps1,msi,dll`. Empty allows all types. |
| `SINGLE_CHUNK_ATTACHMENT_LIMIT` | `10` | Maximum number of images that can be embedded in a single knowledge base segment (chunk). |
| `IMAGE_FILE_BATCH_LIMIT` | `10` | Maximum number of image files per upload batch. |
| `ATTACHMENT_IMAGE_FILE_SIZE_LIMIT` | `2` | Maximum size in MB for images fetched from external URLs during knowledge base indexing. Images larger than this are skipped. Different from `UPLOAD_IMAGE_FILE_SIZE_LIMIT` which applies to direct uploads. |
| `ATTACHMENT_IMAGE_DOWNLOAD_TIMEOUT` | `60` | Timeout in seconds when downloading images from external URLs during knowledge base indexing. Slow or unresponsive image servers are abandoned after this timeout. |
| `ETL_TYPE` | `dify` | Document extraction library. `dify` supports txt, md, pdf, html, xlsx, docx, csv. `Unstructured` adds support for doc, msg, eml, ppt, pptx, xml, epub (requires `UNSTRUCTURED_API_URL`). |
| `UNSTRUCTURED_API_URL` | (empty) | Unstructured.io API endpoint. Required when `ETL_TYPE` is `Unstructured`. Also needed for `.ppt` file support. Example: `http://unstructured:8000/general/v0/general`. |
| `UNSTRUCTURED_API_KEY` | (empty) | API key for Unstructured.io authentication. |
| `SCARF_NO_ANALYTICS` | `true` | Disable Unstructured library's telemetry/analytics collection. |
| `TOP_K_MAX_VALUE` | `10` | Maximum value users can set for the `top_k` parameter in knowledge base retrieval (how many results to return per search). |
| `DATASET_MAX_SEGMENTS_PER_REQUEST` | `0` | Maximum number of segments per dataset API request. `0` means unlimited. |
#### Annotation Import
| Variable | Default | Description |
| ----------------------------------------- | ------- | ------------------------------------------------------------------------------------------------------- |
| `ANNOTATION_IMPORT_FILE_SIZE_LIMIT` | `2` | Maximum CSV file size in MB for annotation import. Returns HTTP 413 when exceeded. |
| `ANNOTATION_IMPORT_MAX_RECORDS` | `10000` | Maximum number of records per annotation import. Files with more records must be split into batches. |
| `ANNOTATION_IMPORT_MIN_RECORDS` | `1` | Minimum number of valid records required per annotation import. |
| `ANNOTATION_IMPORT_RATE_LIMIT_PER_MINUTE` | `5` | Maximum annotation import requests per minute per workspace. Returns HTTP 429 when exceeded. |
| `ANNOTATION_IMPORT_RATE_LIMIT_PER_HOUR` | `20` | Maximum annotation import requests per hour per workspace. |
| `ANNOTATION_IMPORT_MAX_CONCURRENT` | `5` | Maximum concurrent annotation import tasks per workspace. Stale tasks are auto-cleaned after 2 minutes. |
### Model Configuration
| Variable | Default | Description |
| ------------------------------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |
| `PROMPT_GENERATION_MAX_TOKENS` | `512` | Maximum tokens when the system auto-generates a prompt using an LLM. Prevents runaway generations that waste API quota. |
| `CODE_GENERATION_MAX_TOKENS` | `1024` | Maximum tokens when the system auto-generates code using an LLM. |
| `PLUGIN_BASED_TOKEN_COUNTING_ENABLED` | `false` | Use plugin-based token counting for accurate usage tracking. When disabled, token counting returns 0 (faster but cost tracking is less accurate). |
### Multi-modal Configuration
| Variable | Default | Description |
| ------------------------------ | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `MULTIMODAL_SEND_FORMAT` | `base64` | How files are sent to multi-modal LLMs. `base64` embeds file data in the request (more compatible, works offline, larger payloads). `url` sends a signed URL for the model to fetch (faster, smaller requests, but the model must be able to reach `FILES_URL`). |
| `UPLOAD_IMAGE_FILE_SIZE_LIMIT` | `10` | Maximum image file size in MB for direct uploads (jpg, png, webp, gif, svg). |
| `UPLOAD_VIDEO_FILE_SIZE_LIMIT` | `100` | Maximum video file size in MB for direct uploads (mp4, mov, mpeg, webm). |
| `UPLOAD_AUDIO_FILE_SIZE_LIMIT` | `50` | Maximum audio file size in MB for direct uploads (mp3, m4a, wav, amr, mpga). |
All upload size limits are also gated by `NGINX_CLIENT_MAX_BODY_SIZE` (default `100M`). If you increase any upload limit above 100 MB, also increase `NGINX_CLIENT_MAX_BODY_SIZE` to match—otherwise Nginx rejects the upload with a 413 error.
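For example, to accept 200 MB video uploads, both limits must move together — a minimal `.env` sketch (values illustrative):

```shell
# .env — raise the video upload ceiling to 200 MB
UPLOAD_VIDEO_FILE_SIZE_LIMIT=200

# Nginx must accept request bodies at least that large,
# otherwise uploads are rejected with HTTP 413 before reaching the API
NGINX_CLIENT_MAX_BODY_SIZE=200M
```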
### Sentry Configuration
Sentry provides error tracking and performance monitoring. Each service has its own DSN to separate error reporting.
| Variable | Default | Description |
| --------------------------------- | ------- | -------------------------------------------------------------------------------------------------------------------------- |
| `SENTRY_DSN` | (empty) | Sentry DSN shared across services. |
| `API_SENTRY_DSN` | (empty) | Sentry DSN for the API service. Overrides `SENTRY_DSN` if set. Empty disables Sentry for the backend. |
| `API_SENTRY_TRACES_SAMPLE_RATE` | `1.0` | Fraction of requests to include in performance tracing (0.01 = 1%, 1.0 = 100%). Traces track request flow across services. |
| `API_SENTRY_PROFILES_SAMPLE_RATE` | `1.0` | Fraction of requests to include in CPU/memory profiling (0.01 = 1%). Profiles show where time is spent in code. |
| `WEB_SENTRY_DSN` | (empty) | Sentry DSN for the web frontend (Next.js). Frontend-only. |
| `PLUGIN_SENTRY_ENABLED` | `false` | Enable Sentry for the plugin daemon service. |
| `PLUGIN_SENTRY_DSN` | (empty) | Sentry DSN for the plugin daemon. |
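A typical production setup lowers the sample rates well below the default `1.0` to reduce overhead. A sketch (DSN values are placeholders):

```shell
# .env — illustrative Sentry setup; replace DSNs with your project's values
API_SENTRY_DSN=https://examplekey@o0.ingest.sentry.io/0
WEB_SENTRY_DSN=https://examplekey@o0.ingest.sentry.io/1

# Sample 10% of requests for tracing and profiling
API_SENTRY_TRACES_SAMPLE_RATE=0.1
API_SENTRY_PROFILES_SAMPLE_RATE=0.1
```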
### Notion Integration Configuration
Connect Dify to Notion as a knowledge base data source. Get integration credentials at [https://www.notion.so/my-integrations](https://www.notion.so/my-integrations).
| Variable | Default | Description |
| ------------------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `NOTION_INTEGRATION_TYPE` | `public` | `public` uses standard OAuth 2.0 (requires HTTPS redirect URL, needs CLIENT\_ID + CLIENT\_SECRET). `internal` uses a direct integration token (works with HTTP). Use `internal` for local deployments. |
| `NOTION_CLIENT_SECRET` | (empty) | OAuth client secret. Required for `public` integration. |
| `NOTION_CLIENT_ID` | (empty) | OAuth client ID. Required for `public` integration. |
| `NOTION_INTERNAL_SECRET` | (empty) | Direct integration token from Notion. Required for `internal` integration. |
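For a local HTTP deployment, the `internal` mode needs only the integration token — a sketch (token value is a placeholder):

```shell
# .env — internal Notion integration for local (HTTP) deployments
NOTION_INTEGRATION_TYPE=internal
NOTION_INTERNAL_SECRET=your-integration-token   # from notion.so/my-integrations
```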
### Mail Configuration
Dify sends emails for account invitations, password resets, login codes, and Human Input node notifications. Configure one of the three supported providers. Email links require `CONSOLE_WEB_URL` to be set—see [Common Variables](#console_web_url).
| Variable | Default | Description |
| ------------------------ | -------- | --------------------------------------------------------- |
| `MAIL_TYPE` | `resend` | Mail provider: `resend`, `smtp`, or `sendgrid`. |
| `MAIL_DEFAULT_SEND_FROM` | (empty) | Default "From" address for all outgoing emails. Required. |
#### Resend
| Variable | Default | Description |

| ---------------- | ------------------------ | -------------------------------------------------------------- |
| `RESEND_API_URL` | `https://api.resend.com` | Resend API endpoint. Override for self-hosted Resend or proxy. |
| `RESEND_API_KEY` | (empty) | Resend API key. Required when `MAIL_TYPE=resend`. |
#### SMTP
SMTP supports three TLS modes: implicit TLS (`SMTP_USE_TLS=true`, `SMTP_OPPORTUNISTIC_TLS=false`, port 465), STARTTLS (`SMTP_USE_TLS=true`, `SMTP_OPPORTUNISTIC_TLS=true`, port 587), and plain (`SMTP_USE_TLS=false`, port 25).
| Variable | Default | Description |
| ------------------------ | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `SMTP_SERVER` | (empty) | SMTP server address. |
| `SMTP_PORT` | `465` | SMTP server port. Use `587` for STARTTLS mode. |
| `SMTP_USERNAME` | (empty) | SMTP username. Can be empty for IP-whitelisted servers. |
| `SMTP_PASSWORD` | (empty) | SMTP password. Can be empty for IP-whitelisted servers. |
| `SMTP_USE_TLS` | `true` | Enable TLS. When `true` with `SMTP_OPPORTUNISTIC_TLS=false`, uses implicit TLS (`SMTP_SSL`). |
| `SMTP_OPPORTUNISTIC_TLS` | `false` | Use STARTTLS (explicit TLS) instead of implicit TLS. Must be used with `SMTP_USE_TLS=true`. |
| `SMTP_LOCAL_HOSTNAME` | (empty) | Override the hostname sent in SMTP HELO/EHLO. Required in Docker when your SMTP server rejects container hostnames (common with Google Workspace, Microsoft 365). Set to your domain, e.g., `mail.yourdomain.com`. |
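For instance, the STARTTLS mode on port 587 maps onto a `.env` like this (hostnames and credentials are placeholders):

```shell
# .env — SMTP with STARTTLS (explicit TLS) on port 587
MAIL_TYPE=smtp
MAIL_DEFAULT_SEND_FROM=no-reply@example.com
SMTP_SERVER=smtp.example.com
SMTP_PORT=587
SMTP_USERNAME=no-reply@example.com
SMTP_PASSWORD=your-app-password
SMTP_USE_TLS=true
SMTP_OPPORTUNISTIC_TLS=true
```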
#### SendGrid
| Variable | Default | Description |
| ------------------ | ------- | ----------------------------------------------------- |
| `SENDGRID_API_KEY` | (empty) | SendGrid API key. Required when `MAIL_TYPE=sendgrid`. |
For more details, see the [SendGrid documentation](https://www.twilio.com/docs/sendgrid/for-developers/sending-email/api-getting-started).
### Others Configuration
#### Indexing
| Variable | Default | Description |
| ----------------------------------------- | ------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `INDEXING_MAX_SEGMENTATION_TOKENS_LENGTH` | `4000` | Maximum token length per text segment when chunking documents for the knowledge base. Larger values retain more context per chunk; smaller values provide finer granularity. |
#### Token & Invitation
All token expiry variables control how long a one-time-use token stored in Redis remains valid. After expiry, the user must request a new token.
| Variable | Default | Description |
| ------------------------------------- | ------- | ------------------------------------------------------------ |
| `INVITE_EXPIRY_HOURS` | `72` | How long a workspace invitation link stays valid (in hours). |
| `RESET_PASSWORD_TOKEN_EXPIRY_MINUTES` | `5` | Password reset token validity in minutes. |
| `EMAIL_REGISTER_TOKEN_EXPIRY_MINUTES` | `5` | Email registration token validity in minutes. |
| `CHANGE_EMAIL_TOKEN_EXPIRY_MINUTES` | `5` | Change email token validity in minutes. |
| `OWNER_TRANSFER_TOKEN_EXPIRY_MINUTES` | `5` | Workspace owner transfer token validity in minutes. |
#### Code Execution Sandbox
The sandbox is a separate service that runs Python, JavaScript, and Jinja2 code nodes in isolation.
| Variable | Default | Description |
| ----------------------------------------------- | ---------------------- | ------------------------------------------------------------------------------------------------------- |
| `CODE_EXECUTION_ENDPOINT` | `http://sandbox:8194` | Sandbox service endpoint. |
| `CODE_EXECUTION_API_KEY` | `dify-sandbox` | API key for sandbox authentication. Must match `SANDBOX_API_KEY` in the sandbox service. |
| `CODE_EXECUTION_SSL_VERIFY` | `true` | Verify SSL for sandbox connections. Disable for development with self-signed certificates. |
| `CODE_EXECUTION_CONNECT_TIMEOUT` | `10` | Connection timeout in seconds. |
| `CODE_EXECUTION_READ_TIMEOUT` | `60` | Read timeout in seconds. |
| `CODE_EXECUTION_WRITE_TIMEOUT` | `10` | Write timeout in seconds. |
| `CODE_EXECUTION_POOL_MAX_CONNECTIONS` | `100` | Maximum concurrent HTTP connections to the sandbox service. |
| `CODE_EXECUTION_POOL_MAX_KEEPALIVE_CONNECTIONS` | `20` | Maximum idle connections kept alive in the sandbox connection pool. |
| `CODE_EXECUTION_POOL_KEEPALIVE_EXPIRY` | `5.0` | Seconds before idle sandbox connections are closed. |
| `CODE_MAX_NUMBER` | `9223372036854775807` | Maximum numeric value allowed in code node output (max 64-bit signed integer). |
| `CODE_MIN_NUMBER` | `-9223372036854775808` | Minimum numeric value allowed in code node output (min 64-bit signed integer). |
| `CODE_MAX_STRING_LENGTH` | `400000` | Maximum string length in code node output. Prevents memory exhaustion from unbounded string generation. |
| `CODE_MAX_DEPTH` | `5` | Maximum nesting depth for output data structures. |
| `CODE_MAX_PRECISION` | `20` | Maximum decimal places for floating-point numbers in output. |
| `CODE_MAX_STRING_ARRAY_LENGTH` | `30` | Maximum number of elements in a string array output. |
| `CODE_MAX_OBJECT_ARRAY_LENGTH` | `30` | Maximum number of elements in an object array output. |
| `CODE_MAX_NUMBER_ARRAY_LENGTH` | `1000` | Maximum number of elements in a number array output. |
| `TEMPLATE_TRANSFORM_MAX_LENGTH` | `400000` | Maximum character length for Template Transform node output. |
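When rotating the sandbox key, both sides of the connection must change together — a sketch (key value is illustrative):

```shell
# .env — API and sandbox must present the same shared key
CODE_EXECUTION_API_KEY=my-shared-sandbox-key   # API service side
SANDBOX_API_KEY=my-shared-sandbox-key          # sandbox service side
```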
#### Workflow Runtime
| Variable | Default | Description |
| --------------------------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `WORKFLOW_MAX_EXECUTION_STEPS` | `500` | Maximum number of node executions per workflow run. Exceeding this terminates the workflow. |
| `WORKFLOW_MAX_EXECUTION_TIME` | `1200` | Maximum wall-clock time in seconds per workflow run. Exceeding this terminates the workflow. |
| `WORKFLOW_CALL_MAX_DEPTH` | `5` | Maximum depth for nested workflow-calls-workflow. Prevents infinite recursion. |
| `MAX_VARIABLE_SIZE` | `204800` | Maximum size in bytes (200 KB) for a single workflow variable. |
| `WORKFLOW_FILE_UPLOAD_LIMIT` | `10` | Maximum number of files that can be uploaded in a single workflow execution. |
| `WORKFLOW_NODE_EXECUTION_STORAGE` | `rdbms` | Where workflow node execution records are stored. `rdbms` stores everything in the database. `hybrid` stores new data in object storage and reads from both. |
| `DSL_EXPORT_ENCRYPT_DATASET_ID` | `true` | Encrypt dataset IDs when exporting DSL files. Set to `false` to export plain IDs for easier cross-environment import. |
#### Workflow Storage Repository
These variables select which backend implementation handles workflow execution data. The default `SQLAlchemy` repositories store everything in the database. Alternative implementations (e.g., Celery, Logstore) can be used for different storage strategies.
| Variable | Default | Description |
| ----------------------------------------- | ----------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------- |
| `CORE_WORKFLOW_EXECUTION_REPOSITORY` | `core.repositories.sqlalchemy_workflow_execution_repository.SQLAlchemyWorkflowExecutionRepository` | Repository implementation for workflow execution records. |
| `CORE_WORKFLOW_NODE_EXECUTION_REPOSITORY` | `core.repositories.sqlalchemy_workflow_node_execution_repository.SQLAlchemyWorkflowNodeExecutionRepository` | Repository implementation for workflow node execution records. |
| `API_WORKFLOW_RUN_REPOSITORY` | `repositories.sqlalchemy_api_workflow_run_repository.DifyAPISQLAlchemyWorkflowRunRepository` | Service-layer repository for workflow run API operations. |
| `API_WORKFLOW_NODE_EXECUTION_REPOSITORY` | `repositories.sqlalchemy_api_workflow_node_execution_repository.DifyAPISQLAlchemyWorkflowNodeExecutionRepository` | Service-layer repository for workflow node execution API operations. |
| `LOOP_NODE_MAX_COUNT` | `100` | Maximum iterations for Loop nodes. Prevents infinite loops. |
| `MAX_PARALLEL_LIMIT` | `10` | Maximum number of parallel branches in a workflow. |
#### GraphEngine Worker Pool
| Variable | Default | Description |
| ----------------------------------- | ------- | ------------------------------------------------------- |
| `GRAPH_ENGINE_MIN_WORKERS` | `1` | Minimum workers per GraphEngine instance. |
| `GRAPH_ENGINE_MAX_WORKERS` | `10` | Maximum workers per GraphEngine instance. |
| `GRAPH_ENGINE_SCALE_UP_THRESHOLD` | `3` | Queue depth that triggers spawning additional workers. |
| `GRAPH_ENGINE_SCALE_DOWN_IDLE_TIME` | `5.0` | Seconds of idle time before excess workers are removed. |
#### Workflow Log Cleanup
| Variable | Default | Description |
| -------------------------------------------- | ------- | ---------------------------------------------------------------------------------------------------- |
| `WORKFLOW_LOG_CLEANUP_ENABLED` | `false` | Enable automatic cleanup of workflow execution logs at 2:00 AM daily. |
| `WORKFLOW_LOG_RETENTION_DAYS` | `30` | Number of days to retain workflow logs before cleanup. |
| `WORKFLOW_LOG_CLEANUP_BATCH_SIZE` | `100` | Number of log entries processed per cleanup batch. Adjust based on system performance. |
| `WORKFLOW_LOG_CLEANUP_SPECIFIC_WORKFLOW_IDS` | (empty) | Comma-separated list of workflow IDs to limit cleanup to. When empty, all workflow logs are cleaned. |
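Putting these together, a `.env` that trims logs aggressively might look like this (retention period illustrative):

```shell
# .env — keep 14 days of workflow logs, cleaned nightly at 2:00 AM
WORKFLOW_LOG_CLEANUP_ENABLED=true
WORKFLOW_LOG_RETENTION_DAYS=14
WORKFLOW_LOG_CLEANUP_BATCH_SIZE=100
```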
#### HTTP Request Node
These configure the HTTP Request node used in workflows to call external APIs.
| Variable | Default | Description |
| ----------------------------------- | ---------- | ---------------------------------------------------------------------------------------------------------------- |
| `HTTP_REQUEST_NODE_MAX_TEXT_SIZE` | `1048576` | Maximum text response size in bytes (1 MB). Responses larger than this are truncated. |
| `HTTP_REQUEST_NODE_MAX_BINARY_SIZE` | `10485760` | Maximum binary response size in bytes (10 MB). |
| `HTTP_REQUEST_NODE_SSL_VERIFY` | `true` | Verify SSL certificates. Disable for testing with self-signed certificates. |
| `HTTP_REQUEST_MAX_CONNECT_TIMEOUT` | `10` | Maximum connect timeout users can set in the workflow editor (in seconds). Per-node timeouts cannot exceed this. |
| `HTTP_REQUEST_MAX_READ_TIMEOUT` | `600` | Maximum read timeout ceiling (in seconds). |
| `HTTP_REQUEST_MAX_WRITE_TIMEOUT` | `600` | Maximum write timeout ceiling (in seconds). |
#### Webhook
| Variable | Default | Description |
| ------------------------------- | ---------- | --------------------------------------------------------------------------------------------- |
| `WEBHOOK_REQUEST_BODY_MAX_SIZE` | `10485760` | Maximum webhook payload size in bytes (10 MB). Larger payloads are rejected with a 413 error. |
#### SSRF Protection
All outbound HTTP requests from Dify (HTTP nodes, image downloads, etc.) are routed through a proxy that blocks requests to internal/private IP ranges, preventing Server-Side Request Forgery (SSRF) attacks.
| Variable | Default | Description |
| ------------------------------------- | ------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| `SSRF_PROXY_HTTP_URL` | `http://ssrf_proxy:3128` | SSRF proxy URL for HTTP requests. |
| `SSRF_PROXY_HTTPS_URL` | `http://ssrf_proxy:3128` | SSRF proxy URL for HTTPS requests. |
| `SSRF_POOL_MAX_CONNECTIONS` | `100` | Maximum concurrent connections in the SSRF HTTP client pool. |
| `SSRF_POOL_MAX_KEEPALIVE_CONNECTIONS` | `20` | Maximum idle connections kept alive in the SSRF pool. |
| `SSRF_POOL_KEEPALIVE_EXPIRY` | `5.0` | Seconds before idle SSRF connections are closed. |
| `RESPECT_XFORWARD_HEADERS_ENABLED` | `false` | Trust X-Forwarded-For/Proto/Port headers from reverse proxies. Only enable behind a single trusted reverse proxy—otherwise allows IP spoofing. |
#### Agent Configuration
| Variable | Default | Description |
| -------------------- | ------- | -------------------------------------------------------------------------------- |
| `MAX_TOOLS_NUM` | `10` | Maximum number of tools an agent can use simultaneously. |
| `MAX_ITERATIONS_NUM` | `99` | Maximum reasoning iterations per agent execution. Prevents infinite agent loops. |
## Web Frontend Service
These variables are used by the Next.js web frontend container only—they do not affect the Python backend.
| Variable | Default | Description |
| ---------------------------- | ------- | -------------------------------------------------------------------------------------------------------------------- |
| `TEXT_GENERATION_TIMEOUT_MS` | `60000` | Frontend timeout for streaming text generation UI. If a stream stalls for longer than this, the UI pauses rendering. |
| `ALLOW_UNSAFE_DATA_SCHEME` | `false` | Allow rendering URLs with the `data:` scheme. Disabled by default for security. |
| `MAX_TREE_DEPTH` | `50` | Maximum node tree depth in the workflow editor UI. |
## Database Service
These configure the database containers directly in Docker Compose.
| Variable | Default | Description |
| ------------------- | --------------------------------- | ----------------------------------------------- |
| `PGDATA` | `/var/lib/postgresql/data/pgdata` | PostgreSQL data directory inside the container. |
| `MYSQL_HOST_VOLUME` | `./volumes/mysql/data` | Host path mounted as MySQL data volume. |
## Sandbox Service
The sandbox is an isolated service for executing code nodes (Python, JavaScript, Jinja2). Network access can be disabled for security.
| Variable | Default | Description |
| ------------------------ | ------------------------ | ---------------------------------------------------------------------------------------------------------- |
| `SANDBOX_API_KEY` | `dify-sandbox` | API key for sandbox authentication. Must match `CODE_EXECUTION_API_KEY` in the API service. |
| `SANDBOX_GIN_MODE` | `release` | Sandbox service mode: `release` or `debug`. |
| `SANDBOX_WORKER_TIMEOUT` | `15` | Maximum execution time in seconds for a single code run. |
| `SANDBOX_ENABLE_NETWORK` | `true` | Allow code to make outbound HTTP requests. Disable to prevent code nodes from accessing external services. |
| `SANDBOX_HTTP_PROXY` | `http://ssrf_proxy:3128` | HTTP proxy for SSRF protection when network is enabled. |
| `SANDBOX_HTTPS_PROXY` | `http://ssrf_proxy:3128` | HTTPS proxy for SSRF protection. |
| `SANDBOX_PORT` | `8194` | Sandbox service port. |
## Nginx Reverse Proxy
| Variable | Default | Description |
| -------------------------------- | ----------------- | ------------------------------------------------------------------------------------------------------------------------- |
| `NGINX_SERVER_NAME` | `_` | Nginx server name. `_` matches any hostname. |
| `NGINX_HTTPS_ENABLED` | `false` | Enable HTTPS. When `true`, place your SSL certificate and key in `./nginx/ssl/`. |
| `NGINX_PORT` | `80` | HTTP port. |
| `NGINX_SSL_PORT` | `443` | HTTPS port (only used when `NGINX_HTTPS_ENABLED=true`). |
| `NGINX_SSL_CERT_FILENAME` | `dify.crt` | SSL certificate filename in `./nginx/ssl/`. |
| `NGINX_SSL_CERT_KEY_FILENAME` | `dify.key` | SSL private key filename in `./nginx/ssl/`. |
| `NGINX_SSL_PROTOCOLS` | `TLSv1.2 TLSv1.3` | Allowed TLS protocol versions. |
| `NGINX_WORKER_PROCESSES` | `auto` | Number of Nginx worker processes. `auto` matches CPU core count. |
| `NGINX_CLIENT_MAX_BODY_SIZE` | `100M` | Maximum request body size. Affects file upload limits at the proxy level. |
| `NGINX_KEEPALIVE_TIMEOUT` | `65` | Keepalive timeout in seconds. |
| `NGINX_PROXY_READ_TIMEOUT` | `3600s` | Proxy read timeout. Set high (1 hour) to support long-running SSE streams. |
| `NGINX_PROXY_SEND_TIMEOUT` | `3600s` | Proxy send timeout. |
| `NGINX_ENABLE_CERTBOT_CHALLENGE` | `false` | Accept Let's Encrypt ACME challenge requests at `/.well-known/acme-challenge/`. Enable for automated certificate renewal. |
After enabling HTTPS, also update the URL variables in [Common Variables](#common-variables) (e.g., `CONSOLE_API_URL`, `CONSOLE_WEB_URL`) to use `https://`.
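A minimal HTTPS `.env` sketch, assuming the certificate pair already sits in `./nginx/ssl/` (domain is a placeholder):

```shell
# .env — serve HTTPS at the proxy
NGINX_HTTPS_ENABLED=true
NGINX_SSL_CERT_FILENAME=dify.crt      # placed in ./nginx/ssl/
NGINX_SSL_CERT_KEY_FILENAME=dify.key

# URL variables must switch to https:// as well
CONSOLE_API_URL=https://dify.example.com
CONSOLE_WEB_URL=https://dify.example.com
APP_API_URL=https://dify.example.com
APP_WEB_URL=https://dify.example.com
```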
#### Certbot Configuration
| Variable | Default | Description |
| ----------------- | ------- | ---------------------------------------------------------------------- |
| `CERTBOT_EMAIL` | (empty) | Email address required by Let's Encrypt for certificate notifications. |
| `CERTBOT_DOMAIN` | (empty) | Domain name for the SSL certificate. |
| `CERTBOT_OPTIONS` | (empty) | Additional certbot CLI options (e.g., `--force-renewal`, `--dry-run`). |
## SSRF Proxy
These configure the Squid-based SSRF proxy container that blocks requests to internal/private networks.
| Variable | Default | Description |
| ------------------------------- | ------------------ | -------------------------------------------------------- |
| `SSRF_HTTP_PORT` | `3128` | Proxy listening port. |
| `SSRF_COREDUMP_DIR` | `/var/spool/squid` | Core dump directory. |
| `SSRF_REVERSE_PROXY_PORT` | `8194` | Reverse proxy port forwarded to the sandbox service. |
| `SSRF_SANDBOX_HOST` | `sandbox` | Hostname of the sandbox service. |
| `SSRF_DEFAULT_TIME_OUT` | `5` | Default overall timeout in seconds for proxied requests. |
| `SSRF_DEFAULT_CONNECT_TIME_OUT` | `5` | Default connection timeout in seconds. |
| `SSRF_DEFAULT_READ_TIME_OUT` | `5` | Default read timeout in seconds. |
| `SSRF_DEFAULT_WRITE_TIME_OUT` | `5` | Default write timeout in seconds. |
## Docker Compose
| Variable | Default | Description |
| ----------------------- | -------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `COMPOSE_PROFILES` | `${VECTOR_STORE:-weaviate},${DB_TYPE:-postgresql}` | Automatically selects which service containers to start based on your database and vector store choices. For example, setting `DB_TYPE=mysql` starts MySQL instead of PostgreSQL. |
| `EXPOSE_NGINX_PORT` | `80` | Host port mapped to Nginx HTTP. |
| `EXPOSE_NGINX_SSL_PORT` | `443` | Host port mapped to Nginx HTTPS. |
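In practice you rarely set `COMPOSE_PROFILES` directly — changing `VECTOR_STORE` or `DB_TYPE` is enough, since the default expression interpolates them. A sketch:

```shell
# .env — switch the vector store; COMPOSE_PROFILES picks this up automatically
VECTOR_STORE=milvus

# then recreate the stack so the matching containers start:
#   docker compose down && docker compose up -d
```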
## ModelProvider & Tool Position Configuration
Customize which tools and model providers are available in the app interface and their display order. Use comma-separated values with no spaces between items.
| Variable | Default | Description |
| ---------------------------- | ------- | --------------------------------------------------------------------- |
| `POSITION_TOOL_PINS` | (empty) | Pin specific tools to the top of the list. Example: `bing,google`. |
| `POSITION_TOOL_INCLUDES` | (empty) | Only show listed tools. If unset, all tools are available. |
| `POSITION_TOOL_EXCLUDES` | (empty) | Hide specific tools (pinned tools are not affected). |
| `POSITION_PROVIDER_PINS` | (empty) | Pin specific model providers to the top. Example: `openai,anthropic`. |
| `POSITION_PROVIDER_INCLUDES` | (empty) | Only show listed providers. If unset, all providers are available. |
| `POSITION_PROVIDER_EXCLUDES` | (empty) | Hide specific providers (pinned providers are not affected). |
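For example, to pin two providers and hide one tool (names illustrative — use the identifiers shown in your installation):

```shell
# .env — comma-separated, no spaces between items
POSITION_PROVIDER_PINS=openai,anthropic
POSITION_TOOL_EXCLUDES=bing
```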
## Plugin Daemon Configuration
The plugin daemon is a separate service that manages plugin lifecycle (installation, execution, upgrades). The API communicates with it via HTTP.
| Variable | Default | Description |
| ------------------------------- | ----------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
| `PLUGIN_DAEMON_URL` | `http://plugin_daemon:5002` | Plugin daemon service URL. |
| `PLUGIN_DAEMON_KEY` | (auto-generated) | Authentication key for the plugin daemon. |
| `PLUGIN_DAEMON_PORT` | `5002` | Plugin daemon listening port. |
| `PLUGIN_DAEMON_TIMEOUT` | `600.0` | Timeout in seconds for all plugin daemon requests (installation, execution, listing). |
| `PLUGIN_MAX_PACKAGE_SIZE` | `52428800` | Maximum plugin package size in bytes (50 MB). Validated during marketplace downloads. |
| `PLUGIN_MODEL_SCHEMA_CACHE_TTL` | `3600` | How long to cache plugin model schemas in seconds. Reduces repeated lookups. |
| `PLUGIN_DIFY_INNER_API_KEY` | (auto-generated) | API key the plugin daemon uses to call back to the Dify API. Must match `DIFY_INNER_API_KEY` in the plugin daemon service config. |
| `PLUGIN_DIFY_INNER_API_URL` | `http://api:5001` | Internal API URL the plugin daemon calls back to. |
| `PLUGIN_DEBUGGING_HOST` | `0.0.0.0` | Host for plugin remote debugging connections. |
| `PLUGIN_DEBUGGING_PORT` | `5003` | Port for plugin remote debugging connections. |
| `MARKETPLACE_ENABLED` | `true` | Enable the plugin marketplace. When disabled, only locally installed plugins are available—browsing and auto-upgrades are unavailable. |
| `MARKETPLACE_API_URL` | `https://marketplace.dify.ai` | Marketplace API endpoint for plugin browsing, downloading, and upgrade checking. |
| `FORCE_VERIFYING_SIGNATURE` | `true` | Require valid signatures before installing plugins. Prevents installing tampered or unsigned packages. |
| `PLUGIN_MAX_EXECUTION_TIMEOUT` | `600` | Plugin execution timeout in seconds (plugin daemon side). Should match `PLUGIN_DAEMON_TIMEOUT` on the API side. |
| `PIP_MIRROR_URL` | (empty) | Custom PyPI mirror URL used by the plugin daemon when installing plugin dependencies. Useful for faster installs or air-gapped environments. |
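As a concrete example, an air-gapped self-hosted deployment might disable the marketplace and point dependency installs at an internal mirror. A minimal `.env` sketch (the mirror URL is a hypothetical placeholder, and the timeout values are illustrative):

```
# Illustrative plugin daemon settings for an air-gapped deployment
MARKETPLACE_ENABLED=false
FORCE_VERIFYING_SIGNATURE=true
PLUGIN_DAEMON_TIMEOUT=900.0
# Keep the daemon-side timeout in step with the API-side timeout above
PLUGIN_MAX_EXECUTION_TIMEOUT=900
# Hypothetical internal PyPI mirror
PIP_MIRROR_URL=https://pypi.internal.example.com/simple
```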
## OTLP / OpenTelemetry Configuration
OpenTelemetry provides distributed tracing and metrics collection. When enabled, Dify instruments Flask and exports telemetry data to an OTLP collector.
| Variable | Default | Description |
| ---------------------------------- | ----------------------- | ---------------------------------------------------------------------------------------------------------------- |
| `ENABLE_OTEL` | `false` | Master switch for OpenTelemetry instrumentation. |
| `OTLP_TRACE_ENDPOINT` | (empty) | Dedicated trace endpoint URL. If unset, falls back to `{OTLP_BASE_ENDPOINT}/v1/traces`. |
| `OTLP_METRIC_ENDPOINT` | (empty) | Dedicated metric endpoint URL. If unset, falls back to `{OTLP_BASE_ENDPOINT}/v1/metrics`. |
| `OTLP_BASE_ENDPOINT` | `http://localhost:4318` | Base OTLP collector URL. Used as fallback when specific trace/metric endpoints are not set. |
| `OTLP_API_KEY` | (empty) | API key for OTLP authentication. Sent as `Authorization: Bearer` header. |
| `OTEL_EXPORTER_TYPE` | `otlp` | Exporter type. `otlp` exports to a collector; other values use a console exporter (for debugging). |
| `OTEL_EXPORTER_OTLP_PROTOCOL` | (empty) | Protocol for OTLP export. `grpc` uses gRPC exporters; anything else uses HTTP. |
| `OTEL_SAMPLING_RATE` | `0.1` | Fraction of requests to trace (0.1 = 10%). Lower values reduce overhead in high-traffic production environments. |
| `OTEL_BATCH_EXPORT_SCHEDULE_DELAY` | `5000` | Delay in milliseconds between batch exports. |
| `OTEL_MAX_QUEUE_SIZE` | `2048` | Maximum number of spans queued before dropping. |
| `OTEL_MAX_EXPORT_BATCH_SIZE` | `512` | Maximum spans per export batch. |
| `OTEL_METRIC_EXPORT_INTERVAL` | `60000` | Metric export interval in milliseconds. |
| `OTEL_BATCH_EXPORT_TIMEOUT` | `10000` | Batch span export timeout in milliseconds. |
| `OTEL_METRIC_EXPORT_TIMEOUT` | `30000` | Metric export timeout in milliseconds. |
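Putting these together, the following `.env` sketch enables tracing to an OTLP collector with a reduced sampling rate; the collector address and API key are placeholders, not defaults:

```
ENABLE_OTEL=true
# Placeholder collector address (HTTP OTLP listens on 4318 by convention)
OTLP_BASE_ENDPOINT=http://otel-collector:4318
# Sent as an Authorization: Bearer header
OTLP_API_KEY=your-collector-key
OTEL_EXPORTER_TYPE=otlp
# Trace 5% of requests to limit overhead
OTEL_SAMPLING_RATE=0.05
```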
## Miscellaneous
| Variable | Default | Description |
| ---------------------------------- | ------------------ | -------------------------------------------------------------------------------------------------------------------------------------------- |
| `CSP_WHITELIST` | (empty) | Additional domains to allow in Content Security Policy headers. |
| `ALLOW_EMBED` | `false` | Allow Dify pages to be embedded in iframes. When `false`, sets `X-Frame-Options: DENY` to prevent clickjacking. |
| `SWAGGER_UI_ENABLED` | `false` | Expose Swagger UI at `SWAGGER_UI_PATH` for browsing API documentation. Swagger endpoints bypass authentication. |
| `SWAGGER_UI_PATH` | `/swagger-ui.html` | URL path for Swagger UI. |
| `MAX_SUBMIT_COUNT` | `100` | Maximum concurrent task submissions in the thread pool used for parallel workflow node execution. |
| `TENANT_ISOLATED_TASK_CONCURRENCY` | `1` | Number of document indexing or RAG pipeline tasks processed simultaneously per tenant. Increase for faster indexing with more database load. |
### Scheduled Tasks Configuration
Dify uses Celery Beat to run background maintenance tasks on configurable schedules.
| Variable | Default | Description |
| ------------------------------------------- | ------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
| `ENABLE_CLEAN_EMBEDDING_CACHE_TASK` | `false` | Delete expired embedding cache records from the database at 2:00 AM daily. Manages database size. |
| `ENABLE_CLEAN_UNUSED_DATASETS_TASK` | `false` | Disable documents in knowledge bases that haven't had activity within the retention period. Runs at 3:00 AM daily. |
| `ENABLE_CLEAN_MESSAGES` | `false` | Delete conversation messages older than the retention period at 4:00 AM daily. |
| `ENABLE_MAIL_CLEAN_DOCUMENT_NOTIFY_TASK` | `false` | Email workspace owners a list of knowledge bases that had documents auto-disabled by the cleanup task. Runs every Monday at 10:00 AM. |
| `ENABLE_DATASETS_QUEUE_MONITOR` | `false` | Monitor the dataset processing queue backlog in Redis. Sends email alerts when the queue exceeds the threshold. |
| `QUEUE_MONITOR_INTERVAL` | `30` | How often to check the queue (in minutes). |
| `QUEUE_MONITOR_THRESHOLD` | `200` | Queue size that triggers an alert email. |
| `QUEUE_MONITOR_ALERT_EMAILS` | (empty) | Email addresses to receive queue alerts (comma-separated). |
| `ENABLE_CHECK_UPGRADABLE_PLUGIN_TASK` | `true` | Check the marketplace for newer plugin versions every 15 minutes. Dispatches upgrade tasks based on each tenant's auto-upgrade schedule. |
| `ENABLE_WORKFLOW_SCHEDULE_POLLER_TASK` | `true` | Enable the workflow schedule poller that checks for and triggers scheduled workflow runs. |
| `WORKFLOW_SCHEDULE_POLLER_INTERVAL` | `1` | How often to check for due scheduled workflows (in minutes). |
| `WORKFLOW_SCHEDULE_POLLER_BATCH_SIZE` | `100` | Maximum number of due schedules fetched per poll cycle. |
| `WORKFLOW_SCHEDULE_MAX_DISPATCH_PER_TICK` | `0` | Circuit breaker: maximum schedules dispatched per tick. `0` means unlimited. |
| `ENABLE_WORKFLOW_RUN_CLEANUP_TASK` | `false` | Enable automatic cleanup of workflow run records. |
| `ENABLE_CREATE_TIDB_SERVERLESS_TASK` | `false` | Pre-create TiDB Serverless clusters for vector database pooling. |
| `ENABLE_UPDATE_TIDB_SERVERLESS_STATUS_TASK` | `false` | Update TiDB Serverless cluster status periodically. |
| `ENABLE_HUMAN_INPUT_TIMEOUT_TASK` | `true` | Check for expired Human Input forms and resume or stop timed-out workflows. |
| `HUMAN_INPUT_TIMEOUT_TASK_INTERVAL` | `1` | How often to check for expired Human Input forms (in minutes). |
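For instance, a deployment that wants nightly cleanup plus queue alerting might enable the tasks like this (the threshold and email addresses are illustrative placeholders):

```
# Illustrative scheduled-task settings
ENABLE_CLEAN_EMBEDDING_CACHE_TASK=true
ENABLE_CLEAN_MESSAGES=true
ENABLE_DATASETS_QUEUE_MONITOR=true
QUEUE_MONITOR_THRESHOLD=500
# Placeholder alert recipients, comma-separated
QUEUE_MONITOR_ALERT_EMAILS=ops@example.com,admin@example.com
```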
#### Record Retention & Cleanup
These control how old records are cleaned up. When `BILLING_ENABLED` is active, cleanup targets sandbox-tier tenants with a grace period. When billing is disabled (self-hosted), cleanup applies to all records within the retention window.
| Variable | Default | Description |
| -------------------------------------------------- | ------- | ----------------------------------------------------------------------------------------------------- |
| `SANDBOX_EXPIRED_RECORDS_RETENTION_DAYS` | `30` | Records older than this many days are eligible for deletion. |
| `SANDBOX_EXPIRED_RECORDS_CLEAN_GRACEFUL_PERIOD` | `21` | Grace period in days after subscription expiration before records are deleted (billing-enabled only). |
| `SANDBOX_EXPIRED_RECORDS_CLEAN_BATCH_SIZE` | `1000` | Number of records processed per cleanup batch. |
| `SANDBOX_EXPIRED_RECORDS_CLEAN_BATCH_MAX_INTERVAL` | `200` | Maximum random delay in milliseconds between cleanup batches to reduce database load. |
| `SANDBOX_EXPIRED_RECORDS_CLEAN_TASK_LOCK_TTL` | `90000` | Redis lock TTL in seconds (\~25 hours) to prevent concurrent cleanup task execution. |
## Aliyun SLS Logstore Configuration
Optional integration with Aliyun Simple Log Service for storing workflow execution logs externally instead of in the database. Enable by setting the repository configuration variables to use logstore implementations.
| Variable | Default | Description |
| --------------------------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------ |
| `ALIYUN_SLS_ACCESS_KEY_ID` | (empty) | Aliyun access key ID for SLS authentication. |
| `ALIYUN_SLS_ACCESS_KEY_SECRET` | (empty) | Aliyun access key secret for SLS authentication. |
| `ALIYUN_SLS_ENDPOINT` | (empty) | SLS service endpoint URL (e.g., `cn-hangzhou.log.aliyuncs.com`). |
| `ALIYUN_SLS_REGION` | (empty) | Aliyun region (e.g., `cn-hangzhou`). |
| `ALIYUN_SLS_PROJECT_NAME` | (empty) | SLS project name for storing workflow logs. |
| `ALIYUN_SLS_LOGSTORE_TTL` | `365` | Data retention in days for SLS logstores. Use `3650` for permanent storage. |
| `LOGSTORE_DUAL_WRITE_ENABLED` | `false` | Write workflow data to both SLS and PostgreSQL simultaneously. Useful during migration to SLS. |
| `LOGSTORE_DUAL_READ_ENABLED` | `true` | Fall back to PostgreSQL when SLS returns no results. Useful during migration when historical data exists only in the database. |
| `LOGSTORE_ENABLE_PUT_GRAPH_FIELD` | `true` | Include the full workflow graph definition in SLS logs. Set to `false` to reduce storage by omitting large graph data. |
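During a migration to SLS, one approach is to dual-write new workflow logs while keeping reads falling back to PostgreSQL for historical data. A sketch with placeholder credentials and a hypothetical project name:

```
# Placeholder Aliyun credentials
ALIYUN_SLS_ACCESS_KEY_ID=your-access-key-id
ALIYUN_SLS_ACCESS_KEY_SECRET=your-access-key-secret
ALIYUN_SLS_ENDPOINT=cn-hangzhou.log.aliyuncs.com
ALIYUN_SLS_REGION=cn-hangzhou
# Hypothetical project name
ALIYUN_SLS_PROJECT_NAME=dify-workflow-logs
# Write to both stores, read from PostgreSQL when SLS has no results
LOGSTORE_DUAL_WRITE_ENABLED=true
LOGSTORE_DUAL_READ_ENABLED=true
```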
## Event Bus Configuration
Redis-based event transport between API and Celery workers.
| Variable | Default | Description |
| ------------------------------ | -------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
| `EVENT_BUS_REDIS_URL` | (empty) | Redis connection URL for event streaming. When empty, uses the main Redis connection settings. |
| `EVENT_BUS_REDIS_CHANNEL_TYPE` | `pubsub` | Transport type: `pubsub` (Pub/Sub, at-most-once delivery), `sharded` (sharded Pub/Sub), or `streams` (Redis Streams, at-least-once delivery). |
| `EVENT_BUS_REDIS_USE_CLUSTERS` | `false` | Enable Redis Cluster mode for event bus. Recommended for large deployments. |
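For example, to move event traffic onto a dedicated Redis instance with at-least-once delivery, the configuration might look like this (the connection URL is a placeholder):

```
# Placeholder connection URL for a dedicated Redis instance
EVENT_BUS_REDIS_URL=redis://:password@redis-events:6379/0
# Redis Streams gives at-least-once delivery
EVENT_BUS_REDIS_CHANNEL_TYPE=streams
```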
## Vector Database Service Configuration
These configure the vector database containers themselves (not the Dify client connection). Only the variables for your chosen `VECTOR_STORE` are relevant.
| Variable | Default | Description |
| -------------------------------------------------- | ------------------- | --------------------------------------------------------------------- |
| `WEAVIATE_PERSISTENCE_DATA_PATH` | `/var/lib/weaviate` | Data persistence directory inside the container. |
| `WEAVIATE_QUERY_DEFAULTS_LIMIT` | `25` | Default query result limit. |
| `WEAVIATE_AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED` | `true` | Allow anonymous access. |
| `WEAVIATE_DEFAULT_VECTORIZER_MODULE` | `none` | Default vectorizer module. |
| `WEAVIATE_CLUSTER_HOSTNAME` | `node1` | Cluster node hostname. |
| `WEAVIATE_AUTHENTICATION_APIKEY_ENABLED` | `true` | Enable API key authentication. |
| `WEAVIATE_AUTHENTICATION_APIKEY_ALLOWED_KEYS` | (auto-generated) | Allowed API keys. Must match `WEAVIATE_API_KEY` in the client config. |
| `WEAVIATE_AUTHENTICATION_APIKEY_USERS` | `hello@dify.ai` | Users associated with API keys. |
| `WEAVIATE_AUTHORIZATION_ADMINLIST_ENABLED` | `true` | Enable admin list authorization. |
| `WEAVIATE_AUTHORIZATION_ADMINLIST_USERS` | `hello@dify.ai` | Admin users. |
| `WEAVIATE_DISABLE_TELEMETRY` | `false` | Disable Weaviate telemetry. |
| `WEAVIATE_ENABLE_TOKENIZER_GSE` | `false` | Enable GSE tokenizer (Chinese). |
| `WEAVIATE_ENABLE_TOKENIZER_KAGOME_JA` | `false` | Enable Kagome tokenizer (Japanese). |
| `WEAVIATE_ENABLE_TOKENIZER_KAGOME_KR` | `false` | Enable Kagome tokenizer (Korean). |
| Variable | Default | Description |
| -------------------------------- | ------------ | ------------------------------------------------- |
| `ETCD_AUTO_COMPACTION_MODE` | `revision` | ETCD auto compaction mode. |
| `ETCD_AUTO_COMPACTION_RETENTION` | `1000` | Auto compaction retention in number of revisions. |
| `ETCD_QUOTA_BACKEND_BYTES` | `4294967296` | Backend quota in bytes (4 GB). |
| `ETCD_SNAPSHOT_COUNT` | `50000` | Number of changes before triggering a snapshot. |
| `ETCD_ENDPOINTS` | `etcd:2379` | ETCD service endpoints. |
| `MINIO_ACCESS_KEY` | `minioadmin` | MinIO access key. |
| `MINIO_SECRET_KEY` | `minioadmin` | MinIO secret key. |
| `MINIO_ADDRESS` | `minio:9000` | MinIO service address. |
| `MILVUS_AUTHORIZATION_ENABLED` | `true` | Enable Milvus security authorization. |
| Variable | Default | Description |
| ----------------------------------- | ----------------- | -------------------------------------------------- |
| `OPENSEARCH_DISCOVERY_TYPE` | `single-node` | Discovery type for cluster formation. |
| `OPENSEARCH_BOOTSTRAP_MEMORY_LOCK` | `true` | Lock memory on startup to prevent swapping. |
| `OPENSEARCH_JAVA_OPTS_MIN` | `512m` | Minimum JVM heap size. |
| `OPENSEARCH_JAVA_OPTS_MAX` | `1024m` | Maximum JVM heap size. |
| `OPENSEARCH_INITIAL_ADMIN_PASSWORD` | `Qazwsxedc!@#123` | Initial admin password for the OpenSearch service. |
| `OPENSEARCH_MEMLOCK_SOFT` | `-1` | Soft memory lock limit (`-1` = unlimited). |
| `OPENSEARCH_MEMLOCK_HARD` | `-1` | Hard memory lock limit (`-1` = unlimited). |
| `OPENSEARCH_NOFILE_SOFT` | `65536` | Soft file descriptor limit. |
| `OPENSEARCH_NOFILE_HARD` | `65536` | Hard file descriptor limit. |
| Variable | Default | Description |
| ---------------------------- | --------------------------------- | ----------------------------------------------- |
| `PGVECTOR_PGUSER` | `postgres` | PostgreSQL user for the PGVector container. |
| `PGVECTOR_POSTGRES_PASSWORD` | (auto-generated) | PostgreSQL password for the PGVector container. |
| `PGVECTOR_POSTGRES_DB` | `dify` | Database name in the PGVector container. |
| `PGVECTOR_PGDATA` | `/var/lib/postgresql/data/pgdata` | Data directory inside the container. |
| `PGVECTOR_PG_BIGM_VERSION` | `1.2-20240606` | Version of the pg\_bigm extension. |
| Variable | Default | Description |
| --------------------------------- | ------------------------------------------------------------- | ----------------------------------------------------------- |
| `ORACLE_PWD` | `Dify123456` | Oracle database password for the container. |
| `ORACLE_CHARACTERSET` | `AL32UTF8` | Oracle character set. |
| `CHROMA_SERVER_AUTHN_CREDENTIALS` | (auto-generated) | Authentication credentials for the Chroma server container. |
| `CHROMA_SERVER_AUTHN_PROVIDER` | `chromadb.auth.token_authn.TokenAuthenticationServerProvider` | Authentication provider for the Chroma server. |
| `CHROMA_IS_PERSISTENT` | `TRUE` | Enable persistent storage for Chroma. |
| `KIBANA_PORT` | `5601` | Kibana port (Elasticsearch UI). |
| Variable | Default | Description |
| ---------------------- | ------------- | --------------------------------------- |
| `IRIS_WEB_SERVER_PORT` | `52773` | IRIS web server management port. |
| `IRIS_TIMEZONE` | `UTC` | Timezone for the IRIS container. |
| `DB_PLUGIN_DATABASE` | `dify_plugin` | Separate database name for plugin data. |
## Plugin Daemon Storage Configuration
The plugin daemon can store plugin packages in different storage backends. Configure only the provider matching `PLUGIN_STORAGE_TYPE`.
| Variable | Default | Description |
| -------------------------------------- | ------------------------------ | ------------------------------------------------------------------------------------------------------- |
| `PLUGIN_STORAGE_TYPE` | `local` | Plugin storage backend: `local`, `aws_s3`, `tencent_cos`, `azure_blob`, `aliyun_oss`, `volcengine_tos`. |
| `PLUGIN_STORAGE_LOCAL_ROOT` | `/app/storage` | Root directory for local plugin storage. |
| `PLUGIN_WORKING_PATH` | `/app/storage/cwd` | Working directory for plugin execution. |
| `PLUGIN_INSTALLED_PATH` | `plugin` | Subdirectory for installed plugins. |
| `PLUGIN_PACKAGE_CACHE_PATH` | `plugin_packages` | Subdirectory for cached plugin packages. |
| `PLUGIN_MEDIA_CACHE_PATH` | `assets` | Subdirectory for cached media assets. |
| `PLUGIN_STORAGE_OSS_BUCKET` | (empty) | Object storage bucket name (shared across S3/COS/OSS/TOS providers). |
| `PLUGIN_PPROF_ENABLED` | `false` | Enable Go pprof profiling for the plugin daemon. |
| `PLUGIN_PYTHON_ENV_INIT_TIMEOUT` | `120` | Timeout in seconds for initializing Python environments for plugins. |
| `PLUGIN_STDIO_BUFFER_SIZE` | `1024` | Buffer size in bytes for plugin stdio communication. |
| `PLUGIN_STDIO_MAX_BUFFER_SIZE` | `5242880` | Maximum buffer size in bytes (5 MB) for plugin stdio communication. |
| `ENFORCE_LANGGENIUS_PLUGIN_SIGNATURES` | `true` | Enforce signature verification for LangGenius official plugins. |
| `ENDPOINT_URL_TEMPLATE` | `http://localhost/e/{hook_id}` | URL template for plugin endpoints. `{hook_id}` is replaced with the actual hook ID. |
| `EXPOSE_PLUGIN_DAEMON_PORT` | `5002` | Host port mapped to the plugin daemon. |
| `EXPOSE_PLUGIN_DEBUGGING_HOST` | `localhost` | Host for plugin remote debugging. |
| `EXPOSE_PLUGIN_DEBUGGING_PORT` | `5003` | Host port for plugin remote debugging. |
| Variable | Default | Description |
| ------------------------------- | ------- | ---------------------------------------------- |
| `PLUGIN_S3_USE_AWS` | `false` | Use AWS S3 (vs S3-compatible services). |
| `PLUGIN_S3_USE_AWS_MANAGED_IAM` | `false` | Use IAM roles instead of explicit credentials. |
| `PLUGIN_S3_ENDPOINT` | (empty) | S3 endpoint URL. |
| `PLUGIN_S3_USE_PATH_STYLE` | `false` | Use path-style URLs instead of virtual-hosted. |
| `PLUGIN_AWS_ACCESS_KEY` | (empty) | AWS access key. |
| `PLUGIN_AWS_SECRET_KEY` | (empty) | AWS secret key. |
| `PLUGIN_AWS_REGION` | (empty) | AWS region. |
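To store plugin packages in an S3-compatible service such as a self-hosted MinIO, a configuration along these lines could be used; the bucket name, endpoint, and credentials are illustrative placeholders:

```
PLUGIN_STORAGE_TYPE=aws_s3
# Hypothetical bucket name
PLUGIN_STORAGE_OSS_BUCKET=dify-plugins
# S3-compatible service rather than AWS itself
PLUGIN_S3_USE_AWS=false
# Placeholder endpoint; path-style URLs are common for self-hosted services
PLUGIN_S3_ENDPOINT=http://minio:9000
PLUGIN_S3_USE_PATH_STYLE=true
# Placeholder credentials
PLUGIN_AWS_ACCESS_KEY=your-access-key
PLUGIN_AWS_SECRET_KEY=your-secret-key
```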
| Variable | Default | Description |
| --------------------------------------------- | ------- | ----------------------------- |
| `PLUGIN_AZURE_BLOB_STORAGE_CONTAINER_NAME` | (empty) | Azure Blob container name. |
| `PLUGIN_AZURE_BLOB_STORAGE_CONNECTION_STRING` | (empty) | Azure Blob connection string. |
| Variable | Default | Description |
| ------------------------------- | ------- | ----------------------- |
| `PLUGIN_TENCENT_COS_SECRET_KEY` | (empty) | Tencent COS secret key. |
| `PLUGIN_TENCENT_COS_SECRET_ID` | (empty) | Tencent COS secret ID. |
| `PLUGIN_TENCENT_COS_REGION` | (empty) | Tencent COS region. |
| Variable | Default | Description |
| ------------------------------------- | ------- | ---------------------------------- |
| `PLUGIN_ALIYUN_OSS_REGION` | (empty) | Aliyun OSS region. |
| `PLUGIN_ALIYUN_OSS_ENDPOINT` | (empty) | Aliyun OSS endpoint. |
| `PLUGIN_ALIYUN_OSS_ACCESS_KEY_ID` | (empty) | Aliyun OSS access key ID. |
| `PLUGIN_ALIYUN_OSS_ACCESS_KEY_SECRET` | (empty) | Aliyun OSS access key secret. |
| `PLUGIN_ALIYUN_OSS_AUTH_VERSION` | `v4` | Aliyun OSS authentication version. |
| `PLUGIN_ALIYUN_OSS_PATH` | (empty) | Aliyun OSS path prefix. |
| Variable | Default | Description |
| ---------------------------------- | ------- | -------------------------- |
| `PLUGIN_VOLCENGINE_TOS_ENDPOINT` | (empty) | Volcengine TOS endpoint. |
| `PLUGIN_VOLCENGINE_TOS_ACCESS_KEY` | (empty) | Volcengine TOS access key. |
| `PLUGIN_VOLCENGINE_TOS_SECRET_KEY` | (empty) | Volcengine TOS secret key. |
| `PLUGIN_VOLCENGINE_TOS_REGION` | (empty) | Volcengine TOS region. |
# Deploy with aaPanel
Source: https://docs.dify.ai/en/self-host/platform-guides/bt-panel
## Prerequisites
> Before installing Dify, make sure your machine meets the following minimum system requirements:
>
> * CPU >= 2 Core
> * RAM >= 4 GiB
| Operating System | Software                | Explanation                                                                             |
| :--------------- | :---------------------- | :-------------------------------------------------------------------------------------- |
| Linux platforms  | aaPanel 7.0.11 or later | Refer to the aaPanel installation guide for more information on how to install aaPanel. |
## Deployment
1. Log in to aaPanel and click `Docker` in the menu bar.
2. The first time, you will be prompted to install the `Docker` and `Docker Compose` services. Click **Install Now**; if they are already installed, skip this step.
3. Once installation is complete, find `Dify` under `One-Click Install` and click `install`.
4. Configure basic information such as the domain name and port to complete the installation:
   * Name: application name, default `Dify-characters`
   * Version selection: default `latest`
   * Domain name: if you want to access Dify directly through a domain name, configure it here and resolve the domain name to the server
   * Allow external access: check this if you need direct access through `IP+Port`; if you have set up a domain name, leave it unchecked
   * Port: default `8088`, can be changed as needed
> \[!IMPORTANT]
>
> The domain name is optional. If a domain name is filled in, the application can be managed through \[Website] > \[Proxy Project], and you do not need to check \[Allow external access]; otherwise, you must check it to access the application through the port.
5. After submission, the panel will automatically initialize the application, which takes about 1-3 minutes.
6. Once initialization completes, access Dify in the browser through the domain name or IP+port set in the previous steps.
### Access Dify
Access the administrator initialization page to set up the admin account:
```bash theme={null}
# If you have set domain
http://yourdomain/install
# If you choose to access through `IP+Port`
http://your_server_ip:8088/install
```
Dify web interface address:
```bash theme={null}
# If you have set domain
http://yourdomain/
# If you choose to access through `IP+Port`
http://your_server_ip:8088/
```
# Dify Premium on AWS
Source: https://docs.dify.ai/en/self-host/platform-guides/dify-premium
Dify Premium is our AWS AMI offering that allows custom branding and is one-click deployable to your AWS VPC as an EC2 instance. Head to [AWS Marketplace](https://aws.amazon.com/marketplace/pp/prodview-t22mebxzwjhu6) to subscribe. It's useful in a few scenarios:
* You're looking to create one or a few applications as a small/medium business and you care about data residency.
* You are interested in [Dify Cloud](https://cloud.dify.ai), but your use case requires more resources than supported by the [plans](https://dify.ai/pricing).
* You'd like to run a POC before adopting Dify Enterprise within your organization.
## Access & Set up
After the AMI is deployed, access Dify via the instance's public IP found in the EC2 console (HTTP port 80 is used by default).
If this is your first time accessing Dify, enter the Admin initialization password (your EC2's instance ID) to start the setup process.
## Customize
### Configuration
Just like a self-hosted deployment, you may modify the environment variables in the `.env` file in your EC2 instance as you see fit. Then, restart Dify with:
```bash theme={null}
docker-compose down
docker-compose -f docker-compose.yaml -f docker-compose.override.yaml up -d
```
### Web App Logo & Branding
In **Settings** > **Customization**, you can remove the `Powered by Dify` branding or replace it with your own logo.
## Upgrade
Before upgrading, check the [Release Notes](https://github.com/langgenius/dify/releases) on GitHub for version-specific upgrade instructions. Some versions may require additional steps such as database migrations or configuration changes.
In the EC2 instance, run the following commands:
```bash theme={null}
cd /dify
docker-compose down
```
Back up your `.env` file and the `volumes` directory, which contains your database, storage, and other persistent data:
```bash theme={null}
cp /dify/.env /dify/.env.bak
tar -czvf volumes-$(date +%s).tgz volumes
```
The upgrade process will overwrite configuration files but will not affect your `.env` file or runtime data (such as databases and uploaded files) in the `volumes/` directory.
If you have manually modified any configuration files beyond `.env`, back them up before upgrading.
Pull the latest code and sync the configuration files:
```bash theme={null}
git clone https://github.com/langgenius/dify.git /tmp/dify
rsync -av /tmp/dify/docker/ /dify/
rm -rf /tmp/dify
```
New versions may introduce new environment variables in `.env.example`. Compare it with your current `.env` and add any missing variables:
```bash theme={null}
diff /dify/.env /dify/.env.example
```
```bash theme={null}
docker-compose pull
docker-compose -f docker-compose.yaml -f docker-compose.override.yaml up -d
```
# Deploy Dify with Docker Compose
Source: https://docs.dify.ai/en/self-host/quick-start/docker-compose
For common deployment questions, see [FAQs](/en/self-host/quick-start/faqs).
## Before Deployment
Make sure your machine meets the following minimum system requirements.
### Hardware
* CPU >= 2 Core
* RAM >= 4 GiB
### Software
| Operating System | Required Software | Notes |
| :------------------------- | :-------------------------------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| macOS 10.14 or later | Docker Desktop | Configure the Docker virtual machine with at least 2 virtual CPUs and 8 GiB of memory. For installation instructions, see [Install Docker Desktop on Mac](https://docs.docker.com/desktop/mac/install/). |
| Linux distributions | Docker 19.03+, Docker Compose 1.28+ | For installation instructions, see [Install Docker Engine](https://docs.docker.com/engine/install/) and [Install Docker Compose](https://docs.docker.com/compose/install/). |
| Windows with WSL 2 enabled | Docker Desktop | Store source code and data bound to Linux containers in the Linux file system rather than Windows. For installation instructions, see [Install Docker Desktop on Windows](https://docs.docker.com/desktop/windows/install/#wsl-2-backend). |
## Deploy and Start Dify
Clone the Dify source code to your local machine.
```bash theme={null}
git clone --branch "$(curl -s https://api.github.com/repos/langgenius/dify/releases/latest | jq -r .tag_name)" https://github.com/langgenius/dify.git
```
1. Navigate to the `docker` directory in the Dify source code:
```bash theme={null}
cd dify/docker
```
2. Copy the example environment configuration file:
```bash theme={null}
cp .env.example .env
```
When the frontend and backend run on different subdomains, set `COOKIE_DOMAIN` to the site's top-level domain (e.g., `example.com`) and set `NEXT_PUBLIC_COOKIE_DOMAIN` to `1` in the `.env` file.
The frontend and backend must be under the same top-level domain to share authentication cookies.
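For instance, with the console served from `console.example.com` and the API from `api.example.com` (illustrative subdomains), the relevant `.env` entries would be:

```
# Shared top-level domain so authentication cookies work across subdomains
COOKIE_DOMAIN=example.com
NEXT_PUBLIC_COOKIE_DOMAIN=1
```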
3. Start the containers using the command that matches your Docker Compose version:
```bash Docker Compose V2 theme={null}
docker compose up -d
```
```bash Docker Compose V1 theme={null}
docker-compose up -d
```
Run `docker compose version` to check your Docker Compose version.
The following containers will be started:
* 5 core services: `api`, `worker`, `worker_beat`, `web`, `plugin_daemon`
* 6 dependent components: `weaviate`, `db_postgres`, `redis`, `nginx`, `ssrf_proxy`, `sandbox`
You should see output similar to the following, showing the status and start time of each container:
```bash theme={null}
[+] Running 13/13
✔ Network docker_ssrf_proxy_network Created 10.0s
✔ Network docker_default Created 0.1s
✔ Container docker-sandbox-1 Started 0.3s
✔ Container docker-db_postgres-1 Healthy 2.8s
✔ Container docker-web-1 Started 0.3s
✔ Container docker-redis-1 Started 0.3s
✔ Container docker-ssrf_proxy-1 Started 0.4s
✔ Container docker-weaviate-1 Started 0.3s
✔ Container docker-worker_beat-1 Started 3.2s
✔ Container docker-api-1 Started 3.2s
✔ Container docker-worker-1 Started 3.2s
✔ Container docker-plugin_daemon-1 Started 3.2s
✔ Container docker-nginx-1 Started 3.4s
```
4. Verify that all containers are running successfully:
```bash theme={null}
docker compose ps
```
You should see output similar to the following, with each container in the `Up` or `healthy` status:
```bash theme={null}
NAME IMAGE COMMAND SERVICE CREATED STATUS PORTS
docker-api-1 langgenius/dify-api:1.10.1 "/bin/bash /entrypoi…" api 26 seconds ago Up 22 seconds 5001/tcp
docker-db_postgres-1 postgres:15-alpine "docker-entrypoint.s…" db_postgres 26 seconds ago Up 25 seconds (healthy) 5432/tcp
docker-nginx-1 nginx:latest "sh -c 'cp /docker-e…" nginx 26 seconds ago Up 22 seconds 0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp
docker-plugin_daemon-1 langgenius/dify-plugin-daemon:0.4.1-local "/bin/bash -c /app/e…" plugin_daemon 26 seconds ago Up 22 seconds 0.0.0.0:5003->5003/tcp, :::5003->5003/tcp
docker-redis-1 redis:6-alpine "docker-entrypoint.s…" redis 26 seconds ago Up 25 seconds (health: starting) 6379/tcp
docker-sandbox-1 langgenius/dify-sandbox:0.2.12 "/main" sandbox 26 seconds ago Up 25 seconds (health: starting)
docker-ssrf_proxy-1 ubuntu/squid:latest "sh -c 'cp /docker-e…" ssrf_proxy 26 seconds ago Up 25 seconds 3128/tcp
docker-weaviate-1 semitechnologies/weaviate:1.27.0 "/bin/weaviate --hos…" weaviate 26 seconds ago Up 25 seconds
docker-web-1 langgenius/dify-web:1.10.1 "/bin/sh ./entrypoin…" web 26 seconds ago Up 25 seconds 3000/tcp
docker-worker-1 langgenius/dify-api:1.10.1 "/bin/bash /entrypoi…" worker 26 seconds ago Up 22 seconds 5001/tcp
docker-worker_beat-1 langgenius/dify-api:1.10.1 "/bin/bash /entrypoi…" worker_beat 26 seconds ago Up 22 seconds 5001/tcp
```
## Access Dify
1. Open the administrator initialization page to set up the admin account:
```bash theme={null}
# Local environment
http://localhost/install
# Server environment
http://your_server_ip/install
```
2. After completing the admin account setup, log in to Dify at:
```bash theme={null}
# Local environment
http://localhost
# Server environment
http://your_server_ip
```
## Customize Dify
Modify the environment variable values in your local `.env` file, then restart Dify to apply the changes:
```
docker compose down
docker compose up -d
```
For more information, see [environment variables](/en/self-host/configuration/environments).
## Upgrade Dify
Upgrade steps may vary between releases. Refer to the upgrade guide for your target version provided in the [Releases](https://github.com/langgenius/dify/releases) page.
After upgrading, check whether the `.env.example` file has changed and update your local `.env` file accordingly.
# FAQs
Source: https://docs.dify.ai/en/self-host/quick-start/faqs
## Deployment Methods
### Install older version
Use the `--branch` flag to install a specific version:
```bash theme={null}
git clone https://github.com/langgenius/dify.git --branch 0.15.3
```
The rest of the setup is identical to installing the latest version.
### Install using ZIP archive
For network-restricted environments or when git is unavailable:
```bash theme={null}
# Download the latest release
wget -O dify.zip "$(curl -s https://api.github.com/repos/langgenius/dify/releases/latest | jq -r '.zipball_url')"
# The archive extracts to a single inner directory named langgenius-dify-<commit>
unzip dify.zip -d dify-src && mv dify-src/* dify && rm -rf dify-src dify.zip
```
Alternatively, download the ZIP on another device and transfer it manually.
**To upgrade:**
```bash theme={null}
wget -O dify-latest.zip "$(curl -s https://api.github.com/repos/langgenius/dify/releases/latest | jq -r '.zipball_url')"
# The archive extracts to a single inner directory named langgenius-dify-<commit>
unzip dify-latest.zip -d dify-latest && rm dify-latest.zip
rsync -a dify-latest/*/ dify/
rm -rf dify-latest/
cd dify/docker
docker compose pull
docker compose up -d
```
## Backup Procedures
### Create backup before upgrading
Always back up before upgrading to prevent data loss:
```bash theme={null}
cp -r dify "dify.bak.$(date +%Y%m%d%H%M%S)"
```
This creates a timestamped backup for easy restoration.
# Common Issues
Source: https://docs.dify.ai/en/self-host/troubleshooting/common-issues
## Authentication & Access
### Reset admin password
For Docker Compose deployments:
```bash theme={null}
docker exec -it docker-api-1 flask reset-password
```
Enter the account email and new password when prompted.
For source code deployments, run the same command from the `api` directory.
### 401 errors after login
This typically happens after changing domains. Update these environment variables:
* `CONSOLE_CORS_ALLOW_ORIGINS` - Console CORS policy
* `WEB_API_CORS_ALLOW_ORIGINS` - WebApp CORS policy
* `CONSOLE_API_URL` - Backend URL for console API
* `CONSOLE_WEB_URL` - Frontend URL for console web
* `SERVICE_API_URL` - Service API URL
* `APP_API_URL` - WebApp API backend URL
* `APP_WEB_URL` - WebApp URL
Restart after updating configuration.
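For example, after moving a self-hosted deployment to `dify.example.com` (an illustrative domain with all services behind one hostname), the variables might be set as:

```
CONSOLE_API_URL=https://dify.example.com
CONSOLE_WEB_URL=https://dify.example.com
SERVICE_API_URL=https://dify.example.com
APP_API_URL=https://dify.example.com
APP_WEB_URL=https://dify.example.com
CONSOLE_CORS_ALLOW_ORIGINS=https://dify.example.com
WEB_API_CORS_ALLOW_ORIGINS=https://dify.example.com
```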
## Configuration
### Change default port
Modify `.env` configuration:
```
EXPOSE_NGINX_PORT=80
EXPOSE_NGINX_SSL_PORT=443
```
For API service port changes, update the nginx configuration in `docker-compose.yaml`.
### Increase file upload limits
Update in `.env`:
* `UPLOAD_FILE_SIZE_LIMIT` - Maximum file size
* `NGINX_CLIENT_MAX_BODY_SIZE` - Must match `UPLOAD_FILE_SIZE_LIMIT`, or nginx rejects large uploads with `413 Request Entity Too Large` before they reach the API
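A quick way to confirm the two limits agree after editing: read both from `.env` and compare, treating the application value as megabytes and the nginx value as an `M`-suffixed size (the `check_upload_limits` helper is ours):

```bash theme={null}
check_upload_limits() {
  envfile="$1"
  app=$(grep '^UPLOAD_FILE_SIZE_LIMIT=' "$envfile" | cut -d= -f2)
  nginx=$(grep '^NGINX_CLIENT_MAX_BODY_SIZE=' "$envfile" | cut -d= -f2)
  # Strip a trailing M so "15" and "15M" compare equal.
  if [ "$app" = "${nginx%M}" ]; then
    echo "limits match: ${app}M"
  else
    echo "mismatch: app=${app} nginx=${nginx}"
  fi
}
```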
### Workflow complexity limits
Adjust `MAX_TREE_DEPTH` in `web/app/components/workflow/constants.ts` (default: 50).
Note: Excessive depth impacts performance.
### Node execution timeout
Set `TEXT_GENERATION_TIMEOUT_MS` in `.env` to control runtime per node.
## Email Configuration
Not receiving password reset emails? Configure mail settings in `.env`:
1. Set up mail parameters (SMTP settings)
2. Restart services:
```bash theme={null}
docker compose down
docker compose up -d
```
Check spam folder if emails still don't arrive.
### Invite members without email service
In local deployments without email configured, the invitation page displays a link after sending. Copy and forward this link to users manually.
## Database Issues
### Connection errors with pg\_hba.conf
If you see:
```
FATAL: no pg_hba.conf entry for host "172.19.0.7", user "postgres", database "dify", no encryption
```
Allow connections from the error's network segment:
```bash theme={null}
docker exec -it docker-db-1 sh -c "echo 'host all all 172.19.0.0/16 trust' >> /var/lib/postgresql/data/pgdata/pg_hba.conf"
docker compose restart
```
### File not found error for encryption keys
This error occurs after changing deployment methods or deleting `api/storage/privkeys`:
```
FileNotFoundError: File not found
File "/www/wwwroot/dify/dify/api/libs/rsa.py", line 45, in decrypt
```
Reset encryption key pairs:
Docker Compose:
```bash theme={null}
docker exec -it docker-api-1 flask reset-encrypt-key-pair
```
Source code (from `api` directory):
```bash theme={null}
flask reset-encrypt-key-pair
```
**Warning**: This is irreversible - existing encrypted data will be lost.
## Workspace Management
### Rename workspace
Update the `name` column of the `tenants` table directly in the database.
### Change application access domain
Update `APP_WEB_URL` in `docker-compose.yaml`.
# Docker Issues
Source: https://docs.dify.ai/en/self-host/troubleshooting/docker-issues
## Network & Connectivity
### 502 Bad Gateway
Nginx is forwarding to wrong container IPs. Get current container IPs:
```bash theme={null}
docker ps -q | xargs -n 1 docker inspect --format '{{ .Name }}: {{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}'
```
Find these lines:
```
/docker-web-1: 172.19.0.5
/docker-api-1: 172.19.0.7
```
Update `dify/docker/nginx/conf.d`:
* Replace `http://api:5001` with `http://172.19.0.7:5001`
* Replace `http://web:3000` with `http://172.19.0.5:3000`
Restart nginx or reload configuration. Note: IPs change on container restart.
### Cannot access localhost services
Docker containers can't reach host services via `127.0.0.1`. Use your machine's local network IP instead.
Example: For OpenLLM running on host, configure Dify with `http://192.168.1.100:port` (your actual local IP).
### Page loads forever with CORS errors
Domain/URL changes cause cross-origin issues. Update in `docker-compose.yml`:
* `CONSOLE_API_URL` - Backend URL for console API
* `CONSOLE_WEB_URL` - Frontend URL for console web
* `SERVICE_API_URL` - Service API URL
* `APP_API_URL` - WebApp API backend URL
* `APP_WEB_URL` - WebApp URL
## Mounting & Volumes
### Nginx configuration mount failure
Error:
```
Error mounting "/run/desktop/mnt/host/d/Documents/docker/nginx/nginx.conf" to rootfs at "/etc/nginx/nginx.conf": not a directory
```
Clone the complete project and run from docker directory:
```bash theme={null}
git clone https://github.com/langgenius/dify.git
cd dify/docker
docker compose up -d
```
### Port conflicts
Port 80 already in use? Either:
1. Stop the conflicting service (usually Apache/Nginx):
```bash theme={null}
sudo service nginx stop
sudo service apache2 stop
```
2. Or change port mapping in `docker-compose.yaml`:
```yaml theme={null}
ports:
- "8080:80" # Map to different port
```
## Container Management
### View background shell outputs
List running shells:
```bash theme={null}
docker exec -it docker-api-1 ls /tmp/shells/
```
Check shell output:
```bash theme={null}
docker exec -it docker-api-1 cat /tmp/shells/[shell-id]/output.log
```
### Container restart issues
After system reboot, containers may fail to connect. Ensure proper startup order:
```bash theme={null}
docker compose down
docker compose up -d
```
Wait for all services to be healthy before accessing.
## SSRF Proxy
The `ssrf_proxy` container prevents Server-Side Request Forgery attacks.
### Customize proxy rules
Edit `docker/volumes/ssrf_proxy/squid.conf` to add ACL rules:
```
# Block access to sensitive internal IP
acl restricted_ip dst 192.168.101.19
acl localnet src 192.168.101.0/24
http_access deny restricted_ip
http_access allow localnet
http_access deny all
```
Restart the proxy container after changes: `docker compose restart ssrf_proxy`.
### Why is SSRF\_PROXY needed?
Prevents services from making unauthorized requests to internal network resources. The proxy intercepts and filters all outbound requests from sandboxed services.
# Third-Party Integrations
Source: https://docs.dify.ai/en/self-host/troubleshooting/integrations
## Notion Integration
Notion OAuth only supports HTTPS, so local deployments must use internal integration.
### Configure environment variables
Set in `.env`:
```
NOTION_INTEGRATION_TYPE=internal
NOTION_INTERNAL_SECRET=your_internal_secret_here
```
For public integration (HTTPS only):
```
NOTION_INTEGRATION_TYPE=public
NOTION_CLIENT_SECRET=oauth_client_secret
NOTION_CLIENT_ID=oauth_client_id
```
Get credentials from [Notion Integrations](https://www.notion.so/my-integrations).
## Text-to-Speech (TTS)
### FFmpeg not installed error
OpenAI TTS requires FFmpeg for audio stream segmentation.
**macOS:**
```bash theme={null}
brew install ffmpeg
```
**Ubuntu:**
```bash theme={null}
sudo apt-get update
sudo apt-get install ffmpeg
```
**CentOS:**
```bash theme={null}
sudo yum install epel-release
sudo rpm -Uvh http://li.nux.ro/download/nux/dextop/el7/x86_64/nux-dextop-release-0-5.el7.nux.noarch.rpm
sudo yum update
sudo yum install ffmpeg ffmpeg-devel
```
**Windows:**
1. Download from [FFmpeg website](https://ffmpeg.org/download.html)
2. Extract and move to `C:\Program Files\`
3. Add FFmpeg bin directory to system PATH
4. Verify: `ffmpeg -version`
## Model Tokenizers
### Can't load tokenizer for 'gpt2'
Error:
```
Can't load tokenizer for 'gpt2'. If you were trying to load it from 'https://huggingface.co/models'...
```
Configure Hugging Face mirror or proxy in environment variables. See [environment documentation](https://docs.dify.ai/en/self-host/configuration/environments) for details.
## Security Policies
### Content Security Policy (CSP)
Enable CSP to reduce XSS attacks.
In `.env`:
```
CSP_WHITELIST=https://api.example.com,https://cdn.example.com
```
Add all domains used by your application (APIs, CDNs, analytics, etc.).
See [MDN CSP documentation](https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP) for more information.
## Application Templates
### Custom templates
Currently not supported in community edition. Default templates are provided by Dify for reference only.
Cloud version users can:
* Add applications to workspace
* Customize and save as personal applications
For enterprise custom templates, contact: [business@dify.ai](mailto:business@dify.ai)
# Storage & Migration
Source: https://docs.dify.ai/en/self-host/troubleshooting/storage-and-migration
## Vector Database Migration
### Migrate from Weaviate to another database
1. **Update configuration**
Source code deployment (`.env`):
```
VECTOR_STORE=qdrant
```
Docker Compose (`docker-compose.yaml`):
```yaml theme={null}
VECTOR_STORE: qdrant
```
2. **Run migration**
```bash theme={null}
# Source code
flask vdb-migrate
# Docker
docker exec -it docker-api-1 flask vdb-migrate
```
Tested databases: Qdrant, Milvus, AnalyticDB
## Storage Migration
### Move from local to cloud storage
Migrate files from local storage to cloud providers (e.g., Alibaba Cloud OSS):
1. **Configure cloud storage**
`.env` or `docker-compose.yaml`:
```
STORAGE_TYPE=aliyun-oss
# Add OSS credentials
```
2. **Migrate data**
Source code:
```bash theme={null}
flask upload-private-key-file-to-cloud-storage
flask upload-local-files-to-cloud-storage
```
Docker:
```bash theme={null}
docker exec -it docker-api-1 flask upload-private-key-file-to-cloud-storage
docker exec -it docker-api-1 flask upload-local-files-to-cloud-storage
```
## Data Cleanup
### Delete old logs
1. **Get tenant ID**
```bash theme={null}
docker exec -it docker-api-1 bash -c "echo 'from models import Tenant; db.session.query(Tenant.id, Tenant.name).all(); quit()' | flask shell"
```
2. **Delete logs older than X days**
```bash theme={null}
docker exec -it docker-api-1 flask clear-free-plan-tenant-expired-logs \
--days 30 \
--batch 100 \
--tenant_ids 618b5d66-a1f5-4b6b-8d12-f171182a1cb2
```
3. **Remove exported logs** (optional)
```bash theme={null}
docker exec -it docker-api-1 bash -c 'rm -rf ${OPENDAL_FS_ROOT}/free_plan_tenant_expired_logs'
```
### Remove orphaned files
**Warning**: Back up database and storage before running. Run during maintenance window.
1. **Clean database records**
```bash theme={null}
docker exec -it docker-api-1 flask clear-orphaned-file-records
# Use -f flag to skip confirmation
```
2. **Delete orphaned files from storage**
```bash theme={null}
docker exec -it docker-api-1 flask remove-orphaned-files-on-storage
# Use -f flag to skip confirmation
```
Note: Only works with OpenDAL storage (`STORAGE_TYPE=opendal`).
## Backup & Recovery
### Create backup before upgrade
```bash theme={null}
cp -r dify "dify.bak.$(date +%Y%m%d%H%M%S)"
```
### What to backup
For Docker Compose deployments:
* Entire `dify/docker/volumes` directory
For source deployments:
* Database
* Storage configuration
* Vector database data
* Environment files
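For the Docker Compose case, the list above boils down to archiving one directory. A sketch, assuming the default `dify/docker/volumes` layout (the `backup_volumes` helper is ours; stop the stack first so the database files are consistent):

```bash theme={null}
# Archive the volumes directory into a timestamped tarball.
backup_volumes() {
  src="$1"   # path to the docker directory, e.g. dify/docker
  out="dify-volumes-$(date +%Y%m%d%H%M%S).tgz"
  tar -czf "$out" -C "$src" volumes && echo "$out"
}
# Example: docker compose down && backup_volumes dify/docker
```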
### Database maintenance
After deleting logs, reclaim storage:
PostgreSQL:
```sql theme={null}
VACUUM FULL;
```
## Upgrade Process
### Version upgrade
Image deployment:
```bash theme={null}
docker compose pull
docker compose up -d
```
Source code:
```bash theme={null}
git pull
cd api
flask db upgrade
```
### Database schema migration
Always required for source code updates:
```bash theme={null}
cd api
flask db upgrade
```
# Weaviate Migration Guide: Upgrading to Client v4 and Server 1.27+
Source: https://docs.dify.ai/en/self-host/troubleshooting/weaviate-v4-migration
> This guide explains how to migrate from Weaviate client v3 to v4.17.0 and upgrade your Weaviate server from version 1.19.0 to 1.27.0 or higher. This migration is required for Dify versions that include the weaviate-client v4 upgrade.
## Overview
Starting with **Dify v1.9.2**, the weaviate-client has been upgraded from v3 to v4.17.0. This upgrade brings significant performance improvements and better stability, but requires **Weaviate server version 1.27.0 or higher**.
**BREAKING CHANGE**: The new weaviate-client v4 is NOT backward compatible with Weaviate server versions below 1.27.0. If you are running a self-hosted Weaviate instance on version 1.19.0 or older, you must upgrade your Weaviate server before upgrading Dify.
### Who Is Affected?
This migration affects:
* Self-hosted Dify users running their own Weaviate instances on versions below 1.27.0
* Users currently on Weaviate server version 1.19.0-1.26.x
* Users upgrading to Dify versions with weaviate-client v4
**Not affected**:
* Cloud-hosted Weaviate users (Weaviate Cloud manages the server version)
* Users already on Weaviate 1.27.0+ can upgrade Dify without additional steps
* Users running Dify's default Docker Compose setup (Weaviate version is updated automatically)
## Breaking Changes
### Client v4 Requirements
The weaviate-client v4 introduces several breaking changes:
1. **Minimum Server Version**: Requires Weaviate server 1.27.0 or higher
2. **API Changes**: New import structure (`weaviate.classes` instead of `weaviate.client`)
3. **gRPC Support**: Uses gRPC by default on port 50051 for improved performance
4. **Authentication Changes**: Updated authentication methods and configuration
### Why Upgrade?
* **Performance**: Significantly faster query and import operations via gRPC (50051)
* **Stability**: Better connection handling and error recovery
* **Future Compatibility**: Access to latest Weaviate features and ongoing support
* **Security**: Weaviate 1.19.0 is over a year old and no longer receives security updates
## Version Compatibility Matrix
| Dify Version | Weaviate-client Version | Compatible Weaviate Server Versions |
| ------------ | ----------------------- | ----------------------------------- |
| ≤ 1.9.1 | v3.x | 1.19.0 - 1.26.x |
| ≥ 1.9.2 | v4.17.0 | 1.27.0+ (tested up to 1.33.1) |
This migration applies to any Dify version using weaviate-client v4.17.0 or higher.
Weaviate server version 1.19.0 was released over a year ago and is now outdated. Upgrading to 1.27.0+ provides access to numerous improvements in performance, stability, and features.
## Prerequisites
Before starting the migration, complete these steps:
1. **Check Your Current Weaviate Version**
```bash theme={null}
curl http://localhost:8080/v1/meta
```
Look for the `version` field in the response.
2. **Backup Your Data**
* Create a complete backup of your Weaviate data
* Backup your Docker volumes if using Docker Compose
* Document your current configuration settings
3. **Review System Requirements**
* Ensure sufficient disk space for database migration
* Verify network connectivity between Dify and Weaviate
* Confirm gRPC port (50051) is accessible if using external Weaviate
4. **Plan Downtime**
* The migration will require service downtime
* Notify users if running in production
* Schedule migration during low-traffic periods
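The version check in step 1 can be scripted so your upgrade tooling fails fast. A sketch using `sort -V` for dotted-version comparison (the `meets_minimum` helper is ours; feed it the `version` field from `/v1/meta`):

```bash theme={null}
meets_minimum() {
  ver="$1"; min="1.27.0"
  # ver >= min exactly when min sorts first (or equal) under version sort.
  [ "$(printf '%s\n%s\n' "$min" "$ver" | sort -V | head -n 1)" = "$min" ]
}
# Example: meets_minimum "$(curl -s http://localhost:8080/v1/meta | jq -r '.version')"
meets_minimum "1.33.1" && echo "server version OK"
```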
## Migration Paths
Choose the migration path that matches your deployment setup and current Weaviate version.
### Choose Your Path
* **Path A – Migration with Backup (from 1.19)**: Recommended if you are still on Weaviate 1.19. You will create a backup, upgrade to 1.27+, repair any orphaned data, and then migrate the schema.
* **Path B – Direct Recovery (already on 1.27+)**: Use this if you already upgraded to 1.27+ and your knowledge bases stopped working. This path focuses on repairing the data layout and running the schema migration.
Do **not** attempt to downgrade back to 1.19. The schema format is incompatible and will lead to data loss.
### Path A: Migration with Backup (From 1.19)
Safest path. Creates a backup before upgrading so you can restore if anything goes wrong.
#### Prerequisites
* Currently running Weaviate 1.19
* Docker + Docker Compose installed
* Python 3.12 for the [schema migration script](https://github.com/langgenius/dify-docs/blob/main/assets/migrate_weaviate_collections.py)
#### Step A1: Enable the Backup Module on Weaviate 1.19
Edit `docker/docker-compose.yaml` so the `weaviate` service includes backup configuration:
```yaml theme={null}
weaviate:
  image: semitechnologies/weaviate:1.19.0
  volumes:
    - ./volumes/weaviate:/var/lib/weaviate
    - ./volumes/weaviate_backups:/var/lib/weaviate/backups
  ports:
    - "8080:8080"
    - "50051:50051"
  environment:
    ENABLE_MODULES: backup-filesystem
    BACKUP_FILESYSTEM_PATH: /var/lib/weaviate/backups
    # ... rest of your environment variables
```
Restart Weaviate to apply the change:
```bash theme={null}
cd docker
docker compose down
docker compose up -d
sleep 10
```
#### Step A2: Create a Backup
1. **List your collections**:
```bash theme={null}
curl -s -H "Authorization: Bearer $WEAVIATE_API_KEY" \
  "http://localhost:8080/v1/schema" | \
python3 -c "
import json, sys
data = json.load(sys.stdin)
print('Collections:')
for cls in data.get('classes', []):
    print(' - ' + cls['class'])
"
```
2. **Trigger the backup**: include specific collection names if you prefer.
```bash theme={null}
curl -X POST \
-H "Authorization: Bearer $WEAVIATE_API_KEY" \
-H "Content-Type: application/json" \
"http://localhost:8080/v1/backups/filesystem" \
-d '{
"id": "kb-backup",
"include": ["Vector_index_COLLECTION1_Node", "Vector_index_COLLECTION2_Node"]
}'
```
3. **Check backup status**:
```bash theme={null}
sleep 5
curl -s -H "Authorization: Bearer $WEAVIATE_API_KEY" \
"http://localhost:8080/v1/backups/filesystem/kb-backup" | \
python3 -m json.tool | grep status
```
4. **Verify backup files exist**:
```bash theme={null}
ls -lh docker/volumes/weaviate_backups/kb-backup/
```
#### Step A3: Upgrade to Weaviate 1.27+
1. **Upgrade Dify to a version that ships Weaviate 1.27+**:
```bash theme={null}
cd /path/to/dify
git fetch origin
git checkout main # or a tagged release that includes the upgrade
```
2. **Confirm the new Weaviate image**:
```bash theme={null}
grep "image: semitechnologies/weaviate" docker/docker-compose.yaml
```
3. **Restart with the new version**:
```bash theme={null}
cd docker
docker compose down
docker compose up -d
sleep 20
```
#### Step A4: Fix Orphaned LSM Data (if present)
You can fix orphaned LSM data either from the host or inside the container:
**Option A: From host (if volumes are mounted)**:
```bash theme={null}
cd docker/volumes/weaviate
for dir in vector_index_*_node_*_lsm; do
  [ -d "$dir" ] || continue
  index_id=$(echo "$dir" | sed -n 's/vector_index_\([^_]*_[^_]*_[^_]*_[^_]*_[^_]*\)_node_.*/\1/p')
  shard_id=$(echo "$dir" | sed -n 's/.*_node_\([^_]*\)_lsm/\1/p')
  mkdir -p "vector_index_${index_id}_node/$shard_id/lsm"
  cp -a "$dir/"* "vector_index_${index_id}_node/$shard_id/lsm/"
  echo "✓ Copied $dir"
done
cd ../../
docker compose restart weaviate
sleep 15
```
**Option B: Inside Weaviate container (recommended)**:
```bash theme={null}
cd /path/to/dify/docker
docker compose exec -it weaviate /bin/sh
# Inside container
cd /var/lib/weaviate
for dir in vector_index_*_node_*_lsm; do
  [ -d "$dir" ] || continue
  index_id=$(echo "$dir" | sed -n 's/vector_index_\([^_]*_[^_]*_[^_]*_[^_]*_[^_]*\)_node_.*/\1/p')
  shard_id=$(echo "$dir" | sed -n 's/.*_node_\([^_]*\)_lsm/\1/p')
  mkdir -p "vector_index_${index_id}_node/$shard_id/lsm"
  cp -a "$dir/"* "vector_index_${index_id}_node/$shard_id/lsm/"
  echo "✓ Copied $dir"
done
exit
# Restart Weaviate
docker compose restart weaviate
sleep 15
```
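Before running either option against real data, you can dry-run the two `sed` extractions on a sample directory name to see exactly where the files will land (the name below is made up; real index IDs are UUID-like):

```bash theme={null}
dir="vector_index_0a1b2c3d_1111_2222_3333_444455556666_node_abc123_lsm"
index_id=$(echo "$dir" | sed -n 's/vector_index_\([^_]*_[^_]*_[^_]*_[^_]*_[^_]*\)_node_.*/\1/p')
shard_id=$(echo "$dir" | sed -n 's/.*_node_\([^_]*\)_lsm/\1/p')
# Files would be copied into vector_index_${index_id}_node/${shard_id}/lsm
echo "index: $index_id shard: $shard_id"
```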
#### Step A5: Migrate the Schema
1. **Install dependencies** (in a temporary virtualenv is fine):
```bash theme={null}
cd /path/to/dify
python3 -m venv weaviate_migration_env
source weaviate_migration_env/bin/activate
pip install weaviate-client requests
```
2. **Run the [migration script](https://github.com/langgenius/dify-docs/blob/main/assets/migrate_weaviate_collections.py)** either locally or inside the Worker container.\
**Option A: Run locally (if you have Python 3.12 and dependencies installed)**:
```bash theme={null}
python3 migrate_weaviate_collections.py
```
**Option B: Run inside Worker container (recommended for Docker setups)**:
```bash theme={null}
# Copy script to storage directory
cp migrate_weaviate_collections.py /path/to/dify/docker/volumes/app/storage/
# Enter worker container
cd /path/to/dify/docker
docker compose exec -it worker /bin/bash
# Run migration script (use --no-cache for Dify 1.11.0+)
uv run --no-cache /app/api/storage/migrate_weaviate_collections.py
# Exit container
exit
```
The migration script uses environment variables for configuration, making it suitable for running inside Docker containers. For Dify 1.11.0+, if you encounter permission errors with `uv`, use `uv run --no-cache` instead.
3. **Restart Dify services**:
```bash theme={null}
cd docker
docker compose restart api worker worker_beat
sleep 15
```
4. **Verify in the UI**: open Dify, test retrieval against your migrated knowledge bases.
For large collections (over 10,000 objects), verify that the object count matches between old and new collections. The migration script will display verification counts automatically.
After confirming a healthy migration, you can delete `weaviate_migration_env` and the backup files to reclaim disk space.
### Path B: Direct Recovery (Already on 1.27+)
Only use this path if you already upgraded to 1.27+ and your knowledge bases stopped working. You cannot create a 1.19 backup anymore, so you must repair the data in place.
#### Prerequisites
* Currently running Weaviate 1.27+ (including 1.33)
* Docker + Docker Compose installed
* Python 3.12 for the [migration script](https://github.com/langgenius/dify-docs/blob/main/assets/migrate_weaviate_collections.py)
#### Step B1: Repair Orphaned LSM Data
Stop Weaviate and fix orphaned LSM data:
```bash theme={null}
cd /path/to/dify/docker
docker compose stop weaviate
# Option A: From host (if volumes are mounted)
cd volumes/weaviate
for dir in vector_index_*_node_*_lsm; do
  [ -d "$dir" ] || continue
  index_id=$(echo "$dir" | sed -n 's/vector_index_\([^_]*_[^_]*_[^_]*_[^_]*_[^_]*\)_node_.*/\1/p')
  shard_id=$(echo "$dir" | sed -n 's/.*_node_\([^_]*\)_lsm/\1/p')
  mkdir -p "vector_index_${index_id}_node/$shard_id/lsm"
  cp -a "$dir/"* "vector_index_${index_id}_node/$shard_id/lsm/"
  echo "✓ Copied $dir"
done
# Option B: Inside container (recommended)
docker compose run --rm --entrypoint /bin/sh weaviate -c "
cd /var/lib/weaviate
for dir in vector_index_*_node_*_lsm; do
  [ -d \"\$dir\" ] || continue
  index_id=\$(echo \"\$dir\" | sed -n 's/vector_index_\([^_]*_[^_]*_[^_]*_[^_]*_[^_]*\)_node_.*/\1/p')
  shard_id=\$(echo \"\$dir\" | sed -n 's/.*_node_\([^_]*\)_lsm/\1/p')
  mkdir -p \"vector_index_\${index_id}_node/\$shard_id/lsm\"
  cp -a \"\$dir/\"* \"vector_index_\${index_id}_node/\$shard_id/lsm/\"
  echo \"✓ Copied \$dir\"
done
"
```
Restart Weaviate:
```bash theme={null}
docker compose start weaviate
sleep 15
```
List collections and confirm object counts are non-zero:
```bash theme={null}
curl -s -H "Authorization: Bearer $WEAVIATE_API_KEY" \
"http://localhost:8080/v1/schema" | python3 -c "
import sys, json
for cls in json.load(sys.stdin).get('classes', []):
if cls['class'].startswith('Vector_index_'):
print(cls['class'])
"
curl -s -H "Authorization: Bearer $WEAVIATE_API_KEY" \
"http://localhost:8080/v1/objects?class=YOUR_COLLECTION_NAME&limit=0" | \
python3 -c "import sys, json; print(json.load(sys.stdin).get('totalResults', 0))"
```
#### Step B2: Run the Schema Migration
Follow the same commands as [Step A5](#step-a5-migrate-the-schema). You can run the script locally or inside the Worker container:
**To run inside Worker container**:
```bash theme={null}
# Copy script to storage directory
cp migrate_weaviate_collections.py /path/to/dify/docker/volumes/app/storage/
# Enter worker container
cd /path/to/dify/docker
docker compose exec -it worker /bin/bash
# Run migration script
uv run --no-cache /app/api/storage/migrate_weaviate_collections.py
# Exit and restart services
exit
docker compose restart api worker worker_beat
```
The migration script uses cursor-based pagination to safely handle large collections. Verify object counts match after migration completes.
#### Step B3: Verify in Dify
* Open Dify’s Knowledge Base UI.
* Use Retrieval Testing to confirm queries return results.
* If errors persist, inspect `docker compose logs weaviate` for additional repair steps (see [Troubleshooting](#troubleshooting)).
## Data Migration for Legacy Versions
**CRITICAL: Data Migration Required**
**Your existing knowledge bases will NOT work after upgrade without migration!**
**Why Migration is Needed**:
* Old data: Created with Weaviate v3 client (simple schema)
* New code: Requires Weaviate v4 format (extended schema)
* **Incompatible**: Old data missing required properties
**Migration Options**:
* Option A: Use Weaviate Backup/Restore
* Option B: Re-index from Original Documents
* Option C: Keep the old Weaviate (don't upgrade yet) if you can't afford downtime or data loss
### Automatic Migration
In most cases, Weaviate 1.27.0 will automatically migrate data from 1.19.0:
1. Stop Weaviate 1.19.0
2. Start Weaviate 1.27.0 with the same data directory
3. Weaviate will detect the old format and migrate automatically
4. Monitor logs for migration progress and any errors
### Manual Migration (If Automatic Fails)
If automatic migration fails, use Weaviate's export/import tools:
#### 1. Export Data from Old Version
Use the Cursor API or backup feature to export all data. For large datasets, use Weaviate's backup API:
```bash theme={null}
# Using backup API (recommended)
curl -X POST "http://localhost:8080/v1/backups/filesystem" \
-H "Content-Type: application/json" \
-d '{"id": "pre-migration-backup"}'
```
#### 2. Import Data to New Version
After upgrading to Weaviate 1.27.0, restore the backup:
```bash theme={null}
curl -X POST "http://localhost:8080/v1/backups/filesystem/pre-migration-backup/restore" \
-H "Content-Type: application/json"
```
For comprehensive migration guidance, especially for complex schemas or large datasets, refer to the official [Weaviate Migration Guide](https://weaviate.io/developers/weaviate/installation/migration).
## Configuration Changes
### New Environment Variables
The following new environment variable is available in Dify versions with weaviate-client v4:
#### WEAVIATE\_GRPC\_ENDPOINT
**Description**: Specifies the gRPC endpoint for Weaviate connections. Using gRPC significantly improves performance for batch operations and queries.
**Format**: `hostname:port` (NO protocol prefix)
**Default Ports**:
* Insecure: 50051
* Secure (TLS): 443
**Examples**:
```bash theme={null}
# Docker Compose (internal network)
WEAVIATE_GRPC_ENDPOINT=weaviate:50051
# External server (insecure)
WEAVIATE_GRPC_ENDPOINT=192.168.1.100:50051
# External server with custom port
WEAVIATE_GRPC_ENDPOINT=weaviate.example.com:9090
# Weaviate Cloud (secure/TLS on port 443)
WEAVIATE_GRPC_ENDPOINT=your-instance.weaviate.cloud:443
```
Do NOT include protocol prefixes like `grpc://` or `http://` in the WEAVIATE\_GRPC\_ENDPOINT value. Use only `hostname:port`.
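A trivial format check catches both common mistakes (protocol prefix, missing port) before a deploy; a sketch (the `check_grpc_endpoint` helper is ours):

```bash theme={null}
check_grpc_endpoint() {
  case "$1" in
    *"://"*) echo "invalid: drop the protocol prefix" ;;
    *:*)     echo "ok" ;;
    *)       echo "invalid: missing :port" ;;
  esac
}
check_grpc_endpoint "weaviate:50051"
check_grpc_endpoint "grpc://weaviate:50051"
```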
### Updated Environment Variables
All existing Weaviate environment variables remain the same:
* **WEAVIATE\_ENDPOINT**: HTTP endpoint for Weaviate (e.g., `http://weaviate:8080`)
* **WEAVIATE\_API\_KEY**: API key for authentication (if enabled)
* **WEAVIATE\_BATCH\_SIZE**: Batch size for imports (default: 100)
* **WEAVIATE\_GRPC\_ENABLED**: Enable/disable gRPC (default: true in v4)
### Complete Configuration Example
```bash theme={null}
# docker/.env or environment configuration
VECTOR_STORE=weaviate
# HTTP Endpoint (required)
WEAVIATE_ENDPOINT=http://weaviate:8080
# Authentication (if enabled on your Weaviate instance)
WEAVIATE_API_KEY=your-secret-api-key
# gRPC Configuration (recommended for performance)
WEAVIATE_GRPC_ENABLED=true
WEAVIATE_GRPC_ENDPOINT=weaviate:50051
# Batch Import Settings
WEAVIATE_BATCH_SIZE=100
```
## Verification Steps
After completing the migration, verify everything is working correctly:
### 1. Check Weaviate Connection
Verify Weaviate is accessible and running the correct version:
```bash theme={null}
# Check HTTP endpoint and version
curl http://your-weaviate-host:8080/v1/meta | jq '.version'
# Should return 1.27.0 or higher
```
### 2. Verify Dify Connection
Check the Dify logs for successful Weaviate connection:
```bash theme={null}
docker compose logs api | grep -i weaviate
```
Look for messages indicating successful connection without "No module named 'weaviate.classes'" errors.
### 3. Test Knowledge Base Creation
1. Log into your Dify instance
2. Navigate to **Knowledge Base** section
3. Create a new knowledge base
4. Upload a test document (PDF, TXT, or MD)
5. Wait for indexing to complete
6. Check that status changes from "QUEUING" → "INDEXING" → "AVAILABLE"
If documents get stuck in "QUEUING" status, check that the Celery worker is running: `docker compose logs worker`.
### 4. Test Vector Search
1. Create or open a chat application with knowledge base integration
2. Ask a question that should retrieve information from your knowledge base
3. Verify that relevant results are returned with correct scores
4. Check the citation/source links work correctly
### 5. Verify gRPC Performance
If gRPC is enabled, you should see improved performance:
```bash theme={null}
# Check if gRPC port is accessible
docker exec -it docker-api-1 nc -zv weaviate 50051
# Monitor query times in logs
docker compose logs -f api | grep -i "query_time\|duration"
```
With gRPC properly configured, vector search queries should be 2-5x faster compared to HTTP-only connections.
## Troubleshooting
### Issue: "No module named 'weaviate.classes'"
**Cause**: The weaviate-client v4 is not installed, or v3 is still being used.
**Solution**:
```bash theme={null}
# For Docker installations, ensure you're running the correct Dify version
docker compose pull
docker compose down
docker compose up -d
# For source installations
pip uninstall weaviate-client
pip install weaviate-client==4.17.0
```
### Issue: Connection Refused on gRPC Port (50051)
**Cause**: Port 50051 is not exposed, not accessible, or Weaviate is not listening on it.
**Solution**:
1. **For Docker Compose users with bundled Weaviate**:
The port is available internally between containers. No action needed unless you're connecting from outside Docker.
2. **For external Weaviate**:
```bash theme={null}
# Check if Weaviate is listening on 50051
docker ps | grep weaviate
# Look for "0.0.0.0:50051->50051/tcp"
# If not exposed, restart with port mapping
docker run -p 8080:8080 -p 50051:50051 ...
```
3. **Check firewall rules**:
```bash theme={null}
# Linux
sudo ufw allow 50051/tcp
# Check if port is listening
netstat -tlnp | grep 50051
```
### Issue: Authentication Errors (401 Unauthorized)
**Cause**: API key mismatch or authentication configuration issue.
**Solution**:
1. Verify API key matches in both Weaviate and Dify:
```bash theme={null}
# Check Weaviate authentication
curl http://localhost:8080/v1/meta | jq '.authentication'
# Check Dify configuration
docker compose exec api env | grep WEAVIATE_API_KEY
```
2. If using anonymous access:
```yaml theme={null}
# Weaviate docker-compose.yaml
weaviate:
  environment:
    AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "true"
    AUTHENTICATION_APIKEY_ENABLED: "false"
```
Then remove `WEAVIATE_API_KEY` from Dify configuration.
### Issue: Documents Stuck in "QUEUING" Status
**Cause**: Celery worker not running or not connected to Redis.
**Solution**:
```bash theme={null}
# Check if worker is running
docker compose ps worker
# Check worker logs
docker compose logs worker | tail -50
# Check Redis connection
docker compose exec api redis-cli -h redis -p 6379 -a difyai123456 ping
# Should return "PONG"
# Restart worker
docker compose restart worker
```
### Issue: Slow Performance After Migration
**Cause**: gRPC not enabled or configured incorrectly.
**Solution**:
1. Verify gRPC configuration:
```bash theme={null}
docker compose exec api env | grep WEAVIATE_GRPC
```
Should show:
```
WEAVIATE_GRPC_ENABLED=true
WEAVIATE_GRPC_ENDPOINT=weaviate:50051
```
2. Test gRPC connectivity:
```bash theme={null}
docker exec -it docker-api-1 nc -zv weaviate 50051
# Should return "succeeded"
```
3. If still slow, check network latency between Dify and Weaviate
### Issue: Schema Migration Errors
**Cause**: Incompatible schema changes between Weaviate versions or corrupted data.
**Solution**:
1. Check Weaviate logs for specific error messages:
```bash theme={null}
docker compose logs weaviate | tail -100
```
2. List current schema:
```bash theme={null}
curl http://localhost:8080/v1/schema
```
3. If necessary, delete corrupted collections (⚠️ this deletes all data):
```bash theme={null}
# Backup first!
curl -X DELETE http://localhost:8080/v1/schema/YourCollectionName
```
4. Restart Dify to recreate schema:
```bash theme={null}
docker compose restart api worker
```
Deleting collections removes all data. Only do this if you have a backup and are prepared to re-index all content.
### Issue: Docker Volume Permission Errors
**Cause**: User ID mismatch in Docker containers.
**Solution**:
```bash theme={null}
# Check ownership of Weaviate data directory
ls -la docker/volumes/weaviate/
# Fix permissions (use the UID shown in error messages)
sudo chown -R 1000:1000 docker/volumes/weaviate/
# Restart services
docker compose restart weaviate
```
### Issue: Permission Denied When Running Migration Script (Dify 1.11.0+)
**Cause**: The `/home/dify` directory may not exist in newer Dify versions, causing `uv` cache creation to fail.
**Solution**:
```bash theme={null}
# Option 1: Use --no-cache flag (recommended)
uv run --no-cache migrate_weaviate_collections.py
# Option 2: Run as root user
docker compose exec -u root worker /bin/bash
uv run migrate_weaviate_collections.py
```
## Rollback Plan
If the migration fails and you need to rollback:
### Step 1: Stop Services
```bash theme={null}
cd /path/to/dify/docker
docker compose down
```
### Step 2: Restore Backup
```bash theme={null}
# Remove current volumes
rm -rf volumes/weaviate
# Restore from backup
tar -xvf ../weaviate-backup-TIMESTAMP.tgz
```
### Step 3: Revert Dify Version
```bash theme={null}
cd /path/to/dify
git checkout <previous-version-tag>  # the tag you were running before the migration
cd docker
docker compose pull
```
### Step 4: Restart Services
```bash theme={null}
docker compose up -d
```
### Step 5: Verify Rollback
Check that services are running with old versions:
```bash theme={null}
# Check versions
docker compose exec api pip show weaviate-client
curl http://localhost:8080/v1/meta | jq '.version'
# Check for errors
docker compose logs | grep -i error
```
Always test the rollback procedure in a staging environment first if possible. Maintain multiple backup copies before attempting major migrations.
## Additional Resources
### Official Documentation
* [Weaviate Migration Guide](https://weaviate.io/developers/weaviate/installation/migration)
* [Weaviate v4 Client Documentation](https://weaviate.io/developers/weaviate/client-libraries/python)
* [Weaviate Backup and Restore](https://weaviate.io/developers/weaviate/configuration/backups)
* [Dify Self-Hosting Guide](/en/self-host/quick-start/docker-compose)
* [Dify Environment Variables](/en/self-host/configuration/environments)
### Community Resources
* [Dify GitHub Repository](https://github.com/langgenius/dify)
* [Dify GitHub Issues - Weaviate](https://github.com/langgenius/dify/issues?q=is%3Aissue+weaviate)
* [Weaviate Community Forum](https://forum.weaviate.io/)
* [Dify Community Forum](https://forum.dify.ai/)
### Migration Tools
* [Weaviate Python Client v4](https://github.com/weaviate/weaviate-python-client)
* [Weaviate Backup Tools](https://github.com/weaviate/weaviate/tree/main/tools)
## Summary
This migration brings important improvements to Dify's vector storage capabilities:
* **Better Performance**: gRPC support dramatically improves query and import speeds (2-5x faster)
* **Improved Stability**: Enhanced connection handling and error recovery
* **Security**: Access to security updates and patches not available in Weaviate 1.19.0
* **Future-Proof**: Access to latest Weaviate features and ongoing support
While this is a breaking change requiring server upgrade for users on old versions, the benefits significantly outweigh the migration effort. Most Docker Compose users can complete the migration in under 15 minutes with the automatic update.
If you encounter any issues not covered in this guide, please report them on the [Dify GitHub Issues page](https://github.com/langgenius/dify/issues) with the label "weaviate" and "migration".
# Article Reader Using File Upload
Source: https://docs.dify.ai/en/use-dify/tutorials/article-reader
In Dify, you can use the knowledge base to let an agent obtain accurate information from a large amount of text content. In many cases, however, the local files involved are not large enough to warrant a knowledge base. In such cases, you can use the file upload feature to provide local files directly as context for the LLM to read.
In this experiment, we will build an article reader as a case study. This assistant generates questions based on the uploaded document, helping users read papers and other materials with those questions in mind.
## You Will Learn
* File upload usage
* Basic usage of Chatflow
* Prompt writing skills
* Iteration node usage
* Doc extractor and list operator usage
## **Prerequisites**
Create a Chatflow in Dify. Make sure you have added a model provider and have sufficient quota.
## **Adding Nodes**
In this experiment, at least four types of nodes are required: start node, document extractor node, LLM node, and answer node.
### **Start Node**
In the start node, you need to add a file variable and check **Document** among the supported file types. File upload has been supported since Dify v0.10.0, allowing you to add files as variables.
Some readers might notice the `sys.files` in the system variables, which are files or file lists uploaded by users in the dialog box.
The difference from creating your own file variable is that `sys.files` requires enabling file upload in **Features** and setting the allowed file types, and the variable is overwritten each time a new file is uploaded in the dialog.
Please choose the appropriate file upload method according to your business scenario.
### **Doc Extractor**
**LLM cannot read files directly.** This is a common misconception among many users when they first use file upload, as they might think simply using the file as a variable in an LLM node would work. However, in reality, the LLM reads nothing from file variables.
Thus, Dify introduced the **doc extractor** node, which can extract text from the file variable and output it as a text variable.
The **doc extractor** node takes the file variable from the **start** node as input and converts document files into text output.
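Conceptually, the node behaves like a small function that turns a file into text. A minimal sketch in Python (plain text only; the real node also handles PDF, DOCX, and other formats):

```python
from pathlib import Path

def doc_extractor(file_path: str) -> dict:
    """Read a document and expose its contents as a text output variable,
    mirroring what the doc extractor node does with a file variable."""
    return {"text": Path(file_path).read_text(encoding="utf-8")}

# Demo with a throwaway text file:
Path("sample.txt").write_text("Hello from the uploaded file.", encoding="utf-8")
output = doc_extractor("sample.txt")  # {"text": "Hello from the uploaded file."}
```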

### **LLM**
In this experiment, two LLM nodes need to be designed: structure extraction and question generation.
#### **Structure Extraction**
The structure extraction node can extract the structure of the original text, summarizing key content.
The prompt is as follows:
```
Read the following article content and perform the task
{{Result variable of the document extractor}}
# Task
- **Main Objective**: Thoroughly analyze the structure of the article.
- **Objective**: Detail the content of each part of the article.
- **Requirements**: Analyze as detailed as possible.
- **Restrictions**: No specific format restrictions, but the analysis must be organized and logical.
- **Expected Output**: A detailed analysis of the article structure, including the main content and role of each part.
# Reasoning Order
- **Reasoning Part**: By carefully reading the article, identify and analyze its structure.
- **Conclusion Part**: Provide specific content and role for each part.
# Output Format
- **Analysis Format**: Each part should be listed in a headline format, followed by a detailed explanation of that part's content.
- **Structure Form**: Markdown, to enhance readability.
- **Specific Description**: The content and role of each part, including but not limited to the introduction, body, conclusion, citations, etc.
```
#### **Question Generation**
The question generation node can summarize the issues of the article from the content summarized by the structure extraction node, assisting the reader in thinking through the questions during the reading process.
The prompt is as follows:
```
Read the following article content and perform the task
{{Output of the structure extraction}}
# Task
- **Main Objective**: Thoroughly read the above text, and propose as many questions as possible for each part of the article.
- **Requirements**: Questions should be meaningful and valuable, worthy of consideration.
- **Restrictions**: No specific restrictions.
- **Expected Output**: A series of questions for each part of the article, each question should have depth and thinking value.
# Reasoning Order
- **Reasoning Part**: Thoroughly read the article, analyze the content of each part, and consider the deep questions each part may raise.
- **Conclusion Part**: Pose meaningful and valuable questions, ensuring they provoke in-depth thought.
# Output Format
- **Format**: Each question should be listed separately, numbered.
- **Content**: Propose questions for each part of the article (such as introduction, background, methods, results, discussion, conclusion, etc.).
- **Quantity**: As many as possible, but each question should be meaningful and valuable.
```
## **Question 1: Handling Multiple Uploaded Files**
To handle multiple uploaded files, an iteration node is needed.
The iteration node works like a for-each loop in many programming languages: there is no loop condition, and the **input variable must be of type `array` (list)**. Dify simply processes every element in the list until it reaches the end.
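A rough sketch of what the iteration node does, where `process` stands in for the nodes placed inside it:

```python
def iterate(items: list, process) -> list:
    """Run `process` on every element of the array input and collect
    the results, like Dify's iteration node."""
    return [process(item) for item in items]

# Hypothetical example: one summary per uploaded file.
summaries = iterate(["paper_a.pdf", "paper_b.pdf"], lambda f: f"summary of {f}")
```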

Therefore, you need to adjust the file variable in the start node to an `array` type, i.e., a file list.

## **Question 2: Handling Specific Files from a File List**
In Question 1, some readers may have noticed that Dify processes every file before ending the loop, while in some cases only part of the files needs to be processed. For this, you can filter the file list with the **list operation** node. List operations work on all array-type variables, not just file lists.
For example, limit the analysis to only document-type files and sort the files to be processed in order of file names.
Before the iteration node, add a list operation node, adjust the **filter condition** and **order by**, then change the input of the iteration node to the output of the list operation node.

# AI Image Generation App
Source: https://docs.dify.ai/en/use-dify/tutorials/build-ai-image-generation-app
With the rise of image generation, many excellent image generation products have emerged, such as DALL·E, Flux, and Stable Diffusion.
In this article, you will learn how to develop an AI image generation app using Dify.

## You Will Learn
* Methods for building an Agent using Dify
* Basic concepts of Agent
* Fundamentals of prompt engineering
* Tool usage
* Concepts of large model hallucinations
## 1. Setting the Stability API Key
[Click here](https://platform.stability.ai/account/keys) to go to the Stability API key management page.
If you haven't registered yet, you will be asked to register before entering the API management page.
After entering the management page, click `copy` to copy the key.

Next, you need to fill in the key in [Dify - Tools - Stability](https://cloud.dify.ai/tools) by following these steps:
* Log in to Dify
* Enter Tools
* Select Stability
* Click `Authorize`

* Fill in the key and save
## 2. Configure Model Providers
To optimize interaction, we need an LLM to concretize user instructions, i.e., to write prompts for generating images. Next, we will configure model providers in Dify following these steps.
The Free version of Dify provides 200 free OpenAI message credits.
If the message credits are insufficient, you can customize other model providers by following the steps in the image below:
Click **Your Avatar - Settings - Model Provider**

If you haven't found a suitable model provider, the Groq platform provides free call credits for LLMs such as Llama.
Log in to [groq API Management Page](https://console.groq.com/keys)
Click **Create API Key**, set a desired name, and copy the API Key.
Back to **Dify - Model Providers**, select **groqcloud**, and click **Setup**.

Paste the API Key and save.

## 3. Build an Agent
Back to **Dify - Studio**, select **Create from Blank**.

In this experiment, we only need to understand the basic usage of Agent.
**What is an Agent**
An Agent is an AI system that simulates human behavior and capabilities. It interacts with the environment through natural language processing, understands input information, and generates corresponding outputs. The Agent also has "perception" capabilities, can process and analyze various forms of data, and can call and use various external tools and APIs to complete tasks, extending its functional scope. This design allows the Agent to handle complex situations more flexibly and simulate human thinking and behavior patterns to some extent.
Select **Agent**, fill in the name.

Next, you will enter the Agent orchestration interface as shown below.

Select the LLM. Here we use Llama-3.1-70B provided by groq as an example:

Select Stability in **Tools**:


### Write Prompts
Prompts are the soul of the Agent and directly affect the output effect. Generally, the more specific the prompts, the better the output, but overly lengthy prompts can also lead to negative effects.
The engineering of adjusting prompts is called Prompt Engineering.
In this experiment, you don't need to worry about not mastering Prompt Engineering; we will learn it step by step later.
Let's start with the simplest prompts:
```
Draw the specified content according to the user's prompt using stability_text2image.
```
Each time the user inputs a command, the Agent reads this system-level instruction, so it knows that executing a user's drawing task requires calling the Stability tool.
For example: Draw a girl holding an open book.

### Don't want to write prompts? Of course you can!
Click **Generate** in the upper right corner of Instructions.

Enter your requirements in the **Instructions** and click **Generate**. The generated prompts on the right will show AI-generated prompts.

However, to develop a good understanding of prompts, we should not rely on this feature in the early stages.
## Publish
Click the publish button in the upper right corner, and after publishing, select **Run App** to get a web page for an online running Agent.

Copy the URL of this web page to share with other friends.
## Question 1: How to Specify the Style of Generated Images?
We can add style instructions in the user's input command, for example: Anime style, draw a girl holding an open book.

But if we want to set the default style to anime, we can add it to the system prompt, since, as we learned earlier, the system prompt is read with every user command and has a higher priority.
```
Draw the specified content according to the user's prompt using stability_text2image, the picture is in anime style.
```
## Question 2: How to Reject Certain Requests from Some Users?
In many business scenarios, we need to avoid outputting some unreasonable content, but LLMs are often "dumb" and will follow user instructions without question, even if the output content is wrong. This phenomenon of the model trying hard to answer users by fabricating false content is called **model hallucinations**. Therefore, we need the model to refuse user requests when necessary.
Additionally, users may also ask some content unrelated to the business, and we also need the Agent to refuse such requests.
We can use markdown format to categorize different prompts, writing the prompts that teach the Agent to refuse unreasonable content under the "Constraints" title. Of course, this format is just for standardization, and you can have your own format.
```
## Task
Draw the specified content according to the user's prompt using stability_text2image, the picture is in anime style.
## Constraints
If the user requests content unrelated to drawing, reply: "Sorry, I don't understand what you're saying."
```
For example, let's ask: What's for dinner tonight?

In some more formal business scenarios, we can call a sensitive word library to refuse user requests.
Add the keyword "dinner" in **Add Feature - Content Moderation**. When the user inputs the keyword, the Agent app outputs "Sorry, I don't understand what you're saying."

# Customer Service Bot With Knowledge Base
Source: https://docs.dify.ai/en/use-dify/tutorials/customer-service-bot
In the last experiment, we learned the basic usage of file uploads. However, when the text we need to read exceeds the LLM's context window, we need to use a knowledge base.
> **What is context?**
>
> The context window refers to the range of text that the LLM can "see" and "remember" when processing text. It determines how much previous text information the model can refer to when generating responses or continuing text. The larger the window, the more contextual information the model can utilize, and the generated content is usually more accurate and coherent.
Previously, we learned about the concept of LLM hallucinations. In many cases, an LLM knowledge base allows the Agent to locate accurate information, thus accurately answering questions. It has applications in specific fields such as customer service and search tools.
Traditional customer service bots are often based on keyword retrieval. When users input questions outside of the keywords, the bot cannot solve the problem. The knowledge base is designed to solve this problem, enabling semantic-level retrieval and reducing the burden on human agents.
Before starting the experiment, remember that the core of a knowledge-base application is retrieval, not the LLM. The LLM polishes the final answer, but what really matters is retrieving the right content.
### What You Will Learn in This Experiment
* Basic usage of Chatflow
* Usage of knowledge bases and external knowledge bases
* The concept of embeddings
### Prerequisites
#### Create an Application
In Dify, select **Create from Blank - Chatflow.**

#### Add a Model Provider
This experiment involves using embedding models. Currently, supported embedding model providers include OpenAI and Cohere. In Dify's model providers, those with the `TEXT EMBEDDING` label are supported. Ensure you have added at least one and have sufficient balance.

> **What is embedding?**
>
> "Embedding" is a technique that converts discrete variables (such as words, sentences, or entire documents) into continuous vector representations.
>
> Simply put, when we process natural language into data, we convert text into vectors. This process is called embedding. Vectors of semantically similar texts will be close together, while vectors of semantically opposite texts will be far apart. LLMs use this data for training, predicting subsequent vectors, and thus generating text.
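The "closeness" of embedding vectors is usually measured with cosine similarity. A toy illustration with made-up 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: ~1 for similar meanings,
    ~0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

cat = [0.9, 0.1, 0.0]        # made-up embedding for "cat"
kitten = [0.85, 0.15, 0.05]  # semantically close to "cat"
invoice = [0.0, 0.2, 0.95]   # unrelated concept
```

Here `cosine_similarity(cat, kitten)` comes out much higher than `cosine_similarity(cat, invoice)`, which is exactly the property retrieval relies on.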
### Create a Knowledge Base
Log in to Dify -> Knowledge -> Create Knowledge
Dify supports three data sources: documents, Notion, and web pages.
For local text files, note the file type and size limitations; syncing Notion content requires binding a Notion account; syncing a website requires using the **Jina** or **Firecrawl API**.
We will start with uploading a local document as an example.
#### Chunk Settings
After uploading the document, you will enter the following page:

You can see a segmentation preview on the right. The default selection is automatic segmentation and cleaning. Dify will automatically divide the article into many paragraphs based on the content. You can also set other segmentation rules in the custom settings.
#### Index Method
Normally we prefer to select **High Quality**, but this will consume extra tokens. Selecting **Economic** will not consume any tokens.
The community edition of Dify also offers a Q\&A segmentation mode. Selecting the corresponding language organizes the text content into a Q\&A format, which requires additional token consumption.
#### Embedding Model
Please refer to the model provider's documentation and pricing information before use.
Different embedding models are suitable for different scenarios. For example, Cohere's `embed-english` is suitable for English documents, and `embed-multilingual` is suitable for multilingual documents.
#### Retrieval Settings
Dify provides three retrieval functions: vector retrieval, full-text retrieval, and hybrid retrieval. Hybrid retrieval is the most commonly used.
In hybrid retrieval, you can set weights or use a reranking model. When setting weights, you can set whether the retrieval should focus more on semantics or keywords. For example, in the image below, semantics account for 70% of the weight, and keywords account for 30%.
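Under the hood, a weighted hybrid score is just a linear blend of the two retrieval scores. A sketch using the 70/30 split from the example (the exact formula Dify uses internally may differ):

```python
def hybrid_score(semantic_score: float, keyword_score: float,
                 semantic_weight: float = 0.7) -> float:
    """Blend semantic and keyword relevance with the configured weights."""
    return semantic_weight * semantic_score + (1 - semantic_weight) * keyword_score

# A chunk that matches well semantically but weakly on keywords:
score = hybrid_score(0.8, 0.4)  # 0.7 * 0.8 + 0.3 * 0.4 = 0.68
```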

Clicking **Save and Process** will process the document. After processing, the document can be used in the application.

#### Syncing from a Website
In many cases, we need to build a smart customer service bot based on help documentation. Taking Dify as an example, we can convert the [Dify help documentation](https://docs.dify.ai) into a knowledge base.
Currently, Dify supports processing up to 50 pages. Please pay attention to the quantity limit. If exceeded, you can create a new knowledge base.

#### Adjusting Knowledge Base Content
After the knowledge base has processed all documents, it is best to check the coherence of the segmentation in the knowledge base. Incoherence will affect the retrieval effect and needs to be manually adjusted.
Click on the document content to browse the segmented content. If there is irrelevant content, you can disable or delete it.
If content is segmented into another paragraph, it also needs to be adjusted back.
#### Recall Test
In the document page of the knowledge base, click Recall Test in the left sidebar to input keywords to test the accuracy of the retrieval results.
### Add Nodes
Enter the created APP, and let's start building the smart customer service bot.
#### Question Classification Node
You need to use a question classification node to separate different user needs. In some cases, users may even chat about irrelevant topics, so you need to set a classification for this as well.
To make the classification more accurate, you need to choose a better LLM, and the classification needs to be specific enough with sufficient distinction.
Here is a reference classification:
* User asks irrelevant questions
* User asks Dify-related questions
* User requests explanation of technical terms
* User asks about joining the community

#### Direct Reply Node
In the question classification, "User asks irrelevant questions" and "User asks about joining the community" do not need LLM processing to reply. Therefore, you can directly connect a direct reply node after these two questions.
"User asks irrelevant questions":
You can guide the user to the help documentation, allowing them to try to solve the problem themselves, for example:
```
I'm sorry, I can't answer your question. If you need more help, please check the [help documentation](https://docs.dify.ai).
```
Dify supports Markdown formatted text output. You can use Markdown to enrich the text format in the output. You can even insert images in the text using Markdown.
#### Knowledge Retrieval Node
Add a knowledge retrieval node after "User asks Dify-related questions" and check the knowledge base to be used.
#### LLM Node
In the next node after the knowledge retrieval node, you need to select an LLM node to organize the content retrieved from the knowledge base.
The LLM needs to adjust the reply based on the user's question to make the reply more appropriate.
Context: You need to use the output of the knowledge retrieval node as the context of the LLM node.
System prompt: Based on `{{context}}`, answer `{{user question}}`.

You can type `/` or `{` to reference variables in the prompt writing area. Variables starting with `sys.` are system variables; please refer to the help documentation for details.
In addition, you can enable LLM memory to make the user's conversation experience more coherent.
### Question 1: How to Connect External Knowledge Bases
In the knowledge base function, you can connect external knowledge bases through external knowledge base APIs, such as the AWS Bedrock knowledge base.

### Question 2: How to Manage Knowledge Bases Through APIs
In both the community edition and SaaS version of Dify, you can add, delete, and query the status of knowledge bases through the knowledge base API.

In the instance with the knowledge base deployed, go to **Knowledge Base -> API** and create an API key. Please keep the API key safe.
### Question 3: How to Embed the Customer Service Bot into a Webpage
After application deployment, select embed webpage, choose a suitable embedding method, and paste the code into the appropriate location on the webpage.
# Simple Chatbot
Source: https://docs.dify.ai/en/use-dify/tutorials/simple-chatbot
Hello World
The real value of Dify lies in how easily you can build, deploy, and scale an idea no matter how complex. It's built for fast prototyping, smooth iteration, and reliable deployment at any level.
Let's start by learning reliable LLM integration into your applications. In this guide, you'll build a simple chatbot that classifies the user's question, responds directly using the LLM, and enhances the response with a country-specific fun fact.
## Step 1: Create a New Workflow (2 min)
1. Go to **Studio** > **Workflow** > **Create from Blank** > **Orchestrate** > **New Chatflow** > **Create**
## Step 2: Add Workflow Nodes (6 min)
When you want to reference any variable, type `{` or `/` first and you can see the different variables available in your workflow.
### 1. LLM Node and Output: Understand and Answer the Question
`LLM` node sends a prompt to a language model to generate a response based on user input. It abstracts away the complexity of API calls, rate limits, and infrastructure, so you can just focus on designing logic.
Create an LLM node using the `Add Node` button and connect it to your Start node
Choose a default model
Paste this into the System Prompt field:
```text theme={null}
The user will ask a question about a country. The question is {{sys.query}}
Tasks:
1. Identify the country mentioned.
2. Rephrase the question clearly.
3. Answer the question using general knowledge.
Respond in the following JSON format:
{
  "country": "",
  "question": "",
  "answer": ""
}
```
**Enable Structured Output** allows you to easily control what the LLM will return and ensure consistent, machine-readable outputs for downstream use in precise data extraction or conditional logic.
* Toggle Output Variables Structured ON > `Configure` and click `Import from JSON`
* Paste:
```json theme={null}
{
  "country": "string",
  "question": "string",
  "answer": "string"
}
```
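With structured output enabled, downstream nodes can rely on the reply matching this schema. A quick sketch of what that guarantee buys you, using a hypothetical raw reply:

```python
import json

SCHEMA_KEYS = {"country", "question", "answer"}

def parse_structured_output(raw: str) -> dict:
    """Parse the LLM's JSON reply and verify it matches the declared schema."""
    data = json.loads(raw)
    missing = SCHEMA_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data

# Hypothetical reply from the LLM node:
reply = '{"country": "France", "question": "What is the capital of France?", "answer": "Paris"}'
parsed = parse_structured_output(reply)
```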
### 2. Code Block: Get Fun Fact
`Code` node executes custom logic using code. It lets you inject code exactly where needed—within a visual workflow—saving you from wiring up an entire backend.
Create a `Code` Node using the `Add Node` button and connect to LLM block
Change one `Input Variable` name to "country" and set the variable to `structured_output` > `country`
Paste this code into `PYTHON3`:
```python theme={null}
def main(country: str) -> dict:
    country_name = country.lower()
    fun_facts = {
        "japan": "Japan has more than 5 million vending machines.",
        "france": "France is the most visited country in the world.",
        "italy": "Italy has more UNESCO World Heritage sites than any other country."
    }
    fun_fact = fun_facts.get(country_name, f"No fun fact available for {country.title()}.")
    return {"fun_fact": fun_fact}
```
Change output variable `result` to `fun_fact` to have a better labeled variable
### 3. Answer Node: Final Answer to User
`Answer` Node creates a clean final output to return.
Create an `Answer` Node using the `Add Node` button
Paste into the Answer Field:
```text theme={null}
Q: {{ structured_output.question }}
A: {{ structured_output.answer }}
Fun Fact: {{ fun_fact }}
```
End Workflow:
***
## Step 3: Test the Bot (3 min)
Click `Preview`, then ask:
* "What is the capital of France?"
* "Tell me about Japanese cuisine"
* "Describe the culture in Italy"
* Any other questions
Make sure your Bot works as expected!
## You've Completed the Bot!
This guide showed how to integrate language models reliably and scalably without reinventing infrastructure. With Dify's visual workflows and modular nodes, you're not just building faster, you're adopting a clean, production-ready architecture for LLM-powered apps.
# Twitter Account Analyzer
Source: https://docs.dify.ai/en/use-dify/tutorials/twitter-chatflow
## Introduction
In Dify, you can use some crawler tools, such as Jina, which can convert web pages into markdown format that LLMs can read.
Recently, [wordware.ai](https://www.wordware.ai/) has brought to our attention that we can use crawlers to scrape social media for LLM analysis, creating more interesting applications.
However, X (formerly Twitter) stopped providing free API access on February 2, 2023, and has since upgraded its anti-crawling measures, so tools like Jina cannot access X's content directly.
> Starting February 9, we will no longer support free access to the Twitter API, both v2 and v1.1. A paid basic tier will be available instead 🧵
>
> — Developers (@XDevelopers) [February 2, 2023](https://twitter.com/XDevelopers/status/1621026986784337922?ref_src=twsrc%5Etfw)
Fortunately, Dify also has an HTTP tool, which allows us to call external crawling tools by sending HTTP requests. Let's get started!
## **Prerequisites**
### Register Crawlbase
Crawlbase is an all-in-one data crawling and scraping platform designed for businesses and developers.
Moreover, using Crawlbase Scraper allows you to scrape data from social platforms like X, Facebook and Instagram.
Click to register: [crawlbase.com](https://crawlbase.com)
### Deploy Dify locally
Dify is an open-source LLM app development platform. You can choose cloud service or deploy it locally using docker compose.
If you don't want to deploy it locally, register a free Dify Cloud sandbox account here: [https://cloud.dify.ai/signin](https://cloud.dify.ai/signin).
Dify Cloud Sandbox users get 200 free credits, equivalent to 200 GPT-3.5 messages or 20 GPT-4 messages.
The following are brief tutorials on how to deploy Dify:
#### Clone Dify
```bash theme={null}
git clone https://github.com/langgenius/dify.git
```
#### **Start Dify**
```bash theme={null}
cd dify/docker
cp .env.example .env
docker compose up -d
```
### Configure LLM Providers
Configure Model Provider in account setting:

## Create a chatflow
Now, let's get started on the chatflow.
Click on `Create from Blank` to start:

The initialized chatflow should be like:

## Add nodes to chatflow

### Start node
In the start node, we can add some system variables at the beginning of a chat. In this article, we need a Twitter user's ID as a string variable. Let's name it `id`.
Click on Start node and add a new variable:

### Code node
According to [Crawlbase docs](https://crawlbase.com/docs/crawling-api/scrapers/#twitter-profile), the variable `url` (this will be used in the following node) should be `https://twitter.com/` + `user id` , such as `https://twitter.com/elonmusk` for Elon Musk.
To convert the user ID into a complete URL, we can use the following Python code to integrate the prefix `https://twitter.com/` with the user ID:
```python theme={null}
def main(id: str) -> dict:
    return {
        "url": "https://twitter.com/" + id,
    }
```
Add a code node and select python, and set input and output variable names:

### HTTP request node
Based on the [Crawlbase docs](https://crawlbase.com/docs/crawling-api/scrapers/#twitter-profile), to scrape a Twitter user’s profile in http format, we need to complete HTTP request node in the following format:

Importantly, do not enter the token value directly as plain text; this is bad practice for security reasons. In the latest version of Dify, you can set token values in **`Environment Variables`**: click `env` - `Add Variable` to store the token so that it never appears as plain text in the node.
Check [https://crawlbase.com/dashboard/account/docs](https://crawlbase.com/dashboard/account/docs) for your crawlbase API Key.

By typing `/` , you can easily insert the API Key as a variable.

Tap the start button of this node to check whether it works correctly:

### LLM node
Now, we can use LLM to analyze the result scraped by crawlbase and execute our command.
The value `context` should be `body` from HTTP Request node.
The following is a sample system prompt.

## Test run
Click `Preview` to start a test run and input twitter user id in `id`

For example, I want to analyze Elon Musk's tweets and write a tweet about global warming in his tone.

Does this sound like Elon? lol
Click `Publish` in the upper right corner and add it to your website.
Have fun!
## Lastly…
### Other X(Twitter) Crawlers
In this article, I've introduced Crawlbase. It is probably the cheapest Twitter crawler service available, but it sometimes fails to scrape the content of user tweets correctly.
The Twitter crawler service used by [wordware.ai](https://www.wordware.ai/) mentioned earlier is **Tweet Scraper V2**, but the subscription for the hosted platform **apify** is \$49 per month.
## Links
* [X@dify\_ai](https://x.com/dify_ai)
* Dify’s repo on GitHub:[https://github.com/langgenius/dify](https://github.com/langgenius/dify)
# Lesson 6: Handle Multiple Tasks (Parameter Extraction & Iteration)
Source: https://docs.dify.ai/en/use-dify/tutorials/workflow-101/lesson-06
Imagine you get an email saying:
> Hi! What exactly is Dify? Also, which models does it support? And do you have a free plan?
If we send this to our current AI assistant, it might only answer the first question or give a vague response to all of them.
We need a way to identify every question first, and then loop through our Knowledge Base to answer them one by one.
## Parameter Extractor
Think of the Parameter Extractor as a highly organized scout. It reads a passage of text (like an email), picks out the specific pieces of information you asked for, and puts them into a neat, organized list.
### Hands-On 1: Add Parameter Extractor
Before we upgrade the email assistant, let's remove these nodes: Knowledge Retrieval, If/Else, LLM, LLM 2, and Variable Aggregator.
Right after the Start node, add the **Parameter Extractor** node.
Click Parameter Extractor, and in the **Input Variable** section on the right panel, choose `email_content`.
Since the AI doesn't automatically know which specific information we need from the email, we must tell it to collect all the questions.
Click the **plus (+)** icon next to **Extract Parameters** to start defining what the AI should look for. Let's call it `question_list`.
**Parameter Types**
If Parameter Extractor is a scout, then Type is the bucket they use to carry the info. You need the right bucket for the right information.
**Single Items (The Small Buckets)**
* **String (Text)**: For a single piece of text, e.g. customer's name
* **Number**: For a single digit, e.g. order quantity
* **Boolean**: A simple Yes or No (True/False), good for a judgement result or a decision
**List Items (The Arrays)**
* **Array\[String]**: Array means List, and String means Text. So, `Array[String]` means we are using a basket that can hold multiple pieces of text—like all the separate questions in an email
* **Array\[Number]**: A container that holds multiple numbers, e.g. a list of prices or quantities
* **Array\[Boolean]**: Used to store multiple Yes/No judgment results. For example, checking a list containing multiple to-do items and returning whether each item is completed, such as `[Yes, No, Yes]`
* **Array\[Object]**: An advanced folder that holds sets of data (like a contact list where each entry has a Name and a Phone Number)
1. Based on our needs, choose `Array[String]` as the type for `question_list`.
2. Add a description to provide additional context. You can write: All the questions raised by the user in the email. After that, click **Add**.
In the **Instructions** box below the extracted parameters, type a clear command to tell the AI how to act.
For example: Extract all questions from the email, and make each question a separate item in the list.
By doing this, the node will be able to find all the questions in the email. Now that our scout has successfully gathered the Golden Nuggets, we need to move to the next step: teaching the AI to process each question.
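For the email at the start of this lesson, the node's output would have roughly this shape; the exact wording of each item depends on the model, so treat this as an illustrative sketch:

```python
# Illustrative shape of the Parameter Extractor's output for the sample email.
# `question_list` was declared as Array[String], so its value is a list of strings.
extracted = {
    "question_list": [
        "What exactly is Dify?",
        "Which models does it support?",
        "Do you have a free plan?",
    ]
}
```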
## Iteration
With Iteration, your assistant gets a team of identical twins. When you hand over a list (like the questions extracted from the email), a twin appears for every single item on that list.
Each twin takes their assigned item and performs the exact same task you've set up, ensuring nothing gets missed.
### Hands-On 2: Set up Iteration Node
1. Add an Iteration node after the Parameter Extractor.
2. Click on the Iteration node and navigate to the Input panel on the right.
3. Select `{x} question_list` from the Parameter Extractor. Leave the output variable blank for now.
**Advanced Options in Iteration**
In the Iteration panel, you'll see more settings. Let's have a quick walk-through.
**Parallel Mode**: OFF (Default)
* When disabled, the workflow processes each item in the list one after another (finish Question 1, then move to Question 2).
* When enabled, the workflow attempts to process all items in the list simultaneously (similar to 5 chefs cooking 5 different dishes at the same time).
**Error Response Method**: Terminate on error by default.
* **Terminate**: This means if any single item in the list (e.g., the 2nd question) fails during the sub-process, the entire workflow will stop immediately
* **Ignore Error and Continue**: This means even if the 2nd question fails, the workflow will skip it and move on to process the remaining questions
* **Remove Abnormal Output**: Similar to ignore, but it also removes that specific failed item from the final output list results
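The settings above can be sketched in plain Python; the function name and the string values for the error modes here are illustrative, not Dify's internal API:

```python
from concurrent.futures import ThreadPoolExecutor

def run_iteration(items, process, parallel=False, error_mode="terminate"):
    """Mimic the Iteration node: run `process` once per item in `items`."""
    def safe(item):
        try:
            return process(item)
        except Exception:
            if error_mode == "terminate":
                raise  # stop the whole workflow on the first failure
            return None  # placeholder for a failed item

    if parallel:
        # Parallel Mode ON: process all items at once (results keep list order).
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(safe, items))
    else:
        # Parallel Mode OFF: one item after another.
        results = [safe(item) for item in items]

    if error_mode == "remove_abnormal_output":
        # Drop failed items from the final output list.
        results = [r for r in results if r is not None]
    return results
```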
Back to the workflow, you'll see a sub-process area under the Iteration node. Every node inside this box will run once for every question.
1. Inside the Iteration box, add a Knowledge Retrieval node.
2. Set the query text to `{x} item`. In Iteration, item always refers to the question that is currently being processed.
1. Add an LLM node after Knowledge Retrieval.
2. Configure it to answer the question based on the retrieved context.
Remember Lesson 4? Use those Prompt skills and don't forget context!
Feel free to use the prompt below:
**System**:
```plaintext wrap theme={null}
You are a professional Dify Customer Service Manager. Please provide a response to questions strictly based on the `Context`.
```
**User**:
```plaintext wrap theme={null}
questions: Iteration/{x} item
```
Since the iteration node generates an answer for each individual question, we need to gather all these answers to create one complete reply email.
1. Click the Iteration node.
2. In the **Output Variable**, select the variable representing the LLM's answer inside the loop. Now, the Iteration node will collect every answer and gather them into a new list.
Finally, connect one last LLM node. This final editor will take all the collected answers and polish them into one professional email.
Don't forget to add prompts to the system and user messages. Feel free to refer to the prompts below.
**System**:
```plaintext wrap theme={null}
You are a professional customer service assistant. Please organize the answers prepared for the customer into a clear and complete email reply.
Sign the email as Anne.
```
**User**:
```plaintext wrap theme={null}
answers: Iteration/{x}output
customer: User Input/{x}customer_name
```
1. Click on the checklist to see if anything is missing. According to the notes, we need to connect the Output node and fix the invalid variable issue.
2. Connect the Output node to the LLM 2 node before it, remove its previous variable, then select `text` in LLM 2 as the output variable.
Now, you can write a test email with 3 different questions and check the generated reply.
## Mini Challenge
What else could the Parameter Extractor find?
Try exploring the parameter types it supports.
# Lesson 7: Enhance Workflows (Plugins)
Source: https://docs.dify.ai/en/use-dify/tutorials/workflow-101/lesson-07
Our email assistant can now flip through our knowledge base. But what if a customer asks a beyond-knowledge-base question, like: What is in the latest Dify release?
If the knowledge base hasn't been updated yet, the workflow will be at a loss. To fix this, we need to equip it with a Live Search skill!
## Tools
Tools are the superpower for your AI workflow.
The [Dify Marketplace](https://marketplace.dify.ai/) is like a Plugin Supermarket. It's filled with ready-made superpowers—searching Google, checking the weather, drawing images, or calculating complex math. You just install and plug them into your workflow with several clicks.
Now, let's continue upgrading the current workflow.
### Hands-On 1: Upgrade the Sub-process Area in Iteration
We are going to add a new logic to our assistant: Check the Knowledge Base first; if the answer isn't there, go search Google.
To focus on the new logic, let's keep only these nodes: **User input, Parameter Extractor, and Iteration**.
#### Step 1: Knowledge Query and the Judge
1. Click to enter the sub-process area of the Iteration node.
2. Keep the Knowledge Retrieval node, and make sure the query variable is `{x} item`.
3. Delete the previous LLM node.
Add an LLM node right after the Knowledge Retrieval node. Its job is to decide whether the Knowledge Base info can actually answer the questions.
* **For the Context section**: Select `Knowledge Retrieval / {x} result Array[Object]` from Knowledge Retrieval
* **System Prompt**:
```plaintext wrap theme={null}
Based on the `Context`, determine if the answer contains enough information to answer the questions. If the information is insufficient, you MUST reply with: "Information not found in knowledge base".
```
* **User Message**:
```plaintext wrap theme={null}
questions: Iteration/{x} item
```
Here's what it looks like on the canvas.
#### Step 2: Setting the Crossroads
After the LLM node, let's add an If/Else node. Set the rule: If LLM Output **Contains** *Information not found in knowledge base*.
This covers the case where the knowledge base can't answer the question.
Let's connect a search tool after the IF branch. When the knowledge base can't find relevant information, we use web search to find the answers.
1. After the IF node, click the plus (+) icon and select Tool.
2. In the search box, type Google. Hover over Google, click Install on the right, and then click Install again in the pop-up window.
Click Google Search inside the Google plugin.
Using Google Search for the first time requires authorization—it's like needing a Wi-Fi password.
1. Click API Key Authorization Configuration, then click Get your SerpApi API key from SerpApi. Sign in to get your private API key.
Your API Key is your passport to the outside world. Keep it safe and avoid sharing it with others.
2. Copy and paste the API key into the SerpApi API key field. Click **Save**.
3. Once the API key is successfully authorized, the settings panel shows up immediately. Head to the Query string field, and select `Iteration/{x} item`.
Now, we need different ways to answer depending on which path we're looking at.
**The Search Answer Path**
Add a new LLM node to answer the question based on the search results. Connect it to the Google Search node.
**System**:
```plaintext wrap theme={null}
You are a Web Research Specialist. Based on Google Search, concisely answer the user's questions. Please do not mention the knowledge base in your response.
```
**User Message**:
```plaintext wrap theme={null}
results: GOOGLESEARCH/{x} text
questions: Iteration/{x} item
```
**The Knowledge Base Answer Path**
After the Else node, add a new LLM node to handle answers based on the knowledge base.
**System**:
```plaintext wrap theme={null}
You are a professional Dify Customer Service Manager. Strictly follow the `Context` to reply to questions.
```
**User Message**:
```plaintext wrap theme={null}
questions: Iteration/{x} item
```
1. In the sub-process (inside the Iteration box), add a Variable Aggregator node at the very end, connecting both LLM 2 and LLM 3 to it.
2. In the Variable Aggregator panel, select the variables `LLM 2/{x}text String` and `LLM 3/{x}text String` as the Assign Variables.
In this way, we're merging the two possible answers into a single path.
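Conceptually, the aggregation picks whichever branch actually produced a value; since only one If/Else branch runs per question, the other candidate is empty. A simplified stand-in for what the node does:

```python
def aggregate(*candidates):
    # Return the first assigned variable that has a value. Only one If/Else
    # branch runs, so exactly one candidate is expected to be non-None.
    for value in candidates:
        if value is not None:
            return value
    return None
```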
This is how the current workflow looks.
#### Step 3: The Final Email Assembly
Now that our logic branches have finished processing, let's combine all the answers into a single, polished email.
Click on the Iteration node, and set `Variable Aggregator/{x}output String` as the output variable.
After the Iteration node, connect a new LLM node to summarize all outputs. Feel free to use the prompt below.
**System**:
```plaintext wrap theme={null}
You are a professional Customer Service Manager. Summarize all the answers to the questions, and organize them into a clear and complete email reply for the customer.
Do not include content where the knowledge base could not find relevant information.
Signature: Anne.
```
**User Message**:
```plaintext wrap theme={null}
answers: Iteration/ {x} output
customer: User Input / {x} customer_name
```
After the LLM node, add an End node. Set the output variable as `LLM 4/{x}text String`.
We have now completed the entire setup and configuration of the workflow. Our email assistant can now answer questions based on the Knowledge Base and use Google Search for supplementary answers when needed.
Try sending an email with a question that definitely isn't in the knowledge base. Let's see if the AI successfully uses Google to find the answer.
## Mini Challenge
1. What other conditions can you choose in the If/Else node?
2. Browse the Marketplace to see if you can add another tool to this workflow.
# Lesson 8: The Agent Node
Source: https://docs.dify.ai/en/use-dify/tutorials/workflow-101/lesson-08
Let's look back at the upgrades we've made to our email assistant.
* Learned to Read: It can search a Knowledge Base
* Learned to Choose: It uses Conditions to make decisions
* Learned to Multitask: It handles multiple questions via Iteration
* Learned to Use Tools: It can access the Internet via Google Search
You might have noticed that our workflow is no longer just a straight line (Step 1 → Step 2 → Step 3).
It's becoming a system that analyzes, judges, and calls upon different abilities to solve problems. This advanced pattern is what we call an Agentic Workflow.
## Agentic Workflow
An Agentic Workflow isn't just Input > Process > Output.
It involves thinking, planning, using tools, and adjusting based on results. It transforms the AI from a simple Executor (who just follows orders) into an intelligent Agent (who solves problems autonomously).
## Agent Strategies
To make Agents work smarter, researchers designed Strategies—think of these as different modes of thinking that guide the Agent.
* **ReAct (Reason + Act)**
The Think, then Do approach. The Agent thinks (What should I do?), acts (calls a tool), observes the result, and then thinks again. It loops until the job is done.
* **Plan-and-Execute**
Make a full plan first, then do it step-by-step.
* **Chain of Thought (CoT)**
Writing out the reasoning steps before giving an answer to improve accuracy.
* **Self-Correction**
Checking its own work and fixing mistakes.
* **Memory**
Equipping the Agent with short-term or long-term memory allows it to recall previous conversations or key details, enabling more coherent and personalized responses.
In Lesson 7, we manually built a Brain using Knowledge Retrieval → LLM to Decide → If/Else → Search. It worked, but it was complicated to build.
Is there a simpler way? Yes, and here it is.
## Agent Node
The Agent Node is a highly packaged intelligent unit.
You just need to set a Goal for it through instructions and provide the Tools it might need. Then, it can autonomously think, plan, select, and call tools internally (using the selected Agent Strategy, such as ReAct, and the model's Function Calling capability) until it completes your set goal.
In Dify, this greatly simplifies the process of building complex Agentic Workflows.
## Hands-on 1: Build with Agent Node
Our goal is to replace that complex manual logic inside our Iteration loop with a single, smart Agent Node.
Go to the sub-process of the Iteration node. Keep the Knowledge Retrieval node, and delete the other nodes inside it.
Add an Agent node right after the Knowledge Retrieval node.
Since we haven't used this before, we need to install a strategy from the Marketplace.
Click the Agent node. In the right panel, look for Agent Strategy. Click Find more in Marketplace.
In the Marketplace, find Dify Agent Strategy and install it.
Back in your workflow (refresh if needed), select ReAct under Agent Strategy.
**Why ReAct here?**
ReAct (Reason + Act) is a strategy that mimics human problem-solving using a Think → Do → Check loop.
1. Reason: The Agent thinks, What should I do next? (e.g., Check the Knowledge Base).
2. Act: It performs the action.
3. Observe: It checks the result. If the answer isn't found, it repeats the cycle (e.g., Okay, I need to search Google).
This thinking-while-doing approach is perfect for complex tasks where the next step depends on the previous result.
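The Think → Do → Check loop can be sketched as a few lines of Python. The decision rules below are hard-coded stand-ins for what the LLM reasons out at runtime, and the tool callables are hypothetical:

```python
def react_loop(question, knowledge_lookup, web_search, max_steps=3):
    """A toy ReAct loop: reason about the next action, act, observe, repeat."""
    observation = None
    for _ in range(max_steps):
        # Reason: decide the next action from what we have observed so far.
        if observation is None:
            action = "check_knowledge_base"
        elif observation == "not found":
            action = "search_google"
        else:
            return observation  # answer found, stop the loop

        # Act, then Observe the result.
        if action == "check_knowledge_base":
            observation = knowledge_lookup(question)
        else:
            observation = web_search(question)
    return observation
```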
ReAct is a thinking strategy, but to actually pull off the action part, the AI needs the right "physical" skill, which is called **Function Calling**. Select a model that supports Function Calling. Here, we choose gpt-5.
**Why Function Calling?**
One of the core capabilities of an Agent Node is to autonomously call tools. Function Calling is the key technology that allows the model to understand when and how to use the tools you provide (like Google Search).
If the model doesn't support this feature, the Agent cannot effectively interact with tools and loses most of its autonomous decision-making capabilities.
Click Agent node. Click plus(+) icon in tool list and select Google Search.
We need to tell the Agent specifically what to do with the tools and context we are giving it. Copy and paste the instructions into the Instruction field:
```plaintext wrap theme={null}
Goal: Answer user questions about Dify products.
Steps:
1. I have provided a relevant internal knowledge base retrieval result. First, judge if this result can fully answer the user's questions.
2. If the context clearly answers it, generate the final answer based on the context.
3. If the answer is insufficient or irrelevant, use the Google Search tool to find the latest information and generate the answer based on search results.
Requirement: Keep the final answer concise and accurate.
```
Your configuration here is crucial for the Agent to see the data.
* **Context**: Select `Knowledge Retrieval / {x} result Array[Object]` from the Knowledge Retrieval node (this passes the knowledge base content to the Agent).
* **Query**: Select `Iteration/{x} item` from the Iteration node.
**Why item instead of the original email\_content?**
We used the Parameter Extractor to extract a list of questions (`question_list`) from the `email_content`. The Iteration node is processing this list one by one, where item represents the specific question currently being handled.
Using `item` as the query input allows the Agent to focus on the current task, improving the accuracy of its decisions and actions.
Select `Agent/{x}text String` as the output variable.
🎉 The Iteration node is now upgraded.
Since the Iteration node generates a list of answers, we need to stitch them back together into one email.
## Hands-on 2: Final Assembly
1. Add an LLM node after the Iteration node.
2. Click on it and add a prompt to the system message. Feel free to use the prompt below, or edit it yourself.
```plaintext wrap theme={null}
Combine all answers for the original email.
Write a complete, clear, and friendly reply to the customer.
Signature: Anne
```
3. Add a user message, replacing the answers, email content, and customer name with variables respectively. Here's how the LLM node looks right now.
Set the output variable to the LLM's text and name it `email_reply`.
Here comes the final workflow.
Click **Test Run**. Ask a mix of questions. Watch how the Agent Node autonomously decides when to use the context and when to use Google search.
## Mini Challenge
1. Could we use an Agent Node to replace the entire Iteration loop? How would you design the prompt to handle a list of questions all at once?
2. What other information could you feed into the Agent's Context field to help it make better decisions?
# Lesson 9: Layout Designer (Template)
Source: https://docs.dify.ai/en/use-dify/tutorials/workflow-101/lesson-09
In Lesson 8, we successfully built a powerful Agent that can think and search. However, you might have noticed a tiny issue: even though we asked the final LLM to list the answers, sometimes the formatting can be a bit messy or inconsistent (e.g., mixing bullet points with paragraphs).
To fix this, we need a dedicated format assistant to organize the answers into a beautiful, standardized format before the final LLM writes the email.
## Template
The Template node takes your original data (like the list of answers), follows the strict design template/standards you provide, and generates a perfectly formatted block of text, ensuring consistency every single time.
## Hands-On: Polish the Email Layout
Since the Template node will handle the greetings, we need to tell the LLM to focus solely on the questions and answers. Copy and paste the prompt below, or feel free to edit it.
```plaintext wrap theme={null}
Combine all answers for the original email. Write a complete, clear, and friendly reply that only includes the summarized answers.
IMPORTANT: Focus SOLELY on the answers. Do NOT include greetings (like "Hi Name"), do
NOT write intro paragraphs (like "Thank you for reaching out"), and do NOT include
signatures.
```
Then reference the corresponding variables in the user message.
After the LLM node, click to add a Template node.
Click the Template node, go to the Input Variables section, and add these two items:
* `customer`: Choose `User Input / {x} customer_name String`
* `body`: Choose `LLM / {x} text String`
**What is Jinja2?**
In simple terms, Jinja2 is a tool that allows you to format variables (like your list of answers) into a text template exactly how you want. It uses simple symbols to mark where variables go and perform basic logic. With it, we can turn a raw list of data into a neat, standardized text block.
Here, we can put together opening, signatures, and email body to make sure the email is professional and consistent every time.
Copy and paste this exact layout into the Template code box:
```jinja theme={null}
Hi {{ customer }},
Thank you for reaching out to us, and we are more than happy to provide you with the information you are seeking.
Here are the details regarding your specific questions:
{{ body }}
---
Thank you for reaching out to us!
Best regards,
Anne
```
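Under the hood, rendering boils down to substituting each `{{ name }}` placeholder with its variable's value. Here is a stdlib-only sketch of that substitution; Dify's Template node uses the real Jinja2 library, which also supports loops, filters, and conditionals:

```python
import re

def render(template: str, **variables) -> str:
    # Replace each {{ name }} placeholder with the matching variable,
    # the way Jinja2 substitutes simple variables in a template.
    def substitute(match):
        return str(variables.get(match.group(1), ""))
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)
```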
Here's the final workflow.
Click **Test Run**. Ask multiple questions in one email. Notice how your final output has a perfectly written custom intro, the LLM's beautifully summarized answers in the middle, and a standard, professional signature at the bottom.
## Mini Challenge
1. How would you change the Jinja2 code to make a numbered list (1. Answer, 2. Answer) instead of bullet points?
Check the [Template Designer Documentation](https://jinja.palletsprojects.com/en/stable/templates/) or ask an LLM about it.
2. What else can Template node do?
# Lesson 10: Publish and Monitor Your AI App
Source: https://docs.dify.ai/en/use-dify/tutorials/workflow-101/lesson-10
After building and tuning, your Email Assistant is now fully ready. It can read knowledge bases, use search tools, and generate beautifully formatted replies. But right now, it's still sitting inside your Dify Studio and only you can see it.
How do we share it with others? How do we know if it's working correctly when we aren't watching?
It's time for the final two critical steps: Publish and Monitor.
## Publish Your Application
1. Move your mouse to the top right corner of the canvas and click the **Publish** button. You'll see other buttons light up.
Whenever you make changes to your workflow, you must click **Publish → Update** to save them.
If you don't update, the live version will remain the old one.
2. Once published, the grayed-out buttons become clickable.
1. **Share Your App**
Click **Run App**. Dify automatically generates a WebApp for you. This is a ready-to-use chat interface for your Email Assistant.
You can send this URL to colleagues or friends. They don't need to log in to Dify to use the email assistant.
2. **Batch Run App**
If you have 100 emails to reply to, copying and pasting them one by one will drag you down.
In Dify, all you need to do is prepare a CSV file with the 100 emails and upload it to Dify's Batch Run feature. Dify processes all 100 emails automatically and returns a spreadsheet with all the generated replies.
Since we set specific variables (like `email_content`), your CSV must match that format. Dify provides a template you can download to make this easy.
3. **Others**
* **Access API Reference**: If you know how to code, you can get an API Key to integrate this workflow directly into your own website or mobile app
* **Open in Explore**: Pin this app to your workspace sidebar for quick access next time
* **Publish as a Tool**: Package your workflow as a plugin so other Agents can use your Email Assistant as a tool
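Back to the Batch Run feature: preparing such a CSV in code can be sketched as follows. The column headers here mirror this tutorial's input variables (`customer_name` and `email_content`); download Dify's own template to confirm the exact headers it expects:

```python
import csv
import io

def build_batch_csv(rows):
    # Write one row per email; the headers must match the workflow's
    # input variable names (confirm against Dify's downloadable template).
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["customer_name", "email_content"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```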
## Monitor Your App
As the creator, you need to understand the status of this assistant. By monitoring and using logs, you can check the health, performance, and costs.
### The Command Center: Monitoring
Click **Monitoring** on the left sidebar to see how your app is performing.
| Name | Explanation |
| :--------------------- | :----------------------------------------------------------------------------------- |
| Total Messages | How many times users interacted with the AI today. It shows how popular your app is. |
| Active Users | The number of unique people engaging with the AI. |
| Token Usage | How many tokens the AI used. Watch for sudden spikes to control costs. |
| Avg. User Interactions | The average number of interactions per user. Do users ask follow-up questions? |
### The Magnifying Glass: Logs
Logs record the details of every single run: time, input, duration, and output. To access detailed records, click Logs in the left sidebar.
**Why Logs?**
* **Debugging**: User says *It doesn't work*? Check the logs to replay the *crime scene* and see exactly which node failed.
* **Performance**: See how long each node took. Find the blocker that is slowing things down.
* **Understand Users**: Read what users are actually asking. Use this real data to update your Knowledge Base or improve your Prompts.
* **Cost Control**: Check exactly how many tokens a specific run cost.
| Name | Explanation |
| :------------------ | :---------------------------------------------------------- |
| Start Time | The time when the workflow was triggered |
| Status | Success or Failure. |
| Run Time | How long the whole process took. |
| Tokens | The tokens consumed by this run. |
| End User or Account | The specific user ID or account that initiated the session. |
| Triggered By | WebApp interface or called via API. |
You can click on each log entry to view more details. For example, you can identify frequently asked user questions and use them to update your Knowledge Base in a timely manner.
Building an AI app is a new starting point, and this is the core of **LLMOps** (Large Language Model Operations).
1. **Observe**: Look at the Logs. What are users asking? Are they happy with the answers?
2. **Analyze**: Identify where hallucinations happen on certain questions, or which tools fail often
3. **Optimize**: Go back to the Canvas. Edit the Prompt, add a document to the Knowledge Base, or tweak the workflow logic
4. **Publish**: Release the upgraded version
By repeating this cycle, your Email Assistant gets smarter and faster.
## Thank You
**Thank you for your time. You're now a Dify builder with a new way of thinking:**
```plaintext wrap theme={null}
Break down the task → Choose Nodes and Tools → Connect them with the right logic → Monitor and upgrade
```
Now, feel free to open a template in Dify Explore. Break it down, analyze it, or start building a workflow from scratch that solves a task in your daily work.
May your workload get lighter and your imagination go higher. Happy building with Dify.
# Get App Info
Source: https://docs.dify.ai/api-reference/applications/get-app-info
/en/api-reference/openapi_completion.json get /info
Retrieve basic information about this application, including name, description, tags, and mode.
# Get App Meta
Source: https://docs.dify.ai/api-reference/applications/get-app-meta
/en/api-reference/openapi_completion.json get /meta
Retrieve metadata about this application, including tool icons and other configuration details.
# Get App Parameters
Source: https://docs.dify.ai/api-reference/applications/get-app-parameters
/en/api-reference/openapi_completion.json get /parameters
Retrieve the application's input form configuration, including feature switches, input parameter names, types, and default values.
# Get App WebApp Settings
Source: https://docs.dify.ai/api-reference/applications/get-app-webapp-settings
/en/api-reference/openapi_completion.json get /site
Retrieve the WebApp settings of this application, including site configuration, theme, and customization options.
# Create Child Chunk
Source: https://docs.dify.ai/api-reference/chunks/create-child-chunk
/en/api-reference/openapi_knowledge.json post /datasets/{dataset_id}/documents/{document_id}/segments/{segment_id}/child_chunks
Create a child chunk under a parent chunk. Only available for documents using the `hierarchical_model` chunking mode.
# Create Chunks
Source: https://docs.dify.ai/api-reference/chunks/create-chunks
/en/api-reference/openapi_knowledge.json post /datasets/{dataset_id}/documents/{document_id}/segments
Create one or more chunks within a document. Each chunk can include optional keywords and an answer field (for QA-mode documents).
# Delete Child Chunk
Source: https://docs.dify.ai/api-reference/chunks/delete-child-chunk
/en/api-reference/openapi_knowledge.json delete /datasets/{dataset_id}/documents/{document_id}/segments/{segment_id}/child_chunks/{child_chunk_id}
Permanently delete a child chunk from its parent chunk.
# Delete Chunk
Source: https://docs.dify.ai/api-reference/chunks/delete-chunk
/en/api-reference/openapi_knowledge.json delete /datasets/{dataset_id}/documents/{document_id}/segments/{segment_id}
Permanently delete a chunk from the document.
# Get Chunk
Source: https://docs.dify.ai/api-reference/chunks/get-chunk
/en/api-reference/openapi_knowledge.json get /datasets/{dataset_id}/documents/{document_id}/segments/{segment_id}
Retrieve detailed information about a specific chunk, including its content, keywords, and indexing status.
# List Child Chunks
Source: https://docs.dify.ai/api-reference/chunks/list-child-chunks
/en/api-reference/openapi_knowledge.json get /datasets/{dataset_id}/documents/{document_id}/segments/{segment_id}/child_chunks
Returns a paginated list of child chunks under a specific parent chunk.
# List Chunks
Source: https://docs.dify.ai/api-reference/chunks/list-chunks
/en/api-reference/openapi_knowledge.json get /datasets/{dataset_id}/documents/{document_id}/segments
Returns a paginated list of chunks within a document. Supports filtering by keyword and status.
# Update Child Chunk
Source: https://docs.dify.ai/api-reference/chunks/update-child-chunk
/en/api-reference/openapi_knowledge.json patch /datasets/{dataset_id}/documents/{document_id}/segments/{segment_id}/child_chunks/{child_chunk_id}
Update the content of an existing child chunk.
# Update Chunk
Source: https://docs.dify.ai/api-reference/chunks/update-chunk
/en/api-reference/openapi_knowledge.json post /datasets/{dataset_id}/documents/{document_id}/segments/{segment_id}
Update a chunk's content, keywords, or answer. Re-triggers indexing for the modified chunk.
# Send Completion Message
Source: https://docs.dify.ai/api-reference/completions/send-completion-message
/en/api-reference/openapi_completion.json post /completion-messages
Send a request to the text generation application.
# Stop Completion Message Generation
Source: https://docs.dify.ai/api-reference/completions/stop-completion-message-generation
/en/api-reference/openapi_completion.json post /completion-messages/{task_id}/stop
Stops a completion message generation task. Only supported in `streaming` mode.
# Delete Document
Source: https://docs.dify.ai/api-reference/documents/delete-document
/en/api-reference/openapi_knowledge.json delete /datasets/{dataset_id}/documents/{document_id}
Permanently delete a document and all its chunks from the knowledge base.
# Download Document
Source: https://docs.dify.ai/api-reference/documents/download-document
/en/api-reference/openapi_knowledge.json get /datasets/{dataset_id}/documents/{document_id}/download
Get a signed download URL for a document's original uploaded file.
# Download Documents as ZIP
Source: https://docs.dify.ai/api-reference/documents/download-documents-as-zip
/en/api-reference/openapi_knowledge.json post /datasets/{dataset_id}/documents/download-zip
Download multiple uploaded-file documents as a single ZIP archive. Accepts up to `100` document IDs.
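Because the endpoint caps a request at 100 document IDs, it is worth validating the batch size client-side before issuing the call. A minimal sketch of building the request body (the `document_ids` field name is an assumption based on the endpoint description; confirm against the OpenAPI schema):

```python
def build_zip_request_body(document_ids: list[str]) -> dict:
    """Build the JSON body for the download-zip endpoint.

    The endpoint accepts at most 100 document IDs, so fail fast
    client-side instead of waiting for a server-side error.
    """
    if not document_ids:
        raise ValueError("at least one document ID is required")
    if len(document_ids) > 100:
        raise ValueError(
            f"got {len(document_ids)} IDs; the endpoint accepts at most 100"
        )
    return {"document_ids": document_ids}
```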
# Get Document Indexing Status
Source: https://docs.dify.ai/api-reference/documents/get-document-indexing-status
/en/api-reference/openapi_knowledge.json get /datasets/{dataset_id}/documents/{batch}/indexing-status
Check the indexing progress of documents in a batch. Returns the current processing stage and chunk completion counts for each document. Poll this endpoint until `indexing_status` reaches `completed` or `error`. The status progresses through: `waiting` → `parsing` → `cleaning` → `splitting` → `indexing` → `completed`.
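The polling loop described above reduces to a small state check that stops on the two terminal states. A sketch, where `fetch_status` is a hypothetical callable wrapping the HTTP GET and returning the `indexing_status` string:

```python
TERMINAL_STATES = {"completed", "error"}

def poll_until_done(fetch_status, max_polls: int = 60) -> str:
    """Call fetch_status() repeatedly until a terminal state is reached.

    fetch_status is assumed to return the `indexing_status` string from
    the batch indexing-status response.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        # in real code, sleep between polls, e.g. time.sleep(2)
    raise TimeoutError("indexing did not finish within the polling budget")
```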
# Update Document by File
Source: https://docs.dify.ai/api-reference/documents/update-document-by-file
/en/api-reference/openapi_knowledge.json post /datasets/{dataset_id}/documents/{document_id}/update-by-file
Update an existing document by uploading a new file. Re-triggers indexing — use the returned `batch` ID with [Get Document Indexing Status](/api-reference/documents/get-document-indexing-status) to track progress.
# Update Document by Text
Source: https://docs.dify.ai/api-reference/documents/update-document-by-text
/en/api-reference/openapi_knowledge.json post /datasets/{dataset_id}/documents/{document_id}/update-by-text
Update an existing document's text content, name, or processing configuration. Re-triggers indexing if content changes — use the returned `batch` ID with [Get Document Indexing Status](/api-reference/documents/get-document-indexing-status) to track progress.
# Update Document Status in Batch
Source: https://docs.dify.ai/api-reference/documents/update-document-status-in-batch
/en/api-reference/openapi_knowledge.json patch /datasets/{dataset_id}/documents/status/{action}
Enable, disable, archive, or unarchive multiple documents at once.
# Get End User Info
Source: https://docs.dify.ai/api-reference/end-users/get-end-user-info
/en/api-reference/openapi_completion.json get /end-users/{end_user_id}
Retrieve an end user by ID. Useful when other APIs return an end-user ID (e.g., `created_by` from [Upload File](/api-reference/files/upload-file)).
# List App Feedbacks
Source: https://docs.dify.ai/api-reference/feedback/list-app-feedbacks
/en/api-reference/openapi_completion.json get /app/feedbacks
Retrieve a paginated list of all feedback submitted for messages in this application, including both end-user and admin feedback.
# Submit Message Feedback
Source: https://docs.dify.ai/api-reference/feedback/submit-message-feedback
/en/api-reference/openapi_completion.json post /messages/{message_id}/feedbacks
Submit feedback for a message. End users can rate messages as `like` or `dislike`, and optionally provide text feedback. Pass `null` for `rating` to revoke previously submitted feedback.
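The three rating cases map to a small payload builder. A sketch (the `rating`, `content`, and `user` field names follow the endpoint summary; treat them as assumptions to verify against the schema):

```python
def feedback_payload(rating, content=None, user="end-user-id"):
    """Build the body for POST /messages/{message_id}/feedbacks.

    rating: "like", "dislike", or None (None revokes earlier feedback).
    """
    if rating not in ("like", "dislike", None):
        raise ValueError("rating must be 'like', 'dislike', or None")
    payload = {"rating": rating, "user": user}
    if content is not None:
        payload["content"] = content
    return payload
```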
# Download File
Source: https://docs.dify.ai/api-reference/files/download-file
/en/api-reference/openapi_completion.json get /files/{file_id}/preview
Preview or download files previously uploaded via the [Upload File](/api-reference/files/upload-file) API. Files can only be accessed if they belong to messages within the requesting application.
# Upload File
Source: https://docs.dify.ai/api-reference/files/upload-file
/en/api-reference/openapi_completion.json post /files/upload
Upload a file for use when sending messages, enabling multimodal understanding of images, documents, audio, and video. Uploaded files are for use by the current end-user only.
# List Datasource Plugins
Source: https://docs.dify.ai/api-reference/knowledge-pipeline/list-datasource-plugins
/en/api-reference/openapi_knowledge.json get /datasets/{dataset_id}/pipeline/datasource-plugins
List all datasource plugins available for a knowledge pipeline. Returns published or draft plugins depending on the `is_published` query parameter.
# Run Datasource Node
Source: https://docs.dify.ai/api-reference/knowledge-pipeline/run-datasource-node
/en/api-reference/openapi_knowledge.json post /datasets/{dataset_id}/pipeline/datasource/nodes/{node_id}/run
Execute a single datasource node within the knowledge pipeline. Returns a streaming response with the node execution results.
# Run Pipeline
Source: https://docs.dify.ai/api-reference/knowledge-pipeline/run-pipeline
/en/api-reference/openapi_knowledge.json post /datasets/{dataset_id}/pipeline/run
Execute the full knowledge pipeline for a knowledge base. Supports both streaming and blocking response modes.
# Upload Pipeline File
Source: https://docs.dify.ai/api-reference/knowledge-pipeline/upload-pipeline-file
/en/api-reference/openapi_knowledge.json post /datasets/pipeline/file-upload
Upload a file for use in a knowledge pipeline. Accepts a single file via `multipart/form-data`.
# Create Metadata Field
Source: https://docs.dify.ai/api-reference/metadata/create-metadata-field
/en/api-reference/openapi_knowledge.json post /datasets/{dataset_id}/metadata
Create a custom metadata field for the knowledge base. Metadata fields can be used to annotate documents with structured information.
# Delete Metadata Field
Source: https://docs.dify.ai/api-reference/metadata/delete-metadata-field
/en/api-reference/openapi_knowledge.json delete /datasets/{dataset_id}/metadata/{metadata_id}
Permanently delete a custom metadata field. Documents using this field will lose their metadata values for it.
# Get Built-in Metadata Fields
Source: https://docs.dify.ai/api-reference/metadata/get-built-in-metadata-fields
/en/api-reference/openapi_knowledge.json get /datasets/{dataset_id}/metadata/built-in
Returns the list of built-in metadata fields provided by the system (e.g., document type, source URL).
# List Metadata Fields
Source: https://docs.dify.ai/api-reference/metadata/list-metadata-fields
/en/api-reference/openapi_knowledge.json get /datasets/{dataset_id}/metadata
Returns the list of all metadata fields (both custom and built-in) for the knowledge base, along with the count of documents using each field.
# Update Built-in Metadata Field
Source: https://docs.dify.ai/api-reference/metadata/update-built-in-metadata-field
/en/api-reference/openapi_knowledge.json post /datasets/{dataset_id}/metadata/built-in/{action}
Enable or disable built-in metadata fields for the knowledge base.
# Update Document Metadata in Batch
Source: https://docs.dify.ai/api-reference/metadata/update-document-metadata-in-batch
/en/api-reference/openapi_knowledge.json post /datasets/{dataset_id}/documents/metadata
Update metadata values for multiple documents at once. Each document in the request receives the specified metadata key-value pairs.
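The request pairs each document with the metadata values it should receive. A sketch of assembling that body (the `operation_data` wrapper and the `document_id`/`metadata_list` field names are assumptions based on the endpoint summary; verify them against the OpenAPI schema):

```python
def batch_metadata_body(updates: dict[str, list[dict]]) -> dict:
    """Map {document_id: metadata_list} into the batch-update body."""
    return {
        "operation_data": [
            {"document_id": doc_id, "metadata_list": metadata}
            for doc_id, metadata in updates.items()
        ]
    }
```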
# Update Metadata Field
Source: https://docs.dify.ai/api-reference/metadata/update-metadata-field
/en/api-reference/openapi_knowledge.json patch /datasets/{dataset_id}/metadata/{metadata_id}
Rename a custom metadata field.
# Get Available Models
Source: https://docs.dify.ai/api-reference/models/get-available-models
/en/api-reference/openapi_knowledge.json get /workspaces/current/models/model-types/{model_type}
Retrieve the list of available models by type. Primarily used to query `text-embedding` and `rerank` models for knowledge base configuration.
# Create Knowledge Tag
Source: https://docs.dify.ai/api-reference/tags/create-knowledge-tag
/en/api-reference/openapi_knowledge.json post /datasets/tags
Create a new tag for organizing knowledge bases.
# Create Tag Binding
Source: https://docs.dify.ai/api-reference/tags/create-tag-binding
/en/api-reference/openapi_knowledge.json post /datasets/tags/binding
Bind one or more tags to a knowledge base. A knowledge base can have multiple tags.
# Delete Knowledge Tag
Source: https://docs.dify.ai/api-reference/tags/delete-knowledge-tag
/en/api-reference/openapi_knowledge.json delete /datasets/tags
Permanently delete a knowledge base tag. Does not delete the knowledge bases that were tagged.
# Delete Tag Binding
Source: https://docs.dify.ai/api-reference/tags/delete-tag-binding
/en/api-reference/openapi_knowledge.json post /datasets/tags/unbinding
Remove a tag binding from a knowledge base.
# Get Knowledge Base Tags
Source: https://docs.dify.ai/api-reference/tags/get-knowledge-base-tags
/en/api-reference/openapi_knowledge.json get /datasets/{dataset_id}/tags
Returns the list of tags bound to a specific knowledge base.
# List Knowledge Tags
Source: https://docs.dify.ai/api-reference/tags/list-knowledge-tags
/en/api-reference/openapi_knowledge.json get /datasets/tags
Returns the list of all knowledge base tags in the workspace.
# Update Knowledge Tag
Source: https://docs.dify.ai/api-reference/tags/update-knowledge-tag
/en/api-reference/openapi_knowledge.json patch /datasets/tags
Rename an existing knowledge base tag.
# Convert Audio to Text
Source: https://docs.dify.ai/api-reference/tts/convert-audio-to-text
/en/api-reference/openapi_completion.json post /audio-to-text
Convert an audio file to text. Supported formats: `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `wav`, `webm`. File size limit is `30 MB`.
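Both constraints are easy to enforce before uploading. A client-side check (a pure sketch, not part of the API itself):

```python
ALLOWED_FORMATS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
MAX_BYTES = 30 * 1024 * 1024  # 30 MB limit stated by the endpoint

def validate_audio(filename: str, size_bytes: int) -> None:
    """Reject files the audio-to-text endpoint would refuse anyway."""
    ext = filename.rsplit(".", 1)[-1].lower()
    if ext not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {ext}")
    if size_bytes > MAX_BYTES:
        raise ValueError("file exceeds the 30 MB limit")
```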
# Convert Text to Audio
Source: https://docs.dify.ai/api-reference/tts/convert-text-to-audio
/en/api-reference/openapi_completion.json post /text-to-audio
Convert text to speech.
# Agent Strategy Plugin
Source: https://docs.dify.ai/en/develop-plugin/dev-guides-and-walkthroughs/agent-strategy-plugin
An **Agent Strategy Plugin** helps an LLM carry out tasks like reasoning or decision-making, including choosing and calling tools, as well as handling results. This allows the system to address problems more autonomously.
Below, you’ll see how to develop a plugin that supports **Function Calling** to automatically fetch the current time.
### Prerequisites
* Dify plugin scaffolding tool
* Python environment (version 3.12)
For details on preparing the plugin development tool, see [Initializing the Development Tool](/en/develop-plugin/getting-started/cli).
**Tip**: Run `dify version` in your terminal to confirm that the scaffolding tool is installed.
***
### 1. Initializing the Plugin Template
Run the following command to create a development template for your Agent plugin:
```bash theme={null}
dify plugin init
```
Follow the on-screen prompts and refer to the sample comments for guidance.
```bash theme={null}
➜ Dify Plugins Developing dify plugin init
Edit profile of the plugin
Plugin name (press Enter to next step): # Enter the plugin name
Author (press Enter to next step): Author name # Enter the plugin author
Description (press Enter to next step): Description # Enter the plugin description
---
Select the language you want to use for plugin development, and press Enter to continue.
BTW, you need Python 3.12+ to develop the Plugin if you choose Python.
-> python # Select Python environment
go (not supported yet)
---
Based on the ability you want to extend, we have divided the Plugin into four types:
- Tool: It's a tool provider, but not only limited to tools, you can implement an endpoint as well
- Model: Just a model provider, extending others is not allowed.
- Extension: Other times, you may only need a simple http service to extend the functionality
- Agent Strategy: Implement your own logics here, just by focusing on Agent itself
What's more, we have provided the template for you, you can choose one of them below
tool
-> agent-strategy # Select Agent strategy template
llm
text-embedding
---
Configure the permissions of the plugin, use up and down to navigate, tab to select
Backwards Invocation:
Tools:
Enabled: [✔] You can invoke tools inside Dify if it's enabled # Enabled by default
Models:
Enabled: [✔] You can invoke models inside Dify if it's enabled # Enabled by default
LLM: [✔] You can invoke LLM models inside Dify if it's enabled # Enabled by default
Text Embedding: [✘] You can invoke text embedding models inside Dify if it's enabled
Rerank: [✘] You can invoke rerank models inside Dify if it's enabled
...
```
After initialization, you’ll get a folder containing all the resources needed for plugin development. Familiarizing yourself with the overall structure of an Agent Strategy Plugin will streamline the development process:
```text theme={null}
├── GUIDE.md # User guide and documentation
├── PRIVACY.md # Privacy policy and data handling guidelines
├── README.md # Project overview and setup instructions
├── _assets/ # Static assets directory
│ └── icon.svg # Agent strategy provider icon/logo
├── main.py # Main application entry point
├── manifest.yaml # Basic plugin configuration
├── provider/ # Provider configurations directory
│ └── basic_agent.yaml # Your agent provider settings
├── requirements.txt # Python dependencies list
└── strategies/ # Strategy implementation directory
├── basic_agent.py # Basic agent strategy implementation
└── basic_agent.yaml # Basic agent strategy configuration
```
All key functionality for this plugin is in the `strategies/` directory.
***
### 2. Developing the Plugin
Agent Strategy Plugin development revolves around two files:
* **Plugin Declaration**: `strategies/basic_agent.yaml`
* **Plugin Implementation**: `strategies/basic_agent.py`
#### 2.1 Defining Parameters
To build an Agent plugin, start by specifying the necessary parameters in `strategies/basic_agent.yaml`. These parameters define the plugin’s core features, such as calling an LLM or using tools.
We recommend including the following four parameters first:
1. **model**: The large language model to call (e.g., GPT-4, GPT-4o-mini).
2. **tools**: A list of tools that enhance your plugin’s functionality.
3. **query**: The user input or prompt content sent to the model.
4. **maximum\_iterations**: The maximum iteration count to prevent excessive computation.
Example Code:
```yaml theme={null}
identity:
name: basic_agent # the name of the agent_strategy
author: novice # the author of the agent_strategy
label:
    en_US: BasicAgent # the English label of the agent_strategy
  description:
    en_US: BasicAgent # the English description of the agent_strategy
parameters:
- name: model # the name of the model parameter
type: model-selector # model-type
scope: tool-call&llm # the scope of the parameter
required: true
label:
en_US: Model
zh_Hans: 模型
pt_BR: Model
- name: tools # the name of the tools parameter
type: array[tools] # the type of tool parameter
required: true
label:
en_US: Tools list
zh_Hans: 工具列表
pt_BR: Tools list
- name: query # the name of the query parameter
type: string # the type of query parameter
required: true
label:
en_US: Query
zh_Hans: 查询
pt_BR: Query
- name: maximum_iterations
type: number
required: false
default: 5
label:
      en_US: Maximum Iterations
      zh_Hans: 最大迭代次数
      pt_BR: Maximum Iterations
max: 50 # if you set the max and min value, the display of the parameter will be a slider
min: 1
extra:
python:
source: strategies/basic_agent.py
```
Once you’ve configured these parameters, the plugin will automatically generate a user-friendly interface so you can easily manage them:

#### 2.2 Retrieving Parameters and Execution
After users fill out these basic fields, your plugin needs to process the submitted parameters. In `strategies/basic_agent.py`, define a parameter class for the Agent, then retrieve and apply these parameters in your logic.
Verify incoming parameters:
```python theme={null}
from dify_plugin.entities.agent import AgentInvokeMessage
from dify_plugin.interfaces.agent import AgentModelConfig, AgentStrategy, ToolEntity
from pydantic import BaseModel
class BasicParams(BaseModel):
maximum_iterations: int
model: AgentModelConfig
tools: list[ToolEntity]
query: str
```
After retrieving the parameters, execute your business logic:
```python theme={null}
class BasicAgentAgentStrategy(AgentStrategy):
def _invoke(self, parameters: dict[str, Any]) -> Generator[AgentInvokeMessage]:
params = BasicParams(**parameters)
```
### 3. Invoking the Model
In an Agent Strategy Plugin, **invoking the model** is central to the workflow. You can invoke an LLM efficiently using `session.model.llm.invoke()` from the SDK, handling text generation, dialogue, and so forth.
If you want the LLM to **handle tools**, ensure it outputs structured parameters that match a tool's interface. In other words, the LLM must produce input arguments that the tool can accept based on the user's instructions.
Construct the following parameters:
* model
* prompt\_messages
* tools
* stop
* stream
Example code for method definition:
```python theme={null}
def invoke(
self,
model_config: LLMModelConfig,
prompt_messages: list[PromptMessage],
tools: list[PromptMessageTool] | None = None,
stop: list[str] | None = None,
stream: bool = True,
) -> Generator[LLMResultChunk, None, None] | LLMResult:...
```
For the complete implementation, refer to the sample code for model invocation at the end of this page.
This code works as follows: after the user enters a command, the Agent strategy plugin calls the LLM, builds the tool-invocation parameters from the model's output, and lets the model dispatch the integrated tools to complete complex tasks.

### 4. Handling Tools
After specifying the tool parameters, the Agent Strategy Plugin must actually call these tools. Use `session.tool.invoke()` to make those requests.
Construct the following parameters:
* provider
* tool\_name
* parameters
Example code for method definition:
```python theme={null}
def invoke(
self,
provider_type: ToolProviderType,
provider: str,
tool_name: str,
parameters: dict[str, Any],
) -> Generator[ToolInvokeMessage, None, None]:...
```
If you’d like the LLM itself to generate the parameters needed for tool calls, you can do so by combining the model’s output with your tool-calling code.
```python theme={null}
tool_instances = (
{tool.identity.name: tool for tool in params.tools} if params.tools else {}
)
for tool_call_id, tool_call_name, tool_call_args in tool_calls:
tool_instance = tool_instances[tool_call_name]
self.session.tool.invoke(
provider_type=ToolProviderType.BUILT_IN,
provider=tool_instance.identity.provider,
tool_name=tool_instance.identity.name,
parameters={**tool_instance.runtime_parameters, **tool_call_args},
)
```
With this in place, your Agent Strategy Plugin can automatically perform **Function Calling**—for instance, retrieving the current time.
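In this flow, each tool call arrives from the model as a JSON-encoded arguments string, which the strategy decodes into the `(id, name, args)` tuples that `extract_tool_calls` returns. A standalone sketch of that decoding step (the `current_time` tool name below is purely illustrative):

```python
import json

def decode_tool_call(tool_call_id: str, name: str, arguments: str):
    """Mirror the extract_tool_calls logic for a single call:
    an empty arguments string means the tool takes no input."""
    args = json.loads(arguments) if arguments else {}
    return (tool_call_id, name, args)
```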

### 5. Creating Logs
Often, multiple steps are necessary to complete a complex task in an **Agent Strategy Plugin**. It's crucial for developers to track each step's results, analyze the decision process, and optimize the strategy. Using `create_log_message` and `finish_log_message` from the SDK, you can log real-time states before and after calls, aiding quick problem diagnosis.
For example:
* Log a “starting model call” message before calling the model, clarifying the task’s execution progress.
* Log a “call succeeded” message once the model responds, ensuring the model’s output can be traced end to end.
```python theme={null}
model_log = self.create_log_message(
label=f"{params.model.model} Thought",
data={},
metadata={"start_at": model_started_at, "provider": params.model.provider},
status=ToolInvokeMessage.LogMessage.LogStatus.START,
)
yield model_log
self.session.model.llm.invoke(...)
yield self.finish_log_message(
log=model_log,
data={
"output": response,
"tool_name": tool_call_names,
"tool_input": tool_call_inputs,
},
metadata={
"started_at": model_started_at,
"finished_at": time.perf_counter(),
"elapsed_time": time.perf_counter() - model_started_at,
"provider": params.model.provider,
},
)
```
When the setup is complete, the workflow log will output the execution results:

If multiple rounds of logs occur, you can structure them hierarchically by setting a `parent` parameter in your log calls, making them easier to follow.
Reference method:
```python theme={null}
function_call_round_log = self.create_log_message(
label="Function Call Round1 ",
data={},
metadata={},
)
yield function_call_round_log
model_log = self.create_log_message(
label=f"{params.model.model} Thought",
data={},
metadata={"start_at": model_started_at, "provider": params.model.provider},
status=ToolInvokeMessage.LogMessage.LogStatus.START,
# add parent log
parent=function_call_round_log,
)
yield model_log
```
#### Sample code for agent-plugin functions
#### Invoke Model
The following code demonstrates how to give the Agent strategy plugin the ability to invoke the model:
```python theme={null}
import json
from collections.abc import Generator
from typing import Any, cast
from dify_plugin.entities.agent import AgentInvokeMessage
from dify_plugin.entities.model.llm import LLMModelConfig, LLMResult, LLMResultChunk
from dify_plugin.entities.model.message import (
PromptMessageTool,
UserPromptMessage,
)
from dify_plugin.entities.tool import ToolInvokeMessage, ToolParameter, ToolProviderType
from dify_plugin.interfaces.agent import AgentModelConfig, AgentStrategy, ToolEntity
from pydantic import BaseModel
class BasicParams(BaseModel):
maximum_iterations: int
model: AgentModelConfig
tools: list[ToolEntity]
query: str
class BasicAgentAgentStrategy(AgentStrategy):
def _invoke(self, parameters: dict[str, Any]) -> Generator[AgentInvokeMessage]:
params = BasicParams(**parameters)
chunks: Generator[LLMResultChunk, None, None] | LLMResult = (
self.session.model.llm.invoke(
model_config=LLMModelConfig(**params.model.model_dump(mode="json")),
prompt_messages=[UserPromptMessage(content=params.query)],
tools=[
self._convert_tool_to_prompt_message_tool(tool)
for tool in params.tools
],
stop=params.model.completion_params.get("stop", [])
if params.model.completion_params
else [],
stream=True,
)
)
response = ""
tool_calls = []
tool_instances = (
{tool.identity.name: tool for tool in params.tools} if params.tools else {}
)
for chunk in chunks:
# check if there is any tool call
if self.check_tool_calls(chunk):
tool_calls = self.extract_tool_calls(chunk)
tool_call_names = ";".join([tool_call[1] for tool_call in tool_calls])
try:
tool_call_inputs = json.dumps(
{tool_call[1]: tool_call[2] for tool_call in tool_calls},
ensure_ascii=False,
)
except json.JSONDecodeError:
# ensure ascii to avoid encoding error
tool_call_inputs = json.dumps(
{tool_call[1]: tool_call[2] for tool_call in tool_calls}
)
print(tool_call_names, tool_call_inputs)
if chunk.delta.message and chunk.delta.message.content:
if isinstance(chunk.delta.message.content, list):
for content in chunk.delta.message.content:
response += content.data
print(content.data, end="", flush=True)
else:
response += str(chunk.delta.message.content)
print(str(chunk.delta.message.content), end="", flush=True)
if chunk.delta.usage:
# usage of the model
usage = chunk.delta.usage
yield self.create_text_message(
text=f"{response or json.dumps(tool_calls, ensure_ascii=False)}\n"
)
result = ""
for tool_call_id, tool_call_name, tool_call_args in tool_calls:
            # use .get() so an unknown tool name falls through to the
            # error branch below instead of raising KeyError; the tool is
            # invoked only once, in the else branch
            tool_instance = tool_instances.get(tool_call_name)
if not tool_instance:
tool_invoke_responses = {
"tool_call_id": tool_call_id,
"tool_call_name": tool_call_name,
"tool_response": f"there is not a tool named {tool_call_name}",
}
else:
# invoke tool
tool_invoke_responses = self.session.tool.invoke(
provider_type=ToolProviderType.BUILT_IN,
provider=tool_instance.identity.provider,
tool_name=tool_instance.identity.name,
parameters={**tool_instance.runtime_parameters, **tool_call_args},
)
result = ""
for tool_invoke_response in tool_invoke_responses:
if tool_invoke_response.type == ToolInvokeMessage.MessageType.TEXT:
result += cast(
ToolInvokeMessage.TextMessage, tool_invoke_response.message
).text
elif (
tool_invoke_response.type == ToolInvokeMessage.MessageType.LINK
):
result += (
f"result link: {cast(ToolInvokeMessage.TextMessage, tool_invoke_response.message).text}."
+ " please tell user to check it."
)
elif tool_invoke_response.type in {
ToolInvokeMessage.MessageType.IMAGE_LINK,
ToolInvokeMessage.MessageType.IMAGE,
}:
result += (
"image has been created and sent to user already, "
+ "you do not need to create it, just tell the user to check it now."
)
elif (
tool_invoke_response.type == ToolInvokeMessage.MessageType.JSON
):
text = json.dumps(
cast(
ToolInvokeMessage.JsonMessage,
tool_invoke_response.message,
).json_object,
ensure_ascii=False,
)
result += f"tool response: {text}."
else:
result += f"tool response: {tool_invoke_response.message!r}."
tool_response = {
"tool_call_id": tool_call_id,
"tool_call_name": tool_call_name,
"tool_response": result,
}
yield self.create_text_message(result)
def _convert_tool_to_prompt_message_tool(
self, tool: ToolEntity
) -> PromptMessageTool:
"""
convert tool to prompt message tool
"""
message_tool = PromptMessageTool(
name=tool.identity.name,
description=tool.description.llm if tool.description else "",
parameters={
"type": "object",
"properties": {},
"required": [],
},
)
parameters = tool.parameters
for parameter in parameters:
if parameter.form != ToolParameter.ToolParameterForm.LLM:
continue
parameter_type = parameter.type
if parameter.type in {
ToolParameter.ToolParameterType.FILE,
ToolParameter.ToolParameterType.FILES,
}:
continue
enum = []
if parameter.type == ToolParameter.ToolParameterType.SELECT:
enum = (
[option.value for option in parameter.options]
if parameter.options
else []
)
message_tool.parameters["properties"][parameter.name] = {
"type": parameter_type,
"description": parameter.llm_description or "",
}
if len(enum) > 0:
message_tool.parameters["properties"][parameter.name]["enum"] = enum
if parameter.required:
message_tool.parameters["required"].append(parameter.name)
return message_tool
def check_tool_calls(self, llm_result_chunk: LLMResultChunk) -> bool:
"""
Check if there is any tool call in llm result chunk
"""
return bool(llm_result_chunk.delta.message.tool_calls)
def extract_tool_calls(
self, llm_result_chunk: LLMResultChunk
) -> list[tuple[str, str, dict[str, Any]]]:
"""
Extract tool calls from llm result chunk
Returns:
List[Tuple[str, str, Dict[str, Any]]]: [(tool_call_id, tool_call_name, tool_call_args)]
"""
tool_calls = []
for prompt_message in llm_result_chunk.delta.message.tool_calls:
args = {}
if prompt_message.function.arguments != "":
args = json.loads(prompt_message.function.arguments)
tool_calls.append(
(
prompt_message.id,
prompt_message.function.name,
args,
)
)
return tool_calls
```
#### Handle Tools
The following code shows how the Agent strategy plugin invokes the model and sends the structured requests the model produces to the tool.
```python theme={null}
import json
from collections.abc import Generator
from typing import Any, cast
from dify_plugin.entities.agent import AgentInvokeMessage
from dify_plugin.entities.model.llm import LLMModelConfig, LLMResult, LLMResultChunk
from dify_plugin.entities.model.message import (
PromptMessageTool,
UserPromptMessage,
)
from dify_plugin.entities.tool import ToolInvokeMessage, ToolParameter, ToolProviderType
from dify_plugin.interfaces.agent import AgentModelConfig, AgentStrategy, ToolEntity
from pydantic import BaseModel
class BasicParams(BaseModel):
maximum_iterations: int
model: AgentModelConfig
tools: list[ToolEntity]
query: str
class BasicAgentAgentStrategy(AgentStrategy):
def _invoke(self, parameters: dict[str, Any]) -> Generator[AgentInvokeMessage]:
params = BasicParams(**parameters)
chunks: Generator[LLMResultChunk, None, None] | LLMResult = (
self.session.model.llm.invoke(
model_config=LLMModelConfig(**params.model.model_dump(mode="json")),
prompt_messages=[UserPromptMessage(content=params.query)],
tools=[
self._convert_tool_to_prompt_message_tool(tool)
for tool in params.tools
],
stop=params.model.completion_params.get("stop", [])
if params.model.completion_params
else [],
stream=True,
)
)
response = ""
tool_calls = []
tool_instances = (
{tool.identity.name: tool for tool in params.tools} if params.tools else {}
)
for chunk in chunks:
# check if there is any tool call
if self.check_tool_calls(chunk):
tool_calls = self.extract_tool_calls(chunk)
tool_call_names = ";".join([tool_call[1] for tool_call in tool_calls])
try:
tool_call_inputs = json.dumps(
{tool_call[1]: tool_call[2] for tool_call in tool_calls},
ensure_ascii=False,
)
except json.JSONDecodeError:
# ensure ascii to avoid encoding error
tool_call_inputs = json.dumps(
{tool_call[1]: tool_call[2] for tool_call in tool_calls}
)
print(tool_call_names, tool_call_inputs)
if chunk.delta.message and chunk.delta.message.content:
if isinstance(chunk.delta.message.content, list):
for content in chunk.delta.message.content:
response += content.data
print(content.data, end="", flush=True)
else:
response += str(chunk.delta.message.content)
print(str(chunk.delta.message.content), end="", flush=True)
if chunk.delta.usage:
# usage of the model
usage = chunk.delta.usage
yield self.create_text_message(
text=f"{response or json.dumps(tool_calls, ensure_ascii=False)}\n"
)
result = ""
for tool_call_id, tool_call_name, tool_call_args in tool_calls:
            # use .get() so an unknown tool name falls through to the
            # error branch below instead of raising KeyError; the tool is
            # invoked only once, in the else branch
            tool_instance = tool_instances.get(tool_call_name)
if not tool_instance:
tool_invoke_responses = {
"tool_call_id": tool_call_id,
"tool_call_name": tool_call_name,
"tool_response": f"there is not a tool named {tool_call_name}",
}
else:
# invoke tool
tool_invoke_responses = self.session.tool.invoke(
provider_type=ToolProviderType.BUILT_IN,
provider=tool_instance.identity.provider,
tool_name=tool_instance.identity.name,
parameters={**tool_instance.runtime_parameters, **tool_call_args},
)
result = ""
for tool_invoke_response in tool_invoke_responses:
                if tool_invoke_response.type == ToolInvokeMessage.MessageType.TEXT:
                    result += cast(
                        ToolInvokeMessage.TextMessage, tool_invoke_response.message
                    ).text
                elif tool_invoke_response.type == ToolInvokeMessage.MessageType.LINK:
                    result += (
                        f"result link: {cast(ToolInvokeMessage.TextMessage, tool_invoke_response.message).text}."
                        + " please tell user to check it."
                    )
                elif tool_invoke_response.type in {
                    ToolInvokeMessage.MessageType.IMAGE_LINK,
                    ToolInvokeMessage.MessageType.IMAGE,
                }:
                    result += (
                        "image has been created and sent to user already, "
                        + "you do not need to create it, just tell the user to check it now."
                    )
                elif tool_invoke_response.type == ToolInvokeMessage.MessageType.JSON:
                    text = json.dumps(
                        cast(
                            ToolInvokeMessage.JsonMessage,
                            tool_invoke_response.message,
                        ).json_object,
                        ensure_ascii=False,
                    )
                    result += f"tool response: {text}."
                else:
                    result += f"tool response: {tool_invoke_response.message!r}."
            tool_response = {
                "tool_call_id": tool_call_id,
                "tool_call_name": tool_call_name,
                "tool_response": result,
            }
            yield self.create_text_message(result)

    def _convert_tool_to_prompt_message_tool(
        self, tool: ToolEntity
    ) -> PromptMessageTool:
        """
        convert tool to prompt message tool
        """
        message_tool = PromptMessageTool(
            name=tool.identity.name,
            description=tool.description.llm if tool.description else "",
            parameters={
                "type": "object",
                "properties": {},
                "required": [],
            },
        )
        parameters = tool.parameters
        for parameter in parameters:
            if parameter.form != ToolParameter.ToolParameterForm.LLM:
                continue
            parameter_type = parameter.type
            if parameter.type in {
                ToolParameter.ToolParameterType.FILE,
                ToolParameter.ToolParameterType.FILES,
            }:
                continue
            enum = []
            if parameter.type == ToolParameter.ToolParameterType.SELECT:
                enum = (
                    [option.value for option in parameter.options]
                    if parameter.options
                    else []
                )
            message_tool.parameters["properties"][parameter.name] = {
                "type": parameter_type,
                "description": parameter.llm_description or "",
            }
            if len(enum) > 0:
                message_tool.parameters["properties"][parameter.name]["enum"] = enum
            if parameter.required:
                message_tool.parameters["required"].append(parameter.name)
        return message_tool

    def check_tool_calls(self, llm_result_chunk: LLMResultChunk) -> bool:
        """
        Check if there is any tool call in llm result chunk
        """
        return bool(llm_result_chunk.delta.message.tool_calls)

    def extract_tool_calls(
        self, llm_result_chunk: LLMResultChunk
    ) -> list[tuple[str, str, dict[str, Any]]]:
        """
        Extract tool calls from llm result chunk

        Returns:
            List[Tuple[str, str, Dict[str, Any]]]: [(tool_call_id, tool_call_name, tool_call_args)]
        """
        tool_calls = []
        for prompt_message in llm_result_chunk.delta.message.tool_calls:
            args = {}
            if prompt_message.function.arguments != "":
                args = json.loads(prompt_message.function.arguments)
            tool_calls.append(
                (
                    prompt_message.id,
                    prompt_message.function.name,
                    args,
                )
            )
        return tool_calls
```
#### Example of a complete function code
A complete sample plugin that demonstrates **invoking a model**, **handling tool calls**, and **outputting multiple rounds of logs**:
```python theme={null}
import json
import time
from collections.abc import Generator
from typing import Any, cast

from dify_plugin.entities.agent import AgentInvokeMessage
from dify_plugin.entities.model.llm import LLMModelConfig, LLMResult, LLMResultChunk
from dify_plugin.entities.model.message import (
    PromptMessageTool,
    UserPromptMessage,
)
from dify_plugin.entities.tool import ToolInvokeMessage, ToolParameter, ToolProviderType
from dify_plugin.interfaces.agent import AgentModelConfig, AgentStrategy, ToolEntity
from pydantic import BaseModel


class BasicParams(BaseModel):
    maximum_iterations: int
    model: AgentModelConfig
    tools: list[ToolEntity]
    query: str


class BasicAgentAgentStrategy(AgentStrategy):
    def _invoke(self, parameters: dict[str, Any]) -> Generator[AgentInvokeMessage]:
        params = BasicParams(**parameters)
        function_call_round_log = self.create_log_message(
            label="Function Call Round1 ",
            data={},
            metadata={},
        )
        yield function_call_round_log
        model_started_at = time.perf_counter()
        model_log = self.create_log_message(
            label=f"{params.model.model} Thought",
            data={},
            metadata={"start_at": model_started_at, "provider": params.model.provider},
            status=ToolInvokeMessage.LogMessage.LogStatus.START,
            parent=function_call_round_log,
        )
        yield model_log
        chunks: Generator[LLMResultChunk, None, None] | LLMResult = (
            self.session.model.llm.invoke(
                model_config=LLMModelConfig(**params.model.model_dump(mode="json")),
                prompt_messages=[UserPromptMessage(content=params.query)],
                tools=[
                    self._convert_tool_to_prompt_message_tool(tool)
                    for tool in params.tools
                ],
                stop=params.model.completion_params.get("stop", [])
                if params.model.completion_params
                else [],
                stream=True,
            )
        )
        response = ""
        tool_calls = []
        tool_instances = (
            {tool.identity.name: tool for tool in params.tools} if params.tools else {}
        )
        tool_call_names = ""
        tool_call_inputs = ""
        for chunk in chunks:
            # check if there is any tool call
            if self.check_tool_calls(chunk):
                tool_calls = self.extract_tool_calls(chunk)
                tool_call_names = ";".join([tool_call[1] for tool_call in tool_calls])
                try:
                    tool_call_inputs = json.dumps(
                        {tool_call[1]: tool_call[2] for tool_call in tool_calls},
                        ensure_ascii=False,
                    )
                except json.JSONDecodeError:
                    # ensure ascii to avoid encoding error
                    tool_call_inputs = json.dumps(
                        {tool_call[1]: tool_call[2] for tool_call in tool_calls}
                    )
                print(tool_call_names, tool_call_inputs)
            if chunk.delta.message and chunk.delta.message.content:
                if isinstance(chunk.delta.message.content, list):
                    for content in chunk.delta.message.content:
                        response += content.data
                        print(content.data, end="", flush=True)
                else:
                    response += str(chunk.delta.message.content)
                    print(str(chunk.delta.message.content), end="", flush=True)
            if chunk.delta.usage:
                # usage of the model
                usage = chunk.delta.usage
        yield self.finish_log_message(
            log=model_log,
            data={
                "output": response,
                "tool_name": tool_call_names,
                "tool_input": tool_call_inputs,
            },
            metadata={
                "started_at": model_started_at,
                "finished_at": time.perf_counter(),
                "elapsed_time": time.perf_counter() - model_started_at,
                "provider": params.model.provider,
            },
        )
        yield self.create_text_message(
            text=f"{response or json.dumps(tool_calls, ensure_ascii=False)}\n"
        )
        for tool_call_id, tool_call_name, tool_call_args in tool_calls:
            tool_instance = tool_instances.get(tool_call_name)
            if not tool_instance:
                # the model requested a tool we do not have; report it and move on
                yield self.create_text_message(
                    f"there is not a tool named {tool_call_name}"
                )
                continue
            # invoke tool
            tool_invoke_responses = self.session.tool.invoke(
                provider_type=ToolProviderType.BUILT_IN,
                provider=tool_instance.identity.provider,
                tool_name=tool_instance.identity.name,
                parameters={**tool_instance.runtime_parameters, **tool_call_args},
            )
            result = ""
            for tool_invoke_response in tool_invoke_responses:
                if tool_invoke_response.type == ToolInvokeMessage.MessageType.TEXT:
                    result += cast(
                        ToolInvokeMessage.TextMessage, tool_invoke_response.message
                    ).text
                elif tool_invoke_response.type == ToolInvokeMessage.MessageType.LINK:
                    result += (
                        f"result link: {cast(ToolInvokeMessage.TextMessage, tool_invoke_response.message).text}."
                        + " please tell user to check it."
                    )
                elif tool_invoke_response.type in {
                    ToolInvokeMessage.MessageType.IMAGE_LINK,
                    ToolInvokeMessage.MessageType.IMAGE,
                }:
                    result += (
                        "image has been created and sent to user already, "
                        + "you do not need to create it, just tell the user to check it now."
                    )
                elif tool_invoke_response.type == ToolInvokeMessage.MessageType.JSON:
                    text = json.dumps(
                        cast(
                            ToolInvokeMessage.JsonMessage,
                            tool_invoke_response.message,
                        ).json_object,
                        ensure_ascii=False,
                    )
                    result += f"tool response: {text}."
                else:
                    result += f"tool response: {tool_invoke_response.message!r}."
            tool_response = {
                "tool_call_id": tool_call_id,
                "tool_call_name": tool_call_name,
                "tool_response": result,
            }
            yield self.create_text_message(result)

    def _convert_tool_to_prompt_message_tool(
        self, tool: ToolEntity
    ) -> PromptMessageTool:
        """
        convert tool to prompt message tool
        """
        message_tool = PromptMessageTool(
            name=tool.identity.name,
            description=tool.description.llm if tool.description else "",
            parameters={
                "type": "object",
                "properties": {},
                "required": [],
            },
        )
        parameters = tool.parameters
        for parameter in parameters:
            if parameter.form != ToolParameter.ToolParameterForm.LLM:
                continue
            parameter_type = parameter.type
            if parameter.type in {
                ToolParameter.ToolParameterType.FILE,
                ToolParameter.ToolParameterType.FILES,
            }:
                continue
            enum = []
            if parameter.type == ToolParameter.ToolParameterType.SELECT:
                enum = (
                    [option.value for option in parameter.options]
                    if parameter.options
                    else []
                )
            message_tool.parameters["properties"][parameter.name] = {
                "type": parameter_type,
                "description": parameter.llm_description or "",
            }
            if len(enum) > 0:
                message_tool.parameters["properties"][parameter.name]["enum"] = enum
            if parameter.required:
                message_tool.parameters["required"].append(parameter.name)
        return message_tool

    def check_tool_calls(self, llm_result_chunk: LLMResultChunk) -> bool:
        """
        Check if there is any tool call in llm result chunk
        """
        return bool(llm_result_chunk.delta.message.tool_calls)

    def extract_tool_calls(
        self, llm_result_chunk: LLMResultChunk
    ) -> list[tuple[str, str, dict[str, Any]]]:
        """
        Extract tool calls from llm result chunk

        Returns:
            List[Tuple[str, str, Dict[str, Any]]]: [(tool_call_id, tool_call_name, tool_call_args)]
        """
        tool_calls = []
        for prompt_message in llm_result_chunk.delta.message.tool_calls:
            args = {}
            if prompt_message.function.arguments != "":
                args = json.loads(prompt_message.function.arguments)
            tool_calls.append(
                (
                    prompt_message.id,
                    prompt_message.function.name,
                    args,
                )
            )
        return tool_calls
```
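Note that `extract_tool_calls` above parses `function.arguments` from a single chunk. Some providers stream tool-call arguments incrementally across many chunks, so `json.loads` would fail on a partial fragment. A minimal sketch of accumulating argument deltas before parsing — the function name and the fragment format are illustrative, not part of the Dify SDK:

```python theme={null}
import json
from typing import Any


def accumulate_tool_call_deltas(
    deltas: list[tuple[str, str, str]],
) -> list[tuple[str, str, dict[str, Any]]]:
    """Merge streamed (id, name, partial_args) fragments into complete tool calls.

    Fragments with an empty id are assumed to continue the most recent call,
    mirroring OpenAI-style streaming where only the first delta carries the id.
    """
    merged: dict[str, list[str]] = {}
    names: dict[str, str] = {}
    order: list[str] = []
    last_id = ""
    for call_id, name, args_fragment in deltas:
        if call_id:
            last_id = call_id
        if last_id not in merged:
            merged[last_id] = []
            order.append(last_id)
        if name:
            names[last_id] = name
        merged[last_id].append(args_fragment)
    # parse each call's arguments only once the full JSON string is assembled
    result: list[tuple[str, str, dict[str, Any]]] = []
    for call_id in order:
        raw = "".join(merged[call_id])
        result.append((call_id, names.get(call_id, ""), json.loads(raw) if raw else {}))
    return result
```

With this in place, you would collect `(id, name, arguments)` fragments chunk by chunk during the streaming loop and call the accumulator once the stream ends, instead of parsing inside `extract_tool_calls`.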
### 6. Debugging the Plugin
After finalizing the plugin’s declaration file and implementation code, run `python -m main` in the plugin directory to restart it. Next, confirm the plugin runs correctly. Dify offers remote debugging—go to **Plugin Management** to obtain your debug key and remote server address.

Back in your plugin project, copy `.env.example` to `.env` and insert the relevant remote server and debug key info.
```bash theme={null}
INSTALL_METHOD=remote
REMOTE_INSTALL_URL=debug.dify.ai:5003
REMOTE_INSTALL_KEY=********-****-****-****-************
```
Then run:
```bash theme={null}
python -m main
```
You’ll see the plugin installed in your Workspace, and team members can also access it.

### Packaging the Plugin (Optional)
Once everything works, you can package your plugin by running:
```bash theme={null}
# Replace ./basic_agent/ with your actual plugin project path.
dify plugin package ./basic_agent/
```
A file named `basic_agent.difypkg` appears in your current folder—this is your final plugin package.
**Congratulations!** You’ve fully developed, tested, and packaged your Agent Strategy Plugin.
### Publishing the Plugin (Optional)
You can now upload it to the [Dify Plugins repository](https://github.com/langgenius/dify-plugins). Before doing so, ensure it meets the [Plugin Publishing Guidelines](/en/develop-plugin/publishing/marketplace-listing/release-to-dify-marketplace). Once approved, your code merges into the main branch, and the plugin automatically goes live on the [Dify Marketplace](https://marketplace.dify.ai/).
***
### Further Exploration
Complex tasks often need multiple rounds of thinking and tool calls, typically repeating **model invoke → tool use** until the task ends or a maximum iteration limit is reached. Managing prompts effectively is crucial in this process. Check out the [complete Function Calling implementation](https://github.com/langgenius/dify-official-plugins/blob/main/agent-strategies/cot_agent/strategies/function_calling.py) for a standardized approach to letting models call external tools and handle their outputs.
***
# Dify Plugin Development Cheatsheet
Source: https://docs.dify.ai/en/develop-plugin/dev-guides-and-walkthroughs/cheatsheet
A comprehensive reference guide for Dify plugin development, including environment requirements, installation methods, development process, plugin categories and types, common code snippets, and solutions to common issues. Suitable for developers to quickly consult and reference.
### Environment Requirements
* Python version 3.12
* Dify plugin scaffold tool (dify-plugin-daemon)
> Learn more: [Initializing Development Tools](/en/develop-plugin/getting-started/cli)
### Obtaining the Dify Plugin Development Package
[Dify Plugin CLI](https://github.com/langgenius/dify-plugin-daemon/releases)
#### Installation Methods for Different Platforms
**macOS [Brew](https://github.com/langgenius/homebrew-dify) (Global Installation):**
```bash theme={null}
brew tap langgenius/dify
brew install dify
```
After installation, open a new terminal window and enter the `dify version` command. If it outputs the version information, the installation was successful.
**macOS ARM (M Series Chips):**
```bash theme={null}
# Download dify-plugin-darwin-arm64
chmod +x dify-plugin-darwin-arm64
./dify-plugin-darwin-arm64 version
```
**macOS Intel:**
```bash theme={null}
# Download dify-plugin-darwin-amd64
chmod +x dify-plugin-darwin-amd64
./dify-plugin-darwin-amd64 version
```
**Linux:**
```bash theme={null}
# Download dify-plugin-linux-amd64
chmod +x dify-plugin-linux-amd64
./dify-plugin-linux-amd64 version
```
**Global Installation (Recommended):**
```bash theme={null}
# Rename and move to system path
# Example (macOS ARM)
mv dify-plugin-darwin-arm64 dify
sudo mv dify /usr/local/bin/
dify version
```
### Running the Development Package
Here we use `dify` as an example. If you are using a local installation method, please replace the command accordingly, for example `./dify-plugin-darwin-arm64 plugin init`.
### Plugin Development Process
#### 1. Create a New Plugin
```bash theme={null}
./dify plugin init
```
Follow the prompts to complete the basic plugin information configuration.
> Learn more: [Dify Plugin Development: Hello World Guide](/en/develop-plugin/dev-guides-and-walkthroughs/tool-plugin)
#### 2. Run in Development Mode
Configure the `.env` file, then run the following command in the plugin directory:
```bash theme={null}
python -m main
```
> Learn more: [Remote Debugging Plugins](/en/develop-plugin/features-and-specs/plugin-types/remote-debug-a-plugin)
#### 3. Packaging and Deployment
Package the plugin:
```bash theme={null}
cd ..
dify plugin package ./yourapp
```
> Learn more: [Publishing Overview](/en/develop-plugin/publishing/marketplace-listing/release-overview)
### Plugin Categories
#### Tool Labels
Category `tag` values are defined in [`class ToolLabelEnum(Enum)`](https://github.com/langgenius/dify-plugin-sdks/blob/main/python/dify_plugin/entities/tool.py):
```python theme={null}
class ToolLabelEnum(Enum):
    SEARCH = "search"
    IMAGE = "image"
    VIDEOS = "videos"
    WEATHER = "weather"
    FINANCE = "finance"
    DESIGN = "design"
    TRAVEL = "travel"
    SOCIAL = "social"
    NEWS = "news"
    MEDICAL = "medical"
    PRODUCTIVITY = "productivity"
    EDUCATION = "education"
    BUSINESS = "business"
    ENTERTAINMENT = "entertainment"
    UTILITIES = "utilities"
    OTHER = "other"
```
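These label values are referenced as tags in a tool plugin's provider YAML. A hedged sketch of how that typically looks — the exact field layout depends on your plugin's manifest, and `my_search_tool` is a placeholder:

```yaml theme={null}
identity:
  author: your-name
  name: my_search_tool
  label:
    en_US: My Search Tool
  tags:
    - search
    - utilities
```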
### Plugin Type Reference
Dify supports the development of various types of plugins:
* **Tool plugin**: Integrate third-party APIs and services
> Learn more: [Dify Plugin Development: Hello World Guide](/en/develop-plugin/dev-guides-and-walkthroughs/tool-plugin)
* **Model plugin**: Integrate AI models
> Learn more: [Model Plugin](/en/develop-plugin/features-and-specs/plugin-types/model-designing-rules), [Quick Integration of a New Model](/en/develop-plugin/dev-guides-and-walkthroughs/creating-new-model-provider)
* **Agent strategy plugin**: Customize Agent thinking and decision-making strategies
> Learn more: [Agent Strategy Plugin](/en/develop-plugin/dev-guides-and-walkthroughs/agent-strategy-plugin)
* **Extension plugin**: Extend Dify platform functionality, such as Endpoints and WebAPP
> Learn more: [Extension Plugin](/en/develop-plugin/dev-guides-and-walkthroughs/endpoint)
* **Data source plugin**: Serve as the document data source and starting point for knowledge pipelines
> Learn more: [Data Source Plugin](/en/develop-plugin/dev-guides-and-walkthroughs/datasource-plugin)
* **Trigger plugin**: Automatically trigger Workflow execution upon third-party events
> Learn more: [Trigger Plugin](/en/develop-plugin/dev-guides-and-walkthroughs/trigger-plugin)
***
# Model Provider Plugin
Source: https://docs.dify.ai/en/develop-plugin/dev-guides-and-walkthroughs/creating-new-model-provider
This comprehensive guide provides detailed instructions on creating model provider plugins, covering project initialization, directory structure organization, model configuration methods, writing provider code, and implementing model integration with detailed examples of core API implementations.
### Prerequisites
* [Dify CLI](/en/develop-plugin/getting-started/cli)
* Basic Python programming skills and understanding of object-oriented programming
* Familiarity with the API documentation of the model provider you want to integrate
## Step 1: Create and Configure a New Plugin Project
### Initialize the Project
```bash theme={null}
dify plugin init
```
### Choose Model Plugin Template
Select the `LLM` type plugin template from the available options. This template provides a complete code structure for model integration.

### Configure Plugin Permissions
For a model provider plugin, configure the following essential permissions:
* **Models** - Base permission for model operations
* **LLM** - Permission for large language model functionality
* **Storage** - Permission for file operations (if needed)

### Directory Structure Overview
After initialization, your plugin project will have a directory structure similar to this (assuming a provider named `my_provider` supporting LLM and Embedding):
```bash theme={null}
models/my_provider/
├── models                    # Model implementation and configuration directory
│   ├── llm                   # LLM type
│   │   ├── _position.yaml    # (Optional, controls sorting)
│   │   ├── model1.yaml       # Configuration for specific model
│   │   └── llm.py            # LLM implementation logic
│   └── text_embedding        # Embedding type
│       ├── _position.yaml
│       ├── embedding-model.yaml
│       └── text_embedding.py
├── provider                  # Provider-level code directory
│   └── my_provider.py        # Provider credential validation
└── manifest.yaml             # Plugin manifest file
```
## Step 2: Understand Model Configuration Methods
Dify supports two model configuration methods that determine how users will interact with your provider's models:
### Predefined Models (`predefined-model`)
These are models that only require unified provider credentials to use. Once a user configures their API key or other authentication details for the provider, they can immediately access all predefined models.
**Example:** The `OpenAI` provider offers predefined models like `gpt-3.5-turbo-0125` and `gpt-4o-2024-05-13`. A user only needs to configure their OpenAI API key once to access all these models.
### Custom Models (`customizable-model`)
These require additional configuration for each specific model instance. This approach is useful when models need individual parameters beyond the provider-level credentials.
**Example:** `Xinference` supports both LLM and Text Embedding, but each model has a unique **model\_uid**. Users must configure this model\_uid separately for each model they want to use.
These configuration methods **can coexist** within a single provider. For instance, a provider might offer some predefined models while also allowing users to add custom models with specific configurations.
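In the provider YAML (shown in full in the next step), coexistence is expressed by simply listing both methods. A minimal fragment illustrating this — the surrounding fields are omitted here:

```yaml theme={null}
supported_model_types:
  - llm
configurate_methods:
  - predefined-model      # models shipped with the plugin, unlocked by provider credentials
  - customizable-model    # models the user adds one by one with per-model configuration
```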
## Step 3: Create Model Provider Files
Creating a new model provider involves two main components:
1. **Provider Configuration YAML File** - Defines the provider's basic information, supported model types, and credential requirements
2. **Provider Class Implementation** - Implements authentication validation and other provider-level functionality
***
### 3.1 Create Model Provider Configuration File
The provider configuration is defined in a YAML file that declares the provider's basic information, supported model types, configuration methods, and credential rules. This file will be placed in the root directory of your plugin project.
Here's an annotated example of the `anthropic.yaml` configuration file:
```yaml theme={null}
# Basic provider identification
provider: anthropic # Provider ID (must be unique)
label:
  en_US: Anthropic # Display name in UI
description:
  en_US: Anthropic's powerful models, such as Claude 3.
  zh_Hans: Anthropic 的强大模型,例如 Claude 3。
icon_small:
  en_US: icon_s_en.svg # Small icon for provider (displayed in selection UI)
icon_large:
  en_US: icon_l_en.svg # Large icon (displayed in detail views)
background: "#F0F0EB" # Background color for provider in UI

# Help information for users
help:
  title:
    en_US: Get your API Key from Anthropic
    zh_Hans: 从 Anthropic 获取 API Key
  url:
    en_US: https://console.anthropic.com/account/keys

# Supported model types and configuration approach
supported_model_types:
  - llm # This provider offers LLM models
configurate_methods:
  - predefined-model # Uses predefined models approach

# Provider-level credential form definition
provider_credential_schema:
  credential_form_schemas:
    - variable: anthropic_api_key # Variable name for API key
      label:
        en_US: API Key
      type: secret-input # Secure input for sensitive data
      required: true
      placeholder:
        zh_Hans: 在此输入你的 API Key
        en_US: Enter your API Key
    - variable: anthropic_api_url
      label:
        en_US: API URL
      type: text-input # Regular text input
      required: false
      placeholder:
        zh_Hans: 在此输入你的 API URL
        en_US: Enter your API URL

# Model configuration
models:
  llm: # Configuration for LLM type models
    predefined:
      - "models/llm/*.yaml" # Pattern to locate model configuration files
    position: "models/llm/_position.yaml" # File defining display order

# Implementation file locations
extra:
  python:
    provider_source: provider/anthropic.py # Provider class implementation
    model_sources:
      - "models/llm/llm.py" # Model implementation file
```
### Custom Model Configuration
If your provider supports custom models, you need to add a `model_credential_schema` section to define what additional fields users need to configure for each individual model. This is typical for providers that support fine-tuned models or require model-specific parameters.
Here's an example from the OpenAI provider:
```yaml theme={null}
model_credential_schema:
  model: # Fine-tuned model name field
    label:
      en_US: Model Name
      zh_Hans: 模型名称
    placeholder:
      en_US: Enter your model name
      zh_Hans: 输入模型名称
  credential_form_schemas:
    - variable: openai_api_key
      label:
        en_US: API Key
      type: secret-input
      required: true
      placeholder:
        zh_Hans: 在此输入你的 API Key
        en_US: Enter your API Key
    - variable: openai_organization
      label:
        zh_Hans: 组织 ID
        en_US: Organization
      type: text-input
      required: false
      placeholder:
        zh_Hans: 在此输入你的组织 ID
        en_US: Enter your Organization ID
    # Additional fields as needed...
```
For complete model provider YAML specifications, please refer to the [Model Schema](/en/develop-plugin/features-and-specs/plugin-types/model-schema) documentation.
### 3.2 Write Model Provider Code
Next, create a Python file for your provider class implementation. This file should be placed in the `/provider` directory with a name matching your provider (e.g., `anthropic.py`).
The provider class must inherit from `ModelProvider` and implement at least the `validate_provider_credentials` method:
```python theme={null}
import logging

from dify_plugin.entities.model import ModelType
from dify_plugin.errors.model import CredentialsValidateFailedError
from dify_plugin import ModelProvider

logger = logging.getLogger(__name__)


class AnthropicProvider(ModelProvider):
    def validate_provider_credentials(self, credentials: dict) -> None:
        """
        Validate provider credentials by testing them against the API.
        This method should attempt to make a simple API call to verify
        that the credentials are valid.

        :param credentials: Provider credentials as defined in the YAML schema
        :raises CredentialsValidateFailedError: If validation fails
        """
        try:
            # Get an instance of the LLM model type and use it to validate credentials
            model_instance = self.get_model_instance(ModelType.LLM)
            model_instance.validate_credentials(
                model="claude-3-opus-20240229",
                credentials=credentials
            )
        except CredentialsValidateFailedError as ex:
            # Pass through credential validation errors
            raise ex
        except Exception as ex:
            # Log and re-raise other exceptions
            logger.exception(f"{self.get_provider_schema().provider} credentials validate failed")
            raise ex
```
The `validate_provider_credentials` method is crucial as it's called whenever a user tries to save their provider credentials in Dify. It should:
1. Attempt to validate the credentials by making a simple API call
2. Return silently if validation succeeds
3. Raise `CredentialsValidateFailedError` with a helpful message if validation fails
#### For Custom Model Providers
For providers that exclusively use custom models (where each model requires its own configuration), you can implement a simpler provider class. For example, with `Xinference`:
```python theme={null}
from dify_plugin import ModelProvider


class XinferenceProvider(ModelProvider):
    def validate_provider_credentials(self, credentials: dict) -> None:
        """
        For custom-only model providers, validation happens at the model level.
        This method exists to satisfy the abstract base class requirement.
        """
        pass
```
## Step 4: Implement Model-Specific Code
After setting up your provider, you need to implement the model-specific code that will handle API calls for each model type you support. This involves:
1. Creating model configuration YAML files for each specific model
2. Implementing the model type classes that handle API communication
For detailed instructions on these steps, please refer to:
* [Model Design Rules](/en/develop-plugin/features-and-specs/plugin-types/model-designing-rules) - Standards for integrating predefined models
* [Model Schema](/en/develop-plugin/features-and-specs/plugin-types/model-schema) - Standards for model configuration files
### 4.1 Define Model Configuration (YAML)
For each specific model, create a YAML file in the appropriate model type directory (e.g., `models/llm/`) to define its properties, parameters, and features.
**Example (`claude-3-5-sonnet-20240620.yaml`):**
```yaml theme={null}
model: claude-3-5-sonnet-20240620 # API identifier for the model
label:
  en_US: claude-3-5-sonnet-20240620 # Display name in UI
model_type: llm # Must match directory type
features: # Special capabilities
  - agent-thought
  - vision
  - tool-call
  - stream-tool-call
  - document
model_properties: # Inherent model properties
  mode: chat # "chat" or "completion"
  context_size: 200000 # Maximum context window
parameter_rules: # User-adjustable parameters
  - name: temperature
    use_template: temperature # Reference predefined template
  - name: top_p
    use_template: top_p
  - name: max_tokens
    use_template: max_tokens
    required: true
    default: 8192
    min: 1
    max: 8192
pricing: # Optional pricing information
  input: '3.00'
  output: '15.00'
  unit: '0.000001' # Per million tokens
  currency: USD
```
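The pricing fields combine as `cost = tokens × unit × price`: with `unit: '0.000001'`, the `input`/`output` prices are effectively per million tokens. A small worked sketch of this arithmetic — the helper function is ours, not part of the SDK, and it simply hard-codes the values from the YAML above:

```python theme={null}
from decimal import Decimal


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate request cost from the pricing block above.

    cost = tokens * unit * price, so 3.00 USD buys one million input tokens.
    """
    unit = Decimal("0.000001")       # pricing.unit
    input_price = Decimal("3.00")    # pricing.input
    output_price = Decimal("15.00")  # pricing.output
    return input_tokens * unit * input_price + output_tokens * unit * output_price


# 1,000 input tokens and 500 output tokens:
# 1000 * 0.000001 * 3.00 + 500 * 0.000001 * 15.00 = 0.003 + 0.0075 = 0.0105 USD
```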
### 4.2 Implement Model Calling Code (Python)
Create a Python file for each model type you're supporting (e.g., `llm.py` in the `models/llm/` directory). This class will handle API communication, parameter transformation, and result formatting.
Here's an example implementation structure for an LLM:
```python theme={null}
import logging
from typing import Union, Generator, Optional, List

from dify_plugin.provider_kits.llm import LargeLanguageModel  # Base class
from dify_plugin.provider_kits.llm import LLMResult, LLMResultChunk, LLMUsage  # Result classes
from dify_plugin.provider_kits.llm import PromptMessage, PromptMessageTool  # Message classes
from dify_plugin.errors.provider_error import InvokeError, InvokeAuthorizationError  # Error classes

logger = logging.getLogger(__name__)


class MyProviderLargeLanguageModel(LargeLanguageModel):
    def _invoke(self, model: str, credentials: dict, prompt_messages: List[PromptMessage],
                model_parameters: dict, tools: Optional[List[PromptMessageTool]] = None,
                stop: Optional[List[str]] = None, stream: bool = True,
                user: Optional[str] = None) -> Union[LLMResult, Generator[LLMResultChunk, None, None]]:
        """
        Core method for invoking the model API.

        Parameters:
            model: The model identifier to call
            credentials: Authentication credentials
            prompt_messages: List of messages to send
            model_parameters: Parameters like temperature, max_tokens
            tools: Optional tool definitions for function calling
            stop: Optional list of stop sequences
            stream: Whether to stream responses (True) or return complete response (False)
            user: Optional user identifier for API tracking

        Returns:
            If stream=True: Generator yielding LLMResultChunk objects
            If stream=False: Complete LLMResult object
        """
        # Prepare API request parameters
        api_params = self._prepare_api_params(
            credentials, model_parameters, prompt_messages, tools, stop
        )
        try:
            # Call appropriate helper method based on streaming preference
            if stream:
                return self._invoke_stream(model, api_params, user)
            else:
                return self._invoke_sync(model, api_params, user)
        except Exception as e:
            # Handle and map errors
            self._handle_api_error(e)

    def _invoke_stream(self, model: str, api_params: dict, user: Optional[str]) -> Generator[LLMResultChunk, None, None]:
        """Helper method for streaming API calls"""
        # Implementation details for streaming calls
        pass

    def _invoke_sync(self, model: str, api_params: dict, user: Optional[str]) -> LLMResult:
        """Helper method for synchronous API calls"""
        # Implementation details for synchronous calls
        pass

    def validate_credentials(self, model: str, credentials: dict) -> None:
        """
        Validate that the credentials work for this specific model.
        Called when a user tries to add or modify credentials.
        """
        # Implementation for credential validation
        pass

    def get_num_tokens(self, model: str, credentials: dict,
                       prompt_messages: List[PromptMessage],
                       tools: Optional[List[PromptMessageTool]] = None) -> int:
        """
        Estimate the number of tokens for given input.
        Optional but recommended for accurate cost estimation.
        """
        # Implementation for token counting
        pass

    @property
    def _invoke_error_mapping(self) -> dict[type[InvokeError], list[type[Exception]]]:
        """
        Define mapping from vendor-specific exceptions to Dify standard exceptions.
        This helps standardize error handling across different providers.
        """
        return {
            InvokeAuthorizationError: [
                # List vendor-specific auth errors here
            ],
            # Other error mappings
        }
```
The most important method to implement is `_invoke`, which handles the core API communication. This method should:
1. Transform Dify's standardized inputs into the format required by the provider's API
2. Make the API call with proper error handling
3. Transform the API response into Dify's standardized output format
4. Handle both streaming and non-streaming modes
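Step 1 — translating Dify's standardized inputs into the vendor's request format — can be sketched as follows, assuming an OpenAI-style chat API. `SimplePromptMessage` is a stand-in for the SDK's `PromptMessage` entities; the real classes' field names may differ:

```python theme={null}
from dataclasses import dataclass


@dataclass
class SimplePromptMessage:
    """Minimal stand-in for a Dify prompt message entity."""
    role: str     # "system" | "user" | "assistant"
    content: str


def to_api_messages(prompt_messages: list[SimplePromptMessage]) -> list[dict]:
    """Map Dify-style prompt messages onto an OpenAI-style messages array."""
    return [{"role": m.role, "content": m.content} for m in prompt_messages]
```

The reverse direction (step 3) mirrors this: parse the vendor's response payload and wrap it in `LLMResult` / `LLMResultChunk` objects before returning.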
## Step 5: Debug and Test Your Plugin
Dify provides a remote debugging capability that allows you to test your plugin during development:
1. In your Dify instance, go to "Plugin Management" and click "Debug Plugin" to get your debug key and server address
2. Configure your local environment with these values in a `.env` file:
```dotenv theme={null}
INSTALL_METHOD=remote
REMOTE_INSTALL_HOST=
REMOTE_INSTALL_PORT=5003
REMOTE_INSTALL_KEY=****-****-****-****-****
```
3. Run your plugin locally with `python -m main` and test it in Dify
## Step 6: Package and Publish
When your plugin is ready:
1. Package it using the scaffolding tool:
```bash theme={null}
dify plugin package models/
```
2. Test the packaged plugin locally before submitting
3. Submit a pull request to the [Dify official plugins repository](https://github.com/langgenius/dify-official-plugins)
For more details on the publishing process, see the [Publishing Overview](/en/develop-plugin/publishing/marketplace-listing/release-overview).
## Reference Resources
* [Quick Integration of a New Model](/en/develop-plugin/dev-guides-and-walkthroughs/creating-new-model-provider) - How to add new models to existing providers
* [Basic Concepts of Plugin Development](/en/develop-plugin/getting-started/getting-started-dify-plugin) - Return to the plugin development getting started guide
* [Model Schema](/en/develop-plugin/features-and-specs/plugin-types/model-schema) - Learn detailed model configuration specifications
* [General Specifications](/en/develop-plugin/features-and-specs/plugin-types/general-specifications) - Learn about plugin manifest file configuration
* [Dify Plugin SDK Reference](https://github.com/langgenius/dify-plugin-sdks) - Look up base classes, data structures, and error types
***
# Data Source Plugin
Source: https://docs.dify.ai/en/develop-plugin/dev-guides-and-walkthroughs/datasource-plugin
Data source plugins are a new type of plugin introduced in Dify 1.9.0. In a knowledge pipeline, they serve as the document data source and the starting point for the entire pipeline.
This article describes how to develop a data source plugin, covering plugin architecture, code examples, and debugging methods, to help you quickly develop and launch your data source plugin.
## Prerequisites
Before reading on, ensure you have a basic understanding of the knowledge pipeline and some knowledge of plugin development. You can find relevant information here:
* [Step 2: Knowledge Pipeline Orchestration](/en/use-dify/knowledge/knowledge-pipeline/knowledge-pipeline-orchestration)
* [Dify Plugin Development: Hello World Guide](/en/develop-plugin/dev-guides-and-walkthroughs/tool-plugin)
## Data Source Plugin Types
Dify supports three types of data source plugins: web crawler, online document, and online drive. When implementing the plugin code, the class that provides the plugin's functionality must inherit from a specific data source class. Each of the three plugin types corresponds to a different parent class.
To learn how to inherit from a parent class to implement plugin functionality, see [Tool Plugin: Preparing Tool Code](/en/develop-plugin/dev-guides-and-walkthroughs/tool-plugin#4-preparing-tool-code).
Each data source plugin type supports multiple data sources. For example:
* **Web Crawler**: Jina Reader, FireCrawl
* **Online Document**: Notion, Confluence, GitHub
* **Online Drive**: OneDrive, Google Drive, Box, AWS S3, Tencent COS
The relationship between data source types and data source plugin types is illustrated below.
## Develop a Data Source Plugin
### Create a Data Source Plugin
You can use the scaffolding command-line tool to create a data source plugin by selecting the `datasource` type. After completing the setup, the command-line tool will automatically generate the plugin project code.
```powershell theme={null}
dify plugin init
```
Typically, a data source plugin does not need to use other features of the Dify platform, so no additional permissions are required.
#### Data Source Plugin Structure
A data source plugin consists of three main components:
* The `manifest.yaml` file: Describes the basic information about the plugin.
* The `provider` directory: Contains the plugin provider's description and authentication implementation code.
* The `datasources` directory: Contains the description and core logic for fetching data from the data source.
```
├── _assets
│ └── icon.svg
├── datasources
│ ├── your_datasource.py
│ └── your_datasource.yaml
├── main.py
├── manifest.yaml
├── PRIVACY.md
├── provider
│ ├── your_datasource.py
│ └── your_datasource.yaml
├── README.md
└── requirements.txt
```
#### Set the Correct Version and Tag
* In the `manifest.yaml` file, set the minimum supported Dify version as follows:
```yaml theme={null}
minimum_dify_version: 1.9.0
```
* In the `manifest.yaml` file, add the following tag to display the plugin under the data source category in the Dify Marketplace:
```yaml theme={null}
tags:
  - rag
```
* In the `requirements.txt` file, set the plugin SDK version used for data source plugin development as follows:
```text theme={null}
dify-plugin>=0.5.0,<0.6.0
```
### Add the Data Source Provider
#### Create the Provider YAML File
The content of a provider YAML file is essentially the same as that for tool plugins, with only the following two differences:
```yaml theme={null}
# Specify the provider type for the data source plugin: online_drive, online_document, or website_crawl
provider_type: online_drive # or online_document, website_crawl
# Specify data sources
datasources:
  - datasources/PluginName.yaml
```
For more about creating a provider YAML file, see [Tool Plugin: Completing Third-Party Service Credentials](/en/develop-plugin/dev-guides-and-walkthroughs/tool-plugin#2-completing-third-party-service-credentials).
Data source plugins support authentication via OAuth 2.0 or API Key.
To configure OAuth, see [Add OAuth Support to Your Tool Plugin](/en/develop-plugin/dev-guides-and-walkthroughs/tool-oauth).
#### Create the Provider Code File
* When using API Key authentication mode, the provider code file for data source plugins is identical to that for tool plugins. You only need to change the parent class inherited by the provider class to `DatasourceProvider`.
```python theme={null}
class YourDatasourceProvider(DatasourceProvider):
    def _validate_credentials(self, credentials: Mapping[str, Any]) -> None:
        try:
            """
            IMPLEMENT YOUR VALIDATION HERE
            """
        except Exception as e:
            raise ToolProviderCredentialValidationError(str(e))
```
* When using OAuth authentication mode, data source plugins differ slightly from tool plugins. When obtaining access permissions via OAuth, data source plugins can simultaneously return the username and avatar to be displayed on the frontend. Therefore, `_oauth_get_credentials` and `_oauth_refresh_credentials` need to return a `DatasourceOAuthCredentials` type that contains `name`, `avatar_url`, `expires_at`, and `credentials`.
The `DatasourceOAuthCredentials` class is defined as follows and must be set to the corresponding type when returned:
```python theme={null}
class DatasourceOAuthCredentials(BaseModel):
    name: str | None = Field(None, description="The name of the OAuth credential")
    avatar_url: str | None = Field(None, description="The avatar url of the OAuth")
    credentials: Mapping[str, Any] = Field(..., description="The credentials of the OAuth")
    expires_at: int | None = Field(
        default=-1,
        description="""The expiration timestamp (in seconds since Unix epoch, UTC) of the credentials.
        Set to -1 or None if the credentials do not expire.""",
    )
```
The function signatures for `_oauth_get_authorization_url`, `_oauth_get_credentials`, and `_oauth_refresh_credentials` are as follows:
```python theme={null}
def _oauth_get_authorization_url(self, redirect_uri: str, system_credentials: Mapping[str, Any]) -> str:
    """
    Generate the authorization URL for {{ .PluginName }} OAuth.
    """
    try:
        """
        IMPLEMENT YOUR AUTHORIZATION URL GENERATION HERE
        """
    except Exception as e:
        raise DatasourceOAuthError(str(e))
    return ""
```
```python theme={null}
def _oauth_get_credentials(
    self, redirect_uri: str, system_credentials: Mapping[str, Any], request: Request
) -> DatasourceOAuthCredentials:
    """
    Exchange code for access_token.
    """
    try:
        """
        IMPLEMENT YOUR CREDENTIALS EXCHANGE HERE
        """
    except Exception as e:
        raise DatasourceOAuthError(str(e))
    return DatasourceOAuthCredentials(
        name="",
        avatar_url="",
        expires_at=-1,
        credentials={},
    )
```
```python theme={null}
def _oauth_refresh_credentials(
    self, redirect_uri: str, system_credentials: Mapping[str, Any], credentials: Mapping[str, Any]
) -> DatasourceOAuthCredentials:
    """
    Refresh the credentials.
    """
    return DatasourceOAuthCredentials(
        name="",
        avatar_url="",
        expires_at=-1,
        credentials={},
    )
```
### Add the Data Source
The YAML file format and data source code format vary across the three types of data sources.
#### Web Crawler
In the provider YAML file for a web crawler data source plugin, `output_schema` must always return four parameters: `source_url`, `content`, `title`, and `description`.
```yaml theme={null}
output_schema:
  type: object
  properties:
    source_url:
      type: string
      description: the source url of the website
    content:
      type: string
      description: the content from the website
    title:
      type: string
      description: the title of the website
    description:
      type: string
      description: the description of the website
```
In the main logic code for a web crawler plugin, the class must inherit from `WebsiteCrawlDatasource` and implement the `_get_website_crawl` method. You then need to use the `create_crawl_message` method to return the web crawl message.
To crawl multiple web pages and return them in batches, you can set `WebSiteInfo.status` to `processing` and use the `create_crawl_message` method to return each batch of crawled pages. After all pages have been crawled, set `WebSiteInfo.status` to `completed`.
```python theme={null}
class YourDataSource(WebsiteCrawlDatasource):

    def _get_website_crawl(
        self, datasource_parameters: dict[str, Any]
    ) -> Generator[ToolInvokeMessage, None, None]:

        crawl_res = WebSiteInfo(web_info_list=[], status="", total=0, completed=0)
        crawl_res.status = "processing"
        yield self.create_crawl_message(crawl_res)

        ### your crawl logic
        ...

        crawl_res.status = "completed"
        crawl_res.web_info_list = [
            WebSiteInfoDetail(
                title="",
                source_url="",
                description="",
                content="",
            )
        ]
        crawl_res.total = 1
        crawl_res.completed = 1
        yield self.create_crawl_message(crawl_res)
```
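The batch-by-batch flow described above can be sketched with plain dictionaries standing in for the SDK's `WebSiteInfo` and `create_crawl_message` types (illustrative only; the function name is ours, not part of the SDK):

```python
def crawl_in_batches(urls, batch_size=2):
    """Yield crawl-status payloads batch by batch: 'processing' while pages
    remain, then a final 'completed' payload. Dicts stand in for the SDK's
    WebSiteInfo / create_crawl_message types."""
    crawled = []
    for i in range(0, len(urls), batch_size):
        batch = [{"source_url": u, "title": "", "description": "", "content": ""}
                 for u in urls[i:i + batch_size]]
        crawled.extend(batch)
        status = "completed" if i + batch_size >= len(urls) else "processing"
        yield {"status": status, "web_info_list": list(crawled),
               "total": len(urls), "completed": len(crawled)}

messages = list(crawl_in_batches(["https://a.example", "https://b.example", "https://c.example"]))
```

With three URLs and a batch size of two, this yields one `processing` payload followed by one `completed` payload, mirroring the status transitions the pipeline expects.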
#### Online Document
The return value for an online document data source plugin must include at least a `content` field to represent the document's content. For example:
```yaml theme={null}
output_schema:
  type: object
  properties:
    workspace_id:
      type: string
      description: workspace id
    page_id:
      type: string
      description: page id
    content:
      type: string
      description: page content
```
In the main logic code for an online document plugin, the class must inherit from `OnlineDocumentDatasource` and implement two methods: `_get_pages` and `_get_content`.
When a user runs the plugin, it first calls the `_get_pages` method to retrieve a list of documents. After the user selects a document from the list, it then calls the `_get_content` method to fetch the document's content.
```python theme={null}
def _get_pages(self, datasource_parameters: dict[str, Any]) -> DatasourceGetPagesResponse:
    # your get pages logic
    response = requests.get(url, headers=headers, params=params, timeout=30)

    pages = []
    for item in response.json().get("results", []):
        page = OnlineDocumentPage(
            page_name=item.get("title", ""),
            page_id=item.get("id", ""),
            type="page",
            last_edited_time=item.get("version", {}).get("createdAt", ""),
            parent_id=item.get("parentId", ""),
            page_icon=None,
        )
        pages.append(page)
    online_document_info = OnlineDocumentInfo(
        workspace_name=workspace_name,
        workspace_icon=workspace_icon,
        workspace_id=workspace_id,
        pages=pages,
        total=len(pages),
    )
    return DatasourceGetPagesResponse(result=[online_document_info])
```
```python theme={null}
def _get_content(self, page: GetOnlineDocumentPageContentRequest) -> Generator[DatasourceMessage, None, None]:
    # your fetch content logic, for example:
    response = requests.get(url, headers=headers, params=params, timeout=30)
    ...
    yield self.create_variable_message("content", "")
    yield self.create_variable_message("page_id", "")
    yield self.create_variable_message("workspace_id", "")
```
#### Online Drive
An online drive data source plugin returns a file, so it must adhere to the following specification:
```yaml theme={null}
output_schema:
  type: object
  properties:
    file:
      $ref: "https://dify.ai/schemas/v1/file.json"
```
In the main logic code for an online drive plugin, the class must inherit from `OnlineDriveDatasource` and implement two methods: `_browse_files` and `_download_file`.
When a user runs the plugin, it first calls `_browse_files` to get a file list. At this point, `prefix` is empty, indicating a request for the root directory's file list. The file list contains both folder and file type variables. If the user opens a folder, the `_browse_files` method is called again. At this point, the `prefix` in `OnlineDriveBrowseFilesRequest` will be the folder ID used to retrieve the file list within that folder.
After a user selects a file, the plugin uses the `_download_file` method and the file ID to get the file's content. You can use the `_get_mime_type_from_filename` method to get the file's MIME type, allowing the pipeline to handle different file types appropriately.
When the file list contains multiple files, you can set `OnlineDriveFileBucket.is_truncated` to `True` and set `OnlineDriveFileBucket.next_page_parameters` to the parameters needed to fetch the next page of the file list, such as the next page's request ID or URL, depending on the service provider.
```python theme={null}
def _browse_files(
    self, request: OnlineDriveBrowseFilesRequest
) -> OnlineDriveBrowseFilesResponse:
    credentials = self.runtime.credentials
    bucket_name = request.bucket
    prefix = request.prefix or ""  # Empty prefix means the root folder; when browsing into a folder, the prefix is the folder id
    max_keys = request.max_keys or 10
    next_page_parameters = request.next_page_parameters or {}

    files = []
    files.append(OnlineDriveFile(
        id="",
        name="",
        size=0,
        type="folder",  # or "file"
    ))

    return OnlineDriveBrowseFilesResponse(result=[
        OnlineDriveFileBucket(
            bucket="",
            files=files,
            is_truncated=False,
            next_page_parameters={},
        )
    ])
```
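On the caller's side, browsing continues with the returned `next_page_parameters` until a response reports `is_truncated` as `False`. A minimal sketch of that loop, with a hypothetical `list_page` callable standing in for the real drive API:

```python
def list_all_files(list_page, max_keys=10):
    """Collect every file by following pagination markers until the provider
    reports no further pages. `list_page` is a hypothetical callable returning
    (files, is_truncated, next_page_parameters)."""
    files, page_params = [], {}
    while True:
        batch, is_truncated, page_params = list_page(page_params, max_keys)
        files.extend(batch)
        if not is_truncated:
            return files

# A fake two-page provider, for illustration only:
def fake_list_page(page_params, max_keys):
    if not page_params:  # first call: empty parameters mean page one
        return (["a.txt", "b.txt"], True, {"cursor": "page2"})
    return (["c.txt"], False, {})
```

Here `list_all_files(fake_list_page)` walks both pages and returns all three file names; a real implementation would pass the cursor (or URL) from `next_page_parameters` to the drive service's list endpoint.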
```python theme={null}
def _download_file(self, request: OnlineDriveDownloadFileRequest) -> Generator[DatasourceMessage, None, None]:
    credentials = self.runtime.credentials
    file_id = request.id

    file_content = bytes()
    file_name = ""
    mime_type = self._get_mime_type_from_filename(file_name)

    yield self.create_blob_message(file_content, meta={
        "file_name": file_name,
        "mime_type": mime_type
    })

def _get_mime_type_from_filename(self, filename: str) -> str:
    """Determine MIME type from file extension."""
    import mimetypes
    mime_type, _ = mimetypes.guess_type(filename)
    return mime_type or "application/octet-stream"
```
For storage services like AWS S3, the `prefix`, `bucket`, and `id` variables have special uses and can be applied flexibly as needed during development:
* `prefix`: Represents the file path prefix. For example, `prefix=container1/folder1/` retrieves the files or file list from the `folder1` folder in the `container1` bucket.
* `bucket`: Represents the file bucket. For example, `bucket=container1` retrieves the files or file list in the `container1` bucket. This field can be left blank for non-standard S3 protocol drives.
* `id`: Since the `_download_file` method does not use the `prefix` variable, the full file path must be included in the `id`. For example, `id=container1/folder1/file1.txt` indicates retrieving the `file1.txt` file from the `folder1` folder in the `container1` bucket.
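As a small illustration of the `id` convention above, the full path can be split back into a bucket and an object key with plain string handling (the helper name is ours, not part of the SDK):

```python
def split_drive_file_id(file_id: str) -> tuple[str, str]:
    """Split a full-path file id like 'container1/folder1/file1.txt' into
    (bucket, object_key). Illustrative helper, not part of the Dify SDK."""
    bucket, _, key = file_id.partition("/")
    if not key:
        raise ValueError(f"Expected '<bucket>/<path>' format, got: {file_id!r}")
    return bucket, key
```

For example, `split_drive_file_id("container1/folder1/file1.txt")` gives `("container1", "folder1/file1.txt")`, which a `_download_file` implementation could then pass to an S3-style `get_object` call.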
You can refer to the specific implementations of the [official Google Drive plugin](https://github.com/langgenius/dify-official-plugins/blob/main/datasources/google_cloud_storage/datasources/google_cloud_storage.py) and the [official AWS S3 plugin](https://github.com/langgenius/dify-official-plugins/blob/main/datasources/aws_s3_storage/datasources/aws_s3_storage.py).
## Debug the Plugin
Data source plugins support two debugging methods: remote debugging and installing the plugin locally. Note the following:
* If the plugin uses OAuth authentication, the `redirect_uri` for remote debugging differs from that of a local plugin. Update the relevant configuration in your service provider's OAuth App accordingly.
* While data source plugins support single-step debugging, we still recommend testing them in a complete knowledge pipeline to ensure full functionality.
## Final Checks
Before packaging and publishing, make sure you've completed all of the following:
* Set the minimum supported Dify version to `1.9.0`.
* Set the SDK version to `dify-plugin>=0.5.0,<0.6.0`.
* Write the `README.md` and `PRIVACY.md` files.
* Include only English content in the code files.
* Replace the default icon with the data source provider's logo.
## Package and Publish
In the plugin directory, run the following command to generate a `.difypkg` plugin package:
```bash theme={null}
dify plugin package . -o your_datasource.difypkg
```
Next, you can:
* Import and use the plugin in your Dify environment.
* Publish the plugin to Dify Marketplace by submitting a pull request.
For the plugin publishing process, see [Publishing Plugins](/en/develop-plugin/publishing/marketplace-listing/release-overview).
***
[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/dev-guides-and-walkthroughs/datasource-plugin.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)
# Develop A Slack Bot Plugin
Source: https://docs.dify.ai/en/develop-plugin/dev-guides-and-walkthroughs/develop-a-slack-bot-plugin
This guide provides a complete walkthrough for developing a Slack Bot plugin, covering project initialization, configuration form editing, feature implementation, debugging, endpoint setup, verification, and packaging. You'll need the Dify plugin scaffolding tool and a pre-created Slack App to build an AI-powered chatbot on Slack.
**What You’ll Learn:**
Gain a solid understanding of how to build a Slack Bot that’s powered by AI—one that can respond to user questions right inside Slack. If you haven't developed a plugin before, we recommend reading the [Plugin Development Quick Start Guide](/en/develop-plugin/dev-guides-and-walkthroughs/tool-plugin) first.
### Project Background
The Dify plugin ecosystem focuses on making integrations simpler and more accessible. In this guide, we’ll use Slack as an example, walking you through the process of developing a Slack Bot plugin. This allows your team to chat directly with an LLM within Slack, significantly improving how efficiently they can use AI.
Slack is an open, real-time communication platform with a robust API. Among its features is a webhook-based event system, which is quite straightforward to develop on. We’ll leverage this system to create a Slack Bot plugin, illustrated in the diagram below:

> To avoid confusion, the following concepts are explained:
>
> * **Slack Bot** A chatbot on the Slack platform, acting as a virtual user you can interact with in real-time.
> * **Slack Bot Plugin** A plugin in the Dify Marketplace that connects a Dify application with Slack. This guide focuses on how to develop that plugin.
**How It Works (A Simple Overview):**
1. **Send a Message to the Slack Bot**
When a user in Slack sends a message to the Bot, the Slack Bot immediately issues a webhook request to the Dify platform.
2. **Forward the Message to the Slack Bot Plugin**
The Dify platform triggers the Slack Bot plugin, which relays the details to the Dify application—similar to entering a recipient’s address in an email system. By setting up a Slack webhook address through Slack’s API and entering it in the Slack Bot plugin, you establish this connection. The plugin then processes the Slack request and sends it on to the Dify application, where the LLM analyzes the user’s input and generates a response.
3. **Return the Response to Slack**
Once the Slack Bot plugin receives the reply from the Dify application, it sends the LLM’s answer back through the same route to the Slack Bot. Users in Slack then see a more intelligent, interactive experience right where they’re chatting.
### Prerequisites
* **Dify plugin developing tool**: For more information, see [Initializing the Development Tool](/en/develop-plugin/getting-started/cli).
* **Python environment (version 3.12)**: Refer to the [Python official downloads page](https://www.python.org/downloads/) or ask an LLM for a complete setup guide.
* **A Slack App and its OAuth token**: Go to the [Slack API platform](https://api.slack.com/apps), create a Slack app from scratch, and pick the workspace where it will be deployed.

1. **Enable Webhooks:**

2. **Install the App in Your Slack Workspace:**

3. **Obtain an OAuth Token** for future plugin development:

### 1. Developing the Plugin
Now we’ll dive into the actual coding. Before starting, make sure you’ve read [Quick Start: Developing an Extension Plugin](/en/develop-plugin/dev-guides-and-walkthroughs/endpoint) or have already built a Dify plugin before.
#### 1.1 Initialize the Project
Run the following command to set up your plugin development environment:
```bash theme={null}
dify plugin init
```
Follow the prompts to provide basic project info. Select the `extension` template, and grant both `Apps` and `Endpoints` permissions.
For additional details on reverse-invoking Dify services within a plugin, see [Reverse Invocation: App](/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation-app).

#### 1.2 Edit the Configuration Form
This plugin needs to know which Dify app should handle the replies, as well as the Slack App token to authenticate the bot’s responses. Therefore, you’ll add these two fields to the plugin’s form.
Modify the YAML file in the group directory—for example, `group/slack.yaml`. The form’s filename is determined by the info you provided when creating the plugin, so adjust it accordingly.
**Sample Code:**
`slack.yaml`
```yaml theme={null}
settings:
  - name: bot_token
    type: secret-input
    required: true
    label:
      en_US: Bot Token
      zh_Hans: Bot Token
      pt_BR: Token do Bot
      ja_JP: Bot Token
    placeholder:
      en_US: Please input your Bot Token
      zh_Hans: 请输入你的 Bot Token
      pt_BR: Por favor, insira seu Token do Bot
      ja_JP: ボットトークンを入力してください
  - name: allow_retry
    type: boolean
    required: false
    label:
      en_US: Allow Retry
      zh_Hans: 允许重试
      pt_BR: Permitir Retentativas
      ja_JP: 再試行を許可
    default: false
  - name: app
    type: app-selector
    required: true
    label:
      en_US: App
      zh_Hans: 应用
      pt_BR: App
      ja_JP: アプリ
    placeholder:
      en_US: the app you want to use to answer Slack messages
      zh_Hans: 你想要用来回答 Slack 消息的应用
      pt_BR: o app que você deseja usar para responder mensagens do Slack
      ja_JP: あなたが Slack メッセージに回答するために使用するアプリ
endpoints:
  - endpoints/slack.yaml
```
Explanation of the Configuration Fields:
```yaml theme={null}
- name: app
  type: app-selector
  scope: chat
```
* **type**: Set to app-selector, which allows users to forward messages to a specific Dify app when using this plugin.
* **scope**: Set to chat, meaning the plugin can only interact with app types such as agent, chatbot, or chatflow.
Finally, in the `endpoints/slack.yaml` file, change the request method to POST to handle incoming Slack messages properly.
**Sample Code:**
`endpoints/slack.yaml`
```yaml theme={null}
path: "/"
method: "POST"
extra:
  python:
    source: "endpoints/slack.py"
```
### 2. Edit the Function Code
Modify the `endpoints/slack.py` file and add the following code:
```python theme={null}
import json
import traceback
from typing import Mapping

from werkzeug import Request, Response
from dify_plugin import Endpoint
from slack_sdk import WebClient
from slack_sdk.errors import SlackApiError


class SlackEndpoint(Endpoint):
    def _invoke(self, r: Request, values: Mapping, settings: Mapping) -> Response:
        """
        Invokes the endpoint with the given request.
        """
        # Ignore Slack's automatic retries unless the user opted in
        retry_num = r.headers.get("X-Slack-Retry-Num")
        if not settings.get("allow_retry") and (
            r.headers.get("X-Slack-Retry-Reason") == "http_timeout"
            or (retry_num is not None and int(retry_num) > 0)
        ):
            return Response(status=200, response="ok")

        data = r.get_json()

        # Handle Slack URL verification challenge
        if data.get("type") == "url_verification":
            return Response(
                response=json.dumps({"challenge": data.get("challenge")}),
                status=200,
                content_type="application/json",
            )

        if data.get("type") == "event_callback":
            event = data.get("event")
            if event.get("type") == "app_mention":
                message = event.get("text", "")
                if message.startswith("<@"):
                    # Strip the leading <@bot_id> mention from the message text
                    message = message.split("> ", 1)[1] if "> " in message else message
                    channel = event.get("channel", "")
                    blocks = event.get("blocks", [])
                    # Drop the mention element so the reply blocks contain only the message text
                    blocks[0]["elements"][0]["elements"] = blocks[0].get("elements")[0].get("elements")[1:]
                    token = settings.get("bot_token")
                    client = WebClient(token=token)
                    try:
                        response = self.session.app.chat.invoke(
                            app_id=settings["app"]["app_id"],
                            query=message,
                            inputs={},
                            response_mode="blocking",
                        )
                        try:
                            blocks[0]["elements"][0]["elements"][0]["text"] = response.get("answer")
                            result = client.chat_postMessage(
                                channel=channel,
                                text=response.get("answer"),
                                blocks=blocks,
                            )
                            return Response(
                                status=200,
                                response=json.dumps(result),
                                content_type="application/json",
                            )
                        except SlackApiError as e:
                            raise e
                    except Exception as e:
                        err = traceback.format_exc()
                        return Response(
                            status=200,
                            response="Sorry, I'm having trouble processing your request. Please try again later. " + str(err),
                            content_type="text/plain",
                        )
                else:
                    return Response(status=200, response="ok")
            else:
                return Response(status=200, response="ok")
        else:
            return Response(status=200, response="ok")
```
### 3. Debug the Plugin
Go to the Dify platform and obtain the remote debugging address and key for your plugin.

Back in your plugin project, copy the `.env.example` file and rename it to `.env`.
```bash theme={null}
INSTALL_METHOD=remote
REMOTE_INSTALL_URL=debug.dify.ai:5003
REMOTE_INSTALL_KEY=********-****-****-****-************
```
Run `python -m main` to start the plugin. You should now see your plugin installed in the Workspace on Dify’s plugin management page. Other team members will also be able to access it.
```bash theme={null}
python -m main
```
#### Configure the Plugin Endpoint
From the plugin management page in Dify, locate the newly installed test plugin and create a new endpoint. Provide a name, a Bot token, and select the app you want to connect.

After saving, a **POST** request URL is generated:

Next, complete the Slack App setup:
1. **Enable Event Subscriptions**

Paste the POST request URL you generated above.

2. **Grant Required Permissions**

***
### 4. Verify the Plugin
In your code, `self.session.app.chat.invoke` is used to call the Dify application, passing in parameters such as `app_id` and `query`. The response is then returned to the Slack Bot. Run `python -m main` again to restart your plugin for debugging, and check whether Slack correctly displays the Dify App’s reply:

***
### 5. Package the Plugin (Optional)
Once you confirm that the plugin works correctly, you can package and name it via the following command. After it runs, you’ll find a `slack_bot.difypkg` file in the current directory—your final plugin package. For detailed packaging steps, refer to [Package as a Local File and Share](/en/develop-plugin/publishing/marketplace-listing/release-by-file).
```bash theme={null}
# Replace ./slack_bot with your actual plugin project path.
dify plugin package ./slack_bot
```
Congratulations! You’ve successfully developed, tested, and packaged a plugin!
***
### 6. Publish the Plugin (Optional)
You can now upload it to the [Dify Marketplace repository](https://github.com/langgenius/dify-plugins) for public release. Before publishing, ensure your plugin meets the [Publishing to Dify Marketplace Guidelines](/en/develop-plugin/publishing/marketplace-listing/release-to-dify-marketplace). Once approved, your code is merged into the main branch, and the plugin goes live on the [Dify Marketplace](https://marketplace.dify.ai/).
***
## Related Resources
* [Plugin Development Basics](/en/develop-plugin/getting-started/getting-started-dify-plugin) - Comprehensive overview of Dify plugin development
* [Plugin Development Quick Start Guide](/en/develop-plugin/dev-guides-and-walkthroughs/tool-plugin) - Start developing plugins from scratch
* [Develop an Extension Plugin](/en/develop-plugin/dev-guides-and-walkthroughs/endpoint) - Learn about extension plugin development
* [Reverse Invocation of Dify Services](/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation) - Understand how to call Dify platform capabilities
* [Reverse Invocation: App](/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation-app) - Learn how to call apps within the platform
* [Publishing Plugins](/en/develop-plugin/publishing/marketplace-listing/release-overview) - Learn the publishing process
* [Publishing to Dify Marketplace](/en/develop-plugin/publishing/marketplace-listing/release-to-dify-marketplace) - Marketplace publishing guide
* [Endpoint Detailed Definition](/en/develop-plugin/dev-guides-and-walkthroughs/endpoint) - Detailed Endpoint definition
### Further Reading
For a complete Dify plugin project example, visit the [GitHub repository](https://github.com/langgenius/dify-plugins). You’ll also find additional plugins with full source code and implementation details.
If you want to explore more about plugin development, check the following:
**Quick Starts:**
* [Develop an Extension Plugin](/en/develop-plugin/dev-guides-and-walkthroughs/endpoint)
* [Develop a Model Plugin](/en/develop-plugin/dev-guides-and-walkthroughs/creating-new-model-provider)
* [Bundle Plugins: Packaging Multiple Plugins](/en/develop-plugin/features-and-specs/advanced-development/bundle)
**Plugin Interface Docs:**
* [Defining Plugin Information via Manifest File](/en/develop-plugin/features-and-specs/plugin-types/plugin-info-by-manifest) - Manifest structure
* [Endpoint](/en/develop-plugin/dev-guides-and-walkthroughs/endpoint) - Endpoint detailed definition
* [Reverse Invocation](/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation) - Reverse-calling Dify capabilities
* [General Specifications](/en/develop-plugin/features-and-specs/plugin-types/general-specifications) - Tool specifications
* [Model Schema](/en/develop-plugin/features-and-specs/plugin-types/model-schema) - Model
***
[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/dev-guides-and-walkthroughs/develop-a-slack-bot-plugin.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)
# 10-Minute Guide to Building Dify Plugins
Source: https://docs.dify.ai/en/develop-plugin/dev-guides-and-walkthroughs/develop-flomo-plugin
Learn how to build a functional Dify plugin that connects with Flomo note-taking service in just 10 minutes
## What you'll build
By the end of this guide, you'll have created a Dify plugin that:
* Connects to the Flomo note-taking API
* Allows users to save notes from AI conversations directly to Flomo
* Handles authentication and error states properly
* Is ready for distribution in the Dify Marketplace
**Time required:** 10 minutes

**Prerequisites:** Basic Python knowledge and a Flomo account
## Step 1: Install the Dify CLI and create a project
```bash theme={null}
brew tap langgenius/dify
brew install dify
```
Get the latest Dify CLI from the [Dify GitHub releases page](https://github.com/langgenius/dify-plugin-daemon/releases)
```bash theme={null}
# Download appropriate version
chmod +x dify-plugin-linux-amd64
mv dify-plugin-linux-amd64 dify
sudo mv dify /usr/local/bin/
```
Verify installation:
```bash theme={null}
dify version
```
Create a new plugin project using:
```bash theme={null}
dify plugin init
```
Follow the prompts to set up your plugin:
* Name it "flomo"
* Select "tool" as the plugin type
* Complete other required fields
```bash theme={null}
cd flomo
```
This will create the basic structure for your plugin with all necessary files.
## Step 2: Define your plugin manifest
The manifest.yaml file defines your plugin's metadata, permissions, and capabilities.
Create a `manifest.yaml` file:
```yaml theme={null}
version: 0.0.4
type: plugin
author: yourname
label:
  en_US: Flomo
  zh_Hans: Flomo 浮墨笔记
created_at: "2023-10-01T00:00:00Z"
icon: icon.png
resource:
  memory: 67108864 # 64MB
  permission:
    storage:
      enabled: false
plugins:
  tools:
    - flomo.yaml
meta:
  version: 0.0.1
  arch:
    - amd64
    - arm64
  runner:
    language: python
    version: "3.12"
    entrypoint: main
```
## Step 3: Create the tool definition
Create a `flomo.yaml` file to define your tool interface:
```yaml theme={null}
identity:
  author: yourname
  name: flomo
  label:
    en_US: Flomo Note
    zh_Hans: Flomo 浮墨笔记
description:
  human:
    en_US: Add notes to your Flomo account directly from Dify.
    zh_Hans: 直接从Dify添加笔记到您的Flomo账户。
  llm: >
    A tool that allows users to save notes to Flomo. Use this tool when users
    want to save important information from the conversation. The tool accepts
    a 'content' parameter that contains the text to be saved as a note.
credential_schema:
  api_url:
    type: string
    required: true
    label:
      en_US: API URL
      zh_Hans: API URL
    human_description:
      en_US: Flomo API URL from your Flomo account settings.
      zh_Hans: 从您的Flomo账户设置中获取的API URL。
tool_schema:
  content:
    type: string
    required: true
    label:
      en_US: Note Content
      zh_Hans: 笔记内容
    human_description:
      en_US: Content to save as a note in Flomo.
      zh_Hans: 要保存为Flomo笔记的内容。
```
## Step 4: Implement core utility functions
Create a utility module in `utils/flomo_utils.py` for API interaction:
```python utils/flomo_utils.py theme={null}
import requests


def send_flomo_note(api_url: str, content: str) -> None:
    """
    Send a note to Flomo via the API URL. Raises requests.RequestException on
    network errors, and ValueError on invalid status codes or input.
    """
    api_url = api_url.strip()
    if not api_url:
        raise ValueError("API URL is required and cannot be empty.")
    if not api_url.startswith('https://flomoapp.com/iwh/'):
        raise ValueError(
            "API URL should be in the format: https://flomoapp.com/iwh/{token}/{secret}/"
        )
    if not content:
        raise ValueError("Content cannot be empty.")

    headers = {'Content-Type': 'application/json'}
    response = requests.post(api_url, json={"content": content}, headers=headers, timeout=10)
    if response.status_code != 200:
        raise ValueError(f"API URL is not valid. Received status code: {response.status_code}")
```
## Step 5: Implement the Tool Provider
The Tool Provider handles credential validation. Create `provider/flomo.py`:
```python provider/flomo.py theme={null}
from typing import Any

import requests
from dify_plugin import ToolProvider
from dify_plugin.errors.tool import ToolProviderCredentialValidationError

from utils.flomo_utils import send_flomo_note


class FlomoProvider(ToolProvider):
    def _validate_credentials(self, credentials: dict[str, Any]) -> None:
        try:
            api_url = credentials.get('api_url', '').strip()
            # Use the shared utility to validate the URL by sending a test note
            send_flomo_note(api_url, "Hello, #flomo https://flomoapp.com")
        except ValueError as e:
            raise ToolProviderCredentialValidationError(str(e))
        except requests.RequestException as e:
            raise ToolProviderCredentialValidationError(f"Connection error: {str(e)}")
```
## Step 6: Implement the Tool
The Tool class handles actual API calls when the user invokes the plugin. Create `tools/flomo.py`:
```python tools/flomo.py theme={null}
from collections.abc import Generator
from typing import Any

import requests
from dify_plugin import Tool
from dify_plugin.entities.tool import ToolInvokeMessage

from utils.flomo_utils import send_flomo_note


class FlomoTool(Tool):
    def _invoke(self, tool_parameters: dict[str, Any]) -> Generator[ToolInvokeMessage, None, None]:
        content = tool_parameters.get("content", "")
        api_url = self.runtime.credentials.get("api_url", "")
        try:
            send_flomo_note(api_url, content)
        except ValueError as e:
            yield self.create_text_message(str(e))
            return
        except requests.RequestException as e:
            yield self.create_text_message(f"Connection error: {str(e)}")
            return
        # Return success message and structured data
        yield self.create_text_message(
            "Note created successfully! Your content has been sent to Flomo."
        )
        yield self.create_json_message({
            "status": "success",
            "content": content,
        })
```
Always handle exceptions gracefully and return user-friendly error messages. Remember that your plugin represents your brand in the Dify ecosystem.
## Step 7: Test your plugin
Copy the example environment file:
```bash theme={null}
cp .env.example .env
```
Edit the `.env` file with your Dify environment details:
```
INSTALL_METHOD=remote
REMOTE_INSTALL_HOST=debug-plugin.dify.dev
REMOTE_INSTALL_PORT=5003
REMOTE_INSTALL_KEY=your_debug_key
```
You can find your debug key and host in the Dify dashboard: click the "Plugins" icon in the top right corner, then click the debug icon. In the pop-up window, copy the "API Key" and "Host Address".
```bash theme={null}
pip install -r requirements.txt
python -m main
```
Your plugin will connect to your Dify instance in debug mode.
In your Dify instance, navigate to plugins and find your debugging plugin (marked as "debugging").
Add your Flomo API credentials and test sending a note.
## Step 8: Package and distribute
When you're ready to share your plugin:
```bash theme={null}
dify plugin package ./
```
This creates a `plugin.difypkg` file you can upload to the Dify Marketplace.
## FAQ and Troubleshooting
**Plugin won't connect in debug mode?** Make sure your `.env` file is properly configured and you're using the correct debug key.

**Credential validation fails?** Double-check your Flomo API URL format. It should be in the form: `https://flomoapp.com/iwh/{token}/{secret}/`

**Packaging fails?** Ensure all required files are present and the manifest.yaml structure is valid.
## Summary
You've built a functioning Dify plugin that connects with an external API service! This same pattern works for integrating with thousands of services - from databases and search engines to productivity tools and custom APIs.
* Write your README.md in English (en\_US) describing functionality, setup, and usage examples
* Create additional README files like `readme/README_zh_Hans.md` for other languages
* Add a privacy policy (PRIVACY.md) if publishing your plugin
* Include comprehensive examples in documentation
* Test thoroughly with various inputs
***
[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/dev-guides-and-walkthroughs/develop-flomo-plugin.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)
# Building a Markdown Exporter Plugin
Source: https://docs.dify.ai/en/develop-plugin/dev-guides-and-walkthroughs/develop-md-exporter
Learn how to create a plugin that exports conversations to different document formats
## What you'll build
In this guide, you'll learn how to build a practical Dify plugin that exports conversations into popular document formats. By the end, your plugin will:
* Convert markdown text to Word documents (.docx)
* Export conversations as PDF files
* Handle file creation with proper formatting
* Provide a clean user experience for document exports
**Time required**: 15 minutes

**Prerequisites**: Basic Python knowledge and familiarity with document manipulation libraries
## Step 1: Set up your environment
```bash theme={null}
brew tap langgenius/dify
brew install dify
```
Get the latest Dify CLI from the [Dify GitHub releases page](https://github.com/langgenius/dify-plugin-daemon/releases)
```bash theme={null}
# Download appropriate version
chmod +x dify-plugin-linux-amd64
mv dify-plugin-linux-amd64 dify
sudo mv dify /usr/local/bin/
```
Verify installation:
```bash theme={null}
dify version
```
Initialize a new plugin project:
```bash theme={null}
dify plugin init
```
Follow the prompts:
* Name: "md\_exporter"
* Type: "tool"
* Complete other details as prompted
## Step 2: Define plugin manifest
Create the `manifest.yaml` file to define your plugin's metadata:
```yaml theme={null}
version: 0.0.4
type: plugin
author: your_username
label:
  en_US: Markdown Exporter
  zh_Hans: Markdown导出工具
created_at: "2025-09-30T00:00:00Z"
icon: icon.png
resource:
  memory: 134217728 # 128MB
  permission:
    storage:
      enabled: true # We need storage for temp files
plugins:
  tools:
    - word_export.yaml
    - pdf_export.yaml
meta:
  version: 0.0.1
  arch:
    - amd64
    - arm64
  runner:
    language: python
    version: 3.12
    entrypoint: main
```
## Step 3: Define the Word export tool
Create a `word_export.yaml` file to define the Word document export tool:
```yaml theme={null}
identity:
  author: your_username
  name: word_export
  label:
    en_US: Export to Word
    zh_Hans: 导出为Word文档
description:
  human:
    en_US: Export conversation content to a Word document (.docx)
    zh_Hans: 将对话内容导出为Word文档(.docx)
  llm: >
    A tool that converts markdown text to a Word document (.docx) format.
    Use this tool when the user wants to save or export the conversation
    content as a Word document. The input text should be in markdown format.
credential_schema: {} # No credentials needed
tool_schema:
  markdown_content:
    type: string
    required: true
    label:
      en_US: Markdown Content
      zh_Hans: Markdown内容
    human_description:
      en_US: The markdown content to convert to Word format
      zh_Hans: 要转换为Word格式的Markdown内容
  document_name:
    type: string
    required: false
    label:
      en_US: Document Name
      zh_Hans: 文档名称
    human_description:
      en_US: Name for the exported document (without extension)
      zh_Hans: 导出文档的名称(无需扩展名)
```
## Step 4: Define the PDF export tool
Create a `pdf_export.yaml` file for PDF exports:
```yaml theme={null}
identity:
  author: your_username
  name: pdf_export
  label:
    en_US: Export to PDF
    zh_Hans: 导出为PDF文档
description:
  human:
    en_US: Export conversation content to a PDF document
    zh_Hans: 将对话内容导出为PDF文档
  llm: >
    A tool that converts markdown text to a PDF document.
    Use this tool when the user wants to save or export the conversation
    content as a PDF file. The input text should be in markdown format.
credential_schema: {} # No credentials needed
tool_schema:
  markdown_content:
    type: string
    required: true
    label:
      en_US: Markdown Content
      zh_Hans: Markdown内容
    human_description:
      en_US: The markdown content to convert to PDF format
      zh_Hans: 要转换为PDF格式的Markdown内容
  document_name:
    type: string
    required: false
    label:
      en_US: Document Name
      zh_Hans: 文档名称
    human_description:
      en_US: Name for the exported document (without extension)
      zh_Hans: 导出文档的名称(无需扩展名)
```
## Step 5: Install required dependencies
Create or update `requirements.txt` with the necessary libraries:
```text theme={null}
python-docx>=0.8.11
markdown>=3.4.1
weasyprint>=59.0
beautifulsoup4>=4.12.2
```
## Step 6: Implement the Word export functionality
Create a utility module in `utils/docx_utils.py`:
```python utils/docx_utils.py theme={null}
import os
import tempfile
import uuid

import markdown
from bs4 import BeautifulSoup
from docx import Document
from docx.enum.text import WD_PARAGRAPH_ALIGNMENT


def convert_markdown_to_docx(markdown_text, document_name=None):
    """
    Convert markdown text to a Word document and return the file path
    """
    if not document_name:
        document_name = f"exported_document_{uuid.uuid4().hex[:8]}"
    # Convert markdown to HTML
    html = markdown.markdown(markdown_text)
    soup = BeautifulSoup(html, 'html.parser')
    # Create a new Word document
    doc = Document()
    # Process HTML elements and add to document
    for element in soup.find_all(['h1', 'h2', 'h3', 'h4', 'p', 'ul', 'ol']):
        if element.name == 'h1':
            heading = doc.add_heading(element.text.strip(), level=1)
            heading.alignment = WD_PARAGRAPH_ALIGNMENT.CENTER
        elif element.name == 'h2':
            doc.add_heading(element.text.strip(), level=2)
        elif element.name == 'h3':
            doc.add_heading(element.text.strip(), level=3)
        elif element.name == 'h4':
            doc.add_heading(element.text.strip(), level=4)
        elif element.name == 'p':
            doc.add_paragraph(element.text.strip())
        elif element.name in ('ul', 'ol'):
            # Use a numbered style for ordered lists, bullets for unordered
            list_style = 'List Bullet' if element.name == 'ul' else 'List Number'
            for li in element.find_all('li'):
                doc.add_paragraph(li.text.strip(), style=list_style)
    # Save the document to the system temp directory
    temp_dir = tempfile.gettempdir()
    file_path = os.path.join(temp_dir, f"{document_name}.docx")
    doc.save(file_path)
    return file_path
```
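One detail the utility above glosses over: `document_name` comes from user input and is interpolated into a file path, so it is worth sanitizing before use. A minimal helper sketch (the function name and character policy are my own, not part of the plugin):

```python
import re


def sanitize_document_name(name, default="exported_document"):
    """Strip path separators and other unsafe characters from a user-supplied name."""
    # Keep letters, digits, dash, and underscore; replace everything else
    cleaned = re.sub(r"[^A-Za-z0-9_-]", "_", name or "")
    return cleaned or default
```

Calling it before `os.path.join` ensures a name like `../etc/passwd` cannot escape the temp directory.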
## Step 7: Implement the PDF export functionality
Create a utility module in `utils/pdf_utils.py`:
```python utils/pdf_utils.py theme={null}
import os
import tempfile
import uuid

import markdown
from weasyprint import HTML
from weasyprint.text.fonts import FontConfiguration


def convert_markdown_to_pdf(markdown_text, document_name=None):
    """
    Convert markdown text to a PDF document and return the file path
    """
    if not document_name:
        document_name = f"exported_document_{uuid.uuid4().hex[:8]}"
    # Convert markdown to HTML
    html_content = markdown.markdown(markdown_text)
    # Wrap the fragment in a full page and add basic styling
    styled_html = f"""
    <html>
    <head>
    <meta charset="utf-8">
    <title>{document_name}</title>
    <style>
    body {{ font-family: sans-serif; margin: 2em; }}
    </style>
    </head>
    <body>
    {html_content}
    </body>
    </html>
    """
    # Output file path in the system temp directory
    temp_dir = tempfile.gettempdir()
    file_path = os.path.join(temp_dir, f"{document_name}.pdf")
    # Configure fonts
    font_config = FontConfiguration()
    # Render PDF
    HTML(string=styled_html).write_pdf(
        file_path,
        stylesheets=[],
        font_config=font_config,
    )
    return file_path
```
## Step 8: Create tool implementations
First, create the Word export tool in `tools/word_export.py`:
```python tools/word_export.py theme={null}
from collections.abc import Generator
from typing import Any

from dify_plugin import Tool
from dify_plugin.entities.tool import ToolInvokeMessage

from utils.docx_utils import convert_markdown_to_docx


class WordExportTool(Tool):
    def _invoke(self, tool_parameters: dict[str, Any]) -> Generator[ToolInvokeMessage, None, None]:
        # Extract parameters
        markdown_content = tool_parameters.get("markdown_content", "")
        document_name = tool_parameters.get("document_name", "exported_document")
        if not markdown_content:
            yield self.create_text_message("Error: No content provided for export.")
            return
        try:
            # Convert markdown to Word
            file_path = convert_markdown_to_docx(markdown_content, document_name)
            # Read the file as binary
            with open(file_path, 'rb') as file:
                file_content = file.read()
            # Return success message and the file as a blob
            yield self.create_text_message(
                "Document exported successfully as Word (.docx) format."
            )
            yield self.create_blob_message(
                blob=file_content,
                meta={"mime_type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document"},
            )
        except Exception as e:
            yield self.create_text_message(f"Error exporting to Word: {str(e)}")
            return
```
Next, create the PDF export tool in `tools/pdf_export.py`:
```python tools/pdf_export.py theme={null}
from collections.abc import Generator
from typing import Any

from dify_plugin import Tool
from dify_plugin.entities.tool import ToolInvokeMessage

from utils.pdf_utils import convert_markdown_to_pdf


class PDFExportTool(Tool):
    def _invoke(self, tool_parameters: dict[str, Any]) -> Generator[ToolInvokeMessage, None, None]:
        # Extract parameters
        markdown_content = tool_parameters.get("markdown_content", "")
        document_name = tool_parameters.get("document_name", "exported_document")
        if not markdown_content:
            yield self.create_text_message("Error: No content provided for export.")
            return
        try:
            # Convert markdown to PDF
            file_path = convert_markdown_to_pdf(markdown_content, document_name)
            # Read the file as binary
            with open(file_path, 'rb') as file:
                file_content = file.read()
            # Return success message and the file as a blob
            yield self.create_text_message(
                "Document exported successfully as PDF format."
            )
            yield self.create_blob_message(
                blob=file_content,
                meta={"mime_type": "application/pdf"},
            )
        except Exception as e:
            yield self.create_text_message(f"Error exporting to PDF: {str(e)}")
            return
```
## Step 9: Create the entrypoint
Create a `main.py` file at the root of your project:
```python main.py theme={null}
from dify_plugin import Plugin, DifyPluginEnv

# Tools are wired up via the YAML manifests;
# main.py only needs to start the plugin runtime.
plugin = Plugin(DifyPluginEnv(MAX_REQUEST_TIMEOUT=120))

if __name__ == "__main__":
    plugin.run()
```
## Step 10: Test your plugin
First, create your `.env` file from the template:
```bash theme={null}
cp .env.example .env
```
Configure it with your Dify environment details:
```
INSTALL_METHOD=remote
REMOTE_INSTALL_HOST=debug-plugin.dify.dev
REMOTE_INSTALL_PORT=5003
REMOTE_INSTALL_KEY=your_debug_key
```
```bash theme={null}
pip install -r requirements.txt
```
```bash theme={null}
python -m main
```
## Step 11: Package for distribution
When you're ready to share your plugin:
```bash theme={null}
dify plugin package ./
```
This creates a `plugin.difypkg` file for distribution.
## Creative use cases
* Convert analysis summaries into professional reports for clients
* Export coaching or consulting session notes as formatted documents
## Beyond the basics
Here are some interesting ways to extend this plugin:
* **Custom templates**: Add company branding or personalized styles
* **Multi-format support**: Expand to export as HTML, Markdown, or other formats
* **Image handling**: Process and include images from conversations
* **Table support**: Implement proper formatting for data tables
* **Collaborative editing**: Add integration with Google Docs or similar platforms
The core challenge in document conversion is maintaining formatting and structure. The approach used in this plugin first converts markdown to HTML (an intermediate format), then processes that HTML into the target format.
This two-step process provides flexibility—you could extend it to support additional formats by simply adding new output modules that work with the HTML representation.
For PDF generation, WeasyPrint was chosen because it offers high-quality PDF rendering with CSS support. For Word documents, python-docx provides granular control over document structure.
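As a concrete illustration of that extension point, here is a minimal sketch of an HTML output module that consumes the same intermediate HTML the Word and PDF exporters produce (the function name and page template are my own, not part of the plugin):

```python
import os
import tempfile
import uuid


def write_html_export(html_content, document_name=None):
    """Wrap the intermediate HTML fragment in a minimal page and save it to a temp file."""
    if not document_name:
        document_name = f"exported_document_{uuid.uuid4().hex[:8]}"
    page = (
        "<html><head><meta charset='utf-8'>"
        f"<title>{document_name}</title></head>"
        f"<body>{html_content}</body></html>"
    )
    file_path = os.path.join(tempfile.gettempdir(), f"{document_name}.html")
    with open(file_path, "w", encoding="utf-8") as f:
        f.write(page)
    return file_path
```

Because the markdown-to-HTML step is shared, a new format only needs a function like this plus a matching tool YAML and tool class.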
## Summary
You've built a practical plugin that adds real value to the Dify platform by enabling users to export conversations in professional document formats. This functionality bridges the gap between AI conversations and traditional document workflows.
* Write your README.md in English (en\_US) describing functionality, setup, and usage examples
* Create additional README files like `readme/README_zh_Hans.md` for other languages
* Add a privacy policy (PRIVACY.md) if publishing your plugin
* Include comprehensive examples in documentation
* Test thoroughly with various document sizes and formats
***
[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/dev-guides-and-walkthroughs/develop-md-exporter.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)
# Build Tool Plugins for Multimodal Data Processing in Knowledge Pipelines
Source: https://docs.dify.ai/en/develop-plugin/dev-guides-and-walkthroughs/develop-multimodal-data-processing-tool
In knowledge pipelines, the Knowledge Base node supports input in two multimodal data formats: `multimodal-Parent-Child` and `multimodal-General`.
When developing a tool plugin for multimodal data processing, to ensure that the plugin's multimodal output (such as text, images, audio, video, etc.) can be correctly recognized and embedded by the Knowledge Base node, you need to complete the following configuration:
* **In the tool code file**, call the tool session interface to upload files and construct the `files` object.
* **In the tool provider YAML file**, declare the `output_schema` as either `multimodal-Parent-Child` or `multimodal-General`.
## Upload Files and Construct File Objects
When processing multimodal data (such as images), you need to first upload the file using Dify's tool session tool to obtain the file metadata.
The following example uses the official Dify plugin, **Dify Extractor**, to demonstrate how to upload a file and construct a `files` object.
```python theme={null}
# Upload the file using the tool session
file_res = self._tool.session.file.upload(
    file_name,  # filename
    file_blob,  # file binary data
    mime_type,  # MIME type, e.g., "image/png"
)
# Generate a Markdown image reference using the file preview URL
image_url = f"![{file_name}]({file_res.preview_url})"
```
The upload interface returns an `UploadFileResponse` object containing the file information. Its structure is as follows:
```python theme={null}
from enum import Enum

from pydantic import BaseModel


class UploadFileResponse(BaseModel):
    class Type(str, Enum):
        DOCUMENT = "document"
        IMAGE = "image"
        VIDEO = "video"
        AUDIO = "audio"

        @classmethod
        def from_mime_type(cls, mime_type: str):
            if mime_type.startswith("image/"):
                return cls.IMAGE
            if mime_type.startswith("video/"):
                return cls.VIDEO
            if mime_type.startswith("audio/"):
                return cls.AUDIO
            return cls.DOCUMENT

    id: str
    name: str
    size: int
    extension: str
    mime_type: str
    type: Type | None = None
    preview_url: str | None = None
```
You can map the file information (such as `name`, `size`, `extension`, `mime_type`, etc.) to the `files` field in the multimodal output structure.
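For instance, a small helper along these lines could perform that mapping. The helper name is mine, and the `transfer_method` value shown is an assumption; confirm the expected value against the official schemas:

```python
def build_file_entry(file_res, transfer_method="tool_file"):
    """Map an UploadFileResponse to one item of the `files` list.

    NOTE: the default `transfer_method` here is an assumption for illustration.
    """
    return {
        "name": file_res.name,
        "size": file_res.size,
        "extension": file_res.extension,
        # Fall back to "document" when the upload response carries no type
        "type": file_res.type.value if file_res.type else "document",
        "mime_type": file_res.mime_type,
        "transfer_method": transfer_method,
        "url": file_res.preview_url,
        "related_id": file_res.id,
    }
```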
```yaml multimodal_parent_child_structure highlight={22-62} expandable theme={null}
{
"$id": "https://dify.ai/schemas/v1/multimodal_parent_child_structure.json",
"$schema": "http://json-schema.org/draft-07/schema#",
"version": "1.0.0",
"type": "object",
"title": "Multimodal Parent-Child Structure",
"description": "Schema for multimodal parent-child structure (v1)",
"properties": {
"parent_mode": {
"type": "string",
"description": "The mode of parent-child relationship"
},
"parent_child_chunks": {
"type": "array",
"items": {
"type": "object",
"properties": {
"parent_content": {
"type": "string",
"description": "The parent content"
},
"files": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "file name"
},
"size": {
"type": "number",
"description": "file size"
},
"extension": {
"type": "string",
"description": "file extension"
},
"type": {
"type": "string",
"description": "file type"
},
"mime_type": {
"type": "string",
"description": "file mime type"
},
"transfer_method": {
"type": "string",
"description": "file transfer method"
},
"url": {
"type": "string",
"description": "file url"
},
"related_id": {
"type": "string",
"description": "file related id"
}
},
"required": ["name", "size", "extension", "type", "mime_type", "transfer_method", "url", "related_id"]
},
"description": "List of files"
},
"child_contents": {
"type": "array",
"items": {
"type": "string"
},
"description": "List of child contents"
}
},
"required": ["parent_content", "child_contents"]
},
"description": "List of parent-child chunk pairs"
}
},
"required": ["parent_mode", "parent_child_chunks"]
}
```
```yaml multimodal_general_structure highlight={18-56} expandable theme={null}
{
"$id": "https://dify.ai/schemas/v1/multimodal_general_structure.json",
"$schema": "http://json-schema.org/draft-07/schema#",
"version": "1.0.0",
"type": "array",
"title": "Multimodal General Structure",
"description": "Schema for multimodal general structure (v1) - array of objects",
"properties": {
"general_chunks": {
"type": "array",
"items": {
"type": "object",
"properties": {
"content": {
"type": "string",
"description": "The content"
},
"files": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "file name"
},
"size": {
"type": "number",
"description": "file size"
},
"extension": {
"type": "string",
"description": "file extension"
},
"type": {
"type": "string",
"description": "file type"
},
"mime_type": {
"type": "string",
"description": "file mime type"
},
"transfer_method": {
"type": "string",
"description": "file transfer method"
},
"url": {
"type": "string",
"description": "file url"
},
"related_id": {
"type": "string",
"description": "file related id"
}
},
"description": "List of files"
}
}
},
"required": ["content"]
},
"description": "List of content and files"
}
}
}
```
## Declare Multimodal Output Structure
The structure of multimodal data is defined by Dify's official JSON schema.
To enable the Knowledge Base node to recognize the plugin's multimodal output type, you need to point the `result` field under `output_schema` in the plugin's provider YAML file to the corresponding official schema URL.
```yaml theme={null}
output_schema:
  type: object
  properties:
    result:
      # multimodal-Parent-Child
      $ref: "https://dify.ai/schemas/v1/multimodal_parent_child_structure.json"
      # multimodal-General
      # $ref: "https://dify.ai/schemas/v1/multimodal_general_structure.json"
```
Taking `multimodal-Parent-Child` as an example, a complete YAML configuration is as follows:
```yaml expandable theme={null}
identity:
  name: multimodal_tool
  author: langgenius
  label:
    en_US: multimodal tool
    zh_Hans: 多模态提取器
    pt_BR: multimodal tool
description:
  human:
    en_US: Process documents into multimodal-Parent-Child chunk structures
    zh_Hans: 将文档处理为多模态父子分块结构
    pt_BR: Processar documentos em estruturas de divisão pai-filho
  llm: Processes documents into hierarchical multimodal-Parent-Child chunk structures
parameters:
  - name: input_text
    human_description:
      en_US: The text you want to chunk.
      zh_Hans: 输入文本
      pt_BR: Conteúdo de Entrada
    label:
      en_US: Input Content
      zh_Hans: 输入文本
      pt_BR: Conteúdo de Entrada
    llm_description: The text you want to chunk.
    required: true
    type: string
    form: llm
output_schema:
  type: object
  properties:
    result:
      $ref: "https://dify.ai/schemas/v1/multimodal_parent_child_structure.json"
extra:
  python:
    source: tools/parent_child_chunk.py
```
# Neko Cat Endpoint
Source: https://docs.dify.ai/en/develop-plugin/dev-guides-and-walkthroughs/endpoint
Authors Yeuoly, Allen. This document details the structure and implementation of Endpoints in Dify plugins, using the Neko Cat project as an example. It covers defining Endpoint groups, configuring interfaces, implementing the _invoke method, and handling requests and responses. The document explains the meaning and usage of various YAML configuration fields.
# Endpoint
This document uses the [Neko Cat](/en/develop-plugin/dev-guides-and-walkthroughs/endpoint) project as an example to explain the structure of Endpoints within a plugin. Endpoints are HTTP interfaces exposed by the plugin, which can be used for integration with external systems. For the complete plugin code, please refer to the [GitHub repository](https://github.com/langgenius/dify-plugin-sdks/tree/main/python/examples/neko).
### Group Definition
An `Endpoint` group is a collection of multiple `Endpoints`. When creating a new `Endpoint` within a Dify plugin, you might need to fill in the following configuration.

Besides the `Endpoint Name`, you can add new form items by defining the group's configuration. After clicking save, you can see the multiple interfaces the group contains, all of which share the same configuration.

#### **Structure**
* `settings` (map\[string] [ProviderConfig](/en/develop-plugin/features-and-specs/plugin-types/general-specifications#providerconfig)): Endpoint configuration definition.
* `endpoints` (list\[string], required): Points to the specific `endpoint` interface definitions.
```yaml theme={null}
settings:
  api_key:
    type: secret-input
    required: true
    label:
      en_US: API key
      zh_Hans: API key
      ja_Jp: API key
      pt_BR: API key
    placeholder:
      en_US: Please input your API key
      zh_Hans: 请输入你的 API key
      ja_Jp: あなたの API key を入れてください
      pt_BR: Por favor, insira sua chave API
endpoints:
  - endpoints/duck.yaml
  - endpoints/neko.yaml
```
### Interface Definition
* `path` (string): Follows the Werkzeug interface standard.
* `method` (string): Interface method, only supports `HEAD`, `GET`, `POST`, `PUT`, `DELETE`, `OPTIONS`.
* `extra` (object): Configuration information beyond the basic details.
* `python` (object)
* `source` (string): The source code that implements this interface.
```yaml theme={null}
path: "/duck/<app_id>"
method: "GET"
extra:
  python:
    source: "endpoints/duck.py"
```
### Interface Implementation
You need to implement a subclass that inherits from `dify_plugin.Endpoint` and implement the `_invoke` method.
* **Input Parameters**
* `r` (Request): The `Request` object from `werkzeug`.
* `values` (Mapping): Path parameters parsed from the path.
* `settings` (Mapping): Configuration information for this `Endpoint`.
* **Return**
* A `Response` object from `werkzeug`, supports streaming responses.
* Directly returning a string is not supported.
Example code:
```python theme={null}
from typing import Mapping

from werkzeug import Request, Response
from dify_plugin import Endpoint


class Duck(Endpoint):
    def _invoke(self, r: Request, values: Mapping, settings: Mapping) -> Response:
        """
        Invokes the endpoint with the given request.
        """
        app_id = values["app_id"]

        def generator():
            yield f"{app_id}<br>"

        return Response(generator(), status=200, content_type="text/html")
```
## Notes
* Endpoints are only instantiated when the plugin is called; they are not long-running services.
* Pay attention to security when developing Endpoints and avoid executing dangerous operations.
* Endpoints can be used to handle Webhook callbacks or provide interfaces for other systems to connect.
If you are learning plugin development, it is recommended to first read the [Getting Started with Plugin Development](/en/develop-plugin/dev-guides-and-walkthroughs/tool-plugin) and the [Developer Cheatsheet](/en/develop-plugin/dev-guides-and-walkthroughs/cheatsheet).
## Related Resources
* [Basic Concepts of Plugin Development](/en/develop-plugin/getting-started/getting-started-dify-plugin) - Understand the overall architecture of plugin development.
* [Neko Cat Example](/en/develop-plugin/dev-guides-and-walkthroughs/endpoint) - An example of extension plugin development.
* [General Specifications Definition](/en/develop-plugin/features-and-specs/plugin-types/general-specifications) - Understand common structures like ProviderConfig.
* [Develop a Slack Bot Plugin Example](/en/develop-plugin/dev-guides-and-walkthroughs/develop-a-slack-bot-plugin) - Another plugin development example.
* [Getting Started with Plugin Development](/en/develop-plugin/dev-guides-and-walkthroughs/tool-plugin) - Develop a plugin from scratch.
* [Reverse Invocation of Dify Services](/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation-app) - Learn how to use the reverse invocation feature.
***
[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/dev-guides-and-walkthroughs/endpoint.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)
# Add OAuth Support to Your Tool Plugin
Source: https://docs.dify.ai/en/develop-plugin/dev-guides-and-walkthroughs/tool-oauth
This guide teaches you how to build [OAuth](https://oauth.net/2/) support into your tool plugin.
OAuth is a better way to authorize tool plugins that need to access user data from third-party services, like Gmail or GitHub. Instead of requiring the user to manually enter API keys, OAuth lets the tool act on behalf of the user with their explicit consent.
## Background
OAuth in Dify involves **two separate flows** that developers should understand and design for.
### Flow 1: OAuth Client Setup (Admin / Developer Flow)
On Dify Cloud, the Dify team creates OAuth apps for popular tool plugins and sets up OAuth clients, saving users the trouble of configuring this themselves.
Admins of self-hosted Dify instances must go through this setup flow themselves.
Dify instance's admins or developers first need to register an OAuth app at the third-party service as a trusted application. From this, they'll be able to obtain the necessary credentials to configure the Dify tool provider as an OAuth client.
As an example, here are the steps for setting up an OAuth client for Dify's Gmail tool provider:
1. Go to [Google Cloud Console](https://console.cloud.google.com) and create a new project, or select an existing one.
2. Enable the required APIs (e.g., the Gmail API).
3. Configure the OAuth consent screen:
   1. Navigate to **APIs & Services** > **OAuth consent screen**
   2. Choose **External** user type for public plugins
   3. Fill in the application name, user support email, and developer contact
   4. Add authorized domains if needed
   5. For testing: add test users in the **Test users** section
4. Create the OAuth credentials:
   1. Go to **APIs & Services** > **Credentials**
   2. Click **Create Credentials** > **OAuth 2.0 Client IDs**
   3. Choose the **Web application** type
   4. A `client_id` and a `client_secret` will be generated. Save these as the credentials.
5. Enter the `client_id` and `client_secret` in the OAuth Client configuration popup to set up the tool provider as a client.
6. Register the redirect URI generated by Dify on the Google OAuth Client's page:
Dify displays the `redirect_uri` in the OAuth Client configuration popup. It usually follows the format:
```bash theme={null}
https://{your-dify-domain}/console/api/oauth/plugin/{plugin-id}/{provider-name}/{tool-name}/callback
```
For self-hosted Dify, the `your-dify-domain` should be consistent with the `CONSOLE_WEB_URL`.
Each service has unique requirements, so always consult the specific OAuth documentation for the services you're integrating with.
### Flow 2: User Authorization (Dify User Flow)
After configuring OAuth clients, individual Dify users can now authorize your plugin to access their personal accounts.
## Implementation
### 1. Define OAuth Schema in Provider Manifest
The `oauth_schema` section of the provider manifest tells Dify what credentials your plugin's OAuth flow needs and what that flow will produce. Two schemas are required for setting up OAuth:
#### client\_schema
Defines the input for OAuth client setup:
```yaml gmail.yaml theme={null}
oauth_schema:
  client_schema:
    - name: "client_id"
      type: "secret-input"
      required: true
      url: "https://developers.google.com/identity/protocols/oauth2"
    - name: "client_secret"
      type: "secret-input"
      required: true
```
The `url` field links directly to the third-party service's help documentation, which helps admins and developers who are unsure where to find these values.
#### credentials\_schema
Specifies what the user authorization flow produces (Dify manages these automatically):
```yaml theme={null}
# also under oauth_schema
credentials_schema:
  - name: "access_token"
    type: "secret-input"
  - name: "refresh_token"
    type: "secret-input"
  - name: "expires_at"
    type: "secret-input"
```
Include both `oauth_schema` and `credentials_for_provider` in the provider manifest to offer both OAuth and API key authentication options.
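A provider that offers both might declare, for instance (an abbreviated sketch; field values are illustrative):

```yaml theme={null}
credentials_for_provider:        # API key path
  api_key:
    type: secret-input
    required: true
oauth_schema:                    # OAuth path
  client_schema:
    - name: client_id
      type: secret-input
      required: true
    - name: client_secret
      type: secret-input
      required: true
  credentials_schema:
    - name: access_token
      type: secret-input
    - name: refresh_token
      type: secret-input
```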
### 2. Complete Required OAuth Methods in Tool Provider
Add these imports to where your `ToolProvider` is implemented:
```python theme={null}
from dify_plugin.entities.oauth import ToolOAuthCredentials
from dify_plugin.errors.tool import ToolProviderCredentialValidationError, ToolProviderOAuthError
```
Your `ToolProvider` class must implement these three OAuth methods (taking `GmailProvider` as an example):
Under no circumstances should the `client_secret` be returned in the credentials of `ToolOAuthCredentials`, as this could lead to security issues.
```python _oauth_get_authorization_url expandable theme={null}
def _oauth_get_authorization_url(self, redirect_uri: str, system_credentials: Mapping[str, Any]) -> str:
    """
    Generate the authorization URL using credentials from OAuth Client Setup Flow.
    This URL is where users grant permissions.
    """
    # Generate random state for CSRF protection (recommended for all OAuth flows)
    state = secrets.token_urlsafe(16)
    # Define Gmail-specific scopes - request minimal necessary permissions
    scope = "read:user read:data"  # Replace with your required scopes
    # Assemble Gmail-specific payload
    params = {
        "client_id": system_credentials["client_id"],  # From OAuth Client Setup
        "redirect_uri": redirect_uri,  # Dify generates this - DON'T modify
        "scope": scope,
        "response_type": "code",  # Standard OAuth authorization code flow
        "access_type": "offline",  # Critical: gets refresh token (if supported)
        "prompt": "consent",  # Forces reauth when scopes change (if supported)
        "state": state,  # CSRF protection
    }
    return f"{self._AUTH_URL}?{urllib.parse.urlencode(params)}"
```
```python _oauth_get_credentials expandable theme={null}
def _oauth_get_credentials(
    self, redirect_uri: str, system_credentials: Mapping[str, Any], request: Request
) -> ToolOAuthCredentials:
    """
    Exchange the authorization code for an access token and refresh token.
    Each call creates ONE credential set for one account connection.
    """
    # Extract authorization code from OAuth callback
    code = request.args.get("code")
    if not code:
        raise ToolProviderOAuthError("Authorization code not provided")
    # Check for authorization errors from OAuth provider
    error = request.args.get("error")
    if error:
        error_description = request.args.get("error_description", "")
        raise ToolProviderOAuthError(f"OAuth authorization failed: {error} - {error_description}")
    # Exchange authorization code for tokens using OAuth Client Setup credentials
    # Assemble Gmail-specific payload
    data = {
        "client_id": system_credentials["client_id"],  # From OAuth Client Setup
        "client_secret": system_credentials["client_secret"],  # From OAuth Client Setup
        "code": code,  # From user's authorization
        "grant_type": "authorization_code",  # Standard OAuth flow type
        "redirect_uri": redirect_uri,  # Must exactly match authorization URL
    }
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    try:
        response = requests.post(self._TOKEN_URL, data=data, headers=headers, timeout=10)
        response.raise_for_status()
        token_data = response.json()
        # Handle OAuth provider errors in response
        if "error" in token_data:
            error_desc = token_data.get("error_description", token_data["error"])
            raise ToolProviderOAuthError(f"Token exchange failed: {error_desc}")
        access_token = token_data.get("access_token")
        if not access_token:
            raise ToolProviderOAuthError("No access token received from provider")
        # Build credentials dict matching your credentials_schema
        credentials = {
            "access_token": access_token,
            "token_type": token_data.get("token_type", "Bearer"),
        }
        # Include refresh token if provided (critical for long-term access)
        refresh_token = token_data.get("refresh_token")
        if refresh_token:
            credentials["refresh_token"] = refresh_token
        # Handle token expiration - some providers don't return expires_in
        expires_in = token_data.get("expires_in", 3600)  # Default to 1 hour
        expires_at = int(time.time()) + expires_in
        return ToolOAuthCredentials(credentials=credentials, expires_at=expires_at)
    except ToolProviderOAuthError:
        raise  # Don't re-wrap errors we raised ourselves
    except requests.RequestException as e:
        raise ToolProviderOAuthError(f"Network error during token exchange: {str(e)}")
    except Exception as e:
        raise ToolProviderOAuthError(f"Failed to exchange authorization code: {str(e)}")
```
```python _oauth_refresh_credentials theme={null}
def _oauth_refresh_credentials(
    self, redirect_uri: str, system_credentials: Mapping[str, Any], credentials: Mapping[str, Any]
) -> ToolOAuthCredentials:
    """
    Refresh the credentials using the refresh token.
    Dify calls this automatically when tokens expire.
    """
    refresh_token = credentials.get("refresh_token")
    if not refresh_token:
        raise ToolProviderOAuthError("No refresh token available")
    # Standard OAuth refresh token flow
    data = {
        "client_id": system_credentials["client_id"],  # From OAuth Client Setup
        "client_secret": system_credentials["client_secret"],  # From OAuth Client Setup
        "refresh_token": refresh_token,  # From previous authorization
        "grant_type": "refresh_token",  # OAuth refresh flow
    }
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    try:
        response = requests.post(self._TOKEN_URL, data=data, headers=headers, timeout=10)
        response.raise_for_status()
        token_data = response.json()
        # Handle refresh errors
        if "error" in token_data:
            error_desc = token_data.get("error_description", token_data["error"])
            raise ToolProviderOAuthError(f"Token refresh failed: {error_desc}")
        access_token = token_data.get("access_token")
        if not access_token:
            raise ToolProviderOAuthError("No access token received from provider")
        # Build new credentials, preserving the existing refresh token
        new_credentials = {
            "access_token": access_token,
            "token_type": token_data.get("token_type", "Bearer"),
            "refresh_token": refresh_token,  # Keep existing refresh token
        }
        # Update the refresh token if the provider rotated it
        new_refresh_token = token_data.get("refresh_token")
        if new_refresh_token:
            new_credentials["refresh_token"] = new_refresh_token
        # Calculate the new expiration timestamp for Dify's token management
        expires_in = token_data.get("expires_in", 3600)
        expires_at = int(time.time()) + expires_in
        return ToolOAuthCredentials(credentials=new_credentials, expires_at=expires_at)
    except ToolProviderOAuthError:
        raise  # Don't re-wrap errors we raised ourselves
    except requests.RequestException as e:
        raise ToolProviderOAuthError(f"Network error during token refresh: {str(e)}")
    except Exception as e:
        raise ToolProviderOAuthError(f"Failed to refresh credentials: {str(e)}")
```
### 3. Access Tokens in Your Tools
You may use OAuth credentials to make authenticated API calls in your `Tool` implementation like so:
```python theme={null}
class YourTool(BuiltinTool):
    def _invoke(self, user_id: str, tool_parameters: dict[str, Any]) -> ToolInvokeMessage:
        if self.runtime.credential_type == CredentialType.OAUTH:
            access_token = self.runtime.credentials["access_token"]
            response = requests.get(
                "https://api.service.com/data",
                headers={"Authorization": f"Bearer {access_token}"},
            )
            return self.create_text_message(response.text)
```
`self.runtime.credentials` automatically provides the current user's tokens. Dify handles refresh automatically.
For plugins that support both OAuth and API key authentication, use `self.runtime.credential_type` to differentiate between the two authentication types.
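A minimal sketch of such a branch (the header-building helper below is hypothetical, not part of the dify_plugin SDK):

```python theme={null}
# Hypothetical helper: choose the auth header based on how the user
# connected the provider. Not part of the dify_plugin SDK.

def build_auth_header(credentials: dict, credential_type: str) -> dict:
    if credential_type == "oauth":
        # OAuth connections store a bearer token that Dify refreshes for you
        return {"Authorization": f"Bearer {credentials['access_token']}"}
    # API key connections store the key the user pasted in during setup
    return {"Authorization": f"Bearer {credentials['api_key']}"}
```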
### 4. Specify the Correct Versions
Earlier versions of the plugin SDK and Dify do not support OAuth authentication, so pin the plugin SDK version in `requirements.txt`:
```
dify_plugin>=0.4.2,<0.5.0
```
In `manifest.yaml`, add the minimum Dify version:
```yaml theme={null}
meta:
  version: 0.0.1
  arch:
    - amd64
    - arm64
  runner:
    language: python
    version: "3.12"
    entrypoint: main
  minimum_dify_version: 1.7.1
```
***
[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/dev-guides-and-walkthroughs/tool-oauth.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)
# Tool Plugin
Source: https://docs.dify.ai/en/develop-plugin/dev-guides-and-walkthroughs/tool-plugin
This document provides detailed instructions on how to develop tool plugins for Dify, using Google Search as an example to demonstrate a complete tool plugin development process. The content includes plugin initialization, template selection, tool provider configuration file definition, adding third-party service credentials, tool functionality code implementation, debugging, and packaging for release.
Tools refer to third-party services that can be called by Chatflow / Workflow / Agent-type applications, providing complete API implementation capabilities to enhance Dify applications. For example, adding extra features like online search, image generation, and more.

In this article, **"Tool Plugin"** refers to a complete project that includes tool provider files, functional code, and other structures. A tool provider can include multiple Tools (which can be understood as additional features provided within a single tool), structured as follows:
```
- Tool Provider
- Tool A
- Tool B
```

This article will use `Google Search` as an example to demonstrate how to quickly develop a tool plugin.
### Prerequisites
* Dify plugin scaffolding tool
* Python environment (version 3.12)
For detailed instructions on how to prepare the plugin development scaffolding tool, please refer to [Initializing Development Tools](/en/develop-plugin/getting-started/cli). If you are developing a plugin for the first time, it is recommended to read [Dify Plugin Development: Hello World Guide](/en/develop-plugin/dev-guides-and-walkthroughs/tool-plugin) first.
### Creating a New Project
Run the scaffolding command line tool to create a new Dify plugin project.
```bash theme={null}
./dify-plugin-darwin-arm64 plugin init
```
If you have renamed the binary file to `dify` and copied it to the `/usr/local/bin` path, you can run the following command to create a new plugin project:
```bash theme={null}
dify plugin init
```
> In the following text, `dify` will be used as a command line example. If you encounter any issues, please replace the `dify` command with the path to the command line tool.
### Choosing Plugin Type and Template
All templates in the scaffolding tool provide complete code projects. In this example, select the `Tool` plugin.
> If you are already familiar with plugin development and do not need to rely on templates, you can refer to the [General Specifications](/en/develop-plugin/features-and-specs/plugin-types/general-specifications) guide to complete the development of different types of plugins.

#### Configuring Plugin Permissions
The plugin also needs permissions to read from the Dify platform. Grant the following permissions to this example plugin:
* Tools
* Apps
* Storage: enable persistent storage and allocate the default storage size
* Allow registering Endpoints
> Use the arrow keys in the terminal to move between permissions, and press the Tab key to grant them.
After checking all permission items, press Enter to complete the plugin creation. The system will automatically generate the plugin project code.

### Developing the Tool Plugin
#### 1. Creating the Tool Provider File
The tool provider file is a YAML file that serves as the basic configuration entry for the tool plugin, supplying the authorization information the tool needs.
Go to the `/provider` path in the plugin template project and rename the yaml file to `google.yaml`. This `yaml` file will contain information about the tool provider, including the provider's name, icon, author, and other details. This information will be displayed when installing the plugin.
**Example Code**
```yaml theme={null}
identity: # Basic information of the tool provider
  author: Your-name # Author
  name: google # Name, unique, cannot have the same name as other providers
  label: # Label, for frontend display
    en_US: Google # English label
    zh_Hans: Google # Chinese label
  description: # Description, for frontend display
    en_US: Google # English description
    zh_Hans: Google # Chinese description
  icon: icon.svg # Tool icon, needs to be placed in the _assets folder
  tags: # Tags, for frontend display
    - search
```
Then register the provider file under the `plugins.tools` field so it can be found, with the complete path as follows:
```yaml theme={null}
plugins:
  tools:
    - 'google.yaml'
```
The path to `google.yaml` is its full path within the plugin project; in this example it is located in the project root directory. The `identity` field contains basic information about the tool provider: author, name, label, description, icon, and so on.
* The icon needs to be an attachment resource and needs to be placed in the `_assets` folder in the project root directory.
* Tags can help users quickly find plugins through categories. Below are all the currently supported tags.
```python theme={null}
class ToolLabelEnum(Enum):
    SEARCH = 'search'
    IMAGE = 'image'
    VIDEOS = 'videos'
    WEATHER = 'weather'
    FINANCE = 'finance'
    DESIGN = 'design'
    TRAVEL = 'travel'
    SOCIAL = 'social'
    NEWS = 'news'
    MEDICAL = 'medical'
    PRODUCTIVITY = 'productivity'
    EDUCATION = 'education'
    BUSINESS = 'business'
    ENTERTAINMENT = 'entertainment'
    UTILITIES = 'utilities'
    OTHER = 'other'
```
#### **2. Completing Third-Party Service Credentials**
For development convenience, we choose to use the Google Search API provided by the third-party service `SerpApi`. `SerpApi` requires an API Key for use, so we need to add the `credentials_for_provider` field in the `yaml` file.
The complete code is as follows:
```yaml theme={null}
identity:
  author: Dify
  name: google
  label:
    en_US: Google
    zh_Hans: Google
    pt_BR: Google
  description:
    en_US: Google
    zh_Hans: GoogleSearch
    pt_BR: Google
  icon: icon.svg
  tags:
    - search
credentials_for_provider: # Add the credentials_for_provider field
  serpapi_api_key:
    type: secret-input
    required: true
    label:
      en_US: SerpApi API key
      zh_Hans: SerpApi API key
    placeholder:
      en_US: Please input your SerpApi API key
      zh_Hans: Please enter your SerpApi API key
    help:
      en_US: Get your SerpApi API key from SerpApi
      zh_Hans: Get your SerpApi API key from SerpApi
    url: https://serpapi.com/manage-api-key
tools:
  - tools/google_search.yaml
extra:
  python:
    source: google.py
```
* The sub-level structure of `credentials_for_provider` needs to meet the requirements of [General Specifications](/en/develop-plugin/features-and-specs/plugin-types/general-specifications).
* You need to specify which tools the provider includes. This example only includes one `tools/google_search.yaml` file.
* As a provider, in addition to defining its basic information, you also need to implement some of its code logic, so you must specify its implementation file. In this example the provider's code lives in `google.py`, but we will leave it unimplemented for now and write the code for `google_search` first.
#### 3. Filling in the Tool YAML File
A tool plugin can have multiple tool functions, and each tool function needs a `yaml` file for description, including basic information about the tool function, parameters, output, etc.
Still using the `GoogleSearch` tool as an example, create a new `google_search.yaml` file in the `/tools` folder.
```yaml theme={null}
identity:
  name: google_search
  author: Dify
  label:
    en_US: GoogleSearch
    zh_Hans: Google Search
    pt_BR: GoogleSearch
description:
  human:
    en_US: A tool for performing a Google SERP search and extracting snippets and webpages. Input should be a search query.
    zh_Hans: A tool for performing a Google SERP search and extracting snippets and webpages. Input should be a search query.
    pt_BR: A tool for performing a Google SERP search and extracting snippets and webpages. Input should be a search query.
  llm: A tool for performing a Google SERP search and extracting snippets and webpages. Input should be a search query.
parameters:
  - name: query
    type: string
    required: true
    label:
      en_US: Query string
      zh_Hans: Query string
      pt_BR: Query string
    human_description:
      en_US: used for searching
      zh_Hans: used for searching web content
      pt_BR: used for searching
    llm_description: key words for searching
    form: llm
extra:
  python:
    source: tools/google_search.py
```
* `identity` contains basic information about the tool, including name, author, label, description, etc.
* `parameters` parameter list
  * `name` (required) parameter name; must be unique within the tool.
  * `type` (required) parameter type. Five types are currently supported: `string`, `number`, `boolean`, `select`, and `secret-input`, corresponding to string, number, boolean, dropdown, and encrypted input box. Use `secret-input` for sensitive information.
  * `label` (required) parameter label, for frontend display.
  * `form` (required) form type. Two types are currently supported: `llm` and `form`.
    * In Agent applications, `llm` means the parameter is inferred by the LLM itself, while `form` means the parameter can be preset before using the tool.
    * In Workflow applications, both `llm` and `form` parameters need to be filled in on the frontend, but `llm` parameters are used as input variables for the tool node.
  * `required` whether the parameter is required.
    * In `llm` mode, if the parameter is required, the Agent must infer it.
    * In `form` mode, if the parameter is required, the user must fill it in on the frontend before the conversation begins.
  * `options` parameter options.
    * In `llm` mode, Dify passes all options to the LLM, which can infer a value from them.
    * In `form` mode, when `type` is `select`, the frontend displays these options.
  * `default` default value.
  * `min` minimum value; can be set when the parameter type is `number`.
  * `max` maximum value; can be set when the parameter type is `number`.
  * `human_description` description for frontend display; supports multiple languages.
  * `placeholder` prompt text for the input field; can be set when the form type is `form` and the parameter type is `string`, `number`, or `secret-input`; supports multiple languages.
  * `llm_description` description passed to the LLM. Write as much detail about the parameter as possible here so the LLM can understand it.
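To illustrate several of these fields together, a preset dropdown parameter might be declared like this (a hypothetical example, not part of the Google tool):

```yaml theme={null}
- name: result_count
  type: select
  required: false
  default: "10"
  form: form            # preset by the app builder rather than inferred by the LLM
  label:
    en_US: Result count
  human_description:
    en_US: How many search results to return
  options:
    - value: "10"
      label:
        en_US: "10"
    - value: "20"
      label:
        en_US: "20"
```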
#### 4. Preparing Tool Code
After filling in the configuration information for the tool, you can start writing the code for the tool's functionality, implementing the logical purpose of the tool. Create `google_search.py` in the `/tools` directory with the following content:
```python theme={null}
from collections.abc import Generator
from typing import Any

import requests

from dify_plugin import Tool
from dify_plugin.entities.tool import ToolInvokeMessage

SERP_API_URL = "https://serpapi.com/search"


class GoogleSearchTool(Tool):
    def _parse_response(self, response: dict) -> dict:
        result = {}
        if "knowledge_graph" in response:
            result["title"] = response["knowledge_graph"].get("title", "")
            result["description"] = response["knowledge_graph"].get("description", "")
        if "organic_results" in response:
            result["organic_results"] = [
                {
                    "title": item.get("title", ""),
                    "link": item.get("link", ""),
                    "snippet": item.get("snippet", ""),
                }
                for item in response["organic_results"]
            ]
        return result

    def _invoke(self, tool_parameters: dict[str, Any]) -> Generator[ToolInvokeMessage]:
        params = {
            "api_key": self.runtime.credentials["serpapi_api_key"],
            "q": tool_parameters["query"],
            "engine": "google",
            "google_domain": "google.com",
            "gl": "us",
            "hl": "en",
        }
        response = requests.get(url=SERP_API_URL, params=params, timeout=5)
        response.raise_for_status()
        valuable_res = self._parse_response(response.json())
        yield self.create_json_message(valuable_res)
```
This example sends a request to `serpapi` and uses `self.create_json_message` to return formatted JSON data. To learn more about return data types, refer to the [Remote Debugging Plugins](/en/develop-plugin/features-and-specs/plugin-types/remote-debug-a-plugin) and [Persistent Storage KV](/en/develop-plugin/features-and-specs/plugin-types/persistent-storage-kv) documents.
#### 5. Completing the Tool Provider Code
Finally, create the implementation code for the provider to handle credential validation. If validation fails, a `ToolProviderCredentialValidationError` exception is raised; once validation succeeds, the `google_search` tool can be requested correctly.
Create a `google.py` file in the `/provider` directory with the following content:
```python theme={null}
from typing import Any

from dify_plugin import ToolProvider
from dify_plugin.errors.tool import ToolProviderCredentialValidationError

from tools.google_search import GoogleSearchTool


class GoogleProvider(ToolProvider):
    def _validate_credentials(self, credentials: dict[str, Any]) -> None:
        try:
            for _ in GoogleSearchTool.from_credentials(credentials).invoke(
                tool_parameters={"query": "test", "result_type": "link"},
            ):
                pass
        except Exception as e:
            raise ToolProviderCredentialValidationError(str(e))
```
### Debugging the Plugin
After completing plugin development, you need to test whether the plugin can function properly. Dify provides a convenient remote debugging method to help you quickly verify the plugin's functionality in a test environment.
Go to the ["Plugin Management"](https://cloud.dify.ai/plugins) page to obtain the remote server address and debugging Key.

Return to the plugin project, copy the `.env.example` file and rename it to `.env`, then fill in the remote server address and debugging Key information you obtained.
`.env` file:
```bash theme={null}
INSTALL_METHOD=remote
REMOTE_INSTALL_URL=debug.dify.ai:5003
REMOTE_INSTALL_KEY=********-****-****-****-************
```
Run the `python -m main` command to start the plugin. On the plugins page, you can see that the plugin has been installed in the Workspace, and other members of the team can also access the plugin.

### Packaging the Plugin (Optional)
After confirming that the plugin runs normally, you can package and name it with the following command. Afterwards you will find a `google.difypkg` file in the current folder, which is the final plugin package.
```bash theme={null}
# Replace ./google with the actual path of the plugin project
dify plugin package ./google
```
Congratulations, you have completed the entire process of developing, debugging, and packaging a tool-type plugin!
### Publishing the Plugin (Optional)
If you want to publish the plugin to the Dify Marketplace, please ensure that your plugin follows the specifications in [Publish to Dify Marketplace](/en/develop-plugin/publishing/marketplace-listing/release-to-dify-marketplace). After passing the review, the code will be merged into the main branch and automatically launched to the [Dify Marketplace](https://marketplace.dify.ai/).
[Publishing Overview](/en/develop-plugin/publishing/marketplace-listing/release-overview)
### Explore More
#### **Quick Start:**
* [Developing Extension Plugins](/en/develop-plugin/dev-guides-and-walkthroughs/endpoint)
* [Developing Model Plugins](/en/develop-plugin/dev-guides-and-walkthroughs/creating-new-model-provider)
* [Bundle Plugins: Packaging Multiple Plugins](/en/develop-plugin/features-and-specs/advanced-development/bundle)
#### **Plugin Interface Documentation:**
* [General Specifications](/en/develop-plugin/features-and-specs/plugin-types/general-specifications) - Manifest Structure and Tool Specifications
* [Endpoint](/en/develop-plugin/dev-guides-and-walkthroughs/endpoint) - Detailed Endpoint Definition
* [Reverse Invocation](/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation) - Reverse Invocation of Dify Capabilities
* [Model Schema](/en/develop-plugin/features-and-specs/plugin-types/model-schema) - Models
* [Agent Plugins](/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation) - Extending Agent Strategies
## Next Learning Steps
* [Remote Debugging Plugins](/en/develop-plugin/features-and-specs/plugin-types/remote-debug-a-plugin) - Learn more advanced debugging techniques
* [Persistent Storage](/en/develop-plugin/features-and-specs/plugin-types/persistent-storage-kv) - Learn how to use data storage in plugins
* [Slack Bot Plugin Development Example](/en/develop-plugin/dev-guides-and-walkthroughs/develop-a-slack-bot-plugin) - View a more complex plugin development case
* [Tool Plugin](/en/develop-plugin/features-and-specs/plugin-types/tool) - Explore advanced features of tool plugins
***
[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/dev-guides-and-walkthroughs/tool-plugin.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)
# Trigger Plugin
Source: https://docs.dify.ai/en/develop-plugin/dev-guides-and-walkthroughs/trigger-plugin
## What Is a Trigger Plugin?
Triggers were introduced in Dify v1.10.0 as a new type of start node. Unlike functional nodes such as Code, Tool, or Knowledge Retrieval, the purpose of a trigger is to **convert third-party events into an input format that Dify can recognize and process**.
For example, if you configure Dify as the `new email` event receiver in Gmail, every time you receive a new email, Gmail automatically sends an event to Dify that can be used to trigger a workflow. However:
* Gmail's original event format is not compatible with Dify's input format.
* There are thousands of platforms worldwide, each with its own unique event format.
Therefore, we need trigger plugins to define and parse these events from different platforms and in various formats, and to unify them into an input format that Dify can accept.
## Technical Overview
Dify triggers are implemented based on webhooks, a widely adopted mechanism across the web. Many mainstream SaaS platforms (like GitHub, Slack, and Linear) support webhooks with comprehensive developer documentation.
A webhook can be understood as an HTTP-based event dispatcher. **Once an event-receiving address is configured, these SaaS platforms automatically push event data to the target server whenever a subscribed event occurs.**
To handle webhook events from different platforms in a unified way, Dify defines two core concepts: **Subscription** and **Event**.
* **Subscription**: Webhook-based event dispatch requires **registering Dify's network address on a third-party platform's developer console as the target server. In Dify, this configuration process is called a *Subscription*.**
* **Event**: A platform may send multiple types of events—such as *email received*, *email deleted*, or *email marked as read*—all of which are pushed to the registered address. A trigger plugin can handle multiple event types, with each event corresponding to a plugin trigger node in a Dify workflow.
## Plugin Development
The development process for a trigger plugin is consistent with that of other plugin types (Tool, Data Source, Model, etc.).
You can create a development template using the `dify plugin init` command. The generated file structure follows the standard plugin format specification.
```
├── _assets
│   └── icon.svg
├── events
│   └── star
│       ├── star_created.py
│       └── star_created.yaml
├── main.py
├── manifest.yaml
├── provider
│   ├── github.py
│   └── github.yaml
├── README.md
├── PRIVACY.md
└── requirements.txt
```
* `manifest.yaml`: Describes the plugin's basic metadata.
* `provider` directory: Contains the provider's metadata, the code for creating subscriptions, and the code for classifying events after receiving webhook requests.
* **`events` directory: Contains the code for event handling and filtering, which supports local event filtering at the node level. You can create subdirectories to group related events.**
For trigger plugins, the minimum required Dify version must be set to `1.10.0`, and the SDK version must be `>= 0.6.0`.
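Concretely, the version pin might look like this (an illustrative `manifest.yaml` excerpt; keep your plugin's other `meta` fields):

```yaml theme={null}
meta:
  minimum_dify_version: 1.10.0
```

with `dify_plugin>=0.6.0` pinned in `requirements.txt`.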
Next, we'll use GitHub as an example to illustrate the development process of a trigger plugin.
### Subscription Creation
Webhook configuration methods vary significantly across mainstream SaaS platforms:
* Some platforms (such as GitHub) support API-based webhook configuration. For these platforms, once OAuth authentication is completed, Dify can automatically set up the webhook.
* Other platforms (such as Notion) do not provide a webhook configuration API and may require users to perform manual authentication.
To accommodate these differences, we divide the subscription process into two parts: the **Subscription Constructor** and the **Subscription** itself.
For platforms like Notion, creating a subscription requires the user to manually copy the callback URL provided by Dify and paste it into their Notion workspace to complete the webhook setup. This process corresponds to the **Paste URL to create a new subscription** option in the Dify interface.
To implement subscription creation via manual URL pasting, you need to modify two files: `github.yaml` and `github.py`.
Since GitHub webhooks use an encryption mechanism, a secret key is required to decrypt and validate incoming requests. Therefore, you need to declare `webhook_secret` in `github.yaml`.
```yaml theme={null}
subscription_schema:
  - name: "webhook_secret"
    type: "secret-input"
    required: false
    label:
      zh_Hans: "Webhook Secret"
      en_US: "Webhook Secret"
      ja_JP: "Webhookシークレット"
    help:
      en_US: "Optional webhook secret for validating GitHub webhook requests"
      ja_JP: "GitHub Webhookリクエストの検証用のオプションのWebhookシークレット"
      zh_Hans: "可选的用于验证 GitHub webhook 请求的 webhook 密钥"
```
First, we need to implement the `_dispatch_event` interface. All requests sent to the callback URL are processed by this interface, and the processed events are displayed in the **Request Logs** section for debugging and verification.
In the code, you can retrieve the `webhook_secret` declared in `github.yaml` via `subscription.properties`.
The `dispatch_event` method needs to determine the event type based on the request content. In the example below, this event extraction is handled by the `_dispatch_trigger_event` method.
For the complete code sample, see [Dify's GitHub trigger plugin](https://github.com/langgenius/dify-plugin-sdks/tree/main/python/examples/github_trigger).
```python theme={null}
class GithubTrigger(Trigger):
    """Handle GitHub webhook event dispatch."""

    def _dispatch_event(self, subscription: Subscription, request: Request) -> EventDispatch:
        webhook_secret = subscription.properties.get("webhook_secret")
        if webhook_secret:
            self._validate_signature(request=request, webhook_secret=webhook_secret)
        event_type: str | None = request.headers.get("X-GitHub-Event")
        if not event_type:
            raise TriggerDispatchError("Missing GitHub event type header")
        payload: Mapping[str, Any] = self._validate_payload(request)
        response = Response(response='{"status": "ok"}', status=200, mimetype="application/json")
        event: str = self._dispatch_trigger_event(event_type=event_type, payload=payload)
        return EventDispatch(events=[event] if event else [], response=response)
```
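The `_dispatch_trigger_event` helper called above is plugin-specific: it maps GitHub's event header to the event names declared in the plugin's `events` directory. A minimal sketch (the mapping and the star-filtering rule are illustrative, not the real plugin's logic):

```python theme={null}
# Illustrative mapping from X-GitHub-Event header values to the event
# names this plugin declares; not taken from the real GitHub plugin.
EVENT_MAP = {
    "issues": "issues",
    "star": "star_created",
    "pull_request": "pull_request",
}

def dispatch_trigger_event(event_type: str, payload: dict) -> str:
    """Return the plugin event name for a webhook, or "" to ignore it."""
    # GitHub "star" webhooks carry created/deleted in the action field
    if event_type == "star" and payload.get("action") != "created":
        return ""
    return EVENT_MAP.get(event_type, "")
```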
### Event Handling
Once an event is extracted, the corresponding implementation must filter the original HTTP request and transform it into an input format that Dify workflows can accept.
Taking the Issue event as an example, you can define the event and its implementation through `events/issues/issues.yaml` and `events/issues/issues.py`, respectively. The event's output can be defined in the `output_schema` section of `issues.yaml`, which follows the same JSON Schema specification as tool plugins.
```yaml theme={null}
identity:
  name: issues
  author: langgenius
  label:
    en_US: Issues
    zh_Hans: 议题
    ja_JP: イシュー
description:
  en_US: Unified issues event with actions filter
  zh_Hans: 带 actions 过滤的统一 issues 事件
  ja_JP: アクションフィルタ付きの統合イシューイベント
output_schema:
  type: object
  properties:
    action:
      type: string
    issue:
      type: object
      description: The issue itself
extra:
  python:
    source: events/issues/issues.py
```
```python theme={null}
from collections.abc import Mapping
from typing import Any

from werkzeug import Request

from dify_plugin.entities.trigger import Variables
from dify_plugin.errors.trigger import EventIgnoreError
from dify_plugin.interfaces.trigger import Event


class IssuesUnifiedEvent(Event):
    """Unified Issues event. Filters by actions and common issue attributes."""

    def _on_event(self, request: Request, parameters: Mapping[str, Any], payload: Mapping[str, Any]) -> Variables:
        payload = request.get_json()
        if not payload:
            raise ValueError("No payload received")
        allowed_actions = parameters.get("actions") or []
        action = payload.get("action")
        if allowed_actions and action not in allowed_actions:
            raise EventIgnoreError()
        issue = payload.get("issue")
        if not isinstance(issue, Mapping):
            raise ValueError("No issue in payload")
        return Variables(variables={**payload})
```
### Event Filtering
To filter out certain events—for example, to focus only on Issue events with a specific label—add `parameters` to the event definition in `issues.yaml`. Then, in the `_on_event` method, raise an `EventIgnoreError` to drop events that do not meet the configured criteria.
```yaml theme={null}
parameters:
  - name: added_label
    label:
      en_US: Added Label
      zh_Hans: 添加的标签
      ja_JP: 追加されたラベル
    type: string
    required: false
    description:
      en_US: "Only trigger if these specific labels were added (e.g., critical, priority-high, security, comma-separated). Leave empty to trigger for any label addition."
      zh_Hans: "仅当添加了这些特定标签时触发(例如:critical, priority-high, security,逗号分隔)。留空则对任何标签添加触发。"
      ja_JP: "これらの特定のラベルが追加された場合のみトリガー(例: critical, priority-high, security,カンマ区切り)。空の場合は任意のラベル追加でトリガー。"
```
```python theme={null}
def _check_added_label(self, payload: Mapping[str, Any], added_label_param: str | None) -> None:
    """Check if the added label matches the allowed labels"""
    if not added_label_param:
        return
    allowed_labels = [label.strip() for label in added_label_param.split(",") if label.strip()]
    if not allowed_labels:
        return
    # The payload contains the label that was added
    label = payload.get("label", {})
    label_name = label.get("name", "")
    if label_name not in allowed_labels:
        raise EventIgnoreError()

def _on_event(self, request: Request, parameters: Mapping[str, Any], payload: Mapping[str, Any]) -> Variables:
    # ...
    # Apply all filters
    self._check_added_label(payload, parameters.get("added_label"))
    return Variables(variables={**payload})
```
### Subscription Creation via OAuth or API Key
To enable automatic subscription creation via OAuth or API key, you need to modify the `github.yaml` and `github.py` files.
In `github.yaml`, add the following fields.
```yaml theme={null}
subscription_constructor:
  parameters:
    - name: "repository"
      label:
        en_US: "Repository"
        zh_Hans: "仓库"
        ja_JP: "リポジトリ"
      type: "dynamic-select"
      required: true
      placeholder:
        en_US: "owner/repo"
        zh_Hans: "owner/repo"
        ja_JP: "owner/repo"
      help:
        en_US: "GitHub repository in format owner/repo (e.g., microsoft/vscode)"
        zh_Hans: "GitHub 仓库,格式为 owner/repo(例如:microsoft/vscode)"
        ja_JP: "GitHubリポジトリは owner/repo 形式で入力してください(例: microsoft/vscode)"
  credentials_schema:
    access_tokens:
      help:
        en_US: Get your Access Tokens from GitHub
        ja_JP: GitHub からアクセストークンを取得してください
        zh_Hans: 从 GitHub 获取您的 Access Tokens
      label:
        en_US: Access Tokens
        ja_JP: アクセストークン
        zh_Hans: Access Tokens
      placeholder:
        en_US: Please input your GitHub Access Tokens
        ja_JP: GitHub のアクセストークンを入力してください
        zh_Hans: 请输入你的 GitHub Access Tokens
      required: true
      type: secret-input
      url: https://github.com/settings/tokens?type=beta
extra:
  python:
    source: provider/github.py
```
`subscription_constructor` is a concept abstracted by Dify to define how a subscription is constructed. It includes the following fields:
* `parameters` (optional): Defines the parameters required to create a subscription, such as the event types to subscribe to or the target GitHub repository.
* `credentials_schema` (optional): Declares the required credentials for creating a subscription with an API key or access token, such as `access_tokens` for GitHub.
* `oauth_schema` (optional): Required for implementing subscription creation via OAuth. For details on how to define it, see [Add OAuth Support to Your Tool Plugin](/en/develop-plugin/dev-guides-and-walkthroughs/tool-oauth).
In `github.py`, create a `Constructor` class to implement the automatic subscription logic.
```python theme={null}
class GithubSubscriptionConstructor(TriggerSubscriptionConstructor):
    """Manage GitHub trigger subscriptions."""

    def _validate_api_key(self, credentials: Mapping[str, Any]) -> None:
        ...  # validation logic omitted

    def _create_subscription(
        self,
        endpoint: str,
        parameters: Mapping[str, Any],
        credentials: Mapping[str, Any],
        credential_type: CredentialType,
    ) -> Subscription:
        repository = parameters.get("repository")
        if not repository:
            raise ValueError("repository is required (format: owner/repo)")
        try:
            owner, repo = repository.split("/")
        except ValueError:
            raise ValueError("repository must be in format 'owner/repo'") from None
        events: list[str] = parameters.get("events", [])
        webhook_secret = uuid.uuid4().hex
        url = f"https://api.github.com/repos/{owner}/{repo}/hooks"
        headers = {
            "Authorization": f"Bearer {credentials.get('access_tokens')}",
            "Accept": "application/vnd.github+json",
        }
        webhook_data = {
            "name": "web",
            "active": True,
            "events": events,
            "config": {"url": endpoint, "content_type": "json", "insecure_ssl": "0", "secret": webhook_secret},
        }
        try:
            response = requests.post(url, json=webhook_data, headers=headers, timeout=10)
        except requests.RequestException as exc:
            raise SubscriptionError(f"Network error while creating webhook: {exc}", error_code="NETWORK_ERROR") from exc
        if response.status_code == 201:
            webhook = response.json()
            return Subscription(
                expires_at=int(time.time()) + self._WEBHOOK_TTL,
                endpoint=endpoint,
                parameters=parameters,
                properties={
                    "external_id": str(webhook["id"]),
                    "repository": repository,
                    "events": events,
                    "webhook_secret": webhook_secret,
                    "active": webhook.get("active", True),
                },
            )
        response_data: dict[str, Any] = response.json() if response.content else {}
        error_msg = response_data.get("message", "Unknown error")
        error_details = response_data.get("errors", [])
        detailed_error = f"Failed to create GitHub webhook: {error_msg}"
        if error_details:
            detailed_error += f" Details: {error_details}"
        raise SubscriptionError(
            detailed_error,
            error_code="WEBHOOK_CREATION_FAILED",
            external_response=response_data,
        )
```
***
Once you have modified these two files, you'll see the **Create with API Key** option in the Dify interface.
Automatic subscription creation via OAuth can also be implemented in the same `Constructor` class: by adding an `oauth_schema` field under `subscription_constructor`, you can enable OAuth authentication.
## Explore More
The interface definitions of the core classes used in trigger plugin development are listed below.
### Trigger
```python theme={null}
class Trigger(ABC):
    @abstractmethod
    def _dispatch_event(self, subscription: Subscription, request: Request) -> EventDispatch:
        """
        Internal method to implement event dispatch logic.

        Subclasses must override this method to handle incoming webhook events.

        Implementation checklist:
        1. Validate the webhook request:
           - Check the signature/HMAC using the properties stored in
             subscription.properties when the subscription was created
           - Verify the request comes from the expected source
        2. Extract event information:
           - Parse the event type from headers or body
           - Extract relevant payload data
        3. Return EventDispatch with:
           - events: List of Event names to invoke (can be single or multiple)
           - response: Appropriate HTTP response for the webhook

        Args:
            subscription: The Subscription object with endpoint and properties fields
            request: Incoming webhook HTTP request

        Returns:
            EventDispatch: Event dispatch routing information

        Raises:
            TriggerValidationError: For security validation failures
            TriggerDispatchError: For parsing or routing errors
        """
        raise NotImplementedError("This plugin should implement `_dispatch_event` method to enable event dispatch")
```
### TriggerSubscriptionConstructor
```python theme={null}
class TriggerSubscriptionConstructor(ABC, OAuthProviderProtocol):
    # OPTIONAL
    def _validate_api_key(self, credentials: Mapping[str, Any]) -> None:
        raise NotImplementedError(
            "This plugin should implement `_validate_api_key` method to enable credentials validation"
        )

    # OPTIONAL
    def _oauth_get_authorization_url(self, redirect_uri: str, system_credentials: Mapping[str, Any]) -> str:
        raise NotImplementedError(
            "The trigger you are using does not support OAuth, please implement `_oauth_get_authorization_url` method"
        )

    # OPTIONAL
    def _oauth_get_credentials(
        self, redirect_uri: str, system_credentials: Mapping[str, Any], request: Request
    ) -> TriggerOAuthCredentials:
        raise NotImplementedError(
            "The trigger you are using does not support OAuth, please implement `_oauth_get_credentials` method"
        )

    # OPTIONAL
    def _oauth_refresh_credentials(
        self, redirect_uri: str, system_credentials: Mapping[str, Any], credentials: Mapping[str, Any]
    ) -> OAuthCredentials:
        raise NotImplementedError(
            "The trigger you are using does not support OAuth, please implement `_oauth_refresh_credentials` method"
        )

    @abstractmethod
    def _create_subscription(
        self,
        endpoint: str,
        parameters: Mapping[str, Any],
        credentials: Mapping[str, Any],
        credential_type: CredentialType,
    ) -> Subscription:
        """
        Internal method to implement subscription logic.

        Subclasses must override this method to handle subscription creation.

        Implementation checklist:
        1. Use the endpoint parameter provided by Dify
        2. Register the webhook with the external service using its API
        3. Store all necessary information in Subscription.properties for future
           operations (e.g., dispatch_event)
        4. Return Subscription with:
           - expires_at: Set an appropriate expiration time
           - endpoint: The webhook endpoint URL allocated by Dify for receiving events,
             the same as the endpoint parameter
           - parameters: The parameters of the subscription
           - properties: All configuration and external IDs

        Args:
            endpoint: The webhook endpoint URL allocated by Dify for receiving events
            parameters: Subscription creation parameters
            credentials: Authentication credentials
            credential_type: The type of the credentials, e.g., "api-key", "oauth2", "unauthorized"

        Returns:
            Subscription: Subscription details with metadata for future operations

        Raises:
            SubscriptionError: For operational failures (API errors, invalid credentials)
            ValueError: For programming errors (missing required params)
        """
        raise NotImplementedError(
            "This plugin should implement `_create_subscription` method to enable event subscription"
        )

    @abstractmethod
    def _delete_subscription(
        self, subscription: Subscription, credentials: Mapping[str, Any], credential_type: CredentialType
    ) -> UnsubscribeResult:
        """
        Internal method to implement unsubscription logic.

        Subclasses must override this method to handle subscription removal.

        Implementation guidelines:
        1. Extract the necessary IDs from subscription.properties (e.g., external_id)
        2. Use credentials and credential_type to call the external service API to delete the webhook
        3. Handle common errors (not found, unauthorized, etc.)
        4. Always return UnsubscribeResult with a detailed status
        5. Never raise exceptions for operational failures - use UnsubscribeResult.success=False

        Args:
            subscription: The Subscription object with endpoint and properties fields

        Returns:
            UnsubscribeResult: Always returns a result, never raises for operational failures
        """
        raise NotImplementedError(
            "This plugin should implement `_delete_subscription` method to enable event unsubscription"
        )

    @abstractmethod
    def _refresh_subscription(
        self, subscription: Subscription, credentials: Mapping[str, Any], credential_type: CredentialType
    ) -> Subscription:
        """
        Internal method to implement subscription refresh logic.

        Subclasses must override this method to handle simple expiration extension.

        Implementation patterns:
        1. For webhooks without expiration (e.g., GitHub):
           - Set Subscription.expires_at = -1; Dify will never call this method again
        2. For lease-based subscriptions (e.g., Microsoft Graph):
           - Use the information in Subscription.properties to call the service's
             lease renewal API, if available
           - Handle renewal limits (some services limit the renewal count)
           - Update Subscription.properties and Subscription.expires_at for the next
             renewal, if needed

        Args:
            subscription: Current subscription with properties
            credential_type: The type of the credentials, e.g., "api-key", "oauth2", "unauthorized"
            credentials: Current authentication credentials from credentials_schema.
                For API key auth, according to `credentials_schema` defined in the YAML.
                For OAuth auth, according to `oauth_schema.credentials_schema` defined in the YAML.
                For unauthorized auth, there are no credentials.

        Returns:
            Subscription: The same subscription with an extended expiration,
                or new properties and expires_at for the next renewal

        Raises:
            SubscriptionError: For operational failures (API errors, invalid credentials)
        """
        raise NotImplementedError("This plugin should implement `_refresh` method to enable subscription refresh")

    # OPTIONAL
    def _fetch_parameter_options(
        self, parameter: str, credentials: Mapping[str, Any], credential_type: CredentialType
    ) -> list[ParameterOption]:
        """
        Fetch the parameter options of the trigger.

        Implementation guidelines:
        When you need to fetch parameter options from an external service, use the credentials
        and credential_type to call the external service API, then return the options to Dify
        for user selection.

        Args:
            parameter: The parameter name for which to fetch options
            credentials: Authentication credentials for the external service
            credential_type: The type of credentials (e.g., "api-key", "oauth2", "unauthorized")

        Returns:
            list[ParameterOption]: A list of available options for the parameter

        Examples:
            GitHub repositories:
            >>> result = provider.fetch_parameter_options(parameter="repository")
            >>> print(result)  # [ParameterOption(label="owner/repo", value="owner/repo")]

            Slack channels:
            >>> result = provider.fetch_parameter_options(parameter="channel")
            >>> print(result)
        """
```
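The `_delete_subscription` contract above pairs with the GitHub `_create_subscription` shown earlier: the hook ID stored in `properties["external_id"]` at creation time is what you need to delete the hook later. A minimal sketch of deriving the delete call from the stored properties (the helper name and the returned dict shape are illustrative, not part of the SDK):

```python
def build_delete_hook_request(properties: dict, access_token: str) -> dict:
    """Derive GitHub's 'delete webhook' call from stored subscription properties."""
    repository = properties["repository"]    # e.g. "owner/repo", saved at creation time
    external_id = properties["external_id"]  # hook ID returned by GitHub on creation
    return {
        "method": "DELETE",
        "url": f"https://api.github.com/repos/{repository}/hooks/{external_id}",
        "headers": {
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/vnd.github+json",
        },
    }
```

A real implementation would send this request, treat a 404 as "already deleted", and report the outcome through `UnsubscribeResult` rather than raising.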
### Event
```python theme={null}
class Event(ABC):
    @abstractmethod
    def _on_event(self, request: Request, parameters: Mapping[str, Any], payload: Mapping[str, Any]) -> Variables:
        """
        Transform the incoming webhook request into structured Variables.

        This method should:
        1. Parse the webhook payload from the request
        2. Apply filtering logic based on parameters
        3. Extract relevant data matching the output_schema
        4. Return a structured Variables object

        Args:
            request: The incoming webhook HTTP request containing the raw payload.
                Use request.get_json() to parse the JSON body.
            parameters: User-configured parameters for filtering and transformation
                (e.g., label filters, regex patterns, threshold values).
                These come from the subscription configuration.
            payload: The decoded payload from the previous step, `Trigger._dispatch_event`.
                It is passed into the `_on_event` method.

        Returns:
            Variables: Structured variables matching the output_schema
                defined in the event's YAML configuration.

        Raises:
            EventIgnoreError: When the event should be filtered out based on parameters
            ValueError: When the payload is invalid or missing required fields

        Example:
            >>> def _on_event(self, request, parameters, payload):
            ...     payload = request.get_json()
            ...
            ...     # Apply filters
            ...     if not self._matches_filters(payload, parameters):
            ...         raise EventIgnoreError()
            ...
            ...     # Transform data
            ...     return Variables(variables={
            ...         "title": payload["issue"]["title"],
            ...         "author": payload["issue"]["user"]["login"],
            ...         "url": payload["issue"]["html_url"],
            ...     })
        """

    def _fetch_parameter_options(self, parameter: str) -> list[ParameterOption]:
        """
        Fetch the parameter options of the trigger.

        To be implemented by subclasses. It is optional to implement, which is
        why it is not an abstract method.
        """
        raise NotImplementedError(
            "This plugin should implement `_fetch_parameter_options` method to enable dynamic select parameter"
        )
```
***
[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/dev-guides-and-walkthroughs/trigger-plugin.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)
# Bundle Plugin Package
Source: https://docs.dify.ai/en/develop-plugin/features-and-specs/advanced-development/bundle
This document introduces Bundle plugin packages and how to develop them. A Bundle aggregates multiple plugins and supports three dependency types (Marketplace, GitHub, and Package). It walks through creating a Bundle project, adding each type of dependency, and packaging the project.
A Bundle plugin package is a collection of multiple plugins. It allows packaging several plugins within a single plugin, enabling batch installation and providing more powerful services.
You can use the Dify CLI tool to package multiple plugins into a Bundle. Bundle plugin packages come in three types:
* `Marketplace` type. Stores the plugin's ID and version information. During import, the specific plugin package will be downloaded from the Dify Marketplace.
* `GitHub` type. Stores the GitHub repository address, release version number, and asset filename. During import, Dify will access the corresponding GitHub repository to download the plugin package.
* `Package` type. The plugin package is stored directly within the Bundle. It does not store reference sources, but this might lead to a larger Bundle package size.
### Prerequisites
* Dify plugin scaffolding tool
* Python environment (version 3.12)
For detailed instructions on how to prepare the plugin development scaffolding tool, please refer to [Initialize Development Tools](/en/develop-plugin/getting-started/cli).
### Create a Bundle Project
In the current directory, run the scaffolding command-line tool to create a new plugin package project.
```bash theme={null}
./dify-plugin-darwin-arm64 bundle init
```
If you have renamed the binary file to `dify` and copied it to the `/usr/local/bin` path, you can run the following command to create a new plugin project:
```bash theme={null}
dify bundle init
```
#### 1. Fill in Plugin Information
Follow the prompts to configure the plugin name, author information, and plugin description. If you are collaborating as a team, you can also enter the organization name as the author.
> The name must be 1-128 characters long and can only contain letters, numbers, hyphens, and underscores.

After filling in the information and pressing Enter, the Bundle plugin project directory will be automatically created.

#### 2. Add Dependencies
* **Marketplace**
Execute the following command:
```bash theme={null}
dify-plugin bundle append marketplace . --marketplace_pattern=langgenius/openai:0.0.1
```
Where `marketplace_pattern` is the reference to the plugin in the marketplace, in the format `organization_name/plugin_name:version_number`.
* **GitHub**
Execute the following command:
```bash theme={null}
dify-plugin bundle append github . --repo_pattern=langgenius/openai:0.0.1/openai.difypkg
```
Where `repo_pattern` is the reference to the plugin on GitHub, in the format `organization_name/repository_name:release/asset_name`.
* **Package**
Execute the following command:
```bash theme={null}
dify-plugin bundle append package . --package_path=./openai.difypkg
```
Where `package_path` is the path to the plugin package file.
### Package the Bundle Project
Run the following command to package the Bundle plugin:
```bash theme={null}
dify-plugin bundle package ./bundle
```
After executing the command, a `bundle.difybndl` file will be automatically created in the current directory. This file is the final packaged result.
***
# Integrating Custom Models
Source: https://docs.dify.ai/en/develop-plugin/features-and-specs/advanced-development/customizable-model
This document details how to integrate custom models into Dify, using the Xinference model as an example. It covers the complete process, including creating model provider files, writing code based on model type, implementing model invocation logic, handling exceptions, debugging, and publishing. It specifically details the implementation of core methods like LLM invocation, token calculation, credential validation, and parameter generation.
A **custom model** refers to an LLM that you deploy or configure on your own. This document uses the [Xinference model](https://inference.readthedocs.io/en/latest/) as an example to demonstrate how to integrate a custom model into your **model plugin**.
By default, a custom model automatically includes two parameters—its **model type** and **model name**—and does not require additional definitions in the provider YAML file.
You do not need to implement `validate_provider_credential` in your provider configuration file. During runtime, based on the user’s choice of model type or model name, Dify automatically calls the corresponding model layer’s `validate_credentials` method to verify credentials.
## Integrating a Custom Model Plugin
Below are the steps to integrate a custom model:
1. **Create a Model Provider File**\
Identify the model types your custom model will include.
2. **Create Code Files by Model Type**\
Depending on the model’s type (e.g., `llm` or `text_embedding`), create separate code files. Ensure that each model type is organized into distinct logical layers for easier maintenance and future expansion.
3. **Develop the Model Invocation Logic**\
Within each model-type module, create a Python file named for that model type (for example, `llm.py`). Define a class in the file that implements the specific model logic, conforming to the system’s model interface specifications.
4. **Debug the Plugin**\
Write unit and integration tests for the new provider functionality, ensuring that all components work as intended.
***
### 1. **Create a Model Provider File**
In your plugin’s `/provider` directory, create a `xinference.yaml` file.
The `Xinference` family of models supports **LLM**, **Text Embedding**, and **Rerank** model types, so your `xinference.yaml` must include all three.
**Example:**
```yaml theme={null}
provider: xinference # Identifies the provider
label: # Display name; can set both en_US (English) and zh_Hans (Chinese). If zh_Hans is not set, en_US is used by default.
  en_US: Xorbits Inference
icon_small: # Small icon; store in the _assets folder of this provider's directory. The same multi-language logic applies as with label.
  en_US: icon_s_en.svg
icon_large: # Large icon
  en_US: icon_l_en.svg
help: # Help information
  title:
    en_US: How to deploy Xinference
    zh_Hans: 如何部署 Xinference
  url:
    en_US: https://github.com/xorbitsai/inference
supported_model_types: # Model types Xinference supports: LLM/Text Embedding/Rerank
  - llm
  - text-embedding
  - rerank
configurate_methods: # Xinference is locally deployed and does not offer predefined models. Refer to its documentation to learn which model to use. Thus, we choose a customizable-model approach.
  - customizable-model
provider_credential_schema:
  credential_form_schemas:
```
Next, define the `provider_credential_schema`. Since `Xinference` supports text-generation, embeddings, and reranking models, you can configure it as follows:
```yaml theme={null}
provider_credential_schema:
  credential_form_schemas:
    - variable: model_type
      type: select
      label:
        en_US: Model type
        zh_Hans: 模型类型
      required: true
      options:
        - value: text-generation
          label:
            en_US: Language Model
            zh_Hans: 语言模型
        - value: embeddings
          label:
            en_US: Text Embedding
        - value: reranking
          label:
            en_US: Rerank
```
Every model in Xinference requires a `model_name`:
```yaml theme={null}
- variable: model_name
  type: text-input
  label:
    en_US: Model name
    zh_Hans: 模型名称
  required: true
  placeholder:
    zh_Hans: 填写模型名称
    en_US: Input model name
```
Because Xinference must be locally deployed, users need to supply the server address (`server_url`) and model UID. For instance:
```yaml theme={null}
- variable: server_url
  label:
    zh_Hans: 服务器 URL
    en_US: Server url
  type: text-input
  required: true
  placeholder:
    zh_Hans: 在此输入 Xinference 的服务器地址,如 https://example.com/xxx
    en_US: Enter the url of your Xinference, for example https://example.com/xxx
- variable: model_uid
  label:
    zh_Hans: 模型 UID
    en_US: Model uid
  type: text-input
  required: true
  placeholder:
    zh_Hans: 在此输入你的 Model UID
    en_US: Enter the model uid
```
Once you’ve defined these parameters, the YAML configuration for your custom model provider is complete. Next, create the functional code files for each model defined in this config.
### 2. Develop the Model Code
Since Xinference supports `llm`, `text-embedding`, and `rerank` models, create corresponding directories under `/models`, each containing its respective feature code.
Below is an example for an `llm`-type model. Create a file named `llm.py`, then define a class, such as `XinferenceAILargeLanguageModel`, that extends `__base.large_language_model.LargeLanguageModel`. This class should include:
* **LLM Invocation**
The core method for invoking the LLM, supporting both streaming and synchronous responses:
```python theme={null}
def _invoke(
    self,
    model: str,
    credentials: dict,
    prompt_messages: list[PromptMessage],
    model_parameters: dict,
    tools: Optional[list[PromptMessageTool]] = None,
    stop: Optional[list[str]] = None,
    stream: bool = True,
    user: Optional[str] = None,
) -> Union[LLMResult, Generator]:
    """
    Invoke the large language model.

    :param model: model name
    :param credentials: model credentials
    :param prompt_messages: prompt messages
    :param model_parameters: model parameters
    :param tools: tools for tool calling
    :param stop: stop words
    :param stream: determines if the response is streamed
    :param user: unique user id
    :return: full response or a chunk generator
    """
```
You’ll need two separate functions to handle streaming and synchronous responses. Python treats any function containing `yield` as a generator whose return type is `Generator`, so it’s best to split them:
```python theme={null}
def _invoke(self, stream: bool, **kwargs) -> Union[LLMResult, Generator]:
    if stream:
        return self._handle_stream_response(**kwargs)
    return self._handle_sync_response(**kwargs)

def _handle_stream_response(self, **kwargs) -> Generator:
    for chunk in response:
        yield chunk

def _handle_sync_response(self, **kwargs) -> LLMResult:
    return LLMResult(**response)
```
* **Pre-calculating Input Tokens**
If your model doesn’t provide a token-counting interface, simply return 0:
```python theme={null}
def get_num_tokens(
    self,
    model: str,
    credentials: dict,
    prompt_messages: list[PromptMessage],
    tools: Optional[list[PromptMessageTool]] = None,
) -> int:
    """
    Get the number of tokens for the given prompt messages.
    """
    return 0
```
Alternatively, you can call `self._get_num_tokens_by_gpt2(text: str)` from the `AIModel` base class, which uses a GPT-2 tokenizer. Remember this is an approximation and may not match your model exactly.
* **Validating Model Credentials**
Similar to provider-level credential checks, but scoped to a single model:
```python theme={null}
def validate_credentials(self, model: str, credentials: dict) -> None:
    """
    Validate model credentials.
    """
```
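For Xinference, validation typically means checking that `server_url` and `model_uid` are well-formed and that the server answers a minimal request. The sketch below shows only the shape-checking half, with a locally defined exception standing in for the SDK's validation error (both the checks and the stand-in class are illustrative assumptions, not the plugin's actual code):

```python
class CredentialsValidateFailedError(Exception):
    """Stand-in for the SDK's credential validation error."""


def validate_credentials(model: str, credentials: dict) -> None:
    """Fail fast on malformed credentials before probing the server."""
    server_url = (credentials.get("server_url") or "").rstrip("/")
    model_uid = credentials.get("model_uid")
    if not server_url.startswith(("http://", "https://")):
        raise CredentialsValidateFailedError("server_url must start with http:// or https://")
    if not model_uid:
        raise CredentialsValidateFailedError("model_uid is required")
    # A real implementation would now issue a minimal request (e.g., a one-token
    # completion) against server_url and re-raise any failure as a validation error.
```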
* **Dynamic Model Parameters Schema**
Unlike [predefined models](/en/develop-plugin/features-and-specs/plugin-types/model-schema), no YAML file defines which parameters a model supports, so you must generate the parameter schema dynamically.
For example, Xinference supports `max_tokens`, `temperature`, and `top_p`. Some other providers (e.g., `OpenLLM`) may support parameters like `top_k` only for certain models. This means you need to adapt your schema to each model’s capabilities:
```python theme={null}
def get_customizable_model_schema(self, model: str, credentials: dict) -> AIModelEntity | None:
    """
    Used to define the customizable model schema.
    """
    rules = [
        ParameterRule(
            name='temperature', type=ParameterType.FLOAT,
            use_template='temperature',
            label=I18nObject(
                zh_Hans='温度', en_US='Temperature'
            )
        ),
        ParameterRule(
            name='top_p', type=ParameterType.FLOAT,
            use_template='top_p',
            label=I18nObject(
                zh_Hans='Top P', en_US='Top P'
            )
        ),
        ParameterRule(
            name='max_tokens', type=ParameterType.INT,
            use_template='max_tokens',
            min=1,
            default=512,
            label=I18nObject(
                zh_Hans='最大生成长度', en_US='Max Tokens'
            )
        )
    ]

    # if the model is A, add top_k to the rules
    if model == 'A':
        rules.append(
            ParameterRule(
                name='top_k', type=ParameterType.INT,
                use_template='top_k',
                min=1,
                default=50,
                label=I18nObject(
                    zh_Hans='Top K', en_US='Top K'
                )
            )
        )

    # ... some unimportant code omitted ...

    entity = AIModelEntity(
        model=model,
        label=I18nObject(
            en_US=model
        ),
        fetch_from=FetchFrom.CUSTOMIZABLE_MODEL,
        model_type=model_type,
        model_properties={
            ModelPropertyKey.MODE: ModelType.LLM,
        },
        parameter_rules=rules
    )
    return entity
```
* **Error Mapping**
When an error occurs during model invocation, map it to the appropriate InvokeError type recognized by the runtime. This lets Dify handle different errors in a standardized manner:
Runtime errors:

* `InvokeConnectionError`
* `InvokeServerUnavailableError`
* `InvokeRateLimitError`
* `InvokeAuthorizationError`
* `InvokeBadRequestError`
```python theme={null}
@property
def _invoke_error_mapping(self) -> dict[type[InvokeError], list[type[Exception]]]:
    """
    Map model invocation errors to unified error types.

    The key is the error type thrown to the caller.
    The value is the error type thrown by the model, which needs to be mapped to a
    unified Dify error for consistent handling.
    """
    # return {
    #     InvokeConnectionError: [requests.exceptions.ConnectionError],
    #     ...
    # }
```
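As a concrete illustration of what this mapping buys you, here is a self-contained sketch of the lookup the runtime performs. The `InvokeError` subclasses below are stand-ins for the runtime's own classes, and the mapped exception types are examples only, not an exhaustive list:

```python
# Stand-ins for the runtime's unified error types, for illustration only.
class InvokeError(Exception): ...
class InvokeConnectionError(InvokeError): ...
class InvokeBadRequestError(InvokeError): ...

# Example mapping: raw exceptions on the right, unified types on the left.
ERROR_MAPPING: dict[type[InvokeError], list[type[Exception]]] = {
    InvokeConnectionError: [ConnectionError, TimeoutError],
    InvokeBadRequestError: [ValueError, KeyError],
}

def to_unified_error(exc: Exception) -> InvokeError:
    """Wrap a raw exception in the unified type it maps to; this lookup mirrors
    what the runtime does with _invoke_error_mapping."""
    for unified, raw_types in ERROR_MAPPING.items():
        if isinstance(exc, tuple(raw_types)):
            return unified(str(exc))
    return InvokeError(str(exc))
```

Anything not covered by the mapping falls back to the generic unified error, so callers always see a consistent exception hierarchy.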
For more details on interface methods, see the [Model Documentation](/en/develop-plugin/features-and-specs/plugin-types/model-schema).
To view the complete code files discussed in this guide, visit the [GitHub Repository](https://github.com/langgenius/dify-official-plugins/tree/main/models/xinference).
### 3. Debug the Plugin
After finishing development, test the plugin to ensure it runs correctly.
### 4. Publish the Plugin
If you’d like to list this plugin on the Dify Marketplace, see **Publish to Dify Marketplace**.
## Explore More
**Quick Start:**
* [Develop Extension Plugin](/en/develop-plugin/features-and-specs/plugin-types/general-specifications)
* [Develop Tool Plugin](/en/develop-plugin/dev-guides-and-walkthroughs/tool-plugin)
* [Bundle Plugins: Package Multiple Plugins](/en/develop-plugin/features-and-specs/advanced-development/bundle)
**Plugins Endpoint Docs:**
* [Manifest](/en/develop-plugin/features-and-specs/plugin-types/plugin-info-by-manifest) Structure
* [Endpoint](/en/develop-plugin/dev-guides-and-walkthroughs/endpoint) Definitions
* [Reverse-Invocation of the Dify Service](/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation)
* [Tools](/en/develop-plugin/features-and-specs/plugin-types/tool)
* [Models](/en/develop-plugin/features-and-specs/plugin-types/model-schema)
***
# Reverse Invocation of Dify Services
Source: https://docs.dify.ai/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation
This document briefly introduces the reverse invocation capability of Dify plugins, meaning plugins can call specified services within the main Dify platform. It lists four types of modules that can be invoked: App (access App data), Model (call model capabilities within the platform), Tool (call other tool plugins within the platform), and Node (call nodes within a Chatflow/Workflow application).
Plugins can freely call some services within the main Dify platform to enhance their capabilities.
### Callable Dify Modules
* [App](/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation-app)
Plugins can access data from Apps within the Dify platform.
* [Model](/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation-model)
Plugins can reverse invoke model capabilities within the Dify platform, covering all model types and functions, such as TTS, Rerank, etc.
* [Tool](/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation-tool)
Plugins can call other tool-type plugins within the Dify platform.
* [Node](/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation-node)
Plugins can call nodes within a specific Chatflow/Workflow application in the Dify platform.
## Related Resources
* [Develop Extension Plugins](/en/develop-plugin/dev-guides-and-walkthroughs/endpoint) - Learn how to develop plugins that integrate with external systems
* [Develop a Slack Bot Plugin](/en/develop-plugin/dev-guides-and-walkthroughs/develop-a-slack-bot-plugin) - An example of using reverse invocation to integrate with the Slack platform
* [Bundle Type Plugins](/en/develop-plugin/features-and-specs/advanced-development/bundle) - Learn how to package multiple plugins that use reverse invocation
* [Using Persistent Storage](/en/develop-plugin/features-and-specs/plugin-types/persistent-storage-kv) - Enhance plugin capabilities through KV storage
***
[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)
# App
Source: https://docs.dify.ai/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation-app
This document details how plugins can reverse invoke App services within the Dify platform. It covers three types of interfaces: the Chat interface (for Chatbot/Agent/Chatflow applications), the Workflow interface, and the Completion interface, providing entry points, invocation specifications, and practical code examples for each.
Reverse invoking an App means that a plugin can access data from an App within Dify. This module supports both streaming and non-streaming App calls. If you are unfamiliar with the basic concepts of reverse invocation, please first read [Reverse Invocation of Dify Services](/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation).
**Interface Types:**
* `Chatbot/Agent/Chatflow` applications are all chat-based and share the same input and output parameter types, so they are treated uniformly as the **Chat Interface**.
* Workflow applications use a separate **Workflow Interface**.
* Completion (text generation) applications use a separate **Completion Interface**.
Please note that plugins are only allowed to access Apps within the Workspace where the plugin resides.
### Calling the Chat Interface
#### **Entry Point**
```python theme={null}
self.session.app.chat
```
#### **Interface Specification**
```python theme={null}
def invoke(
    self,
    app_id: str,
    inputs: dict,
    response_mode: Literal["streaming", "blocking"],
    conversation_id: str,
    files: list,
) -> Generator[dict, None, None] | dict:
    pass
```
When `response_mode` is `streaming`, this interface will directly return `Generator[dict]`. Otherwise, it returns `dict`. For specific interface fields, please refer to the return results of `ServiceApi`.
#### **Use Case**
We can call a Chat type App within an `Endpoint` and return the result directly.
```python theme={null}
import json
from typing import Mapping

from werkzeug import Request, Response
from dify_plugin import Endpoint


class Duck(Endpoint):
    def _invoke(self, r: Request, values: Mapping, settings: Mapping) -> Response:
        """
        Invokes the endpoint with the given request.
        """
        app_id = values["app_id"]

        def generator():
            response = self.session.app.chat.invoke(
                app_id=app_id,
                inputs={},  # provide the App's input variables as needed
                response_mode="streaming",
                conversation_id="",  # pass an existing conversation ID to continue it
                files=[],
            )
            for data in response:
                yield f"{json.dumps(data)}\n\n"

        return Response(generator(), status=200, content_type="text/html")
```
### Calling the Workflow Interface
#### **Entry Point**
```python theme={null}
self.session.app.workflow
```
#### **Interface Specification**
```python theme={null}
def invoke(
    self,
    app_id: str,
    inputs: dict,
    response_mode: Literal["streaming", "blocking"],
    files: list,
) -> Generator[dict, None, None] | dict:
    pass
```
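No use case is given for the Workflow interface above, so here is a minimal hedged sketch of a blocking call. The helper name and the empty `files` list are illustrative, not part of the SDK; `session` stands for the plugin's `self.session`.

```python
def run_workflow(session, app_id: str, inputs: dict) -> dict:
    """Run a Workflow app in blocking mode and return its result.

    In blocking mode the call returns a plain dict rather than a
    generator, so the result can be used directly.
    """
    return session.app.workflow.invoke(
        app_id=app_id,
        inputs=inputs,
        response_mode="blocking",
        files=[],
    )
```

As with the Chat interface, pass `response_mode="streaming"` instead to receive a generator of chunks.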
### Calling the Completion Interface
#### **Entry Point**
```python theme={null}
self.session.app.completion
```
#### **Interface Specification**
```python theme={null}
def invoke(
    self,
    app_id: str,
    inputs: dict,
    response_mode: Literal["streaming", "blocking"],
    files: list,
) -> Generator[dict, None, None] | dict:
    pass
```
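A hedged sketch of consuming a Completion app in streaming mode. It assumes each streamed chunk is a dict that may carry an `answer` field, mirroring the Service API events; the helper name is illustrative and the payload shape should be checked against your Dify version.

```python
def stream_completion_text(session, app_id: str, inputs: dict):
    """Yield text pieces from a Completion app called in streaming mode.

    Assumes each chunk is a dict with an optional "answer" field;
    chunks without one (e.g. lifecycle events) are skipped.
    """
    for chunk in session.app.completion.invoke(
        app_id=app_id,
        inputs=inputs,
        response_mode="streaming",
        files=[],
    ):
        answer = chunk.get("answer")
        if answer:
            yield answer
```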
## Related Resources
* [Reverse Invocation of Dify Services](/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation) - Understand the fundamental concepts of reverse invocation
* [Reverse Invocation Model](/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation-model) - Learn how to call model capabilities within the platform
* [Reverse Invocation Tool](/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation-tool) - Learn how to call other plugins
* [Develop a Slack Bot Plugin](/en/develop-plugin/dev-guides-and-walkthroughs/develop-a-slack-bot-plugin) - A practical application case using reverse invocation
* [Develop Extension Plugins](/en/develop-plugin/dev-guides-and-walkthroughs/endpoint) - Learn how to develop extension plugins
***
[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation-app.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)
# Reverse Invocation Model
Source: https://docs.dify.ai/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation-model
This document details how plugins can reverse invoke model services within the Dify platform. It covers specific methods for reverse invoking LLM, Summary, TextEmbedding, Rerank, TTS, Speech2Text, and Moderation models. Each model invocation includes its entry point, interface parameter descriptions, practical usage code examples, and best practice recommendations for invoking models.
Reverse invoking a Model refers to the ability of a plugin to call model capabilities inside the Dify platform, covering all model types and functions, such as TTS, Rerank, etc. If you are not familiar with the basic concepts of reverse invocation, please read [Reverse Invocation of Dify Services](/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation) first.
However, please note that invoking a model requires passing a `ModelConfig` type parameter. Its structure can be referenced in the [General Specifications Definition](/en/develop-plugin/features-and-specs/plugin-types/general-specifications), and this structure will have slight differences for different types of models.
For example, for `LLM` type models, it also needs to include `completion_params` and `mode` parameters. You can manually construct this structure or use `model-selector` type parameters or configurations.
### Invoke LLM
#### **Entry Point**
```python theme={null}
self.session.model.llm
```
#### **Endpoint**
```python theme={null}
def invoke(
    self,
    model_config: LLMModelConfig,
    prompt_messages: list[PromptMessage],
    tools: list[PromptMessageTool] | None = None,
    stop: list[str] | None = None,
    stream: bool = True,
) -> Generator[LLMResultChunk, None, None] | LLMResult:
    pass
```
Please note that if the model you are invoking does not have `tool_call` capability, the `tools` passed here will not take effect.
#### **Use Case**
If you want to invoke OpenAI's `gpt-4o-mini` model within a `Tool`, please refer to the following example code:
```python theme={null}
from collections.abc import Generator
from typing import Any

from dify_plugin import Tool
from dify_plugin.entities.model.llm import LLMModelConfig
from dify_plugin.entities.tool import ToolInvokeMessage
from dify_plugin.entities.model.message import SystemPromptMessage, UserPromptMessage


class LLMTool(Tool):
    def _invoke(self, tool_parameters: dict[str, Any]) -> Generator[ToolInvokeMessage]:
        response = self.session.model.llm.invoke(
            model_config=LLMModelConfig(
                provider='openai',
                model='gpt-4o-mini',
                mode='chat',
                completion_params={}
            ),
            prompt_messages=[
                SystemPromptMessage(
                    content='you are a helpful assistant'
                ),
                UserPromptMessage(
                    content=tool_parameters.get('query')
                )
            ],
            stream=True
        )
        for chunk in response:
            if chunk.delta.message:
                assert isinstance(chunk.delta.message.content, str)
                yield self.create_text_message(text=chunk.delta.message.content)
```
Note that the user's `query` is read from `tool_parameters` and passed in as the user message content.
### **Best Practice**
It is not recommended to manually construct `LLMModelConfig`. Instead, allow users to select the model they want to use on the UI. In this case, you can modify the tool's parameter list by adding a `model` parameter as follows:
```yaml theme={null}
identity:
  name: llm
  author: Dify
  label:
    en_US: LLM
    zh_Hans: LLM
    pt_BR: LLM
description:
  human:
    en_US: A tool for invoking a large language model
    zh_Hans: 用于调用大型语言模型的工具
    pt_BR: A tool for invoking a large language model
  llm: A tool for invoking a large language model
parameters:
  - name: prompt
    type: string
    required: true
    label:
      en_US: Prompt string
      zh_Hans: 提示字符串
      pt_BR: Prompt string
    human_description:
      en_US: used for searching
      zh_Hans: 用于搜索网页内容
      pt_BR: used for searching
    llm_description: key words for searching
    form: llm
  - name: model
    type: model-selector
    scope: llm
    required: true
    label:
      en_US: Model
      zh_Hans: 使用的模型
      pt_BR: Model
    human_description:
      en_US: Model
      zh_Hans: 使用的模型
      pt_BR: Model
    llm_description: which Model to invoke
    form: form
extra:
  python:
    source: tools/llm.py
```
Please note that in this example, the `scope` of `model` is specified as `llm`, meaning the user can only select models of type `llm`. Thus, the code from the previous use case can be modified as follows:
```python theme={null}
from collections.abc import Generator
from typing import Any

from dify_plugin import Tool
from dify_plugin.entities.tool import ToolInvokeMessage
from dify_plugin.entities.model.message import SystemPromptMessage, UserPromptMessage


class LLMTool(Tool):
    def _invoke(self, tool_parameters: dict[str, Any]) -> Generator[ToolInvokeMessage]:
        response = self.session.model.llm.invoke(
            # the model-selector parameter already carries a complete model config
            model_config=tool_parameters.get('model'),
            prompt_messages=[
                SystemPromptMessage(
                    content='you are a helpful assistant'
                ),
                UserPromptMessage(
                    # the user's input comes from the `prompt` parameter defined above
                    content=tool_parameters.get('prompt')
                )
            ],
            stream=True
        )
        for chunk in response:
            if chunk.delta.message:
                assert isinstance(chunk.delta.message.content, str)
                yield self.create_text_message(text=chunk.delta.message.content)
```
### Invoke Summary
You can request this endpoint to summarize a piece of text. It will use the system model within your current workspace to summarize the text.
**Entry Point**
```python theme={null}
self.session.model.summary
```
**Endpoint**
* `text` is the text to be summarized.
* `instruction` is the additional instruction you want to add, allowing you to summarize the text stylistically.
```python theme={null}
def invoke(
    self, text: str, instruction: str,
) -> str:
    pass
```
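A minimal hedged sketch of wrapping the summary endpoint. The helper name and the instruction wording are illustrative; `session` stands for the plugin's `self.session`.

```python
def summarize(session, text: str, style: str = "one short paragraph") -> str:
    """Summarize text with the workspace's system model.

    `style` is folded into the `instruction` argument to steer the
    tone and length of the summary.
    """
    return session.model.summary.invoke(
        text=text,
        instruction=f"Summarize the text as {style}.",
    )
```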
### Invoke TextEmbedding
**Entry Point**
```python theme={null}
self.session.model.text_embedding
```
**Endpoint**
```python theme={null}
def invoke(
    self, model_config: TextEmbeddingModelConfig, texts: list[str]
) -> TextEmbeddingResult:
    pass
```
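A common follow-up to embedding two texts is comparing them with cosine similarity. The pure-Python helper below is self-contained; the commented usage underneath is a hedged sketch, since the exact attribute names on the SDK's embedding result should be checked against the `entities` module.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm


# Hypothetical usage inside a plugin (attribute names are assumptions):
# result = self.session.model.text_embedding.invoke(
#     model_config=tool_parameters.get("model"), texts=[query, doc]
# )
# score = cosine_similarity(result.embeddings[0], result.embeddings[1])
```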
### Invoke Rerank
**Entry Point**
```python theme={null}
self.session.model.rerank
```
**Endpoint**
```python theme={null}
def invoke(
    self, model_config: RerankModelConfig, docs: list[str], query: str
) -> RerankResult:
    pass
```
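A hedged sketch of using rerank to pick the best documents for a query. It assumes `RerankResult` exposes a `docs` list whose items carry `index` and `score` attributes; check the SDK entities for the exact shape.

```python
def top_documents(session, model_config, query: str, docs: list[str], k: int = 3) -> list[str]:
    """Return the k highest-scoring documents for a query.

    `index` points back into the original `docs` list, so the
    returned strings are the original document texts.
    """
    result = session.model.rerank.invoke(
        model_config=model_config, docs=docs, query=query
    )
    ranked = sorted(result.docs, key=lambda d: d.score, reverse=True)
    return [docs[d.index] for d in ranked[:k]]
```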
### Invoke TTS
**Entry Point**
```python theme={null}
self.session.model.tts
```
**Endpoint**
```python theme={null}
def invoke(
    self, model_config: TTSModelConfig, content_text: str
) -> Generator[bytes, None, None]:
    pass
```
Please note that the `bytes` stream returned by the `tts` endpoint is an `mp3` audio byte stream. Each iteration returns a complete audio segment. If you want to perform more in-depth processing tasks, please choose an appropriate library.
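For simple cases, the segments can just be concatenated into one byte string, as in this hedged sketch (the helper name is illustrative, and proper merging of mp3 segments may still call for an audio library):

```python
def synthesize_to_bytes(session, model_config, text: str) -> bytes:
    """Collect all streamed mp3 segments into a single byte string.

    Each iteration yields a complete mp3 segment; naive concatenation
    is usually playable, but gapless playback needs real audio tooling.
    """
    return b"".join(
        session.model.tts.invoke(model_config=model_config, content_text=text)
    )
```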
### Invoke Speech2Text
**Entry Point**
```python theme={null}
self.session.model.speech2text
```
**Endpoint**
```python theme={null}
def invoke(
    self, model_config: Speech2TextModelConfig, file: IO[bytes]
) -> str:
    pass
```
Where `file` is an audio file encoded in `mp3` format.
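Since the endpoint takes a file-like object, in-memory audio (e.g. an uploaded blob) can be wrapped in `io.BytesIO`, as in this hedged sketch:

```python
import io


def transcribe_mp3(session, model_config, mp3_bytes: bytes) -> str:
    """Transcribe in-memory mp3 audio by wrapping it in a file-like object."""
    return session.model.speech2text.invoke(
        model_config=model_config, file=io.BytesIO(mp3_bytes)
    )
```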
### Invoke Moderation
**Entry Point**
```python theme={null}
self.session.model.moderation
```
**Endpoint**
```python theme={null}
def invoke(self, model_config: ModerationModelConfig, text: str) -> bool:
    pass
```
If this endpoint returns `True`, it indicates that the `text` contains sensitive content.
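A hedged sketch of gating output on the moderation result (the helper name and replacement string are illustrative):

```python
def redact_if_flagged(session, model_config, text: str) -> str:
    """Replace text that the moderation model flags as sensitive.

    invoke() returns True when the text contains sensitive content.
    """
    if session.model.moderation.invoke(model_config=model_config, text=text):
        return "[content removed]"
    return text
```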
## Related Resources
* [Reverse Invocation of Dify Services](/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation) - Understand the fundamental concepts of reverse invocation
* [Reverse Invocation of App](/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation-app) - Learn how to invoke Apps within the platform
* [Reverse Invocation of Tool](/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation-tool) - Learn how to invoke other plugins
* [Model Plugin Development Guide](/en/develop-plugin/dev-guides-and-walkthroughs/creating-new-model-provider) - Learn how to develop custom model plugins
* [Model Designing Rules](/en/develop-plugin/features-and-specs/plugin-types/model-designing-rules) - Understand the design principles of model plugins
***
[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation-model.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)
# Node
Source: https://docs.dify.ai/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation-node
This document describes how plugins can reverse invoke the functionality of Chatflow/Workflow application nodes within the Dify platform. It primarily covers the invocation methods for two specific nodes, ParameterExtractor and QuestionClassifier. The document details the entry points, interface parameters, and example code for invoking these two nodes.
Reverse invoking a Node means that a plugin can access the capabilities of certain nodes within a Dify Chatflow/Workflow application.
The `ParameterExtractor` and `QuestionClassifier` nodes in `Workflow` encapsulate complex prompt and code logic, letting an LLM handle tasks that are difficult to solve with hardcoded rules. Plugins can call these two nodes.
### Calling the Parameter Extractor Node
#### **Entry Point**
```python theme={null}
self.session.workflow_node.parameter_extractor
```
#### **Interface**
```python theme={null}
def invoke(
    self,
    parameters: list[ParameterConfig],
    model: ModelConfig,
    query: str,
    instruction: str = "",
) -> NodeResponse:
    pass
```
Here, `parameters` is the list of parameters to be extracted, `model` conforms to the `LLMModelConfig` specification, `query` is the source text for parameter extraction, and `instruction` provides any additional instructions the LLM may need. For the structure of `NodeResponse`, please refer to this [document](/en/develop-plugin/features-and-specs/plugin-types/general-specifications#noderesponse).
#### **Use Case**
To extract a person's name from a conversation, you can refer to the following code:
```python theme={null}
from collections.abc import Generator

from dify_plugin import Tool
from dify_plugin.entities.tool import ToolInvokeMessage
from dify_plugin.entities.workflow_node import ModelConfig, NodeResponse, ParameterConfig


class ParameterExtractorTool(Tool):
    def _invoke(
        self, tool_parameters: dict
    ) -> Generator[ToolInvokeMessage, None, None]:
        response: NodeResponse = self.session.workflow_node.parameter_extractor.invoke(
            parameters=[
                ParameterConfig(
                    name="name",
                    description="name of the person",
                    required=True,
                    type="string",
                )
            ],
            model=ModelConfig(
                provider="langgenius/openai/openai",
                name="gpt-4o-mini",
                completion_params={},
            ),
            query="My name is John Doe",
            instruction="Extract the name of the person",
        )
        # extracted values are returned in NodeResponse.outputs
        extracted_name = response.outputs.get("name", "Name not found")
        yield self.create_text_message(extracted_name)
```
### Calling the Question Classifier Node
#### **Entry Point**
```python theme={null}
self.session.workflow_node.question_classifier
```
#### **Interface**
```python theme={null}
def invoke(
    self,
    classes: list[ClassConfig],
    model: ModelConfig,
    query: str,
    instruction: str = "",
) -> NodeResponse:
    pass
```
The interface parameters are consistent with `ParameterExtractor`. The final result is stored in `NodeResponse.outputs['class_name']`.
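As with the parameter extractor, a small wrapper can be sketched; the helper name is illustrative, `classes` is a list of `ClassConfig` objects built by the caller, and `session` stands for the plugin's `self.session`.

```python
def classify_query(session, classes, model, query: str) -> str:
    """Classify a query and return the winning class name.

    The final result lands in NodeResponse.outputs["class_name"].
    """
    response = session.workflow_node.question_classifier.invoke(
        classes=classes,
        model=model,
        query=query,
        instruction="",
    )
    return response.outputs["class_name"]
```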
***
[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation-node.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)
# Tool
Source: https://docs.dify.ai/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation-tool
This document details how plugins can reverse invoke Tool services within the Dify platform. It covers three tool invocation methods: calling installed tools (Built-in Tool), calling a Workflow as Tool, and calling custom tools (Custom Tool). Each method includes its corresponding entry point and interface parameter descriptions.
Reverse invoking a Tool means that a plugin can call other tool-type plugins within the Dify platform. If you are unfamiliar with the basic concepts of reverse invocation, please first read [Reverse Invocation of Dify Services](/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation).
Consider the following scenarios:
* A tool-type plugin has implemented a function, but the result is not as expected, requiring post-processing of the data.
* A task requires a web scraper, and you want the flexibility to choose the scraping service.
* You need to aggregate results from multiple tools, but it's difficult to handle using a Workflow application.
In these cases, you need to call other existing tools within your plugin. These tools might be from the marketplace, a self-built Workflow as a Tool, or a custom tool.
These requirements can be met by calling the `self.session.tool` field of the plugin.
### Calling Installed Tools
Allows the plugin to call various tools installed in the current Workspace, including other tool-type plugins.
**Entry Point**
```python theme={null}
self.session.tool
```
**Interface**
```python theme={null}
def invoke_builtin_tool(
    self, provider: str, tool_name: str, parameters: dict[str, Any]
) -> Generator[ToolInvokeMessage, None, None]:
    pass
```
Here, `provider` is the plugin ID plus the tool provider name, formatted like `langgenius/google/google`. `tool_name` is the specific tool name, and `parameters` are the arguments passed to the tool.
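A hedged sketch of calling the marketplace Google tool. The provider string follows the `<plugin_id>/<provider_name>` format above; the tool name `google_search` and the `query` parameter are assumptions, so check the target tool's YAML for the real names.

```python
def search_google(session, query: str):
    """Collect all messages produced by an installed tool.

    `session` stands for the plugin's `self.session`; the generator
    is drained into a list so the caller can inspect every message.
    """
    return list(
        session.tool.invoke_builtin_tool(
            provider="langgenius/google/google",
            tool_name="google_search",
            parameters={"query": query},
        )
    )
```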
### Calling Workflow as Tool
For more information on Workflow as Tool, please refer to the [Tool Plugin documentation](/en/develop-plugin/dev-guides-and-walkthroughs/tool-plugin).
**Entry Point**
```python theme={null}
self.session.tool
```
**Interface**
```python theme={null}
def invoke_workflow_tool(
    self, provider: str, tool_name: str, parameters: dict[str, Any]
) -> Generator[ToolInvokeMessage, None, None]:
    pass
```
In this case, `provider` is the ID of this tool, and `tool_name` is specified during the creation of the tool.
### Calling Custom Tool
**Entry Point**
```python theme={null}
self.session.tool
```
**Interface**
```python theme={null}
def invoke_api_tool(
    self, provider: str, tool_name: str, parameters: dict[str, Any]
) -> Generator[ToolInvokeMessage, None, None]:
    pass
```
Here, `provider` is the ID of this tool, and `tool_name` is the `operation_id` from the OpenAPI specification. If it doesn't exist, it's the `tool_name` automatically generated by Dify, which can be found on the tool management page.
## Related Resources
* [Reverse Invocation of Dify Services](/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation) - Understand the fundamental concepts of reverse invocation
* [Reverse Invocation App](/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation-app) - Learn how to call Apps within the platform
* [Reverse Invocation Model](/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation-model) - Learn how to call model capabilities within the platform
* [Tool Plugin Development Guide](/en/develop-plugin/dev-guides-and-walkthroughs/tool-plugin) - Learn how to develop tool plugins
* [Advanced Tool Plugins](/en/develop-plugin/dev-guides-and-walkthroughs/tool-plugin) - Learn about advanced features like Workflow as Tool
***
[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation-tool.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)
# General Specs
Source: https://docs.dify.ai/en/develop-plugin/features-and-specs/plugin-types/general-specifications
This article will briefly introduce common structures in plugin development. During development, it is strongly recommended to read this alongside [Basic Concepts of Plugin Development](/en/develop-plugin/getting-started/getting-started-dify-plugin) and the [Developer Cheatsheet](/en/develop-plugin/getting-started/cli) for a better understanding of the overall architecture.
### Path Specifications
When filling in file paths in the Manifest or any YAML files, follow these two specifications depending on the type of file:
* If the target file is a multimedia file such as an image or video, for example when filling in the plugin's `icon`, you should place these files in the `_assets` folder under the plugin's root directory.
* If the target file is a regular text file, such as `.py` or `.yaml` code files, you should fill in the absolute path of the file within the plugin project.
### Common Structures
When defining plugins, there are some data structures that can be shared between tools, models, and Endpoints. These shared structures are defined here.
#### I18nObject
`I18nObject` is an internationalization structure that conforms to the [IETF BCP 47](https://tools.ietf.org/html/bcp47) standard. Currently, four languages are supported:
* English (United States)
* Simplified Chinese
* Japanese
* Portuguese (Brazil)
#### ProviderConfig
`ProviderConfig` is a common provider form structure, applicable to both `Tool` and `Endpoint`. Its fields are:
* Form item name
* Display labels following the [IETF BCP 47](https://tools.ietf.org/html/bcp47) standard
* Form field type, which determines how the field is rendered in the UI
* Optional range specification, which varies based on the value of `type`
* Whether the field is required (cannot be empty)
* Default value; only basic types are supported: `float`, `int`, `string`
* Available options, only used when `type` is `select`
* Help document link label, following [IETF BCP 47](https://tools.ietf.org/html/bcp47)
* Help document link
* Placeholder text in multiple languages, following [IETF BCP 47](https://tools.ietf.org/html/bcp47)
#### ProviderConfigOption(object)
* The value of the option
* Display label for the option, following [IETF BCP 47](https://tools.ietf.org/html/bcp47)
#### ProviderConfigType(string)
* Configuration information that will be encrypted
* Plain text input field
* Dropdown selection field
* Switch/toggle control
* Model configuration selector, including provider name, model name, model parameters, etc.
* Application ID selector
* Tool configuration selector, including tool provider, name, parameters, etc.
* Dataset selector (TBD)
#### ProviderConfigScope(string)
When `type` is `model-selector`:
* All model types
* Large Language Models only
* Text embedding models only
* Reranking models only
* Text-to-speech models only
* Speech-to-text models only
* Content moderation models only
* Vision models only

When `type` is `app-selector`:
* All application types
* Chat applications only
* Workflow applications only
* Completion applications only

When `type` is `tool-selector`:
* All tool types
* Plugin tools only
* API tools only
* Workflow tools only
#### ModelConfig
* Model provider name containing the `plugin_id`, in the form `langgenius/openai/openai`
* Specific model name
* Enumeration of model types; refer to the [Model Design Rules](/en/develop-plugin/features-and-specs/plugin-types/model-designing-rules#modeltype) document
#### NodeResponse
* Variables that are finally input to the node
* Output results of the node
* Data generated during node execution
#### ToolSelector
* Tool provider name
* Tool name
* Tool description
* Tool configuration information
* Parameters that need LLM reasoning, each with:
  * Parameter name
  * Parameter type
  * Whether the parameter is required
  * Parameter description
  * Default value
  * Available options for the parameter
## Related Resources
* [Basic Concepts of Plugin Development](/en/develop-plugin/getting-started/getting-started-dify-plugin) - Comprehensive understanding of Dify plugin development
* [Developer Cheatsheet](/en/develop-plugin/dev-guides-and-walkthroughs/cheatsheet) - Quick reference for common commands and concepts in plugin development
* [Tool Plugin Development Details](/en/develop-plugin/dev-guides-and-walkthroughs/tool-plugin) - Understanding how to define plugin information and the tool plugin development process
* [Model Design Rules](/en/develop-plugin/features-and-specs/plugin-types/model-designing-rules) - Understanding the standards for model configuration
***
[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/features-and-specs/plugin-types/general-specifications.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)
# Model Specs
Source: https://docs.dify.ai/en/develop-plugin/features-and-specs/plugin-types/model-designing-rules
This document defines in detail the core concepts and structures for Dify model plugin development, including model providers (Provider), AI model entities (AIModelEntity), model types (ModelType), configuration methods (ConfigurateMethod), model features (ModelFeature), parameter rules (ParameterRule), price configuration (PriceConfig), and detailed data structure specifications for various credential modes.
* Model provider rules are based on the [Provider](#provider) entity.
* Model rules are based on the [AIModelEntity](#aimodelentity) entity.
> All entities below are based on `Pydantic BaseModel`, and can be found in the `entities` module.
### Provider
Provider identifier, e.g.: `openai`
Provider display name, i18n, can set both `en_US` (English) and `zh_Hans` (Chinese) languages
Chinese label, if not set, will default to using `en_US`
English label
Provider description, i18n
Chinese description
English description
Provider small icon, stored in the `_assets` directory under the corresponding provider implementation directory
Chinese icon
English icon
Provider large icon, stored in the `_assets` directory under the corresponding provider implementation directory
Chinese icon
English icon
Background color value, e.g.: #FFFFFF, if empty, will display the frontend default color value
Help information
Help title, i18n
Chinese title
English title
Help link, i18n
Chinese link
English link
Supported model types
Configuration methods
Provider credential specifications
Model credential specifications
### AIModelEntity
Model identifier, e.g.: `gpt-3.5-turbo`
Model display name, i18n, can set both `en_US` (English) and `zh_Hans` (Chinese) languages
Chinese label
English label
Model type
List of supported features
Model properties
Mode (available for model type `llm`)
Context size (available for model types `llm` and `text-embedding`)
Maximum number of chunks (available for model types `text-embedding` and `moderation`)
Maximum file upload limit, unit: MB. (available for model type `speech2text`)
Supported file extension formats, e.g.: mp3,mp4 (available for model type `speech2text`)
Default voice, required: alloy,echo,fable,onyx,nova,shimmer (available for model type `tts`)
List of available voices (available for model type `tts`)
Voice model
Voice model display name
Supported languages for voice model
Word limit for single conversion, defaults to paragraph segmentation (available for model type `tts`)
Supported audio file extension formats, e.g.: mp3,wav (available for model type `tts`)
Number of concurrent tasks supported for text-to-audio conversion (available for model type `tts`)
Maximum characters per chunk (available for model type `moderation`)
Model call parameter rules
Pricing information
Whether deprecated. If deprecated, the model list will no longer display it, but those already configured can continue to be used. Default is False.
### ModelType
* `llm`: text generation model
* `text-embedding`: text embedding model
* `rerank`: rerank model
* `speech2text`: speech to text
* `tts`: text to speech
* `moderation`: content moderation
### ConfigurateMethod
* Predefined model: the user only needs to configure unified provider credentials to use the provider's predefined models.
* Customizable model: the user needs to add a credential configuration for each model.
* Fetch from remote: like the `predefined-model` configuration method, only unified provider credentials are needed, but the models are fetched from the provider using those credentials.
### ModelFeature
* Agent reasoning: generally, models over 70B parameters have chain-of-thought capability.
* Vision: image understanding.
* Tool calling
* Multiple tool calling
* Streaming tool calling
### FetchFrom
* Predefined model
* Remote model
### LLMMode
* Text completion
* Chat
### ParameterRule
Actual parameter name for model call
Use template
> For details on using templates, you can refer to the examples in [Creating a New Model Provider](/en/develop-plugin/dev-guides-and-walkthroughs/creating-new-model-provider).
There are 5 pre-configured variable content templates by default:
* `temperature`
* `top_p`
* `frequency_penalty`
* `presence_penalty`
* `max_tokens`
You can directly set the template variable name in `use_template`, which will use the default configuration from `entities.defaults.PARAMETER_RULE_TEMPLATE` without needing to set any parameters other than `name` and `use_template`. If additional configuration parameters are set, they will override the default configuration. You can refer to `openai/llm/gpt-3.5-turbo.yaml` for examples.
Label, i18n
Chinese label
English label
Parameter type
* Integer
* Floating point
* String
* Boolean
Help information
Chinese help information
English help information
Whether required, default is False
Default value
Minimum value, only applicable to numeric types
Maximum value, only applicable to numeric types
Precision, decimal places to retain, only applicable to numeric types
Dropdown option values, only applicable when `type` is `string`, if not set or is null, then option values are not restricted
### PriceConfig
Input unit price, i.e., Prompt unit price
Output unit price, i.e., returned content unit price
Price unit, e.g. if the price is quoted per 1M tokens, the unit is `0.000001` (one divided by the number of tokens the quoted price covers)
Currency unit
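As a quick sanity check of how the unit price and price unit combine (a sketch, not Dify's internal billing code): cost is `tokens × unit_price × price_unit`, so a price quoted per 1M tokens uses a price unit of `0.000001`:

```python
from decimal import Decimal

def compute_price(tokens: int, unit_price: Decimal, price_unit: Decimal) -> Decimal:
    """Cost = tokens * unit_price * price_unit.

    With unit_price quoted per 1M tokens, price_unit is 0.000001,
    so each token contributes one millionth of the unit price.
    """
    return Decimal(tokens) * unit_price * price_unit

# 1,000 prompt tokens at 2.00 (currency units) per 1M tokens -> 0.002
cost = compute_price(1000, Decimal("2.00"), Decimal("0.000001"))
```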
### ProviderCredentialSchema
Credential form specifications
### ModelCredentialSchema
Model identifier, default variable name is `model`
Model form item display name
English
Chinese
Model prompt content
English
Chinese
Credential form specifications
### CredentialFormSchema
Form item variable name
Form item label
English
Chinese
Form item type
Whether required
Default value
Form item attribute specific to `select` or `radio`, defines dropdown content
Form item attribute specific to `text-input`, form item placeholder
English
Chinese
Form item attribute specific to `text-input`, defines maximum input length, 0 means no limit
Display when other form item values meet conditions, empty means always display
#### FormType
Text input component
Password input component
Single-select dropdown
Radio component
Switch component, only supports `true` and `false`
#### FormOption
Label
English
Chinese
Dropdown option value
Display when other form item values meet conditions, empty means always display
#### FormShowOnObject
Other form item variable name
Other form item variable value
## Related Resources
* [Model Architecture Details](/en/develop-plugin/features-and-specs/plugin-types/model-schema) - Deep dive into the architecture specifications of model plugins
* [Quickly Integrate a New Model](/en/develop-plugin/dev-guides-and-walkthroughs/creating-new-model-provider) - Learn how to apply these rules to add new models
* [General Specifications](/en/develop-plugin/features-and-specs/plugin-types/general-specifications) - Understand the configuration of plugin manifest files
* [Create a New Model Provider](/en/develop-plugin/dev-guides-and-walkthroughs/creating-new-model-provider) - Develop brand new model provider plugins
***
# Model API Interface
Source: https://docs.dify.ai/en/develop-plugin/features-and-specs/plugin-types/model-schema
Comprehensive guide to the Dify model plugin API including implementation requirements for LLM, TextEmbedding, Rerank, Speech2text, and Text2speech models, with detailed specifications for all related data structures.
## Introduction
This document details the interfaces and data structures required to implement Dify model plugins. It serves as a technical reference for developers integrating AI models with the Dify platform.
Before diving into this API reference, we recommend first reading the [Model Design Rules](/en/develop-plugin/features-and-specs/plugin-types/model-designing-rules) and [Model Plugin Introduction](/en/develop-plugin/features-and-specs/plugin-types/model-designing-rules) for conceptual understanding.
Learn how to implement model provider classes for different AI service providers
Implementation details for the five supported model types: LLM, Embedding, Rerank, Speech2Text, and Text2Speech
Comprehensive reference for all data structures used in the model API
Guidelines for proper error mapping and exception handling
## Model Provider
Every model provider must inherit from the `__base.model_provider.ModelProvider` base class and implement the credential validation interface.
### Provider Credential Validation
```python Core Implementation theme={null}
def validate_provider_credentials(self, credentials: dict) -> None:
"""
Validate provider credentials by making a test API call
Parameters:
credentials: Provider credentials as defined in `provider_credential_schema`
Raises:
CredentialsValidateFailedError: If validation fails
"""
try:
# Example implementation - validate using an LLM model instance
model_instance = self.get_model_instance(ModelType.LLM)
model_instance.validate_credentials(
model="example-model",
credentials=credentials
)
except Exception as ex:
        logger.exception("Credential validation failed")
raise CredentialsValidateFailedError(f"Invalid credentials: {str(ex)}")
```
```python Custom Model Provider theme={null}
class XinferenceProvider(Provider):
def validate_provider_credentials(self, credentials: dict) -> None:
"""
For custom-only model providers, a simple implementation is sufficient
as validation happens at the model level
"""
pass
```
Credential information as defined in the provider's YAML configuration under `provider_credential_schema`.
Typically includes fields like `api_key`, `organization_id`, etc.
If validation fails, your implementation must raise a `CredentialsValidateFailedError` exception. This ensures proper error handling in the Dify UI.
For predefined model providers, you should implement a thorough validation method that verifies the credentials work with your API. For custom model providers (where each model has its own credentials), a simplified implementation is sufficient.
## Models
Dify supports five distinct model types, each requiring implementation of specific interfaces. However, all model types share some common requirements.
### Common Interfaces
Every model implementation, regardless of type, must implement these two fundamental methods:
#### 1. Model Credential Validation
```python Implementation theme={null}
def validate_credentials(self, model: str, credentials: dict) -> None:
"""
Validate that the provided credentials work with the specified model
Parameters:
model: The specific model identifier (e.g., "gpt-4")
credentials: Authentication details for the model
Raises:
CredentialsValidateFailedError: If validation fails
"""
try:
# Make a lightweight API call to verify credentials
# Example: List available models or check account status
response = self._api_client.validate_api_key(credentials["api_key"])
# Verify the specific model is available if applicable
if model not in response.get("available_models", []):
raise CredentialsValidateFailedError(f"Model {model} is not available")
except ApiException as e:
raise CredentialsValidateFailedError(str(e))
```
The specific model identifier to validate (e.g., "gpt-4", "claude-3-opus")
Credential information as defined in the provider's configuration
#### 2. Error Mapping
```python Implementation theme={null}
@property
def _invoke_error_mapping(self) -> dict[type[InvokeError], list[type[Exception]]]:
"""
Map provider-specific exceptions to standardized Dify error types
Returns:
Dictionary mapping Dify error types to lists of provider exception types
"""
return {
InvokeConnectionError: [
requests.exceptions.ConnectionError,
requests.exceptions.Timeout,
ConnectionRefusedError
],
InvokeServerUnavailableError: [
ServiceUnavailableError,
HTTPStatusError
],
InvokeRateLimitError: [
RateLimitExceededError,
QuotaExceededError
],
InvokeAuthorizationError: [
AuthenticationError,
InvalidAPIKeyError,
PermissionDeniedError
],
InvokeBadRequestError: [
InvalidRequestError,
ValidationError
]
}
```
Network connection failures, timeouts
Service provider is down or unavailable
Rate limits or quota limits reached
Authentication or permission issues
Invalid parameters or requests
You can alternatively raise these standardized error types directly in your code instead of relying on the error mapping. This approach gives you more control over error messages.
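For reference, this is roughly how such a mapping gets applied at runtime, sketched with stand-in error classes (the real `InvokeError` hierarchy lives in the plugin SDK):

```python
# Stand-ins for the standardized error types (illustration only)
class InvokeError(Exception): ...
class InvokeConnectionError(InvokeError): ...
class InvokeAuthorizationError(InvokeError): ...

# Same shape as the _invoke_error_mapping property above
ERROR_MAPPING = {
    InvokeConnectionError: [ConnectionRefusedError, TimeoutError],
    InvokeAuthorizationError: [PermissionError],
}

def map_error(mapping: dict, error: Exception) -> InvokeError:
    """Wrap a raw provider exception in the first matching standardized type."""
    for invoke_error, provider_errors in mapping.items():
        if any(isinstance(error, t) for t in provider_errors):
            return invoke_error(str(error))
    # Anything unmapped falls back to the generic InvokeError
    return InvokeError(str(error))

wrapped = map_error(ERROR_MAPPING, TimeoutError("request timed out"))
```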
### LLM Implementation
To implement a Large Language Model provider, inherit from the `__base.large_language_model.LargeLanguageModel` base class and implement these methods:
#### 1. Model Invocation
This core method handles both streaming and non-streaming API calls to language models.
```python Core Implementation theme={null}
def _invoke(
self,
model: str,
credentials: dict,
prompt_messages: list[PromptMessage],
model_parameters: dict,
tools: Optional[list[PromptMessageTool]] = None,
stop: Optional[list[str]] = None,
stream: bool = True,
user: Optional[str] = None
) -> Union[LLMResult, Generator[LLMResultChunk, None, None]]:
"""
Invoke the language model
"""
# Prepare API parameters
api_params = self._prepare_api_parameters(
model,
credentials,
prompt_messages,
model_parameters,
tools,
stop
)
try:
# Choose between streaming and non-streaming implementation
if stream:
return self._invoke_stream(model, api_params, user)
else:
return self._invoke_sync(model, api_params, user)
except Exception as e:
# Map errors using the error mapping property
self._handle_api_error(e)
# Helper methods for streaming and non-streaming calls
def _invoke_stream(self, model, api_params, user):
# Implement streaming call and yield chunks
pass
def _invoke_sync(self, model, api_params, user):
# Implement synchronous call and return complete result
pass
```
Model identifier (e.g., "gpt-4", "claude-3")
Authentication credentials for the API
Message list in Dify's standardized format:
* For `completion` models: Include a single `UserPromptMessage`
* For `chat` models: Include `SystemPromptMessage`, `UserPromptMessage`, `AssistantPromptMessage`, `ToolPromptMessage` as needed
Model-specific parameters (temperature, top\_p, etc.) as defined in the model's YAML configuration
Tool definitions for function calling capabilities
Stop sequences that will halt model generation when encountered
Whether to return a streaming response
User identifier for API monitoring
A generator yielding chunks of the response as they become available
A complete response object with the full generated text
We recommend implementing separate helper methods for streaming and non-streaming calls to keep your code organized and maintainable.
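On the caller side, the `Union` return type means code must handle both shapes. A simplified sketch, using plain strings in place of `LLMResult`/`LLMResultChunk`:

```python
from typing import Generator, Union

def collect_text(result: Union[str, Generator[str, None, None]]) -> str:
    """Return the full text whether the call streamed or not (illustrative)."""
    if isinstance(result, str):
        return result  # Non-streaming: already complete
    return "".join(result)  # Streaming: concatenate chunks as they arrive

full = collect_text(chunk for chunk in ["Hel", "lo"])
```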
#### 2. Token Counting
```python Implementation theme={null}
def get_num_tokens(
self,
model: str,
credentials: dict,
prompt_messages: list[PromptMessage],
tools: Optional[list[PromptMessageTool]] = None
) -> int:
"""
Calculate the number of tokens in the prompt
"""
# Convert prompt_messages to the format expected by the tokenizer
text = self._convert_messages_to_text(prompt_messages)
try:
# Use the appropriate tokenizer for this model
tokenizer = self._get_tokenizer(model)
return len(tokenizer.encode(text))
except Exception:
# Fall back to a generic tokenizer
return self._get_num_tokens_by_gpt2(text)
```
If the model doesn't provide a tokenizer, you can use the base class's `_get_num_tokens_by_gpt2(text)` method for a reasonable approximation.
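`_convert_messages_to_text` in the example above is a hypothetical helper; a minimal sketch, assuming role-prefixed concatenation, shown with plain tuples instead of `PromptMessage` objects to stay self-contained:

```python
def convert_messages_to_text(messages: list[tuple[str, str]]) -> str:
    """Join (role, content) pairs into a single prompt-like string.

    Real implementations receive PromptMessage objects; tuples are used
    here only to keep the sketch self-contained.
    """
    return "\n".join(f"{role}: {content}" for role, content in messages)

text = convert_messages_to_text([("system", "You are helpful."), ("user", "Hi")])
```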
#### 3. Custom Model Schema (Optional)
```python Implementation theme={null}
def get_customizable_model_schema(
self,
model: str,
credentials: dict
) -> Optional[AIModelEntity]:
"""
Get parameter schema for custom models
"""
# For fine-tuned models, you might return the base model's schema
if model.startswith("ft:"):
base_model = self._extract_base_model(model)
return self._get_predefined_model_schema(base_model)
# For standard models, return None to use the predefined schema
return None
```
This method is only necessary for providers that support custom models. It allows custom models to inherit parameter rules from base models.
### TextEmbedding Implementation
Text embedding models convert text into high-dimensional vectors that capture semantic meaning, which is useful for retrieval, similarity search, and classification.
To implement a Text Embedding provider, inherit from the `__base.text_embedding_model.TextEmbeddingModel` base class:
#### 1. Core Embedding Method
```python Implementation theme={null}
def _invoke(
self,
model: str,
credentials: dict,
texts: list[str],
user: Optional[str] = None
) -> TextEmbeddingResult:
"""
Generate embedding vectors for multiple texts
"""
# Set up API client with credentials
client = self._get_client(credentials)
# Handle batching if needed
batch_size = self._get_batch_size(model)
all_embeddings = []
total_tokens = 0
start_time = time.time()
# Process in batches to avoid API limits
for i in range(0, len(texts), batch_size):
batch = texts[i:i+batch_size]
# Make API call to the embeddings endpoint
response = client.embeddings.create(
model=model,
input=batch,
user=user
)
# Extract embeddings from response
batch_embeddings = [item.embedding for item in response.data]
all_embeddings.extend(batch_embeddings)
# Track token usage
total_tokens += response.usage.total_tokens
# Calculate usage metrics
elapsed_time = time.time() - start_time
usage = self._create_embedding_usage(
model=model,
tokens=total_tokens,
latency=elapsed_time
)
return TextEmbeddingResult(
model=model,
embeddings=all_embeddings,
usage=usage
)
```
Embedding model identifier
Authentication credentials for the embedding service
List of text inputs to embed
User identifier for API monitoring
A structured response containing:
* model: The model used for embedding
* embeddings: List of embedding vectors corresponding to input texts
* usage: Metadata about token usage and costs
#### 2. Token Counting Method
```python Implementation theme={null}
def get_num_tokens(
self,
model: str,
credentials: dict,
texts: list[str]
) -> int:
"""
Calculate the number of tokens in the texts to be embedded
"""
# Join all texts to estimate token count
combined_text = " ".join(texts)
try:
# Use the appropriate tokenizer for this model
tokenizer = self._get_tokenizer(model)
return len(tokenizer.encode(combined_text))
except Exception:
# Fall back to a generic tokenizer
return self._get_num_tokens_by_gpt2(combined_text)
```
For embedding models, accurate token counting is important for cost estimation, but not critical for functionality. The `_get_num_tokens_by_gpt2` method provides a reasonable approximation for most models.
### Rerank Implementation
Reranking models help improve search quality by re-ordering a set of candidate documents based on their relevance to a query, typically after an initial retrieval phase.
To implement a Reranking provider, inherit from the `__base.rerank_model.RerankModel` base class:
```python Implementation theme={null}
def _invoke(
self,
model: str,
credentials: dict,
query: str,
docs: list[str],
score_threshold: Optional[float] = None,
top_n: Optional[int] = None,
user: Optional[str] = None
) -> RerankResult:
"""
Rerank documents based on relevance to the query
"""
# Set up API client with credentials
client = self._get_client(credentials)
# Prepare request data
request_data = {
"query": query,
"documents": docs,
}
# Call reranking API endpoint
response = client.rerank(
model=model,
**request_data,
user=user
)
# Process results
ranked_results = []
    for result in response.results:
# Create RerankDocument for each result
doc = RerankDocument(
index=result.document_index, # Original index in docs list
text=docs[result.document_index], # Original text
score=result.relevance_score # Relevance score
)
ranked_results.append(doc)
# Sort by score in descending order
ranked_results.sort(key=lambda x: x.score, reverse=True)
# Apply score threshold filtering if specified
if score_threshold is not None:
ranked_results = [doc for doc in ranked_results if doc.score >= score_threshold]
# Apply top_n limit if specified
if top_n is not None and top_n > 0:
ranked_results = ranked_results[:top_n]
return RerankResult(
model=model,
docs=ranked_results
)
```
Reranking model identifier
Authentication credentials for the API
The search query text
List of document texts to be reranked
Optional minimum score threshold for filtering results
Optional limit on number of results to return
User identifier for API monitoring
A structured response containing:
* model: The model used for reranking
* docs: List of RerankDocument objects with index, text, and score
Reranking can be computationally expensive, especially with large document sets. Implement batching for large document collections to avoid timeouts or excessive resource consumption.
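A batching wrapper might look like this sketch, where `rerank_batch` is a placeholder for one API call over a single batch and the offset restores each document's index in the full list:

```python
def batched_rerank(query: str, docs: list[str], rerank_batch, batch_size: int = 32, top_n=None):
    """Rerank a large document list batch by batch (illustrative sketch).

    rerank_batch(query, batch) must return (index_within_batch, score)
    pairs; the offset converts them back to indices in the full list.
    """
    scored = []
    for offset in range(0, len(docs), batch_size):
        batch = docs[offset:offset + batch_size]
        for i, score in rerank_batch(query, batch):
            scored.append((offset + i, score))
    # Merge all batches by score, descending
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n] if top_n else scored
```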
### Speech2Text Implementation
Speech-to-text models convert spoken language from audio files into written text, enabling applications like transcription services, voice commands, and accessibility features.
To implement a Speech-to-Text provider, inherit from the `__base.speech2text_model.Speech2TextModel` base class:
```python Implementation theme={null}
def _invoke(
self,
model: str,
credentials: dict,
file: IO[bytes],
user: Optional[str] = None
) -> str:
"""
Convert speech audio to text
"""
# Set up API client with credentials
client = self._get_client(credentials)
try:
# Determine the file format
file_format = self._detect_audio_format(file)
# Prepare the file for API submission
# Most APIs require either a file path or binary data
audio_data = file.read()
# Call the speech-to-text API
response = client.audio.transcriptions.create(
model=model,
            file=(f"audio.{file_format}", audio_data),  # Use the detected format for the filename
user=user
)
# Extract and return the transcribed text
return response.text
except Exception as e:
# Map to appropriate error type
self._handle_api_error(e)
finally:
# Reset file pointer for potential reuse
file.seek(0)
```
```python Helper Methods theme={null}
def _detect_audio_format(self, file: IO[bytes]) -> str:
"""
Detect the audio format based on file header
"""
# Read the first few bytes to check the file signature
header = file.read(12)
file.seek(0) # Reset file pointer
# Check for common audio format signatures
if header.startswith(b'RIFF') and header[8:12] == b'WAVE':
return 'wav'
elif header.startswith(b'ID3') or header.startswith(b'\xFF\xFB'):
return 'mp3'
elif header.startswith(b'OggS'):
return 'ogg'
elif header.startswith(b'fLaC'):
return 'flac'
else:
# Default or additional format checks
return 'mp3' # Default assumption
```
Speech-to-text model identifier
Authentication credentials for the API
Binary file object containing the audio to transcribe
User identifier for API monitoring
The transcribed text from the audio file
Audio format detection is important for proper handling of different file types. Consider implementing a helper method to detect the format from the file header as shown in the example.
Some speech-to-text APIs have file size limitations. Consider implementing chunking for large audio files if necessary.
### Text2Speech Implementation
Text-to-speech models convert written text into natural-sounding speech, enabling applications such as voice assistants, screen readers, and audio content generation.
To implement a Text-to-Speech provider, inherit from the `__base.text2speech_model.Text2SpeechModel` base class:
```python Implementation theme={null}
def _invoke(
self,
model: str,
credentials: dict,
content_text: str,
streaming: bool,
user: Optional[str] = None
) -> Union[bytes, Generator[bytes, None, None]]:
"""
Convert text to speech audio
"""
# Set up API client with credentials
client = self._get_client(credentials)
# Get voice settings based on model
voice = self._get_voice_for_model(model)
try:
# Choose implementation based on streaming preference
if streaming:
return self._stream_audio(
client=client,
model=model,
text=content_text,
voice=voice,
user=user
)
else:
return self._generate_complete_audio(
client=client,
model=model,
text=content_text,
voice=voice,
user=user
)
except Exception as e:
self._handle_api_error(e)
```
```python Helper Methods theme={null}
def _stream_audio(self, client, model, text, voice, user=None):
"""
Implementation for streaming audio output
"""
# Make API request with stream=True
response = client.audio.speech.create(
model=model,
voice=voice,
input=text,
stream=True,
user=user
)
# Yield chunks as they arrive
for chunk in response:
if chunk:
yield chunk
def _generate_complete_audio(self, client, model, text, voice, user=None):
"""
Implementation for complete audio file generation
"""
# Make API request for complete audio
response = client.audio.speech.create(
model=model,
voice=voice,
input=text,
user=user
)
# Get audio data as bytes
audio_data = response.content
return audio_data
```
Text-to-speech model identifier
Authentication credentials for the API
Text content to be converted to speech
Whether to return streaming audio or complete file
User identifier for API monitoring
A generator yielding audio chunks as they become available
Complete audio data as bytes
Most text-to-speech APIs require you to specify a voice along with the model. Consider implementing a mapping between Dify's model identifiers and the provider's voice options.
Long text inputs may need to be chunked for better speech synthesis quality. Consider implementing text preprocessing to handle punctuation, numbers, and special characters properly.
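For the chunking idea, a minimal sketch that splits on sentence boundaries and packs sentences up to a length budget (the regex and the default budget are arbitrary choices):

```python
import re

def chunk_text(text: str, max_len: int = 200) -> list[str]:
    """Split text into TTS-friendly chunks at sentence boundaries (sketch)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Start a new chunk once adding the sentence would exceed the budget
        if current and len(current) + len(sentence) + 1 > max_len:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```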
### Moderation Implementation
Moderation models analyze content for potentially harmful, inappropriate, or unsafe material, helping maintain platform safety and content policies.
To implement a Moderation provider, inherit from the `__base.moderation_model.ModerationModel` base class:
```python Implementation theme={null}
def _invoke(
self,
model: str,
credentials: dict,
text: str,
user: Optional[str] = None
) -> bool:
"""
Analyze text for harmful content
Returns:
bool: False if the text is safe, True if it contains harmful content
"""
# Set up API client with credentials
client = self._get_client(credentials)
try:
# Call moderation API
response = client.moderations.create(
model=model,
input=text,
user=user
)
# Check if any categories were flagged
result = response.results[0]
# Return True if flagged in any category, False if safe
return result.flagged
except Exception as e:
# Log the error but default to safe if there's an API issue
# This is a conservative approach - production systems might want
# different fallback behavior
logger.error(f"Moderation API error: {str(e)}")
return False
```
```python Detailed Implementation theme={null}
def _invoke(
self,
model: str,
credentials: dict,
text: str,
user: Optional[str] = None
) -> bool:
"""
Analyze text for harmful content with detailed category checking
"""
# Set up API client with credentials
client = self._get_client(credentials)
try:
# Call moderation API
response = client.moderations.create(
model=model,
input=text,
user=user
)
# Get detailed category results
result = response.results[0]
categories = result.categories
# Check specific categories based on your application's needs
# For example, you might want to flag certain categories but not others
critical_violations = [
categories.harassment,
categories.hate,
categories.self_harm,
categories.sexual,
categories.violence
]
# Flag content if any critical category is violated
return any(critical_violations)
except Exception as e:
self._handle_api_error(e)
# Default to safe in case of error
return False
```
Moderation model identifier
Authentication credentials for the API
Text content to be analyzed
User identifier for API monitoring
Boolean indicating content safety:
* False: The content is safe
* True: The content contains harmful material
Moderation is often used as a safety mechanism. Consider the implications of false negatives (letting harmful content through) versus false positives (blocking safe content) when implementing your solution.
Many moderation APIs provide detailed category scores rather than just a binary result. Consider extending this implementation to return more detailed information about specific categories of harmful content if your application needs it.
### Entities
#### PromptMessageRole
Message role
```python theme={null}
class PromptMessageRole(Enum):
"""
Enum class for prompt message.
"""
SYSTEM = "system"
USER = "user"
ASSISTANT = "assistant"
TOOL = "tool"
```
#### PromptMessageContentType
Message content type, divided into plain text and images.
```python theme={null}
class PromptMessageContentType(Enum):
"""
Enum class for prompt message content type.
"""
TEXT = 'text'
IMAGE = 'image'
```
#### PromptMessageContent
Message content base class, used only for parameter declaration, cannot be initialized.
```python theme={null}
class PromptMessageContent(BaseModel):
"""
Model class for prompt message content.
"""
type: PromptMessageContentType
data: str # Content data
```
Two types are currently supported, text and image, and a single message can contain text together with multiple images.
You need to initialize `TextPromptMessageContent` and `ImagePromptMessageContent` separately.
#### TextPromptMessageContent
```python theme={null}
class TextPromptMessageContent(PromptMessageContent):
"""
Model class for text prompt message content.
"""
type: PromptMessageContentType = PromptMessageContentType.TEXT
```
When passing in text and images, text needs to be constructed as this entity as part of the `content` list.
#### ImagePromptMessageContent
```python theme={null}
class ImagePromptMessageContent(PromptMessageContent):
"""
Model class for image prompt message content.
"""
class DETAIL(Enum):
LOW = 'low'
HIGH = 'high'
type: PromptMessageContentType = PromptMessageContentType.IMAGE
detail: DETAIL = DETAIL.LOW # Resolution
```
When passing in text and images, images need to be constructed as this entity as part of the `content` list.
`data` can be a `url` or a `base64`-encoded image string.
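Putting the content entities together: a sketch of a multimodal user message, using simplified dataclass stand-ins rather than the real pydantic models, to show how text and image parts share one `content` list:

```python
from dataclasses import dataclass

# Simplified stand-ins for the entities above (illustration only)
@dataclass
class TextContent:
    data: str
    type: str = "text"

@dataclass
class ImageContent:
    data: str  # URL or base64-encoded image string
    type: str = "image"
    detail: str = "low"

@dataclass
class UserMessage:
    content: list
    role: str = "user"

# One text part plus one image part in a single user message
message = UserMessage(content=[
    TextContent(data="What is in this picture?"),
    ImageContent(data="https://example.com/cat.png", detail="high"),
])
```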
#### PromptMessage
Base class for all Role message bodies, used only for parameter declaration, cannot be initialized.
```python theme={null}
class PromptMessage(ABC, BaseModel):
"""
Model class for prompt message.
"""
role: PromptMessageRole # Message role
content: Optional[str | list[PromptMessageContent]] = None # Supports two types: string and content list. The content list is for multimodal needs, see PromptMessageContent for details.
name: Optional[str] = None # Name, optional.
```
#### UserPromptMessage
UserMessage message body, represents user messages.
```python theme={null}
class UserPromptMessage(PromptMessage):
"""
Model class for user prompt message.
"""
role: PromptMessageRole = PromptMessageRole.USER
```
#### AssistantPromptMessage
Represents model response messages, typically used for `few-shots` or chat history input.
```python theme={null}
class AssistantPromptMessage(PromptMessage):
"""
Model class for assistant prompt message.
"""
class ToolCall(BaseModel):
"""
Model class for assistant prompt message tool call.
"""
class ToolCallFunction(BaseModel):
"""
Model class for assistant prompt message tool call function.
"""
name: str # Tool name
arguments: str # Tool parameters
id: str # Tool ID, only effective for OpenAI tool call, a unique ID for tool invocation, the same tool can be called multiple times
type: str # Default is function
function: ToolCallFunction # Tool call information
role: PromptMessageRole = PromptMessageRole.ASSISTANT
tool_calls: list[ToolCall] = [] # Model's tool call results (only returned when tools are passed in and the model decides to call them)
```
Here `tool_calls` is the list of tool calls returned by the model when `tools` are passed in and the model decides to invoke them.
#### SystemPromptMessage
Represents system messages, typically used to set system instructions for the model.
```python theme={null}
class SystemPromptMessage(PromptMessage):
"""
Model class for system prompt message.
"""
role: PromptMessageRole = PromptMessageRole.SYSTEM
```
#### ToolPromptMessage
Represents tool messages, used to pass results to the model for next-step planning after a tool has been executed.
```python theme={null}
class ToolPromptMessage(PromptMessage):
"""
Model class for tool prompt message.
"""
role: PromptMessageRole = PromptMessageRole.TOOL
tool_call_id: str # Tool call ID, if OpenAI tool call is not supported, you can also pass in the tool name
```
The base class's `content` passes in the tool execution result.
#### PromptMessageTool
```python theme={null}
class PromptMessageTool(BaseModel):
"""
Model class for prompt message tool.
"""
name: str # Tool name
description: str # Tool description
parameters: dict # Tool parameters dict
```
***
#### LLMResult
```python theme={null}
class LLMResult(BaseModel):
"""
Model class for llm result.
"""
model: str # Actually used model
prompt_messages: list[PromptMessage] # Prompt message list
message: AssistantPromptMessage # Reply message
usage: LLMUsage # Tokens used and cost information
system_fingerprint: Optional[str] = None # Request fingerprint, refer to OpenAI parameter definition
```
#### LLMResultChunkDelta
Delta entity within each iteration in streaming response
```python theme={null}
class LLMResultChunkDelta(BaseModel):
"""
Model class for llm result chunk delta.
"""
index: int # Sequence number
message: AssistantPromptMessage # Reply message
usage: Optional[LLMUsage] = None # Tokens used and cost information, only returned in the last message
finish_reason: Optional[str] = None # Completion reason, only returned in the last message
```
#### LLMResultChunk
Iteration entity in streaming response
```python theme={null}
class LLMResultChunk(BaseModel):
"""
Model class for llm result chunk.
"""
model: str # Actually used model
prompt_messages: list[PromptMessage] # Prompt message list
system_fingerprint: Optional[str] = None # Request fingerprint, refer to OpenAI parameter definition
delta: LLMResultChunkDelta # Changes in content for each iteration
```
#### LLMUsage
```python theme={null}
class LLMUsage(ModelUsage):
"""
Model class for llm usage.
"""
prompt_tokens: int # Tokens used by prompt
prompt_unit_price: Decimal # Prompt unit price
prompt_price_unit: Decimal # Prompt price unit, i.e., unit price based on how many tokens
prompt_price: Decimal # Prompt cost
completion_tokens: int # Tokens used by completion
completion_unit_price: Decimal # Completion unit price
completion_price_unit: Decimal # Completion price unit, i.e., unit price based on how many tokens
completion_price: Decimal # Completion cost
total_tokens: int # Total tokens used
total_price: Decimal # Total cost
currency: str # Currency unit
latency: float # Request time (s)
```
***
#### TextEmbeddingResult
```python theme={null}
class TextEmbeddingResult(BaseModel):
"""
Model class for text embedding result.
"""
model: str # Actually used model
embeddings: list[list[float]] # Embedding vector list, corresponding to the input texts list
usage: EmbeddingUsage # Usage information
```
#### EmbeddingUsage
```python theme={null}
class EmbeddingUsage(ModelUsage):
"""
Model class for embedding usage.
"""
tokens: int # Tokens used
total_tokens: int # Total tokens used
unit_price: Decimal # Unit price
price_unit: Decimal # Price unit, i.e., unit price based on how many tokens
total_price: Decimal # Total cost
currency: str # Currency unit
latency: float # Request time (s)
```
***
#### RerankResult
```python theme={null}
class RerankResult(BaseModel):
"""
Model class for rerank result.
"""
model: str # Actually used model
docs: list[RerankDocument] # List of reranked segments
```
#### RerankDocument
```python theme={null}
class RerankDocument(BaseModel):
"""
Model class for rerank document.
"""
index: int # Original sequence number
text: str # Segment text content
score: float # Score
```
## Related Resources
* [Model Design Rules](/en/develop-plugin/features-and-specs/plugin-types/model-designing-rules) - Understand the standards for model configuration
* [Model Plugin Introduction](/en/develop-plugin/features-and-specs/plugin-types/model-designing-rules) - Quickly understand the basic concepts of model plugins
* [Quickly Integrate a New Model](/en/develop-plugin/dev-guides-and-walkthroughs/creating-new-model-provider) - Learn how to add new models to existing providers
* [Create a New Model Provider](/en/develop-plugin/dev-guides-and-walkthroughs/creating-new-model-provider) - Learn how to develop brand new model providers
***
# Multilingual README
Source: https://docs.dify.ai/en/develop-plugin/features-and-specs/plugin-types/multilingual-readme
This article introduces the file specifications for Dify plugins' multilingual READMEs and their display rules in Dify Marketplace.
You can create multilingual READMEs for your plugin, which will be displayed in [Dify Marketplace](https://marketplace.dify.ai) and other locations based on the user's preferred language.
### README File Specifications
| **Language** | Required | Filename | Path | **Description** |
| :------------------ | :------- | :-------------------------- | :----------------------------------------------------- | :--------------------------------------------------------------- |
| **English** | Yes | `README.md` | Plugin root directory | / |
| **Other Languages** | No | `README_<language_code>.md` | In the `readme` folder under the plugin root directory | Currently supports Japanese, Portuguese, and Simplified Chinese. |
Here's an example of the directory structure:
```bash theme={null}
...
├── main.py
├── manifest.yaml
├── readme
│   ├── README_ja_JP.md
│   ├── README_pt_BR.md
│   └── README_zh_Hans.md
├── README.md
...
```
### How Multilingual READMEs are Displayed **in Marketplace**
When your plugin has a README in the user's preferred language, the plugin's detail page in Dify Marketplace will display that language version of the README.
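The display rule above can be sketched as a small lookup. The `pick_readme` helper below is hypothetical (not part of any Dify API) and assumes the locale codes shown in the directory example:

```python
def pick_readme(available: set[str], preferred_language: str) -> str:
    """Return the README path to display for a user's preferred language.

    Mirrors the Marketplace rule: use the language-specific file under
    readme/ when it exists, otherwise fall back to the required
    English README.md.
    """
    candidate = f"readme/README_{preferred_language}.md"
    return candidate if candidate in available else "README.md"


# Files present in a hypothetical plugin package
files = {"README.md", "readme/README_ja_JP.md", "readme/README_zh_Hans.md"}
```

For example, a Japanese-language user would see `readme/README_ja_JP.md`, while a Portuguese-language user would fall back to `README.md` since no `README_pt_BR.md` is present.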
***
[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/features-and-specs/plugin-types/multilingual-readme.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)
# Persistent Storage
Source: https://docs.dify.ai/en/develop-plugin/features-and-specs/plugin-types/persistent-storage-kv
Learn how to implement persistent storage in your Dify plugins using the built-in key-value database to maintain state across interactions.
## Overview
Most plugin tools and endpoints operate in a stateless, single-round interaction model:
1. Receive a request
2. Process data
3. Return a response
4. End the interaction
However, many real-world applications require maintaining state across multiple interactions. This is where **persistent storage** becomes essential.
The persistent storage mechanism allows plugins to store data persistently within the same workspace, enabling stateful applications and memory features.
Dify currently provides a key-value (KV) storage system for plugins, with plans to introduce more flexible and powerful storage interfaces in the future based on developer needs.
## Accessing Storage
All storage operations are performed through the `storage` object available in your plugin's session:
```python theme={null}
# Access the storage interface
storage = self.session.storage
```
## Storage Operations
### Storing Data
Store data with the `set` method:
```python theme={null}
def set(self, key: str, val: bytes) -> None:
    """
    Store data in persistent storage

    Parameters:
        key: Unique identifier for your data
        val: Binary data to store (bytes)
    """
    pass
```
The value must be in `bytes` format. This provides flexibility to store various types of data, including files.
#### Example: Storing Different Data Types
```python theme={null}
# String data (must convert to bytes)
storage.set("user_name", "John Doe".encode('utf-8'))

# JSON data
import json

user_data = {"name": "John", "age": 30, "preferences": ["AI", "NLP"]}
storage.set("user_data", json.dumps(user_data).encode('utf-8'))

# File data
with open("image.jpg", "rb") as f:
    image_data = f.read()
storage.set("profile_image", image_data)
```
### Retrieving Data
Retrieve stored data with the `get` method:
```python theme={null}
def get(self, key: str) -> bytes:
    """
    Retrieve data from persistent storage

    Parameters:
        key: Unique identifier for your data

    Returns:
        The stored data as bytes, or None if the key doesn't exist
    """
    pass
```
#### Example: Retrieving and Converting Data
```python theme={null}
# Retrieving string data
name_bytes = storage.get("user_name")
if name_bytes:
    name = name_bytes.decode('utf-8')
    print(f"Retrieved name: {name}")

# Retrieving JSON data
import json

user_data_bytes = storage.get("user_data")
if user_data_bytes:
    user_data = json.loads(user_data_bytes.decode('utf-8'))
    print(f"User preferences: {user_data['preferences']}")
```
### Deleting Data
Delete stored data with the `delete` method:
```python theme={null}
def delete(self, key: str) -> None:
    """
    Delete data from persistent storage

    Parameters:
        key: Unique identifier for the data to delete
    """
    pass
```
## Best Practices
* **Use a consistent key naming scheme** to avoid conflicts and keep your code maintainable.
* **Check that data exists before processing it**, as a key might not be found.
* **Serialize complex objects** to JSON or another serialized format before storing.
* **Wrap storage operations in try/except blocks** to handle potential errors gracefully.
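These practices can be combined in a thin wrapper around the `set`/`get`/`delete` interface shown above. The `JsonStore` class below is a hypothetical sketch, and an in-memory dict stands in for `self.session.storage` so it runs outside a plugin:

```python
import json


class InMemoryStorage:
    """Dict-backed stand-in for self.session.storage (bytes in, bytes out)."""

    def __init__(self):
        self._data = {}

    def set(self, key: str, val: bytes) -> None:
        self._data[key] = val

    def get(self, key: str) -> bytes:
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


class JsonStore:
    """Hypothetical wrapper applying the practices above: namespaced keys,
    JSON serialization, and defensive error handling."""

    def __init__(self, storage, namespace: str):
        self._storage = storage
        self._ns = namespace

    def _key(self, key: str) -> str:
        return f"{self._ns}:{key}"  # consistent key naming scheme

    def save(self, key: str, value) -> bool:
        try:
            self._storage.set(self._key(key), json.dumps(value).encode("utf-8"))
            return True
        except Exception:
            return False  # handle storage errors gracefully

    def load(self, key: str, default=None):
        raw = self._storage.get(self._key(key))
        if raw is None:  # always check existence before processing
            return default
        try:
            return json.loads(raw.decode("utf-8"))
        except (ValueError, UnicodeDecodeError):
            return default


store = JsonStore(InMemoryStorage(), namespace="prefs")
store.save("theme", {"mode": "dark"})
```

Inside a real plugin, you would pass `self.session.storage` to `JsonStore` instead of the in-memory stand-in.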
## Common Use Cases
* **User Preferences**: Store user settings and preferences between sessions
* **Conversation History**: Maintain context from previous conversations
* **API Tokens**: Store authentication tokens securely
* **Cached Data**: Store frequently accessed data to reduce API calls
* **File Storage**: Store user-uploaded files or generated content
***
[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/features-and-specs/plugin-types/persistent-storage-kv.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)
# Manifest
Source: https://docs.dify.ai/en/develop-plugin/features-and-specs/plugin-types/plugin-info-by-manifest
The Manifest is a YAML-compliant file that defines the most basic information of a **plugin**, including but not limited to the plugin name, author, included tools, models, etc. For the overall architecture of the plugin, please refer to [Basic Concepts of Plugin Development](/en/develop-plugin/getting-started/getting-started-dify-plugin) and [Developer Cheatsheet](/en/develop-plugin/dev-guides-and-walkthroughs/cheatsheet).
If the format of this file is incorrect, the parsing and packaging process of the plugin will fail.
### Code Example
Below is a simple example of a Manifest file. The meaning and function of each data item will be explained later.
For reference code of other plugins, please refer to the [GitHub code repository](https://github.com/langgenius/dify-official-plugins/blob/main/tools/google/manifest.yaml).
```yaml theme={null}
version: 0.0.1
type: "plugin"
author: "Yeuoly"
name: "neko"
label:
  en_US: "Neko"
created_at: "2024-07-12T08:03:44.658609186Z"
icon: "icon.svg"
resource:
  memory: 1048576
  permission:
    tool:
      enabled: true
    model:
      enabled: true
      llm: true
    endpoint:
      enabled: true
    app:
      enabled: true
    storage:
      enabled: true
      size: 1048576
plugins:
  endpoints:
    - "provider/neko.yaml"
meta:
  version: 0.0.1
  arch:
    - "amd64"
    - "arm64"
  runner:
    language: "python"
    version: "3.12"
    entrypoint: "main"
privacy: "./privacy.md"
```
### Structure
* `version` (string): The version of the plugin.
* `type` (string): Plugin type; currently only `plugin` is supported (`bundle` will be supported in the future).
* `author` (string): Author, defined as the organization name in the Marketplace.
* `label` (object): Multilingual name.
* `created_at` (string): Creation time; the Marketplace requires it to be no later than the current time.
* `icon` (string): Icon path.
* `resource` (object): Resources to apply for.
  * `memory` (int64): Maximum memory usage, mainly related to AWS Lambda resource application on SaaS, in bytes.
  * `permission` (object): Permission application.
    * `tool` (object): Permission for reverse invocation of tools.
      * `enabled` (bool): Whether to enable tool permissions.
    * `model` (object): Permission for reverse invocation of models.
      * `enabled` (bool): Whether to enable model permissions.
      * `llm` (bool): Whether to enable large language model permissions.
      * `text_embedding` (bool): Whether to enable text embedding model permissions.
      * `rerank` (bool): Whether to enable rerank model permissions.
      * `tts` (bool): Whether to enable text-to-speech model permissions.
      * `speech2text` (bool): Whether to enable speech-to-text model permissions.
      * `moderation` (bool): Whether to enable content moderation model permissions.
    * `node` (object): Permission for reverse invocation of nodes.
      * `enabled` (bool): Whether to enable node permissions.
    * `endpoint` (object): Permission to register `endpoint`.
      * `enabled` (bool): Whether to enable endpoint permissions.
    * `app` (object): Permission for reverse invocation of `app`.
      * `enabled` (bool): Whether to enable app permissions.
    * `storage` (object): Permission to apply for persistent storage.
      * `enabled` (bool): Whether to enable storage permissions.
      * `size` (int64): Maximum allowed persistent storage size, in bytes.
* `plugins` (object): Lists of `yaml` files for the specific capabilities extended by the plugin, given as absolute paths within the plugin package. For example, if you need to extend a model, define a file similar to `openai.yaml` and fill in its path here; the file at this path must actually exist, otherwise packaging will fail. Note the following constraints:
  * Extending both tools and models simultaneously is not allowed.
  * Extending both models and endpoints simultaneously is not allowed.
  * Having no extensions at all is not allowed.
  * Currently, only one provider is supported for each type of extension.
  * `tools` (list[string]): Plugin extension for [Tool](/en/develop-plugin/dev-guides-and-walkthroughs/tool-plugin) providers.
  * `models` (list[string]): Plugin extension for [Model](/en/develop-plugin/features-and-specs/plugin-types/model-designing-rules) providers.
  * `endpoints` (list[string]): Plugin extension for [Endpoints](/en/develop-plugin/dev-guides-and-walkthroughs/develop-a-slack-bot-plugin) providers.
  * `agent_strategies` (list[string]): Plugin extension for [Agent Strategy](/en/develop-plugin/features-and-specs/advanced-development/reverse-invocation) providers.
* `meta` (object): Metadata for the plugin.
  * `version` (string): `manifest` format version, initial version `0.0.1`.
  * `arch` (list[string]): Supported architectures; currently only `amd64` and `arm64` are supported.
  * `runner` (object): Runtime configuration.
    * `language` (string): Programming language; currently only Python is supported.
    * `version` (string): Language version; currently only `3.12` is supported.
    * `entrypoint` (string): Program entry point; should be `main` under Python.
* `privacy` (string): The relative path or URL of the plugin's privacy policy file, e.g., `"./privacy.md"` or `"https://your-web/privacy"`. If you plan to list the plugin on the Dify Marketplace, **this field is required** to provide clear user data usage and privacy statements. For detailed filling guidelines, please refer to [Plugin Privacy Data Protection Guidelines](/en/develop-plugin/publishing/standards/privacy-protection-guidelines).
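The extension constraints on the `plugins` section can be expressed as a small validator. The `check_extensions` helper below is an illustrative sketch, not part of the SDK or CLI:

```python
def check_extensions(plugins: dict) -> list[str]:
    """Return the constraint violations for a manifest's `plugins` section."""
    kinds = [k for k in ("tools", "models", "endpoints", "agent_strategies") if plugins.get(k)]
    errors = []
    if not kinds:
        errors.append("having no extensions is not allowed")
    if "tools" in kinds and "models" in kinds:
        errors.append("extending both tools and models is not allowed")
    if "models" in kinds and "endpoints" in kinds:
        errors.append("extending both models and endpoints is not allowed")
    for k in kinds:
        if len(plugins[k]) > 1:
            errors.append(f"only one provider is supported for {k}")
    return errors


# The example manifest above extends only one endpoint provider, which is valid
ok = check_extensions({"endpoints": ["provider/neko.yaml"]})
# Extending both tools and models violates the constraints
bad = check_extensions({"tools": ["a.yaml"], "models": ["b.yaml"]})
```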
***
[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/features-and-specs/plugin-types/plugin-info-by-manifest.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)
# Plugin Logging
Source: https://docs.dify.ai/en/develop-plugin/features-and-specs/plugin-types/plugin-logging
As a plugin developer, you may want to print arbitrary strings to logs during plugin processing for development or debugging purposes.
For this purpose, the plugin SDK implements a handler for Python's standard `logging` library. By using this, you can output any string to both the **standard output during remote debugging** and the **plugin daemon container logs** (community edition only).
## Sample
Import `plugin_logger_handler` and add it to your logger as a handler. Below is a sample code for a tool plugin.
```python theme={null}
from collections.abc import Generator
from typing import Any

from dify_plugin import Tool
from dify_plugin.entities.tool import ToolInvokeMessage

# Import logging and the custom handler
import logging
from dify_plugin.config.logger_format import plugin_logger_handler

# Set up logging with the custom handler
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
logger.addHandler(plugin_logger_handler)


class LoggerDemoTool(Tool):
    def _invoke(self, tool_parameters: dict[str, Any]) -> Generator[ToolInvokeMessage]:
        # Log messages with different severity levels
        logger.info("This is an INFO log message.")
        logger.warning("This is a WARNING log message.")
        logger.error("This is an ERROR log message.")

        yield self.create_text_message("Hello, Dify!")
```
***
[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/features-and-specs/plugin-types/plugin-logging.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)
# Plugin Debugging
Source: https://docs.dify.ai/en/develop-plugin/features-and-specs/plugin-types/remote-debug-a-plugin
This document introduces how to use Dify's remote debugging feature to test plugins. It provides detailed instructions on obtaining debugging information, configuring environment variable files, starting plugin remote debugging, and verifying plugin installation status. Through this method, developers can test plugins in the Dify environment in real-time while developing locally.
After completing plugin development, the next step is to test whether the plugin can function properly. Dify provides a convenient remote debugging method to help you quickly verify plugin functionality in a test environment.
Go to the ["Plugin Management"](https://cloud.dify.ai/plugins) page to obtain the remote server address and debugging Key.

Return to the plugin project, copy the `.env.example` file and rename it to `.env`, then fill in the remote server address and debugging Key information you obtained.
`.env` file:
```bash theme={null}
INSTALL_METHOD=remote
REMOTE_INSTALL_URL=debug.dify.ai:5003
REMOTE_INSTALL_KEY=********-****-****-****-************
```
Run the `python -m main` command to start the plugin. On the plugins page, you can see that the plugin has been installed in the Workspace, and other members of the team can also access the plugin.

***
[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/features-and-specs/plugin-types/remote-debug-a-plugin.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)
# Tool Return
Source: https://docs.dify.ai/en/develop-plugin/features-and-specs/plugin-types/tool
This document provides a detailed introduction to the data structure and usage of Tools in Dify plugins. It covers how to return different types of messages (image URLs, links, text, files, JSON), how to create variable and streaming variable messages, and how to define tool output variable schemas for reference in workflows.
## Overview
Before diving into the detailed interface documentation, make sure you have a general understanding of the tool integration process for Dify plugins.
Return different types of messages such as text, links, images, and JSON
Create and manipulate variables for workflow integration
Define custom output variables for workflow references
## Data Structure
### Message Return
Dify supports various message types such as `text`, `links`, `images`, `file BLOBs`, and `JSON`. These messages can be returned through specialized interfaces.
By default, a tool's output in a workflow includes three fixed variables: `files`, `text`, and `json`. The methods below help you populate these variables with appropriate content.
While you can use methods like `create_image_message` to return an image, tools also support custom output variables, making it more convenient to reference specific data in a workflow.
### Message Types
```python Image URL theme={null}
def create_image_message(self, image: str) -> ToolInvokeMessage:
    """
    Return an image URL message

    Dify will automatically download the image from the provided URL
    and display it to the user.

    Args:
        image: URL to an image file

    Returns:
        ToolInvokeMessage: Message object for the tool response
    """
    pass
```
```python Link theme={null}
def create_link_message(self, link: str) -> ToolInvokeMessage:
    """
    Return a clickable link message

    Args:
        link: URL to be displayed as a clickable link

    Returns:
        ToolInvokeMessage: Message object for the tool response
    """
    pass
```
```python Text theme={null}
def create_text_message(self, text: str) -> ToolInvokeMessage:
    """
    Return a text message

    Args:
        text: Text content to be displayed

    Returns:
        ToolInvokeMessage: Message object for the tool response
    """
    pass
```
```python File theme={null}
def create_blob_message(self, blob: bytes, meta: dict = None) -> ToolInvokeMessage:
    """
    Return a file blob message

    For returning raw file data such as images, audio, video,
    or documents (PPT, Word, Excel, etc.)

    Args:
        blob: Raw file data in bytes
        meta: File metadata dictionary. Include 'mime_type' to specify
            the file type, otherwise 'application/octet-stream' will be used

    Returns:
        ToolInvokeMessage: Message object for the tool response
    """
    pass
```
```python JSON theme={null}
def create_json_message(self, json: dict) -> ToolInvokeMessage:
    """
    Return a formatted JSON message

    Useful for data transmission between workflow nodes.
    In agent mode, most LLMs can read and understand JSON data.

    Args:
        json: Python dictionary to be serialized as JSON

    Returns:
        ToolInvokeMessage: Message object for the tool response
    """
    pass
```
* `image`: URL to an image that will be downloaded and displayed
* `link`: URL to be displayed as a clickable link
* `text`: Text content to be displayed
* `blob`: Raw file data in bytes format
* `meta`: File metadata, including:
  * `mime_type`: The MIME type of the file (e.g., "image/png")
  * Other metadata relevant to the file
* `json`: Python dictionary to be serialized as JSON
When working with file blobs, always specify the `mime_type` in the `meta` dictionary to ensure proper handling of the file. For example: `{"mime_type": "image/png"}`.
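When the file type can be inferred from a filename, Python's standard `mimetypes` module can fill in the `meta` dictionary. The `blob_meta` helper below is a hypothetical convenience, not part of the SDK:

```python
import mimetypes


def blob_meta(filename: str) -> dict:
    """Build a meta dict for create_blob_message, guessing the MIME type
    from the filename and falling back to a generic binary type."""
    mime, _ = mimetypes.guess_type(filename)
    return {"mime_type": mime or "application/octet-stream"}
```

A call would then look like `self.create_blob_message(data, meta=blob_meta("chart.png"))`.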
### Variables
```python Standard Variable theme={null}
from typing import Any


def create_variable_message(self, variable_name: str, variable_value: Any) -> ToolInvokeMessage:
    """
    Create a named variable for workflow integration

    For non-streaming output variables. If multiple instances with the
    same name are created, the latest one overrides previous values.

    Args:
        variable_name: Name of the variable to create
        variable_value: Value of the variable (any Python data type)

    Returns:
        ToolInvokeMessage: Message object for the tool response
    """
    pass
```
```python Streaming Variable theme={null}
def create_stream_variable_message(
    self, variable_name: str, variable_value: str
) -> ToolInvokeMessage:
    """
    Create a streaming variable with a typewriter effect

    When referenced in an answer node in a chatflow application,
    the text will be output with a typewriter effect.

    Args:
        variable_name: Name of the variable to create
        variable_value: String value to stream (only strings supported)

    Returns:
        ToolInvokeMessage: Message object for the tool response
    """
    pass
```
* `variable_name`: Name of the variable to be created or updated
* `variable_value`: Value to assign to the variable:
  * For standard variables: any Python data type
  * For streaming variables: string data only
The streaming variable method (`create_stream_variable_message`) currently only supports string data. Complex data types cannot be streamed with the typewriter effect.
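One workaround for structured data is to serialize it to a string first and stream the string in pieces. The chunking sketch below is illustrative; in a real tool each chunk would be passed to a `create_stream_variable_message` call:

```python
import json


def to_stream_chunks(value, chunk_size: int = 16) -> list[str]:
    """Serialize a JSON-compatible value to a string and split it into
    chunks suitable for successive streaming-variable messages."""
    text = json.dumps(value)
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


chunks = to_stream_chunks({"name": "Bob", "interests": ["coding", "hiking"]})
# Each chunk would be yielded via create_stream_variable_message(...)
```

The receiver then reassembles the chunks and parses the JSON once the stream completes.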
## Custom Output Variables
To reference a tool's output variables in a workflow application, you need to define which variables might be output. This is done using the [JSON Schema](https://json-schema.org/) format in your tool's manifest.
### Defining Output Schema
```yaml Tool Manifest with Output Schema theme={null}
identity:
  author: example_author
  name: example_tool
  label:
    en_US: Example Tool
    zh_Hans: 示例工具
    ja_JP: ツール例
    pt_BR: Ferramenta de exemplo
description:
  human:
    en_US: A simple tool that returns a name
    zh_Hans: 返回名称的简单工具
    ja_JP: 名前を返す簡単なツール
    pt_BR: Uma ferramenta simples que retorna um nome
  llm: A simple tool that returns a name variable
output_schema:
  type: object
  properties:
    name:
      type: string
      description: "The name returned by the tool"
    age:
      type: integer
      description: "The age returned by the tool"
    profile:
      type: object
      properties:
        interests:
          type: array
          items:
            type: string
        location:
          type: string
```
* `output_schema`: The root object defining your tool's output schema
* `type`: Must be `object` for tool output schemas
* `properties`: Dictionary of all possible output variables
* Each property: Definition for an output variable, including its type and description
Even with an output schema defined, you still need to actually return a variable using `create_variable_message()` in your implementation code. Otherwise, the workflow will receive `None` for that variable.
### Example Implementation
```python Basic Variable Example theme={null}
def run(self, inputs):
    # Process inputs and generate a name
    generated_name = "Alice"

    # Return the name as a variable that matches the output_schema
    return self.create_variable_message("name", generated_name)
```
```python Complex Structure Example theme={null}
def run(self, inputs):
    # Generate complex structured data
    user_data = {
        "name": "Bob",
        "age": 30,
        "profile": {
            "interests": ["coding", "reading", "hiking"],
            "location": "San Francisco"
        }
    }

    # Return individual variables
    self.create_variable_message("name", user_data["name"])
    self.create_variable_message("age", user_data["age"])
    self.create_variable_message("profile", user_data["profile"])

    # Also return a text message for display
    return self.create_text_message(f"User {user_data['name']} processed successfully")
```
For complex workflows, you can define multiple output variables and return them all. This gives workflow designers more flexibility when using your tool.
## Examples
### Complete Tool Implementation
```python Weather Forecast Tool theme={null}
import requests
from typing import Any


class WeatherForecastTool:
    def run(self, inputs: dict) -> Any:
        # Get location from inputs
        location = inputs.get("location", "London")

        try:
            # Call weather API (example only)
            weather_data = self._get_weather_data(location)

            # Create variables for workflow use
            self.create_variable_message("temperature", weather_data["temperature"])
            self.create_variable_message("conditions", weather_data["conditions"])
            self.create_variable_message("forecast", weather_data["forecast"])

            # Create a JSON message for data transmission
            self.create_json_message(weather_data)

            # Create an image message for the weather map
            self.create_image_message(weather_data["map_url"])

            # Return a formatted text response
            return self.create_text_message(
                f"Weather in {location}: {weather_data['temperature']}°C, {weather_data['conditions']}. "
                f"Forecast: {weather_data['forecast']}"
            )
        except Exception as e:
            # Handle errors gracefully
            return self.create_text_message(f"Error retrieving weather data: {str(e)}")

    def _get_weather_data(self, location: str) -> dict:
        # Mock implementation - in a real tool, this would call a weather API
        return {
            "location": location,
            "temperature": 22,
            "conditions": "Partly Cloudy",
            "forecast": "Sunny with occasional showers tomorrow",
            "map_url": "https://example.com/weather-map.png"
        }
```
When designing tools, consider both the direct output (what the user sees) and the variable output (what other workflow nodes can use). This separation provides flexibility in how your tool is used.
***
[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/features-and-specs/plugin-types/tool.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)
# CLI
Source: https://docs.dify.ai/en/develop-plugin/getting-started/cli
Command Line Interface for Dify Plugin Development
Set up and package your Dify plugins using the Command Line Interface (CLI). The CLI provides a streamlined way to manage your plugin development workflow, from initialization to packaging.
This guide will instruct you on how to use the CLI for Dify plugin development.
## Prerequisites
Before you begin, ensure you have the following installed:
* Python version 3.12
* Dify CLI
* Homebrew (for Mac users)
## Create a Dify Plugin Project
```bash theme={null}
brew tap langgenius/dify
brew install dify
```
Get the latest Dify CLI from the [Dify GitHub releases page](https://github.com/langgenius/dify-plugin-daemon/releases)
```bash theme={null}
# Download dify-plugin-darwin-arm64
chmod +x dify-plugin-darwin-arm64
mv dify-plugin-darwin-arm64 dify
sudo mv dify /usr/local/bin/
```
Now you have successfully installed the Dify CLI. You can verify the installation by running:
```bash theme={null}
dify version
```
You can create a new Dify plugin project using the following command:
```bash theme={null}
dify plugin init
```
Fill in the required fields when prompted:
```bash theme={null}
Edit profile of the plugin
Plugin name (press Enter to next step): hello-world
Author (press Enter to next step): langgenius
Description (press Enter to next step): hello world example
Repository URL (Optional) (press Enter to next step): Repository URL (Optional)
Enable multilingual README: [✔] English is required by default
Languages to generate:
English: [✔] (required)
→ 简体中文 (Simplified Chinese): [✔]
日本語 (Japanese): [✘]
Português (Portuguese - Brazil): [✘]
Controls:
↑/↓ Navigate • Space/Tab Toggle selection • Enter Next step
```
Choose `python` and hit Enter to proceed with the Python plugin template.
```bash theme={null}
Select the type of plugin you want to create, and press `Enter` to continue
Before starting, here's some basic knowledge about Plugin types in Dify:
- Tool: Tool Providers like Google Search, Stable Diffusion, etc. Used to perform specific tasks.
- Model: Model Providers like OpenAI, Anthropic, etc. Use their models to enhance AI capabilities.
- Endpoint: Similar to Service API in Dify and Ingress in Kubernetes. Extend HTTP services as endpoints with custom logic.
- Agent Strategy: Implement your own agent strategies like Function Calling, ReAct, ToT, CoT, etc.
Based on the ability you want to extend, Plugins are divided into four types: Tool, Model, Extension, and Agent Strategy
- Tool: A tool provider that can also implement endpoints. For example, building a Discord Bot requires both sending and receiving messages.
- Model: Strictly for model providers, no other extensions allowed.
- Extension: For simple HTTP services that extend functionality.
- Agent Strategy: Implement custom agent logic with a focused approach.
We've provided templates to help you get started. Choose one of the options below:
-> tool
agent-strategy
llm
text-embedding
rerank
tts
speech2text
moderation
extension
```
Enter the minimal Dify version requirement, or leave it blank to use the default:
```bash theme={null}
Edit minimal Dify version requirement, leave it blank by default
Minimal Dify version (press Enter to next step):
```
Now you are ready to go! The CLI will create a new directory with the plugin name you provided, and set up the basic structure for your plugin.
```bash theme={null}
cd hello-world
```
## Run the Plugin
Make sure you are in the hello-world directory
```bash theme={null}
cp .env.example .env
```
Edit the `.env` file to set your plugin's environment variables, such as API keys or other configurations. You can find these values in the Dify dashboard: log in to your Dify environment, click the **Plugins** icon in the top-right corner, then click the debug (bug-shaped) icon. In the pop-up window, copy the API Key and host address.
```bash theme={null}
INSTALL_METHOD=remote
REMOTE_INSTALL_HOST=debug-plugin.dify.dev
REMOTE_INSTALL_PORT=5003
REMOTE_INSTALL_KEY=********-****-****-****-************
```
Now you can run your plugin locally using the following command:
```bash theme={null}
pip install -r requirements.txt
python -m main
```
***
[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/getting-started/cli.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)
# Dify Plugin
Source: https://docs.dify.ai/en/develop-plugin/getting-started/getting-started-dify-plugin
Dify plugins are modular components that enhance AI applications with additional capabilities. They allow you to integrate external services, custom functions, and specialized tools into your Dify-built AI applications.
Through plugins, your AI applications can:
* Connect to external APIs
* Process different types of data
* Perform specialized calculations
* Execute real-world actions
## Types of Plugins
Package and manage AI models as plugins
Learn more
Build specialized capabilities for Agents and workflows
Learn more
Create custom reasoning strategies for autonomous Agents
Learn more
Implement integration with external services through HTTP Webhooks
Learn more
## Additional Resources
Tools and techniques for efficient plugin development
Package and share your plugins with the Dify community
Technical specifications and documentation
Communicate with other developers and contribute to the ecosystem
***
[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/getting-started/getting-started-dify-plugin.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)
# Frequently Asked Questions
Source: https://docs.dify.ai/en/develop-plugin/publishing/faq/faq
This document answers common questions about Dify plugin development and installation, including how to resolve plugin upload failures (by modifying the author field) and how to handle verification exceptions during plugin installation (by setting the FORCE_VERIFYING_SIGNATURE environment variable).
## What should I do if the plugin upload fails during installation?
**Error Details**: An error message `PluginDaemonBadRequestError: plugin_unique_identifier is not valid` appears.
**Solution**: Modify the `author` field in the `manifest.yaml` file under the plugin project and the `.yaml` file under the `/provider` path to your GitHub ID.
Rerun the plugin packaging command and install the new plugin package.
## How should I handle exceptions encountered during plugin installation?
**Problem Description**: An exception message is encountered during plugin installation: `plugin verification has been enabled, and the plugin you want to install has a bad signature`. How should this be handled?
**Solution**: Add the field `FORCE_VERIFYING_SIGNATURE=false` to the end of the `/docker/.env` configuration file. Then, run the following commands to restart the Dify service:
```bash theme={null}
cd docker
docker compose down
docker compose up -d
```
After adding this field, the Dify platform will allow the installation of all plugins not listed (reviewed) on the Dify Marketplace, which may pose security risks.
It is recommended to install plugins in a testing/sandbox environment first and confirm their safety before installing them in a production environment.
***
[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/publishing/faq/faq.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)
# Automatically Publish Plugins via PR
Source: https://docs.dify.ai/en/develop-plugin/publishing/marketplace-listing/plugin-auto-publish-pr
This document describes how to automate the release process of Dify plugins using GitHub Actions, including configuration steps, parameter descriptions, and usage methods, helping plugin developers streamline the release process without manual intervention.
### Background
Updating plugins that others are actively using can be tedious. Traditionally, you would need to modify code, bump versions, push changes, create branches, package files, and submit PRs manually - a repetitive process that slows down development.
Thus, we have created **Plugin Auto-PR**, a GitHub Actions workflow that automates the entire process. Now you can package, push, and create PRs with a single action, focusing on building great plugins.
### Concepts
#### GitHub Actions
GitHub Actions automates your development tasks in GitHub.
**How it works**: When triggered (e.g., by a code push), it runs your workflow in a cloud-based virtual machine, handling everything from build to deployment automatically.

**Limits**:
* Public repositories: Unlimited
* Private repositories: 2000 minutes per month
#### Plugin Auto-PR
**How it works**:
1. Workflow triggers when you push code to the main branch of your plugin source repository
2. Workflow reads plugin information from the `manifest.yaml` file
3. Automatically packages the plugin as a `.difypkg` file
4. Pushes the packaged file to your forked `dify-plugins` repository
5. Creates a new branch and commits changes
6. Automatically creates a PR to merge into the upstream repository
### Prerequisites
#### Repository
* You already have your own plugin source code repository (e.g., `your-name/plugin-source`)
* You already have your own forked plugin repository (e.g., `your-name/dify-plugins`)
* Your forked repository already has the plugin directory structure:
```
dify-plugins/
└── your-author-name
    └── plugin-name
```
#### Permission
This workflow requires appropriate permissions to function:
* You need to create a GitHub Personal Access Token (PAT) with sufficient permissions
* The PAT must have permission to push code to your forked repository
* The PAT must have permission to create PRs to the upstream repository
### Parameters and Configuration
#### Setup Requirements
To get started with auto-publishing, you will need two key components:
**manifest.yaml file**: This file drives the automation process:
* `name`: Your plugin’s name (affects package and branch names)
* `version`: Semantic version number (increment with each release)
* `author`: Your GitHub username (determines repository paths)
**PLUGIN\_ACTION Secret**: You need to add this secret to your plugin source repository:
* Value: Must be a Personal Access Token (PAT) with sufficient permissions
* Permission: Ability to push branches to your forked repository and create PRs to the upstream repository
#### Automatically-Generated Parameters
Once set up, the workflow automatically handles these parameters:
* GitHub username: Read from the `author` field in `manifest.yaml`
* Author folder name: Consistent with the `author` field
* Plugin name: Read from the `name` field in `manifest.yaml`
* Branch name: `bump-{plugin-name}-plugin-{version}`
* Package filename: `{plugin-name}-{version}.difypkg`
* PR title and content: Automatically generated based on plugin name and version
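You can reproduce how these parameters are derived locally with the same `grep`/`cut` logic the workflow uses. The manifest values below are placeholders, not real plugin metadata:

```shell
# Placeholder manifest, for illustration only
cat > manifest.yaml <<'EOF'
version: 0.0.1
author: your-github-username
name: your-plugin-name
EOF

# Same extraction logic the workflow runs against manifest.yaml
PLUGIN_NAME=$(grep "^name:" manifest.yaml | cut -d' ' -f2)
VERSION=$(grep "^version:" manifest.yaml | cut -d' ' -f2)
AUTHOR=$(grep "^author:" manifest.yaml | cut -d' ' -f2)

# Derived parameters
BRANCH_NAME="bump-${PLUGIN_NAME}-plugin-${VERSION}"
PACKAGE_NAME="${PLUGIN_NAME}-${VERSION}.difypkg"

echo "$BRANCH_NAME"   # bump-your-plugin-name-plugin-0.0.1
echo "$PACKAGE_NAME"  # your-plugin-name-0.0.1.difypkg
```

Because everything is read from `manifest.yaml`, keeping that file accurate is all that is needed for the branch and package names to come out right.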
### Step-by-Step Guide
Ensure you have forked the official `dify-plugins` repository and have your own plugin source repository.
Navigate to your plugin source repository, click **Settings > Secrets and variables > Actions > New repository secret**, and create a GitHub Secret:
* Name: `PLUGIN_ACTION`
* Value: GitHub Personal Access Token (PAT) with write permissions to the target repository (`your-name/dify-plugins`)

Create a `.github/workflows/` directory in your repository, create a file named `plugin-publish.yml` in this directory, and copy the following content into the file:
```yaml theme={null}
# .github/workflows/plugin-publish.yml
name: Auto Create PR on Main Push

on:
  push:
    branches: [ main ] # Trigger on push to main

jobs:
  create_pr:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Print working directory # Kept for debugging
        run: |
          pwd
          ls -la

      - name: Download CLI tool
        run: |
          # Create bin directory in runner temp
          mkdir -p $RUNNER_TEMP/bin
          cd $RUNNER_TEMP/bin
          # Download CLI tool
          wget https://github.com/langgenius/dify-plugin-daemon/releases/latest/download/dify-plugin-linux-amd64
          chmod +x dify-plugin-linux-amd64
          # Show download location and file
          echo "CLI tool location:"
          pwd
          ls -la dify-plugin-linux-amd64

      - name: Get basic info from manifest
        id: get_basic_info
        run: |
          PLUGIN_NAME=$(grep "^name:" manifest.yaml | cut -d' ' -f2)
          echo "Plugin name: $PLUGIN_NAME"
          echo "plugin_name=$PLUGIN_NAME" >> $GITHUB_OUTPUT

          VERSION=$(grep "^version:" manifest.yaml | cut -d' ' -f2)
          echo "Plugin version: $VERSION"
          echo "version=$VERSION" >> $GITHUB_OUTPUT

          # If the author field is not your GitHub username, change AUTHOR here
          AUTHOR=$(grep "^author:" manifest.yaml | cut -d' ' -f2)
          echo "Plugin author: $AUTHOR"
          echo "author=$AUTHOR" >> $GITHUB_OUTPUT

      - name: Package Plugin
        id: package
        run: |
          # Use the downloaded CLI tool to package
          cd $GITHUB_WORKSPACE
          # Use variables for the package name
          PACKAGE_NAME="${{ steps.get_basic_info.outputs.plugin_name }}-${{ steps.get_basic_info.outputs.version }}.difypkg"
          # Use CLI from runner temp
          $RUNNER_TEMP/bin/dify-plugin-linux-amd64 plugin package . -o "$PACKAGE_NAME"
          # Show packaging result
          echo "Package result:"
          ls -la "$PACKAGE_NAME"
          echo "package_name=$PACKAGE_NAME" >> $GITHUB_OUTPUT
          # Show full file path and directory structure (kept for debugging)
          echo "Full file path:"
          pwd
          echo "Directory structure:"
          tree || ls -R

      - name: Checkout target repo
        uses: actions/checkout@v3
        with:
          # Use the author variable for the repository owner
          repository: ${{ steps.get_basic_info.outputs.author }}/dify-plugins
          path: dify-plugins
          token: ${{ secrets.PLUGIN_ACTION }}
          fetch-depth: 1 # Fetch only the last commit to speed up checkout
          persist-credentials: true # Persist credentials for subsequent git operations

      - name: Prepare and push branch
        run: |
          # Debug info (kept)
          echo "Debug: Current directory $(pwd)"
          # Use variable for package name
          PACKAGE_NAME="${{ steps.get_basic_info.outputs.plugin_name }}-${{ steps.get_basic_info.outputs.version }}.difypkg"
          echo "Debug: Package name: $PACKAGE_NAME"
          ls -la
          # Move the packaged file to the target directory
          mkdir -p dify-plugins/${{ steps.get_basic_info.outputs.author }}/${{ steps.get_basic_info.outputs.plugin_name }}
          mv "$PACKAGE_NAME" dify-plugins/${{ steps.get_basic_info.outputs.author }}/${{ steps.get_basic_info.outputs.plugin_name }}/
          # Enter the target repository directory
          cd dify-plugins
          # Configure git
          git config user.name "GitHub Actions"
          git config user.email "actions@github.com"
          # Ensure we are on the latest main branch
          git fetch origin main
          git checkout main
          git pull origin main
          # Create and switch to a new branch using the naming convention
          BRANCH_NAME="bump-${{ steps.get_basic_info.outputs.plugin_name }}-plugin-${{ steps.get_basic_info.outputs.version }}"
          git checkout -b "$BRANCH_NAME"
          # Add and commit changes
          git add .
          git status # for debugging
          # Use variables in the commit message
          git commit -m "bump ${{ steps.get_basic_info.outputs.plugin_name }} plugin to version ${{ steps.get_basic_info.outputs.version }}"
          # Push to remote (force in case the branch exists from an earlier failed run)
          git push -u origin "$BRANCH_NAME" --force
          # Confirm the branch has been pushed and wait for sync (the GitHub API may need a moment)
          git branch -a
          echo "Waiting for branch to sync..."
          sleep 10

      - name: Create PR via GitHub API
        env:
          GH_TOKEN: ${{ secrets.PLUGIN_ACTION }} # Use the provided token for authentication
        run: |
          gh pr create \
            --repo langgenius/dify-plugins \
            --head "${{ steps.get_basic_info.outputs.author }}:bump-${{ steps.get_basic_info.outputs.plugin_name }}-plugin-${{ steps.get_basic_info.outputs.version }}" \
            --base main \
            --title "bump ${{ steps.get_basic_info.outputs.plugin_name }} plugin to version ${{ steps.get_basic_info.outputs.version }}" \
            --body "bump ${{ steps.get_basic_info.outputs.plugin_name }} plugin package to version ${{ steps.get_basic_info.outputs.version }}

          Changes:
          - Updated plugin package file" || echo "PR already exists or creation skipped." # Handle cases where the PR already exists

      - name: Print environment info # Kept for debugging
        run: |
          echo "GITHUB_WORKSPACE: $GITHUB_WORKSPACE"
          echo "Current directory contents:"
          ls -R
```
In your plugin project's `manifest.yaml`, ensure the following fields are correctly set:
```yaml theme={null}
version: 0.0.x # Version number
author: your-github-username # GitHub username/Author name
name: your-plugin-name # Plugin name
```
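Before pushing, you can sanity-check locally that all three fields are present. This is an optional check, not part of the workflow; the manifest below uses placeholder values:

```shell
# Placeholder manifest, for illustration only
cat > manifest.yaml <<'EOF'
version: 0.0.1
author: your-github-username
name: your-plugin-name
EOF

# Collect any required fields that are missing
MISSING=""
for field in name version author; do
  grep -q "^${field}:" manifest.yaml || MISSING="${MISSING} ${field}"
done

if [ -z "$MISSING" ]; then
  echo "manifest ok"
else
  echo "missing fields:${MISSING}"
fi
```

A missing field would cause the workflow's `grep`-based extraction to produce empty branch and package names, so it is worth catching before the push.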
### Usage Guide
#### First-time Setup
When setting up the auto-publish workflow for the first time, complete these steps:
1. Ensure you have forked the official `dify-plugins` repository
2. Ensure your plugin source repository structure is correct
3. Set up the `PLUGIN_ACTION Secret` in your plugin source repository
4. Create the workflow file `.github/workflows/plugin-publish.yml`
5. Ensure the `name` and `author` fields in the `manifest.yaml` file are correctly configured
#### Subsequent Updates
To publish new versions after setup:
1. Modify the code
2. Update the `version` field in `manifest.yaml`
3. Push all changes to the main branch
4. Wait for GitHub Actions to complete packaging, branch creation, and PR submission
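Step 2 is the only file change a release strictly requires. A minimal sketch of the version bump, using a placeholder manifest and a hypothetical new version number:

```shell
# Placeholder manifest at the current version, for illustration only
cat > manifest.yaml <<'EOF'
version: 0.0.1
author: your-github-username
name: your-plugin-name
EOF

NEW_VERSION="0.0.2"
# Rewrite the version field in place (-i.bak keeps a backup; works with GNU and BSD sed)
sed -i.bak "s/^version: .*/version: ${NEW_VERSION}/" manifest.yaml
grep "^version:" manifest.yaml

# Committing and pushing to main then triggers the workflow, e.g.:
#   git add manifest.yaml && git commit -m "bump to ${NEW_VERSION}" && git push origin main
```

Remember that the version drives both the branch name and the package filename, so reusing an old version number would collide with a previous release's PR branch.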
### Outcome
When you push code to the main branch of your plugin source repository, GitHub Actions will automatically execute the publishing process:
* Package the plugin as `{plugin-name}-{version}.difypkg`
* Push the packaged file to your forked repository
* Create a PR to merge the change into the upstream `langgenius/dify-plugins` repository

### Example Repository
See [example repository](https://github.com/Yevanchen/exa-in-dify) to understand configuration and best practices.
***
[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/publishing/marketplace-listing/plugin-auto-publish-pr.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)
# Package as Local File and Share
Source: https://docs.dify.ai/en/develop-plugin/publishing/marketplace-listing/release-by-file
This document provides detailed steps on how to package a Dify plugin project as a local file and share it with others. It covers the preparation required before packaging, how to run the packaging command with the Dify plugin development tool, how to install the generated `.difypkg` file, and how to share plugin files with other users.
After completing plugin development, you can package the plugin project as a local file and share it with others. After obtaining the plugin file, it can be installed into a Dify Workspace. If you haven't developed a plugin yet, you can refer to the [Plugin Development: Hello World Guide](/en/develop-plugin/dev-guides-and-walkthroughs/tool-plugin).
* **Features**:
* Not dependent on online platforms, **quick and flexible** way to share plugins.
* Suitable for **private plugins** or **internal testing**.
* **Publishing Process**:
* Package the plugin project as a local file.
* Upload the file on the Dify plugins page to install the plugin.
This article will introduce how to package a plugin project as a local file and how to install a plugin using a local file.
### Prerequisites
* **Dify Plugin Development Tool**, for detailed instructions, please refer to [Initializing Development Tools](/en/develop-plugin/getting-started/cli).
After configuration, enter the `dify version` command in the terminal to check if it outputs version information to confirm that the necessary development tools have been installed.
### Packaging the Plugin
> Before packaging the plugin, please ensure that the `author` field in the plugin's `manifest.yaml` file and in the `.yaml` file under the `/provider` path matches your GitHub ID. For detailed information about the manifest file, please refer to [General Specifications](/en/develop-plugin/features-and-specs/plugin-types/general-specifications).
After completing the plugin project development, make sure you have completed the [remote debugging test](/en/develop-plugin/features-and-specs/plugin-types/remote-debug-a-plugin). Navigate to the directory above your plugin project and run the following plugin packaging command:
```bash theme={null}
dify plugin package ./your_plugin_project
```
After running the command, a file with the `.difypkg` extension will be generated in the current path.

### Installing the Plugin
Visit the Dify plugin management page, click **Install Plugin** in the upper right corner → **Via Local File** to install, or drag and drop the plugin file to a blank area of the page to install the plugin.

### Publishing the Plugin
You can share the plugin file with others or upload it to the internet for others to download. If you want to share your plugin more widely, you can consider:
1. [Publish to Individual GitHub Repository](/en/develop-plugin/publishing/marketplace-listing/release-to-individual-github-repo) - Share the plugin through GitHub
2. [Publish to Dify Marketplace](/en/develop-plugin/publishing/marketplace-listing/release-to-dify-marketplace) - Publish the plugin on the official marketplace
## Related Resources
* [Publishing Plugins](/en/develop-plugin/publishing/marketplace-listing/release-overview) - Learn about various publishing methods
* [Initializing Development Tools](/en/develop-plugin/getting-started/cli) - Configure plugin development environment
* [Remote Debugging Plugins](/en/develop-plugin/features-and-specs/plugin-types/remote-debug-a-plugin) - Learn plugin debugging methods
* [General Specifications](/en/develop-plugin/features-and-specs/plugin-types/general-specifications) - Define plugin metadata
* [Plugin Development: Hello World Guide](/en/develop-plugin/dev-guides-and-walkthroughs/tool-plugin) - Develop a plugin from scratch
***
[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/publishing/marketplace-listing/release-by-file.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)
# Publishing Plugins
Source: https://docs.dify.ai/en/develop-plugin/publishing/marketplace-listing/release-overview
This document introduces three ways to publish Dify plugins - official Marketplace, open-source GitHub repository, and local plugin file package. It details the characteristics, publishing process, and applicable scenarios for each method, and provides specific publishing recommendations to meet the needs of different developers.
### Publishing Methods
To meet the publishing needs of different developers, Dify provides the following three plugin publishing methods. Before publishing, please ensure that you have completed the development and testing of your plugin, and have read [Basic Concepts of Plugin Development](/en/develop-plugin/getting-started/getting-started-dify-plugin) and [Plugin Developer Guidelines](/en/develop-plugin/publishing/standards/contributor-covenant-code-of-conduct).
#### **1. Marketplace**
**Introduction**: The official plugin marketplace provided by Dify, where users can browse, search, and install various plugins with one click.
**Features**:
* Plugins are reviewed before going online, ensuring they are **safe and reliable**.
* Can be directly installed in personal or team **Workspaces**.
**Publishing Process**:
* Submit the plugin project to the **Dify Marketplace** [code repository](https://github.com/langgenius/dify-plugins).
* After official review, the plugin will be publicly available in the marketplace for other users to install and use.
For detailed instructions, please refer to:
[Publish to Dify Marketplace](/en/develop-plugin/publishing/marketplace-listing/release-to-dify-marketplace)
#### 2. **GitHub Repository**
**Introduction**: Open-source or host your plugin on **GitHub** for others to view, download, and install.
**Features**:
* Convenient for **version management** and **open-source sharing**.
* Users can install directly via the plugin link, without platform review.
**Publishing Process**:
* Push the plugin code to a GitHub repository.
* Share the repository link; users can then integrate the plugin into their **Dify Workspace** through the link.
For detailed instructions, please refer to:
[Publish to Individual GitHub Repository](/en/develop-plugin/publishing/marketplace-listing/release-to-individual-github-repo)
#### 3. Plugin File Package (Local Installation)
**Introduction**: Package the plugin as a local file (such as `.difypkg` format) and share it for others to install.
**Features**:
* Not dependent on online platforms, **quick and flexible** way to share plugins.
* Suitable for **private plugins** or **internal testing**.
**Publishing Process**:
* Package the plugin project as a local file.
* Click **Upload Plugin** on the Dify plugins page and select the local file to install the plugin.
You can package your plugin project as a local file and share it with others. After uploading the file on the plugins page, you can install the plugin into your Dify Workspace.
For detailed instructions, please refer to:
[Package as Local File and Share](/en/develop-plugin/publishing/marketplace-listing/release-by-file)
### **Publishing Recommendations**
* **Want to promote your plugin** → **Recommended to use Marketplace**, ensuring plugin quality through official review and increasing exposure.
* **Open-source sharing project** → **Recommended to use GitHub**, convenient for version management and community collaboration.
* **Quick distribution or internal testing** → **Recommended to use plugin files**, simple and efficient way to install and share.
## Related Resources
* [Basic Concepts of Plugin Development](/en/develop-plugin/getting-started/getting-started-dify-plugin) - Comprehensive understanding of Dify plugin development
* [Plugin Developer Guidelines](/en/develop-plugin/publishing/standards/contributor-covenant-code-of-conduct) - Understand the standards for plugin submission
* [Plugin Privacy Data Protection Guide](/en/develop-plugin/publishing/standards/privacy-protection-guidelines) - Understand the requirements for writing privacy policies
* [General Specifications](/en/develop-plugin/features-and-specs/plugin-types/general-specifications) - Understand the configuration of plugin manifest files
***
[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/publishing/marketplace-listing/release-overview.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)
# Publish to Dify Marketplace
Source: https://docs.dify.ai/en/develop-plugin/publishing/marketplace-listing/release-to-dify-marketplace
This guide provides detailed instructions on the complete process of publishing plugins to the Dify Marketplace, including submitting PRs, the review process, post-release maintenance, and other key steps and considerations.
Dify Marketplace welcomes plugin submissions from partners and community developers. Your contributions will further enrich the possibilities of Dify plugins. This guide provides a clear publishing process and best practice recommendations to help your plugin get published smoothly and bring value to the community. If you haven't developed a plugin yet, you can refer to the [Plugin Development: Hello World Guide](/en/develop-plugin/dev-guides-and-walkthroughs/tool-plugin).
Please follow these steps to submit your plugin Pull Request (PR) to the [GitHub repository](https://github.com/langgenius/dify-plugins) for review. After approval, your plugin will be officially launched on the Dify Marketplace.
### Plugin Publishing Process
Publishing a plugin to the Dify Marketplace involves the following steps:
1. Complete plugin development and testing according to the [Plugin Developer Guidelines](/en/develop-plugin/publishing/standards/contributor-covenant-code-of-conduct);
2. Write a privacy policy for the plugin according to the [Plugin Privacy Data Protection Guide](/en/develop-plugin/publishing/standards/privacy-protection-guidelines), and include the file path or URL of the privacy policy in the plugin [General Specifications](/en/develop-plugin/features-and-specs/plugin-types/general-specifications);
3. Complete plugin packaging;
4. Fork the [Dify Plugins](https://github.com/langgenius/dify-plugins) code repository;
5. Create your personal or organization folder in the repository and upload the packaged `.difypkg` file to your folder;
6. Submit a Pull Request (PR) following the PR Template format in GitHub and wait for review;
7. After the review is approved, the plugin code will be merged into the Main branch, and the plugin will be automatically launched on the [Dify Marketplace](https://marketplace.dify.ai/).
Plugin submission, review, and publication flow chart:

> **Note**: The Contributor Agreement in the above diagram refers to the [Plugin Developer Guidelines](/en/develop-plugin/publishing/standards/contributor-covenant-code-of-conduct).
***
### During Pull Request (PR) Review
Actively respond to reviewer questions and feedback:
* PRs with reviewer comments left unaddressed for **14 days** will be marked as stale (can be reopened).
* PRs with reviewer comments left unaddressed for **30 days** will be closed (cannot be reopened; a new PR must be created).
***
### **After Pull Request (PR) Approval**
**1. Ongoing Maintenance**
* Address user-reported issues and feature requests.
* Migrate plugins when significant API changes occur:
* Dify will publish change notifications and migration instructions in advance.
* Dify engineers can provide migration support.
**2. Restrictions during the Marketplace Public Beta Testing Phase**
* Avoid introducing breaking changes to existing plugins.
***
### Review Process
**1. Review Order**
* PRs are processed on a **first-come, first-reviewed** basis. Review will begin within 1 week. If there is a delay, reviewers will notify the PR author via comments.
**2. Review Focus**
* Check if the plugin name, description, and setup instructions are clear and instructive.
* Check if the plugin's [General Specifications](/en/develop-plugin/features-and-specs/plugin-types/general-specifications) meets format standards and includes valid author contact information.
**3. Plugin Functionality and Relevance**
* Test plugins according to the [Plugin Development Guide](/en/develop-plugin/getting-started/getting-started-dify-plugin).
* Ensure the plugin has a reasonable purpose in the Dify ecosystem.
[Dify.AI](https://dify.ai/) reserves the right to accept or reject plugin submissions.
***
### Frequently Asked Questions
1. **How to determine if a plugin is unique?**
Example: A Google search plugin that only adds multilingual versions should be considered an optimization of an existing plugin. However, if the plugin implements significant functional improvements (such as optimized batch processing or error handling), it can be submitted as a new plugin.
2. **What if my PR is marked as stale or closed?**
PRs marked as stale can be reopened after addressing feedback. Closed PRs (over 30 days) require creating a new PR.
3. **Can I update plugins during the Beta testing phase?**
Yes, but breaking changes should be avoided.
## Related Resources
* [Publishing Plugins](/en/develop-plugin/publishing/marketplace-listing/release-overview) - Learn about various publishing methods
* [Plugin Developer Guidelines](/en/develop-plugin/publishing/standards/contributor-covenant-code-of-conduct) - Plugin submission standards
* [Plugin Privacy Data Protection Guide](/en/develop-plugin/publishing/standards/privacy-protection-guidelines) - Privacy policy writing requirements
* [Package as Local File and Share](/en/develop-plugin/publishing/marketplace-listing/release-by-file) - Plugin packaging methods
* [General Specifications](/en/develop-plugin/features-and-specs/plugin-types/general-specifications) - Define plugin metadata
***
[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/publishing/marketplace-listing/release-to-dify-marketplace.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)
# Publish to Individual GitHub Repository
Source: https://docs.dify.ai/en/develop-plugin/publishing/marketplace-listing/release-to-individual-github-repo
This document provides detailed instructions on how to publish Dify plugins to a personal GitHub repository, including preparation work, initializing a local plugin repository, connecting to a remote repository, uploading plugin files, packaging plugin code, and the complete process of installing plugins via GitHub. This method allows developers to fully manage their own plugin code and updates.
### Publish Methods
To accommodate the various publishing needs of developers, Dify provides three plugin publishing methods:
#### **1. Marketplace**
**Introduction**: The official Dify plugin marketplace allows users to browse, search, and install a variety of plugins with just one click.
**Features**:
* Plugins become available after passing a review, ensuring they are **trustworthy** and **high-quality**.
* Can be installed directly into an individual or team **Workspace**.
**Publication Process**:
* Submit the plugin project to the **Dify Marketplace** [code repository](https://github.com/langgenius/dify-plugins).
* After an official review, the plugin will be publicly released in the marketplace for other users to install and use.
#### 2. **GitHub Repository**
**Introduction**: Open-sourcing or hosting the plugin on **GitHub** makes it easy for others to view, download, and install.
**Features**:
* Convenient for **version management** and **open-source sharing**.
* Users can install the plugin directly via a link, bypassing platform review.
**Publication Process**:
* Push the plugin code to a GitHub repository.
* Share the repository link; users can then integrate the plugin into their **Dify Workspace** through the link.
#### 3. Plugin File (Local Installation)
**Introduction**: Package the plugin as a local file (e.g., `.difypkg` format) and share it for others to install.
**Features**:
* Does not depend on an online platform, enabling **quick and flexible** sharing of plugins.
* Suitable for **private plugins** or **internal testing**.
**Publication Process**:
* Package the plugin project as a local file.
* Click **Upload Plugin** on the Dify plugin page and select the local file to install the plugin.
You can package the plugin project as a local file and share it with others. After uploading the file on the plugin page, the plugin can be installed into the Dify Workspace.
### **Publication Recommendations**
* **Looking to promote a plugin** → **Recommended to use the Marketplace**, ensuring plugin quality through official review and increasing exposure.
* **Open-source sharing project** → **Recommended to use GitHub**, convenient for version management and community collaboration.
* **Quick distribution or internal testing** → **Recommended to use plugin file**, allowing for straightforward and efficient installation and sharing.
***
[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/publishing/marketplace-listing/release-to-individual-github-repo.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)
# Plugin Development Guidelines
Source: https://docs.dify.ai/en/develop-plugin/publishing/standards/contributor-covenant-code-of-conduct
To ensure the quality of all plugins in the Dify Marketplace and provide a consistent, high-quality experience for Dify Marketplace users, you must adhere to all requirements outlined in these Plugin Development Guidelines when submitting a plugin for review. By submitting a plugin, **you acknowledge that you have read, understood, and agree to comply with all the following terms**. Following these guidelines will help your plugin get reviewed faster.
## 1. Plugin Value and Uniqueness
* **Focus on Generative AI**: Ensure the core functionality of your plugin focuses on the Generative AI domain and provides distinct and significant value to Dify users. This may include:
* Integrating new models, tools, or services.
* Offering unique data sources or processing capabilities to enhance AI applications.
* Simplifying or automating AI-related workflows on the Dify platform.
* Providing innovative supporting functions for AI application development.
* **Avoid Functional Duplication**: Submitted plugins **must not** duplicate or closely resemble existing plugins in the Marketplace. Each published plugin must be unique and independent to ensure the best experience for users.
* **Meaningful Updates**: For plugin updates, **ensure new features or services** are introduced that are not already available in the current version.
* **PR Submission Advice**: We **recommend** including a brief explanation in your pull request to clarify why the new plugin is needed.
## 2. Plugin Functionality Checklist
* **Unique Name**: Ensure your plugin name is unique by searching the plugin directory beforehand.
* **Brand Alignment**: Plugin name must match the plugin's branding.
* **Functionality Verification**: Test the plugin thoroughly before submission and confirm it works as intended. For details, refer to [Remote Debug a Plugin](/en/develop-plugin/features-and-specs/plugin-types/remote-debug-a-plugin). The plugin must be production-ready.
* **README Requirements**:
* Setup instructions and usage guidance.
* Any necessary code, APIs, credentials, or information required to connect the plugin.
* Do **not** include irrelevant content or links.
* Do **not** use exaggerated, promotional language, or unverifiable claims.
* Do **not** include advertisements of any kind (including self-promotion or display ads).
* Do **not** include misleading, offensive, or defamatory content.
* Do **not** expose real user names or data in screenshots.
* Do **not** link to 404 pages or pages that return errors.
* Avoid excessive spelling or punctuation errors.
* **User Data Use**: Use collected data only for connecting services and improving plugin functionality.
* **Error Clarity**: Mark required fields and provide clear error messages that help users understand issues.
* **Authentication Settings**: If authentication is needed, include the full configuration steps—do not omit them.
* **Privacy Policy**: Prepare a privacy policy document or online URL in accordance with the [Privacy Guidelines](/en/develop-plugin/publishing/standards/privacy-protection-guidelines).
* **Performance**: Plugins must operate efficiently and must not degrade the performance of Dify or user systems.
* **Credential Security**: API keys or other credentials must be stored and transmitted securely. Avoid hardcoding them or exposing them to unauthorized parties.
## 3. Language Requirements
* **English as Primary Language**: Since Dify Marketplace serves a global audience, **English must be the primary language** for all user-facing text (plugin names, descriptions, field names, labels, help text, error messages).
* **Multilingual Support Encouraged**: Supporting multiple languages is encouraged.
## 4. Prohibited and Restricted Plugins
* **Prohibited: Misleading or Malicious Behavior**\
Plugins must not mislead users. Do not create plugins for spamming, phishing, or sending unsolicited messages. Attempts to deceive the review process, steal user data, or impersonate users will result in removal and possible bans from future submissions.
* **Prohibited: Offensive Content**\
Plugins must not contain violent content, hate speech, discrimination, or disrespect toward global cultures, religions, or users.
* **Prohibited: Financial Transactions**\
Plugins must not facilitate any financial transactions, asset transfers, or payment processing. This includes token or asset ownership transfers in blockchain/crypto applications.
* **Restricted: Plugins with Frequent Defects**\
Thoroughly test your plugin to avoid critical bugs. Repeated submissions with defects may lead to delays or penalties.
* **Restricted: Unnecessary Plugin Splitting**\
Do not create multiple plugins for features that share the same API and authentication unless each is clearly a standalone product or service. Prefer integrating features into one high-level plugin.
* **Restricted: Duplicate Submissions**\
Avoid repeatedly submitting essentially identical plugins. Doing so may delay reviews or result in rejection.
## 5. Plugin Monetization
* **Free Plugins Only**: Dify Marketplace currently supports **free** plugins only.
* **Future Policy**: Policies for monetization and pricing models will be announced in the future.
## 6. Trademarks and Intellectual Property
* **Authorization Required**: Ensure you have rights to use any logos or trademarks submitted.
* **Verification Rights**: We may ask you to prove you have permission if third-party logos are used—especially if they clearly belong to known brands.
* **Violation Consequences**: Dify reserves the right to ask for changes or remove the plugin if unauthorized use is found. Complaints from rights holders may also result in removal.
* **Dify Logos Forbidden**: You may not use Dify's own logos.
* **Image Standards**: Do not submit low-quality, distorted, or poorly cropped images. The review team may request replacements.
* **Icon Restrictions**: Icons must not contain misleading, offensive, or malicious visuals.
## 7. Plugin Updates and Version Management
* **Responsible Updates**: Clearly communicate breaking changes in descriptions or via external channels (e.g., GitHub Release Notes).
* **Encouraged Maintenance**: Update regularly to fix bugs, match Dify platform changes, or respond to updates from third-party services—especially for security.
* **Deprecation Notice**: If discontinuing a plugin, notify users in advance (e.g., timeline and plan in the plugin description) and suggest alternatives if available.
## 8. Plugin Maintenance and Support
* **Owner Responsibility**: Plugin owners are responsible for technical support and maintenance.
* **Support Channel Required**: Owners must provide **at least one** support channel (GitHub repo or email) for feedback during review and publication.
* **Handling Unmaintained Plugins**: If a plugin lacks maintenance and the owner does not respond after reasonable notification, Dify may:
  * Add "Maintenance Lacking" or "Potential Risk" tags.
  * Restrict new installs.
  * Eventually unpublish the plugin.
## 9. Privacy and Data Compliance
* **Disclosure Required**: You **must declare** whether your plugin collects any user personal data. See the [Privacy Guidelines](/en/develop-plugin/publishing/standards/privacy-protection-guidelines).
* **Simple Data Listing**: If collecting data, briefly list types (e.g., username, email, device ID, location). No need for exhaustive detail.
* **Privacy Policy**: You **must provide** a privacy policy link stating:
  * What information is collected.
  * How it is used.
  * What is shared with third parties (with their privacy links if applicable).
* **Review Focus**:
  * **Formal Check**: Make sure you declare what data you collect.
  * **Sensitive Data Scan**: Plugins collecting sensitive data (e.g., health, finance, children's info) require additional scrutiny.
  * **Malicious Behavior Scan**: Plugins must not collect or upload user data without consent.
***
## 10. Review and Discretion
* **Right to Reject/Remove**: If requirements, privacy standards, or related policies are unmet, Dify may reject or remove your plugin. This includes abuse of the review process or data misuse.
* **Timely Review**: The Dify review team will aim to review plugins within a reasonable time, depending on volume and complexity.
* **Communication**: We may reach out through the support channel you provide—please ensure it is active.
## Related Resources
* [Basic Concepts of Plugin Development](/en/develop-plugin/getting-started/getting-started-dify-plugin) - Learn the basics of plugin development
* [Publishing Plugins](/en/develop-plugin/publishing/marketplace-listing/release-overview) - Overview of the plugin publishing process
* [Plugin Privacy Data Protection Guide](/en/develop-plugin/publishing/standards/privacy-protection-guidelines) - Guide for writing privacy policies
* [Publish to Dify Marketplace](/en/develop-plugin/publishing/marketplace-listing/release-to-dify-marketplace) - Publish plugins on the official marketplace
* [Remote Debugging Plugins](/en/develop-plugin/features-and-specs/plugin-types/remote-debug-a-plugin) - Plugin debugging guide
***
[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/publishing/standards/contributor-covenant-code-of-conduct.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)
# Plugin Privacy Policy Guidelines
Source: https://docs.dify.ai/en/develop-plugin/publishing/standards/privacy-protection-guidelines
This document describes guidelines for developers on how to write a privacy policy when submitting plugins to the Dify Marketplace. It includes how to identify and list the types of personal data collected (direct identification information, indirect identification information, combined information), how to fill out the plugin privacy policy, how to include the privacy policy statement in the Manifest file, and answers to related common questions.
When submitting your plugin to the Dify Marketplace, you are required to be transparent about how you handle user data. The following guidelines explain how to address privacy-related questions and user data processing for your plugin.
Center your privacy policy around the following points:
**Does your plugin collect and use any user personal data?** If it does, please list the types of data collected.
> “Personal data” refers to any information that can identify a specific individual—either on its own or when combined with other data—such as information used to locate, contact, or otherwise target a unique person.
#### 1. List the types of data collected
**Type A:** **Direct Identifiers**
* Name (e.g., full name, first name, last name)
* Email address
* Phone number
* Home address or other physical address
* Government-issued identification numbers (e.g., Social Security number, passport number, driver's license number)
**Type B**: **Indirect Identifiers**
* Device identifiers (e.g., IMEI, MAC address, device ID)
* IP address
* Location data (e.g., GPS coordinates, city, region)
* Online identifiers (e.g., cookies, advertising IDs)
* Usernames
* Profile pictures
* Biometric data (e.g., fingerprints, facial recognition data)
* Web browsing history
* Purchase history
* Health information
* Financial information
**Type C: Data that can be combined with other data to identify an individual:**
* Age
* Gender
* Occupation
* Interests
Even if your plugin itself does not collect any personal information, third-party services used within your plugin may still collect or process data. As the plugin developer, you are responsible for disclosing all data collection activities associated with your plugin, including those performed by third-party services. Read the privacy policy of each third-party service and verify that any data collected through your plugin is declared in your submission.
For example, if the plugin you are developing involves Slack services, make sure to reference [Slack’s privacy policy](https://slack.com/trust/privacy/privacy-policy) in your plugin’s privacy policy statement and clearly disclose the data collection practices.
#### 2. Submit the most up-to-date privacy policy of your plugin
**The Privacy Policy** should contain:
* What types of data are collected.
* How the collected data is used.
* Whether any data is shared with third parties, and if so, identify those third parties and provide links to their privacy policies.
* If you are unsure how to write a privacy policy, refer to the privacy policies of plugins published by the Dify team.
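As a sketch, a minimal policy covering the points above might look like the following; the plugin name, data types, and third-party service are all hypothetical placeholders:

```markdown theme={null}
# Privacy Policy for Example Plugin

## Data Collected
- Email address (used for account authentication)
- Workspace ID (used to route API requests)

## How Data Is Used
Collected data is sent to the Example API solely to execute the tool
calls you invoke; the plugin itself does not store it.

## Third-Party Sharing
Requests are processed by Example Service.
See their privacy policy: https://example.com/privacy
```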
#### 3. Include a privacy policy statement in the plugin Manifest file
For detailed instructions on filling out specific fields, please refer to the [General Specifications](/en/develop-plugin/features-and-specs/plugin-types/general-specifications) documentation.
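For illustration only, the statement is declared as a field in `manifest.yaml`; treat the field names and values below as placeholders to verify against the General Specifications:

```yaml theme={null}
# manifest.yaml (excerpt) — placeholder values
name: example_plugin
version: 0.0.1
# Path to a bundled policy file, or an absolute URL to a hosted policy
privacy: PRIVACY.md
```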
**FAQ**
1. **What does "collect and use" mean regarding user personal data? Are there any common examples of how personal data is collected and used in any Plugin?**
"Collect and use" user data generally refers to the collection, transmission, use, or sharing of user data. Common examples of how products may handle personal or sensitive user data include:
* Using forms that gather any kind of personally identifiable information.
* Implementing login features, even when using third-party authentication services.
* Collecting information about input or resources that may contain personally identifiable information.
* Implementing analytics to track user behavior, interactions, and usage patterns.
* Storing communication data like messages, chat logs, or email addresses.
* Accessing user profiles or data from connected social media accounts.
* Collecting health and fitness data such as activity levels, heart rate, or medical information.
* Storing search queries or tracking browsing behavior.
* Processing financial information including bank details, credit scores, or transaction history.
## Related Resources
* [Publishing Overview](/en/develop-plugin/publishing/marketplace-listing/release-overview) - Understand the plugin publishing process
* [Publish to Dify Marketplace](/en/develop-plugin/publishing/marketplace-listing/release-to-dify-marketplace) - Learn how to submit plugins to the official marketplace
* [Plugin Developer Guidelines](/en/develop-plugin/publishing/standards/contributor-covenant-code-of-conduct) - Understand plugin submission guidelines
* [General Specifications](/en/develop-plugin/features-and-specs/plugin-types/general-specifications) - Plugin metadata configuration
***
[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/publishing/standards/privacy-protection-guidelines.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)
# Signing Plugins for Third-Party Signature Verification
Source: https://docs.dify.ai/en/develop-plugin/publishing/standards/third-party-signature-verification
This document describes how to enable and use the third-party signature verification feature in the Dify Community Edition, including key pair generation, plugin signing and verification, and environment configuration steps, enabling administrators to securely install plugins not available on the Dify Marketplace.
***
This feature is available only in the Dify Community Edition. Third-party signature verification is not currently supported on Dify Cloud Edition.
Third-party signature verification allows Dify administrators to safely approve the installation of plugins not listed on the Dify Marketplace without disabling signature verification entirely. For example, this supports the following scenarios:
* Dify administrators can add a signature to a plugin sent by the developer once it has been approved.
* Plugin developers can add a signature to their plugin and publish it along with the public key for Dify administrators who cannot disable signature verification.
Both Dify administrators and plugin developers can add a signature to a plugin using a pre-generated key pair. Additionally, administrators can configure Dify to enforce signature verification using specific public keys during plugin installation.
## Generating a Key Pair for Signing and Verification
Generate a new key pair for adding and verifying the plugin's signature with the following command:
```bash theme={null}
dify signature generate -f your_key_pair
```
After running this command, two files will be generated in the current directory:
* **Private Key**: `your_key_pair.private.pem`
* **Public Key**: `your_key_pair.public.pem`
The private key is used to sign the plugin, and the public key is used to verify the plugin's signature.
Keep the private key secure. If it is compromised, an attacker could add a valid signature to any plugin, which would compromise Dify's security.
## Adding a Signature to the Plugin and Verifying It
Add a signature to your plugin by running the following command. Note that you must specify the **plugin file to sign** and the **private key**:
```bash theme={null}
dify signature sign your_plugin_project.difypkg -p your_key_pair.private.pem
```
After executing the command, a new plugin file will be generated in the same directory with `signed` added to its original filename: `your_plugin_project.signed.difypkg`
You can verify that the plugin has been correctly signed using this command. Here, you need to specify the **signed plugin file** and the **public key**:
```bash theme={null}
dify signature verify your_plugin_project.signed.difypkg -p your_key_pair.public.pem
```
If you omit the public key argument, verification will use the Dify Marketplace public key. In that case, signature verification will fail for any plugin file not downloaded from the Dify Marketplace.
## Enabling Third-Party Signature Verification
Dify administrators can enforce signature verification using pre-approved public keys before installing a plugin.
### Placing the Public Key
Place the **public key** corresponding to the private key used for signing in a location that the plugin daemon can access.
For example, create a `public_keys` directory under `docker/volumes/plugin_daemon` and copy the public key file there:
```bash theme={null}
mkdir docker/volumes/plugin_daemon/public_keys
cp your_key_pair.public.pem docker/volumes/plugin_daemon/public_keys
```
### Environment Variable Configuration
In the `plugin_daemon` container, configure the following environment variables:
* `THIRD_PARTY_SIGNATURE_VERIFICATION_ENABLED`
  * Enables third-party signature verification.
  * Set this to `true` to enable the feature.
* `THIRD_PARTY_SIGNATURE_VERIFICATION_PUBLIC_KEYS`
  * Specifies the path(s) to the public key file(s) used for signature verification.
  * You can list multiple public key files separated by commas.
Below is an example of a Docker Compose override file (`docker-compose.override.yaml`) configuring these variables:
```yaml theme={null}
services:
  plugin_daemon:
    environment:
      FORCE_VERIFYING_SIGNATURE: true
      THIRD_PARTY_SIGNATURE_VERIFICATION_ENABLED: true
      THIRD_PARTY_SIGNATURE_VERIFICATION_PUBLIC_KEYS: /app/storage/public_keys/your_key_pair.public.pem
```
Note that `docker/volumes/plugin_daemon` is mounted to `/app/storage` in the `plugin_daemon` container. Ensure that the path specified in `THIRD_PARTY_SIGNATURE_VERIFICATION_PUBLIC_KEYS` corresponds to the path inside the container.
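Because `THIRD_PARTY_SIGNATURE_VERIFICATION_PUBLIC_KEYS` accepts a comma-separated list, you can trust signatures from more than one key pair at once. A sketch with two keys (the filenames are placeholders):

```yaml theme={null}
services:
  plugin_daemon:
    environment:
      FORCE_VERIFYING_SIGNATURE: true
      THIRD_PARTY_SIGNATURE_VERIFICATION_ENABLED: true
      # Both public keys must be reachable inside the container
      THIRD_PARTY_SIGNATURE_VERIFICATION_PUBLIC_KEYS: /app/storage/public_keys/key_a.public.pem,/app/storage/public_keys/key_b.public.pem
```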
To apply these changes, restart the Dify service:
```bash theme={null}
cd docker
docker compose down
docker compose up -d
```
After restarting the service, the third-party signature verification feature will be enabled in the current Community Edition environment.
***
[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/publishing/standards/third-party-signature-verification.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)