LLM - Dify Docs

The LLM node invokes language models to process text, images, and documents. It sends prompts to your configured models and captures their responses, supporting structured outputs, context management, and multimodal inputs.

Configure at least one model provider in Integrations > Model Provider before using LLM nodes.

Model Selection and Parameters

Choose from any model provider you’ve configured. Different models excel at different tasks - GPT-4 and Claude 3.5 handle complex reasoning well but cost more, while GPT-3.5 Turbo balances capability with affordability. For local deployment, use Ollama, LocalAI, or Xinference.

Model Selection and Parameter Configuration

Model parameters control response generation. Temperature ranges from 0 (deterministic) to 1 (creative). Top P limits word choices by probability. Frequency Penalty reduces repetition. Presence Penalty encourages new topics. You can also use presets: Precise, Balanced, or Creative.

Prompt Configuration

Your interface adapts based on model type. Chat models use message roles (System for behavior, User for input, Assistant for examples), while completion models use simple text continuation. Reference workflow variables in prompts using double curly braces: {{variable_name}}. Variables are replaced with actual values before reaching the model.

System: You are a technical documentation expert.
User: {{user_input}}

Context Variables

Context variables inject external knowledge while preserving source attribution. This enables RAG applications where LLMs answer questions using your specific documents.

Connect a Knowledge Retrieval node’s output to your LLM node’s context input, then reference it:

Answer using only this context:
{{knowledge_retrieval.result}}

Question: {{user_question}}

When using context variables from knowledge retrieval, Dify automatically tracks citations so users see information sources.

Structured Outputs

Force models to return specific data formats like JSON for programmatic use. Configure through three methods:

Visual Editor
JSON Schema
AI Generation

User-friendly interface for simple structures. Add fields with names and types, mark required fields, set descriptions. The editor generates JSON Schema automatically.

Write schemas directly for complex structures with nested objects, arrays, and validation rules.

{
  "type": "object",
  "properties": {
    "sentiment": {
      "type": "string",
      "enum": ["positive", "negative", "neutral"]
    }
  },
  "required": ["sentiment"]
}

Models with native JSON support handle structured outputs reliably. For others, Dify includes the schema in prompts, but results may vary.

Memory and File Processing

Enable Memory to maintain context across multiple LLM calls within a Chatflow conversation. When enabled, previous interactions will be included in subsequent prompts as formatted user - assistant outputs. You can customize what goes into the user prompts by editing the USER template. Memory is node-specific and doesn’t persist between different conversations. For File Processing, add file variables to prompts for multimodal models. GPT-4V handles images, Claude processes PDFs directly, while other models might need preprocessing.

Vision Configuration

When processing images, you can control the detail level:

High detail - Better accuracy for complex images but uses more tokens
Low detail - Faster processing with fewer tokens for simple images

The default variable selector for vision is userinput.files which automatically picks up files from the User Input node.

Jinja2 Template Support

LLM prompts support Jinja2 templating for advanced variable handling. When you use Jinja2 mode (edition_type: "jinja2"), you can:

{% for item in search_results %}
{{ loop.index }}. {{ item.title }}: {{ item.content }}
{% endfor %}

Jinja2 variables are processed separately from regular variable substitution, allowing for loops, conditionals, and complex data transformations within prompts.

Streaming Output

LLM nodes support streaming output by default. Each text chunk is yielded as a RunStreamChunkEvent, enabling real-time response display. File outputs (images, documents) are processed and saved automatically during streaming.

Separate Reasoning from Responses

Some reasoning models wrap their thinking in <think>...</think> tags inside their response. By default, those tags are included in the text output, so the reasoning flows downstream together with the answer. Turn on the Enable reasoning tag separation toggle to split them: the text output keeps only the answer, and the thinking moves to a separate reasoning_content output variable. While the toggle is off, reasoning_content stays empty. In API calls, this toggle appears as the reasoning_format parameter. When the toggle is on, reasoning_format is separated, and streaming API clients receive the reasoning as dedicated reasoning_chunk events, outside the answer stream. For event details, see Send Chat Message and Run Workflow.

This setting affects only models that wrap their reasoning in <think> tags.

Error Handling

Configure retry behavior for failed LLM calls. Set maximum retry attempts, intervals between retries, and backoff multipliers. Define fallback strategies like default values, error routing, or alternative models when retries aren’t sufficient.

​Model Selection and Parameters

​Prompt Configuration

​Context Variables

​Structured Outputs

​Memory and File Processing

​Vision Configuration

​Jinja2 Template Support

​Streaming Output

​Separate Reasoning from Responses

​Error Handling

Model Selection and Parameters

Prompt Configuration

Context Variables

Structured Outputs

Memory and File Processing

Vision Configuration

Jinja2 Template Support

Streaming Output

Separate Reasoning from Responses

Error Handling