Termbase
A
Agent
An autonomous AI system capable of making decisions and executing tasks based on environmental information. In the Dify platform, agents combine the comprehension capabilities of large language models with the ability to interact with external tools, automatically completing a series of operations ranging from simple to complex, such as searching for information, calling APIs, or generating content.
Agentic Workflow
A task orchestration method that allows AI systems to autonomously solve complex problems through multiple steps. For example, an agentic workflow can first understand a user’s question, then query a knowledge base, call computational tools, and finally integrate information to generate a complete answer, all without human intervention.
Automatic Speech Recognition (ASR)
Technology that converts human speech into text and serves as the foundation for voice interaction applications. This technology allows users to interact with AI systems by speaking rather than typing, and is widely used in scenarios such as voice assistants, meeting transcription, and accessibility services.
B
Backbone of Thought (BoT)
A structured thinking framework that provides the main structure for reasoning in large language models. It helps models maintain a clear thinking path when processing complex problems, similar to the outline of an academic paper or the skeleton of a decision tree.
C
Chunking
A processing technique that splits long text into smaller content blocks, enabling retrieval systems to find relevant information more precisely. A good chunking strategy considers both the semantic integrity of the content and the context window limitations of language models, thereby improving the quality of retrieval and generation.
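As a minimal sketch of the idea (not Dify's actual splitter), a fixed-size strategy with overlap can be written in a few lines:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks; the overlap preserves context
    across chunk boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

print(len(chunk_text("A long document. " * 200, chunk_size=300, overlap=30)))
```

Production splitters usually cut on sentence or paragraph boundaries rather than raw character offsets, precisely to preserve semantic integrity.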
Citation and Attribution
Features that allow AI systems to clearly indicate the sources of information, increasing the credibility and transparency of responses. When the system generates answers based on knowledge base content, it can automatically annotate the referenced document name, page number, or URL, enabling users to understand the origin of the information.
Chain of Thought (CoT)
A prompting technique that guides large language models to display their step-by-step thinking process. For example, when solving a math problem, the model first lists the known conditions, then works through the reasoning one step at a time, and finally reaches a conclusion, much as a person would.
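A CoT prompt is ordinary text; the technique lies entirely in asking for, or demonstrating, intermediate steps. A schematic example (the questions are invented for illustration):

```python
# One worked example plus "Let's think step by step" is often enough to
# elicit the same explicit reasoning for a new question.
prompt = (
    "Q: A store sells pens at 3 for $2. How much do 12 pens cost?\n"
    "A: Let's think step by step.\n"
    "1. 12 pens is 12 / 3 = 4 groups of 3 pens.\n"
    "2. Each group costs $2, so the total is 4 * 2 = $8.\n"
    "Therefore, 12 pens cost $8.\n\n"
    "Q: A train travels 60 km in 45 minutes. What is its speed in km/h?\n"
    "A: Let's think step by step.\n"
)
# Sending `prompt` to a completion endpoint should yield step-by-step
# reasoning (45 minutes is 0.75 h, so 60 / 0.75 = 80 km/h) before the answer.
```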
D
Domain-Specific Language (DSL)
A programming language or configuration format designed for a specific application domain. Dify DSL is an application engineering file standard based on YAML format, used to define various configurations of AI applications, including model parameters, prompt design, and workflow orchestration, allowing non-professional developers to build complex AI applications.
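The authoritative schema is defined by Dify itself; as a rough sketch of the idea only (the field names below are illustrative, not the official spec), a DSL file is YAML that any tool can parse:

```python
import yaml  # pip install pyyaml

# Hypothetical structure, for illustration only; not the official Dify DSL schema.
app_dsl = """
app:
  name: support-bot
  mode: chat
model:
  provider: openai
  name: gpt-4o
  parameters:
    temperature: 0.2
prompt: |
  You are a helpful support assistant.
"""

config = yaml.safe_load(app_dsl)
print(config["model"]["parameters"]["temperature"])  # 0.2
```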
E
Extract, Transform, Load (ETL)
A classic data processing workflow: extracting raw data, transforming it into a format suitable for analysis, and then loading it into the target system. In AI document processing, ETL may include extracting text from PDFs, cleaning formats, splitting content, calculating embedding vectors, and finally loading into a vector database, preparing for RAG systems.
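In outline, such a pipeline is three composable stages. This sketch injects `parse_pdf`, `embed`, and `vector_db` as placeholders for whichever parser, embedding model, and store are actually used:

```python
def etl(pdf_path: str, parse_pdf, embed, vector_db) -> int:
    # Extract: pull raw text out of the source document.
    raw_text = parse_pdf(pdf_path)
    # Transform: normalize whitespace, then split into fixed-size chunks.
    cleaned = " ".join(raw_text.split())
    chunks = [cleaned[i:i + 500] for i in range(0, len(cleaned), 500)]
    # Load: embed each chunk and write it to the vector database.
    for chunk in chunks:
        vector_db.insert(vector=embed(chunk), payload={"text": chunk})
    return len(chunks)  # number of chunks now ready for RAG retrieval
```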
F
Frequency Penalty
A text generation control parameter that increases output diversity by reducing the probability of generating frequently occurring vocabulary. The higher the value, the more the model tends to use diverse vocabulary and expressions; at a value of 0, the model will not specifically avoid reusing the same vocabulary.
Function Calling
The capability of large language models to recognize when to call specific functions and provide the required parameters. For example, when a user asks about the weather, the model can automatically call a weather API, construct the correct parameter format (city, date), and then generate a response based on the API’s returned results.
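In OpenAI-style APIs, the developer declares each callable function as a JSON schema, and the model decides when to emit a call with filled-in arguments. A sketch, where `get_weather` is a hypothetical tool the application would implement:

```python
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool implemented by the app
        "description": "Get the weather forecast for a city on a given date.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. Beijing"},
                "date": {"type": "string", "description": "ISO date, e.g. 2025-06-01"},
            },
            "required": ["city"],
        },
    },
}]
# Passed as `tools=tools` in a chat completion request, this schema lets the
# model reply with a structured call such as
#   {"name": "get_weather", "arguments": "{\"city\": \"Beijing\"}"}
# which the application executes before the model writes its final answer.
```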
G
General Chunking Pattern
A simple text splitting strategy that divides documents into mutually independent content blocks. This pattern is suitable for documents with clear structures and relatively independent paragraphs, such as product manuals or encyclopedia entries, where each chunk can be understood independently without heavily relying on context.
Graph of Thought (GoT)
A method of representing the thinking process as a network structure, capturing complex relationships between concepts. Unlike the linear Chain of Thought, the Graph of Thought can express branching, cyclical, and multi-path thinking patterns, suitable for dealing with complex problems that have multiple interrelated factors.
H
Hybrid Search
A search method that combines the advantages of keyword matching and semantic search to provide more comprehensive retrieval results. For example, when searching for “apple nutritional components,” hybrid search can find both documents containing the keywords “apple” and “nutrition,” as well as content discussing related semantic concepts like “fruit health value,” selecting the optimal results through weight adjustment or reranking.
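One common fusion strategy is a weighted sum of the two normalized scores; a toy sketch with invented scores:

```python
def hybrid_score(keyword_score: float, semantic_score: float, alpha: float = 0.5) -> float:
    """Blend a keyword score (e.g. BM25) with a vector-similarity score,
    both assumed normalized to [0, 1]; alpha weights the two signals."""
    return alpha * keyword_score + (1 - alpha) * semantic_score

# Candidates from both retrievers, as (keyword_score, semantic_score) pairs.
candidates = {
    "doc_apple_nutrition": (0.90, 0.80),  # matched by both retrievers
    "doc_fruit_health":    (0.10, 0.85),  # found only by semantic search
}
ranked = sorted(candidates, key=lambda d: hybrid_score(*candidates[d]), reverse=True)
print(ranked)  # ['doc_apple_nutrition', 'doc_fruit_health']
```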
I
Inverted Index
A core data structure of search engines that records which documents each word appears in. Unlike traditional indexes that find content from documents, inverted indexes find documents from vocabulary, greatly improving full-text retrieval speed. For example, the index entry for the term “artificial intelligence” would list all document IDs and positions containing this term.
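A minimal in-memory version makes the structure concrete:

```python
from collections import defaultdict

docs = {
    1: "artificial intelligence transforms search",
    2: "search engines rely on inverted indexes",
    3: "artificial flowers need no water",
}

# Map each term to the set of document IDs that contain it.
index: dict[str, set[int]] = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

print(sorted(index["artificial"]))  # [1, 3]
print(sorted(index["search"]))      # [1, 2]
```

Real engines also store term positions and frequencies per document to support phrase queries and ranking.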
K
Keyword Search
A search method based on exact matching that finds documents containing specific vocabulary. This method is computationally efficient and suitable for scenarios where users clearly know the terms they want to find, such as product models, proper nouns, or specific commands, but may miss content expressed using synonyms or related concepts.
Knowledge Base
A repository that stores and organizes information for AI applications, giving models a source of professional knowledge. In the Dify platform, knowledge bases can contain various documents (PDF, Word, web pages, etc.), which are processed for AI retrieval and used to generate accurate, well-founded answers, making them particularly suitable for building domain-expert applications.
Knowledge Retrieval
The process of finding information from a knowledge base that is most relevant to a user’s question, and is a key component of RAG systems. Effective knowledge retrieval not only finds relevant content but also controls the amount of information returned, avoiding irrelevant content that could interfere with the model, while providing sufficient background to ensure accurate and complete answers.
L
Large Language Model (LLM)
An AI model trained on massive amounts of text that can understand and generate human language. Modern LLMs (such as the GPT series, Claude, etc.) can write articles, answer questions, write code, and even conduct reasoning. They are the core engines of various AI applications, especially suitable for scenarios requiring language understanding and generation.
Local Model Inference
The process of running AI models on a user’s own device rather than relying on cloud services. This approach provides better privacy protection (data does not leave the local environment) and lower latency (no network transmission required), making it suitable for processing sensitive data or scenarios requiring offline work, though it is typically limited by the computational capacity of local devices.
M
Model-as-a-Service (MaaS)
A cloud service model where providers offer access to pre-trained models through APIs. Users don’t need to worry about training, deploying, or maintaining models; they simply call the API and pay for usage, significantly lowering the development threshold and infrastructure costs of AI applications. It’s suitable for quickly validating ideas or building prototypes.
Max_tokens
A parameter that caps the number of tokens the model can generate in a single response. One token corresponds to roughly 4 characters, or about 3/4 of an English word. Setting a reasonable maximum token count controls the length of the answer, avoiding overly verbose output while leaving room to express the necessary information completely. For example, a brief summary might be capped at 200 tokens, while a detailed report might require 2000 tokens.
Memory
The ability of AI systems to save and use historical interaction information, keeping multi-turn conversations coherent. Effective memory mechanisms enable AI to understand contextual references, remember user preferences, and track long-term goals, thereby providing personalized and continuous user experiences, avoiding repeatedly asking for information that has already been provided.
Metadata Filtering
A technique that utilizes document attribute information (such as title, author, date, classification tags) for content filtering. For example, users can restrict retrieval to technical documents within a specific date range, or only query reports from a specific department, thereby narrowing the scope before retrieval, improving search efficiency and result relevance.
Multimodal Model
A model capable of processing multiple types of input data, such as text, images, and audio. These models break the single-modality limitations of traditional AI: they can understand image content, analyze video scenes, and recognize emotion in speech, enabling more comprehensive information understanding in complex, cross-media application scenarios.
Multi-tool-call
The ability of a model to call multiple different tools in a single response. For example, when processing a request like “Compare tomorrow’s weather in Beijing and Shanghai and recommend suitable clothing,” the model can simultaneously call weather APIs for both cities, then provide reasonable suggestions based on the returned results, improving the efficiency of handling complex tasks.
Multi-path Retrieval
A strategy for obtaining information in parallel through multiple retrieval methods. For example, the system can simultaneously use keyword search, semantic matching, and knowledge graph queries, then merge and filter the results, improving the coverage and accuracy of information retrieval, particularly suitable for handling complex or ambiguous user queries.
P
Parent-Child Chunking
An advanced text splitting strategy that creates two levels of content blocks: parent blocks retain the complete context, while child blocks provide precise matching points. The system first uses child blocks to determine the location of relevant content, then retrieves the corresponding parent blocks to provide complete background, balancing retrieval precision and context completeness, suitable for processing complex documents such as research papers or technical manuals.
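A minimal sketch of the two-level lookup, with naive substring matching standing in for real vector similarity:

```python
# Child chunks are indexed for matching; retrieval returns the parent
# chunk they belong to, so the model sees the full surrounding context.
parents = {
    "p1": "Section 2, Methods: We collected 120 samples over six months, "
          "stored them at -20 C, and analyzed them in triplicate.",
}
children = [
    {"parent_id": "p1", "text": "We collected 120 samples over six months"},
    {"parent_id": "p1", "text": "analyzed them in triplicate"},
]

def retrieve(query: str) -> list[str]:
    hits = {c["parent_id"] for c in children if query.lower() in c["text"].lower()}
    return [parents[pid] for pid in hits]

print(retrieve("samples"))  # match on a child, return the whole parent block
```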
Presence Penalty
A parameter that discourages language models from repeating content by reducing the probability of generating tokens that have already appeared. Unlike frequency penalty, which scales with how often a token has occurred, presence penalty applies as soon as a token has appeared at all. The higher the parameter value, the less likely the model is to revisit previously generated content, helping to avoid the circular arguments or repetitive statements common in AI responses.
Predefined Model
A ready-made model trained and provided by an AI vendor that users can call directly without training it themselves. These closed-source models (such as GPT-4, Claude, etc.) are typically trained and optimized at large scale, making them powerful and easy to use, and well suited to rapid application development or teams lacking the resources to train their own models.
Prompt
Input text that guides AI models to generate specific responses. A well-designed prompt can significantly improve output quality by combining elements such as clear instructions, worked examples, and format requirements. For example, different prompts can guide the same model to produce academic articles, creative stories, or technical analysis, making prompt design one of the most critical factors affecting AI output.
Q
Q&A Mode
A special indexing strategy that automatically generates question-answer pairs for document content, implementing “question-to-question” matching. When a user asks a question, the system looks for semantically similar pre-generated questions and returns the corresponding answers. This mode is particularly suitable for FAQ content or structured knowledge points, providing a more precise question-answering experience.
R
Retrieval-Augmented Generation (RAG)
A technical architecture that combines external knowledge retrieval with language generation. The system first retrieves information related to the user's question from a knowledge base, then provides this information as context to the language model, which generates well-founded, accurate answers. RAG mitigates language models' limited-knowledge and hallucination problems, making it particularly suitable for applications requiring up-to-date or specialized knowledge.
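The retrieve-then-generate pattern fits in a few lines; here `retrieve` and `llm` stand in for a real vector search and model call:

```python
def rag_answer(question: str, retrieve, llm) -> str:
    # Step 1: fetch the chunks most relevant to the question.
    context_chunks = retrieve(question, top_k=3)
    # Step 2: ground the model's answer in the retrieved context.
    prompt = (
        "Answer the question using only the context below.\n\n"
        "Context:\n" + "\n---\n".join(context_chunks)
        + f"\n\nQuestion: {question}\nAnswer:"
    )
    return llm(prompt)
```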
Reasoning and Acting (ReAct)
An AI agent framework that enables models to alternate between thinking and executing operations. In the problem-solving process, the model first analyzes the current state, formulates a plan, then calls appropriate tools (such as search engines, calculators), and thinks about the next step based on the tool’s returned results, forming a thinking-action-thinking cycle until the problem is solved. It is suitable for complex tasks requiring multiple steps and external tools.
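A skeleton of the loop, assuming the model emits lines like `Action: search[Beijing weather]` and eventually `Final Answer: ...` (a common convention, though the exact format varies by framework):

```python
def parse_action(step: str) -> tuple[str, str]:
    # Expects a line such as: Action: search[Beijing weather]
    line = next(l for l in step.splitlines() if l.startswith("Action:"))
    name, _, arg = line.removeprefix("Action: ").partition("[")
    return name.strip(), arg.rstrip("]")

def react(question: str, llm, tools: dict, max_steps: int = 5) -> str:
    """Alternate model reasoning with tool execution until a final answer.
    `llm` and `tools` are placeholders for a real model and tool registry."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)            # model emits a Thought and an Action
        transcript += step
        if "Final Answer:" in step:       # model judged the problem solved
            return step.split("Final Answer:")[-1].strip()
        name, arg = parse_action(step)
        transcript += f"\nObservation: {tools[name](arg)}\n"  # feed result back
    return transcript                     # step budget exhausted
```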
ReRank
A technique that re-sorts preliminary retrieval results to improve the relevance of the final ranking. For example, the system might first retrieve a large pool of candidates with fast, coarse algorithms, then use a slower but more precise model to re-evaluate and sort them, placing the most relevant content first and balancing retrieval efficiency against result quality.
Rerank Model
A model specifically designed to evaluate the relevance of retrieval results to queries and reorder them. Unlike preliminary retrieval, these models typically use more complex algorithms, consider more semantic factors, and can more accurately determine how well content matches user intent. For example, models like Cohere Rerank and BGE Reranker can significantly improve the quality of search and recommendation system results.
Response_format
A specification of the structure type for model output, such as plain text, JSON, or HTML. Setting a specific response format can make AI output easier to process by programs or integrate into other systems. For example, requiring the model to answer in JSON format ensures the output has a consistent structure, facilitating direct parsing and display by frontend applications.
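With OpenAI-style chat APIs this is a single request parameter; a brief sketch (the model name is illustrative):

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    # Constrain the reply to valid JSON so a frontend can parse it directly.
    response_format={"type": "json_object"},
    messages=[{"role": "user",
               "content": "List three fruits as JSON with keys 'name' and 'color'."}],
)
print(resp.choices[0].message.content)
```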
Reverse Calling
A bidirectional mechanism for plugins to interact with platforms, allowing plugins to actively call platform functionality. In Dify, this means third-party plugins can not only be called by AI but can also use Dify’s core features in return, such as triggering workflows or calling other plugins, greatly enhancing the system’s extensibility and flexibility.
Retrieval Test
A feature for verifying the effectiveness of knowledge base retrieval, allowing developers to simulate user queries and evaluate what the system returns. This testing helps developers understand the limits of the system's retrieval capabilities and discover and fix issues such as missed matches, spurious matches, or poor relevance, making it an indispensable tool for optimizing RAG systems.
S
Score Threshold
A similarity threshold for filtering retrieval results, where only content with scores exceeding the set value is returned. Setting a reasonable threshold can avoid irrelevant information interfering with model generation, improving the accuracy of answers. For example, if the threshold is set to 0.8 (out of 1.0), only highly relevant content will be adopted, but it may result in incomplete information; lowering the threshold will include more content but may introduce noise.
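Applied to retrieval hits, the threshold is a one-line filter; the scores here are invented and assumed normalized to [0, 1]:

```python
hits = [("chunk A", 0.92), ("chunk B", 0.81), ("chunk C", 0.45)]
threshold = 0.8

# Keep only chunks whose similarity clears the threshold; "chunk C"
# is discarded as likely noise.
kept = [(text, score) for text, score in hits if score >= threshold]
print(kept)  # [('chunk A', 0.92), ('chunk B', 0.81)]
```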
Semantic Search
A retrieval method based on understanding and matching text meaning rather than simple keyword matching. It uses vector embedding technology to convert text into mathematical representations, then calculates the semantic similarity between queries and documents. This method can find content that is expressed differently but has similar meanings, understand synonyms and contextual relationships, and even support cross-language retrieval, particularly suitable for complex or natural language form queries.
Session Variables
A mechanism for storing multi-turn dialogue context information, allowing AI to maintain coherent interactions. For example, the system can remember user preferences (such as “concise answers”), identity information, or interaction history status, avoiding repeated inquiries and providing personalized experiences. In Dify, developers can define and manage these variables to build applications that truly remember users.
Speech-to-Text (STT)
Technology that converts users’ voice input into text data. This technology allows users to interact with AI systems by speaking rather than typing, improving the naturalness and convenience of interaction, particularly suitable for mobile devices, driving scenarios, or accessibility applications, and is the foundation for voice assistants and real-time transcription applications.
Stream-tool-call
A real-time processing mode that allows AI systems to call external tools while a response is still being generated, rather than waiting for the complete answer before acting. This approach greatly improves response speed for complex tasks and makes the user experience smoother, suiting interactive scenarios that require multiple tool calls.
Streaming Response
A real-time response mechanism where AI systems return content to users as it is generated, rather than waiting until all content is generated before displaying it at once. This approach significantly improves the user waiting experience, especially for long answers, allowing users to immediately see partial content and begin reading, providing a more natural interaction experience similar to immediate feedback in human conversations.
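With the OpenAI Python client, for instance, streaming is enabled per request and each fragment is rendered as it arrives (model name illustrative):

```python
from openai import OpenAI

client = OpenAI()
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain RAG in two sentences."}],
    stream=True,  # tokens arrive incrementally instead of in one final block
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)  # show each fragment immediately
```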
T
Temperature
A parameter controlling the randomness of language model output, typically between 0 and 1. Lower temperature (close to 0) makes output more deterministic and conservative, favoring high-probability tokens, which suits factual answers; higher temperature (close to 1) makes output more diverse and creative, which suits creative writing. For example, a weather report might use a low temperature of 0.1, while story writing might use a high temperature of 0.8.
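In OpenAI-style APIs, temperature sits alongside the other sampling controls described in this glossary (max_tokens, frequency_penalty, presence_penalty) as plain request parameters:

```python
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",    # illustrative model name
    messages=[{"role": "user", "content": "Write a short weather report."}],
    temperature=0.1,        # low: deterministic, factual tone
    max_tokens=200,         # cap the length of the reply
    frequency_penalty=0.3,  # discourage repeating the same words
    presence_penalty=0.0,   # no extra push toward new topics
)
print(resp.choices[0].message.content)
```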
Text Embedding
The process of converting text into numerical vectors, enabling AI systems to understand and process language. These vectors capture the semantic features of vocabulary and sentences, allowing computers to measure similarity between texts, cluster related content, or retrieve matching information. Different embedding models (such as OpenAI’s text-embedding-ada-002 or Cohere’s embed-multilingual) are optimized for different languages and application scenarios.
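Similarity between embeddings is usually measured with cosine similarity; a toy example with hand-made 3-dimensional vectors (real models produce hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

cat   = [0.9, 0.1, 0.0]
dog   = [0.8, 0.2, 0.1]
plane = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, dog))    # ~0.98: semantically close
print(cosine_similarity(cat, plane))  # ~0.01: unrelated concepts
```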
Tool Calling
The ability of AI systems to identify and use external functionality, greatly expanding a model's capability boundaries. For example, a language model cannot access real-time data on its own, but by calling a weather API it can report current conditions; by calling a database query tool it can retrieve the latest product inventory; and by calling a calculator it can perform complex calculations, letting AI solve problems beyond the scope of its training data.
TopK
A parameter controlling the number of retrieval results returned, specifying to retain the top K text fragments with the highest similarity. Setting an appropriate TopK value is crucial for RAG system performance: too small a value may lose key information, while too large a value may introduce noise and increase the language model’s processing burden. For example, simple questions might only need TopK=3, while complex questions might require TopK=10 to obtain sufficient background.
TopP (Nucleus Sampling)
A text generation control method that selects the next word only from the most likely vocabulary with cumulative probability reaching threshold P. Unlike fixed selection of the highest-probability word or completely random selection, TopP balances determinism and creativity. For example, TopP=0.9 means the model only considers vocabulary accounting for 90% of the probability and ignores low-probability options, avoiding both completely predictable output and excessively random content.
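The sampling step itself is simple enough to sketch directly; the probabilities below are invented for illustration:

```python
import random

def nucleus_sample(probs: dict[str, float], top_p: float = 0.9) -> str:
    """Sample the next token from the smallest high-probability set whose
    cumulative probability reaches top_p (nucleus sampling)."""
    nucleus, total = [], 0.0
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        nucleus.append((token, p))
        total += p
        if total >= top_p:
            break
    tokens, weights = zip(*nucleus)  # renormalize within the nucleus
    return random.choices(tokens, weights=weights, k=1)[0]

probs = {"sunny": 0.55, "cloudy": 0.30, "rainy": 0.10, "snowy": 0.05}
print(nucleus_sample(probs, top_p=0.9))  # "snowy" never appears: it falls
                                         # outside the 90% nucleus
```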
Tree of Thought (ToT)
A thinking method for exploring multiple reasoning paths, allowing models to analyze problems from different perspectives. Similar to human “if…then…” thinking patterns, Tree of Thought lets models generate multiple possible thinking branches, evaluate the feasibility of each branch, and then select the optimal path to continue, particularly suitable for solving complex problems requiring trial and error or consideration of multiple possibilities.
Text-to-Speech (TTS)
Technology that converts written text into natural speech, enabling AI systems to communicate with users through voice. Modern TTS systems can generate natural speech close to human quality, supporting multiple languages, tones, and emotional expressions, widely used in audiobooks, navigation systems, voice assistants, and accessibility services, providing more natural interaction experiences for different scenarios and users.
V
Vector Database
A database system specialized in storing and searching vector embeddings, serving as the infrastructure for efficient semantic retrieval. Unlike traditional databases, vector databases are optimized for high-dimensional vector similarity search, capable of quickly finding semantically similar content from millions of documents. Common vector databases include Pinecone, Milvus, Qdrant, etc., which play key roles in RAG systems, recommendation engines, and content analysis.
Vector Retrieval
A search method based on text vector embedding similarity, forming the technical core of semantic search. The system first converts user queries into vectors, then finds the most similar content in pre-calculated document vectors. This method can capture deep semantic relationships, find content expressed differently but with similar meanings, overcoming the limitations of keyword search, particularly suitable for processing natural language queries and conceptual problems.
Vision
The functionality of multimodal LLMs to understand and process images, allowing models to analyze user-uploaded pictures and generate responses combining text. For example, users can upload product photos to inquire about usage methods, upload menu photos requesting translation, or upload charts asking for data trend analysis. This capability greatly expands AI application scenarios, making interaction more intuitive and diverse.
W
Workflow
A task orchestration method that breaks down complex AI applications into multiple independent nodes executed in a specific order. In the Dify platform, developers can visually design workflows, combining multiple processing steps (such as user input processing, knowledge retrieval, multi-model collaboration, conditional branching) to build AI applications capable of handling complex business logic, making application development both flexible and intuitive.