After selecting the chunking mode, the next step is to define the indexing method for structured content.
Just as search engines use efficient indexing algorithms to surface the results most relevant to user queries, the selected indexing method directly impacts the LLM's retrieval efficiency and the accuracy of its responses about knowledge base content.
The knowledge base offers two indexing methods: High-Quality and Economical, each with different retrieval setting options:
Note: The original Q&A mode (available only in the Community Edition) is now an optional feature under the High-Quality Indexing Method.
In High-Quality Mode, the Embedding model converts text chunks into numerical vectors, enabling efficient compression and storage of large volumes of textual information. This allows for more precise matching between user queries and text.
Once the text chunks are vectorized and stored in the database, an effective retrieval method is required to fetch the chunks that match user queries. The High-Quality indexing method offers three retrieval settings: vector retrieval, full-text retrieval, and hybrid retrieval. For more details, see the retrieval settings described below.
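As an illustration, here is a minimal sketch of this indexing flow. The `toy_embed` function is a deterministic placeholder standing in for a real embedding model (such as one configured under Model Providers); it is not the actual implementation.

```python
# Minimal sketch of the High-Quality indexing flow: each text chunk is
# converted into a numerical vector and stored alongside the original text.
import hashlib
import math

def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Placeholder embedding: a deterministic pseudo-random unit vector.
    A real system would call an embedding model instead."""
    seed = hashlib.sha256(text.encode("utf-8")).digest()
    vec = [(b / 255.0) - 0.5 for b in seed[:dim]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# "Vector database": here just a list of (vector, chunk) pairs.
vector_store: list[tuple[list[float], str]] = []

for chunk in ["Dify supports hybrid search.", "TopK defaults to 3."]:
    vector_store.append((toy_embed(chunk), chunk))

print(f"Stored {len(vector_store)} chunks as {len(vector_store[0][0])}-dim vectors")
```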
After selecting High Quality mode, the indexing method for the knowledge base cannot be downgraded to Economical mode later. If you need to switch, it is recommended to create a new knowledge base and select the desired indexing method.
When this mode is enabled, the system segments the uploaded text and automatically generates Q&A pairs for each segment after summarizing its content.
Compared with the common Q to P strategy (user questions matched with text paragraphs), the Q&A mode uses a Q to Q strategy (questions matched with questions).
This approach is particularly effective because the text in FAQ documents is often written in natural language with complete grammatical structures.
The Q to Q strategy makes the matching between questions and answers clearer and better supports scenarios with high-frequency or highly similar questions.
When a user asks a question, the system identifies the most similar question and returns the corresponding chunk as the answer. This approach is more precise, as it directly matches the user’s query, helping them retrieve the exact information they need.
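A minimal sketch of the Q-to-Q matching described above: the user's question is matched against the generated questions, not the answer paragraphs, and the answer paired with the best-matching question is returned. The Q&A pairs below are hand-written stand-ins for pairs the system would generate, and `difflib` string similarity stands in for embedding-based similarity.

```python
from difflib import SequenceMatcher

# Stand-ins for system-generated Q&A pairs (one per summarized segment).
qa_pairs = [
    ("How do I reset my password?",
     "Go to Settings > Account and click 'Reset password'."),
    ("What payment methods are supported?",
     "We accept credit cards and PayPal."),
]

def answer(user_question: str) -> str:
    # Q-to-Q: score the user question against stored *questions* only.
    best = max(
        qa_pairs,
        key=lambda qa: SequenceMatcher(
            None, user_question.lower(), qa[0].lower()
        ).ratio(),
    )
    return best[1]  # Return the answer paired with the best question.

print(answer("how can I reset the password"))
# -> Go to Settings > Account and click 'Reset password'.
```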
Once the knowledge base receives a user query, it searches existing documents according to preset retrieval methods and extracts highly relevant content chunks. These chunks provide essential context for the LLM, ultimately affecting the accuracy and credibility of its answers.
Common retrieval methods include:
Semantic Retrieval based on vector similarity, where text chunks and queries are converted into vectors and matched via similarity scoring.
Keyword Matching using an inverted index (a standard search engine technique).
Both retrieval methods are supported in Dify’s knowledge base. The specific retrieval options available depend on the chosen indexing method.
In the High-Quality Indexing Method, Dify offers three retrieval settings: Vector Search, Full-Text Search, and Hybrid Search.
Definition: Vectorize the user’s question to generate a query vector, then compare it with the corresponding text vectors in the knowledge base to find the nearest chunks.
Vector Search Settings:
Rerank Model: Disabled by default. When enabled, a third-party Rerank model will sort the text chunks returned by Vector Search to optimize results. This helps the LLM access more precise information and improve output quality. Before enabling this option, go to Settings → Model Providers and configure the Rerank model’s API key.
Enabling this feature will consume tokens from the Rerank model. For more details, refer to the associated model’s pricing page.
TopK: Determines how many text chunks, deemed most similar to the user’s query, are retrieved. It also automatically adjusts the number of chunks based on the chosen model’s context window. The default value is 3, and higher numbers will recall more text chunks.
Score Threshold: Sets the minimum similarity score required for a chunk to be retrieved. Only chunks exceeding this score are retrieved. The default value is 0.5. Higher thresholds demand greater similarity and thus result in fewer chunks being retrieved.
The TopK and Score configurations are only effective during the Rerank phase. Therefore, to apply either of these settings, it is necessary to add and enable a Rerank model.
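To make the mechanics concrete, here is a minimal sketch of vector search with the TopK and Score Threshold settings above. The vectors are toy values standing in for real embedding output, and the Rerank phase itself is omitted.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy chunk vectors standing in for stored embeddings.
chunks = {
    "chunk A": [0.9, 0.1, 0.0],
    "chunk B": [0.2, 0.8, 0.1],
    "chunk C": [0.7, 0.3, 0.2],
}

def vector_search(query_vec, top_k=3, score_threshold=0.5):
    scored = [(cosine(query_vec, v), name) for name, v in chunks.items()]
    # Score Threshold: drop chunks below the minimum similarity.
    scored = [(s, n) for s, n in scored if s >= score_threshold]
    scored.sort(reverse=True)
    # TopK: keep only the most similar chunks.
    return scored[:top_k]

print(vector_search([1.0, 0.0, 0.0], top_k=2, score_threshold=0.5))
# -> [(~0.99, 'chunk A'), (~0.89, 'chunk C')]; chunk B falls below 0.5.
```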
Definition: Indexes all terms in the document, allowing users to query any term and retrieve the text fragments containing those terms.
Rerank Model: Disabled by default. When enabled, a third-party Rerank model will sort the text chunks returned by Full-Text Search to optimize results. This helps the LLM access more precise information and improve output quality. Before enabling this option, go to Settings → Model Providers and configure the Rerank model’s API key.
Enabling this feature will consume tokens from the Rerank model. For more details, refer to the associated model’s pricing page.
TopK: Determines how many text chunks, deemed most similar to the user’s query, are retrieved. It also automatically adjusts the number of chunks based on the chosen model’s context window. The default value is 3, and higher numbers will recall more text chunks.
Score Threshold: Sets the minimum similarity score required for a chunk to be retrieved. Only chunks exceeding this score are retrieved. The default value is 0.5. Higher thresholds demand greater similarity and thus result in fewer chunks being retrieved.
The TopK and Score configurations are only effective during the Rerank phase. Therefore, to apply either of these settings, it is necessary to add and enable a Rerank model.
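Here is a minimal sketch of the inverted-index lookup behind full-text search. The tokenizer is a naive lowercase split; a production engine would add stemming, stop-word handling, and relevance scoring such as BM25.

```python
from collections import defaultdict

chunks = {
    1: "Dify supports hybrid search in the knowledge base",
    2: "The rerank model sorts retrieved text chunks",
}

# Inverted index: every term maps to the set of chunks containing it.
inverted_index: dict[str, set[int]] = defaultdict(set)
for chunk_id, text in chunks.items():
    for term in text.lower().split():
        inverted_index[term].add(chunk_id)

def full_text_search(query: str) -> set[int]:
    """Return the IDs of all chunks containing any query term."""
    ids: set[int] = set()
    for term in query.lower().split():
        ids |= inverted_index.get(term, set())
    return ids

print(full_text_search("rerank chunks"))  # -> {2}
```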
Definition: This process combines full-text search and vector search, performing both simultaneously. It includes a reordering step to select the best-matching results from both search outcomes based on the user’s query.
In this mode, you can either specify "Weight Settings" without needing to configure a Rerank model API, or enable a Rerank model for retrieval.
Weight Settings
This feature enables users to set custom weights for semantic priority and keyword priority. Keyword search performs a full-text search within the knowledge base, while semantic search performs a vector search.
Semantic Value of 1
This activates only the semantic search mode. By using embedding models, the search can return relevant content even when the exact terms from the query do not appear in the knowledge base, since relevance is computed from vector distances. In addition, for multilingual content, semantic search can capture meaning across languages, providing more accurate cross-language search results.
Keyword Value of 1
This activates only the keyword search mode. It performs a full match against the input text in the knowledge base, suitable for scenarios where the user knows the exact information or terminology. This approach consumes fewer computational resources and is ideal for quick searches within a large document knowledge base.
Custom Keyword and Semantic Weights
In addition to enabling only semantic search or keyword search, we provide flexible custom weight settings. You can continuously adjust the weights of the two methods to identify the optimal weight ratio that suits your business scenario.
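A minimal sketch of weight-based score fusion follows. The per-chunk scores are illustrative values, each assumed to be normalized to [0, 1]; setting either weight to 1 (and the other to 0) reproduces the single-mode behaviors described above.

```python
def hybrid_score(vec_scores: dict[str, float],
                 kw_scores: dict[str, float],
                 semantic_weight: float = 0.7,
                 keyword_weight: float = 0.3) -> list[tuple[float, str]]:
    """Combine semantic and keyword scores with user-set weights."""
    names = set(vec_scores) | set(kw_scores)
    fused = [
        (semantic_weight * vec_scores.get(n, 0.0)
         + keyword_weight * kw_scores.get(n, 0.0), n)
        for n in names
    ]
    return sorted(fused, reverse=True)

# Illustrative normalized scores from each retrieval method.
vec_scores = {"chunk A": 0.92, "chunk B": 0.35}
kw_scores = {"chunk B": 0.80, "chunk C": 0.60}
print(hybrid_score(vec_scores, kw_scores))
```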
Rerank Model
Disabled by default. When enabled, a third-party Rerank model will sort the text chunks returned by Hybrid Search to optimize results. This helps the LLM access more precise information and improve output quality. Before enabling this option, go to Settings → Model Providers and configure the Rerank model’s API key.
Enabling this feature will consume tokens from the Rerank model. For more details, refer to the associated model’s pricing page.
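A minimal sketch of the rerank step: candidate chunks returned by hybrid search are re-scored against the query and re-sorted. The `rerank_score` function here is a toy term-overlap score standing in for a real third-party Rerank model, which consumes tokens per call.

```python
def rerank_score(query: str, chunk: str) -> float:
    """Placeholder relevance score: fraction of query terms in the chunk.
    A real Rerank model would score the (query, chunk) pair instead."""
    q_terms = set(query.lower().split())
    c_terms = set(chunk.lower().split())
    return len(q_terms & c_terms) / len(q_terms)

candidates = [
    "Hybrid search combines vector and full-text retrieval",
    "The embedding model converts text chunks into vectors",
]
query = "how does hybrid search work"
reranked = sorted(candidates, key=lambda c: rerank_score(query, c), reverse=True)
print(reranked[0])  # The chunk about hybrid search ranks first.
```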
The "Weight Settings" and "Rerank Model" settings support the following options:
TopK: Determines how many text chunks, deemed most similar to the user’s query, are retrieved. It also automatically adjusts the number of chunks based on the chosen model’s context window. The default value is 3, and higher numbers will recall more text chunks.
Score Threshold: Sets the minimum similarity score required for a chunk to be retrieved. Only chunks exceeding this score are retrieved. The default value is 0.5. Higher thresholds demand greater similarity and thus result in fewer chunks being retrieved.
After specifying the retrieval settings, you can refer to the following documentation to review how keywords match with content chunks in different scenarios.
The Economical method uses 10 keywords per chunk for retrieval, consuming no tokens at the expense of reduced retrieval accuracy. Only the inverted index method is available for selecting the most relevant chunks.
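A minimal sketch of how Economical retrieval might work under these constraints: roughly 10 keywords are kept per chunk and lookups run over an inverted index built from those keywords only, so no embedding tokens are consumed. Simple frequency counting stands in for the actual keyword-extraction logic.

```python
from collections import Counter, defaultdict

def top_keywords(text: str, n: int = 10) -> list[str]:
    """Keep the n most frequent terms as the chunk's keywords."""
    return [w for w, _ in Counter(text.lower().split()).most_common(n)]

chunks = {
    1: "Dify knowledge base supports high quality and economical indexing",
    2: "economical indexing extracts keywords per chunk to save tokens",
}

# Inverted index built only from each chunk's extracted keywords.
keyword_index: dict[str, set[int]] = defaultdict(set)
for chunk_id, text in chunks.items():
    for kw in top_keywords(text):
        keyword_index[kw].add(chunk_id)

print(keyword_index["economical"])  # -> {1, 2}
```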