Retrieval Test / Citation and Attributions
Dify’s knowledge base provides a text retrieval testing feature that lets you simulate user queries and retrieve matching content chunks from the knowledge base. The retrieved chunks are sorted by score and then sent to the LLM. Generally, the higher the match between the question and the content chunks, the more closely the LLM’s answer will align with the source documents, and the better the final results.
You can test different retrieval methods and parameter configurations to evaluate the quality and effectiveness of the retrieved text chunks. Different chunking modes use different retrieval testing methods.
General
Enter common user questions into the Source Text field and click Test to see the Retrieved Chunks results on the right.
In General Mode, each text chunk stands independently. The score shown in the top-right corner of a chunk represents how closely it matches the query keywords. A higher score indicates a stronger alignment between the chunk and the keywords.
Click a content chunk to see the details of the referenced content. Each chunk shows its source document information at the bottom, letting you verify whether the retrieved text is appropriate.
In Records, you can review past queries. If the knowledge base is linked to an application, any queries triggered from within that application will also appear here.
Click the icon in the upper-right corner of the Source Text field to change the current knowledge base’s retrieval method and related parameters. These changes only take effect during the current retrieval test session, so you can compare the retrieval performance of different settings while debugging.
If you want to permanently modify the retrieval method for the knowledge base, go to “Knowledge Base Settings” > “Retrieval Settings” to make changes.
Suggested Steps for Retrieval Testing:
Design and organize test cases/test question sets covering common user questions.
Choose an appropriate retrieval strategy: vector search/full-text search/hybrid search. For the pros and cons of different retrieval methods, please refer to the extended reading Retrieval-Augmented Generation (RAG).
Tune the number of retrieved chunks (TopK) and the score threshold (Score). Choose appropriate parameter combinations based on the application scenario and the quality of the documents themselves; see the scripted testing sketch after this list.
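If you maintain a larger test question set, it can help to script the retrieval test instead of running each query by hand in the UI. The following is a minimal sketch, assuming a dataset retrieval endpoint of the form `POST /v1/datasets/{dataset_id}/retrieve` and the `retrieval_model` fields shown (`search_method`, `top_k`, `score_threshold`); the base URL, IDs, and exact field names below are placeholders and assumptions, so check the API reference of your Dify version before using it.

```python
import requests

DIFY_API_BASE = "https://api.dify.ai/v1"    # assumption: self-hosted instances use their own base URL
DATASET_ID = "your-dataset-id"              # hypothetical placeholder
API_KEY = "your-knowledge-api-key"          # hypothetical placeholder

# A small test question set covering common user questions.
TEST_QUESTIONS = [
    "How do I reset my password?",
    "What file formats can I upload?",
    "How is billing calculated?",
]

def retrieve(query: str, top_k: int = 3, score_threshold: float = 0.5) -> list[dict]:
    """Run one retrieval test and return the retrieved chunks (assumed response shape)."""
    resp = requests.post(
        f"{DIFY_API_BASE}/datasets/{DATASET_ID}/retrieve",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "query": query,
            "retrieval_model": {
                "search_method": "hybrid_search",   # or "semantic_search" / "full_text_search"
                "reranking_enable": False,
                "top_k": top_k,
                "score_threshold_enabled": True,
                "score_threshold": score_threshold,
            },
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("records", [])

if __name__ == "__main__":
    for question in TEST_QUESTIONS:
        records = retrieve(question)
        print(f"\nQ: {question}  ({len(records)} chunks)")
        for record in records:
            segment = record.get("segment", {})
            print(f"  score={record.get('score')}  doc={segment.get('document', {}).get('name')}")
            print(f"  {segment.get('content', '')[:80]}...")
```

Running the same question set after each change to the retrieval strategy, TopK, or Score makes it easier to compare configurations side by side.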
How to Configure TopK Value and Retrieval Threshold (Score)
TopK is the maximum number of chunks returned after the retrieved chunks are sorted in descending order of similarity score. A smaller TopK value recalls fewer chunks and may leave out relevant text; a larger TopK value recalls more chunks, which may include chunks with low semantic relevance and reduce the quality of the LLM’s responses.
The retrieval threshold (Score) is the minimum similarity score a chunk must reach to be recalled. A lower threshold retrieves more chunks, which may include less relevant ones; a higher threshold recalls fewer chunks and, if set too high, may filter out relevant chunks.
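Conceptually, the two parameters work together: chunks are sorted by score, chunks below the threshold are dropped, and at most TopK of the remainder are kept. The snippet below is a simplified illustration of that interaction, not Dify’s actual implementation:

```python
def select_chunks(scored_chunks: list[tuple[float, str]], top_k: int, score_threshold: float) -> list[tuple[float, str]]:
    """Keep at most `top_k` chunks whose similarity score is at least `score_threshold`."""
    ranked = sorted(scored_chunks, key=lambda c: c[0], reverse=True)   # highest score first
    passing = [c for c in ranked if c[0] >= score_threshold]           # drop low-relevance chunks
    return passing[:top_k]                                             # cap the number sent to the LLM

# Example: with top_k=2 and score_threshold=0.6, only the two best chunks above 0.6 are recalled.
chunks = [(0.82, "chunk A"), (0.71, "chunk B"), (0.58, "chunk C"), (0.43, "chunk D")]
print(select_chunks(chunks, top_k=2, score_threshold=0.6))   # [(0.82, 'chunk A'), (0.71, 'chunk B')]
```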
When testing the knowledge base’s performance within an application, go to “Workspace” > “Add Feature” > “Citation and Attribution” to enable the citation and attribution feature.
After enabling the feature, when the LLM answers a question by citing content from the knowledge base, the specific citation details appear below the response, including the original segment text, segment number, and match score. Clicking Link to Knowledge above the cited segment takes you to the segment list in the knowledge base, making it easy for developers to debug and edit.
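When Citation and Attribution is enabled, the citation information is also available when calling the application over its API. The sketch below assumes a blocking `POST /v1/chat-messages` call whose response exposes the cited sources under `metadata.retriever_resources` with fields such as `document_name`, `segment_id`, `score`, and `content`; these field names are assumptions and may vary between Dify versions, so verify them against your API reference.

```python
import requests

APP_API_BASE = "https://api.dify.ai/v1"     # assumption: adjust for self-hosted instances
APP_API_KEY = "your-app-api-key"            # hypothetical placeholder

# Send a question in blocking mode and print the cited knowledge base segments.
resp = requests.post(
    f"{APP_API_BASE}/chat-messages",
    headers={"Authorization": f"Bearer {APP_API_KEY}"},
    json={
        "query": "How do I reset my password?",
        "inputs": {},
        "response_mode": "blocking",
        "user": "retrieval-test",
    },
    timeout=60,
)
resp.raise_for_status()
data = resp.json()

# Assumption: citations appear under metadata.retriever_resources when the feature is on.
for source in data.get("metadata", {}).get("retriever_resources", []):
    print(f"{source.get('document_name')}  segment={source.get('segment_id')}  score={source.get('score')}")
    print(source.get("content", "")[:120], "...")
```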
On the left side of the knowledge base, you can see all linked apps. Hover over the circular icon to view the full list of linked apps, and click the jump button on the right to quickly open them.