Knowledge Base and Document Maintenance
The knowledge base page is accessible only to the team owner, team administrators, and users with editor permissions.
On the Dify team homepage, click the "Knowledge Base" tab at the top, select the knowledge base you want to manage, then click Settings in the left navigation panel to make adjustments. You can modify the knowledge base name, description, visibility permissions, indexing mode, embedding model, and retrieval settings.
Knowledge Base Name: Used to distinguish among different knowledge bases.
Knowledge Description: Used to describe the information represented by the documents in the knowledge base.
Visibility Permissions: Defines access control for the knowledge base with three levels:
"Only Me": Restricts access to the knowledge base owner.
"All team members": Grants access to every member of the team.
"Partial team members": Allows selective access to specific team members.
Users without appropriate permissions cannot access the knowledge base. When access is granted to team members (through "All team members" or "Partial team members"), authorized users receive full permissions, including view, edit, and delete rights for the knowledge base content.
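The three visibility levels can be read as a simple access check. The following is a minimal sketch for illustration only, not Dify's implementation: the `Visibility` enum, the `KnowledgeBase` dataclass, and `can_access` are hypothetical names, and treating the owner as always having access is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum


class Visibility(Enum):
    # Hypothetical identifiers for the three levels described above.
    ONLY_ME = "only_me"
    ALL_TEAM_MEMBERS = "all_team_members"
    PARTIAL_TEAM_MEMBERS = "partial_team_members"


@dataclass
class KnowledgeBase:
    owner_id: str
    visibility: Visibility
    # Only consulted when visibility is PARTIAL_TEAM_MEMBERS.
    allowed_member_ids: set[str] = field(default_factory=set)


def can_access(kb: KnowledgeBase, user_id: str, team_member_ids: set[str]) -> bool:
    """Return True if the user may view, edit, and delete knowledge base content."""
    if user_id == kb.owner_id:
        # Assumption: the owner always keeps access.
        return True
    if user_id not in team_member_ids:
        return False
    if kb.visibility is Visibility.ONLY_ME:
        return False
    if kb.visibility is Visibility.ALL_TEAM_MEMBERS:
        return True
    # PARTIAL_TEAM_MEMBERS: only explicitly selected members.
    return user_id in kb.allowed_member_ids
```

For example, with a team of `alice`, `bob`, and `carol`, a knowledge base owned by `alice` and shared only with `bob` is accessible to `alice` and `bob`, but not to `carol`.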
Indexing Mode: For detailed explanations, please refer to the documentation.
Embedding Model: Allows you to modify the embedding model for the knowledge base. Changing the embedding model will re-embed all documents in the knowledge base, and the original embeddings will be deleted.
Retrieval Settings: For detailed explanations, please refer to the documentation.
Dify Knowledge Base provides a complete set of standard APIs. Developers can use API calls to perform daily management and maintenance operations such as adding, deleting, modifying, and querying documents and chunks in the knowledge base. Please refer to the Knowledge Base API Documentation.
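As a rough illustration of what a maintenance call might look like, the sketch below builds (but does not send) a request to list the documents in a knowledge base. The endpoint path, the `page` and `limit` query parameters, and the Bearer token header are assumptions made for this example; confirm them against the Knowledge Base API Documentation before use. The base URL, API key, and knowledge base ID are placeholders.

```python
import urllib.request


def build_list_documents_request(base_url, api_key, dataset_id, page=1, limit=20):
    """Build a GET request for the documents of one knowledge base.

    Assumed, not confirmed by this page: the /datasets/{id}/documents path,
    the page/limit query parameters, and Bearer-token authentication.
    """
    url = (
        f"{base_url.rstrip('/')}/datasets/{dataset_id}/documents"
        f"?page={page}&limit={limit}"
    )
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


# Placeholder values; replace with your own deployment details.
request = build_list_documents_request(
    base_url="https://example.invalid/v1",
    api_key="YOUR_API_KEY",
    dataset_id="YOUR_KNOWLEDGE_BASE_ID",
)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and headers before touching a live knowledge base.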
Each document uploaded to the knowledge base is stored in the form of text chunks. You can view the specific text content of each chunk in the chunk list.
The quality of document chunking significantly affects the Q&A performance of the knowledge base application. It is recommended to manually check chunk quality before associating the knowledge base with the application.
Automated chunking methods based on character length, identifiers, or NLP semantic segmentation can significantly reduce the workload of chunking text at scale. However, chunk quality still depends on the text structure of each document format and on semantic context. Manual checking and correction can effectively compensate for the shortcomings of automated chunking in semantic recognition.
When checking chunk quality, pay attention to the following situations:
Overly short text chunks, leading to semantic loss;
Overly long text chunks, which introduce semantic noise and reduce matching accuracy;
Obvious semantic truncation, which occurs when a maximum segment length limit cuts the text mid-thought, so that content is missing during recall.
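A first pass over these three situations can be automated before the manual review. The following is a minimal sketch: the length thresholds are arbitrary placeholders to tune for your content, and treating a missing sentence-ending character as a sign of truncation is a crude heuristic, not a feature of the product.

```python
# Characters that plausibly end a complete sentence (heuristic, extend as needed).
SENTENCE_ENDINGS = ".!?\"')\u3002\uff01\uff1f\u201d\u2019"


def check_chunk(text, min_chars=50, max_chars=1000):
    """Return a list of issue labels for one chunk; an empty list means no flags."""
    issues = []
    stripped = text.strip()
    if len(stripped) < min_chars:
        issues.append("too_short")  # risk of semantic loss
    if len(stripped) > max_chars:
        issues.append("too_long")  # risk of semantic noise
    if stripped and stripped[-1] not in SENTENCE_ENDINGS:
        issues.append("possible_truncation")  # may have been cut mid-sentence
    return issues


def review_chunks(chunks, **limits):
    """Map chunk index to issues, keeping only chunks that were flagged."""
    flagged = {}
    for index, chunk in enumerate(chunks):
        issues = check_chunk(chunk, **limits)
        if issues:
            flagged[index] = issues
    return flagged
```

Flagged chunks are only candidates for review; a short chunk such as a heading or a definition may be perfectly fine.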
In the chunk list, click "Add Segment" to add one or multiple custom chunks to the document.
When adding chunks in bulk, you need to first download the CSV format chunk upload template, edit all the chunk content in Excel according to the template format, save the CSV file, and then upload it.
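If the chunk text already exists in a script or another system, the CSV can be generated programmatically instead of edited by hand. The sketch below uses Python's standard `csv` module, which handles quoting of commas and line breaks inside a chunk. The single column name `content` is an assumption for illustration; use the header row from the template you downloaded.

```python
import csv
import io


def chunks_to_csv(chunks, header="content"):
    """Serialize chunks to CSV text, one chunk per row.

    `header` is a placeholder column name; it must match the downloaded template.
    """
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow([header])
    for chunk in chunks:
        writer.writerow([chunk])
    return buffer.getvalue()


csv_text = chunks_to_csv([
    "First chunk, with a comma.",
    "Second chunk\nspanning two lines.",
])

# Write the result to disk for upload.
with open("chunks.csv", "w", newline="", encoding="utf-8") as handle:
    handle.write(csv_text)
```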
In the chunk list, you can directly edit the content of the added chunks, including the text content and keywords of the chunks.
Metadata does more than label information from different source documents, such as the title, URL, keywords, and description of web data. It will also be used in the chunk recall process of the knowledge base, as structured fields for recall filtering or for displaying citation sources.
The metadata filtering and citation source functions are not yet supported in the current version.
In "Knowledge Base > Document List," click "Add File" to upload new documents or Notion pages to the created knowledge base.
A knowledge base (Knowledge) is a collection of documents (Documents). Documents can be uploaded by developers or operators, or synchronized from other data sources (usually corresponding to a file unit in the data source).
Disable: The knowledge base supports disabling documents or chunks that should temporarily be excluded from indexing. In the knowledge base document list, click the disable button to disable a document. You can also disable an entire document or a specific chunk from the document details. Disabled documents will not be indexed. Click enable on a disabled document to restore it.
Archive: Old document data that is no longer in use can be archived if you do not want to delete it. Archived data can only be viewed or deleted, not edited. In the knowledge base document list, click the archive button to archive a document. You can also archive documents from the document details. Archived documents will not be indexed. Archived documents can also be unarchived.
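The effect of the two switches can be summarized in a small model. This is a sketch of the rules as described above, not the product's data model: `DocumentState` and its field names are hypothetical, and treating a disabled (but not archived) document as still editable is an assumption.

```python
from dataclasses import dataclass


@dataclass
class DocumentState:
    # Hypothetical flags mirroring the disable and archive actions.
    enabled: bool = True
    archived: bool = False

    @property
    def is_indexed(self) -> bool:
        # Disabled or archived documents are not indexed.
        return self.enabled and not self.archived

    @property
    def is_editable(self) -> bool:
        # Archived data can only be viewed or deleted.
        # Assumption: disabling alone does not block editing.
        return not self.archived


active = DocumentState()
disabled = DocumentState(enabled=False)
archived = DocumentState(archived=True)
```

Both actions are reversible: setting `enabled` back to `True` or `archived` back to `False` corresponds to clicking enable or unarchive.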