Author: Steven Lynn, Dify Technical Writer

In the last experiment, we learned the basic usage of file uploads. However, when the text we need to read exceeds the LLM's context window, we need to use a knowledge base.
What is a context window? The context window is the range of text that the LLM can "see" and "remember" when processing text. It determines how much prior information the model can draw on when generating responses or continuing text. The larger the window, the more contextual information the model can use, and the more accurate and coherent the generated content usually is.

Previously, we learned about the concept of LLM hallucinations. In many cases, a knowledge base allows the Agent to locate accurate information and therefore answer questions accurately. It has applications in specific fields such as customer service and search tools. Traditional customer-service bots are often based on keyword retrieval: when a user's question falls outside the predefined keywords, the bot cannot solve the problem. The knowledge base is designed to solve this, enabling semantic-level retrieval and reducing the burden on human agents.

Before starting the experiment, remember that the core of the knowledge base is retrieval, not the LLM. The LLM enhances how the output is presented, but an accurate answer still depends on retrieving the right content.
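To see why keyword retrieval breaks down, here is a minimal sketch of a naive keyword matcher. The documents and questions are made up for illustration; the point is that a paraphrased question sharing no words with the document returns nothing, which is exactly the gap semantic retrieval closes.

```python
def keyword_match(question, documents):
    # Naive keyword retrieval: return documents that share
    # at least one word with the question.
    q_words = set(question.lower().split())
    return [d for d in documents if q_words & set(d.lower().split())]

docs = ["To reset your password, open account settings."]

print(keyword_match("reset password", docs))          # the document is found
print(keyword_match("I forgot my login code", docs))  # nothing found: a paraphrase misses
```

A semantic (embedding-based) retriever would match the second question too, because "forgot my login code" and "reset your password" are close in meaning even though they share no keywords.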
Only models with the TEXT EMBEDDING label are supported. Ensure you have added at least one and have sufficient balance.
What is embedding? “Embedding” is a technique that converts discrete variables (such as words, sentences, or entire documents) into continuous vector representations. Simply put, when we process natural language into data, we convert text into vectors. This process is called embedding. Vectors of semantically similar texts will be close together, while vectors of semantically opposite texts will be far apart. LLMs use this data for training, predicting subsequent vectors, and thus generating text.
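The "semantically similar vectors are close together" idea can be shown with cosine similarity. The sketch below uses made-up 3-dimensional vectors purely for illustration (real embedding models output hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors; imagine an embedding model produced these.
vec_cat = [0.9, 0.8, 0.1]
vec_kitten = [0.85, 0.75, 0.2]   # similar meaning to "cat"
vec_invoice = [0.1, 0.2, 0.9]    # unrelated meaning

print(cosine_similarity(vec_cat, vec_kitten))   # high: similar meanings
print(cosine_similarity(vec_cat, vec_invoice))  # much lower: unrelated meanings
```

Retrieval in a knowledge base works the same way: the user's question is embedded, then the chunks whose vectors are closest to it are returned.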
embed-english is suitable for English documents, and embed-multilingual is suitable for multilingual documents.
For example, a prompt might read: Based on {{context}}, answer {{user question}}.
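Dify substitutes these double-brace placeholders for you at runtime, so you never write this code yourself. As an illustration only, here is a minimal sketch (the template text and variable values are made up) of how such a substitution works:

```python
def fill_prompt(template, variables):
    # Replace each {{name}} placeholder with its value.
    for name, value in variables.items():
        template = template.replace("{{" + name + "}}", value)
    return template

template = "Based on {{context}}, answer {{user question}}."
prompt = fill_prompt(template, {
    "context": "Dify supports knowledge bases.",
    "user question": "What does Dify support?",
})
print(prompt)
```

The retrieved knowledge-base chunks end up in {{context}}, so the LLM answers from the retrieved text rather than only from its training data.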
Type / or { in the prompt writing area to reference variables. Among the variables, those starting with sys. are system variables; please refer to the help documentation for details.
In addition, you can enable LLM memory to make the user’s conversation experience more coherent.