# Retrieve Chunks from a Knowledge Base / Test Retrieval

> Performs a search query against a knowledge base to retrieve the most relevant chunks. This endpoint can be used for both production retrieval and test retrieval.
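
As a rough illustration of how a client might call this endpoint, the sketch below builds the `POST` request with Python's standard library and extracts scored chunk text from a decoded `200` response. The helper names `build_retrieve_request` and `top_chunks`, the placeholder API key, and the all-zero knowledge base ID are assumptions made for this example; they are not part of the API. The request is only constructed, not sent.

```python
import json
import urllib.request

MAX_QUERY_LENGTH = 250  # mirrors `maxLength` of `query` in the request schema


def build_retrieve_request(api_base_url, api_key, dataset_id, query, retrieval_model=None):
    """Build (but do not send) a POST request for /datasets/{dataset_id}/retrieve.

    `api_base_url` is host plus path without the scheme, e.g. "api.dify.ai/v1".
    """
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError(f"query exceeds {MAX_QUERY_LENGTH} characters")
    body = {"query": query}
    if retrieval_model is not None:
        body["retrieval_model"] = retrieval_model
    return urllib.request.Request(
        url=f"https://{api_base_url}/datasets/{dataset_id}/retrieve",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def top_chunks(response_body):
    """Return (score, content) pairs from a decoded 200 response, highest score first."""
    records = response_body.get("records", [])
    pairs = [(record["score"], record["segment"]["content"]) for record in records]
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)


# Hypothetical placeholder values; substitute a real API key and knowledge base ID.
request = build_retrieve_request(
    "api.dify.ai/v1",
    "YOUR_API_KEY",
    "00000000-0000-0000-0000-000000000000",
    "What is Dify?",
)
# To actually send it: urllib.request.urlopen(request), then json.load() the result.
print(request.get_method(), request.full_url)
```

Building the request separately from sending it keeps the key handling and payload shape easy to inspect, and the API key should stay on the server side as the security note in the spec recommends.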


## OpenAPI

````yaml /en/api-reference/openapi_knowledge.json post /datasets/{dataset_id}/retrieve
openapi: 3.0.1
info:
  title: Knowledge API
  description: >-
    API for managing knowledge bases, documents, chunks, metadata, and tags,
    including creation, retrieval, and configuration. **Note:** A single
    Knowledge Base API key has permission to operate on all visible knowledge
    bases under the same account. Please pay attention to data security.
  version: 1.0.0
servers:
  - url: https://{api_base_url}
    description: >-
      Base URL of the Knowledge API. For self-hosted deployments, replace it
      with your own API base URL.
    variables:
      api_base_url:
        default: api.dify.ai/v1
        description: Host and path of the API base URL, without the `https://` prefix.
security:
  - ApiKeyAuth: []
tags:
  - name: Knowledge Bases
    description: >-
      Operations for managing knowledge bases, including creation,
      configuration, and retrieval.
  - name: Documents
    description: >-
      Operations for creating, updating, and managing documents within a
      knowledge base.
  - name: Chunks
    description: Operations for managing document chunks and child chunks.
  - name: Metadata
    description: >-
      Operations for managing knowledge base metadata fields and document
      metadata values.
  - name: Tags
    description: Operations for managing knowledge base tags and tag bindings.
  - name: Models
    description: Operations for retrieving available models.
  - name: Knowledge Pipeline
    description: >-
      Operations for managing and running knowledge pipelines, including
      datasource plugins and pipeline execution.
paths:
  /datasets/{dataset_id}/retrieve:
    post:
      tags:
        - Knowledge Bases
      summary: Retrieve Chunks from a Knowledge Base / Test Retrieval
      description: >-
        Performs a search query against a knowledge base to retrieve the most
        relevant chunks. This endpoint can be used for both production retrieval
        and test retrieval.
      operationId: retrieveSegments
      parameters:
        - name: dataset_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
          description: Knowledge base ID.
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required:
                - query
              properties:
                query:
                  type: string
                  maxLength: 250
                  description: Search query text.
                retrieval_model:
                  $ref: '#/components/schemas/RetrievalModel'
                  description: >-
                    Retrieval model configuration. Controls how chunks are
                    searched and ranked when querying this knowledge base.
                external_retrieval_model:
                  type: object
                  description: Retrieval settings for external knowledge bases.
                  properties:
                    top_k:
                      type: integer
                      description: Maximum number of results to return.
                    score_threshold:
                      type: number
                      description: >-
                        Minimum similarity score threshold for filtering
                        results.
                    score_threshold_enabled:
                      type: boolean
                      description: Whether score threshold filtering is enabled.
                attachment_ids:
                  type: array
                  items:
                    type: string
                  nullable: true
                  description: List of attachment IDs to include in the retrieval context.
      responses:
        '200':
          description: Retrieval results.
          content:
            application/json:
              schema:
                type: object
                properties:
                  query:
                    type: object
                    description: The original query object.
                    properties:
                      content:
                        type: string
                        description: The query text.
                  records:
                    type: array
                    description: List of matched retrieval records.
                    items:
                      type: object
                      properties:
                        segment:
                          type: object
                          description: Matched chunk from the knowledge base.
                          properties:
                            id:
                              type: string
                              description: Unique identifier of the chunk.
                            position:
                              type: integer
                              description: Position of the chunk within the document.
                            document_id:
                              type: string
                              description: ID of the document this chunk belongs to.
                            content:
                              type: string
                              description: Text content of the chunk.
                            sign_content:
                              type: string
                              description: Signed content hash for integrity verification.
                            answer:
                              type: string
                              description: Answer content, used in Q&A mode documents.
                            word_count:
                              type: integer
                              description: Word count of the chunk content.
                            tokens:
                              type: integer
                              description: Token count of the chunk content.
                            keywords:
                              type: array
                              description: >-
                                Keywords associated with this chunk for
                                keyword-based retrieval.
                              items:
                                type: string
                            index_node_id:
                              type: string
                              description: ID of the index node in the vector store.
                            index_node_hash:
                              type: string
                              description: >-
                                Hash of the indexed content, used to detect
                                changes.
                            hit_count:
                              type: integer
                              description: >-
                                Number of times this chunk has been matched in
                                retrieval queries.
                            enabled:
                              type: boolean
                              description: Whether the chunk is enabled for retrieval.
                            disabled_at:
                              type: number
                              nullable: true
                              description: >-
                                Timestamp when the chunk was disabled. `null` if
                                enabled.
                            disabled_by:
                              type: string
                              nullable: true
                              description: >-
                                ID of the user who disabled the chunk. `null` if
                                enabled.
                            status:
                              type: string
                              description: Indexing status of the chunk.
                            created_by:
                              type: string
                              description: ID of the user who created the chunk.
                            created_at:
                              type: number
                              description: Creation timestamp (Unix epoch in seconds).
                            indexing_at:
                              type: number
                              nullable: true
                              description: >-
                                Timestamp when indexing started. `null` if not
                                yet started.
                            completed_at:
                              type: number
                              nullable: true
                              description: >-
                                Timestamp when indexing completed. `null` if not
                                yet completed.
                            error:
                              type: string
                              nullable: true
                              description: >-
                                Error message if indexing failed. `null` when no
                                error.
                            stopped_at:
                              type: number
                              nullable: true
                              description: >-
                                Timestamp when indexing was stopped. `null` if
                                not stopped.
                            document:
                              type: object
                              description: >-
                                Parent document information for the matched
                                chunk.
                              properties:
                                id:
                                  type: string
                                  description: Unique identifier of the document.
                                data_source_type:
                                  type: string
                                  description: Source the document was created from, such as `upload_file`.
                                name:
                                  type: string
                                  description: Document name.
                                doc_type:
                                  type: string
                                  nullable: true
                                  description: >-
                                    Document type classification. `null` if not
                                    set.
                                doc_metadata:
                                  type: object
                                  nullable: true
                                  description: >-
                                    Metadata values for the document. `null` if
                                    no metadata is configured.
                        child_chunks:
                          type: array
                          description: >-
                            Matched child chunks within the chunk, if using
                            hierarchical indexing.
                          items:
                            type: object
                            properties:
                              id:
                                type: string
                                description: Unique identifier of the child chunk.
                              content:
                                type: string
                                description: Text content of the child chunk.
                              position:
                                type: integer
                                description: >-
                                  Position of the child chunk within the parent
                                  chunk.
                              score:
                                type: number
                                description: Similarity score of the child chunk.
                        score:
                          type: number
                          description: Similarity score of the matched chunk.
                        tsne_position:
                          type: object
                          nullable: true
                          description: t-SNE visualization position.
                        files:
                          type: array
                          description: Files attached to this chunk.
                          items:
                            type: object
                            properties:
                              id:
                                type: string
                                description: Attachment file identifier.
                              name:
                                type: string
                                description: Original file name.
                              size:
                                type: integer
                                description: File size in bytes.
                              extension:
                                type: string
                                description: File extension.
                              mime_type:
                                type: string
                                description: MIME type of the file.
                              source_url:
                                type: string
                                description: URL to access the attachment.
                        summary:
                          type: string
                          nullable: true
                          description: Summary content if retrieved via summary index.
              examples:
                success:
                  summary: Response Example
                  value:
                    query:
                      content: What is Dify?
                    records:
                      - segment:
                          id: f3d1c7be-9f3a-40d8-8eb8-3a1ef9c3f2c1
                          position: 1
                          document_id: a8e0e5b5-78c6-4130-a5ce-25feb0e0b4ac
                          content: Dify is an open-source LLM app development platform.
                          sign_content: ''
                          answer: ''
                          word_count: 9
                          tokens: 12
                          keywords:
                            - dify
                            - platform
                            - llm
                          index_node_id: a1b2c3d4-e5f6-7890-abcd-000000000001
                          index_node_hash: abc123def456
                          hit_count: 1
                          enabled: true
                          disabled_at: null
                          disabled_by: null
                          status: completed
                          created_by: ad313dd6-ef04-4dd1-a5b0-c0f0b9e2e7e4
                          created_at: 1741267200
                          indexing_at: 1741267200
                          completed_at: 1741267200
                          error: null
                          stopped_at: null
                          document:
                            id: a8e0e5b5-78c6-4130-a5ce-25feb0e0b4ac
                            data_source_type: upload_file
                            name: guide.txt
                            doc_type: null
                            doc_metadata: null
                        child_chunks: []
                        score: 0.92
                        tsne_position: null
                        files: []
                        summary: null
        '400':
          description: >-
            - `dataset_not_initialized` : The dataset is still being initialized
            or indexing. Please wait a moment.

            - `provider_not_initialize` : No valid model provider credentials
            found. Please go to Settings -> Model Provider to complete your
            provider credentials.

            - `provider_quota_exceeded` : Your quota for Dify Hosted OpenAI has
            been exhausted. Please go to Settings -> Model Provider to complete
            your own provider credentials.

            - `model_currently_not_support` : The Dify Hosted OpenAI trial does
            not currently support the GPT-4 model.

            - `completion_request_error` : Completion request failed.

            - `invalid_param` : Invalid parameter value.
          content:
            application/json:
              examples:
                dataset_not_initialized:
                  summary: dataset_not_initialized
                  value:
                    status: 400
                    code: dataset_not_initialized
                    message: >-
                      The dataset is still being initialized or indexing. Please
                      wait a moment.
                provider_not_initialize:
                  summary: provider_not_initialize
                  value:
                    status: 400
                    code: provider_not_initialize
                    message: >-
                      No valid model provider credentials found. Please go to
                      Settings -> Model Provider to complete your provider
                      credentials.
                provider_quota_exceeded:
                  summary: provider_quota_exceeded
                  value:
                    status: 400
                    code: provider_quota_exceeded
                    message: >-
                      Your quota for Dify Hosted OpenAI has been exhausted.
                      Please go to Settings -> Model Provider to complete your
                      own provider credentials.
                model_currently_not_support:
                  summary: model_currently_not_support
                  value:
                    status: 400
                    code: model_currently_not_support
                    message: >-
                      Dify Hosted OpenAI trial currently not support the GPT-4
                      model.
                completion_request_error:
                  summary: completion_request_error
                  value:
                    status: 400
                    code: completion_request_error
                    message: Completion request failed.
                invalid_param:
                  summary: invalid_param
                  value:
                    status: 400
                    code: invalid_param
                    message: Invalid parameter value.
        '403':
          description: '`forbidden` : Insufficient permissions.'
          content:
            application/json:
              examples:
                forbidden:
                  summary: forbidden
                  value:
                    status: 403
                    code: forbidden
                    message: Insufficient permissions.
        '404':
          description: '`not_found` : Knowledge base not found.'
          content:
            application/json:
              examples:
                not_found:
                  summary: not_found
                  value:
                    status: 404
                    code: not_found
                    message: Dataset not found.
        '500':
          description: >-
            `internal_server_error` : An internal error occurred during
            retrieval.
          content:
            application/json:
              examples:
                internal_server_error:
                  summary: internal_server_error
                  value:
                    status: 500
                    code: internal_server_error
                    message: An internal error occurred.
components:
  schemas:
    RetrievalModel:
      type: object
      required:
        - search_method
        - reranking_enable
        - top_k
        - score_threshold_enabled
      properties:
        search_method:
          type: string
          description: Search method used for retrieval.
          enum:
            - keyword_search
            - semantic_search
            - full_text_search
            - hybrid_search
        reranking_enable:
          type: boolean
          description: Whether reranking is enabled.
        reranking_model:
          type: object
          description: Reranking model configuration.
          properties:
            reranking_provider_name:
              type: string
              description: Provider name of the reranking model.
            reranking_model_name:
              type: string
              description: Name of the reranking model.
        reranking_mode:
          type: string
          enum:
            - reranking_model
            - weighted_score
          nullable: true
          description: Reranking mode. Required when `reranking_enable` is `true`.
        top_k:
          type: integer
          description: Maximum number of results to return.
        score_threshold_enabled:
          type: boolean
          description: Whether score threshold filtering is enabled.
        score_threshold:
          type: number
          nullable: true
          description: >-
            Minimum similarity score for results. Only effective when
            `score_threshold_enabled` is `true`.
        weights:
          type: object
          nullable: true
          description: Weight configuration for hybrid search.
          properties:
            weight_type:
              type: string
              description: Strategy for balancing semantic and keyword search weights.
              enum:
                - semantic_first
                - keyword_first
                - customized
            vector_setting:
              type: object
              description: Semantic search weight settings.
              properties:
                vector_weight:
                  type: number
                  description: Weight assigned to semantic (vector) search results.
                embedding_provider_name:
                  type: string
                  description: Provider of the embedding model used for vector search.
                embedding_model_name:
                  type: string
                  description: Name of the embedding model used for vector search.
            keyword_setting:
              type: object
              description: Keyword search weight settings.
              properties:
                keyword_weight:
                  type: number
                  description: Weight assigned to keyword search results.
        metadata_filtering_conditions:
          type: object
          nullable: true
          description: >-
            Restrict retrieval to chunks whose document metadata matches the
            given conditions. Conditions are evaluated server-side against
            document metadata fields.
          properties:
            logical_operator:
              type: string
              enum:
                - and
                - or
              default: and
              nullable: true
              description: How to combine multiple conditions.
            conditions:
              type: array
              nullable: true
              description: List of metadata conditions to evaluate.
              items:
                type: object
                required:
                  - name
                  - comparison_operator
                properties:
                  name:
                    type: string
                    description: Metadata field name to compare against.
                  comparison_operator:
                    type: string
                    description: >-
                      Comparison to apply. String operators (`contains`, `not
                      contains`, `start with`, `end with`, `is`, `is not`,
                      `empty`, `not empty`, `in`, `not in`) act on string or
                      array metadata. Numeric operators (`=`, `≠`, `>`, `<`,
                      `≥`, `≤`) act on numeric metadata. Time operators
                      (`before`, `after`) act on time metadata.
                    enum:
                      - contains
                      - not contains
                      - start with
                      - end with
                      - is
                      - is not
                      - empty
                      - not empty
                      - in
                      - not in
                      - '='
                      - ≠
                      - '>'
                      - <
                      - ≥
                      - ≤
                      - before
                      - after
                  value:
                    nullable: true
                    description: >-
                      Value to compare against. Type depends on
                      `comparison_operator`: string for most string operators,
                      array of strings for `in` and `not in`, number for numeric
                      operators, and omitted for `empty` and `not empty`.
                    oneOf:
                      - type: string
                      - type: array
                        items:
                          type: string
                      - type: number
  securitySchemes:
    ApiKeyAuth:
      type: http
      scheme: bearer
      bearerFormat: API_KEY
      description: >-
        API Key authentication. For all API requests, include your API Key in
        the `Authorization` HTTP Header, prefixed with `Bearer `. Example:
        `Authorization: Bearer {API_KEY}`. **We strongly recommend storing your
        API Key on the server side and never sharing or storing it on the
        client side, to avoid API Key leakage that can lead to serious
        consequences.**

````
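
The `RetrievalModel` schema above has four required fields and an optional `metadata_filtering_conditions` object whose `value` type depends on the operator. The sketch below shows one way a client could assemble such a payload and catch obvious type mismatches before sending. The helpers `condition` and `retrieval_model` and the metadata field names `category` and `tags` are hypothetical and exist only for this illustration. The sketch keeps reranking disabled and does not validate values for the string or time operators, since the schema does not pin those down precisely.

```python
import json

SEARCH_METHODS = {"keyword_search", "semantic_search", "full_text_search", "hybrid_search"}
NO_VALUE_OPERATORS = {"empty", "not empty"}          # `value` is omitted
LIST_OPERATORS = {"in", "not in"}                    # `value` is an array of strings
NUMERIC_OPERATORS = {"=", "≠", ">", "<", "≥", "≤"}   # `value` is a number


def condition(name, comparison_operator, value=None):
    """Build one metadata condition, checking the value types the schema spells out."""
    if comparison_operator in NO_VALUE_OPERATORS:
        return {"name": name, "comparison_operator": comparison_operator}
    if comparison_operator in LIST_OPERATORS:
        if not (isinstance(value, list) and all(isinstance(v, str) for v in value)):
            raise TypeError(f"'{comparison_operator}' expects a list of strings")
    elif comparison_operator in NUMERIC_OPERATORS:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"'{comparison_operator}' expects a number")
    # Other operators (string and time comparisons) are passed through unchecked.
    return {"name": name, "comparison_operator": comparison_operator, "value": value}


def retrieval_model(search_method, top_k, score_threshold=None,
                    conditions=None, logical_operator="and"):
    """Assemble a `retrieval_model` object with reranking disabled."""
    if search_method not in SEARCH_METHODS:
        raise ValueError(f"unknown search_method: {search_method}")
    model = {
        "search_method": search_method,
        "reranking_enable": False,
        "top_k": top_k,
        "score_threshold_enabled": score_threshold is not None,
    }
    if score_threshold is not None:
        model["score_threshold"] = score_threshold
    if conditions:
        model["metadata_filtering_conditions"] = {
            "logical_operator": logical_operator,
            "conditions": list(conditions),
        }
    return model


# Hypothetical metadata fields `category` and `tags`.
example = retrieval_model(
    "semantic_search",
    3,
    score_threshold=0.5,
    conditions=[condition("category", "is", "guide"), condition("tags", "in", ["faq", "intro"])],
)
print(json.dumps(example, indent=2, ensure_ascii=False))
```

The resulting dictionary can be passed as the `retrieval_model` property of the request body. Enabling reranking would additionally require `reranking_mode`, which this sketch leaves out.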