
# Update Document by Text

> Update an existing document's text content, name, or processing configuration. If the content changes, indexing is re-triggered; use the returned `batch` ID with [Get Document Indexing Status](/api-reference/documents/get-document-indexing-status) to track progress.
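
As a rough illustration of the request this endpoint expects, the sketch below builds (but does not send) the POST request with Python's standard library. The helper name `build_update_request`, the API key, and the dataset ID are placeholders chosen for this example and are not part of any official SDK. The base URL is the default server from the spec, and the client-side check mirrors the documented rule that `name` is required when `text` is provided.

```python
import json
import urllib.request

# Default server URL taken from the spec's `servers` entry.
DEFAULT_BASE_URL = "https://api.dify.ai/v1"


def build_update_request(api_key, dataset_id, document_id, *, name=None,
                         text=None, process_rule=None,
                         base_url=DEFAULT_BASE_URL):
    """Build (but do not send) a POST request for update-by-text.

    Hypothetical helper for illustration only.
    """
    # The spec states that `name` is required whenever `text` is provided.
    if text is not None and not name:
        raise ValueError("name is required when text is provided")

    # Only include the fields the caller actually set.
    payload = {}
    if name is not None:
        payload["name"] = name
    if text is not None:
        payload["text"] = text
    if process_rule is not None:
        payload["process_rule"] = process_rule

    url = (f"{base_url}/datasets/{dataset_id}"
           f"/documents/{document_id}/update-by-text")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_update_request(
    "YOUR_API_KEY",                          # placeholder API key
    "11111111-1111-1111-1111-111111111111",  # placeholder dataset_id
    "a8e0e5b5-78c6-4130-a5ce-25feb0e0b4ac",  # document ID from the response example
    name="guide.txt",
    text="Updated guide content.",
    process_rule={"mode": "automatic"},
)
```

Passing `req` to `urllib.request.urlopen` would send it. On success, the JSON body should contain the updated `document` and a `batch` ID, as described in the spec.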



## OpenAPI

````yaml /en/api-reference/openapi_knowledge.json post /datasets/{dataset_id}/documents/{document_id}/update-by-text
openapi: 3.0.1
info:
  title: Knowledge API
  description: >-
    API for managing knowledge bases, documents, chunks, metadata, and tags,
    including creation, retrieval, and configuration. **Note:** A single
    Knowledge Base API key has permission to operate on all visible knowledge
    bases under the same account. Please pay attention to data security.
  version: 1.0.0
servers:
  - url: '{apiBaseUrl}'
    description: The base URL for the Knowledge API.
    variables:
      apiBaseUrl:
        default: https://api.dify.ai/v1
        description: Actual base URL of the API
security:
  - ApiKeyAuth: []
tags:
  - name: Knowledge Bases
    description: >-
      Operations for managing knowledge bases, including creation,
      configuration, and retrieval.
  - name: Documents
    description: >-
      Operations for creating, updating, and managing documents within a
      knowledge base.
  - name: Chunks
    description: Operations for managing document chunks and child chunks.
  - name: Metadata
    description: >-
      Operations for managing knowledge base metadata fields and document
      metadata values.
  - name: Tags
    description: Operations for managing knowledge base tags and tag bindings.
  - name: Models
    description: Operations for retrieving available models.
  - name: Knowledge Pipeline
    description: >-
      Operations for managing and running knowledge pipelines, including
      datasource plugins and pipeline execution.
paths:
  /datasets/{dataset_id}/documents/{document_id}/update-by-text:
    post:
      tags:
        - Documents
      summary: Update Document by Text
      description: >-
        Update an existing document's text content, name, or processing
        configuration. If the content changes, indexing is re-triggered; use
        the returned `batch` ID with [Get Document Indexing
        Status](/api-reference/documents/get-document-indexing-status) to track
        progress.
      operationId: updateDocumentByText
      parameters:
        - name: dataset_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
          description: Knowledge base ID.
        - name: document_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
          description: Document ID.
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                name:
                  type: string
                  description: Document name. Required when `text` is provided.
                text:
                  type: string
                  description: Document text content.
                process_rule:
                  type: object
                  description: Processing rules for chunking.
                  required:
                    - mode
                  properties:
                    mode:
                      type: string
                      enum:
                        - automatic
                        - custom
                        - hierarchical
                      description: >-
                        Processing mode. `automatic` uses built-in rules,
                        `custom` allows manual configuration, `hierarchical`
                        enables parent-child chunk structure (use with
                        `doc_form: hierarchical_model`).
                    rules:
                      type: object
                      properties:
                        pre_processing_rules:
                          type: array
                          items:
                            type: object
                            properties:
                              id:
                                type: string
                                enum:
                                  - remove_stopwords
                                  - remove_extra_spaces
                                  - remove_urls_emails
                                description: Rule identifier.
                              enabled:
                                type: boolean
                                description: Whether this preprocessing rule is enabled.
                        segmentation:
                          type: object
                          properties:
                            separator:
                              type: string
                              default: |+

                              description: Custom separator for splitting text.
                            max_tokens:
                              type: integer
                              description: Maximum token count per chunk.
                            chunk_overlap:
                              type: integer
                              default: 0
                              description: Token overlap between chunks.
                doc_form:
                  type: string
                  enum:
                    - text_model
                    - hierarchical_model
                    - qa_model
                  default: text_model
                  description: >-
                    `text_model` for standard text chunking,
                    `hierarchical_model` for parent-child chunk structure,
                    `qa_model` for question-answer pair extraction.
                doc_language:
                  type: string
                  default: English
                  description: Language of the document for processing optimization.
                retrieval_model:
                  $ref: '#/components/schemas/RetrievalModel'
                  description: >-
                    Retrieval model configuration. Controls how chunks are
                    searched and ranked when querying this knowledge base.
      responses:
        '200':
          description: Document updated successfully.
          content:
            application/json:
              schema:
                type: object
                properties:
                  document:
                    $ref: '#/components/schemas/Document'
                  batch:
                    type: string
                    description: Batch ID for tracking indexing progress.
              examples:
                success:
                  summary: Response Example
                  value:
                    document:
                      id: a8e0e5b5-78c6-4130-a5ce-25feb0e0b4ac
                      position: 1
                      data_source_type: upload_file
                      data_source_info:
                        upload_file_id: a1b2c3d4-e5f6-7890-abcd-ef1234567890
                      data_source_detail_dict:
                        upload_file:
                          id: a1b2c3d4-e5f6-7890-abcd-ef1234567890
                          name: guide.txt
                          size: 2048
                          extension: txt
                          mime_type: text/plain
                          created_by: ad313dd6-ef04-4dd1-a5b0-c0f0b9e2e7e4
                          created_at: 1741267200
                      dataset_process_rule_id: e1f2a3b4-c5d6-7890-ef12-345678901234
                      name: guide.txt
                      created_from: api
                      created_by: ad313dd6-ef04-4dd1-a5b0-c0f0b9e2e7e4
                      created_at: 1741267200
                      tokens: 512
                      indexing_status: completed
                      error: null
                      enabled: true
                      disabled_at: null
                      disabled_by: null
                      archived: false
                      display_status: available
                      word_count: 350
                      hit_count: 0
                      doc_form: text_model
                      doc_metadata: []
                      summary_index_status: null
                      need_summary: false
                    batch: '20250306150245647595'
        '400':
          description: >-
            - `provider_not_initialize`: No valid model provider credentials
            found. Please go to Settings -> Model Provider to complete your
            provider credentials.

            - `invalid_param`: The knowledge base does not exist, `name` is
            missing when `text` is provided, or `doc_form` is invalid (must be
            `text_model`, `hierarchical_model`, or `qa_model`).
          content:
            application/json:
              examples:
                provider_not_initialize:
                  summary: provider_not_initialize
                  value:
                    status: 400
                    code: provider_not_initialize
                    message: >-
                      No valid model provider credentials found. Please go to
                      Settings -> Model Provider to complete your provider
                      credentials.
                invalid_param_dataset:
                  summary: invalid_param
                  value:
                    status: 400
                    code: invalid_param
                    message: Dataset does not exist.
                invalid_param_name_required:
                  summary: invalid_param (name required)
                  value:
                    status: 400
                    code: invalid_param
                    message: name is required when text is provided.
components:
  schemas:
    RetrievalModel:
      type: object
      required:
        - search_method
        - reranking_enable
        - top_k
        - score_threshold_enabled
      properties:
        search_method:
          type: string
          description: Search method used for retrieval.
          enum:
            - keyword_search
            - semantic_search
            - full_text_search
            - hybrid_search
        reranking_enable:
          type: boolean
          description: Whether reranking is enabled.
        reranking_model:
          type: object
          description: Reranking model configuration.
          properties:
            reranking_provider_name:
              type: string
              description: Provider name of the reranking model.
            reranking_model_name:
              type: string
              description: Name of the reranking model.
        reranking_mode:
          type: string
          enum:
            - reranking_model
            - weighted_score
          nullable: true
          description: Reranking mode. Required when `reranking_enable` is `true`.
        top_k:
          type: integer
          description: Maximum number of results to return.
        score_threshold_enabled:
          type: boolean
          description: Whether score threshold filtering is enabled.
        score_threshold:
          type: number
          nullable: true
          description: >-
            Minimum similarity score for results. Only effective when
            `score_threshold_enabled` is `true`.
        weights:
          type: object
          nullable: true
          description: Weight configuration for hybrid search.
          properties:
            weight_type:
              type: string
              description: Strategy for balancing semantic and keyword search weights.
              enum:
                - semantic_first
                - keyword_first
                - customized
            vector_setting:
              type: object
              description: Semantic search weight settings.
              properties:
                vector_weight:
                  type: number
                  description: Weight assigned to semantic (vector) search results.
                embedding_provider_name:
                  type: string
                  description: Provider of the embedding model used for vector search.
                embedding_model_name:
                  type: string
                  description: Name of the embedding model used for vector search.
            keyword_setting:
              type: object
              description: Keyword search weight settings.
              properties:
                keyword_weight:
                  type: number
                  description: Weight assigned to keyword search results.
        metadata_filtering_conditions:
          type: object
          nullable: true
          description: >-
            Restrict retrieval to chunks whose document metadata matches the
            given conditions. Conditions are evaluated server-side against
            document metadata fields.
          properties:
            logical_operator:
              type: string
              enum:
                - and
                - or
              default: and
              nullable: true
              description: How to combine multiple conditions.
            conditions:
              type: array
              nullable: true
              description: List of metadata conditions to evaluate.
              items:
                type: object
                required:
                  - name
                  - comparison_operator
                properties:
                  name:
                    type: string
                    description: Metadata field name to compare against.
                  comparison_operator:
                    type: string
                    description: >-
                      Comparison to apply. String operators (`contains`, `not
                      contains`, `start with`, `end with`, `is`, `is not`,
                      `empty`, `not empty`, `in`, `not in`) act on string or
                      array metadata. Numeric operators (`=`, `≠`, `>`, `<`,
                      `≥`, `≤`) act on numeric metadata. Time operators
                      (`before`, `after`) act on time metadata.
                    enum:
                      - contains
                      - not contains
                      - start with
                      - end with
                      - is
                      - is not
                      - empty
                      - not empty
                      - in
                      - not in
                      - '='
                      - ≠
                      - '>'
                      - <
                      - ≥
                      - ≤
                      - before
                      - after
                  value:
                    nullable: true
                    description: >-
                      Value to compare against. Type depends on
                      `comparison_operator`: string for most string operators,
                      array of strings for `in` and `not in`, number for numeric
                      operators, and omitted for `empty` and `not empty`.
                    oneOf:
                      - type: string
                      - type: array
                        items:
                          type: string
                      - type: number
    Document:
      type: object
      properties:
        id:
          type: string
          description: Unique identifier of the document.
        position:
          type: integer
          description: Display position of the document in the list.
        data_source_type:
          type: string
          description: >-
            How the document was created. `upload_file` for file uploads,
            `notion_import` for Notion imports.
        data_source_info:
          type: object
          description: Raw data source information, varies by `data_source_type`.
        data_source_detail_dict:
          type: object
          description: Detailed data source information including file details.
        dataset_process_rule_id:
          type: string
          description: ID of the processing rule applied to this document.
        name:
          type: string
          description: Document name.
        created_from:
          type: string
          description: >-
            Origin of the document. `api` for API creation, `web` for UI
            creation.
        created_by:
          type: string
          description: ID of the user who created the document.
        created_at:
          type: number
          description: Creation timestamp (Unix epoch in seconds).
        tokens:
          type: integer
          description: Total number of tokens in the document.
        indexing_status:
          type: string
          description: >-
            Current indexing status. `waiting` for queued, `parsing` while
            extracting content, `cleaning` while removing noise, `splitting`
            while chunking, `indexing` while building vectors, `completed` when
            ready, `error` if failed, `paused` if manually paused.
        error:
          type: string
          nullable: true
          description: Error message if indexing failed. `null` when no error.
        enabled:
          type: boolean
          description: Whether the document is enabled for retrieval.
        disabled_at:
          type: number
          nullable: true
          description: Timestamp when the document was disabled. `null` if enabled.
        disabled_by:
          type: string
          nullable: true
          description: ID of the user who disabled the document. `null` if enabled.
        archived:
          type: boolean
          description: Whether the document is archived.
        display_status:
          type: string
          description: >-
            User-facing display status derived from `indexing_status` and
            `enabled` state.
        word_count:
          type: integer
          description: Total word count of the document.
        hit_count:
          type: integer
          description: Number of times the document has been matched in retrieval queries.
        doc_form:
          type: string
          description: >-
            Document chunking mode. `text_model` for standard text chunking,
            `hierarchical_model` for parent-child structure, `qa_model` for QA
            pair extraction.
        doc_metadata:
          type: array
          description: Metadata values assigned to this document.
          items:
            type: object
            properties:
              id:
                type: string
                description: Metadata field identifier.
              name:
                type: string
                description: Metadata field name.
              type:
                type: string
                description: Metadata field value type.
              value:
                type: string
                description: Metadata value for this document.
        summary_index_status:
          type: string
          nullable: true
          description: >-
            Status of the summary index for this document. `null` if summary
            indexing is not configured.
        need_summary:
          type: boolean
          description: Whether a summary needs to be generated for this document.
  securitySchemes:
    ApiKeyAuth:
      type: http
      scheme: bearer
      bearerFormat: API_KEY
      description: >-
        API Key authentication. For all API requests, include your API Key in
        the `Authorization` HTTP header, prefixed with `Bearer `. Example:
        `Authorization: Bearer {API_KEY}`. **We strongly recommend storing your
        API Key on the server side rather than sharing or storing it on the
        client side, to avoid API Key leakage that can lead to serious
        consequences.**

````
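
To make the documented response shapes concrete, here is a minimal sketch of how a client might interpret them. `summarize_update_response` is a hypothetical helper, and the sample bodies are trimmed copies of the examples in the spec above. It assumes that success bodies carry `document` and `batch`, and that the documented 400 bodies carry `code` and `message`.

```python
import json


def summarize_update_response(status, body):
    """Summarize a response from update-by-text.

    Hypothetical helper: `status` is the HTTP status code and `body`
    is the raw JSON text of the response.
    """
    data = json.loads(body)
    if status == 200:
        # Success bodies carry the updated `document` plus a `batch` ID
        # that can be used to track indexing progress.
        doc = data.get("document", {})
        return {
            "ok": True,
            "batch": data.get("batch"),
            "document_id": doc.get("id"),
            "indexing_status": doc.get("indexing_status"),
        }
    # The documented 400 bodies carry `code` and `message`.
    return {
        "ok": False,
        "code": data.get("code"),
        "message": data.get("message"),
    }


success = summarize_update_response(200, json.dumps({
    "document": {
        "id": "a8e0e5b5-78c6-4130-a5ce-25feb0e0b4ac",
        "indexing_status": "completed",
    },
    "batch": "20250306150245647595",
}))

failure = summarize_update_response(400, json.dumps({
    "status": 400,
    "code": "invalid_param",
    "message": "Dataset does not exist.",
}))
```

Branching on `code` rather than on the message text is likely the more robust choice, since the spec lists several distinct messages under the same `invalid_param` code.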