# Create an Empty Knowledge Base

> Create a new empty knowledge base. After creation, use [Create Document by Text](/api-reference/documents/create-document-by-text) or [Create Document by File](/api-reference/documents/create-document-by-file) to add documents.

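The request described above can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names `build_create_request` and `create_knowledge_base` are hypothetical, the base URL is the default server from the spec below, and the length checks mirror the `name` and `description` limits in the request schema.

```python
import json
import urllib.error
import urllib.request

# Default server URL from the spec; self-hosted deployments will differ.
API_BASE_URL = "https://api.dify.ai/v1"


def build_create_request(api_key: str, name: str, **optional_fields) -> urllib.request.Request:
    """Build (but do not send) a POST /datasets request."""
    # The request schema limits `name` to 1-40 characters.
    if not 1 <= len(name) <= 40:
        raise ValueError("name must be 1 to 40 characters long")
    # The request schema limits `description` to 400 characters.
    if len(optional_fields.get("description", "")) > 400:
        raise ValueError("description must be at most 400 characters long")
    payload = {"name": name, **optional_fields}
    return urllib.request.Request(
        url=f"{API_BASE_URL}/datasets",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_knowledge_base(api_key: str, name: str, **optional_fields) -> dict:
    """Send the request and return the parsed response body."""
    request = build_create_request(api_key, name, **optional_fields)
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        # The spec documents a 409 response with code `dataset_name_duplicate`.
        raw_body = err.read().decode("utf-8", errors="replace")
        raise RuntimeError(f"HTTP {err.code}: {raw_body}") from err
```

A call such as `create_knowledge_base(api_key, "Product Documentation", indexing_technique="high_quality")` would send the request. Reading `api_key` from a server-side environment variable (for example a hypothetical `DIFY_API_KEY`) keeps it out of client code.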


## OpenAPI

````yaml /en/api-reference/openapi_knowledge.json post /datasets
openapi: 3.0.1
info:
  title: Knowledge API
  description: >-
    API for managing knowledge bases, documents, chunks, metadata, and tags,
    including creation, retrieval, and configuration. **Note:** A single
    Knowledge Base API key can operate on all visible knowledge bases under
    the same account, so protect the key accordingly.
  version: 1.0.0
servers:
  - url: '{apiBaseUrl}'
    description: The base URL for the Knowledge API.
    variables:
      apiBaseUrl:
        default: https://api.dify.ai/v1
        description: Actual base URL of the API
security:
  - ApiKeyAuth: []
tags:
  - name: Knowledge Bases
    description: >-
      Operations for managing knowledge bases, including creation,
      configuration, and retrieval.
  - name: Documents
    description: >-
      Operations for creating, updating, and managing documents within a
      knowledge base.
  - name: Chunks
    description: Operations for managing document chunks and child chunks.
  - name: Metadata
    description: >-
      Operations for managing knowledge base metadata fields and document
      metadata values.
  - name: Tags
    description: Operations for managing knowledge base tags and tag bindings.
  - name: Models
    description: Operations for retrieving available models.
  - name: Knowledge Pipeline
    description: >-
      Operations for managing and running knowledge pipelines, including
      datasource plugins and pipeline execution.
paths:
  /datasets:
    post:
      tags:
        - Knowledge Bases
      summary: Create an Empty Knowledge Base
      description: >-
        Create a new empty knowledge base. After creation, use [Create Document
        by Text](/api-reference/documents/create-document-by-text) or [Create
        Document by File](/api-reference/documents/create-document-by-file) to
        add documents.
      operationId: createDataset
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required:
                - name
              properties:
                name:
                  type: string
                  minLength: 1
                  maxLength: 40
                  description: Name of the knowledge base.
                description:
                  type: string
                  maxLength: 400
                  default: ''
                  description: Description of the knowledge base.
                indexing_technique:
                  type: string
                  enum:
                    - high_quality
                    - economy
                  nullable: true
                  description: >-
                    `high_quality` uses embedding models for precise search;
                    `economy` uses keyword-based indexing.
                permission:
                  type: string
                  enum:
                    - only_me
                    - all_team_members
                    - partial_members
                  default: only_me
                  description: >-
                    Controls who can access this knowledge base. `only_me`
                    restricts to the creator, `all_team_members` grants access
                    to the entire workspace, `partial_members` grants access to
                    specified members.
                provider:
                  type: string
                  enum:
                    - vendor
                    - external
                  default: vendor
                  description: >-
                    `vendor` for internal knowledge base, `external` for
                    external knowledge base.
                embedding_model:
                  type: string
                  description: >-
                    Embedding model name. Use the `model` field from [Get
                    Available
                    Models](/api-reference/models/get-available-models) with
                    `model_type=text-embedding`.
                embedding_model_provider:
                  type: string
                  description: >-
                    Embedding model provider. Use the `provider` field from [Get
                    Available
                    Models](/api-reference/models/get-available-models) with
                    `model_type=text-embedding`.
                retrieval_model:
                  $ref: '#/components/schemas/RetrievalModel'
                  description: >-
                    Retrieval model configuration. Controls how chunks are
                    searched and ranked when querying this knowledge base.
                external_knowledge_api_id:
                  type: string
                  description: ID of the external knowledge API connection.
                external_knowledge_id:
                  type: string
                  description: ID of the external knowledge base.
                summary_index_setting:
                  type: object
                  nullable: true
                  description: Summary index configuration.
                  properties:
                    enable:
                      type: boolean
                      description: Whether to enable summary indexing.
                    model_name:
                      type: string
                      description: Name of the model used for generating summaries.
                    model_provider_name:
                      type: string
                      description: Provider of the summary generation model.
                    summary_prompt:
                      type: string
                      description: Custom prompt template for summary generation.
      responses:
        '200':
          description: Knowledge base created successfully.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Dataset'
              examples:
                success:
                  summary: Response Example
                  value:
                    id: c42e2a6e-40b3-4330-96f8-f1e4d768e8c9
                    name: Product Documentation
                    description: Technical documentation for the product API
                    provider: vendor
                    permission: only_me
                    data_source_type: null
                    indexing_technique: high_quality
                    app_count: 0
                    document_count: 0
                    word_count: 0
                    created_by: ad313dd6-ef04-4dd1-a5b0-c0f0b9e2e7e4
                    author_name: admin
                    created_at: 1741267200
                    updated_by: ad313dd6-ef04-4dd1-a5b0-c0f0b9e2e7e4
                    updated_at: 1741267200
                    embedding_model: text-embedding-3-small
                    embedding_model_provider: openai
                    embedding_available: true
                    retrieval_model_dict:
                      search_method: semantic_search
                      reranking_enable: false
                      reranking_mode: null
                      reranking_model:
                        reranking_provider_name: ''
                        reranking_model_name: ''
                      weights: null
                      top_k: 3
                      score_threshold_enabled: false
                      score_threshold: null
                    tags: []
                    doc_form: text_model
                    external_knowledge_info: null
                    external_retrieval_model: null
                    doc_metadata: []
                    built_in_field_enabled: true
                    pipeline_id: null
                    runtime_mode: null
                    chunk_structure: null
                    icon_info: null
                    summary_index_setting: null
                    is_published: false
                    total_documents: 0
                    total_available_documents: 0
                    enable_api: true
                    is_multimodal: false
        '409':
          description: >-
            `dataset_name_duplicate`: The dataset name already exists. Please
            modify your dataset name.
          content:
            application/json:
              examples:
                dataset_name_duplicate:
                  summary: dataset_name_duplicate
                  value:
                    status: 409
                    code: dataset_name_duplicate
                    message: >-
                      The dataset name already exists. Please modify your
                      dataset name.
components:
  schemas:
    RetrievalModel:
      type: object
      required:
        - search_method
        - reranking_enable
        - top_k
        - score_threshold_enabled
      properties:
        search_method:
          type: string
          description: Search method used for retrieval.
          enum:
            - keyword_search
            - semantic_search
            - full_text_search
            - hybrid_search
        reranking_enable:
          type: boolean
          description: Whether reranking is enabled.
        reranking_model:
          type: object
          description: Reranking model configuration.
          properties:
            reranking_provider_name:
              type: string
              description: Provider name of the reranking model.
            reranking_model_name:
              type: string
              description: Name of the reranking model.
        reranking_mode:
          type: string
          enum:
            - reranking_model
            - weighted_score
          nullable: true
          description: Reranking mode. Required when `reranking_enable` is `true`.
        top_k:
          type: integer
          description: Maximum number of results to return.
        score_threshold_enabled:
          type: boolean
          description: Whether score threshold filtering is enabled.
        score_threshold:
          type: number
          nullable: true
          description: >-
            Minimum similarity score for results. Only effective when
            `score_threshold_enabled` is `true`.
        weights:
          type: object
          nullable: true
          description: Weight configuration for hybrid search.
          properties:
            weight_type:
              type: string
              description: Strategy for balancing semantic and keyword search weights.
              enum:
                - semantic_first
                - keyword_first
                - customized
            vector_setting:
              type: object
              description: Semantic search weight settings.
              properties:
                vector_weight:
                  type: number
                  description: Weight assigned to semantic (vector) search results.
                embedding_provider_name:
                  type: string
                  description: Provider of the embedding model used for vector search.
                embedding_model_name:
                  type: string
                  description: Name of the embedding model used for vector search.
            keyword_setting:
              type: object
              description: Keyword search weight settings.
              properties:
                keyword_weight:
                  type: number
                  description: Weight assigned to keyword search results.
        metadata_filtering_conditions:
          type: object
          nullable: true
          description: >-
            Restrict retrieval to chunks whose document metadata matches the
            given conditions. Conditions are evaluated server-side against
            document metadata fields.
          properties:
            logical_operator:
              type: string
              enum:
                - and
                - or
              default: and
              nullable: true
              description: How to combine multiple conditions.
            conditions:
              type: array
              nullable: true
              description: List of metadata conditions to evaluate.
              items:
                type: object
                required:
                  - name
                  - comparison_operator
                properties:
                  name:
                    type: string
                    description: Metadata field name to compare against.
                  comparison_operator:
                    type: string
                    description: >-
                      Comparison to apply. String operators (`contains`, `not
                      contains`, `start with`, `end with`, `is`, `is not`,
                      `empty`, `not empty`, `in`, `not in`) act on string or
                      array metadata. Numeric operators (`=`, `≠`, `>`, `<`,
                      `≥`, `≤`) act on numeric metadata. Time operators
                      (`before`, `after`) act on time metadata.
                    enum:
                      - contains
                      - not contains
                      - start with
                      - end with
                      - is
                      - is not
                      - empty
                      - not empty
                      - in
                      - not in
                      - '='
                      - ≠
                      - '>'
                      - <
                      - ≥
                      - ≤
                      - before
                      - after
                  value:
                    nullable: true
                    description: >-
                      Value to compare against. Type depends on
                      `comparison_operator`: string for most string operators,
                      array of strings for `in` and `not in`, number for numeric
                      operators, and omitted for `empty` and `not empty`.
                    oneOf:
                      - type: string
                      - type: array
                        items:
                          type: string
                      - type: number
    Dataset:
      type: object
      properties:
        id:
          type: string
          description: Unique identifier of the knowledge base.
        name:
          type: string
          description: Display name of the knowledge base. Unique within the workspace.
        description:
          type: string
          description: >-
            Optional text describing the purpose or contents of the knowledge
            base.
        provider:
          type: string
          description: >-
            Provider type. `vendor` for internally managed, `external` for
            external knowledge base connections.
        permission:
          type: string
          description: >-
            Controls who can access this knowledge base. Possible values:
            `only_me`, `all_team_members`, `partial_members`.
        data_source_type:
          type: string
          nullable: true
          description: Data source type of the documents; `null` if not yet configured.
        indexing_technique:
          type: string
          description: >-
            `high_quality` uses embedding models for precise search; `economy`
            uses keyword-based indexing.
        app_count:
          type: integer
          description: Number of applications currently using this knowledge base.
        document_count:
          type: integer
          description: Total number of documents in the knowledge base.
        word_count:
          type: integer
          description: Total word count across all documents.
        created_by:
          type: string
          description: ID of the user who created the knowledge base.
        author_name:
          type: string
          description: Display name of the creator.
        created_at:
          type: number
          description: Creation timestamp (Unix epoch in seconds).
        updated_by:
          type: string
          description: ID of the user who last updated the knowledge base.
        updated_at:
          type: number
          description: Last update timestamp (Unix epoch in seconds).
        embedding_model:
          type: string
          description: Name of the embedding model used for indexing.
        embedding_model_provider:
          type: string
          description: Provider of the embedding model used for indexing.
        embedding_available:
          type: boolean
          description: Whether the configured embedding model is currently available.
        retrieval_model_dict:
          type: object
          description: Retrieval configuration for the knowledge base.
          properties:
            search_method:
              type: string
              description: >-
                Search method used for retrieval. `keyword_search` for keyword
                matching, `semantic_search` for embedding-based similarity,
                `full_text_search` for full-text indexing, `hybrid_search` for a
                combination of semantic and keyword approaches.
            reranking_enable:
              type: boolean
              description: Whether reranking is enabled.
            reranking_mode:
              type: string
              nullable: true
              description: >-
                Reranking mode. `reranking_model` for model-based reranking,
                `weighted_score` for score-based weighting. `null` if reranking
                is disabled.
            reranking_model:
              type: object
              description: Reranking model configuration.
              properties:
                reranking_provider_name:
                  type: string
                  description: Provider name of the reranking model.
                reranking_model_name:
                  type: string
                  description: Name of the reranking model.
            weights:
              type: object
              nullable: true
              description: Weight configuration for hybrid search.
              properties:
                weight_type:
                  type: string
                  description: Strategy for balancing semantic and keyword search weights.
                vector_setting:
                  type: object
                  description: Semantic search weight settings.
                  properties:
                    vector_weight:
                      type: number
                      description: Weight assigned to semantic (vector) search results.
                    embedding_provider_name:
                      type: string
                      description: Provider of the embedding model used for vector search.
                    embedding_model_name:
                      type: string
                      description: Name of the embedding model used for vector search.
                keyword_setting:
                  type: object
                  description: Keyword search weight settings.
                  properties:
                    keyword_weight:
                      type: number
                      description: Weight assigned to keyword search results.
            top_k:
              type: integer
              description: Maximum number of results to return.
            score_threshold_enabled:
              type: boolean
              description: Whether score threshold filtering is enabled.
            score_threshold:
              type: number
              nullable: true
              description: >-
                Minimum similarity score for results. Only effective when
                `score_threshold_enabled` is `true`.
        summary_index_setting:
          type: object
          nullable: true
          description: Summary index configuration.
          properties:
            enable:
              type: boolean
              description: Whether summary indexing is enabled.
            model_name:
              type: string
              description: Name of the model used for generating summaries.
            model_provider_name:
              type: string
              description: Provider of the summary generation model.
            summary_prompt:
              type: string
              description: Prompt template used for summary generation.
        tags:
          type: array
          description: Tags associated with this knowledge base.
          items:
            type: object
            properties:
              id:
                type: string
                description: Tag identifier.
              name:
                type: string
                description: Tag name.
              type:
                type: string
                description: Tag type. Always `knowledge` for knowledge base tags.
        doc_form:
          type: string
          description: >-
            Document chunking mode. `text_model` for standard text chunking,
            `hierarchical_model` for parent-child structure, `qa_model` for QA
            pair extraction.
        external_knowledge_info:
          type: object
          nullable: true
          description: >-
            Connection details for external knowledge bases. Present when
            `provider` is `external`.
          properties:
            external_knowledge_id:
              type: string
              description: ID of the external knowledge base.
            external_knowledge_api_id:
              type: string
              description: ID of the external knowledge API connection.
            external_knowledge_api_name:
              type: string
              description: Display name of the external knowledge API.
            external_knowledge_api_endpoint:
              type: string
              description: Endpoint URL of the external knowledge API.
        external_retrieval_model:
          type: object
          nullable: true
          description: >-
            Retrieval settings for external knowledge bases. `null` for internal
            knowledge bases.
          properties:
            top_k:
              type: integer
              description: >-
                Maximum number of results to return from the external knowledge
                base.
            score_threshold:
              type: number
              description: Minimum similarity score threshold.
            score_threshold_enabled:
              type: boolean
              description: Whether score threshold filtering is enabled.
        doc_metadata:
          type: array
          description: Metadata field definitions for the knowledge base.
          items:
            type: object
            properties:
              id:
                type: string
                description: Metadata field identifier.
              name:
                type: string
                description: Metadata field name.
              type:
                type: string
                description: Metadata field value type.
        built_in_field_enabled:
          type: boolean
          description: >-
            Whether built-in metadata fields (e.g., `document_name`, `uploader`)
            are enabled.
        pipeline_id:
          type: string
          nullable: true
          description: Pipeline ID, if a custom processing pipeline is configured.
        runtime_mode:
          type: string
          nullable: true
          description: Runtime processing mode.
        chunk_structure:
          type: string
          nullable: true
          description: Chunk structure configuration.
        icon_info:
          type: object
          nullable: true
          description: Icon display configuration for the knowledge base.
          properties:
            icon_type:
              type: string
              description: Type of icon.
            icon:
              type: string
              description: Icon identifier or emoji.
            icon_background:
              type: string
              description: Background color for the icon.
            icon_url:
              type: string
              description: URL of a custom icon image.
        is_published:
          type: boolean
          description: Whether the knowledge base is published.
        total_documents:
          type: integer
          description: Total number of documents.
        total_available_documents:
          type: integer
          description: Number of documents that are enabled and available.
        enable_api:
          type: boolean
          description: Whether API access is enabled for this knowledge base.
        is_multimodal:
          type: boolean
          description: Whether multimodal content processing is enabled.
  securitySchemes:
    ApiKeyAuth:
      type: http
      scheme: bearer
      bearerFormat: API_KEY
      description: >-
        API Key authentication. For all API requests, include your API Key in
        the `Authorization` HTTP header, prefixed with `Bearer `. Example:
        `Authorization: Bearer {API_KEY}`. **Store your API Key on the server
        side. Do not share it or store it on the client side, because a leaked
        API Key can have serious consequences.**

````
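
The `RetrievalModel` schema above has several interdependent fields, so it can help to assemble the `retrieval_model` object with a small helper before sending it. The sketch below is one possible approach: the function names `build_retrieval_model` and `build_condition` are hypothetical, and the checks only mirror constraints stated in the schema (required fields, enum values, and the value type expected by each `comparison_operator`). Server-side validation may be stricter or differ.

```python
from typing import Any, Optional

# Enum values copied from the RetrievalModel schema.
SEARCH_METHODS = {"keyword_search", "semantic_search", "full_text_search", "hybrid_search"}
RERANKING_MODES = {"reranking_model", "weighted_score"}
VALUELESS_OPERATORS = {"empty", "not empty"}
LIST_OPERATORS = {"in", "not in"}
NUMERIC_OPERATORS = {"=", "≠", ">", "<", "≥", "≤"}


def build_retrieval_model(
    search_method: str,
    top_k: int,
    reranking_enable: bool = False,
    reranking_mode: Optional[str] = None,
    score_threshold: Optional[float] = None,
) -> dict:
    """Return a dict containing the four required RetrievalModel fields."""
    if search_method not in SEARCH_METHODS:
        raise ValueError(f"unknown search_method: {search_method!r}")
    # The schema says reranking_mode is required when reranking is enabled.
    if reranking_enable and reranking_mode not in RERANKING_MODES:
        raise ValueError("reranking_mode is required when reranking_enable is true")
    model: dict = {
        "search_method": search_method,
        "reranking_enable": reranking_enable,
        "top_k": top_k,
        # Assumption: a threshold is enabled exactly when a value is supplied.
        "score_threshold_enabled": score_threshold is not None,
    }
    if reranking_mode is not None:
        model["reranking_mode"] = reranking_mode
    if score_threshold is not None:
        model["score_threshold"] = score_threshold
    return model


def build_condition(name: str, comparison_operator: str, value: Any = None) -> dict:
    """Return one metadata filtering condition with a type-checked value."""
    if comparison_operator in VALUELESS_OPERATORS:
        # `empty` and `not empty` take no value.
        return {"name": name, "comparison_operator": comparison_operator}
    if comparison_operator in LIST_OPERATORS:
        if not (isinstance(value, list) and all(isinstance(v, str) for v in value)):
            raise TypeError(f"{comparison_operator!r} expects a list of strings")
    elif comparison_operator in NUMERIC_OPERATORS:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{comparison_operator!r} expects a number")
    return {"name": name, "comparison_operator": comparison_operator, "value": value}
```

For example, `build_retrieval_model("semantic_search", top_k=3)` produces the four required fields with reranking and score thresholding disabled. Conditions built with `build_condition` can be placed in the `conditions` array of `metadata_filtering_conditions`.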