# Create an Empty Knowledge Base

> Create a new, empty knowledge base. After creation, add documents with [Create Document from Text](/api-reference/文档/从文本创建文档) or [Create Document from File](/api-reference/文档/从文件创建文档).

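As a rough illustration of the request this endpoint expects, the sketch below builds (but does not send) a `POST /datasets` request using only the Python standard library. The helper name `build_create_dataset_request`, the placeholder API key, and the default base URL constant are assumptions made for this example; the field names, enum values, and length limits are taken from the request schema in the OpenAPI definition on this page.

```python
import json
import urllib.request

# Assumed default; self-hosted deployments would use their own base URL.
DEFAULT_BASE_URL = "https://api.dify.ai/v1"


def build_create_dataset_request(api_key, name, description="", permission="only_me",
                                 indexing_technique=None, base_url=DEFAULT_BASE_URL):
    """Build (without sending) a POST /datasets request. Hypothetical helper."""
    # Limits from the request schema: name is 1-40 characters, description up to 400.
    if not 1 <= len(name) <= 40:
        raise ValueError("name must be 1 to 40 characters long")
    if len(description) > 400:
        raise ValueError("description must be at most 400 characters long")
    if permission not in ("only_me", "all_team_members", "partial_members"):
        raise ValueError("unsupported permission value")

    payload = {"name": name, "description": description, "permission": permission}
    # indexing_technique is optional, so it is only sent when provided.
    if indexing_technique is not None:
        if indexing_technique not in ("high_quality", "economy"):
            raise ValueError("unsupported indexing_technique value")
        payload["indexing_technique"] = indexing_technique

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/datasets",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually send the request, pass the returned object to `urllib.request.urlopen`; that call is left out so the sketch runs without network access or a real key.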

## OpenAPI

````yaml /zh/api-reference/openapi_knowledge.json post /datasets
openapi: 3.0.1
info:
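  title: Knowledge Base API
  description: >-
    API for managing knowledge bases, documents, chunks, metadata, and tags,
    including creation, retrieval, and configuration operations. **Note:** A
    single knowledge base API key can operate on every knowledge base visible
    under the same account. Keep the key secure.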
  version: 1.0.0
servers:
  - url: https://{api_base_url}
    description: Base URL of the Knowledge API. For self-hosted deployments, replace it with your own API base URL.
    variables:
      api_base_url:
        default: api.dify.ai/v1
        description: Host and path of the API base URL, without the `https://` prefix.
security:
  - ApiKeyAuth: []
tags:
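  - name: Knowledge Bases
    description: Operations for managing knowledge bases, including creation, configuration, and retrieval.
  - name: Documents
    description: Operations for creating, updating, and managing documents in a knowledge base.
  - name: Chunks
    description: Operations for managing chunks and child chunks.
  - name: Metadata
    description: Operations for managing knowledge base metadata fields and document metadata values.
  - name: Tags
    description: Operations for managing knowledge base tags and tag bindings.
  - name: Models
    description: Operations for retrieving available models.
  - name: Knowledge Pipeline
    description: Operations for managing and running knowledge pipelines, including data source plugins and pipeline execution.
paths:
  /datasets:
    post:
      tags:
        - Knowledge Bases
      summary: Create an Empty Knowledge Base
      description: >-
        Create a new, empty knowledge base. After creation, add documents with
        [Create Document from Text](/api-reference/文档/从文本创建文档) or
        [Create Document from File](/api-reference/文档/从文件创建文档).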
      operationId: createDataset
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required:
                - name
              properties:
                name:
                  type: string
                  minLength: 1
                  maxLength: 40
                  description: Name of the knowledge base.
                description:
                  type: string
                  maxLength: 400
                  default: ''
                  description: Description of the knowledge base.
                indexing_technique:
                  type: string
                  enum:
                    - high_quality
                    - economy
                  nullable: true
                  description: '`high_quality` uses an embedding model for precise search; `economy` uses keyword-based indexing.'
                permission:
                  type: string
                  enum:
                    - only_me
                    - all_team_members
                    - partial_members
                  default: only_me
                  description: >-
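                    Controls who can access this knowledge base. `only_me` restricts access to the creator, `all_team_members`
                    grants access to the whole workspace, and `partial_members` grants access to specified members.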
                provider:
                  type: string
                  enum:
                    - vendor
                    - external
                  default: vendor
                  description: '`vendor` is an internal knowledge base; `external` is an external knowledge base.'
                embedding_model:
                  type: string
                  description: >-
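                    Name of the embedding model. Use the `model` field value returned by
                    [Get Available Models](/api-reference/模型/获取可用模型) with `model_type=text-embedding`.
                embedding_model_provider:
                  type: string
                  description: >-
                    Provider of the embedding model. Use the `provider` field value returned by
                    [Get Available Models](/api-reference/模型/获取可用模型) with `model_type=text-embedding`.
                retrieval_model:
                  $ref: '#/components/schemas/RetrievalModel'
                  description: Retrieval model configuration. Controls how chunks are searched and ranked when this knowledge base is queried.
                external_knowledge_api_id:
                  type: string
                  description: ID of the external knowledge API connection.
                external_knowledge_id:
                  type: string
                  description: ID of the external knowledge base.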
                summary_index_setting:
                  type: object
                  nullable: true
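                  description: Summary index configuration.
                  properties:
                    enable:
                      type: boolean
                      description: Whether to enable the summary index.
                    model_name:
                      type: string
                      description: Name of the model used to generate summaries.
                    model_provider_name:
                      type: string
                      description: Provider of the summary generation model.
                    summary_prompt:
                      type: string
                      description: Custom prompt template used for summary generation.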
      responses:
        '200':
          description: Knowledge base created successfully.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Dataset'
              examples:
                success:
                  summary: Response example
                  value:
                    id: c42e2a6e-40b3-4330-96f8-f1e4d768e8c9
                    name: Product Documentation
                    description: Technical documentation for the product API
                    provider: vendor
                    permission: only_me
                    data_source_type: null
                    indexing_technique: high_quality
                    app_count: 0
                    document_count: 0
                    word_count: 0
                    created_by: ad313dd6-ef04-4dd1-a5b0-c0f0b9e2e7e4
                    author_name: admin
                    created_at: 1741267200
                    updated_by: ad313dd6-ef04-4dd1-a5b0-c0f0b9e2e7e4
                    updated_at: 1741267200
                    embedding_model: text-embedding-3-small
                    embedding_model_provider: openai
                    embedding_available: true
                    retrieval_model_dict:
                      search_method: semantic_search
                      reranking_enable: false
                      reranking_mode: null
                      reranking_model:
                        reranking_provider_name: ''
                        reranking_model_name: ''
                      weights: null
                      top_k: 3
                      score_threshold_enabled: false
                      score_threshold: null
                    tags: []
                    doc_form: text_model
                    external_knowledge_info: null
                    external_retrieval_model: null
                    doc_metadata: []
                    built_in_field_enabled: true
                    pipeline_id: null
                    runtime_mode: null
                    chunk_structure: null
                    icon_info: null
                    summary_index_setting: null
                    is_published: false
                    total_documents: 0
                    total_available_documents: 0
                    enable_api: true
                    is_multimodal: false
        '409':
          description: '`dataset_name_duplicate`: The knowledge base name already exists. Please choose a different name.'
          content:
            application/json:
              examples:
                dataset_name_duplicate:
                  summary: dataset_name_duplicate
                  value:
                    status: 409
                    code: dataset_name_duplicate
                    message: >-
                      The dataset name already exists. Please modify your
                      dataset name.
components:
  schemas:
    RetrievalModel:
      type: object
      required:
        - search_method
        - reranking_enable
        - top_k
        - score_threshold_enabled
      properties:
        search_method:
          type: string
          description: Search method used for retrieval.
          enum:
            - keyword_search
            - semantic_search
            - full_text_search
            - hybrid_search
        reranking_enable:
          type: boolean
          description: Whether reranking is enabled.
        reranking_model:
          type: object
          description: Reranking model configuration.
          properties:
            reranking_provider_name:
              type: string
              description: Provider name of the reranking model.
            reranking_model_name:
              type: string
              description: Name of the reranking model.
        reranking_mode:
          type: string
          enum:
            - reranking_model
            - weighted_score
          nullable: true
          description: Reranking mode. Required when `reranking_enable` is `true`.
        top_k:
          type: integer
          description: Maximum number of results to return.
        score_threshold_enabled:
          type: boolean
          description: Whether score threshold filtering is enabled.
        score_threshold:
          type: number
          nullable: true
          description: Minimum relevance score for results. Takes effect only when `score_threshold_enabled` is `true`.
        weights:
          type: object
          nullable: true
          description: Weight configuration for hybrid search.
          properties:
            weight_type:
              type: string
              description: Strategy for balancing the weights of semantic search and keyword search.
              enum:
                - semantic_first
                - keyword_first
                - customized
            vector_setting:
              type: object
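              description: Semantic search weight settings.
              properties:
                vector_weight:
                  type: number
                  description: Weight assigned to semantic (vector) search results.
                embedding_provider_name:
                  type: string
                  description: Provider of the embedding model used for vector search.
                embedding_model_name:
                  type: string
                  description: Name of the embedding model used for vector search.
            keyword_setting:
              type: object
              description: Keyword search weight settings.
              properties:
                keyword_weight:
                  type: number
                  description: Weight assigned to keyword search results.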
        metadata_filtering_conditions:
          type: object
          nullable: true
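          description: Retrieve only chunks whose document metadata matches the specified conditions. Conditions are evaluated server-side against document metadata fields.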
          properties:
            logical_operator:
              type: string
              enum:
                - and
                - or
              default: and
              nullable: true
              description: How multiple conditions are combined.
            conditions:
              type: array
              nullable: true
              description: List of metadata conditions to evaluate.
              items:
                type: object
                required:
                  - name
                  - comparison_operator
                properties:
                  name:
                    type: string
                    description: Name of the metadata field to compare.
                  comparison_operator:
                    type: string
                    description: >-
                      Comparison to apply. String operators (`contains`, `not contains`, `start with`,
                      `end with`, `is`, `is not`, `empty`, `not empty`, `in`, `not in`)
                      apply to string or array metadata. Numeric operators (`=`, `≠`, `>`, `<`, `≥`, `≤`)
                      apply to numeric metadata. Time operators (`before`, `after`) apply to time metadata.
                    enum:
                      - contains
                      - not contains
                      - start with
                      - end with
                      - is
                      - is not
                      - empty
                      - not empty
                      - in
                      - not in
                      - '='
                      - ≠
                      - '>'
                      - <
                      - ≥
                      - ≤
                      - before
                      - after
                  value:
                    nullable: true
                    description: >-
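                      Value to compare against. Its type depends on `comparison_operator`. Most string operators take a string; `in` and
                      `not in` take an array of strings; numeric operators take a number; omit the value for `empty` and `not empty`.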
                    oneOf:
                      - type: string
                      - type: array
                        items:
                          type: string
                      - type: number
    Dataset:
      type: object
      properties:
        id:
          type: string
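          description: Unique identifier of the knowledge base.
        name:
          type: string
          description: Display name of the knowledge base. Unique within the workspace.
        description:
          type: string
          description: Optional text describing the purpose or contents of the knowledge base.
        provider:
          type: string
          description: Provider type. `vendor` for internally managed knowledge bases, `external` for external knowledge base connections.
        permission:
          type: string
          description: Controls who can access this knowledge base. Possible values are `only_me`, `all_team_members`, and `partial_members`.
        data_source_type:
          type: string
          description: Data source type of the documents, or `null` if not yet configured.
        indexing_technique:
          type: string
          description: '`high_quality` uses an embedding model for precise search; `economy` uses keyword-based indexing.'
        app_count:
          type: integer
          description: Number of apps currently using this knowledge base.
        document_count:
          type: integer
          description: Total number of documents in the knowledge base.
        word_count:
          type: integer
          description: Total word count across all documents.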
        created_by:
          type: string
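          description: ID of the user who created the knowledge base.
        author_name:
          type: string
          description: Display name of the creator.
        created_at:
          type: number
          description: Creation timestamp (Unix epoch, in seconds).
        updated_by:
          type: string
          description: ID of the user who last updated the knowledge base.
        updated_at:
          type: number
          description: Last update timestamp (Unix epoch, in seconds).
        embedding_model:
          type: string
          description: Name of the embedding model used for indexing.
        embedding_model_provider:
          type: string
          description: >-
            Provider of the embedding model. Corresponds to the `provider` field value returned by
            [Get Available Models](/api-reference/模型/获取可用模型) with `model_type=text-embedding`.
        embedding_available:
          type: boolean
          description: Whether the configured embedding model is currently available.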
        retrieval_model_dict:
          type: object
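          description: Retrieval configuration of the knowledge base.
          properties:
            search_method:
              type: string
              description: >-
                Search method used for retrieval. `keyword_search` is keyword matching, `semantic_search`
                is embedding-based semantic similarity, `full_text_search` is full-text indexing, and `hybrid_search`
                combines semantic and keyword search.
            reranking_enable:
              type: boolean
              description: Whether reranking is enabled.
            reranking_mode:
              type: string
              nullable: true
              description: >-
                Reranking mode. `reranking_model` is model-based reranking; `weighted_score`
                is score-based weighting. `null` when reranking is disabled.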
            reranking_model:
              type: object
              description: Reranking model configuration.
              properties:
                reranking_provider_name:
                  type: string
                  description: Provider name of the reranking model.
                reranking_model_name:
                  type: string
                  description: Name of the reranking model.
            weights:
              type: object
              nullable: true
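              description: Weight configuration for hybrid search.
              properties:
                weight_type:
                  type: string
                  description: Strategy for balancing the weights of semantic search and keyword search.
                vector_setting:
                  type: object
                  description: Semantic search weight settings.
                  properties:
                    vector_weight:
                      type: number
                      description: Weight assigned to semantic (vector) search results.
                    embedding_provider_name:
                      type: string
                      description: Provider of the embedding model used for vector search.
                    embedding_model_name:
                      type: string
                      description: Name of the embedding model used for vector search.
                keyword_setting:
                  type: object
                  description: Keyword search weight settings.
                  properties:
                    keyword_weight:
                      type: number
                      description: Weight assigned to keyword search results.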
            top_k:
              type: integer
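              description: Maximum number of results to return.
            score_threshold_enabled:
              type: boolean
              description: Whether score threshold filtering is enabled.
            score_threshold:
              type: number
              description: Minimum relevance score for results. Takes effect only when `score_threshold_enabled` is `true`.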
        summary_index_setting:
          type: object
          nullable: true
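          description: Summary index configuration.
          properties:
            enable:
              type: boolean
              description: Whether the summary index is enabled.
            model_name:
              type: string
              description: Name of the model used to generate summaries.
            model_provider_name:
              type: string
              description: Provider of the summary generation model.
            summary_prompt:
              type: string
              description: Prompt template used for summary generation.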
        tags:
          type: array
          description: Tags associated with this knowledge base.
          items:
            type: object
            properties:
              id:
                type: string
                description: Tag identifier.
              name:
                type: string
                description: Tag name.
              type:
                type: string
                description: Tag type. Always `knowledge` for knowledge base tags.
        doc_form:
          type: string
          description: >-
            Document chunking mode. `text_model` is standard text chunking, `hierarchical_model` is a parent-child structure, and `qa_model`
            is question-answer pair extraction.
        external_knowledge_info:
          type: object
          nullable: true
          description: Connection details of the external knowledge base. Present when `provider` is `external`.
          properties:
            external_knowledge_id:
              type: string
              description: ID of the external knowledge base.
            external_knowledge_api_id:
              type: string
              description: ID of the external knowledge API connection.
            external_knowledge_api_name:
              type: string
              description: Display name of the external knowledge API.
            external_knowledge_api_endpoint:
              type: string
              description: Endpoint URL of the external knowledge API.
        external_retrieval_model:
          type: object
          nullable: true
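          description: Retrieval settings for the external knowledge base. `null` for internal knowledge bases.
          properties:
            top_k:
              type: integer
              description: Maximum number of results returned from the external knowledge base.
            score_threshold:
              type: number
              description: Minimum relevance score threshold.
            score_threshold_enabled:
              type: boolean
              description: Whether score threshold filtering is enabled.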
        doc_metadata:
          type: array
          description: Metadata field definitions of the knowledge base.
          items:
            type: object
            properties:
              id:
                type: string
                description: Metadata field identifier.
              name:
                type: string
                description: Metadata field name.
              type:
                type: string
                description: Value type of the metadata field.
        built_in_field_enabled:
          type: boolean
          description: Whether built-in metadata fields (for example `document_name`, `uploader`) are enabled.
        pipeline_id:
          type: string
          nullable: true
          description: ID of the custom processing pipeline, if configured.
        runtime_mode:
          type: string
          nullable: true
          description: Runtime processing mode.
        chunk_structure:
          type: string
          nullable: true
          description: Chunk structure configuration.
        icon_info:
          type: object
          nullable: true
          description: Icon display configuration of the knowledge base.
          properties:
            icon_type:
              type: string
              description: Icon type.
            icon:
              type: string
              description: Icon identifier or emoji.
            icon_background:
              type: string
              description: Background color of the icon.
            icon_url:
              type: string
              description: URL of the custom icon image.
        is_published:
          type: boolean
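          description: Whether the knowledge base has been published.
        total_documents:
          type: integer
          description: Total number of documents.
        total_available_documents:
          type: integer
          description: Number of documents that are enabled and available.
        enable_api:
          type: boolean
          description: Whether API access is enabled for this knowledge base.
        is_multimodal:
          type: boolean
          description: Whether multimodal content processing is enabled.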
  securitySchemes:
    ApiKeyAuth:
      type: http
      scheme: bearer
      bearerFormat: API_KEY
      description: >-
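        API key authentication. For every API request, include your API key in the `Authorization` HTTP header with the
        `Bearer ` prefix. Example: `Authorization: Bearer {API_KEY}`. **We strongly recommend storing the API key
        on the server side and never sharing or storing it on the client, to avoid serious consequences if the key leaks.**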

````
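The `RetrievalModel` schema above marks four fields as required and notes that `reranking_mode` is required when `reranking_enable` is `true`, and the `409` response carries the error code `dataset_name_duplicate`. A minimal sketch of assembling such an object and recognizing that error body might look like the following. The function names `make_retrieval_model` and `is_duplicate_name_error` are hypothetical, and the checks are a client-side convenience only, not a description of what the server validates.

```python
import json

# Enum values copied from the RetrievalModel schema.
SEARCH_METHODS = {"keyword_search", "semantic_search", "full_text_search", "hybrid_search"}
RERANKING_MODES = {"reranking_model", "weighted_score"}


def make_retrieval_model(search_method, top_k, reranking_enable=False,
                         reranking_mode=None, score_threshold=None):
    """Assemble a retrieval_model dict with the four required fields. Hypothetical helper."""
    if search_method not in SEARCH_METHODS:
        raise ValueError("unknown search_method: " + repr(search_method))
    if reranking_enable and reranking_mode not in RERANKING_MODES:
        # The schema states reranking_mode is required when reranking is enabled.
        raise ValueError("reranking_mode is required when reranking_enable is true")
    model = {
        "search_method": search_method,
        "reranking_enable": reranking_enable,
        "top_k": top_k,
        # Threshold filtering is switched on only when a threshold was supplied.
        "score_threshold_enabled": score_threshold is not None,
    }
    if reranking_enable:
        model["reranking_mode"] = reranking_mode
    if score_threshold is not None:
        model["score_threshold"] = score_threshold
    return model


def is_duplicate_name_error(status, body_text):
    """Return True if a response matches the documented 409 dataset_name_duplicate error."""
    if status != 409:
        return False
    try:
        body = json.loads(body_text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return isinstance(body, dict) and body.get("code") == "dataset_name_duplicate"
```

The resulting dict can be placed under the `retrieval_model` key of the request body. Deriving `score_threshold_enabled` from the presence of a threshold is a design choice of this sketch; callers who want to send both fields independently can build the dict by hand.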