# Create Document from Text

> Creates a document from text content. The document is processed asynchronously. To track its progress, pass the returned `batch` ID to [Get Document Embedding Status (Progress)](/api-reference/ドキュメント/ドキュメント埋め込みステータス（進捗）を取得).
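
The asynchronous flow above can be sketched in Python using only the standard library. This is a minimal example under stated assumptions, not an official Dify client.

- It builds the `create-by-text` request without sending it, then reads `batch` from a response body.
- The base URL is the default from the spec's `servers` entry.
- The helper names are hypothetical.
- The status-endpoint path (`/datasets/{dataset_id}/documents/{batch}/indexing-status`) is an assumption. Verify it against the linked status page.

```python
import json
import urllib.request

# Default base URL from the spec's `servers` entry; replace it for self-hosted deployments.
BASE_URL = "https://api.dify.ai/v1"


def build_create_by_text_request(
    api_key: str, dataset_id: str, name: str, text: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request to create-by-text. Hypothetical helper."""
    body = {
        "name": name,  # required
        "text": text,  # required
        # Required for the first document in a knowledge base; later documents inherit it if omitted.
        "indexing_technique": "high_quality",
        # Use the built-in chunking rules.
        "process_rule": {"mode": "automatic"},
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/datasets/{dataset_id}/document/create-by-text",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_batch(response_body: bytes) -> str:
    """Read the `batch` ID from a 200 response body."""
    return json.loads(response_body)["batch"]


def indexing_status_url(dataset_id: str, batch: str) -> str:
    """ASSUMPTION: path of the embedding-status endpoint; check the linked page."""
    return f"{BASE_URL}/datasets/{dataset_id}/documents/{batch}/indexing-status"


if __name__ == "__main__":
    req = build_create_by_text_request("YOUR_API_KEY", "YOUR_DATASET_ID", "guide", "Hello, Dify.")
    # Sending is left to the caller, for example:
    # with urllib.request.urlopen(req) as resp:
    #     batch = extract_batch(resp.read())
    print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes the payload easy to inspect and test. Retries, and polling of the status endpoint, can then be layered on by the caller.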


## OpenAPI

````yaml /ja/api-reference/openapi_knowledge.json post /datasets/{dataset_id}/document/create-by-text
openapi: 3.0.1
info:
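  title: Knowledge API
  description: >-
    API for managing knowledge bases, documents, chunks, metadata, and tags,
    including creation, retrieval, and configuration. **Note:** A single
    knowledge base API key can operate on every visible knowledge base under
    the same account, so take care to keep your data secure.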
  version: 1.0.0
servers:
  - url: https://{api_base_url}
    description: Base URL for the Knowledge API. For self-hosted deployments, replace it with your own API base URL.
    variables:
      api_base_url:
        default: api.dify.ai/v1
        description: Host and path of the API base URL, without the `https://` prefix.
security:
  - ApiKeyAuth: []
tags:
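  - name: データセット
    description: Operations for managing knowledge bases, including creating, configuring, and retrieving them.
  - name: ドキュメント
    description: Operations for creating, updating, and managing documents within a knowledge base.
  - name: チャンク
    description: Operations for managing document chunks and child chunks.
  - name: メタデータ
    description: Operations for managing knowledge base metadata fields and document metadata values.
  - name: タグ管理
    description: Operations for managing knowledge base tags and tag bindings.
  - name: モデル
    description: Operations for retrieving available models.
  - name: ナレッジパイプライン
    description: Operations for managing and running knowledge pipelines, including data source plugins and pipeline execution.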
paths:
  /datasets/{dataset_id}/document/create-by-text:
    post:
      tags:
        - ドキュメント
      summary: Create Document from Text
      description: >-
        Creates a document from text content. The document is processed
        asynchronously. To track its progress, pass the returned `batch` ID to
        [Get Document Embedding Status (Progress)](/api-reference/ドキュメント/ドキュメント埋め込みステータス（進捗）を取得).
      operationId: createDocumentFromText
      parameters:
        - name: dataset_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
          description: Knowledge base ID.
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required:
                - name
                - text
              properties:
                name:
                  type: string
                  description: Document name.
                text:
                  type: string
                  description: Text content of the document.
                indexing_technique:
                  type: string
                  enum:
                    - high_quality
                    - economy
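                  description: >-
                    Required when adding the first document to a knowledge base.
                    For subsequent documents, omitting it makes the document
                    inherit the knowledge base's indexing technique.
                    `high_quality` uses an embedding model for precise
                    retrieval, while `economy` uses keyword-based indexing.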
                doc_form:
                  type: string
                  enum:
                    - text_model
                    - hierarchical_model
                    - qa_model
                  default: text_model
                  description: >-
                    `text_model` uses standard text chunking, `hierarchical_model`
                    uses a parent-child chunk structure, and `qa_model` extracts
                    question-answer pairs.
                doc_language:
                  type: string
                  default: English
                  description: Language of the document, used to optimize processing.
                process_rule:
                  type: object
                  description: Processing rules for chunking.
                  required:
                    - mode
                  properties:
                    mode:
                      type: string
                      enum:
                        - automatic
                        - custom
                        - hierarchical
                      description: >-
                        Processing mode. `automatic` uses built-in rules,
                        `custom` allows manual configuration, and `hierarchical`
                        enables the parent-child chunk structure (use it together
                        with `doc_form: hierarchical_model`).
                    rules:
                      type: object
                      properties:
                        pre_processing_rules:
                          type: array
                          items:
                            type: object
                            properties:
                              id:
                                type: string
                                enum:
                                  - remove_stopwords
                                  - remove_extra_spaces
                                  - remove_urls_emails
                                description: Rule identifier.
                              enabled:
                                type: boolean
                                description: Whether this preprocessing rule is enabled.
                        segmentation:
                          type: object
                          properties:
                            separator:
                              type: string
                              default: "\n"
                              description: Custom separator used to split the text.
                            max_tokens:
                              type: integer
                              description: Maximum number of tokens per chunk.
                            chunk_overlap:
                              type: integer
                              default: 0
                              description: Number of overlapping tokens between adjacent chunks.
                retrieval_model:
                  $ref: '#/components/schemas/RetrievalModel'
                  description: Retrieval model settings. Controls how chunks are retrieved and ranked when this knowledge base is queried.
                embedding_model:
                  type: string
                  description: >-
                    Embedding model name. Use the value of the `model` field
                    returned by [Get Available Models](/api-reference/モデル/利用可能なモデルを取得)
                    when called with `model_type=text-embedding`.
                embedding_model_provider:
                  type: string
                  description: >-
                    Embedding model provider. Use the value of the `provider`
                    field returned by [Get Available Models](/api-reference/モデル/利用可能なモデルを取得)
                    when called with `model_type=text-embedding`.
                original_document_id:
                  type: string
                  description: ID of the original document, used for versioning.
      responses:
        '200':
          description: Document created successfully.
          content:
            application/json:
              schema:
                type: object
                properties:
                  document:
                    $ref: '#/components/schemas/Document'
                  batch:
                    type: string
                    description: Batch ID used to track indexing progress.
              examples:
                success:
                  summary: レスポンス例
                  value:
                    document:
                      id: a8e0e5b5-78c6-4130-a5ce-25feb0e0b4ac
                      position: 1
                      data_source_type: upload_file
                      data_source_info:
                        upload_file_id: a1b2c3d4-e5f6-7890-abcd-ef1234567890
                      data_source_detail_dict:
                        upload_file:
                          id: a1b2c3d4-e5f6-7890-abcd-ef1234567890
                          name: guide.txt
                          size: 2048
                          extension: txt
                          mime_type: text/plain
                          created_by: ad313dd6-ef04-4dd1-a5b0-c0f0b9e2e7e4
                          created_at: 1741267200
                      dataset_process_rule_id: e1f2a3b4-c5d6-7890-ef12-345678901234
                      name: guide.txt
                      created_from: api
                      created_by: ad313dd6-ef04-4dd1-a5b0-c0f0b9e2e7e4
                      created_at: 1741267200
                      tokens: 0
                      indexing_status: indexing
                      error: null
                      enabled: true
                      disabled_at: null
                      disabled_by: null
                      archived: false
                      display_status: indexing
                      word_count: 0
                      hit_count: 0
                      doc_form: text_model
                      doc_metadata: []
                      summary_index_status: null
                      need_summary: false
                    batch: '20250306150245647595'
        '400':
          description: >-
            - `provider_not_initialize`: No valid model provider credentials
            were found. Complete your credentials under Settings → Model
            Provider.

            - `invalid_param`: The knowledge base does not exist. /
            `indexing_technique` is required. / `doc_form` is invalid (it must
            be one of `text_model`, `hierarchical_model`, or `qa_model`).
          content:
            application/json:
              examples:
                provider_not_initialize:
                  summary: provider_not_initialize
                  value:
                    status: 400
                    code: provider_not_initialize
                    message: >-
                      No valid model provider credentials found. Please go to
                      Settings -> Model Provider to complete your provider
                      credentials.
                invalid_param_dataset:
                  summary: invalid_param
                  value:
                    status: 400
                    code: invalid_param
                    message: Dataset does not exist.
                invalid_param_indexing:
                  summary: invalid_param
                  value:
                    status: 400
                    code: invalid_param
                    message: indexing_technique is required.
components:
  schemas:
    RetrievalModel:
      type: object
      required:
        - search_method
        - reranking_enable
        - top_k
        - score_threshold_enabled
      properties:
        search_method:
          type: string
          description: Search method used for retrieval.
          enum:
            - keyword_search
            - semantic_search
            - full_text_search
            - hybrid_search
        reranking_enable:
          type: boolean
          description: Whether reranking is enabled.
        reranking_model:
          type: object
          description: Reranking model settings.
          properties:
            reranking_provider_name:
              type: string
              description: Provider name of the reranking model.
            reranking_model_name:
              type: string
              description: Name of the reranking model.
        reranking_mode:
          type: string
          enum:
            - reranking_model
            - weighted_score
          nullable: true
          description: Reranking mode. Required when `reranking_enable` is `true`.
        top_k:
          type: integer
          description: Maximum number of results to return.
        score_threshold_enabled:
          type: boolean
          description: Whether score-threshold filtering is enabled.
        score_threshold:
          type: number
          nullable: true
          description: Minimum relevance score for results. Takes effect only when `score_threshold_enabled` is `true`.
        weights:
          type: object
          nullable: true
          description: Weight settings for hybrid search.
          properties:
            weight_type:
              type: string
              description: Strategy for balancing the weights of semantic search and keyword search.
              enum:
                - semantic_first
                - keyword_first
                - customized
            vector_setting:
              type: object
              description: Weight settings for semantic search.
              properties:
                vector_weight:
                  type: number
                  description: Weight assigned to semantic (vector) search results.
                embedding_provider_name:
                  type: string
                  description: Provider of the embedding model used for vector search.
                embedding_model_name:
                  type: string
                  description: Name of the embedding model used for vector search.
            keyword_setting:
              type: object
              description: Weight settings for keyword search.
              properties:
                keyword_weight:
                  type: number
                  description: Weight assigned to keyword search results.
        metadata_filtering_conditions:
          type: object
          nullable: true
          description: Restricts retrieval to chunks whose document metadata matches the specified conditions. Conditions are evaluated on the server side.
          properties:
            logical_operator:
              type: string
              enum:
                - and
                - or
              default: and
              nullable: true
              description: How multiple conditions are combined.
            conditions:
              type: array
              nullable: true
              description: List of metadata conditions to evaluate.
              items:
                type: object
                required:
                  - name
                  - comparison_operator
                properties:
                  name:
                    type: string
                    description: Name of the metadata field to compare against.
                  comparison_operator:
                    type: string
                    description: >-
                      Comparison to apply. String operators (`contains`,
                      `not contains`, `start with`, `end with`, `is`, `is not`,
                      `empty`, `not empty`, `in`, `not in`) work on string or
                      array metadata. Numeric operators (`=`, `≠`, `>`, `<`,
                      `≥`, `≤`) work on numeric metadata. Time operators
                      (`before`, `after`) work on time metadata.
                    enum:
                      - contains
                      - not contains
                      - start with
                      - end with
                      - is
                      - is not
                      - empty
                      - not empty
                      - in
                      - not in
                      - '='
                      - ≠
                      - '>'
                      - <
                      - ≥
                      - ≤
                      - before
                      - after
                  value:
                    nullable: true
                    description: >-
                      Value to compare against. Its type depends on
                      `comparison_operator`. Use a string for most string
                      operators, an array of strings for `in` and `not in`, and a
                      number for numeric operators. Omit it for `empty` and
                      `not empty`.
                    oneOf:
                      - type: string
                      - type: array
                        items:
                          type: string
                      - type: number
    Document:
      type: object
      properties:
        id:
          type: string
          description: Unique identifier of the document.
        position:
          type: integer
          description: Display position of the document in the list.
        data_source_type:
          type: string
          description: >-
            How the document was created. `upload_file` indicates a file upload,
            and `notion_import` indicates a Notion import.
        data_source_info:
          type: object
          description: Raw data source information. Its contents vary with `data_source_type`.
        data_source_detail_dict:
          type: object
          description: Detailed data source information, including file details.
        dataset_process_rule_id:
          type: string
          description: ID of the processing rule applied to this document.
        name:
          type: string
          description: Document name.
        created_from:
          type: string
          description: Origin of the document. `api` if it was created through the API, `web` if it was created in the UI.
        created_by:
          type: string
          description: ID of the user who created the document.
        created_at:
          type: number
          description: Creation timestamp (Unix epoch, in seconds).
        tokens:
          type: integer
          description: Total number of tokens in the document.
        indexing_status:
          type: string
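          description: >-
            Current indexing status. `waiting` means queued, `parsing` means the
            content is being extracted, `cleaning` means noise is being removed,
            `splitting` means chunking is in progress, and `indexing` means
            vectors are being built. `completed` means the document is ready,
            `error` means indexing failed, and `paused` means it was paused
            manually.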
        error:
          type: string
          nullable: true
          description: Error message if indexing failed, or `null` if there was no error.
        enabled:
          type: boolean
          description: Whether this document is enabled for retrieval.
        disabled_at:
          type: number
          nullable: true
          description: Timestamp when the document was disabled, or `null` if it is enabled.
        disabled_by:
          type: string
          nullable: true
          description: ID of the user who disabled the document, or `null` if it is enabled.
        archived:
          type: boolean
          description: Whether the document is archived.
        display_status:
          type: string
          description: User-facing display status derived from `indexing_status` and the `enabled` state.
        word_count:
          type: integer
          description: Total word count of the document.
        hit_count:
          type: integer
          description: Number of times the document has matched a retrieval query.
        doc_form:
          type: string
          description: >-
            Chunking mode of the document. `text_model` indicates standard text
            chunking, `hierarchical_model` a parent-child structure, and
            `qa_model` QA-pair extraction.
        doc_metadata:
          type: array
          description: Metadata values assigned to this document.
          items:
            type: object
            properties:
              id:
                type: string
                description: Identifier of the metadata field.
              name:
                type: string
                description: Name of the metadata field.
              type:
                type: string
                description: Value type of the metadata field.
              value:
                type: string
                description: Metadata value for this document.
        summary_index_status:
          type: string
          nullable: true
          description: Status of this document's summary index, or `null` if no summary index is configured.
        need_summary:
          type: boolean
          description: Whether a summary needs to be generated for this document.
  securitySchemes:
    ApiKeyAuth:
      type: http
      scheme: bearer
      bearerFormat: API_KEY
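      description: >-
        API key authentication. Include your API key in the `Authorization`
        HTTP header of every API request, with the `Bearer ` prefix, for
        example `Authorization: Bearer {API_KEY}`. **We strongly recommend
        storing the API key on the server side and never sharing or storing it
        on the client side. A leaked API key can have serious consequences.**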

````
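
The `RetrievalModel` schema above states several cross-field rules that a client may want to check before sending a request:

- Four fields are required.
- `reranking_mode` is required when `reranking_enable` is `true`.
- `score_threshold` takes effect only when `score_threshold_enabled` is `true`.
- The type of a metadata condition's `value` depends on its `comparison_operator`.

The Python sketch below encodes only those rules. `check_retrieval_model` is a hypothetical helper, not part of any Dify SDK. The server remains the authority on what it accepts.

```python
from typing import Any

# Enum values copied from the RetrievalModel schema in this spec.
SEARCH_METHODS = {"keyword_search", "semantic_search", "full_text_search", "hybrid_search"}
RERANKING_MODES = {"reranking_model", "weighted_score"}
REQUIRED_FIELDS = ("search_method", "reranking_enable", "top_k", "score_threshold_enabled")

# Operator groups, per the `comparison_operator` and `value` descriptions.
NO_VALUE_OPS = {"empty", "not empty"}
LIST_VALUE_OPS = {"in", "not in"}
NUMERIC_OPS = {"=", "≠", ">", "<", "≥", "≤"}


def check_retrieval_model(rm: dict[str, Any]) -> list[str]:
    """Hypothetical client-side check. Returns issues found; an empty list means none detected."""
    issues = [f"missing required field: {f}" for f in REQUIRED_FIELDS if f not in rm]

    if "search_method" in rm and rm["search_method"] not in SEARCH_METHODS:
        issues.append(f"unknown search_method: {rm['search_method']}")

    # reranking_mode is required when reranking is enabled.
    if rm.get("reranking_enable") is True and rm.get("reranking_mode") not in RERANKING_MODES:
        issues.append("reranking_mode is required when reranking_enable is true")

    # score_threshold only takes effect when score_threshold_enabled is true.
    if rm.get("score_threshold") is not None and rm.get("score_threshold_enabled") is not True:
        issues.append("score_threshold is ignored unless score_threshold_enabled is true")

    # The expected `value` type depends on the comparison operator.
    filters = rm.get("metadata_filtering_conditions") or {}
    for cond in filters.get("conditions") or []:
        op = cond.get("comparison_operator")
        value = cond.get("value")
        if op in NO_VALUE_OPS and value is not None:
            issues.append(f"'{op}' takes no value")
        elif op in LIST_VALUE_OPS and not isinstance(value, list):
            issues.append(f"'{op}' expects an array of strings")
        elif op in NUMERIC_OPS and not isinstance(value, (int, float)):
            issues.append(f"'{op}' expects a number")

    return issues


if __name__ == "__main__":
    example = {
        "search_method": "semantic_search",
        "reranking_enable": True,
        "top_k": 3,
        "score_threshold_enabled": False,
    }
    print(check_retrieval_model(example))
```

Returning a list of issues, instead of raising on the first one, lets a caller report every problem in a request body at once.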