⚠️ This document was automatically translated by AI. If anything is inaccurate, refer to the original English version.
This guide covers two multimodal output types: multimodal-Parent-Child and multimodal-General.
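When you develop a tool plugin for multimodal data processing, the Knowledge Base node can correctly recognize and vectorize the plugin's multimodal output (text, images, audio, video, and so on) only if you complete the following configuration:
- In the tool code, call the upload interface and construct the file objects (files).
- In the tool provider YAML file, declare output_schema as multimodal-Parent-Child or multimodal-General.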
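Upload and construct file objects
When processing multimodal data such as images, first upload the file through Dify's tool session interface to obtain the file metadata. The following example, taken from the official Dify plugin Dify Extractor, shows how to upload a file and construct a file object.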
# Upload the file using the tool session
file_res = self._tool.session.file.upload(
    file_name,  # filename
    file_blob,  # file binary data
    mime_type,  # MIME type, e.g., "image/png"
)
# Generate a Markdown image reference using the file preview URL
image_url = f"![image]({file_res.preview_url})"
The upload call returns an UploadFileResponse object that contains basic information about the file:
from enum import Enum

from pydantic import BaseModel


class UploadFileResponse(BaseModel):
    class Type(str, Enum):
        DOCUMENT = "document"
        IMAGE = "image"
        VIDEO = "video"
        AUDIO = "audio"

        @classmethod
        def from_mime_type(cls, mime_type: str):
            if mime_type.startswith("image/"):
                return cls.IMAGE
            if mime_type.startswith("video/"):
                return cls.VIDEO
            if mime_type.startswith("audio/"):
                return cls.AUDIO
            return cls.DOCUMENT

    id: str
    name: str
    size: int
    extension: str
    mime_type: str
    type: Type | None = None
    preview_url: str | None = None
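To illustrate how the nested Type enum classifies an upload, the sketch below reproduces only the enum using the standard library and runs it on a few MIME types. The name FileType is a stand-in chosen for this example and is not an SDK identifier.

```python
from enum import Enum


class FileType(str, Enum):
    """Stand-in for UploadFileResponse.Type, reproduced for illustration."""

    DOCUMENT = "document"
    IMAGE = "image"
    VIDEO = "video"
    AUDIO = "audio"

    @classmethod
    def from_mime_type(cls, mime_type: str) -> "FileType":
        # Match on the top-level MIME category; anything else is a document.
        if mime_type.startswith("image/"):
            return cls.IMAGE
        if mime_type.startswith("video/"):
            return cls.VIDEO
        if mime_type.startswith("audio/"):
            return cls.AUDIO
        return cls.DOCUMENT


# Classify a few sample MIME types.
detected = {
    mime: FileType.from_mime_type(mime).value
    for mime in ("image/png", "audio/mpeg", "application/pdf")
}
# detected == {"image/png": "image", "audio/mpeg": "audio", "application/pdf": "document"}
```

Any MIME type outside image/, video/, and audio/ falls through to DOCUMENT, so unknown formats are treated as documents rather than rejected.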
Map the file information (name, size, extension, mime_type, and so on) to the files field of the multimodal output structure.
{
  "$id": "https://dify.ai/schemas/v1/multimodal_parent_child_structure.json",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "version": "1.0.0",
  "type": "object",
  "title": "Multimodal Parent-Child Structure",
  "description": "Schema for multimodal parent-child structure (v1)",
  "properties": {
    "parent_mode": {
      "type": "string",
      "description": "The mode of parent-child relationship"
    },
    "parent_child_chunks": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "parent_content": {
            "type": "string",
            "description": "The parent content"
          },
          "files": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "name": {
                  "type": "string",
                  "description": "file name"
                },
                "size": {
                  "type": "number",
                  "description": "file size"
                },
                "extension": {
                  "type": "string",
                  "description": "file extension"
                },
                "type": {
                  "type": "string",
                  "description": "file type"
                },
                "mime_type": {
                  "type": "string",
                  "description": "file mime type"
                },
                "transfer_method": {
                  "type": "string",
                  "description": "file transfer method"
                },
                "url": {
                  "type": "string",
                  "description": "file url"
                },
                "related_id": {
                  "type": "string",
                  "description": "file related id"
                }
              },
              "required": ["name", "size", "extension", "type", "mime_type", "transfer_method", "url", "related_id"]
            },
            "description": "List of files"
          },
          "child_contents": {
            "type": "array",
            "items": {
              "type": "string"
            },
            "description": "List of child contents"
          }
        },
        "required": ["parent_content", "child_contents"]
      },
      "description": "List of parent-child chunk pairs"
    }
  },
  "required": ["parent_mode", "parent_child_chunks"]
}
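The mapping from upload metadata to the files field can be sketched as follows. This is a minimal sketch under stated assumptions, not the Dify Extractor implementation: UploadedFile is a hypothetical stand-in for the upload response, and taking url from preview_url, related_id from id, the transfer_method value "tool_file", and the parent_mode value "paragraph" are all assumptions to verify against your Dify version.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class UploadedFile:
    """Hypothetical stand-in mirroring the UploadFileResponse fields."""

    id: str
    name: str
    size: int
    extension: str
    mime_type: str
    type: Optional[str] = None
    preview_url: Optional[str] = None


def to_file_entry(f: UploadedFile) -> dict[str, Any]:
    """Build one entry of the schema's files array from upload metadata."""
    return {
        "name": f.name,
        "size": f.size,
        "extension": f.extension,
        "type": f.type or "document",
        "mime_type": f.mime_type,
        "transfer_method": "tool_file",  # assumption: value not given in this document
        "url": f.preview_url or "",      # assumption: url comes from preview_url
        "related_id": f.id,              # assumption: related_id is the uploaded file id
    }


def build_result(parent: str, children: list[str], files: list[UploadedFile]) -> dict[str, Any]:
    """Wrap one parent chunk, its child chunks, and its files in the output structure."""
    return {
        "parent_mode": "paragraph",  # assumption: example value only
        "parent_child_chunks": [
            {
                "parent_content": parent,
                "child_contents": children,
                "files": [to_file_entry(f) for f in files],
            }
        ],
    }


uploaded = UploadedFile(
    id="file-123",
    name="chart.png",
    size=2048,
    extension=".png",
    mime_type="image/png",
    type="image",
    preview_url="https://example.com/files/file-123/preview",
)
result = build_result("Sales summary", ["Q1 grew.", "Q2 was flat."], [uploaded])
```

Keeping the mapping in one small function makes it easy to adjust if the assumed field sources turn out to differ in your environment.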
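Declare the multimodal output structure
The structure of multimodal data is defined by JSON Schemas that Dify provides. For the Knowledge Base node to recognize the plugin's multimodal output type, point the result field of output_schema in the plugin's provider YAML file to the corresponding official schema URL.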
output_schema:
  type: object
  properties:
    result:
      # multimodal-Parent-Child
      $ref: "https://dify.ai/schemas/v1/multimodal_parent_child_structure.json"
      # multimodal-General
      # $ref: "https://dify.ai/schemas/v1/multimodal_general_structure.json"
Taking multimodal-Parent-Child as an example, a complete YAML file looks like this:
identity:
  name: multimodal_tool
  author: langgenius
  label:
    en_US: multimodal tool
    zh_Hans: 多模态提取器
    pt_BR: multimodal tool
description:
  human:
    en_US: Process documents into multimodal-Parent-Child chunk structures
    zh_Hans: 将文档处理为多模态父子分块结构
    pt_BR: Processar documentos em estruturas de divisão pai-filho
  llm: Processes documents into hierarchical multimodal-Parent-Child chunk structures
parameters:
  - name: input_text
    human_description:
      en_US: The text you want to chunk.
      zh_Hans: 输入文本
      pt_BR: Conteúdo de Entrada
    label:
      en_US: Input Content
      zh_Hans: 输入文本
      pt_BR: Conteúdo de Entrada
    llm_description: The text you want to chunk.
    required: true
    type: string
    form: llm
output_schema:
  type: object
  properties:
    result:
      $ref: "https://dify.ai/schemas/v1/multimodal_parent_child_structure.json"
extra:
  python:
    source: tools/parent_child_chunk.py
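Before a tool returns its result, it can help to check the payload against the required fields of the multimodal-Parent-Child schema. The sketch below is a hand-rolled, standard-library check of only the required lists in that schema; it is not a full JSON Schema validator, and the function name missing_fields is hypothetical.

```python
from typing import Any

# Required keys copied from the multimodal-Parent-Child schema.
TOP_REQUIRED = ("parent_mode", "parent_child_chunks")
CHUNK_REQUIRED = ("parent_content", "child_contents")
FILE_REQUIRED = (
    "name", "size", "extension", "type",
    "mime_type", "transfer_method", "url", "related_id",
)


def missing_fields(result: dict[str, Any]) -> list[str]:
    """Return dotted paths of required fields that are absent from result."""
    missing = [key for key in TOP_REQUIRED if key not in result]
    for i, chunk in enumerate(result.get("parent_child_chunks", [])):
        prefix = f"parent_child_chunks[{i}]"
        missing += [f"{prefix}.{key}" for key in CHUNK_REQUIRED if key not in chunk]
        # "files" itself is optional per chunk, but each entry has required keys.
        for j, file in enumerate(chunk.get("files", [])):
            missing += [
                f"{prefix}.files[{j}].{key}" for key in FILE_REQUIRED if key not in file
            ]
    return missing


incomplete = {
    "parent_mode": "paragraph",  # example value, an assumption
    "parent_child_chunks": [
        {"parent_content": "Intro", "files": [{"name": "a.png", "size": 10}]}
    ],
}
problems = missing_fields(incomplete)
# problems lists the missing child_contents plus six missing file fields
```

For stricter checking (types, nested constraints), a general-purpose JSON Schema validator run against the official schema would be the more thorough option.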