Create a `litellm_config.yaml` file for the LiteLLM proxy. Detailed docs on how to set up the LiteLLM config are available here.
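As a minimal sketch (the model name, upstream model, and environment variable below are placeholders to adapt to your own deployment), the config might look like this:

```yaml
# litellm_config.yaml: minimal sketch, replace the model and key with your own values
model_list:
  - model_name: gpt-4                      # name that clients will request through the proxy
    litellm_params:
      model: openai/gpt-4                  # upstream provider/model LiteLLM routes to
      api_key: os.environ/OPENAI_API_KEY   # read the API key from the environment
```

The value of `model_name` is the model name you later enter in the provider settings, so keep the two consistent.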
Then start the LiteLLM proxy with this config (for example, `litellm --config litellm_config.yaml`); by default it exposes an OpenAI-compatible API at http://localhost:4000.
In Settings > Model Providers > OpenAI-API-compatible, fill in:
- Model Name: gpt-4
- Base URL: http://localhost:4000. Enter the base URL where the LiteLLM service is accessible.
- Completion mode: Chat
- Model context size: 4096. The maximum context length of the model. If unsure, use the default value of 4096.
- Upper bound for max tokens: 4096. The maximum number of tokens returned by the model. If the model has no specific requirement, this can be set to the same value as the model context size.
- Vision support: Yes. Check this option if the model supports image understanding (multimodal), like gpt-4o.