LocalAI is a drop-in replacement REST API compatible with OpenAI API specifications for local inferencing. It lets you run LLMs (and more) locally or on-premises on consumer-grade hardware, supports multiple model families compatible with the ggml format, and does not require a GPU.
Dify supports integrating with LocalAI for locally deployed large language model inference and embedding capabilities.

Deploying LocalAI

Before you start

When using Docker to deploy a private model locally, you may need to access the service via the container's IP address rather than 127.0.0.1. This is because 127.0.0.1 (or localhost) by default points to your host system, not the internal network of the Docker container. To retrieve the IP address of your Docker container, follow these steps:
1. First, determine the name or ID of your Docker container. You can list all active containers using the following command:
docker ps
2. Then, use the command below to obtain detailed information about a specific container, including its IP address:
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' container_name_or_ID
Please note that you usually do not need to manually find the IP address of the Docker container to access the service, because Docker offers a port mapping feature. This allows you to map container ports to local machine ports, enabling access via your local address. For example, if you used the -p 80:80 parameter when running the container, you can access the service inside the container by visiting http://localhost:80 or http://127.0.0.1:80.
If you do need to use the container's IP address directly, the steps above will assist you in obtaining this information.
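As a minimal sketch of both access paths, assuming a LocalAI container started with port mapping (the container name, image tag, and the IP shown in the output are illustrative):
# Start the container with port 8080 mapped to the host (illustrative name and image)
$ docker run -d --name local-ai -p 8080:8080 quay.io/go-skynet/local-ai:latest
# Access via the mapped host port
$ curl http://localhost:8080/v1/models
# Or find the container's internal IP and call it directly
$ docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' local-ai
172.17.0.2
$ curl http://172.17.0.2:8080/v1/models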

Starting LocalAI

You can refer to the official Getting Started guide for deployment, or follow the steps below for a quick setup.
(These steps are derived from the LocalAI Data query example.)
1. First, clone the LocalAI code repository and navigate to the specified directory.
    $ git clone https://github.com/go-skynet/LocalAI
    $ cd LocalAI/examples/langchain-chroma
2. Download the example LLM and Embedding models.
    $ wget https://huggingface.co/skeskinen/ggml/resolve/main/all-MiniLM-L6-v2/ggml-model-q4_0.bin -O models/bert
    $ wget https://gpt4all.io/models/ggml-gpt4all-j.bin -O models/ggml-gpt4all-j
    Here, we choose two smaller models that are compatible across all platforms. ggml-gpt4all-j serves as the default LLM model, and all-MiniLM-L6-v2 serves as the default Embedding model, for quick local deployment.
3. Configure the .env file.
    $ mv .env.example .env
    NOTE: Ensure that the THREADS variable value in .env doesn't exceed the number of CPU cores on your machine.
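    For reference, the relevant lines of the .env file might look like the sketch below. THREADS is named in the note above and the models path appears in the startup log later in this guide; the exact set of variables is an assumption, so check the .env.example shipped with the example:
    # Keep THREADS at or below your CPU core count (illustrative value)
    THREADS=4
    MODELS_PATH=/models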
4. Start LocalAI.
    # start with docker-compose
    $ docker-compose up -d --build
    # tail the logs & wait until the build completes
    $ docker logs -f langchain-chroma-api-1
    7:16AM INF Starting LocalAI using 4 threads, with models path: /models
    7:16AM INF LocalAI version: v1.24.1 (9cc8d9086580bd2a96f5c96a6b873242879c70bc)
    The LocalAI request API endpoint will be available at http://127.0.0.1:8080.
    It provides two models:
    • LLM Model: ggml-gpt4all-j
      External access name: gpt-3.5-turbo (This name is customizable and can be configured in models/gpt-3.5-turbo.yaml).
    • Embedding Model: all-MiniLM-L6-v2
      External access name: text-embedding-ada-002 (This name is customizable and can be configured in models/embeddings.yaml).
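    The external access names are defined by the YAML files under models/. As a minimal sketch, models/gpt-3.5-turbo.yaml maps the public name to the local model file roughly like this (treat the exact fields as an assumption and check the file shipped with the example):
    $ cat models/gpt-3.5-turbo.yaml
    name: gpt-3.5-turbo
    parameters:
      model: ggml-gpt4all-j
    Before wiring the endpoint into Dify, you can smoke-test it with OpenAI-style requests, assuming the default mapping to http://127.0.0.1:8080 (adjust host and port to your setup):
    # List the externally visible model names
    $ curl http://127.0.0.1:8080/v1/models
    # Chat completion against the LLM model
    $ curl http://127.0.0.1:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hello"}]}'
    # Embedding request against the Embedding model
    $ curl http://127.0.0.1:8080/v1/embeddings -H "Content-Type: application/json" -d '{"model": "text-embedding-ada-002", "input": "A test sentence"}'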
5. Integrate the models into Dify.
    Go to Settings > Model Providers > LocalAI and fill in:
    Model 1: ggml-gpt4all-j
    • Model Type: Text Generation
    • Model Name: gpt-3.5-turbo
    • Server URL: http://127.0.0.1:8080
      If Dify is deployed via Docker, fill in the host domain instead: http://<your-LocalAI-endpoint-domain>:8080, which can be a LAN IP address, for example: http://192.168.1.100:8080
    Click "Save" to use the model in the application.
    Model 2: all-MiniLM-L6-v2
    • Model Type: Embeddings
    • Model Name: text-embedding-ada-002
    • Server URL: http://127.0.0.1:8080
      If Dify is deployed via Docker, fill in the host domain instead: http://<your-LocalAI-endpoint-domain>:8080, which can be a LAN IP address, for example: http://192.168.1.100:8080
    Click "Save" to use the model in the application.
For more information about LocalAI, please refer to: https://github.com/go-skynet/LocalAI