Integrating with GPUStack for Local Model Deployment

GPUStack is an open-source GPU cluster manager for running large language models (LLMs).

Dify supports integration with GPUStack, enabling locally deployed LLM inference, embedding, and reranking capabilities.

Deploying GPUStack

You can refer to the official GPUStack documentation for deployment details, or quickly get set up by following the steps below:

Linux or macOS

GPUStack provides a script that installs it as a service on systemd- or launchd-based systems. To install GPUStack this way, run:

curl -sfL https://get.gpustack.ai | sh -s -
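
After the script finishes, you can confirm the service came up. A minimal check on Linux, assuming the installer registered a systemd unit named gpustack (the default unit name is an assumption; it may differ by version):

# Check the GPUStack service status (unit name "gpustack" is assumed)
sudo systemctl status gpustack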

Windows

Run PowerShell as administrator (avoid using PowerShell ISE), then run the following command to install GPUStack:

Invoke-Expression (Invoke-WebRequest -Uri "https://get.gpustack.ai" -UseBasicParsing).Content

Then you can follow the printed instructions to access the GPUStack UI.
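
The installer prints the server URL and login instructions. For reference, recent GPUStack versions generate an initial admin password that you can retrieve as follows (file locations may vary between versions):

# Linux or macOS
cat /var/lib/gpustack/initial_admin_password

# Windows (PowerShell)
Get-Content -Path "$env:APPDATA\gpustack\initial_admin_password" -Raw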

Deploying LLM

Using an LLM hosted on GPUStack as an example:

  1. In the GPUStack UI, navigate to the "Models" page, click "Deploy Model", and choose Hugging Face from the dropdown.

  2. Use the search bar in the top left to search for the model name Qwen/Qwen2.5-0.5B-Instruct-GGUF.

  3. Click Save to deploy the model.

Create an API Key

  1. Navigate to the "API Keys" page and click on "New API Key".

  2. Fill in the name, then click Save.

  3. Copy the API key and save it for later use. You can test it right away, as shown below.
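
With the model deployed and the key in hand, you can sanity-check both from the command line before wiring up Dify. A minimal sketch, assuming GPUStack serves its OpenAI-compatible API under the /v1-openai path and using placeholder values for the server address and key:

# Send a test chat completion to the deployed model (replace the IP and key with your own)
curl http://your-gpustack-server-ip/v1-openai/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_GPUSTACK_API_KEY" \
  -d '{
    "model": "qwen2.5-0.5b-instruct",
    "messages": [{"role": "user", "content": "Say hello."}]
  }'

A JSON chat completion in the response confirms that the model is running and the key is valid.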

Integrating GPUStack into Dify

  1. Go to Settings > Model Providers > GPUStack and fill in:

    • Model Type: LLM

    • Model Name: qwen2.5-0.5b-instruct

    • Server URL: http://your-gpustack-server-ip (if Dify runs in Docker, see the note after this list)

    • API Key: Enter the API key you copied in the previous step

    Click "Save" to use the model in the application.

For more information about GPUStack, please refer to the GitHub repo: https://github.com/gpustack/gpustack.
