Deploy Open WebUI on Kubernetes with ArgoCD and Helm and connect to OpenRouter

This article will guide you through the complete process of setting up Open WebUI, one of the best open source LLM frontends, on Kubernetes with ArgoCD and Helm, and connecting it to OpenRouter for a smart, cost-efficient setup. As a bonus, we'll also deploy SearXNG for local web search.

Series - Cost Efficient AI Assistants

This post is a part of the Cost Efficient AI Assistants series on Kubito. Make sure you check out the other posts in this series in the menu above.

If you’ve been thinking of moving away from ChatGPT but don’t know where to start, you’ve come to the right place. This article will explain how to set up Open WebUI, an amazing and feature-rich LLM frontend where you can connect any LLM you want, whether the model is local or external. It also supports RAG, Web Search, Model Cloning, SSO login, Voice with STT/TTS, and much more!

Before we begin, let’s create our own OpenRouter API key. This will allow you to use any model available there through a single, unified API. It’s also really great for bypassing rate limits on models like Anthropic’s Claude 3.5 Sonnet. OpenRouter also has providers that offer free usage of some models, and they are quite generous: you can use models like Llama 3.1 70B or Llama 3.2 90B Vision for free. The free tiers are slower, but they are really nice to have sometimes.

Go to OpenRouter, click on the Sign In button in the upper right and then the Sign up button. Now create the account however you want and log in once that’s done.

Now go to your account settings and set some things up first if you’d like. I have enabled the Low Balance Notification emails, set them to trigger once my credits drop below $2, and set the default model to Claude 3.5 Sonnet. Now go to Keys and click on Create Key. Give it a name, set a credit limit if you want, and once done, you’ll get the API key. Save it somewhere really secure, because you won’t be able to see it again later.

Now it’s time to add some credits to your account, though you can do this later if you want. Go to Credits and click on the Add Credits button. Finish that process, add as much as you want, and the credits will be available on your account. Keep in mind that Stripe takes a small fee for the payment, but in my opinion it’s very much worth it for what you’re getting; for example, I added $20 and it charged $21.43 to my card.

In the long run, this is a better solution for me than paying a fixed subscription for a single model on ChatGPT, since I don’t use it daily and never get the full value. OpenRouter allows me to be flexible and use cheaper or more expensive models depending on what I need them for.

You can see the usage in the Credits menu once you’ve interacted with the API, and it’s really nice that it shows you which client used the API, how many tokens you’ve used, and how much they cost. You can check each model’s price per million input/output tokens in the OpenRouter dashboard.

Note
I am using Claude 3.5 Sonnet for complex discussions or code, since at the time of writing this, it costs $3 per million input tokens and $15 per million output tokens. For Aider (which we’ll discuss in the next post of this series), I am mostly using the Qwen 2.5 Coder 32B model, which costs $0.18 per million tokens for both input and output, so it’s extremely cost-effective and good for simpler coding tasks. When I need something more complex, I switch the Aider model to Claude 3.5 Sonnet or something else.
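To put those prices in perspective, here’s a rough back-of-the-envelope cost estimate. The token counts are made-up example numbers; the rates are the Claude 3.5 Sonnet prices quoted above:

```shell
# Estimated cost = input_tokens/1M * input_price + output_tokens/1M * output_price
input_tokens=200000   # hypothetical: a long day of chatting
output_tokens=50000
awk -v in_t="$input_tokens" -v out_t="$output_tokens" \
  'BEGIN { printf "$%.2f\n", in_t / 1e6 * 3 + out_t / 1e6 * 15 }'
# prints $1.35
```

At cheaper rates like Qwen 2.5 Coder’s, the same usage would cost a few cents, which is why switching models per task pays off.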

We are finished with the OpenRouter part, so let’s proceed with the other components.

To begin, since we’ll be deploying on Kubernetes, make sure you already have an existing Kubernetes cluster. ArgoCD is optional; you can simply deploy with Helm directly, but this guide will show the process with both.

To deploy it, visit the official Helm chart’s page on ArtifactHub. If you prefer GitHub, you can find the direct link to the Helm repo here. The chart has Ollama, Open WebUI Pipelines, and Apache Tika as optional dependencies (subcharts).

In short: Ollama is an amazing way to run local models and expose them through a single API. Pipelines is a way to extend Open WebUI, and it comes into play when you’re dealing with computationally heavy tasks (e.g., running large models or complex logic) that you want to offload from your main Open WebUI instance for better performance and scalability. Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.
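If you do want any of these later, each subchart is toggled with a single flag in the chart values (the same keys we’ll set to false further below); for example, to run local models alongside the external API:

```yaml
ollama:
  enabled: true

pipelines:
  enabled: false

tika:
  enabled: false
```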

For both methods, you’ll want to create the API key secret before proceeding. It’s best if you have a secret manager like HashiCorp Vault or Bitnami Sealed Secrets. I won’t go into detail on using these, so if you have any of them or similar, follow their guidelines for adding the secret.

To create this as a simple Kubernetes Secret, prepare your OpenRouter API key and create the secret. We’ll name it open-webui-secret, and the key will be named open-router-api-key:

# create the namespace
kubectl create namespace open-webui
# create the secret
kubectl create secret generic open-webui-secret \
  --from-literal=open-router-api-key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx \
  --namespace=open-webui
Note
If you have OIDC credentials or anything else sensitive, feel free to expand the same secret for the WebUI with those values as well.
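For example, if you use OIDC login (as the deployment examples below do), the same secret can carry the client credentials. A declarative equivalent of the kubectl command above, extended with hypothetical OIDC values, could look like this:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: open-webui-secret
  namespace: open-webui
type: Opaque
stringData:
  open-router-api-key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  # hypothetical OIDC credentials, referenced later as
  # oidc-client-id / oidc-client-secret in the Helm values
  oidc-client-id: my-oidc-client
  oidc-client-secret: my-oidc-client-secret
```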

If you want web search with SearXNG, deploy it first in the same namespace. Feel free to skip this part and remove the SearXNG-related flags from the installation command later if you don’t want the web search feature in Open WebUI.

For this, we’ll use the Kubito SearXNG Helm Chart. The default values are already prepared for Open WebUI integration per their recommended configuration.

To deploy it, simply leave the configuration values as they are, unless you want to set your own secret key (generated with openssl rand -hex 32) in the settings. If you change the port in the config part of the values, make sure to set service.port to the same one. Make sure you check the other default values to configure it to your own taste.
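For instance, if you wanted SearXNG to listen on 8888 instead of the default 8080, a sketch of the matching values override would be (the port number here is just an example; both places must agree):

```yaml
service:
  type: ClusterIP
  port: 8888          # must match the port in the SearXNG settings below

config:
  settings:
    data: |
      use_default_settings: true

      server:
        secret_key: "<generate with: openssl rand -hex 32>"
        port: 8888
        bind_address: "0.0.0.0"
```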

Choose whether you want this deployed with Helm or ArgoCD.

Simply run the following commands to install SearXNG using Helm only.

# add the repo
helm repo add kubitodev https://charts.kubito.dev
# update the local repositories
helm repo update
# install searxng
helm install searxng kubitodev/searxng \
  --version 1.0.1 \
  --namespace open-webui \
  --create-namespace \
  --set "env[0].name=TZ" \
  --set "env[0].value=Europe/London"

First, create the following ArgoCD application in YAML format, replacing the secret_key with your own value generated via openssl rand -hex 32 if you like. Review the application and feel free to change anything you want, like adding sync waves or disabling self-healing:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: searxng
  namespace: argocd
spec:
  project: kubito-infrastructure
  source:
    repoURL: 'https://charts.kubito.dev'
    targetRevision: 1.0.1
    chart: searxng
    helm:
      values: |
        env:
          - name: TZ
            value: "Europe/London"

        service:
          type: ClusterIP
          port: 8080

        config:
          settings:
            data: |
              use_default_settings: true

              server:
                secret_key: "57dc63125e7eef404481411b99c21fb9a5763b724b0bc88f2440ef373cf94809"
                limiter: false
                image_proxy: true
                port: 8080
                bind_address: "0.0.0.0"

              ui:
                static_use_hash: true

              search:
                safe_search: 0
                autocomplete: ""
                default_lang: ""
                formats:
                  - html
                  - json        
  destination:
    server: 'https://kubernetes.default.svc'
    namespace: open-webui
  syncPolicy:
    automated:
      selfHeal: true
    syncOptions:
      - CreateNamespace=false

After this, depending on how your infrastructure is structured, deploy the YAML application. For simplicity, let’s just show how it’s done with pure kubectl:

kubectl apply -f searxng.yaml

Once done, SearXNG will be deployed.
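If you also deploy Open WebUI through ArgoCD and manage both Applications from a parent app-of-apps, ArgoCD sync waves let you bring SearXNG up first; lower wave numbers sync earlier. A minimal sketch of the annotation (the wave numbers are arbitrary):

```yaml
# on the searxng Application manifest
metadata:
  name: searxng
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "0"   # syncs before higher waves
```

The open-webui Application would then get a higher wave, e.g. "1".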

Without further ado, let’s deploy Open WebUI itself. We’ll disable the Ollama, Pipelines, and Tika subcharts for simplicity, since in this series we’ll be using OpenRouter to connect to models. We’ll be enabling the following, so replace what’s needed; feel free to disable some of these if you don’t need them, or enable others by looking at the available Helm values and the Open WebUI Environment Variables Configuration:

  • Persistence
  • OIDC Login Only
  • OpenRouter with existing secret
  • Web Search with SearXNG

Simply install with Helm using the command below. The flags are grouped in the same order as the sections listed above.

# add the repo
helm repo add open-webui https://helm.openwebui.com
# update the local repositories
helm repo update
# install open-webui; the flags below are grouped in order:
# subcharts, persistence, OIDC login only,
# OpenRouter with existing secret, web search with SearXNG
helm install open-webui open-webui/open-webui \
  --version 3.6.0 \
  --namespace open-webui \
  --create-namespace \
  --set ollama.enabled=false \
  --set pipelines.enabled=false \
  --set tika.enabled=false \
  --set persistence.enabled=true \
  --set persistence.size=2Gi \
  --set "persistence.accessModes[0]=ReadWriteOnce" \
  --set "extraEnvVars[0].name=ENABLE_SIGNUP" \
  --set-string "extraEnvVars[0].value=false" \
  --set "extraEnvVars[1].name=ENABLE_LOGIN_FORM" \
  --set-string "extraEnvVars[1].value=false" \
  --set "extraEnvVars[2].name=ADMIN_EMAIL" \
  --set-string "extraEnvVars[2][email protected]" \
  --set "extraEnvVars[3].name=OAUTH_CLIENT_ID" \
  --set "extraEnvVars[3].valueFrom.secretKeyRef.name=open-webui-secret" \
  --set "extraEnvVars[3].valueFrom.secretKeyRef.key=oidc-client-id" \
  --set "extraEnvVars[4].name=OAUTH_CLIENT_SECRET" \
  --set "extraEnvVars[4].valueFrom.secretKeyRef.name=open-webui-secret" \
  --set "extraEnvVars[4].valueFrom.secretKeyRef.key=oidc-client-secret" \
  --set "extraEnvVars[5].name=OAUTH_SCOPES" \
  --set-string "extraEnvVars[5].value=openid profile email" \
  --set "extraEnvVars[6].name=ENABLE_OAUTH_SIGNUP" \
  --set-string "extraEnvVars[6].value=true" \
  --set "extraEnvVars[7].name=OPENID_PROVIDER_URL" \
  --set-string "extraEnvVars[7].value=https://sso.kubito.example/realms/kubito/.well-known/openid-configuration" \
  --set "extraEnvVars[8].name=ENABLE_OPENAI_API" \
  --set-string "extraEnvVars[8].value=true" \
  --set "extraEnvVars[9].name=OPENAI_API_BASE_URL" \
  --set-string "extraEnvVars[9].value=https://openrouter.ai/api/v1" \
  --set "extraEnvVars[10].name=OPENAI_API_KEY" \
  --set "extraEnvVars[10].valueFrom.secretKeyRef.name=open-webui-secret" \
  --set "extraEnvVars[10].valueFrom.secretKeyRef.key=open-router-api-key" \
  --set "extraEnvVars[11].name=ENABLE_RAG_WEB_SEARCH" \
  --set-string "extraEnvVars[11].value=true" \
  --set "extraEnvVars[12].name=RAG_WEB_SEARCH_ENGINE" \
  --set-string "extraEnvVars[12].value=searxng" \
  --set "extraEnvVars[13].name=RAG_WEB_SEARCH_RESULT_COUNT" \
  --set-string "extraEnvVars[13].value=3" \
  --set "extraEnvVars[14].name=RAG_WEB_SEARCH_CONCURRENT_REQUESTS" \
  --set-string "extraEnvVars[14].value=10" \
  --set "extraEnvVars[15].name=SEARXNG_QUERY_URL" \
  --set-string "extraEnvVars[15].value=http://searxng:8080/search?q=<query>"
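A note on the SEARXNG_QUERY_URL value: Open WebUI replaces the `<query>` placeholder with the user’s search terms at request time (and requests JSON output, which is why the SearXNG settings enable the json format). The resulting request URL looks roughly like this, sketched here with plain shell substitution and a hypothetical search:

```shell
# The value set for SEARXNG_QUERY_URL above
template="http://searxng:8080/search?q=<query>"
query="kubernetes+helm"   # hypothetical, already URL-encoded
# substitute the placeholder the way Open WebUI does conceptually
printf '%s\n' "$template" | sed "s/<query>/$query/"
# prints http://searxng:8080/search?q=kubernetes+helm
```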

Same as with SearXNG’s app, create the YAML file with the configuration. Review the application and feel free to change anything you want, like adding sync waves or disabling self-healing. I’ve written comments between the values so you know which section they belong to:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: open-webui
  namespace: argocd
spec:
  project: kubito-infrastructure
  source:
    repoURL: 'https://helm.openwebui.com'
    targetRevision: 3.6.0
    chart: open-webui
    helm:
      values: |
        ollama:
          enabled: false

        pipelines:
          enabled: false

        tika:
          enabled: false

        # persistence section
        persistence:
          enabled: true
          size: 2Gi
          accessModes:
            - ReadWriteOnce

        extraEnvVars:
          # oidc login only section
          - name: ENABLE_SIGNUP
            value: "false"
          - name: ENABLE_LOGIN_FORM
            value: "false"
          - name: ADMIN_EMAIL
            value: "[email protected]"
          - name: OAUTH_CLIENT_ID
            valueFrom:
              secretKeyRef:
                name: open-webui-secret
                key: oidc-client-id
          - name: OAUTH_CLIENT_SECRET
            valueFrom:
              secretKeyRef:
                name: open-webui-secret
                key: oidc-client-secret
          - name: OAUTH_SCOPES
            value: "openid profile email"
          - name: ENABLE_OAUTH_SIGNUP
            value: "true"
          - name: OPENID_PROVIDER_URL
            value: "https://sso.kubito.example/realms/kubito/.well-known/openid-configuration"

          # openrouter with existing secret section
          - name: ENABLE_OPENAI_API
            value: "true"
          - name: OPENAI_API_BASE_URL
            value: "https://openrouter.ai/api/v1"
          - name: OPENAI_API_KEY
            valueFrom:
              secretKeyRef:
                name: open-webui-secret
                key: open-router-api-key

          # web search with searxng section
          - name: ENABLE_RAG_WEB_SEARCH
            value: "true"
          - name: RAG_WEB_SEARCH_ENGINE
            value: "searxng"
          - name: RAG_WEB_SEARCH_RESULT_COUNT
            value: "3"
          - name: RAG_WEB_SEARCH_CONCURRENT_REQUESTS
            value: "10"
          - name: SEARXNG_QUERY_URL
            value: "http://searxng:8080/search?q=<query>"        
  destination:
    server: 'https://kubernetes.default.svc'
    namespace: open-webui
  syncPolicy:
    automated:
      selfHeal: true
    syncOptions:
      - CreateNamespace=false

After this, depending on how your infrastructure is structured, deploy the YAML application. For simplicity, let’s just show how it’s done with pure kubectl:

kubectl apply -f open-webui.yaml

Once done, Open WebUI will be deployed.

Now that everything is set up and deployed, we can proceed with the usage. Since most things were already configured earlier, there isn’t much left to do. You can enable Ingress or Gateway API access to the instance, or simply port-forward it to test it out; it’s up to you. This guide won’t go into detail on how you’ll be accessing the instance from outside the cluster.

Once you’ve opened the instance in your browser, register using whichever login process you configured, either user/password or SSO. The first user to log in always gets Admin access!

The UI is very simple, yet much more configurable than the other options I’ve found. If the OpenRouter connection is successful, you can choose a model in the upper left corner, and there will be a lot of them, so simply pick one and start interacting with any LLM you want!

In the Workspace tab below New Chat, you can see the list of models and rename them, set logos for them, add functions and tools, and much more.

In the user settings, you can set defaults for anything you want, including a system prompt applied to every model, although you can also set a system prompt per model in the Workspace/Models tab. For example, this is the system prompt I’ve set and gotten great results with across models:

Provide responses that are concise, smart, clear, and always friendly and helpful, with a professional tone. Focus on essential information only, without filler, apologies, emojis, or humor. Use markdown format for code, keeping it highly technical and including short explanations. For non-code answers, choose the most suitable format (markdown or plain text). Avoid cliches and unnecessary elaboration. When giving advice, insights, or examples, prioritize clarity and brevity. Always ask targeted questions to clarify or expand on details if it will improve the response.

All in all, explore the UI, the user settings, and the admin dashboard; I’m sure you’ll get used to it very quickly and have a great experience like I do.

This whole setup helped me explore many different models in a cost-effective and simple way. I am hosting all of this on my Raspberry Pi Kubernetes cluster without any issues, so you shouldn’t have issues in any kind of environment either. The open source nature of the tools in this guide lets you see what’s happening in the code at all times, which is a big plus for me personally. Have fun using Open WebUI, and make sure to check the next guide in this series for configuring and using Aider as your personal AI Assistant, all with the same OpenRouter API key!