Day 2: GitOps Handoff — Bootstrapping ArgoCD on Talos with OpenTofu

The Day 1 post ended with ArgoCD deployed as a Helm release inside 02-kubernetes/. It's running, but it can't do anything yet. It has no repository credentials, no decryption keys, and no root Application to tell it what to manage. "Running" is not "working."

This post covers the bootstrap resources that bridge Day 1 and Day 2: the credentials, the KSOPS decryption setup, and the root Application that kicks off the GitOps loop. Once that loop starts, OpenTofu steps back.

The Chicken-and-Egg Problem

ArgoCD manages applications from git repositories. To pull from a private repo, it needs credentials. To decrypt SOPS-encrypted manifests, it needs the cluster Age key. But those credentials and keys are themselves secrets that need to live somewhere.

The solution is straightforward: OpenTofu bootstraps the credentials as Kubernetes Secrets before ArgoCD tries to use them. ArgoCD doesn't manage its own credentials, at least not at bootstrap time. OpenTofu creates them, ArgoCD consumes them, and the two never overlap.

Everything below lives in argocd-bootstrap.tf.

Repository Credentials

Two private repos need to be registered: the platform gitops repo (controlled by Nubosas) and the tenant workload repo (controlled by the tenant). Both use a dedicated service account with repo-scoped GitHub tokens.

The tokens are stored SOPS-encrypted in secrets.enc.yaml and decrypted at apply time via the carlpett/sops provider that was declared back in Day 1:

data "sops_file" "secrets" {
  source_file = "${path.module}/environments/${var.environment}/secrets.enc.yaml"
}

Each repo is registered as a Kubernetes Secret with the label ArgoCD watches for:

resource "kubernetes_secret" "argocd_repo_platform" {
  metadata {
    name      = "repo-platform-gitops"
    namespace = "argocd"
    labels = {
      "argocd.argoproj.io/secret-type" = "repository"
    }
  }

  data = {
    type     = "git"
    url      = "https://github.com/example-org/platform-gitops.git"
    username = "example-bot"
    password = data.sops_file.secrets.data["github_token_platform"]
  }

  depends_on = [helm_release.argocd]
}

Same pattern for the tenant repo, different token. The depends_on ensures ArgoCD is running before we create secrets in its namespace.
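For completeness, here is a sketch of the tenant variant that the bootstrap's dependency chain references as kubernetes_secret.argocd_repo_tenant. The repo URL matches the tenant AppProject later in the post; the secret name and the github_token_tenant key are assumptions mirroring the platform resource:

```hcl
resource "kubernetes_secret" "argocd_repo_tenant" {
  metadata {
    name      = "repo-tenant-gitops"
    namespace = "argocd"
    labels = {
      # Same label ArgoCD watches for on the platform repo secret
      "argocd.argoproj.io/secret-type" = "repository"
    }
  }

  data = {
    type     = "git"
    url      = "https://github.com/example-tenant/tenant-gitops.git"
    username = "example-bot"
    password = data.sops_file.secrets.data["github_token_tenant"]
  }

  depends_on = [helm_release.argocd]
}
```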

This is the Day 1 encryption strategy paying off. The sops provider was declared but unused in the initial commit because no secrets were needed for Talos bootstrap. Now it's active, decrypting at apply time with the personal Age key.

KSOPS: Decrypting Secrets at Runtime

OpenTofu decrypts secrets at apply time. But ArgoCD needs to decrypt SOPS-encrypted manifests at sync time, inside the cluster, without the personal key.

That's where KSOPS comes in. It's a Kustomize plugin that intercepts SOPS-encrypted resources during ArgoCD's render phase and decrypts them using the cluster Age key.

ArgoCD's container image doesn't ship with KSOPS. The standard approach is an init container that downloads the binary into a shared volume before the repo-server starts.

The Init Container

repoServer:
  initContainers:
    - name: install-ksops
      image: alpine:3.21
      command: ["sh", "-c"]
      args:
        - |
          set -e
          wget -qO- https://github.com/viaduct-ai/kustomize-sops/releases/download/v4.4.0/ksops_4.4.0_Linux_x86_64.tar.gz \
            | tar xz -C /custom-tools
          chmod +x /custom-tools/ksops
          mkdir -p /custom-tools/plugin/viaduct.ai/v1/ksops
          cp /custom-tools/ksops /custom-tools/plugin/viaduct.ai/v1/ksops/ksops
      volumeMounts:
        - name: custom-tools
          mountPath: /custom-tools
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
        seccompProfile:
          type: RuntimeDefault

Two things worth calling out. First, the binary gets laid down twice: once as a flat ksops for Kustomize to invoke directly, and once inside plugin/viaduct.ai/v1/ksops/ksops so Kustomize can also discover it as an exec plugin via KUSTOMIZE_PLUGIN_HOME. Different code paths in Kustomize look for it in different places.

Second, the security context. This init container downloads and unpacks a single binary, so it has no reason to run as root, hold any capabilities, or escalate privileges. runAsUser: 65534 is nobody. The seccompProfile: RuntimeDefault restricts syscalls to the container runtime's default set. Defense in depth for a container that runs for two seconds, but it costs nothing.

Mounting the Binary and the Key

The KSOPS binary goes into /usr/local/bin/ksops on the repo-server via the shared volume, plus the plugin tree gets mounted under /home/argocd/ksops-plugin. The cluster Age key gets mounted as a directory so SOPS finds it at the path it expects:

repoServer:
  volumes:
    - name: custom-tools
      emptyDir: {}
    - name: sops-age-key
      secret:
        secretName: ksops-age-key
        defaultMode: 292  # 0444 — readable by argocd user (uid 999)
  volumeMounts:
    - name: custom-tools
      mountPath: /usr/local/bin/ksops
      subPath: ksops
    - name: custom-tools
      mountPath: /home/argocd/ksops-plugin
      subPath: plugin
    - name: sops-age-key
      mountPath: /home/argocd/.config/sops/age
      readOnly: true
  env:
    - name: SOPS_AGE_KEY_FILE
      value: /home/argocd/.config/sops/age/age.agekey
    - name: XDG_CONFIG_HOME
      value: /home/argocd/.config
    - name: KUSTOMIZE_PLUGIN_HOME
      value: /home/argocd/ksops-plugin

Three env vars matter here. SOPS_AGE_KEY_FILE tells SOPS exactly where the private key lives. XDG_CONFIG_HOME keeps SOPS's config search rooted under the argocd user's home. KUSTOMIZE_PLUGIN_HOME is what makes Kustomize discover KSOPS as an exec plugin. Miss any of these and the symptom is the same: opaque "decryption failed" or "plugin not found" errors at sync time.
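The repo side has to hold up its half of the contract: Kustomize only invokes KSOPS for resources declared through a generator manifest. A minimal sketch, with hypothetical file names, of what a SOPS-encrypted Secret looks like in a synced repo:

```yaml
# kustomization.yaml — declares the generator (hypothetical layout)
generators:
  - ksops-generator.yaml
---
# ksops-generator.yaml — tells Kustomize to run the ksops exec plugin
# on the listed encrypted files during the render phase
apiVersion: viaduct.ai/v1
kind: ksops
metadata:
  name: secret-generator
files:
  - db-credentials.enc.yaml
```

With the env vars above in place, ArgoCD's render of this kustomization emits the decrypted Secret inline; the encrypted file is the only thing in git.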

The Age key Secret is created by OpenTofu alongside the repo credentials:

resource "kubernetes_secret" "ksops_age_key" {
  metadata {
    name      = "ksops-age-key"
    namespace = "argocd"
  }

  data = {
    "age.agekey" = data.sops_file.secrets.data["age_cluster_private_key"]
  }

  depends_on = [helm_release.argocd]
}

Notice: the cluster private key is itself SOPS-encrypted in secrets.enc.yaml, decrypted by OpenTofu using the personal key, then stored as a Kubernetes Secret for KSOPS. SOPS decrypting a key that enables more SOPS decryption.

Finally, Kustomize needs the alpha plugins flag to recognize KSOPS. This goes under configs.cm (the argocd-cm ConfigMap), not configs.params (the argocd-cmd-params-cm ConfigMap that drives CLI flags):

configs:
  params:
    "server.insecure": true
  cm:
    "kustomize.buildOptions": "--enable-alpha-plugins --enable-exec"

Easy to put in the wrong block — configs.params looks like the natural home for a "build option" but ArgoCD reads kustomize.buildOptions from the argocd-cm ConfigMap. Wrong block, silent no-op.

The Three-Layer Decryption Flow

Day 1 set up the two-key design. Here's how it completes:

  1. At rest in git: secrets are encrypted with SOPS using Age public keys. Both the personal key and cluster key are recipients, so either can decrypt.

  2. At bootstrap (tofu apply): the carlpett/sops provider decrypts secrets.enc.yaml using the personal Age key (available locally). The decrypted values become Kubernetes Secrets.

  3. At runtime (ArgoCD sync): when ArgoCD reconciles a repo containing SOPS-encrypted manifests, KSOPS intercepts the render, reads the cluster Age key from the mounted Secret, and decrypts inline. The plaintext manifest is applied to the cluster. Plaintext never touches git.

Two keys, three layers, one encryption strategy designed from Day 1.
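As a workstation-side sketch of the two-recipient setup (key file names and paths are placeholders, not the actual layout):

```sh
# Generate the two Age key pairs (done once, on Day 1)
age-keygen -o personal.agekey   # stays on the workstation
age-keygen -o cluster.agekey    # its private half later becomes the ksops-age-key Secret

# Encrypt with BOTH public keys as recipients, so either private key can decrypt.
# age-keygen -y prints the public recipient string for a key file.
sops --encrypt \
  --age "$(age-keygen -y personal.agekey),$(age-keygen -y cluster.agekey)" \
  secrets.yaml > secrets.enc.yaml
```

In practice the recipients usually live in a .sops.yaml creation rule rather than on the command line, but the effect is the same: one ciphertext, two independent decryption paths.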

The Root Application

With credentials and decryption in place, the last step is telling ArgoCD what to manage. This is the App of Apps pattern: a single root Application that watches a directory for other Application definitions.

# root-app.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/platform-gitops.git
    targetRevision: HEAD
    path: apps
    directory:
      recurse: true
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

directory.recurse: true is important. The apps/ directory uses subdirectories (apps/platform/, apps/tenants/) to organize Application definitions. Without recursion, ArgoCD only scans the top level and misses everything in subdirectories. I initially had this flat and had to add recursion when the directory structure grew.

prune: true means ArgoCD deletes resources that no longer exist in git. selfHeal: true means it reverts any manual changes. Together, they enforce that git is the single source of truth.

Why null_resource Is Ugly but Right

There's no clean way to apply a custom resource through the OpenTofu Kubernetes provider and have ArgoCD manage it afterwards. The kubernetes_manifest resource could create the Application, but then both OpenTofu and ArgoCD would be managing the same resource, fighting over the source of truth. The root Application needs to be created once and then owned entirely by ArgoCD.

The pragmatic answer is null_resource with local-exec:

resource "local_sensitive_file" "kubeconfig" {
  content  = talos_cluster_kubeconfig.cluster.kubeconfig_raw
  filename = "${path.module}/.kubeconfig"
}

resource "null_resource" "argocd_root_app" {
  triggers = {
    once = "bootstrap"
  }

  provisioner "local-exec" {
    command     = "kubectl apply -f ${path.module}/root-app.yaml"
    environment = {
      KUBECONFIG = local_sensitive_file.kubeconfig.filename
    }
  }

  depends_on = [
    helm_release.argocd,
    kubernetes_secret.argocd_repo_platform,
    kubernetes_secret.argocd_repo_tenant,
    kubernetes_secret.ksops_age_key,
  ]
}

triggers = { once = "bootstrap" } is a static value. It triggers on the first tofu apply and never again, because the trigger value never changes. Subsequent applies see the null_resource in state with the same trigger and skip it. This is a fire-and-forget bootstrap step.

The depends_on list is the full dependency chain: ArgoCD must be running, both repo credentials must exist, and the KSOPS key must be in place. Only then does kubectl apply create the root Application. After that, ArgoCD takes over.

Destroying the null_resource from OpenTofu state does NOT delete the ArgoCD Application from the cluster. It's a one-way handoff.
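If the bootstrap ever needs to run again (say, after rebuilding the cluster from scratch), tainting the resource forces the provisioner to re-run on the next apply without touching the static trigger:

```sh
# Mark the bootstrap step as needing recreation, then re-apply
tofu taint null_resource.argocd_root_app
tofu apply
```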

Multi-Tenancy: AppProject as the Authorization Boundary

Once the root Application syncs, it picks up everything in apps/, including the tenant onboarding resources. The natural question at this point: what stops a tenant from deploying into kube-system or another tenant's namespace?

The answer is the AppProject CRD. Every Application references a project, and the project defines what that Application is allowed to do.

The tenant AppProject:

apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: tenant-a
  namespace: argocd
spec:
  description: Tenant workloads
  sourceRepos:
    - https://github.com/example-tenant/tenant-gitops.git
  destinations:
    - server: https://kubernetes.default.svc
      namespace: tenant-a
  clusterResourceWhitelist: []
  namespaceResourceBlacklist:
    - group: ""
      kind: ResourceQuota
    - group: ""
      kind: LimitRange
  orphanedResources:
    warn: true

sourceRepos locks the tenant to their own repo. destinations restricts deployment to the tenant-a namespace only. clusterResourceWhitelist: [] blocks all cluster-scoped resources (Namespaces, ClusterRoles, CRDs). namespaceResourceBlacklist prevents the tenant from overriding the ResourceQuota and LimitRange that the platform manages.

If the tenant pushes a Deployment targeting kube-system to their gitops repo, the ArgoCD application controller rejects it at sync time. No workaround.
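For reference, a tenant Application that plays by these rules might look like the following sketch (the Application name and path are illustrative; the project field is what binds it to the AppProject's restrictions):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tenant-a-workloads
  namespace: argocd
spec:
  project: tenant-a   # every sync is checked against this project's sourceRepos/destinations
  source:
    repoURL: https://github.com/example-tenant/tenant-gitops.git
    targetRevision: HEAD
    path: .
  destination:
    server: https://kubernetes.default.svc
    namespace: tenant-a
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```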

The Default Project Gap

The tenant AppProject was configured correctly. But I noticed the default project (which ships with ArgoCD) was wide open: sourceRepos: ['*'], destinations unrestricted, full cluster-scoped access.

Both the root Application and the tenant onboarding Application use project: default, which is fine since they source from the platform-controlled repo. The problem is lateral: if anyone creates an Application under project: default (misconfiguration, a future templating mistake, direct kubectl access), they get unrestricted access to the entire cluster from any repo.

The fix: lock the default project's sourceRepos to the platform repo only.

apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: default
  namespace: argocd
spec:
  description: Platform-managed resources only
  sourceRepos:
    - https://github.com/example-org/platform-gitops.git
  destinations:
    - server: https://kubernetes.default.svc
      namespace: '*'
  clusterResourceWhitelist:
    - group: '*'
      kind: '*'

Still allows all destinations and cluster-scoped resources (the platform needs full control), but any Application in project: default can only pull from code Nubosas controls. Defense in depth for a gap that might never be exploited.

This lives in platform/default-project.yaml and gets deployed by a platform-config Application that the root app picks up automatically. The pattern scales: any future platform-level resource (NetworkPolicies, additional AppProject restrictions) goes into platform/ and is synced through the same path.
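A minimal sketch of that platform-config Application, assuming it sits in apps/platform/ where the root app's recursive scan finds it (name and paths inferred from the layout described above):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: platform-config
  namespace: argocd
spec:
  project: default   # sources from the platform repo, so the locked default project suffices
  source:
    repoURL: https://github.com/example-org/platform-gitops.git
    targetRevision: HEAD
    path: platform
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```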

There's more to multi-tenancy security (Cilium NetworkPolicies for runtime network isolation, ArgoCD RBAC if tenants ever get UI access), but those are separate concerns for a future post.

The Full Dependency Chain

From Day 1 through Day 2, the dependency chain is explicit:

flowchart TD
    A["Talos machine secrets
+ node configs"] --> B["talos_machine_bootstrap
(etcd init)"]
    B --> C["talos_cluster_kubeconfig
(client certs)"]
    C --> D["helm_release.cilium
(CNI, pods can schedule)"]
    D -. "Day 2 starts" .-> E["helm_release.argocd"]
    E --> F1["kubernetes_secret
argocd_repo_* (repo creds)"]
    E --> F2["kubernetes_secret
ksops_age_key"]
    E --> F3["local_sensitive_file
kubeconfig"]
    F1 --> G["null_resource.argocd_root_app
(kubectl apply root-app.yaml)"]
    F2 --> G
    F3 --> G
    G -. "OpenTofu stops" .-> H["root Application
(auto-syncs apps/)"]
    H --> I1["platform-config
(locked default project)"]
    H --> I2["tenant-onboarding
(namespace + quota + AppProject)"]
    H --> I3["tenant Application
(tenant-gitops workloads)"]
    classDef day1 fill:#e8f0ff,stroke:#0063FF,color:#0A354F
    classDef day2 fill:#fff7e6,stroke:#C64350,color:#0A354F
    class A,B,C,D day1
    class E,F1,F2,F3,G,H,I1,I2,I3 day2

One tofu apply gets you from Talos API to a fully operational GitOps loop. After that, OpenTofu manages the cluster infrastructure (Talos configs, Cilium, ArgoCD itself) and ArgoCD manages everything that runs on top of it.

The boundary is the same principle from Day 0, applied one layer up: each tool owns what it can observe and control through an API. OpenTofu talks to the Talos and Kubernetes APIs. ArgoCD talks to git and the Kubernetes API. They don't overlap.

What's Next

The GitOps loop is running. Workloads can deploy. But the cluster still needs storage for stateful applications, which is where Rook-Ceph comes in. That's the subject of the next post.

Looking for my current work? Visit Nubosas – where we build Sovereign Private Clouds for scaling SaaS companies.
