From "No Operating System" to Kubernetes HA: My Real Talos Linux Bare Metal Journey
Aug 5, 2025
How a Single Command ("apply-config") Saved My Cluster After Hours of Headaches
Introduction
Deploying a Talos Linux HA Kubernetes cluster on bare metal was supposed to be a weekend homelab upgrade. Instead, it became a battle with boot menus, cryptic logs, and command-line "gotchas."
Here's the real story of how I almost gave up (and nearly went the Proxmox route), what tripped me up, and the lessons that got my cluster online.
The Setup
- 3x physical servers (no virtualization, no cloud)
- Talos Linux v1.10, Kubernetes v1.33 "Octarine"
- Static IPs, Cilium CNI, VIP-based HA control plane
- No SSH, no manual hacks - just Talos and API-driven everything
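For reference, the VIP and CNI choices above end up as a small machine config patch. This is a minimal sketch, not my exact config: write_vip_patch is a made-up helper name, the interface name (eth0) is an assumption you should check against your own hardware, and 10.20.0.50 is the VIP that shows up in the logs later in this post.

```shell
# Hypothetical helper: writes a Talos config patch that enables a shared VIP
# on one interface and disables the default CNI so Cilium can be installed later.
# $1 = output file, $2 = interface name (assumption), $3 = VIP address
write_vip_patch() {
  cat > "$1" <<EOF
machine:
  network:
    interfaces:
      - interface: $2
        vip:
          ip: $3
cluster:
  network:
    cni:
      name: none
EOF
}

# Usage (the cluster name "homelab" is a placeholder):
#   write_vip_patch vip-patch.yaml eth0 10.20.0.50
#   talosctl gen config homelab https://10.20.0.50:6443 \
#     --config-patch-control-plane @vip-patch.yaml
```

The patch is applied only to control plane configs here because the VIP fronts the Kubernetes API, which only control plane nodes serve.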
What Went Wrong: Top Issues & Fixes
1. Disk Order Confusion
Symptom:
My SSD appeared as /dev/sdb when booting from USB, not /dev/sda.
Why:
Linux enumerates devices in the order it discovers them. When booting from USB, the USB stick is often /dev/sda, and your real SSD is /dev/sdb.
Lesson:
Check disk order with talosctl get disks ... before running any install commands!
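A quick way to catch this is to compare what the node reports against what the machine config will install to. The sketch below assumes the config came from talosctl gen config, where the install target appears as a single "disk: /dev/sdX" line; install_disk_from_config is a hypothetical helper name.

```shell
# Hypothetical helper: print the install disk declared in a machine config.
# Assumes the layout produced by "talosctl gen config", where
# machine.install.disk appears as a line of the form "disk: /dev/sdX".
install_disk_from_config() {
  awk '$1 == "disk:" { print $2; exit }' "$1"
}

# Usage (NODE_IP is a placeholder for a node booted from USB in maintenance mode):
#   talosctl get disks --insecure --nodes "$NODE_IP"
#   install_disk_from_config controlplane.yaml
```

If the disk printed by the helper is the USB stick in the talosctl get disks listing, fix the config before applying it.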
2. The Reboot Trap: âNo Operating System Foundâ
Symptom:
After running what I thought was the install, rebooting gave me "No operating system found."
What I did wrong:
I used:
talosctl apply ...
instead of:
talosctl apply-config --nodes <NODE_IP> --file controlplane.yaml --insecure
What's the difference?
talosctl apply-config is mandatory for pushing the Talos machine configuration to a bare metal node; without it, nothing is installed to disk. Kubernetes resource manifests are a separate step, applied with kubectl apply only after your cluster is up.
Lesson:
Don't confuse the two - your cluster depends on it!
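To make the right command harder to skip, it can be wrapped with a sanity check first. This is a sketch under assumptions, not an official workflow: push_talos_config is a hypothetical wrapper name, and it assumes talosctl is on your PATH and the node is still in maintenance mode.

```shell
# Hypothetical wrapper: validate a machine config, then push it to a node
# that is still in maintenance mode (booted from the Talos install media).
# $1 = node IP, $2 = machine config file
push_talos_config() {
  [ -f "$2" ] || { echo "config file not found: $2" >&2; return 1; }
  # Offline sanity check of the config for a bare metal install.
  talosctl validate --config "$2" --mode metal || return 1
  # --insecure is needed because the node has no certificates yet.
  talosctl apply-config --nodes "$1" --file "$2" --insecure
}

# Usage:
#   push_talos_config <NODE_IP> controlplane.yaml
```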
3. "No Route to Host" for the VIP
Symptom:
Flood of logs: connect: no route to host 10.20.0.50:6443
Why:
In HA clusters, the API VIP isn't live until after you bootstrap etcd.
Ignore these errors until you run:
talosctl bootstrap -n <any_control_plane_node_ip>
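Rather than staring at the log flood, you can poll until the VIP starts answering. A minimal sketch: wait_for is a hypothetical helper, and the nc -z check in the usage note assumes netcat is installed on your workstation.

```shell
# Hypothetical helper: retry a command until it succeeds.
# $1 = max attempts, $2 = seconds between attempts, rest = command to run
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage, after running the bootstrap command once:
#   wait_for 60 5 nc -z 10.20.0.50 6443 && echo "VIP is answering"
```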
4. CSR Approval and Pod Security Hurdles
- CSR Approval:
Kubelet server certificate rotation will fail unless you manually approve CSRs or deploy a serving cert approver.
- PodSecurity:
Default Pod Security policies are strict on this stack (Talos with Kubernetes v1.33) - add a securityContext to your manifests.
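If you approve CSRs by hand, it is safer to approve only the kubelet serving requests rather than everything that is pending. The sketch below is one way to do that; pending_kubelet_serving_csrs is a hypothetical name, and it assumes the default kubectl get csr column order (NAME, AGE, SIGNERNAME, REQUESTOR, REQUESTEDDURATION, CONDITION).

```shell
# Hypothetical filter: read "kubectl get csr" output on stdin and print the
# names of Pending requests for the kubelet-serving signer only.
# Assumes the default column order:
#   NAME AGE SIGNERNAME REQUESTOR REQUESTEDDURATION CONDITION
pending_kubelet_serving_csrs() {
  awk '$3 == "kubernetes.io/kubelet-serving" && $NF == "Pending" { print $1 }'
}

# Usage:
#   kubectl get csr | pending_kubelet_serving_csrs | xargs -r kubectl certificate approve
```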
The "Almost Proxmox" Moment
After hours of:
- Power cycling servers
- Watching the same errors
- Feeling stuck in a loop
...I was this close to wiping everything and using Proxmox for virtualization.
But with a fresh mind (and a lot of Googling), I realized my main error:
I'd never run **apply-config**, so my nodes never joined the cluster.
How I Got It Working
1. Boot from Talos USB, get node IP
2. talosctl apply-config --nodes <NODE_IP> --file controlplane.yaml --insecure
3. Wait for install to complete
4. Remove USB, reboot, and let Talos boot from disk
5. Repeat for all control plane nodes
6. Once all nodes are online, run the bootstrap!
7. Fetch kubeconfig, install Cilium, and start deploying workloads
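The order matters more than any single command, so here is a sketch that only prints the sequence for review instead of executing it. print_install_plan is a hypothetical helper; it assumes every node uses the same controlplane.yaml and that your talosconfig already has endpoints set for the bootstrap and kubeconfig steps.

```shell
# Hypothetical helper: print the install sequence for a list of control plane
# IPs without executing anything, so the order can be reviewed first.
print_install_plan() {
  first=$1
  for ip in "$@"; do
    echo "talosctl apply-config --nodes $ip --file controlplane.yaml --insecure"
  done
  echo "# wait for every node to install and boot from disk, then bootstrap ONE node"
  echo "talosctl bootstrap -n $first"
  echo "talosctl kubeconfig -n $first"
}

# Usage:
#   print_install_plan <CP1_IP> <CP2_IP> <CP3_IP>
```

Bootstrap is printed once, against the first node only, because it should be run a single time per cluster.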
Final Thoughts and Takeaways
- The difference between apply and apply-config is the difference between "nothing works" and "cluster online."
- Don't reboot or remove install media until you're sure Talos is installed on disk.
- Expect error logs during initial cluster formation - don't panic.
- Stick with it. Most issues are just "one missed step" away from being solved.
Would I Do It Again?
Absolutely - but now I know to read the docs and trust the process.
Bare metal K8s isn't for the faint-hearted, but the victory is sweet when you finally see all nodes healthy.
Have questions, or stuck on your own Talos journey? Drop a comment or DM - I'm happy to help!
Appendix: My Cheat Sheet for Next Time
# Install Talos OS | Apply config (crucial!)
talosctl apply-config --nodes <IP> --file controlplane.yaml --insecure
# Bootstrap cluster (after all nodes are online)
talosctl bootstrap -n <any_control_plane_node_ip>
# Approve any pending CSRs if needed
kubectl get csr | awk '/Pending/ {print $1}' | xargs -r kubectl certificate approve