I built a self-hosted Kubernetes cluster on AWS EC2 to run a Ghibli-style image transformation app with inference on an NVIDIA GPU. The whole setup is infrastructure-as-code, using Terraform and Helm. In this post, I’ll share the architecture, challenges faced, and lessons learned throughout this journey.

Why Self-Host Kubernetes with GPU? Link to heading

Because I like pain and suffering ;-) In all seriousness, I wanted to see: how hard can it be to build a working k8s cluster on top of EC2 instances and run inference on it? It also has a few advantages:

  1. A ton of learning
  2. I don’t have any GPUs of my own so had to use some cloud :D
  3. Complete control over the cluster configuration
  4. Better understanding of AWS and Kubernetes internals
  5. I wanted something useful to do with Claude Code

Infrastructure Overview Link to heading

My cluster uses a minimalist but effective architecture:

  • 1 Control Plane node (t3.medium) running essential Kubernetes services
  • 1 Worker node with NVIDIA T4 GPU (g4dn.xlarge) for compute-intensive workloads
  • Network Load Balancer for external access to the API and applications

Here’s a visual representation of how everything fits together:

                                                    ┌──────────────────────┐
                                                    │                      │
                                                    │   AWS Cloud Region   │
                                                    │                      │
                                                    └──────────────────────┘
                                                              │
                                                              ▼
                          ┌──────────────────────────────────────────────────────────┐
                          │                        VPC 10.0.0.0/16                   │
                          │                                                          │
          ┌───────────────┴──────────────┐                 ┌───────────────────────┐ │
          │                              │                 │                       │ │
┌─────────▼─────────┐    ┌───────────────▼──────────┐      │                       │ │
│                   │    │                          │      │                       │ │
│   Public Subnet   │    │     Public Subnet        │      │                       │ │
│   10.0.1.0/24     │    │     10.0.2.0/24          │      │                       │ │
│                   │    │                          │      │   ┌─────────────────┐ │ │
│ ┌───────────────┐ │    │ ┌──────────────────────┐ │      │   │ Network Load    │ │ │
│ │ Control Plane │ │    │ │  Worker Node w/GPU   │ │      │   │ Balancer        │ │ │
│ │ t3.medium     │ │    │ │  g4dn.xlarge         │ │      │   └────────┬────────┘ │ │
│ │               │◄┼────┼─┼─────────────────────►│ │      │            │          │ │
│ │ kube-apiserver│ │    │ │ kubelet              │ │      │            │          │ │
│ │ etcd          │ │    │ │ NVIDIA GPU Driver    │ │      │            │          │ │
│ │ scheduler     │ │    │ │ containerd           │ │      │            │          │ │
│ │ controller-mgr│ │    │ │ NVIDIA Runtime       │ │      │            │          │ │
│ └───────────────┘ │    │ │                      │ │      │            │          │ │
│         ▲         │    │ │ ┌──────────────────┐ │ │      │            │          │ │
│         │         │    │ │ │ Ingress NGINX    │◄┼─┼──────┼────────────┘          │ │
│         │         │    │ │ │ Controller       │ │ │      │                       │ │
│         │         │    │ │ └──────────────────┘ │ │      │                       │ │
│         │         │    │ │                      │ │      │                       │ │
│         │         │    │ │ ┌──────────────────┐ │ │      │                       │ │
│         └─────────┼────┼─┼►│ CoreDNS          │ │ │      │                       │ │
│                   │    │ │ └──────────────────┘ │ │      │                       │ │
│                   │    │ │                      │ │      │                       │ │
│                   │    │ │ ┌──────────────────┐ │ │      │                       │ │
│                   │    │ │ │ Calico           │ │ │      │                       │ │
│                   │    │ │ └──────────────────┘ │ │      │                       │ │
│                   │    │ │                      │ │      │                       │ │
│                   │    │ │ ┌──────────────────┐ │ │      │                       │ │
│                   │    │ │ │ Ghibli App (Pod) │ │ │      │                       │ │
│                   │    │ │ │ w/GPU Access     │ │ │      │                       │ │
│                   │    │ │ └──────────────────┘ │ │      │                       │ │
│                   │    │ └──────────────────────┘ │      │                       │ │
└───────────────────┘    └──────────────────────────┘      └───────────────────────┘ │
                         │                                                           │
                         └───────────────────────────────────────────────────────────┘
                                                │
                                                ▼
                          ┌─────────────────────────────────────────┐
                          │                                         │
                          │               Internet                  │
                          │                                         │
                          └─────────────────────────────────────────┘

Setting Up GPU Infrastructure Link to heading

Integrating GPU support into Kubernetes was one of the most challenging and interesting aspects of this project. Here’s how I approached it:

Hardware Selection Link to heading

For the worker node, I chose a g4dn.xlarge instance which provides:

  • 4 vCPUs and 16GB RAM - enough for handling moderate workloads
  • 1 NVIDIA T4 GPU with 16GB VRAM - enough for the simple inference needs I had
  • 2,560 CUDA cores and 320 Tensor cores

Software Stack Link to heading

Getting the GPU software stack right required careful version matching:

  • NVIDIA Drivers (535.xx series)
  • CUDA 12.1
  • containerd with nvidia-container-runtime
  • NVIDIA Device Plugin for Kubernetes integration

GPU Setup Process Link to heading

The worker node initialization process required several critical steps (sketched below):

  1. Installing the correct NVIDIA drivers and libraries
  2. Configuring containerd to use the NVIDIA runtime
  3. Setting up the device plugin directory with proper permissions
  4. Creating the RuntimeClass for GPU workloads
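
Steps 2 and 4 are the fiddly ones, so here is a minimal sketch of what they boil down to. It assumes the nvidia-container-toolkit is already installed and uses its helper to patch containerd; the names are illustrative, and the real init script in the repo does more (driver install, verification, etc.).

# Point containerd at the NVIDIA runtime and restart it.
sudo nvidia-ctk runtime configure --runtime=containerd
sudo systemctl restart containerd

# Register a RuntimeClass so pods can explicitly opt into that runtime.
cat <<'EOF' | kubectl apply -f -
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
EOF

With that in place, the NVIDIA device plugin can advertise nvidia.com/gpu as a schedulable resource on the node.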

Building a Ghibli-Style Image Transform Application Link to heading

To demonstrate the GPU capabilities, I built a PyTorch-based application that transforms regular photos into Ghibli-style animations.

Application Architecture Link to heading

The app follows a straightforward design:

  • Frontend: Simple HTML/CSS/JS for image upload and display
  • Backend: Flask application running the PyTorch model
  • GPU Acceleration: CUDA-powered tensor operations
  • Kubernetes Deployment: Pod with GPU resource requests (manifest sketched below)
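
To make the GPU request concrete, here is a stripped-down version of what such a deployment can look like. The names, image, and port are placeholders, not the actual manifests from the repo.

# Sketch: a deployment that schedules one pod onto the GPU node.
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ghibli-app
spec:
  replicas: 1
  selector:
    matchLabels: {app: ghibli-app}
  template:
    metadata:
      labels: {app: ghibli-app}
    spec:
      runtimeClassName: nvidia                # the RuntimeClass from the GPU setup
      containers:
        - name: ghibli-app
          image: <registry>/ghibli-app:latest # placeholder image
          ports:
            - containerPort: 5000             # Flask port (assumed)
          resources:
            limits:
              nvidia.com/gpu: 1               # what actually lands the pod on the GPU node
EOF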

Request Flow Link to heading

When a user uploads an image, the request flows through:

  1. Network Load Balancer
  2. NGINX Ingress Controller
  3. Kubernetes Service
  4. Application Pod (with GPU access)
  5. Response with transformed image
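
The middle hops are plain Kubernetes plumbing. A rough sketch of the Service and Ingress involved (names are illustrative; TLS is covered in the next section):

# Service exposing the app pod inside the cluster.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: ghibli-app
spec:
  selector: {app: ghibli-app}
  ports:
    - port: 80
      targetPort: 5000
EOF

# Ingress rule that the NGINX controller turns into routing to that Service.
cat <<'EOF' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ghibli-app
spec:
  ingressClassName: nginx
  rules:
    - host: ghibli.doandlearn.app
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ghibli-app
                port: {number: 80}
EOF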

Implementing TLS & Certificate Management Link to heading

Certificate Manager Setup Link to heading

I deployed cert-manager in the cluster with:

  • ClusterIssuer configured for Let’s Encrypt production
  • DNS01 validation using Cloudflare API for domain ownership verification
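
The manifests aren't reproduced in this post, but a ClusterIssuer for Let's Encrypt production with Cloudflare DNS01 validation typically looks like the following; the email and Secret names are placeholders.

# Sketch: ClusterIssuer using Let's Encrypt production + Cloudflare DNS01.
cat <<'EOF' | kubectl apply -f -
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: you@example.com                   # placeholder
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token     # Secret holding a scoped API token
              key: api-token
EOF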

Certificate Lifecycle Link to heading

The automatic certificate process works as follows:

  1. Ingress resources request certificates via annotations
  2. cert-manager creates Certificate resources
  3. DNS validation occurs via Cloudflare
  4. Valid TLS certificates are stored as Kubernetes secrets
  5. Ingress controller uses these certificates to serve HTTPS
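
In practice, step 1 means the Ingress carries a cert-manager annotation plus a tls section, and the rest of the lifecycle is observable with kubectl. A sketch, reusing the illustrative names from earlier:

# The Ingress from the request-flow section additionally needs, roughly:
#   metadata.annotations: cert-manager.io/cluster-issuer: letsencrypt-prod
#   spec.tls: hosts [ghibli.doandlearn.app], secretName: ghibli-app-tls
# Once that's in place, the moving parts can be watched as they happen:
kubectl get certificate,certificaterequest -A   # created by cert-manager (steps 2-3)
kubectl get challenge -A                        # DNS01 validation via Cloudflare (step 3)
kubectl get secret ghibli-app-tls               # the issued cert, stored as a Secret (step 4)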

Optimizing Costs Link to heading

Sleep/Wake Functionality Link to heading

I implemented scripts to scale the cluster down when not in use:

  • Sleep Mode:

    • Scales EC2 instances to 0
    • Removes the Network Load Balancer
    • Retains all Kubernetes state (etcd data)
  • Wake Mode:

    • Restores instances to desired count
    • Recreates the Network Load Balancer
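
The scripts themselves live in the repo, but conceptually they reduce to a handful of AWS CLI and Terraform calls. A sketch, assuming the instances are stopped rather than terminated so their EBS volumes (and the etcd data on them) survive; the instance IDs and Terraform address are placeholders, and the real scripts may instead adjust Auto Scaling Group capacity:

# Sleep: stop the nodes, then tear down just the NLB.
aws ec2 stop-instances --instance-ids <control-plane-id> <worker-id>
terraform destroy -target=aws_lb.nlb -auto-approve

# Wake: recreate the NLB, then start the nodes again.
terraform apply -target=aws_lb.nlb -auto-approve
aws ec2 start-instances --instance-ids <control-plane-id> <worker-id>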

Cost Breakdown Link to heading

Component        Type               Monthly Cost
Control Plane    1 x t3.medium      ~$30
Worker Nodes     1 x g4dn.xlarge    ~$38 (spot)
Network LB       -                  ~$16.43
EBS Storage      gp3                ~$15
Total            -                  ~$99/month

With sleep/wake functionality, the only thing still billed while the cluster is asleep is EBS storage, which cuts the bill dramatically during idle periods.

Implementing Monitoring Link to heading

I stood up a simple, bare-bones observability stack:

Kubernetes Dashboard Link to heading

The official Kubernetes Dashboard provides a visual interface for:

  • Cluster resource visualization
  • Deployment management
  • Pod logs and exec access
  • Resource utilization graphs

It’s secured with token-based authentication and accessed through kubectl proxy.
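
Access follows the standard Dashboard pattern, roughly as below; the ServiceAccount name is an assumption, use whatever admin account exists in the cluster.

# Mint a short-lived login token for the dashboard ServiceAccount.
kubectl -n kubernetes-dashboard create token admin-user

# Proxy the API server locally and open the dashboard through it.
kubectl proxy
# -> http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/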

Prometheus and Grafana Link to heading

For metrics collection and visualization:

  • Prometheus collects metrics from:

    • Kubernetes components
    • Node resource utilization
    • Container metrics
    • GPU statistics (utilization, memory, temperature)
  • Grafana provides dashboards for:

    • Cluster overview
    • Node resource utilization
    • Pod resource consumption
    • GPU performance metrics

Automated Port Forwarding Link to heading

To simplify access to monitoring services, I implemented a background port forwarding system that:

  • Manages service connections
  • Provides automatic token retrieval
  • Tracks running port forwards
  • Offers a unified interface for starting/stopping
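
At its core this is just kubectl port-forward run in the background, with the PID written somewhere it can be found again. A minimal sketch with a placeholder service name (the kube-prometheus-stack chart prefixes services with the release name):

# Start a background forward to Grafana and remember its PID.
kubectl -n monitoring port-forward svc/monitoring-grafana 3000:80 >/dev/null 2>&1 &
echo $! > /tmp/pf-grafana.pid

# Stop it again later.
kill "$(cat /tmp/pf-grafana.pid)" && rm -f /tmp/pf-grafana.pid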

Configuration Management Approach Link to heading

Managing the configuration of such a complex system required a layered approach:

Terraform for Infrastructure Link to heading

I used Terraform to manage all AWS infrastructure:

  • VPC, subnets, security groups
  • EC2 instances via Auto Scaling Groups
  • Network Load Balancer configuration
  • IAM roles and policies

Terraform gave me declarative configuration, state tracking, and the ability to make incremental changes.

Helm for Kubernetes Applications Link to heading

For complex Kubernetes applications, Helm charts were essential:

  • Monitoring stack (Prometheus, Grafana)
  • NGINX ingress controller
  • cert-manager

Helm simplified version management and configuration of these multi-resource applications.
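
For reference, standing these up is a handful of repeatable commands. The chart repositories below are the upstream defaults; release names and namespaces are illustrative, and any chart values used in the actual setup aren't shown here.

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo add jetstack https://charts.jetstack.io
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install ingress-nginx ingress-nginx/ingress-nginx -n ingress-nginx --create-namespace
helm install cert-manager jetstack/cert-manager -n cert-manager --create-namespace --set installCRDs=true
helm install monitoring prometheus-community/kube-prometheus-stack -n monitoring --create-namespace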

Shell Scripts as Orchestration Layer Link to heading

Shell scripts served as the orchestration layer:

  • Component installation
  • Cluster sleep/wake functionality
  • Monitoring setup
  • Automated port forwarding

They integrate outputs from different tools and provide error handling for operational reliability.

Kubernetes Manifests for Applications Link to heading

For application-specific configuration, direct Kubernetes manifests provided:

  • Precise control over resources
  • Clarity for simpler deployments
  • Version control alongside application code

Makefile as User Interface Link to heading

A Makefile ties everything together with a unified command interface, hiding implementation details while ensuring correct execution order.

Challenges and Lessons Learned Link to heading

GPU Support Complexity Link to heading

Integrating NVIDIA GPUs with Kubernetes was challenging due to:

  • Version Compatibility: Finding compatible versions of NVIDIA drivers, CUDA, and Kubernetes components
  • Driver Installation: Ubuntu 22.04 required specific libraries not in default repositories
  • Runtime Configuration: Configuring containerd with NVIDIA runtime support
  • Kernel Modules: Ensuring NVIDIA modules loaded properly at boot

I solved these by creating a comprehensive worker node initialization script that installed exact versions, configured the runtime properly, and verified GPU visibility.
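
The "verified GPU visibility" part amounts to checking both layers. A quick sketch (the node name is a placeholder):

# On the worker node itself: driver loaded, GPU visible to the OS?
nvidia-smi

# From the cluster: does the node advertise the GPU as a schedulable resource?
kubectl describe node <worker-node> | grep -i 'nvidia.com/gpu'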

Networking Challenges Link to heading

My initial Flannel CNI implementation faced several issues:

  • Pod Communication Problems: Inconsistent connectivity between pods
  • DNS Resolution Failures: CoreDNS pods with connectivity issues
  • Service Accessibility: Unreachable ClusterIP services
  • Network Policy Limitations: Inadequate security controls

Migrating to Calico CNI resolved these issues with more reliable networking, better diagnostics, robust network policies, and improved performance.

Future Plans Link to heading

While the current setup works well, I’m planning several improvements:

  1. Enhanced Security: Implementing more comprehensive security practices
  2. Horizontal Pod Autoscaling: Dynamic scaling based on GPU utilization
  3. CI/CD Pipeline: Automating deployment workflows
  4. Advanced Monitoring: Setting up focused alerts and additional metrics
  5. GitOps Adoption: Moving toward a declarative GitOps approach with ArgoCD or Flux

Conclusion Link to heading

Had fun playing with AWS infra, Kubernetes, Terraform, Helm and Claude. The Ghibli app wasn’t really doing a decent job of the transformation and required more tuning, which I wasn’t very interested in doing. ;-) I had it up at ghibli.doandlearn.app for some time, but because I don’t like being broke, I have since torn down the infra to save cost. The good thing is, everything is automated, so I can bring it back up pretty quickly if needed.

Here is a sample Link to heading

[image: ghibli_sample]

The complete code for this project is available in my selfhost_k8s repository.