Objective
A major Indonesian government financial institution required a comprehensive assessment of the Kubernetes-based infrastructure behind a critical nationwide tax administration system. The goals were to evaluate its current operational condition, identify potential scalability and reliability risks, and establish a structured roadmap aligned with Kubernetes production best practices.
PT Boer Technology conducted a full infrastructure assessment covering Kubernetes cluster architecture, node utilization, storage configuration, RabbitMQ, Redis, and workload management to support long-term operational stability and modernization initiatives.
Context
The critical platform operated on a large-scale Kubernetes environment consisting of:
- More than 20 Kubernetes nodes (including control plane and worker nodes)
- Kubernetes platform
- Hundreds of deployments across multiple active namespaces
- More than 200 HorizontalPodAutoscaler resources
- Stateful workloads including RabbitMQ and Redis clusters
- Ubuntu LTS infrastructure environment
- Calico CNI and containerd runtime deployment
The assessment identified several operational and architectural challenges, including:
- Kubernetes and container runtime versions reaching End-of-Life (EOL)
- Absence of Pod Security implementation
- Lack of ResourceQuota and LimitRange policies
- Shared Redis storage architecture creating Single Point of Failure (SPOF)
- RabbitMQ memory configuration operating close to OOM threshold
- Over-provisioned compute allocation reducing cluster efficiency
- Legacy autoscaling API usage with limited scaling flexibility
- Limited workload disruption protection mechanisms
These conditions created potential risks for scalability, operational resilience, and future platform expansion.
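To illustrate why a RabbitMQ memory setting close to the container limit is risky, the following is a minimal arithmetic sketch. It assumes the relative memory watermark is evaluated against the container's memory limit; the 4 GiB limit and the watermark values are hypothetical and are not figures from the assessment.

```python
GIB = 1024 ** 3


def watermark_headroom(limit_bytes: int, relative_watermark: float) -> dict:
    """Return the memory alarm threshold and the headroom left below the limit.

    Assumption: the relative watermark is applied to the container limit.
    """
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    if not 0.0 < relative_watermark <= 1.0:
        raise ValueError("relative_watermark must be in (0, 1]")
    # Memory level at which the broker would start blocking publishers.
    threshold = int(limit_bytes * relative_watermark)
    # Memory left between that alarm level and the container limit.
    headroom = limit_bytes - threshold
    return {
        "threshold_bytes": threshold,
        "headroom_bytes": headroom,
        "headroom_ratio": headroom / limit_bytes,
    }


if __name__ == "__main__":
    # Hypothetical values: a 4 GiB container limit with two watermark settings.
    for watermark in (0.4, 0.9):
        result = watermark_headroom(4 * GIB, watermark)
        print(watermark, result["headroom_bytes"] // (1024 ** 2), "MiB headroom")
```

The smaller the headroom, the less room the broker has for memory spikes before the container is OOM-killed, which is the condition the assessment flagged.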
Approach
PT Boer Technology performed a structured Kubernetes infrastructure assessment and operational analysis focused on production readiness and platform optimization.
The assessment activities included:
- Reviewing Kubernetes cluster architecture and infrastructure topology
- Collecting node-level CPU and memory utilization metrics
- Evaluating Kubernetes runtime, networking, and storage configurations
- Assessing RabbitMQ cluster sizing, persistence, quorum, and resource allocation
- Analyzing Redis persistence architecture and storage performance
- Reviewing autoscaling implementation and workload scheduling strategy
- Identifying security gaps and operational risks
- Benchmarking existing configurations against Kubernetes best practices
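As an illustration of how node-level metrics can reveal over-provisioned compute, the sketch below compares CPU requests with observed usage per node. The `NodeSample` type, the thresholds, and the sample figures are all hypothetical; they are not the tooling or data used in the assessment.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class NodeSample:
    """Hypothetical per-node CPU snapshot, in millicores."""
    name: str
    allocatable_millicores: int
    requested_millicores: int
    used_millicores: int


def over_provisioned(
    nodes: Iterable[NodeSample],
    min_request_ratio: float = 0.7,     # assumed: node is "mostly reserved"
    max_usage_of_request: float = 0.3,  # assumed: little of the reservation is used
) -> List[str]:
    """Return names of nodes that are heavily reserved but lightly used."""
    flagged = []
    for node in nodes:
        if node.allocatable_millicores <= 0 or node.requested_millicores <= 0:
            continue  # nothing meaningful to compare on this node
        request_ratio = node.requested_millicores / node.allocatable_millicores
        usage_of_request = node.used_millicores / node.requested_millicores
        if request_ratio >= min_request_ratio and usage_of_request <= max_usage_of_request:
            flagged.append(node.name)
    return flagged


# Hypothetical sample data for three worker nodes.
samples = [
    NodeSample("worker-a", 8000, 7000, 1200),
    NodeSample("worker-b", 8000, 6000, 4500),
    NodeSample("worker-c", 8000, 2000, 400),
]

if __name__ == "__main__":
    print(over_provisioned(samples))
```

Nodes flagged this way hold scheduler capacity that running pods do not actually consume, which makes them natural candidates for request right-sizing.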
Based on the findings, PT Boer Technology produced a prioritized recommendation roadmap covering critical, medium, and low-priority improvements.
Key recommendations included:
- Kubernetes cluster upgrade to supported stable versions
- Migration from legacy HPA APIs to autoscaling/v2
- Implementation of Pod Security baseline policies
- etcd storage expansion for production-grade scalability
- Deployment of PodDisruptionBudget protections
- Redis storage migration from shared RWX storage to dedicated RWO volumes
- RabbitMQ memory watermark optimization
- Compute resource right-sizing for improved scheduler efficiency
- Implementation of Descheduler and topology spread constraints
- Adoption of Gateway API architecture for traffic management modernization
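The HPA migration in the list above can be sketched as a manifest transformation. This example assumes the legacy objects are `autoscaling/v1` manifests with a CPU-only target, which the source does not state explicitly; the function name `hpa_v1_to_v2` and the sample manifest are hypothetical. It shows how the single `targetCPUUtilizationPercentage` field maps to an entry in the `metrics` list of `autoscaling/v2`.

```python
import copy


def hpa_v1_to_v2(v1: dict) -> dict:
    """Convert an autoscaling/v1 HPA manifest (as a dict) to autoscaling/v2."""
    if v1.get("apiVersion") != "autoscaling/v1" or v1.get("kind") != "HorizontalPodAutoscaler":
        raise ValueError("expected an autoscaling/v1 HorizontalPodAutoscaler")
    spec = v1["spec"]
    v2_spec = {
        "scaleTargetRef": copy.deepcopy(spec["scaleTargetRef"]),
        "maxReplicas": spec["maxReplicas"],
    }
    if "minReplicas" in spec:
        v2_spec["minReplicas"] = spec["minReplicas"]
    if "targetCPUUtilizationPercentage" in spec:
        # v2 expresses the CPU target as one entry in a list of metrics.
        v2_spec["metrics"] = [{
            "type": "Resource",
            "resource": {
                "name": "cpu",
                "target": {
                    "type": "Utilization",
                    "averageUtilization": spec["targetCPUUtilizationPercentage"],
                },
            },
        }]
    # Carry over only identifying metadata, not server-managed fields.
    metadata = {
        key: copy.deepcopy(value)
        for key, value in v1.get("metadata", {}).items()
        if key in ("name", "namespace", "labels", "annotations")
    }
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": metadata,
        "spec": v2_spec,
    }


# Hypothetical legacy manifest used only for illustration.
legacy = {
    "apiVersion": "autoscaling/v1",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": "example-api", "namespace": "example"},
    "spec": {
        "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": "example-api"},
        "minReplicas": 2,
        "maxReplicas": 10,
        "targetCPUUtilizationPercentage": 70,
    },
}
converted = hpa_v1_to_v2(legacy)
```

Once a workload is expressed in the v2 shape, further entries such as a memory target can be appended to `metrics`, which is the added scaling flexibility the recommendation refers to.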
Results
The assessment provided the financial institution with a comprehensive operational visibility framework and a phased modernization roadmap for the critical system running on its Kubernetes platform.
Key outcomes included:
- Complete visibility into Kubernetes cluster health and workload utilization
- Identification of production stability and scalability bottlenecks
- Structured prioritization of infrastructure improvements based on operational impact
- Risk mitigation recommendations for RabbitMQ quorum and Redis persistence
- Identification of inefficient compute allocation across stateful workloads
- Security hardening recommendations aligned with Kubernetes best practices
- Scalability improvement roadmap for autoscaling and workload distribution
- Foundation for future Kubernetes platform modernization initiatives
Before
- Kubernetes cluster operated on End-of-Life versions
- Redis workloads relied on shared NFS storage with SPOF risk
- RabbitMQ memory configuration operated near OOM threshold
- No Pod Security enforcement implemented
- Limited workload disruption safeguards
- Manual traffic management configuration using nginx.conf
- Cluster resource reservations not fully optimized
- HPA implementation limited to CPU-based autoscaling
After
- Clear modernization roadmap established for cluster upgrade and optimization
- Production risks identified with mitigation recommendations
- Redis persistence architecture improvement plan defined
- RabbitMQ operational resilience and quorum protection strategy established
- Security baseline implementation framework documented
- Autoscaling enhancement strategy prepared using autoscaling/v2
- Infrastructure scalability and workload balancing improvements defined
- Operational governance and Kubernetes best-practice alignment strengthened
Takeaways
- Large-scale Kubernetes environments require periodic operational assessments to maintain production readiness and scalability.
- Stateful services such as RabbitMQ and Redis require dedicated architecture optimization to avoid performance bottlenecks and operational risks.
- Incremental modernization strategies allow organizations to improve infrastructure reliability while minimizing disruption to critical services.