Job Description
Hybrid, Bangalore or Pune, India

JOB OVERVIEW
Are you passionate about automation, cloud infrastructure, Kubernetes, and reliability engineering? As a Principal Production Engineer (SRE) at Legion, you will build and operate a secure, highly scalable, and cost-effective AWS/Kubernetes-based cloud platform. You will work across infrastructure automation, CI/CD pipelines, observability, and production reliability. Simply put, the SRE team ensures Legion's platform is reliable, scalable, and continuously improving for our customers. This role includes participation in an on-call rotation.

RESPONSIBILITIES AND DUTIES
Serve as subject matter expert for the entire production infrastructure stack and software build pipelines.
Work independently on initiatives to improve security, stability, and responsiveness of Legion's applications.
Train and mentor other members of the team.
Collaborate across teams to share knowledge and best practices widely within the Legion organization.
Support and operate Legion's AWS-based cloud platform and Kubernetes (EKS) environments.
Leverage GenAI tools (e.g., Claude Code, Codex, or similar) to accelerate infrastructure development, automation, and auto-remediation of common production issues.
Build and maintain infrastructure-as-code using Terraform.
Develop automation and internal tooling using Go or Python.
Improve CI/CD pipelines to increase deployment safety and velocity.
Define and improve monitoring, alerting, and observability systems.
Respond to production incidents, conduct root cause analysis, and implement systemic improvements.
Develop and automate operational runbooks and remediation workflows.
Support production deployments, including during off-hours as needed.

REQUIRED SKILLS AND QUALIFICATIONS
8+ years of experience in SRE, DevOps, or SaaS production operations.
5+ years of hands-on experience operating large-scale production workloads in AWS.
Strong experience with Terraform and infrastructure-as-code practices.
5+ years o