EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.

We are seeking a skilled Lead Site Reliability Engineer to join our team. As a Lead SRE, you will work to ensure that our systems, services, and applications running on Google Cloud Platform (GCP) are reliable, performant, and scalable. The ideal candidate will possess strong technical skills, have a passion for automation and infrastructure-as-code, and thrive in a collaborative team environment.

Responsibilities
- Participate in on-call rotations and provide 24/7 support for critical systems
- Respond to alerts of running services and applications, conducting RCA
- Deploy microservices according to release cadence
- Design, implement, and maintain scalable and reliable systems and applications on Google Cloud Platform (GCP)
- Develop and maintain infrastructure as code using Terraform
- Collaborate with engineering teams to identify and prioritize reliability, performance improvements, and right-sizing of dedicated cloud resources
- Participate in incident management and response using ServiceNow
- Manage and resolve technical issues and tickets using Jira
- Develop a knowledge base for maintaining existing infrastructure and monitoring services

Requirements
- 8+ years of experience in an SRE, DevOps, or system administration role
- Deep knowledge of Google Cloud Platform (GCP)
- Experience with incident management and response using ServiceNow or similar tools
- Strong problem-solving skills and experience in debugging complex technical issues
- Understanding of monitoring, logging, and alerting systems (preferably Cloud Monitoring)
- Familiarity with version control using GitHub
- Experience with infrastructure-as-code
- Excellent communication and collaboration skills
- Experience with Kubernetes and containerization technologies
- Experience with Terraform for infrastructure-as-code
- Strong understanding of the Software Development Life Cycle (SDLC) and CI/CD pipelines, and experience with CI/CD tools
- Experience with monitoring and logging tools like Prometheus, Grafana, Catchpoint, and ELK

We offer
- Opportunity to work on technical challenges that may impact across geographies
- Vast opportunities for self-development: online university, knowledge sharing opportunities globally, learning opportunities through external certifications
- Opportunity to share your ideas on international platforms
- Sponsored Tech Talks & Hackathons
- Unlimited access to LinkedIn learning solutions
- Possibility to relocate to any EPAM office for short and long-term projects
- Focused individual development
- Benefit package:
  - Health benefits
  - Retirement benefits
  - Paid time off
  - Flexible benefits
- Forums to explore beyond work passion (CSR, photography, painting, sports, etc.)