Full-Time Senior Site Reliability Engineer
Everbridge is hiring a remote, full-time Senior Site Reliability Engineer. The career level for this opening is Experienced, and the role accepts India-based applicants working remotely. Read the complete job description before applying.
Job Details
About the company
Everbridge empowers enterprises and government organizations to anticipate, mitigate, respond to, and recover stronger from critical events. Resilient organizations minimize impact to people and operations, and return to productivity faster.
What you'll do
Are you motivated by a sense of purpose to keep people safe? Are you passionate about innovating on technology to develop robust architecture principles? Join the Everbridge Kubernetes Platform team.
You will play a critical role in ensuring the service quality and availability of Everbridge solutions. This includes designing, deploying, and managing Kubernetes at scale, and evangelizing Kubernetes and SRE best practices.
Responsibilities
- Own and maintain Kubernetes infrastructure within AWS (VPCs, EC2, Transit Gateways, IAM roles, Route53, S3, SGs, NACLs).
- Improve the operational availability, security, scalability, efficiency, monitoring, and service reliability of Kubernetes solutions.
- Collaborate with Agile teams (Architects, Developers, Quality, Data, Security engineers) on designing and implementing reliable solutions.
- Research and implement SRE and Kubernetes best practices using automation, collaboration, and data-driven decisions.
- Participate in the on-call rotation to resolve production escalations.
Qualifications
- 3+ years of experience with AWS production environments.
- 2+ years of Kubernetes experience (EKS, AKS, GKE, self-managed).
- 3+ years of Terraform experience.
- Experience with GitLab CI/CD, Packer, Docker, EKS, Kubernetes, Spinnaker, Helm, Argo, Jenkins.
- Experience with telemetry tools (Datadog, SumoLogic, Grafana, Prometheus).
- Automation experience in Python, Go, Bash, Java.
- Experience with configuration management tools (Salt, Ansible, AWS user_data).
- DevOps/SRE production environment experience.
- Agile experience.
- Large-scale UNIX/Linux experience.