Full-Time Senior Site Reliability Engineer
Nexthink is hiring a remote Full-Time Senior Site Reliability Engineer. The career level for this opening is Senior Manager, and the role accepts USA-based applicants working remotely. Read the complete job description before applying.
Nexthink
Job Details
Nexthink is seeking a Site Reliability Engineer passionate about building and running a high-performance cloud platform and enabling best-in-class site reliability and operations practices. This role supports US-based operations, focusing on enabling Nexthink to deliver to the US Public Sector market, specifically a FedRAMP Moderate offering. The candidate will implement modern cloud-native SRE processes and manage the operations of Nexthink's multi-tenant, microservices-based cloud platform. The platform has multiple instances deployed globally.
Responsibilities include:
- Overseeing the design, deployment, and management of scalable and secure cloud infrastructure.
- Driving automation of infrastructure provisioning, configuration, and management using IaC tools.
- Developing and maintaining monitoring, logging, and alerting systems to ensure high availability and performance.
- Leading efforts in performance tuning and optimization.
- Ensuring implementation and maintenance of security controls to achieve FedRAMP compliance. Conducting regular security assessments, vulnerability scans, and penetration testing.
- Leading incident management efforts and ensuring rapid resolution.
- Developing and implementing strategies to improve incident response.
- Collaborating with development, operations, and security teams to integrate reliability and security.
- Communicating effectively with stakeholders, providing updates on system performance, reliability, and compliance.
Qualifications:
- Bachelor's degree in Computer Science or related field (or equivalent experience)
- 5+ years of experience in SRE, DevOps, or a related role
- Proficiency in cloud platforms (AWS, Azure, GCP)
- Strong scripting and programming skills (Python, Bash, Go, or similar)
- Experience with IaC tools (Terraform, CloudFormation, Ansible)
- Knowledge of containerization and orchestration (Docker, Kubernetes)
- Familiarity with CI/CD pipelines and tools
- In-depth knowledge of FedRAMP requirements
- Experience with security tools (SIEM, IDS/IPS)
- Strong problem-solving and communication skills
- Ability to work independently and as part of a team in a fast-paced environment
- Strong communication skills in English