Full-Time Staff Site Reliability Engineer
Sentinellabs is hiring a remote, full-time Staff Site Reliability Engineer. This is an experienced-level role open to remote applicants based in the USA. Read the complete job description before applying.
Job Details
About Us: SentinelOne is defining the future of cybersecurity through our XDR platform that automatically prevents, detects, and responds to threats in real time. Singularity XDR ingests data and leverages our patented AI models to deliver autonomous protection. With SentinelOne, organizations gain full transparency into everything happening across the network at machine speed, to defeat every attack at every stage of the threat lifecycle. We are a values-driven team. Due to a Federal Government contract requirement, U.S. citizenship is required for this position. FedRAMP staff may be subject to customer or third-party background checks, up to and including Secret Clearance, if required by their role at SentinelOne.
What Are We Looking For? We are looking for an experienced SRE, well-versed in large-scale SaaS or cloud engineering environments. As a Site Reliability Engineer, your primary responsibility will be the stability, reliability, and scalability of SentinelOne’s products and services.
What Will You Do? Support the stability, reliability, and scalability of SentinelOne’s distributed systems through tasks that include managing Kubernetes, creating IaC, and leading troubleshooting during incident response. Identify areas for improvement, such as performance issues and availability concerns, and carry out technical and architectural reviews in partnership with fellow engineering teams to improve overall reliability. Design and implement comprehensive monitoring and alerting, SLIs/SLOs, and critical user journeys to provide deeper insight into performance and availability. Analyze systems, identify toil, and develop and implement strategies such as automation to streamline and optimize SRE support of critical systems.
What Skills and Experience Will You Need? 7+ years of experience in Site Reliability Engineering, preferably with a large-scale SaaS product or large cloud-based distributed system. 5+ years of production experience with orchestration systems like Kubernetes, Nomad, or Mesos. Experience with a scripting language, such as Python, Golang, Java, or Ruby. Familiarity with running Java and JavaScript applications, including build and deploy. AWS experience, and familiarity with other platforms like GCP. Experience using Infrastructure as Code (IaC) to set up cloud-native services. Familiarity with CI and practical delivery using Jenkins, GHA, ArgoCD, or similar tools; familiarity with deployment strategies like blue-green, rolling, and canary deploys, and with best practices around deployment automation.
Why Us? You will be joining a cutting-edge company, with industry-leading benefits including medical, vision, dental, 401(k), commuter benefits, and unlimited PTO. This U.S. role has a base pay range that will vary based on the location of the candidate.
Preferred: 2+ years of experience in a FedRAMP environment. Ability to work in a diverse and distributed team. Self-starter attitude, with passion for new technologies and empathy for legacy systems. Ability to learn quickly, and navigate through unfamiliar programming languages, systems, and processes.