Full-Time Sr. Staff Site Reliability Engineer
Sentinellabs is hiring a remote, full-time Sr. Staff Site Reliability Engineer. The career level for this role is Expert, and it is open to USA-based applicants working remotely.
Job Details
About Us: SentinelOne is defining the future of cybersecurity through our XDR platform. We are a values-driven team focused on collaboration and results.
What are we looking for? We are seeking a Senior Staff Site Reliability Engineer to join our Site Reliability Engineering team.
As a Senior Staff SRE, you will:
- Architect and lead implementation of advanced observability, automated triage, and self-healing capabilities.
- Drive proactive incident management through smart alert correlation, automated root cause analysis, and autonomous remediation.
- Define and implement Service Level Objectives (SLOs).
- Partner with software engineers, SREs, and data scientists to implement and refine monitoring, alerting, and SLO solutions.
- Lead initiatives to promote best practices and knowledge sharing.
- Mentor engineers and contribute to a culture of reliability engineering excellence.
What skills and knowledge should you bring?
- Extensive SRE Experience: Proven experience architecting and implementing SRE solutions at scale in a microservices or distributed systems environment. 10+ years of experience, including 5+ years supporting enterprise SaaS environments.
- Technical Expertise: Deep knowledge of incident management, alert correlation, automated triage, self-healing strategies, and SLO frameworks. Strong understanding of observability platforms.
- Programming & Scripting: Proficient in programming languages (e.g., Python, Go, Java) with automation and scripting experience.
- Machine Learning & Data Analysis: Experience with machine learning, anomaly detection, or data analytics.
- Cloud Infrastructure: Expertise in cloud platforms (e.g., AWS, GCP, Azure) and container orchestration (e.g., Kubernetes), with experience in infrastructure-as-code (e.g., Terraform).
- Problem-Solving & Decision-Making: Ability to make critical architectural decisions.