Full-Time Director of Reliability Engineering

Astronomer is hiring a remote, full-time Director of Reliability Engineering. This is a manager-level position open to remote applicants based in the USA. Read the complete job description before applying.

This job was posted 6 months ago and is likely no longer active. We encourage you to explore more recent opportunities on our site. However, you may still try your luck using the 'Apply Now' link below.

Astronomer

Job Title

Director of Reliability Engineering

Posted

6 months ago on 30th May 2025

Job Type

Full-Time

Career Level

Manager

Locations Accepted

USA

Salary

$260,000 - $290,000 per year

Job Details

About this role: We are seeking a highly experienced and visionary Director of Reliability Engineering to lead our global reliability initiatives. This will be a central role in our organization, which supports critical services for companies around the world in every industry.

This strategic leadership role is responsible for defining, driving, and evolving operational excellence, platform reliability, and automation at scale across our cloud-native infrastructure.

You will lead, mentor, and grow high-performing SRE teams, collaborate cross-functionally, and play a critical role in ensuring a seamless and resilient customer experience for many of the world’s largest companies.

What you get to do:

Define and lead the strategic direction for SRE, reliability, and operational excellence across the organization.
Collaborate with Software Engineers and Product Managers on projects that impact users and be directly responsible for service uptime.
Own end-to-end availability and performance of key services; build automation to prevent recurrence of issues and automate responses to all non-exceptional service conditions.
Design, write, and deliver software to improve the availability, scalability, latency, and efficiency of services.
Champion observability, automation, and self-healing systems to proactively prevent downtime and reduce manual toil.
Evolve and manage our incident and change management processes, including root cause analysis and postmortems.
Drive adoption of SLOs, SLIs, and error budgets to align engineering efforts with business priorities.
Work with operational support to manage global on-call rotations using a follow-the-sun model to ensure around-the-clock coverage.
Support on-call culture by defining best practices for incident response, escalation policies, and operational readiness.
Partner closely with engineering, product, security, and program management teams to improve reliability without slowing innovation.
Cultivate a culture of continuous improvement, high accountability, and blameless incident management.
Lead and mentor the team, establishing credibility through high-quality technical execution.
Provide strong mentorship and leadership to grow the next generation of reliability and engineering leaders.

What you bring to the role:

10+ years of experience in software engineering, SRE, or DevOps roles.
5+ years in a technical leadership capacity, ideally in a high-growth, cloud-native SaaS environment.
Proven success operating and scaling large-scale, distributed, mission-critical systems.
Deep expertise in public cloud platforms (AWS, Azure, or GCP).
Hands-on knowledge of infrastructure as code (Terraform, CloudFormation), container orchestration (Kubernetes), and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk).
Experience implementing and managing CI/CD pipelines and secure development practices.
Demonstrated ability to hire, grow, and lead globally distributed SRE teams.
Strong decision-making, communication, and cross-functional collaboration skills.

Bonus points if you have:

Bachelor’s or Master’s degree in Computer Science, Information Systems, or a related field.
Experience managing vendor relationships and partnerships.
Comfort presenting to executive stakeholders in high-stakes environments.
Proven ability to scale operations during rapid business or organizational growth.
Strong analytical mindset with the ability to evaluate trade-offs between reliability, speed, and innovation.

Skills

Cloud Computing, DevOps, Software Engineering, SRE, Technical Leadership

FAQs

What is the last date for applying to the job?

The deadline to apply for the Full-Time Director of Reliability Engineering position at Astronomer is 29th of June 2025. We consider jobs older than one month to have expired.

Which countries are accepted for this remote job?

This job accepts applicants based in the USA.

Apply Now
