Full-Time Sr. Cloud Site Reliability Engineer

Serve Robotics is hiring a remote, full-time Sr. Cloud Site Reliability Engineer. The career level for this opening is Senior Manager, and the role accepts remote applicants based in the USA. Read the complete job description before applying.

This job was posted 9 months ago and is likely no longer active.

Serve Robotics

Job Title

Sr. Cloud Site Reliability Engineer

Posted

Job Type

Full-Time

Career Level

Senior Manager

Locations Accepted

USA

Job Details

At Serve Robotics, we’re reimagining how things move in cities. Our personable sidewalk robot is our vision for the future. It’s designed to take deliveries away from congested streets, make deliveries available to more people, and benefit local businesses. The Serve fleet has been making commercial deliveries in Los Angeles, delighting merchants, customers, and pedestrians along the way.

We’re looking for talented individuals who will grow robotic deliveries from surprising novelty to efficient ubiquity. We are tech industry veterans in software, hardware, and design who are pooling our skills to build the future we want to live in. We are solving real-world problems leveraging robotics, machine learning and computer vision, among other disciplines, with a mindful eye towards the end-to-end user experience. Our team is agile, diverse, and driven. We believe that the best way to solve complicated dynamic problems is collaboratively and respectfully.

This is a senior-level, individual contributor position. You will balance hands-on responsibilities (building and maintaining critical SRE tooling and processes) with technical leadership (guiding architecture decisions, mentoring others in SRE practices, and steering strategic initiatives to enhance system resiliency and availability).

You’ll collaborate across engineering, product, and operations teams to ensure our systems meet strict uptime and performance goals, all while aligning with overarching business objectives.

Responsibilities

  • Instrumentation & Monitoring: Develop and refine monitoring and observability tools (metrics, logs, traces) to validate system availability and performance. Implement best practices for instrumentation using tools like Prometheus, Grafana, Datadog, or equivalent.
  • Reliability Engineering: Collaborate with development teams to design and implement solutions for higher availability in the cloud. Lead the definition and management of Service Level Indicators (SLIs) and Service Level Objectives (SLOs), ensuring alignment with business goals. Perform capacity planning, load testing, and performance tuning to ensure systems can handle projected traffic and workloads.
  • Incident Response & Prevention: Own the incident response process, including on-call rotation, alerts, and root cause analysis. Proactively identify reliability risks and propose mitigations to reduce system downtime. Conduct and facilitate postmortems to capture learnings, drive improvements, and prevent recurrence of issues.
  • Align System Health with Business Metrics: Map system availability metrics to direct business value, ensuring stakeholders understand how reliability impacts overall company objectives. Create reporting dashboards that connect reliability data with KPIs and business goals.
  • Technical Leadership & Mentorship: Serve as an in-house SRE expert, advising teams on reliability-oriented designs, coding practices, and testing methodologies. Mentor junior and mid-level engineers, fostering a culture of continuous learning, automation, and operational excellence.
  • Collaboration & Education: Work closely with engineering, product, and operations teams to advocate for SRE best practices. Conduct training sessions and share knowledge to build a culture of reliability throughout the organization.

FAQs

What is the last date for applying to the job?

The deadline to apply for Full-Time Sr. Cloud Site Reliability Engineer at Serve Robotics is the 20th of March 2025. We consider jobs older than one month to have expired.

Which countries are accepted for this remote job?

This job accepts applicants from the USA.
