Full-Time Senior Site Reliability Engineer Observability

2100 NVIDIA USA is hiring a remote Full-Time Senior Site Reliability Engineer Observability. The career level for this opening is Experienced, and it accepts remote applicants based in US, CA, Santa Clara. Read the complete job description before applying.

This job was posted 11 months ago and is likely no longer active. You may still try the 'Apply Now' link below.

2100 NVIDIA USA

Job Title

Senior Site Reliability Engineer Observability

Posted

11 months ago on 6th January 2025

Job Type

Full-Time

Career Level

Experienced

Locations Accepted

US, CA, Santa Clara

Salary

$140,000 - $258,750 per year

Job Details

Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build, and maintain large-scale production systems with high efficiency and availability using software and systems engineering practices.

Responsibilities:

Design, implement, and support operational and reliability aspects of a large-scale Observability & Telemetry collection platform, focusing on performance at scale, real-time monitoring, logging, and alerting.
Engage in and improve the entire lifecycle of services (from inception and design through deployment, operation, and refinement).
Support services before launch through system design consulting, developing software tools, platforms, and frameworks, capacity management, and launch reviews.
Maintain services after launch by measuring and monitoring availability, latency, and overall system health.
Scale systems sustainably through automation and evolve systems to improve reliability and velocity.
Practice sustainable incident response and blameless postmortems.
Participate in on-call rotation to support production systems.

Qualifications:

BS degree in Computer Science or a related technical field involving coding, or equivalent experience.
5+ years of experience with infrastructure automation and distributed systems design, including designing and developing tools for running large-scale private or public cloud systems in production.
5+ years of experience delivering foundational infrastructure and observability platforms.
Experience in Python, Go, Perl, or Ruby.
In-depth knowledge of Linux, Networking, and Containers.

Bonus Skills:

Interest in crafting, analyzing, and fixing large-scale distributed systems.
Strong problem-solving and communication skills, and a sense of ownership.
Experience debugging, optimizing code, and automating routine tasks.
Experience with Kubernetes, OpenStack, and Docker.
Experience with Grafana, OpenTelemetry, Prometheus, and similar observability tools.

Skills

Cloud Computing, Distributed Systems, Infrastructure Automation, Linux, Python

FAQs

What is the last date for applying to the job?

The deadline to apply for Full-Time Senior Site Reliability Engineer Observability at 2100 NVIDIA USA is 5th February 2025. We consider jobs older than one month to have expired.

Which countries are accepted for this remote job?

This job accepts applicants based in US, CA, Santa Clara.

Apply Now
