Full-Time Senior Site Reliability Engineer Observability

2100 NVIDIA USA is hiring a remote, full-time Senior Site Reliability Engineer Observability. The career level for this opening is Experienced, and it accepts remote applicants based in US, CA, Santa Clara. Read the complete job description before applying.

This job was posted 11 months ago and is likely no longer active.

Company: 2100 NVIDIA USA
Job Title: Senior Site Reliability Engineer Observability
Posted: 11 months ago
Job Type: Full-Time
Career Level: Experienced
Locations Accepted: US, CA, Santa Clara
Salary: $140,000 - $258,750 per year

Job Details

Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build, and maintain large-scale production systems with high efficiency and availability using software and systems engineering practices.

Responsibilities:

  • Design, implement, and support operational and reliability aspects of a large-scale Observability & Telemetry collection platform, focusing on performance at scale, real-time monitoring, logging, and alerting.
  • Engage in and improve the entire lifecycle of services (from inception and design through deployment, operation, and refinement).
  • Support services before launch through system design consulting, developing software tools, platforms, and frameworks, capacity management, and launch reviews.
  • Maintain services after launch by measuring and monitoring availability, latency, and overall system health.
  • Scale systems sustainably through automation and evolve systems to improve reliability and velocity.
  • Practice sustainable incident response and blameless postmortems.
  • Participate in on-call rotation to support production systems.

Qualifications:

  • BS degree in Computer Science or a related technical field involving coding, or equivalent experience.
  • 5+ years of experience with infrastructure automation, distributed systems design, and designing and developing tools for running large-scale private or public cloud systems in production.
  • 5+ years of experience delivering foundational infrastructure and observability platforms.
  • Experience in Python, Go, Perl, or Ruby.
  • In-depth knowledge of Linux, Networking, and Containers.

Bonus Skills:

  • Interest in crafting, analyzing, and fixing large-scale distributed systems.
  • Strong problem-solving and communication skills, and a sense of ownership.
  • Experience debugging, optimizing code, and automating routine tasks.
  • Experience with Kubernetes, OpenStack, and Docker.
  • Experience with Grafana, OpenTelemetry, Prometheus, and similar observability tools.

FAQs

What is the last date for applying to the job?

The deadline to apply for Full-Time Senior Site Reliability Engineer Observability at 2100 NVIDIA USA is 5th of February 2025. We consider jobs older than one month to have expired.

Which countries are accepted for this remote job?

This job accepts applicants based in US, CA, Santa Clara.
