Full-Time Site Reliability Engineer (SRE) - CloudVision

Arista Networks is hiring a Full-Time Site Reliability Engineer (SRE) - CloudVision. The career level for this opening is Experienced, and the role is open to remote applicants based in Ireland. Read the complete job description before applying.

This job was posted 2 months ago and is likely no longer active.

Company: Arista Networks
Job Title: Site Reliability Engineer (SRE) - CloudVision
Job Type: Full-Time
Career Level: Experienced
Locations Accepted: Remote, Ireland

Job Details

SREs at Arista combine strong software and systems engineering with a passion for operating production systems at scale. As an SRE, you’ll be part of the team responsible for our global service fleet.
What You’ll Do:
CloudVision is deployed on Kubernetes across global regions, using Spinnaker for our CI/CD pipeline. Our tech stack runs on GKE and uses HBase/Hadoop as the main distributed database and storage layer, Elasticsearch for search, ClickHouse for fast real-time queries of flow data, our own Kafka-based distributed real-time stream processing layer for analytics, and TensorFlow for ML analysis. Our monitoring system is built on Prometheus, Grafana, Loki, and other OSS tools.
As a Senior SRE, you’ll be responsible for our global CloudVision service fleet. This includes:
  • Build, safely and incrementally deploy, and operate critical production systems with a focus on scalability, reliability, observability, performance, and security.
  • Monitor, support, and enhance the product deployment experience across services.
  • Build automation to remove toil and operate production systems efficiently.
  • Proactively monitor, respond to, and enhance alerts, and set up automated alert handling.
  • Create and maintain incident response runbooks.
  • Build and deploy new systems with scalability, reliability, and observability as primary requirements.
  • Triage platform and infrastructure issues and help Arista software engineers with their own triage. Engage with third-party vendor support.
  • Deploy new systems in a staged manner.
  • Write postmortem documents and build solutions to prevent incidents from recurring.
  • Plan and communicate maintenance windows on production systems.
  • Work with Arista’s product development teams to identify infrastructure issues that cause bottlenecks and limitations in their workflows. Design and implement solutions to resolve them.
  • Survey and adopt infrastructure and platform best practices to maintain secure, scalable, and fault-tolerant systems.
  • Implement solutions to scale the systems.
  • Implement fault-tolerance and performance improvements to increase the availability of the systems.
  • Study the design of OSS systems, and enough of their implementation details, to triage and resolve issues more effectively.
Bachelor's in Computer Science or Engineering + 5 years’ experience, MS in Computer Science or Engineering + 5 years’ experience, or equivalent work experience.
Skills:
  • Knowledge of one or more of Go, Python, or Bash shell scripting, sufficient to implement medium-complexity automation workflows.
  • Knowledge of Linux (or UNIX) from an administration and debugging perspective.
  • Hands-on experience operating software systems (infrastructure, complex applications, etc.) at scale.
  • Experience in server provisioning (especially from a storage and networking perspective).
  • Strong problem-solving and software troubleshooting skills.
  • Experience with infrastructure-as-code.
Desirable:
  • Experience managing databases, e.g. PostgreSQL or an equivalent RDBMS.
  • Experience with Docker and virtualization technologies.
  • Experience managing a monitoring stack (Prometheus, Grafana, etc.).
  • Experience managing Artifactory, a Docker registry, etc.
  • Experience managing CI/CD systems such as GitLab tools, Spinnaker, etc.
  • Experience with infrastructure-as-code frameworks such as Terraform.
  • Experience with container orchestration via Kubernetes.

FAQs

What is the last date for applying to the job?

The deadline to apply for Full-Time Site Reliability Engineer (SRE) - CloudVision at Arista Networks is 9th of October 2025. We consider jobs older than one month to have expired.

Which countries are accepted for this remote job?

This job accepts remote applicants based in Ireland.
