Full-Time Staff Site Reliability Engineer
The Wikimedia Foundation is hiring a remote Full-Time Staff Site Reliability Engineer. The career level for this opening is Expert, and it accepts remote applicants based in the Americas, Europe, and Africa. Read the complete job description before applying.
Wikimedia Foundation
Job Title
Staff Site Reliability Engineer
Job Type
Full-Time
Career Level
Expert
Locations Accepted
Americas, Europe, Africa
Salary
$129,347 to $200,824 per year
Job Details
The Wikimedia Foundation seeks a Staff Site Reliability Engineer (SRE) focused on ML Infrastructure.
You'll join a distributed team (UTC -5 to UTC +3) and report to the Director of Machine Learning.
Responsibilities:
- Design, develop, maintain, and scale foundational ML infrastructure for ML engineers and researchers.
- Improve reliability, availability, and scalability of ML infrastructure.
- Collaborate with ML engineers, product teams, researchers, SREs, and the Wikimedia volunteer community.
- Proactively monitor and optimize system performance, capacity, and security.
- Provide guidance and documentation on using the ML infrastructure.
- Mentor team members on infrastructure management and reliability engineering.
Skills & Experience:
- 7+ years of SRE/DevOps/Infrastructure Engineering experience with production-grade ML systems.
- Expertise with on-premises ML infrastructure (Kubernetes, Docker, GPU acceleration, distributed training systems).
- Proficiency with infrastructure automation and configuration management tools (Terraform, Ansible, Helm, Argo CD).
- Experience implementing observability, monitoring, and logging for ML systems (Prometheus, Grafana, ELK stack).
- Familiarity with Python-based ML frameworks (PyTorch, TensorFlow, scikit-learn).
- Strong English communication skills for global team collaboration.
Qualities:
- Collaborative, proactive, and independently motivated.
- Experienced with diverse, remote teams.
- Committed to open-source software and volunteer communities.
- Systematic thinker focused on operational excellence.
Ideal Candidates Excel in:
- Scalable ML Infrastructure: Deep understanding of scalable infrastructure design for ML training/inference.
- Reliability and Operations: Proven track record ensuring reliability of complex, distributed ML systems.
- Tooling and Automation: Expertise in creating robust tooling and automation for ML infrastructure.
FAQs
What is the deadline for applying to this job?
The deadline to apply for the Full-Time Staff Site Reliability Engineer position at the Wikimedia Foundation is 21 April 2025. We consider jobs older than one month to have expired.
Which regions are accepted for this remote job?
This job accepts applicants based in the Americas, Europe, and Africa.