Full-Time Staff Site Reliability Engineer
Wikimedia Foundation is hiring a remote, full-time Staff Site Reliability Engineer. This is an Expert-level role open to remote applicants based in the Americas, Europe, and Africa. Read the complete job description before applying.
This job was posted 8 months ago and is likely no longer active. We recommend focusing on more recent listings, though you may still try applying.
Company: Wikimedia Foundation
Job Title: Staff Site Reliability Engineer
Job Type: Full-Time
Career Level: Expert
Locations Accepted: Americas, Europe, Africa
Salary: $129,347 - $200,824 per year
Job Details
The Wikimedia Foundation seeks a Staff Site Reliability Engineer (SRE) focused on ML Infrastructure.
You'll join a distributed team (UTC -5 to UTC +3) and report to the Director of Machine Learning.
Responsibilities:
- Design, develop, maintain, and scale foundational ML infrastructure for ML Engineers & Researchers.
- Improve reliability, availability, and scalability of ML infrastructure.
- Collaborate with ML engineers, product teams, researchers, SREs, and the Wikimedia volunteer community.
- Proactively monitor and optimize system performance, capacity, and security.
- Provide guidance and documentation on using the ML infrastructure.
- Mentor team members on infrastructure management and reliability engineering.
Skills & Experience:
- 7+ years of SRE/DevOps/Infrastructure Engineering experience with production-grade ML systems.
- Expertise with on-premises ML infrastructure (Kubernetes, Docker, GPU acceleration, distributed training systems).
- Proficiency with infrastructure automation and configuration management tools (Terraform, Ansible, Helm, Argo CD).
- Experience implementing observability, monitoring, and logging for ML systems (Prometheus, Grafana, ELK stack).
- Familiarity with Python-based ML frameworks (PyTorch, TensorFlow, scikit-learn).
- Strong English communication skills for global team collaboration.
Qualities:
- Collaborative, proactive, and independently motivated.
- Experienced with diverse, remote teams.
- Committed to open-source software and volunteer communities.
- Systematic thinker focused on operational excellence.
Ideal Candidates Excel in:
- Scalable ML Infrastructure: Deep understanding of scalable infrastructure design for ML training/inference.
- Reliability and Operations: Proven track record ensuring reliability of complex, distributed ML systems.
- Tooling and Automation: Expertise creating robust tooling/automation for ML infrastructure.
FAQs
What is the last date for applying to the job?
The deadline to apply for Full-Time Staff Site Reliability Engineer at Wikimedia Foundation is 21st of April 2025. We consider jobs older than one month to have expired.
Which countries are accepted for this remote job?
This job accepts applicants based in the Americas, Europe, and Africa.