Full-Time Staff Site Reliability Engineer

The Wikimedia Foundation is hiring a remote, full-time Staff Site Reliability Engineer. The career level for this opening is Expert, and the role accepts remote applicants based in the Americas, Europe, and Africa. Read the complete job description before applying.

This job was posted 8 months ago and is likely no longer active. You may still try to apply, but we recommend focusing on more recent listings.

Wikimedia Foundation

Job Title

Staff Site Reliability Engineer

Posted

Job Type

Full-Time

Career Level

Expert

Locations Accepted

Americas, Europe, Africa

Salary

$129,347 - $200,824 per year

Job Details

The Wikimedia Foundation seeks a Staff Site Reliability Engineer (SRE) focused on ML Infrastructure.

You'll join a distributed team (UTC -5 to UTC +3) and report to the Director of Machine Learning.

Responsibilities:

  • Design, develop, maintain, and scale foundational ML infrastructure for ML Engineers & Researchers.
  • Improve reliability, availability, and scalability of ML infrastructure.
  • Collaborate with ML engineers, product teams, researchers, SREs, and the Wikimedia volunteer community.
  • Proactively monitor and optimize system performance, capacity, and security.
  • Provide guidance and documentation on using the ML infrastructure.
  • Mentor team members on infrastructure management and reliability engineering.

Skills & Experience:

  • 7+ years of SRE/DevOps/Infrastructure Engineering experience with production-grade ML systems.
  • Expertise with on-premises ML infrastructure (Kubernetes, Docker, GPU acceleration, distributed training systems).
  • Proficiency with infrastructure automation and configuration management tools (Terraform, Ansible, Helm, Argo CD).
  • Experience implementing observability, monitoring, and logging for ML systems (Prometheus, Grafana, ELK stack).
  • Familiarity with Python-based ML frameworks (PyTorch, TensorFlow, scikit-learn).
  • Strong English communication skills for global team collaboration.

Qualities:

  • Collaborative, proactive, and independently motivated.
  • Experienced with diverse, remote teams.
  • Committed to open-source software and volunteer communities.
  • Systematic thinker focused on operational excellence.

Ideal Candidates Excel in:

  • Scalable ML Infrastructure: Deep understanding of scalable infrastructure design for ML training/inference.
  • Reliability and Operations: Proven track record ensuring reliability of complex, distributed ML systems.
  • Tooling and Automation: Expertise creating robust tooling/automation for ML infrastructure.

FAQs

What is the last date for applying to the job?

The deadline to apply for the Full-Time Staff Site Reliability Engineer position at the Wikimedia Foundation is the 21st of April 2025. We consider jobs older than one month to have expired.

Which countries are accepted for this remote job?

This job accepts applicants based in the Americas, Europe, and Africa.
