Full-Time Site Reliability Engineer
Mirantis is hiring a remote, full-time Site Reliability Engineer. The position is at the Experienced career level and is open to remote applicants based in Poznań, Poland. Read the complete job description before applying.
Mirantis
Job Details
The Site Reliability Engineer (SRE) ensures the reliability, scalability, and performance of complex distributed systems.
As a Site Reliability Engineer, you will play a central role in evolving the reliability and sustainability of the company’s core platform.
The successful candidate will work within a DevIntegration team, integrating multiple layers of the product stack to enable automated, Kubernetes-based GPU workload provisioning using the Cluster API framework.
Key Responsibilities
- Reliability and Infrastructure Engineering: Design, deploy, and maintain highly available, fault-tolerant systems running on Kubernetes and bare metal infrastructure.
- System Integration and Automation: Work within the DevIntegration team to integrate diverse components of the product stack. Build automation pipelines using Infrastructure as Code (IaC) and CI/CD frameworks.
- Architecture and Design Leadership: Participate in and lead architectural discussions to ensure alignment with reliability, security, and scalability goals.
- Operational Excellence: Ensure long-term operational sustainability of the deployed product.
- Leadership and Mentorship: Mentor and support team members, sharing deep expertise in reliability engineering, infrastructure design, and troubleshooting best practices.
Required Qualifications
- 8+ years of hands-on experience managing mission-critical, high-availability production environments.
- Proven background in Site Reliability Engineering, DevOps, or Infrastructure Engineering.
- Strong understanding of cloud infrastructure (AWS, GCP, Azure) and private clouds.
- Proficiency in at least one general-purpose programming language (Python or Go preferred).
- Expertise in containerization and orchestration technologies (Docker, Kubernetes, Cluster API).
- Strong knowledge of modern observability stacks (Prometheus, VictoriaMetrics).
Preferred Qualifications
- Experience in GPU-based workload orchestration and performance optimization.
- Familiarity with chaos engineering and proactive reliability testing.