Full-Time Site Reliability Engineer
AIDA Recruitment is hiring a remote, full-time Site Reliability Engineer. The position is at the Experienced career level and is open to remote applicants based in Lithuania.
Job Details
We are seeking an experienced and proactive Site Reliability Engineer to join our technology team. This is a hybrid role that combines the responsibilities of building and maintaining a scalable, resilient cloud infrastructure with the critical function of leading our response to operational and security incidents.
You will be responsible for the entire lifecycle of our production environment, from managing CI/CD pipelines and infrastructure-as-code development to real-time threat monitoring and crisis management.
Key Responsibilities
- DevOps & Infrastructure Management:
- Manage, automate, and maintain our production infrastructure hosted on Amazon Web Services (AWS).
- Develop, manage, and improve our CI/CD pipelines using GitHub Actions.
- Own and advance our Infrastructure as Code (IaC) practices using Terraform.
- Collaborate with development teams to support the deployment and operation of backend microservices (.NET, Go) and frontend applications (React, hosted on Vercel).
- Monitor and manage system capacity and performance, ensuring high availability and low latency for our users.
- Implement and enforce security best practices across the infrastructure.
- Incident Response & Security:
- Serve as the primary lead for responding to, managing, and resolving production incidents.
- Develop, maintain, and test incident response playbooks, disaster recovery plans, and business continuity procedures.
- Utilize our monitoring stack to proactively detect, triage, and respond to security threats and system anomalies.
- Conduct thorough root cause analysis (RCA) for all major incidents and drive the implementation of corrective and preventative actions.
- Support and participate in regular security and resilience testing.
- Ensure all operational and incident management activities are documented and executed in alignment with our DORA (Digital Operational Resilience Act) and MiCA (Markets in Crypto-Assets Regulation) compliance obligations.
- Security Operations & Compliance:
- Define, implement, and maintain a PSIRT (Product Security Incident Response Team) process.
- Design and execute incident response processes.
- Lead digital forensics efforts.
- Roll out and manage EDR (Endpoint Detection and Response) tools.
- Implement and manage MDM (Mobile Device Management) for laptops and phones.
- Define and enforce security rules and guardrails aligned with business risk.
- Harden Kubernetes clusters (EKS) and containers, and implement admission control policies.
- Maintain and test Disaster Recovery (DR) and backup plans regularly.
- Manage Cloudflare WAF rules, vulnerability management (SAST/SCA/DAST), and AWS/Kubernetes event-based security tooling.
Requirements
- 10+ years of experience in the field.
- Proven experience in a Site Reliability Engineering (SRE), DevOps or similar role.
- Deep, hands-on expertise with Amazon Web Services (AWS).
- Strong proficiency with containerization (Docker) and Kubernetes orchestration.
- Expert-level knowledge of Infrastructure as Code, with extensive experience using Terraform.
- Demonstrable experience building and managing CI/CD pipelines, preferably with GitHub Actions.
- Solid experience in leading incident response efforts.
- A strong understanding of networking principles.
- Familiarity with modern monitoring, logging, and observability principles and tools.
- Experience working in a highly regulated environment.
- Familiarity with our wider tech stack, including Vercel, Fireblocks, and NGINX.
- Experience with security scanning tools for containers and dependencies (e.g., Trivy).
- Knowledge of authentication mechanisms such as JWE (JSON Web Encryption) tokens, and of best practices for secrets management (e.g., credential stores, AWS KMS).
- Scripting skills in languages such as Python or Bash for automation tasks.
What We Offer
- A competitive salary and benefits package.
- The opportunity to work with a modern, cutting-edge technology stack.
- A key role in a fast-growing company.
- A collaborative and dynamic work environment.
- Flexible working arrangements, with the option to work fully remotely.