Full-Time Site Reliability Engineer
Graylog is hiring a remote, full-time Site Reliability Engineer. The position is at the Experienced career level and is open to remote applicants based in the USA. Read the complete job description before applying.
Job Details
Site Reliability Engineer at Graylog
Job Summary: Provide architectural guidance and technical solutions for adapting the product to a cloud offering with 24x7 support. Focus on high availability, resilience, security, scalability, and cost-efficiency. Work with cutting-edge technologies and help shape the future of the cloud strategy.
Responsibilities:
- Cloud Infrastructure Management: Write pull requests to improve the AWS, Terraform, and Kubernetes setup, focusing on high availability and resilience.
- Security & Compliance: Implement security measures, audit the cloud environment, and ensure compliance standards are met.
- Tool Development: Expand the internal tool base, focusing on Infrastructure as Code and configuration management.
- Issue Resolution: Collaborate with teams to resolve infrastructure issues swiftly.
- Cloud Strategy Advocacy: Champion cloud strategies aligning with business objectives.
- Knowledge Sharing: Connect with other engineers, document decisions, and prevent knowledge silos.
First 12 Months:
- Infrastructure Knowledge: Acquire expertise in Terraform, Flux, Kustomize, and Argo within the first 6 months.
- Stability Improvements: Deliver a Proof of Concept (POC) for improving uptime, reducing single points of failure, or decreasing Time to Recovery within 6-9 months.
- Signal and Metrics Improvement: Improve signals and metrics, demonstrated by fewer alerts or by delivering requested metrics.
- Security and Compliance: Within the first 12 months, contribute to the AWS Product and Architecture Review, the SOC 2 compliance review, the Disaster Recovery (DR) plan review and drill, and the Security Penetration Test (Pen Test) review and remediation.
Required Skills:
- Proficiency in managing cloud infrastructure (especially AWS).
- Hands-on experience with Infrastructure as Code (IaC) tools.
- Basic programming skills (e.g., Python).
- Knowledge of security protocols and compliance requirements.
- Experience in diagnosing and resolving infrastructure issues.
- Familiarity with cloud monitoring tools and performance metrics.
- Understanding of CI/CD practices.
- Strong documentation and communication skills.
About Graylog:
- Remote-friendly company with global offices.
- Competitive compensation.
- Opportunities for professional and personal growth.