Full-Time SRE Engineer
Pango is hiring a remote, full-time SRE Engineer. This is an experienced-level position open to remote applicants based in Poland. Read the complete job description before applying.
Pango
Job Details
Pango Group helps customers monitor, manage, and protect against the risks associated with their identities and personal information in a digital world. Pango Group is dedicated to creating the world’s most comprehensive portfolio of industry-leading cybersecurity solutions.
About the Role: We are seeking a highly motivated and skilled Site Reliability Engineer (SRE) to join our dynamic engineering team.
Day to Day Responsibilities:
- System Monitoring & Incident Response: Develop and implement monitoring tools to ensure system health. Respond to incidents, troubleshoot issues, and provide timely resolutions.
- Automation & Infrastructure as Code: Design and implement automation solutions to manage infrastructure and application deployment using tools like Terraform, Ansible, or similar technologies.
- Performance Optimization: Analyze system performance and capacity; implement improvements to enhance system reliability and efficiency.
- Collaboration: Work closely with development teams to improve system design and deployment practices. Advocate for reliability improvements in the software development lifecycle.
- Documentation & Reporting: Maintain thorough documentation of system architecture, processes, and incident response procedures. Provide regular reports on system performance and reliability metrics.
- Recovery & Backup: Design and implement disaster recovery plans and ensure effective data backup solutions are in place.
- Security Best Practices: Collaborate with security teams to ensure best practices are followed to protect systems and data.
What You Bring:
- Proven experience in a Site Reliability Engineering, DevOps, or related role.
- Strong knowledge of cloud services (AWS, Azure, Google Cloud), containers (Docker), and container orchestration (Kubernetes).
- Proficiency in scripting languages (Python, Bash, etc.), plus experience with CI/CD tools (Jenkins, GitLab CI/CD, etc.) and infrastructure-as-code tools (Terraform, Ansible).
- 3+ years of proven experience with production monitoring using Prometheus, ELK, Grafana, and OpsGenie/PagerDuty.
- 3+ years of experience in Linux system administration (preferably Ubuntu).
- Solid understanding of networking, security, system architecture, and data center operations in a fast-paced, 24x7, production environment.
- Strong understanding of networking concepts, protocols (TCP/IP, BGP, OSPF), and technologies (LAN, WAN, VPN) with proficiency in network monitoring tools and software.