Full-Time Site Reliability Engineer - Remote
PayNearMe is hiring a full-time, remote Site Reliability Engineer. This is an Expert-level position open to remote applicants based in Santa Clara, CA. Read the complete job description before applying.
Job Details
System Administration: Support EC2 infrastructure, ensuring proper configuration, reliability, and monitoring while modernizing toward automation and containerization.
Automation: Build and maintain Ansible (and legacy Puppet) configuration management, increasing automation and reducing toil.
Kubernetes and Containers: Deploy, manage, and optimize Kubernetes clusters and containerized applications using Docker, implementing best practices for container orchestration and management.
Systems and Application Monitoring/Observability: Develop and maintain comprehensive monitoring and observability solutions using Datadog, ensuring detailed system performance and application health visibility.
CI/CD Pipeline Management: Create, enhance, and maintain continuous integration and continuous deployment pipelines using GitLab CI, ensuring seamless and reliable deployment processes.
Security and Compliance: Implement security best practices and ensure compliance with industry standards, regularly reviewing and updating security policies.
Collaboration and Support: Collaborate closely with development teams to improve reliability and scalability, providing technical support and guidance.
On-Call Rotation: Participate in the on-call rotation, handling production issues and incident response.
Linux System Administration: Actively maintain EC2 servers while moving workloads to Kubernetes as part of the monolith-to-microservices migration.
Rails Production Environments: Experience supporting production environments running Ruby on Rails applications.
Cloud Platform Experience: Proficient with AWS, GCP, or Azure, with experience in EC2, RDS, VPCs, and security groups.
Configuration Management: Ansible or equivalent experience for managing large EC2 server fleets.
Infrastructure as Code: Expert-level experience with Terraform.
Monitoring and Observability: Extensive experience with Datadog, Prometheus, Grafana, ELK stack, or Splunk for monitoring and observability, skilled in setting up detailed monitoring and logging systems.
Engineering Collaboration: Collaborate with Engineering team members on troubleshooting, support, and projects for Production and lower environments.
DevOps Best Practices: Deep understanding of DevOps principles, practices, and tools for continuous improvement in the software development lifecycle.