Full-Time Site Reliability Engineer - Remote
PayNearMe is hiring a remote Full-Time Site Reliability Engineer. The career level for this opening is Experienced, and the company is accepting Santa Clara, CA-based applicants remotely. Read the complete job description before applying.
PayNearMe
Job Details
As our Site Reliability Engineer, you will design, build, and maintain the systems and infrastructure that power our applications, ensuring their reliability, scalability, and performance. You will bring a software engineering approach to operations, automating processes and continuously improving the infrastructure and tools that support our business needs.
System Administration: Support our EC2 infrastructure to ensure proper configuration, reliability, and monitoring, while modernizing it toward automation and containerization.
Automation: Build and maintain Ansible (and legacy Puppet) configuration management, increasing automation and reducing toil.
Kubernetes and Containers: Deploy, manage, and optimize Kubernetes clusters and containerized applications using Docker, implementing best practices for container orchestration and management.
Systems and Application Monitoring/Observability: Develop and maintain comprehensive monitoring and observability solutions using Datadog. Ensure detailed visibility into system performance and application health.
CI/CD Pipeline Management: Create, enhance, and maintain continuous integration and continuous deployment pipelines using GitLab CI, ensuring seamless and reliable deployment processes.
Security and Compliance: Implement security best practices, ensure compliance with industry standards, and regularly review and update security policies.
Collaboration and Support: Collaborate closely with development teams to ensure the reliability and scalability of new features, and provide technical support on infrastructure issues.
On-Call Rotation: Participate in the on-call rotation to address production issues and collaborate on incident response.
Linux System Administration: Support the migration from a monolith to microservices, actively maintaining EC2 servers while migrating to Kubernetes.
Rails Production Environments: Experience supporting production environments running Ruby on Rails applications.
Cloud Platform Experience: Proficient with AWS (EC2, RDS, VPCs, security groups).
Configuration Management: Experience with Ansible or an equivalent tool for managing large fleets of servers.
Infrastructure as Code: Expert in using Terraform.
Monitoring and Observability: Extensive experience with monitoring and observability tools like Datadog, Prometheus, Grafana, ELK, or Splunk, setting up detailed monitoring and logging systems.
Engineering Collaboration: Ability to work effectively with other members of the engineering team.
DevOps Best Practices: Deep understanding of DevOps principles, practices, and tools.