Full-Time Site Reliability Engineer
Air Apps is hiring a remote Full-Time Site Reliability Engineer. This is an Experienced-level position open to USA-based applicants working remotely. Read the complete job description before applying.
About Air Apps: We're a family-founded company creating an AI-powered Personal & Entrepreneurial Resource Planner (PRP). We seek passionate individuals to help us change how people plan, work, and live. With offices in Lisbon and San Francisco, we've reached over 100 million downloads worldwide.
The Role: As a Site Reliability Engineer (SRE), you'll ensure system reliability, availability, and scalability. You'll work at the intersection of software development and operations, implementing automation, monitoring, and optimization strategies.
Responsibilities:
- Design and implement scalable, reliable, and fault-tolerant systems across cloud environments.
- Develop and maintain observability tools (monitoring, logging, alerting).
- Automate infrastructure provisioning, deployment, and incident response.
- Optimize system performance and scalability.
- Collaborate with development and DevOps teams.
- Conduct root cause analysis (RCA) and implement preventative measures.
- Ensure high availability through load balancing, failover, and disaster recovery.
- Improve CI/CD pipelines while maintaining stability.
- Optimize cloud cost and resource utilization (AWS, Azure, or GCP).
- Participate in on-call rotations to address failures.
Requirements:
- 4+ years of experience in SRE, DevOps, or systems engineering.
- Strong knowledge of cloud platforms (AWS, Azure, or GCP) and cloud-native architecture.
- Experience with observability and monitoring tools (Prometheus, Grafana, ELK, Datadog).
- Proficiency with Infrastructure as Code (IaC) tools (Terraform, CloudFormation).
- Experience with containerization and orchestration (Docker, Kubernetes, Helm).
- Strong Linux system administration, networking, incident management, and debugging skills.
- Proficiency in scripting (Bash, Python, or Go) for automation and monitoring.
- Understanding of load balancing, failover, and distributed systems.
- Knowledge of security best practices, access control, and compliance.
- Strong communication and collaboration skills.
Benefits:
- Remote-first approach with flexible hours.
- Apple hardware ecosystem.
- Annual bonus.
- Medical, vision, and dental insurance.
- Disability insurance.
- 401(k) with up to 4% contribution.
- Air Stipend ($3,120/year).
- Air Conference 2025 in Las Vegas.
Diversity & Inclusion: We welcome applicants from all backgrounds.
Application Disclaimer: Applicants must submit original work.