Full-Time Site Reliability Engineer
StarTree is hiring a remote Full-Time Site Reliability Engineer. The career level for this job opening is Experienced, and it is open to USA-based applicants working remotely. Read the complete job description before applying.
Job Details
At StarTree, we're passionate about improving lives through real-time analytics tools. We aim to empower companies by building a comprehensive cloud analytics system.
About the role:
StarTree seeks exceptional Site Reliability Engineers for Pinot (SRE-Pinot). This role involves managing, tuning, and debugging large-scale, highly available distributed systems.
Responsibilities:
- Leverage monitoring and alerting to solve complex problems at scale.
- Manage and tune critical customer-facing Apache Pinot clusters.
- Monitor availability, latencies, and telemetry to identify and mitigate issues.
- Collaborate with customers to resolve incidents.
- Execute disaster recovery strategies with minimal downtime.
- Collaborate with engineers to troubleshoot systems and influence roadmap decisions.
- Debug Pinot queries and ingestion during incidents.
Requirements:
- 5+ years of experience as an engineer (SRE, SDET, or software development).
- Experience managing highly available production distributed systems, with in-depth Java knowledge.
- Exposure to cloud platforms (AWS, GCP, or Azure) is a plus.
- Experience with Kubernetes and container orchestration is a plus.
- Familiarity with streaming systems (Kafka, Pulsar, Flume, Flink, Spark).
- Strong troubleshooting and critical thinking skills.
- Experience with Apache Pinot is preferred.
- Experience building Java applications is required.
- Experience with ZooKeeper is a plus.