Senior MLOps Engineer
Raft is hiring a remote Senior MLOps Engineer. The career level for this opening is Expert, and the role accepts US-based applicants working remotely. Read the complete job description before applying.
Job Details
About the role:
Raft is building a real-time data platform for the Department of Defense (DoD) to enhance operators' awareness of critical events, such as the Chinese balloon and the Cessna over the White House incidents. The platform aggregates real-time data from over 750 sensors. This data is then enriched, made queryable, and presented as a common operational picture so that operators can make time-critical decisions. Our system processes over a billion events daily with millisecond-level latency. Key technologies include Kafka, Kafka Streams, Pinot, Java, Scala, and Kubernetes. In this role you will work hands-on with a team of accomplished engineers. Your primary responsibilities will be to deploy ML infrastructure, build MLOps pipelines, and contribute to the development of a full-lifecycle ML platform.
What we are looking for:
- 4+ years of relevant hands-on experience
- 3+ years of experience with Docker and Kubernetes, provisioning production clusters and maintaining their compliance
- 3+ years of experience supporting enterprise cloud applications or infrastructure (AWS, Azure, etc.)
- Solid understanding of Helm Charts
- Practical experience with Machine Learning on Kubernetes
- Experience managing clusters with GPU machines
- Practical programming and scripting skills (Python preferred)
- Fast learner, analytical thinker, creative, hands-on, strong communication skills
- Able to work both independently and as part of a team
- Excellent problem-solving skills and attention to detail
- Proven experience with modern software development and engineering practices, including Scrum/Agile, Git, and DevOps
- Ability to obtain a Security+ certification within the first 90 days of employment with Raft
Highly preferred:
- Current or past security clearance
- Knowledge of Istio
- Experience building and maintaining machine learning platforms
- Comfortable provisioning and debugging complex CI/CD pipelines
- Prior experience with Terraform