Full-Time Staff Software Engineer, ML Training Platform
Reddit is hiring a remote Full-Time Staff Software Engineer, ML Training Platform. The career level for this opening is Expert, and the role accepts remote applicants based in the USA. Read the complete job description before applying.
Job Details
Reddit is a community of communities, built on shared interests and passion. It's home to open and authentic conversations. Users submit, vote, and comment on topics they care about.
Location: This role is completely remote-friendly. If you live near an office, you can come in as often as you like.
About Us: The Machine Learning Platform team powers recommendations, content discovery, and user and content quantification, and its work supports other teams (Growth, Ads, Feeds, Core ML).
What You'll Do: As a Staff Software Engineer on the ML Training Platform, you will architect, implement, and maintain foundational ML infrastructure powering features like Feeds Ranking, Content Understanding, and Recommendations. You'll build systems and tools to empower machine learning engineers and data scientists and improve the ML lifecycle.
You'll create a self-service ML platform for continuous iteration using Deep Learning, Natural Language Processing, Recommendation Systems, Representation Learning, and Computer Vision.
Your Responsibilities:
- Lead the building, testing, and maintenance of ML infrastructure.
- Propose, design, and implement high-performance ML platform solutions.
- Play a pivotal role in designing, building, and optimizing the infrastructure for large-scale ML workflows.
- Design and implement solutions to advance the ML Platform architecture.
- Analyze bottlenecks in distributed systems and optimize for performance and cost-efficiency.
- Collaborate with management on team goals, planning, and risk mitigation.
- Mentor team members on DevOps practices for ML platform components.
Who You Are:
- 8+ years of production software development or data system experience.
- Degree in Machine Learning, Engineering, Computer Science, or related field.
- Experience with large-scale ML systems design and architecture.
- Familiarity with ML frameworks (TensorFlow, PyTorch, JAX).
- Experience with training workflows, hyperparameter tuning, and resource optimization.
- Knowledge of MLOps practices and tools (Ray, MLflow).
- Experience with containers and container orchestration systems (Docker, Kubernetes).
- Proficiency in Python and/or Golang and in object-oriented programming.
- Comfort with distributed systems, big data (petabyte scale), and data-intensive systems.
Benefits: Comprehensive healthcare, 401k matching, home office benefits, professional development funds, family planning support, flexible vacation, Reddit Global Wellness Days, parental leave, and paid volunteer time off. Equity opportunity also exists.
Pay Transparency: Base salary range is $230,000-$322,000 USD, and final offer amounts vary.