Full-Time Senior Site Reliability Engineer
Weedmaps77 is hiring a remote Full-Time Senior Site Reliability Engineer. The career level for this opening is Experienced, and the role accepts USA-based applicants working remotely. Read the complete job description before applying.
Job Details
Senior Site Reliability Engineer (REMOTE)
Overview:
As a Senior Site Reliability Engineer at Weedmaps, you will work cross-departmentally with your partners on the application, infrastructure and quality teams to enhance the performance, reliability, resilience and scalability of the web services that make up Weedmaps.com. We are a cloud native organization with 100% of our services in Docker, running on Kubernetes in the AWS public cloud. We also leverage observability, monitoring, CI/CD automation and custom tooling to push multiple production releases a day.
Your day-to-day focus will be applying your engineering skills to building and monitoring services, reducing developer toil, configuring CI workflows and improving our deployment pipelines. You will also be a knowledge reference for our development teams, ensuring they use consistent tooling for metrics, logging, build, and deployment. You will work closely with the development and infrastructure teams to identify the essential service-specific metrics (beyond the golden metrics) that need to be monitored, and with application development teams to create libraries that make it easy for services to instrument themselves.
The impact you'll make:
- Collaborate with stakeholders to drive best practices for monitoring and CI/CD pipelines
- Troubleshoot deployment issues in our CI/CD pipeline
- Advocate emphatically for the DevOps culture here at Weedmaps
- Identify areas for automation and embrace the codification of all things
- Evangelize best practices around collaboration, reliability, security and performance to all partner teams
- Take ownership of the application configuration/scaling for given services to ensure that they are following the established practices of the organization
- Create and refine synthetic monitoring flows
- Help teams understand the reliability of their services using metrics and observability
What you've accomplished:
- Minimum 5 years of experience at startup/mid-sized companies
- Proficiency in at least one of Python, Go, Node, Ruby or Elixir
- Experience using/operating Kubernetes in a production environment
- Effective communication skills, a positive attitude, and ability to give and receive constructive feedback
- Ability to learn fast and adapt to new environments and change
- Strong bias for action and strong decision-making capabilities
- Must be capable of self-managing
- Prioritization and time management are an absolute must
- Professional experience with cloud native observability standards such as OpenMetrics, OpenTracing and OpenCensus
- Expertise using/configuring modern CI/CD workflows
- Intimate understanding of, and experience implementing, SLIs, SLOs and SLAs from the service level to the business level
- Intimate understanding of the golden metrics and how to monitor and alert on them
- Deep understanding of the GitHub branching strategy
- Experience troubleshooting containerized applications
- Familiarity with Infrastructure as Code, automation and configuration
Bonus points:
- Familiarity with cloud infrastructure concepts (AWS, GCP)
- Experience with HashiCorp tools such as Terraform, Consul, Vault
- Computer science or other engineering background
- Experience with CI tools such as CircleCI, Jenkins, Travis, Drone, Semaphore, etc.
- Experience with monitoring and observability tools like Prometheus, CloudWatch, Datadog, and Grafana