Full-Time Sr Site Reliability Engineer
Kentik is hiring a remote, full-time Sr Site Reliability Engineer. The career level for this opening is Senior Manager, and the role accepts US-based applicants working remotely. Read the complete job description before applying.
Kentik
Job Details
Who we are
Kentik is the network observability company. Our platform is a must-have for the network front line, whether digital business, corporate IT, or service provider.
What we do
Kentik's Platform Engineering group stores, enriches, and queries traffic metadata and metrics from the world's largest networks. Our platform monitors infrastructure, triggers automated responses to outages and attacks, and delivers complete network observability to our customers.
As a senior engineer in platform engineering, you will co-own, design, and implement state-of-the-art reliability engineering to ensure our data-intensive platform continues to play a critical role for influential companies.
What you'll do
- Ensure our real-time, scalable, microservices-based infrastructure is set up for growth and working efficiently. Our infrastructure runs on our own hardware, across multiple locations, and on all major cloud vendors.
- Work on tools and processes to better monitor our platform and ensure its stability. Dive deep into diverse topics, from NetFlow and IP routing to database replication strategies and HTTP optimization.
- Collaborate with engineering and infrastructure teams to find solutions from an operational perspective.
- Contribute code, code reviews, and tools or patches to existing code. Write design documents or collaborate on colleagues' docs to introduce new features or changes.
- Provide valuable feedback on team goals, projects, and processes. We believe in continuously improving our team.
What you'll bring
We encourage you to apply if you meet most of the criteria. We value strong collaboration and communication skills, and a high level of independence.
- 5+ years of experience in systems administration, datacenter/IT, and/or SRE-related projects.
- Experience with *nix system command line (e.g. ssh, grep, awk).
- Detailed understanding of major internet protocols.
- Experience with or desire to learn about microservices, containers, and orchestration.
- Network administration experience, including concepts such as routing, firewalls (iptables), and peering.
- A passion for documenting code, processes, and infrastructure.
- Strong collaboration and communication skills.
- Worked with a configuration management platform (infrastructure as code) such as Ansible, Puppet, Chef, SaltStack, or CFEngine.
- Worked with metrics monitoring solutions such as Grafana, Prometheus, and OpenTelemetry.
- Strong preference towards automation - coding in Bash, Python, Ruby, or Go.
- Experience with public cloud (AWS, GCP, Azure, etc.) architectures and with managing cloud technologies using Terraform.