Senior Software Engineer, Reliability at Klaviyo

Job Description

At Klaviyo, we value the unique backgrounds, experiences and perspectives each Klaviyo (we call ourselves Klaviyos) brings to our workplace each and every day. We believe everyone deserves a fair shot at success and appreciate the experiences each person brings beyond the traditional job requirements. If you’re a close but not exact match with the description, we hope you’ll still consider applying. Want to learn more about life at Klaviyo? Visit klaviyo.com/careers to see how we empower creators to own their own destiny.

Senior Software Engineer, Reliability (Dublin)

Team Overview

As a Senior Software Engineer, Reliability, you’ll ensure Klaviyo’s critical platforms are reliable, scalable, and sustainable while enabling rapid product development. We treat reliability as a core product feature and use software engineering to solve complex systems and operational challenges.

Our work spans security, infrastructure, and software development, so it requires a solid understanding of both systems and software engineering. We build complex, foundational solutions that must be extremely reliable, secure, and performant at global scale.

Our charter is to build and operate foundational services and infrastructure, define clear reliability objectives, reduce operational toil through automation, and continuously improve systems based on real production learnings. The work is highly visible and directly impacts how Klaviyos build software and how customers experience Klaviyo every day.

How You’ll Make an Impact

As a Senior Software Engineer, Reliability, you will build and operate the platforms, systems, and services that underpin Klaviyo’s reliability and operational excellence. You will:

Build and operate foundational, security-critical services with a strong emphasis on availability, scalability, latency, and fault tolerance

Apply software engineering principles to automate infrastructure, reduce operational toil, and improve system reliability at scale

Design, implement, and evolve systems using SRE best practices

Define and refine SLIs, SLOs, and error budgets to guide engineering decisions

Improve observability, alerting, and incident response to reduce mean time to detection and recovery

Participate in on-call rotations with a focus on sustainable operations and automatic remediations

Perform quantitative analysis to understand system behavior, capacity constraints, and scaling limits

Identify systemic risks and reliability bottlenecks and drive long-term, preventative solutions

Collaborate closely with product, platform, and security engineers to influence architecture early and ship reliable systems

Mentor and pair with other engineers, helping raise the bar for reliability, operational maturity, and engineering excellence

Who You Are

You are a cloud-native, platform-focused SRE who uses software to build and operate reliable production systems at scale.

You write and maintain production-quality code (e.g. Python, Go, or similar) to build internal platforms, automate operations, and improve system reliability

You have built, deployed, and operated distributed, cloud-native systems and understand failure modes such as partial outages, dependency failures, resource saturation, and cascading impact

You have experience operating containerized workloads and platforms (e.g. Kubernetes) in production, including deployment strategies, scaling behavior, and service networking

You are comfortable participating in on-call rotations and diagnosing production issues

You have designed and operated observability systems and know how to build actionable alerts that reflect real user and service impact

You apply SRE concepts such as SLIs, SLOs, error budgets, and burn-rate–based alerting to guide engineering decisions and operational response

You have hands-on experience with infrastructure as code and declarative configuration (e.g. Terraform, Kubernetes manifests, policy-as-code)

You have performed capacity planning, load testing, and perfo
