Software Engineer II, Reliability

Klaviyo
Dublin, Ireland
Full-time, Engineering

Skills & Technologies

Python, Go, Software Engineering, SRE, Distributed Systems, Scalability, Kubernetes, Terraform, Cloud, Klaviyo, Implementation, Make, AI, Automation, Documentation

Job Description

At Klaviyo, we value the unique backgrounds, experiences and perspectives each Klaviyo (we call ourselves Klaviyos) brings to our workplace each and every day. We believe everyone deserves a fair shot at success and appreciate the experiences each person brings beyond the traditional job requirements. If you’re a close but not exact match with the description, we hope you’ll still consider applying. Want to learn more about life at Klaviyo? Visit klaviyo.com/careers to see how we empower creators to own their own destiny.

Software Engineer II, Reliability (Dublin)

Team Overview

As a Software Engineer II, Reliability, you will help ensure Klaviyo’s critical platforms are reliable, scalable, and sustainable while enabling rapid product development.

We treat reliability as a core product feature and use software engineering to solve complex systems and operational challenges. Our work spans infrastructure, security, and software engineering, and focuses on building and operating systems that are reliable, secure, and performant at scale.

The SRE team’s charter is to build and operate foundational services and infrastructure, reduce operational toil through automation, and continuously improve systems based on real production learnings. Your work will directly impact how Klaviyo engineers build software and how customers experience our platform every day.

How You’ll Make an Impact

As a Software Engineer II, Reliability, you will contribute to the reliability and operational excellence of Klaviyo’s platforms by working on well-scoped projects and owning services with support from senior engineers. You will:

Build, operate, and improve production systems with a focus on reliability, scalability, and performance

Apply software engineering principles to automate operational tasks and reduce manual toil

Contribute to the design and implementation of systems using established SRE best practices

Help define and measure SLIs and SLOs for services you support

Improve observability through metrics, dashboards, logging, and tracing

Participate in on-call rotations and respond to production incidents with guidance and support

Assist with incident investigation and contribute to post-incident reviews and follow-up actions

Perform basic analysis around system behavior, capacity usage, and scaling characteristics

Identify reliability issues or operational pain points and work with teammates to address them

Collaborate with product, platform, and security engineers to ship reliable systems

Write and maintain clear operational runbooks and system documentation

Who You Are

You are an early-to-mid-career SRE who is comfortable operating production systems and eager to deepen your expertise in reliability engineering. You:

Have experience operating cloud-native production systems and services

Write production-quality code (e.g. Python, Go, or similar) to automate operations and improve reliability

Understand common failure modes in distributed systems, such as dependency failures, resource exhaustion, and partial outages

Have experience working with containerized workloads and platforms (e.g. Kubernetes) in production environments

Are comfortable participating in on-call rotations and diagnosing straightforward production issues

Have experience using observability tools and responding to alerts

Are familiar with SRE concepts such as SLIs, SLOs, and error budgets, and are learning how to apply them in practice

Have hands-on experience with infrastructure as code or declarative configuration (e.g. Terraform, Kubernetes manifests)

Can follow incident response processes and contribute meaningfully during outages

Are comfortable receiving feedback, learning from incidents, and improving your systems over time

You’ve already experimented with AI in work or personal projects, and you’re excited to dive in and learn fast. You’re hungry to responsibly explore new AI tools and workflows, finding ways to make your work smarter.
