KeyStep

Staff Site Reliability Engineer I

Remote

EMEA

about 12 hours ago

full-time

Engineering

Skills & Technologies

Go, Bash, Platform Engineering, Site Reliability, DevOps, SRE, Site Reliability Engineering, SOLID, AWS, Docker, Kubernetes, Terraform, GitHub Actions, GitLab CI, CI/CD, Prometheus, Grafana, Datadog, Cloud, Cloud Infrastructure

Job Description

About Remote

Remote is solving modern organizations’ biggest challenge – navigating global employment compliantly and with ease. We make it possible for businesses of all sizes to recruit, pay, and manage international teams. With our core values at heart and a future-focused work culture, our team works tirelessly on ambitious problems, asynchronously, around the world. You can find Remoters working from 6 different continents (Antarctica left to go!), and all of our positions are fully remote.

With Innovation as one of the core values, we have built Automation and AI capabilities into the requirements for every role.

We encourage every member of the Remote team to bring their talents, experiences and culture to the table to help us build the best-in-class HR platform.

If you are energetic, curious, motivated and ambitious, be part of our world. Apply now and define the future of work!

What this job can offer you

As a Staff SRE at Remote, you will own the technical direction of our SRE platform, shaping its architecture, reliability strategy, and long-term evolution. This is a leadership role as much as a technical one: you'll drive platform-wide initiatives, set the reliability bar for engineering teams across the organisation, and be a force multiplier for the engineers around you.

A key part of this role is identifying and leading opportunities to leverage AI: from reducing operational toil to enabling engineering teams to build, ship, and operate software more effectively. You'll work with a high degree of autonomy, translating technical risks into business impact and aligning with Engineering Managers, Team Leads, and Product teams to ensure reliability and engineering efficiency are built into everything we do.

What you bring

Technical

8+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering

Deep expertise in Kubernetes: operating, designing, and scaling production clusters

Proven experience designing and managing cloud infrastructure on AWS (or other cloud providers) at scale

Strong infrastructure-as-code practice with Terraform

Experience defining and operating reliability frameworks: SLOs, SLIs, error budgets, alerting strategies

Solid observability background: Datadog, Grafana/Prometheus, or similar

Proficiency with CI/CD platforms (GitLab CI, GitHub Actions, or similar) and deployment automation

Comfortable with Bash and scripting for automation; broader programming skills are a plus

Experience with container tooling (Docker) and the broader ecosystem around it

Curiosity and practical experience applying AI tools to infrastructure, operations, or developer tooling: whether through AI-assisted automation, LLM-powered workflows, or intelligent observability

Leadership & behavioural

Proven track record of driving platform-wide technical initiatives and influencing engineering direction without formal authority

Strong communicator: able to tailor messaging to technical and non-technical audiences, write clearly, and align stakeholders across teams

Self-directed: able to identify what needs attention, define the path forward, and execute with minimal supervision

Experience mentoring senior engineers and creating space for others to lead and grow

Comfortable navigating ambiguity, translating vague requirements into concrete solutions

Approaches technical problems with a business lens and understands the cost and value of engineering decisions

Nice to have

Excellent communication and interpersonal skills

Holistic debugging skills

Security knowledge and capabilities from a defensive and offensive standpoint

Key Responsibilities

Own the technical direction of Remote's SRE/Platform domain, its architecture, tooling, and long-term roadmap

Define and drive the reliability strategy across the platform: SLOs/SLIs, error budgets, observability, and incident management maturity

Lead complex, cross-team infrastructure initiatives from discovery through delivery, delegating effe

Company & Role Analysis

Likely perks

Private Medical, Pension, 25+ Days Holiday, Stock Options, Learning Budget, Flexible Hours

Market salary range

£45,000 – £60,000 (Glassdoor, Levels.fyi, 2025)
