KeyStep

Site Reliability Engineer - Observability

N26

Berlin, Germany

about 4 hours ago

full-time

Skills & Technologies

Python, Golang, Platform Engineering, Site Reliability, Microservices, Prometheus, Grafana, OpenTelemetry, Cloud, Compliance, Make, Automation, Resilience

Job Description

About the opportunity

We are seeking a Site Reliability Engineer to join the Observability group inside our Platform Engineering domain.

Platform Engineering’s goal is to provide easy-to-use, self-service platforms that enable other teams to build, deploy and monitor their business applications. Observability’s role within Platform Engineering is to provide our users with end-to-end observability that’s easy to use.

As one of the first banks completely hosted in the cloud, our security, resilience and productivity standards require a motivated and well-balanced team. We use a modern technology stack that matches our principles in providing the framework for our development team, the company and our customers.

In this role, you will

In Observability you’ll build the tools for monitoring and measuring infrastructure, microservices, and sometimes totally unique workloads. You’ll keep the developer experience front of mind in everything you implement, and you’ll contribute deeply to understanding and preventing incidents through tooling, automation, and people-centered processes. All the while you’ll be thinking like an engineer. That means using automation to speed up repetitive tasks and to contribute to the security and compliance of the tech stack that you run and support.

What you need to be successful

You need to be well-versed in the basic building blocks of observability: metrics, logs, and traces. You should also be familiar with the tools we need to extract (think things like Prometheus, StatsD, OpenTelemetry libraries), transform (tools like Vector, Beats, Fluent Bit), and load (OpenSearch, Grafana, etc.) this data in ways that your fellow engineers can make sense of. Then, you should be comfortable helping your colleagues understand this data so they can measure and monitor their applications and recover quickly from incidents.

Next, you’ll need experience with at least one glue language like Go or Python.

Likely perks

Private Medical, Pension, 25+ Days Holiday, Stock Options, Learning Budget, Flexible Hours
