Senior Software Engineer (AI)

Datadog
Madrid, Spain
about 3 hours ago
Full-time
Dev Eng

Skills & Technologies

Backend, Software Engineering, API, Microservices, CI/CD, ArgoCD, Datadog, Cloud, LLM, RAG, SAFe, Roadmap, Implementation, Deployment, Service Design, AI, Automation

Job Description

At Datadog, we leverage AI across our observability platform to improve monitoring, speed up incident resolution, and ensure data reliability for cloud applications.

Datadog’s Deployment Gates team builds customer-facing systems that decide whether software should ship to production. Deployment Gates sit directly in customers’ CI/CD pipelines and use observability data to answer one of the hardest questions in software delivery: Given everything we know right now, is this deployment safe to proceed?

In this role, you will

Work on an analysis service that prevents incidents by detecting faulty software changes in production before they reach clients

Lead the design of progressive deployment automation, starting with zero-setup, conservative AI rules and evolving toward adaptive gates that learn from incidents and organizational patterns

Design the foundation for autonomous remediation, connecting what changed in code to what broke in production, from blocking and rolling back unsafe deployments to proposing fixes and enforcing policies

This is a highly product‑minded engineering role: you’ll work from problem discovery and UX all the way to reliable, scalable production systems.

At Datadog, we place value in our office culture - the relationships that it builds, the creativity it brings to the table, and the collaboration of being together. We operate as a hybrid workplace to ensure our employees can create a work-life harmony that best fits them.

What you’ll do

Build AI-driven deployment gates: Design and ship decision systems that evaluate customer deployments using CI/CD context and Datadog telemetry, producing safe, explainable allow/block outcomes

Own evals and rollout: Define precision, recall, and trust metrics; build offline and online evals; validate changes in shadow mode; and safely promote improvements to enforcement

Design for robustness and safety: Implement conservative defaults, guardrails, fallbacks, and human-in-the-loop paths so gates behave predictably under noisy or incomplete data

Partner closely with Product: Work hand-in-hand with the Product Manager to translate customer problems, adoption signals, and roadmap goals into concrete technical decisions and iterations

Integrate across the Datadog platform: Partner with internal AI teams building the Faulty Deployment Detection pipeline, as well as teams working on LLMs and AI agents

Own production systems: Build and operate reliable backend services that run in the critical path of customer deployments, and be on-call for those services

Who you are

A Product‑minded engineer who ships AI to production

You have 5+ years of experience with backend systems and microservices performance: tracing, latency breakdowns, concurrency, and resiliency patterns

You are proficient in a modern programming language, with strong API/service design skills and production operations experience (monitoring, alerting, on‑call rotation)

You have proven experience delivering LLM- or agent-based software features to production

You are comfortable owning user journeys, iterating from prototype → alpha → GA, and measuring impact with clear product metrics

An End-to-end AI implementation owner: You understand the end-to-end LLM product lifecycle

You are fluent with offline/online evals for AI systems

You have demonstrated ability to use AI coding tools in day-to-day workflows and validate, critique, and refine AI-generated output

Bonus Points

Experience with Continuous Delivery tools (e.g. ArgoCD / Argo Rollouts, Spinnaker, Octopus Deploy)

Exposure to planning/agent frameworks, tool‑use orchestration, RAG, and retrieval/indexing for observability data

You’re motivated to push the boundaries of how AI can improve software engineering best practices and contribute to building AI-enabled products

Datadog values people from all walks of life. We understand not everyone will meet all the above qualifications on day one. That's okay. If you’re passionate about technology and want to grow your skills, we

