Senior Software Engineer
At Datadog, we leverage AI across our observability platform to improve monitoring, speed up incident resolution, and ensure data reliability for cloud applications.
Datadog’s Deployment Gates team builds customer-facing systems that decide whether software should ship to production. Deployment Gates sit directly in customers’ CI/CD pipelines and use observability data to answer one of the hardest questions in software delivery: Given everything we know right now, is this deployment safe to proceed?
Work on an analysis service that prevents incidents by detecting faulty software changes in production before they reach clients
Lead the design of progressive deployment automation, starting with zero-setup, conservative AI rules and evolving toward adaptive gates that learn from incidents and organizational patterns
Design the foundation for autonomous remediation, connecting what changed in code to what broke in production, from blocking and rolling back unsafe deployments to proposing fixes and enforcing policies
This is a highly product‑minded engineering role: you’ll work from problem discovery and UX all the way to reliable, scalable production systems.
At Datadog, we place value in our office culture - the relationships that it builds, the creativity it brings to the table, and the collaboration of being together. We operate as a hybrid workplace to ensure our employees can create a work-life harmony that best fits them.
Build AI-driven deployment gates: Design and ship decision systems that evaluate customer deployments using CI/CD context and Datadog telemetry, producing safe, explainable allow/block outcomes
Own evals and rollout: Define precision, recall, and trust metrics; build offline and online evals; validate changes in shadow mode; and safely promote improvements to enforcement
Design for robustness and safety: Implement conservative defaults, guardrails, fallbacks, and human-in-the-loop paths so gates behave predictably under noisy or incomplete data
Partner closely with Product: Work hand-in-hand with the Product Manager to translate customer problems, adoption signals, and roadmap goals into concrete technical decisions and iterations
Integrate across the Datadog platform: Partner with internal AI teams building the Faulty Deployment Detection pipeline, as well as teams working on LLMs and AI agents
Own production systems: Build and operate reliable backend services that run in the critical path of customer deployments, and be on-call for those services
A product-minded engineer who ships AI to production
You have 5+ years of experience with backend systems and microservices performance: tracing, latency breakdowns, concurrency, and resiliency patterns
You are proficient in a modern programming language, with strong API/service design skills and production operations experience (monitoring, alerting, on-call rotation)
You have proven experience delivering software based on LLM/agent features to production
You are comfortable owning user journeys, iterating from prototype → alpha → GA, and measuring impact with clear product metrics
An end-to-end AI implementation owner: you understand the end-to-end LLM product lifecycle
Fluent with offline/online evals for AI systems
You have demonstrated ability to use AI coding tools in day-to-day workflows and validate, critique, and refine AI-generated output
Experience with Continuous Delivery tools (e.g. ArgoCD / Argo Rollouts, Spinnaker, Octopus Deploy)
Exposure to planning/agent frameworks, tool‑use orchestration, RAG, and retrieval/indexing for observability data
You’re motivated to push the boundaries of how AI can improve software engineering best practices and contribute to building AI-enabled products
Datadog values people from all walks of life. We understand not everyone will meet all the above qualifications on day one. That's okay. If you’re passionate about technology and want to grow your skills, we
£45,000 – £60,000 (Glassdoor, Levels.fyi, 2025)