Principal Data Scientist - Agent Builder

Elastic
United Kingdom
about 4 hours ago
Full-time
Enterprise Search - Workchat

Skills & Technologies

Backend, Elasticsearch, Cloud, NLP, A/B Testing, LLM, RAG, Vector Search, Roadmap, Strategy, AI, Leadership, Influencing

Job Description

Elastic, the Search AI Company, enables everyone to find the answers they need in real time, using all their data, at scale — unleashing the potential of businesses and people. The Elastic Search AI Platform, used by more than 50% of the Fortune 500, brings together the precision of search and the intelligence of AI to enable everyone to accelerate the results that matter. By taking advantage of all structured and unstructured data — securing and protecting private information more effectively — Elastic’s complete, cloud-based solutions for search, security, and observability help organizations deliver on the promise of AI.

What Is the Role

The Search Conversational Experiences team builds Elastic’s new conversational and agentic platform that lets customers chat with their own data in Elasticsearch. We build the core quality layer for RAG, agents and tools, retrieval and citations, streaming, memory, and the evaluation signals that turn open-ended questions into grounded, reliable answers.

As a Principal Data Scientist, you will help set the technical direction for how we evaluate, improve, and scale chat quality across Elastic’s agentic platform. You will define the evaluation strategy that guides product decisions, including which models we standardize on, how we route requests across agents, which tools we enable and when, and how we tailor agents to different Elastic use cases in search and beyond. You will work closely with backend engineering, product, UX, and other data scientists to turn ambiguous, cutting-edge problems into measurable product improvements.

You’ll help lead work on frontier problems such as folding RAG and vector search into an agent’s knowledge base, dynamically enriching model context to improve groundedness, shaping reasoning strategies and tool-selection policies, lighting up agent-driven visualizations on top of Elasticsearch data, and exploring multimodality where it can create meaningful user value. This is an applied leadership role: you will prototype, evaluate, influence roadmap direction, and help teams ship improvements that customers can feel.

What You Will Be Doing

Define the evaluation strategy for conversational and agentic search, including offline and online evaluation, golden datasets, rubrics, LLM-as-judge calibration, groundedness and citation checks, and A/B testing.

Lead the design of quality metrics and decision frameworks for RAG, agents, tools, model selection, agent routing, prompt behavior, and cost/latency trade-offs.

Build, compare, and guide improvements across retrieval and re-ranking approaches, including sparse and dense retrieval, vector search, query understanding, semantic rewrites, and context enrichment.

Turn experimental results into product and business decisions: which models to use, how to route requests efficiently, which tools should be exposed, and how agents should be customized for different Elastic use cases.

Partner with engineering to productionize evaluation pipelines, telemetry, dashboards, CI guardrails, and regression detection for chat quality, helpfulness, dedication, latency, and cost.

Influence the roadmap by identifying the highest-leverage quality gaps, proposing practical solutions, and communicating trade-offs clearly to product, engineering, and leadership.

Mentor other data scientists and engineers in experiment design, evaluation methodology, statistical rigor, and practical approaches to improving LLM-powered systems.

Share outcomes through clear docs, notebooks, PRs, dashboards, technical proposals, and cross-functional reviews.

What You Bring

8+ years of applied DS/ML experience, with deep expertise in IR, NLP, ranking, semantic search, RAG, or LLM-powered product experiences.

Strong track record defining and leading evaluation for production AI/ML systems, including offline metrics, online experimentation, LLM-as-judge approaches, groundedness, citation quality, and model comparison.

Experience influencing product a

Company & Role Analysis

Likely perks
Private Medical, Pension, 25+ Days Holiday, Stock Options, Learning Budget, Flexible Hours
Market salary range

£45,000 – £60,000 (Glassdoor, Levels.fyi, 2025)

Similar roles

Microsoft
London, UK
£62,010
Contract
5 days ago

Overview Do you enjoy solving problems, looking at problems through a different lens, and working closely with customers to innovate new sol…

Sky
EH54 7HH
Full-time
Hybrid
7 days ago

We don't just believe in better. We make it happen. "Better content. Better products. And better careers." Working in Tech, Produ…

Chelsea and Westminster Hospital NHS Foundation Trust
London, UK
£69,742
Full-time
8 days ago

Job summary: Imperial College Health Partners (ICHP) is seeking an experienced Principal Data Scientist to provide senior technical leadership…

Lendable
London, UK
£57,101
Full-time
8 days ago

About Lendable Lendable is on a mission to build the world's best technology to help people get credit and save money. We're building one of…

Data Idols
London, UK
£95,000 – £105,000
Full-time
10 days ago

Salary: £95,000 - 105,000 per year Requirements: Advanced SQL skills Experience with analytics engineering tooling such as dbt Experience us…