Site Reliability Engineer
Job Description
GAQ127R40
Team: IT Infrastructure and Operations
About the Role
At Databricks Information Technology (IT), we are a product-led organization transforming how we work, from how easy our IT services are to use to the applications we build to scale seamlessly through rapid growth.
As a Site Reliability Engineer (SRE), you will bridge the gap between software engineering and systems architecture. You will be a core contributor to the IT Infrastructure team, owning the evolution of our core infrastructure and observability platforms. The role calls for a strong software engineering mindset and broad technical depth to deliver high-quality, scalable solutions to problems in immature systems. You will focus on building resilient, automated infrastructure that empowers development teams and keeps our cloud environment cost-optimized, secure, and highly available.
The Impact You Will Have
Architect and Automate: Design and deploy production-grade infrastructure on cloud platforms (AWS/Azure) using Infrastructure as Code (IaC) tools like Terraform or Pulumi.
Reliability and Performance Engineering: Optimize system performance, architecture, and scaling to ensure maximum uptime and minimal latency for critical IT services.
CI/CD Excellence: Architect robust deployment pipelines (e.g., GitHub Actions), managing both hosted and self-hosted runners for specialized build requirements.
Observable by Default: Build the underlying infrastructure that ensures new internal applications are secure and have logging, metrics, and alerts enabled by default.
Agentic Tooling: Build internal AI plugins and automation scripts to streamline developer workflows and improve operational efficiency.
Incident Response: Focus on subsequent data usage, incident management workflows, and building the dashboards needed to maintain service health. Participate in a shared on-call rotation, leading rapid incident response and technical troubleshooting for production outages. Facilitate blameless post-mortems…
Company & Role Analysis
£45,000 – £60,000 (Glassdoor, Levels.fyi, 2025)
Similar roles
About the Opportunity We are seeking a Site Reliability Engineer to join the Platform Engineering Domain in the Access Team. The mission o…
About the Opportunity We are seeking a Senior Site Reliability Engineer to join the Platform Engineering Domain in the Access Team. The mi…
As a TPM for SRE, you will partner with SRE leaders and engineers to scale the platform that underpins all of MongoDB’s cloud products. You…