
Senior Database Reliability Engineer

145k – 230k · San Francisco, CA · DevOps / SRE · Remote · 5+ YOE
Summary

Senior IC role owning the reliability, performance, and scalability of PostgreSQL (Aurora), OpenSearch, Redis, and the CDC pipeline to Snowflake. Sets standards for ORM usage, migration safety, and observability at scale.

About the role

What You'll Do

  • Own database reliability across Aurora, OpenSearch, Redis, and our CDC pipeline — including schema design reviews, migration safety (locks, backfills, concurrent index builds, NOT VALID constraints), and incident response for the data tier
  • Make the Django ORM a strength at scale: catch N+1 patterns in review, extend QuerySet conventions and physical schema standards, and build the CI checks and AGENTS.md scaffolding that encode those standards so they scale beyond any single reviewer
  • Operate and evolve the CDC pipeline from Aurora through DMS and S3 (Parquet) to Snowflake, including replication slot hygiene, schema evolution safety, and automated checks that catch migrations likely to break downstream consumers before they ship
  • Build and improve observability across pganalyze, CloudWatch, and Honeycomb, with Django-side instrumentation that ties slow ORM queries back to specific users, flags, and deploys
  • Drive multi-AZ resilience within our single-region architecture: Aurora writer/reader placement, failover behavior, RTO/RPO targets, ElastiCache and OpenSearch AZ topology, and RabbitMQ survivability
  • Build self-service tooling and dashboards that give product and platform teams visibility into their own query footprint, reducing the review burden as the engineering org grows
  • Contribute to onboarding and knowledge-sharing as a large incoming class of engineers joins — write docs, run internal sessions on "what your ORM query is really doing," and feed that knowledge back into AI review tooling

What We're Looking For

  • Has deep, hands-on PostgreSQL expertise: reads EXPLAIN (ANALYZE, BUFFERS) output fluently, understands MVCC, bloat, lock contention, and vacuum behavior, and can tune Aurora Serverless v2 for latency and throughput
  • Has worked with an ORM (Django, SQLAlchemy, ActiveRecord, or similar) at production scale: can predict the SQL a query generates, spot N+1 issues on sight, and knows when joins beat batched IN queries and when they don't
  • Has run CDC pipelines in production, ideally with AWS DMS, and is comfortable with logical replication, slot hygiene, schema evolution, and Parquet-based data lakes feeding Snowflake, BigQuery, or Redshift
  • Has hands-on experience with pganalyze (or Datadog DBM / pg_stat_statements pipelines), CloudWatch, and Honeycomb (or another high-cardinality tracing tool); comfortable with OpenTelemetry
  • Has worked with OpenSearch, Redis, and at least one production message broker (SQS, RabbitMQ, or Kafka) at scale
  • Writes real automation — Python, Go, or similar — and has used Terraform or comparable IaC to manage infrastructure
  • Has used AI coding and review tools in a team setting: written or maintained AGENTS.md files, configured review agents, iterated on prompts

Nice to Have

  • Event sourcing on Postgres, or experience with alternative CDC tooling (Debezium, Fivetran, Airbyte)
  • PgBouncer or RDS Proxy at scale, including how Django manages its database connections behind them
  • Deep Honeycomb usage: SLOs, BubbleUp, Triggers, derived columns
  • Snowflake from the producer side: staging, Snowpipe, external tables on Parquet
  • Experience scaling data infrastructure through rapid engineering headcount growth
  • SOC 2 Type II, GDPR, or similar compliance work

Skills
PostgreSQL · Django ORM · AWS Aurora Serverless · AWS DMS · CDC pipelines · OpenSearch · Redis · RabbitMQ · pganalyze · Honeycomb · Terraform · Python · OpenTelemetry