Senior Database Reliability Engineer (DBRE)
Designs, operates, and optimizes large-scale PostgreSQL and MySQL databases for mission-critical systems. Builds automation, monitoring, and high-availability infrastructure while leading incident response and collaborating with engineering teams. Requires 4+ years of PostgreSQL experience.
Responsibilities
Architecture, Reliability & Performance
- Design, implement, and operate highly available PostgreSQL clusters (physical replication, logical replication, sharding/partitioning, failover automation).
- Optimize query performance, indexing strategies, schema design, and storage engine configuration.
- Perform capacity planning, growth forecasting, and workload modeling.
- Own high-availability strategies including automatic failover, multi-AZ/multi-region setups, and disaster recovery.
Automation & Tooling
- Develop automation for provisioning, configuration, backups, failovers, vacuum tuning, and schema management using Terraform, Ansible, Kubernetes Operators, or custom tooling.
- Build monitoring, alerting, and self-healing systems for PostgreSQL and MySQL.
Operations & Incident Response
- Lead response during database incidents such as performance regressions, replication lag, deadlocks, table and index bloat, and storage failures.
- Conduct root-cause analysis and implement permanent fixes.
Cross-Functional Collaboration
- Partner with software engineers to review SQL, optimize schemas, and ensure efficient use of PostgreSQL features.
- Provide guidance on database-related design patterns, migrations, version upgrades, and best practices.
Required Qualifications
- 4+ years of hands-on PostgreSQL experience in high-volume, distributed, or large-scale production environments.
- Strong knowledge of PostgreSQL internals (WAL, MVCC, bloat/vacuum tuning, query planner, indexing, logical replication).
- Production experience with MySQL (InnoDB internals, replication, performance tuning).
- Advanced SQL skills and a strong understanding of schema design and query optimization.
- Experience with Linux systems, networking fundamentals, and systems troubleshooting.
- Experience building automation with Go or Python.
- Production experience with monitoring tools (Prometheus, Grafana, Datadog, PMM, pg_stat_statements, etc.).
- Hands-on experience with cloud environments (AWS or GCP).
Preferred/Bonus Qualifications
- Experience with PgBouncer, HAProxy, or other connection-pooling/load-balancing layers.
- Exposure to event streaming (Kafka, Debezium) and change data capture.
- Experience supporting 24/7 production environments with on-call rotation.
- Contributions to the open-source PostgreSQL ecosystem.
Senior Infrastructure Engineer
Builds analytics infrastructure, observability tooling, and developer platforms to support real-time AI agents for 911 centers. Requires 4+ years of infrastructure, platform, or backend experience and comfort working across the full stack.
Senior Developer Experience Engineer
Senior Platform Engineer focused on developer experience, building tools, automation, CI/CD systems, and AI tooling that improve developer productivity and workflows. Requires 7+ years of cloud experience, hands-on containerization experience, and proficiency in Ruby, Go, or Python.
Senior Site Reliability Engineer
Senior SRE who operates and evolves the Amazon EKS Kubernetes platform, CI/CD pipelines, and observability stack for Thunderbird's open-source infrastructure. Requires 7+ years of infrastructure experience and strong production Kubernetes and infrastructure-as-code (IaC) skills.