Responsibilities

Own infrastructure performance and reliability across Nash’s production systems, with a focus on low latency, high throughput, and predictable behavior under load.
Design, build, and optimize AWS-based infrastructure, leveraging managed services with a strong emphasis on ECS/Fargate.
Lead Postgres performance engineering, including query optimization, indexing strategies, connection management, replication, cluster design, and failover.
Architect and operate multi-region, highly available systems with strong resiliency, disaster recovery, and failover guarantees.
Design and evolve enterprise-grade CI/CD pipelines that support safe, repeatable, and fast deployments across environments and regions.
Drive observability standards (metrics, logs, tracing, SLOs) and use data to proactively identify and eliminate performance bottlenecks.
Partner with application engineers to influence system design decisions that impact scalability, latency, and reliability.
Lead incident response and postmortems, focusing on root cause analysis, systemic fixes, and long-term resilience.
Set infrastructure and performance best practices and mentor engineers across the organization.

Requirements

6+ years of experience building and operating high-scale production infrastructure for business-critical systems.
Deep expertise in AWS, including networking, compute, storage, and managed services.
Hands-on experience running production workloads on ECS/Fargate at scale.
Strong background in Postgres, including performance tuning, replication, high availability, and operational excellence.
Proven experience designing and operating multi-region architectures with strict uptime and reliability requirements.
Strong understanding of CI/CD for enterprise deployments, including rollout strategies, environment isolation, and rollback safety.
Experience building low-latency systems where milliseconds matter.
Excellent debugging and systems-level problem-solving skills.
Ability to operate autonomously and lead technical initiatives in a fast-paced startup environment.