
Staff Infrastructure and Performance Engineer

San Francisco, CA | DevOps / SRE | Hybrid | 6+ YOE
Summary

Owns performance, reliability, and scalability of core infrastructure using AWS, ECS/Fargate, and Postgres. Leads optimization, multi-region deployments, CI/CD, and observability for low-latency logistics systems. Requires 6+ years of high-scale experience.

About the role

Responsibilities

  • Own infrastructure performance and reliability across Nash’s production systems, with a focus on low latency, high throughput, and predictable behavior under load.
  • Design, build, and optimize AWS-based infrastructure, leveraging managed services with a strong emphasis on ECS/Fargate.
  • Lead Postgres performance engineering, including query optimization, indexing strategies, connection management, replication, cluster design, and failover.
  • Architect and operate multi-region, highly available systems with strong resiliency, disaster recovery, and failover guarantees.
  • Design and evolve enterprise-grade CI/CD pipelines that support safe, repeatable, and fast deployments across environments and regions.
  • Drive observability standards (metrics, logs, tracing, SLOs) and use data to proactively identify and eliminate performance bottlenecks.
  • Partner with application engineers to influence system design decisions that impact scalability, latency, and reliability.
  • Lead incident response and postmortems, focusing on root cause analysis, systemic fixes, and long-term resilience.
  • Set infrastructure and performance best practices and mentor engineers across the organization.

Requirements

  • 6+ years of experience building and operating high-scale, production infrastructure for business-critical systems.
  • Deep expertise in AWS, including networking, compute, storage, and managed services.
  • Hands-on experience running production workloads on ECS/Fargate at scale.
  • Strong background in Postgres, including performance tuning, replication, high availability, and operational excellence.
  • Proven experience designing and operating multi-region architectures with strict uptime and reliability requirements.
  • Strong understanding of CI/CD for enterprise deployments, including rollout strategies, environment isolation, and rollback safety.
  • Experience building low-latency systems where milliseconds matter.
  • Excellent debugging and systems-level problem-solving skills.
  • Ability to operate autonomously and lead technical initiatives in a fast-paced startup environment.

Compensation & Benefits

  • Competitive compensation and opportunity for equity
  • Flexible paid time off
  • Health, dental, and vision insurance
Skills
AWS, ECS, Fargate, Postgres, CI/CD, Kubernetes, Multi-region architectures, Observability, SLOs, Performance tuning