
Senior Infrastructure Engineer

185k – 275k | San Francisco, CA | California | DevOps / SRE | Remote | 5+ YOE
Summary

The Senior Infrastructure Engineer owns critical infrastructure decisions, builds scalable platforms using AWS and Terraform, ensures security and compliance, and mentors teams. The role requires 5+ years of AWS experience and expertise in monitoring tools like Datadog.

About the role

What You’ll Do

  • Design, build, and maintain core platform infrastructure to support scalable, reliable, and secure services across engineering teams.
  • Develop and manage Infrastructure as Code (IaC) using tools like Terraform to ensure consistent, reliable environments.
  • Own major architectural decisions and long-term technical direction for Fieldguide’s infrastructure.
  • Build reusable platforms, abstractions, and AI-enabled tooling that raise the baseline for engineering teams.
  • Monitor and improve system reliability, performance, and cost efficiency through metrics, logging, and alerting frameworks (e.g., Datadog, CloudWatch).
  • Ensure infrastructure security and compliance by implementing best practices for identity management, network segmentation, secrets handling, and vulnerability management.
  • Lead incident response and postmortem processes, driving root cause analysis and structural long-term improvements.
  • Mentor and collaborate with engineers and tech leads, fostering a culture of reliability, automation, and continuous improvement.
  • Support disaster recovery and business continuity planning, ensuring high availability and resilience of critical systems.
  • Document infrastructure design, architecture decisions, and operational procedures for transparency and team enablement.

Who You Are

  • You have 5+ years of hands-on experience constructing complex cloud solutions using multiple AWS services.
  • You are skilled in provisioning and configuring cloud services using Terraform and the AWS CLI / API.
  • You are proficient in designing effective monitoring, alerting, and log aggregation solutions using tools like Datadog and AWS CloudWatch (or comparable tools such as New Relic or Prometheus/Grafana).
  • You have a solid understanding of data systems, including both SQL and NoSQL.
  • You have experience developing and maintaining software in environments subject to security and regulatory compliance requirements (SOC 2, PCI-DSS, HIPAA, etc.).
  • You are comfortable participating in on-call support to ensure 24/7 availability of services.
  • You have a passion for mentoring and coaching other engineers.
  • You have excellent communication and organizational skills and are capable of managing multiple competing priorities.
  • You have deep expertise in designing systems and processes that make engineering teams measurably faster and more effective.
  • You can clearly communicate technical strategy to managers and executives.

Bonus Points

  • You have experience with GraphQL as a database front-end API.
  • You have experience with database system architecture (e.g., Postgres) and observability, to help us increase our overall database performance.
  • You have experience both working with AI and providing it as a tool for engineers and internal applications to use.
  • You have experience working through and designing for security audits (e.g., SOC 2, PCI-DSS).
Skills
AWS, Terraform, Datadog, AWS CloudWatch, Prometheus, Grafana, Kubernetes, SQL, NoSQL, Postgres