Senior Infrastructure Engineer

185k – 275k · San Francisco, CA · DevOps / SRE · Remote · 5+ YOE · Feb 27

Summary

The Senior Infrastructure Engineer owns critical infrastructure decisions, builds scalable platforms using AWS and Terraform, ensures security and compliance, and mentors teams. The role requires 5+ years of AWS experience and expertise in monitoring tools like Datadog.

About the role

What You’ll Do

Design, build, and maintain core platform infrastructure to support scalable, reliable, and secure services across engineering teams.
Develop and manage Infrastructure as Code (IaC) using tools like Terraform to ensure consistent, reliable environments.
Own major architectural decisions and long-term technical direction for Fieldguide’s infrastructure.
Build reusable platforms, abstractions, and AI-enabled tooling that raise the baseline for engineering teams.
Monitor and improve system reliability, performance, and cost efficiency through metrics, logging, and alerting frameworks (e.g., Datadog, CloudWatch).
Ensure infrastructure security and compliance by implementing best practices for identity management, network segmentation, secrets handling, and vulnerability management.
Lead incident response and postmortem processes, driving root cause analysis and structural long-term improvements.
Mentor and collaborate with engineers and tech leads, fostering a culture of reliability, automation, and continuous improvement.
Support disaster recovery and business continuity planning, ensuring high availability and resilience of critical systems.
Document infrastructure design, architecture decisions, and operational procedures for transparency and team enablement.

Who You Are

You have 5+ years of hands-on experience constructing complex cloud solutions using multiple AWS services.
You are skilled in provisioning and configuring cloud services using Terraform and the AWS CLI / API.
You are proficient in designing effective monitoring, alerting, and log aggregation solutions using tools like Datadog and AWS CloudWatch (or comparable tools such as New Relic or Prometheus/Grafana).
You have a solid understanding of data systems, including both SQL and NoSQL.
You have experience developing and maintaining software in security and regulatory compliance environments (SOC 2, PCI-DSS, HIPAA, etc.).
You are comfortable participating in on-call support to ensure 24/7 availability of services.
You have a passion for mentoring and coaching other engineers.
You have excellent communication and organizational skills and are capable of managing multiple competing priorities.
You have deep expertise designing systems and processes that make engineering teams measurably faster and more effective.
You can clearly communicate technical strategy to managers and executives.

Bonus Points

You have experience with GraphQL as a database front-end API.
You have experience with database system architecture (e.g., Postgres) and observability, to help us increase our overall database performance.
You have experience both working with AI and providing it as a tool for engineers and our internal applications to use.
You have experience working through and designing for security audits (e.g., SOC 2, PCI).

Skills

AWS, Terraform, Datadog, AWS CloudWatch, Prometheus, Grafana, Kubernetes, SQL, NoSQL, Postgres

