Site Reliability Engineer

The Site Reliability Engineer enhances system observability, reliability, and availability at a prediction markets platform. The role builds automation, optimizes cloud infrastructure (Kubernetes, Docker, Terraform), debugs issues, and participates in on-call rotations. Requires 4+ years of software engineering experience.

100k – 250k | New York, NY | DevOps / SRE | Onsite | 4+ YOE

About the role

What You’ll Do

Improve observability, reliability, and service availability by defining and measuring key metrics
Build automation and systems that eliminate toil and reduce operational burden
Collaborate with core infrastructure engineers to performance-tune and optimize cloud deployments (Docker, Terraform, Kubernetes, EC2, etc.)
Partner with product teams to minimize service disruptions and automate incident response
Identify and analyze reliability problems across the stack, designing and implementing software for significant, long-term improvements
Mentor engineers and drive a culture where reliability is a core engineering value
Write high-quality, well-tested code that supports internal and external customer needs
Debug complex technical issues and improve system usability, operability, and diagnosability
Review feature designs across the company and ensure security, safety, scalability, and architectural clarity
Build and maintain integrations with third-party vendors
Participate in on-call rotations to troubleshoot and resolve urgent issues

What You Bring

4+ years of software engineering experience
Experience designing, building, scaling, and maintaining production services and service-oriented architectures
Strong system design, coding, debugging, performance-tuning, and observability skills
High-quality coding practices with strong testing discipline
Excellent written and verbal communication; comfort working transparently across teams
Strong interpersonal skills across junior-to-principal engineering levels
Ability to think clearly under pressure and dive into any layer of the stack
Passion for building an open financial system that connects the world
Willingness to participate in on-call rotations and swiftly resolve issues

Bonus Points:

Experience designing highly reliable, high-throughput, low-latency systems
Experience with Datadog
Experience with Rust, Go, and Terraform
Experience with AWS, GCP, or Azure
Experience operating in regulated environments
Experience writing training materials or company-facing engineering content

NYC Pay Transparency

Salary: $100,000–$250,000 annually, plus equity and benefits.

Skills

Kubernetes, Docker, Terraform, Datadog, AWS, GCP, Azure, Rust, Go, EC2
