Site Reliability Engineer

The Site Reliability Engineer improves, manages, and monitors production-critical infrastructure and data pipelines at a finance AI/ML firm. The role collaborates on fault tolerance, deployments, automation, and on-call incident response using Python, Linux, and cloud tools. Requires 2+ years of experience and a quantitative degree or equivalent experience.

120k – 160k | New York, NY | DevOps / SRE | Remote | 2+ YOE

About the role

Responsibilities

  • Improve fault-tolerance and maintainability of code in proprietary data pipelines and trading systems
  • Diagnose and fix bugs in code
  • Lead complex deployments
  • Automate manual workflows
  • Track and prioritize outstanding production-related issues
  • Share an on-call rotation responding to incidents to ensure the continuous operation of production-critical systems

Requirements

  • Experience with coding and debugging Python
  • Experience with Linux
  • Familiarity with relational databases and SQL
  • Sharp analytical and problem-solving skills and a persistent drive to make things work (better)
  • Strong growth mindset and a passion for learning
  • Strong technical communication skills
  • Attention to detail
  • At least 2 years of relevant industry experience
  • An undergraduate degree or comparable training in a quantitative field, or equivalent relevant industry experience

Preferred Qualifications

  • Familiarity with best practices for code maintainability, documentation, quality assurance, and continuous integration and deployment
  • Experience supporting production systems
  • Experience with any of the following: gRPC microservices, Postgres, Pandas, Golang, R, Git, Jenkins, Bazel, Prometheus, Grafana, Airflow, Kubernetes

Skills

Python, Linux, SQL, Kubernetes, Prometheus, Grafana, Postgres, pandas, Go, Airflow, Jenkins, Git, gRPC, Bazel, R

Similar roles

DevOps / SRE jobs

Software Engineer - Developer Infrastructure

Builds and improves core libraries, frameworks, and developer tools like Bazel and Buildkite CI/CD to boost engineering productivity. Requires 2+ years experience, Bachelor's in CS, and expertise in Go/C++/Python/TypeScript.

120k – 300k | Sunnyvale, CA | DevOps / SRE | On-site | 2+ YOE | Go, C++

Capacity Ops Associate

Manages GPU fleet operations, including node maintenance, capacity fulfillment, and technical orchestration between SRE/infra teams and customers. Requires 2+ years experience, Kubernetes familiarity, and strong communication skills.

120k – 160k | San Francisco, CA +1 | DevOps / SRE | Hybrid | 2+ YOE | GPU, SRE

Platform Operations Engineer

Leads cross-functional technical projects to optimize the tech stack, build custom automation and analytics solutions for business operations, and integrate systems using AWS, APIs, and databases. Requires 2+ years of experience, Python/SQL proficiency, and onsite presence in New York.

120k – 200k | New York, NY | DevOps / SRE | On-site | 2+ YOE | SQL, AWS

DevOps - Internal Platform & Tools

Develop and maintain internal developer tooling and infrastructure across air-gapped, cloud, and on-prem environments. Collaborate on deployments, AI adoption, and custom tools to accelerate developers and clients, requiring 2+ years in infrastructure and cloud services.

130k – 230k | New York, NY +2 | DevOps / SRE | On-site | 2+ YOE | AWS, GCP

Software Engineer, DevOps / Infrastructure

The DevOps Engineer builds and maintains CI/CD pipelines, ML model infrastructure, and automated testing for AI image/video software products. Requires 2+ years of experience, C++ build tools expertise, and experience with cloud platforms like AWS/Azure.

110k – 160k | Dallas, TX | DevOps / SRE | On-site | 2+ YOE | Qt, Go