Software Engineer, DevOps

Designs and builds scalable infrastructure for AI products, focusing on cloud platforms, Kubernetes orchestration, CI/CD pipelines, and observability. Requires 3+ years in infrastructure engineering and bachelor's/master's in CS.

135k – 225k · Palo Alto, CA · San Francisco, CA · DevOps / SRE · Hybrid · 3+ YOE

About the role

Responsibilities

  • Partner with product teams to architect, design, and build the foundational infrastructure for our products.
  • Design, develop, and deploy highly available and scalable multi-tenant SaaS solutions on public cloud platforms such as AWS, Azure, and GCP. Leverage technologies such as Kubernetes, Helm, Terraform, and Istio to achieve infrastructure resilience.
  • Drive the automation of infrastructure tasks, from provisioning to configuration management and deployment, utilizing tools like Terraform, Ansible, and Kubernetes.
  • Collaborate closely with the software development team to refine CI/CD pipelines (e.g., using GitHub Actions and Cloud Build tools), enhance service interfaces, and improve the overall developer experience.
  • Architect and implement advanced observability solutions using tools like Prometheus and Grafana. Ensure real-time alerting and error tracking with Sentry and PagerDuty to maintain system health and performance.
  • Deploy comprehensive testing frameworks, including tools like Selenium for end-to-end testing. Ensure robust integration and system testing to maintain software quality.
  • Regularly monitor system health, analyze performance metrics, and recommend enhancements. This includes optimizing database queries and ensuring peak database performance.

Nice to Have

  • MLOps experience
  • Experience with Postgres query optimization and related performance improvement techniques.
  • Experience with event-driven data and machine learning infrastructure, including streaming pipelines, database systems, and model training
  • Experience with air-gapped cloud environments or private clouds
  • Experience administering complex deployments on Azure, especially AKS

Qualifications

  • Bachelor's or Master's degree in Computer Science or related field.
  • 3+ years of experience in infrastructure engineering or a similar role
  • Excellent problem-solving skills and the ability to work under pressure in a fast-paced environment.
  • Ability to work independently and as part of a team
  • Experience working with global teams

Compensation (California-based candidates)

  • Standard base salary: $135,000-$225,000 annually. Compensation offered will be determined by factors such as location, level, job-related knowledge, skills, and experience. Certain roles may be eligible for variable compensation, equity, and benefits.

Skills

Kubernetes, Terraform, AWS, Azure, GCP, Helm, Istio, Ansible, GitHub Actions, Prometheus, Grafana, Sentry, PagerDuty, Selenium, Postgres

Similar roles

DevOps / SRE jobs

Software Engineer, Cloud Infrastructure

Build and operate AWS cloud and LLM infrastructure powering RAG, inference, and data pipelines for an aviation AI platform. Requires strong AWS depth, Python data pipelines, and production LLM experience.

135k – 260k · San Carlos, CA · DevOps / SRE · Hybrid · 4+ YOE · AWS · VPC

Forward Deployed SRE

Site Reliability Engineer owns reliability of multi-cloud Kubernetes infrastructure for AI/ML platform, builds observability tooling as code, automates mitigations, leads incident response, and defines SLOs/SLIs. Requires extensive Kubernetes and observability experience.

135k – 285k · San Francisco, CA +1 · DevOps / SRE · Hybrid · EKS · GKE

Software Engineer - Platform

Build scalable infrastructure, integrations, and data platforms powering workforce management and AI agent products at enterprise scale. Requires 5+ years in backend/platform systems, with expertise in AWS, Kubernetes, Go/Python, and datastores like Postgres and Snowflake.

135k – 280k · New York, NY · DevOps / SRE · On-site · 5+ YOE · Go · AWS

Cloud Infrastructure Engineer

Designs, deploys, and improves scalable blockchain infrastructure using Kubernetes, Terraform, and cloud tools. Drives AI enablement, builds observability with Prometheus/Grafana, manages multi-cloud networks, and leads incident response. Requires 5+ years in SRE/infrastructure with strong automation focus.

135k – 240k · San Francisco, CA +1 · DevOps / SRE · Hybrid · 5+ YOE · AWS · GCP

Platform Ops Lead

Leads platform operations team supporting developers on GitLab and Kubernetes-based DevOps platform. Resolves deployment issues, manages on-call support, trains team members, and ensures SLOs in microservices environment. Requires BS in STEM, Linux skills, and scripting experience.

135k – 165k · Bethesda, MD · DevOps / SRE · Remote · AWS · Bash