Software Engineer, Infrastructure

Builds and maintains scalable infrastructure for a real-time telemetry platform using cloud, containers, and DevOps tools. Requires 3+ years in distributed systems and hands-on experience with Kubernetes, Docker, and AWS/GCP/Azure.

150k – 200k · Marina del Rey, CA · Los Angeles, CA · San Francisco, CA · DevOps / SRE · Hybrid · 3+ YOE

About the role

Responsibilities

  • Design, build, and maintain scalable, resilient infrastructure solutions to support our growing platform and customer base.
  • Collaborate with software engineers to optimize application performance and reliability.
  • Implement monitoring, alerting, and logging systems to ensure proactive identification and resolution of issues.
  • Automate deployment processes and streamline infrastructure management using modern DevOps tools and methodologies.
  • Evolve our backend architecture/infrastructure for both cloud and on-premise deployments.
  • Work with the team to set and prioritize our roadmap to maximize customer impact.
  • Lead initiatives to improve infrastructure reliability, performance, and cost efficiency.

Requirements

  • 3+ years of relevant distributed systems experience, with a focus on designing and managing cloud-based environments (e.g., AWS, Azure, GCP).
  • Hands-on experience with containerization (Docker) and container orchestration platforms (Kubernetes).
  • Passion for building and operating developer productivity tools, frameworks, and other aspects of platform engineering.
  • Familiarity with CI/CD pipelines and version control systems.
  • Knowledge of automated testing best practices and frameworks, ensuring software reliability through integration, performance, and end-to-end testing in distributed systems.
  • Good knowledge of cloud, on-prem, networking, and service architecture in multi-region, multi-cloud setups.
  • Excited by the ambiguity and high-ownership culture of early-stage startups.
  • Pragmatic, solution-oriented, and scrappy.
  • Enjoy working collaboratively with a broad range of job functions and roles.

Tech Stack

  • Go, Java, React, TypeScript, Kafka/Redpanda, Flink, AWS, Azure, GCP, Docker, Kubernetes, Terraform/CDK.
  • Web frontend & backend: ECharts, Go, gRPC, PostgreSQL, Protobuf, Radix, React, Redux, TypeScript.
  • Data: Arrow, DataFusion, Flink, Parquet, Rust.
  • Infrastructure: Argo CD, AWS, Docker, GitHub Actions, Grafana, Kubernetes, Kustomize, Linux, Prometheus, Terragrunt.

Compensation

  • Salary range: $150,000 - $200,000 per year, plus equity and benefits.

Skills

Kubernetes, Docker, AWS, GCP, Azure, Terraform, Go, Prometheus, Grafana, CI/CD, Argo CD, Terragrunt, Kafka, Flink, gRPC
