
Platform Engineer, Model Shaping

200k – 290k · San Francisco, CA · DevOps / SRE · Hybrid · 3+ YOE
Summary

Build and operate backend services and infrastructure for model customization and evaluation at Together AI. Requires 3+ years building production infrastructure, strong Python/Go skills, and deep experience with Kubernetes, Linux, and cloud platforms.

About the role

Responsibilities

  • Design and build Together’s systems and infrastructure for model customization, including user-facing features and internal improvements
  • Contribute to reliability improvements for the platform, participating in an on-call rotation and improving processes for incident response
  • Create and improve internal tooling for deployment, continuous integration, and observability
  • Build a job orchestration platform spanning multiple datacenters, supporting a highly heterogeneous hardware landscape
  • Partner with teams developing internal services, co-designing these services and incorporating them into systems built within Together

Requirements

  • 3+ years of experience in building infrastructure or backend components of production services
  • Extensive experience designing, operating, and troubleshooting production Linux environments and Kubernetes-based platforms
  • Strong software engineering background in Python or Go
  • Experienced with infrastructure automation tools (Terraform, Ansible), monitoring/observability stacks (Prometheus, Grafana), and CI/CD pipelines (GitHub Actions, ArgoCD)
  • Experience administering cloud environments (e.g., AWS/GCP/Azure), preferably in a hybrid bare-metal/cloud setup
  • Strong communication skills and a willingness to document systems and processes and to collaborate with peers of varying technical expertise
  • Comfortable operating across the stack, from cluster operations and infrastructure automation to backend service development

Nice-to-Haves

  • Developing large-scale production systems with high reliability requirements
  • Pipeline orchestration frameworks (e.g., Kubeflow, Argo Workflows, Flyte)
  • Managing GPU workloads on HPC clusters, ideally with hands-on experience in operating NVIDIA’s networking stack (e.g., NCCL, Mellanox firmware, GPUDirect RDMA)
  • Deployment of services for AI training or inference
  • Networking fundamentals, including TCP/IP, DNS, routing, load balancing, TLS, and network debugging tools
  • Maintaining or contributing to open-source projects

Compensation & Benefits

  • Competitive compensation, startup equity, health insurance, and other benefits
  • Flexibility in terms of remote work
  • US base salary range: $200,000 - $290,000
Skills
Python · Go · Kubernetes · Terraform · Ansible · Prometheus · Grafana · GitHub Actions · ArgoCD · AWS · GCP · Azure · Linux
Similar roles at this salary range
Render

Software Engineer, Dev Velocity

Build internal developer platform, tooling, and automation to accelerate engineering velocity. Focus on CI/CD pipelines, test infrastructure, build systems, and metrics to help engineers ship faster and more reliably.

170k – 290k · United States · DevOps / SRE · Remote · 5+ YOE · Go · CI/CD
Airbnb

Senior Software Engineer, Dev Tools

Senior engineer building and operating cloud dev environments, Kubernetes platforms, and tooling for engineers and AI agents at Airbnb. Requires 5-9+ years building high-scale distributed systems on AWS.

196k – 230k · United States · DevOps / SRE · Remote · 5+ YOE · Go · AWS
Pave

Senior Software Engineer - Developer Platform

Senior engineer building and scaling internal developer platforms with strong focus on AI tooling, reliability, and developer experience. Requires 4+ years in backend/infrastructure and proven project leadership.

196k – 265k · San Francisco, CA +1 · DevOps / SRE · Hybrid · 4+ YOE · GCP · Node.js
Decagon

Senior Software Engineer, Platform Engineering

Senior Software Engineer building and evolving an internal developer platform including CI/CD, observability, and tooling to improve developer productivity and reliability. Requires 4+ years of production experience in platform/devtools/infrastructure.

200k – 400k · New York, NY · DevOps / SRE · On-site · 4+ YOE · CI/CD · Python
Decagon

Senior Software Engineer, Platform Engineering

Senior Software Engineer building and evolving an internal developer platform including CI/CD, observability, and tooling to improve engineer productivity at a conversational AI company.

200k – 400k · San Francisco, CA · DevOps / SRE · On-site · 4+ YOE · CI/CD · Python