Platform Engineer, Model Shaping

200k – 290k · San Francisco, CA · DevOps / SRE · Hybrid · 3+ YOE · Jun 23

Summary

Build and operate backend services and infrastructure for model customization and evaluation at Together AI. Requires 3+ years building production infrastructure, strong Python/Go skills, and deep experience with Kubernetes, Linux, and cloud platforms.

About the role

Responsibilities

Design and build Together’s systems and infrastructure for model customization, including user-facing features and internal improvements
Contribute to reliability improvements for the platform, participating in an on-call rotation and improving processes for incident response
Create and improve internal tooling for deployment, continuous integration, and observability
Build a job orchestration platform spanning multiple datacenters, supporting a highly heterogeneous hardware landscape
Partner with teams developing internal services, co-designing these services and incorporating them in systems built within Together

Requirements

3+ years of experience in building infrastructure or backend components of production services
Extensive experience designing, operating, and troubleshooting production Linux environments and Kubernetes-based platforms
Strong software engineering background in Python or Go
Experience with infrastructure automation tools (Terraform, Ansible), monitoring/observability stacks (Prometheus, Grafana), and CI/CD pipelines (GitHub Actions, ArgoCD)
Cloud environment (e.g., AWS/GCP/Azure) administration experience, preferably with a hybrid bare-metal/cloud environment
Strong communication skills, with a willingness to document systems and processes and to collaborate with peers of varying technical expertise
Comfortable operating across the stack, from cluster operations and infrastructure automation to backend service development

Nice-to-Haves

Developing large-scale production systems with high reliability requirements
Pipeline orchestration frameworks (e.g., Kubeflow, Argo Workflows, Flyte)
Managing GPU workloads on HPC clusters, ideally with hands-on experience in operating NVIDIA’s networking stack (e.g., NCCL, Mellanox firmware, GPUDirect RDMA)
Deployment of services for AI training or inference
Networking fundamentals, including TCP/IP, DNS, routing, load balancing, TLS, and network debugging tools
Maintaining or contributing to open-source projects

Compensation & Benefits

Competitive compensation, startup equity, health insurance, and other benefits
Flexibility in terms of remote work
US base salary range: $200,000 - $290,000

Skills

Python, Go, Kubernetes, Terraform, Ansible, Prometheus, Grafana, GitHub Actions, ArgoCD, AWS, GCP, Azure, Linux
