Member of Technical Staff (Software Engineer)

Develops and optimizes Kubernetes-based infrastructure for high-performance AI inference services, including deployment, scaling, debugging, and integration with ML workflows. Requires a Master's in CS and 1+ year of experience with Docker, Kubernetes, Python, and related tools.

170k – 175k | Sunnyvale, CA | DevOps / SRE | Remote

About the role

Job Duties

  • Implement infrastructure to support a high-performance, low-latency inference service.
  • Deploy and configure Kubernetes services to ensure scalability and reliability of inference workloads.
  • Optimize resource allocation and auto-scaling policies to handle variable inference demand while minimizing operational costs.
  • Integrate inference services with containerized environments using Docker and Kubernetes for orchestration.
  • Ensure high availability and fault tolerance by implementing multi-region deployments and disaster recovery strategies.
  • Develop Python-based scripts and APIs to streamline data preprocessing, inference execution, and post-processing for real-time inference tasks.
  • Collaborate with machine learning engineers to validate inference accuracy and performance against functional and latency requirements.
  • Triage and resolve defects in the service by analyzing logs, metrics, and distributed traces.
  • Debug issues related to model deployment, container orchestration, or networking configurations, documenting steps to reproduce and root-cause defects.
  • Collaborate with cross-functional teams to address performance regressions, scalability issues, or integration failures in the inference pipeline.
  • Develop automated scripts to detect and mitigate common failure modes, improving system reliability.
  • Author detailed technical documentation for infrastructure configurations, inference workflows, and APIs, ensuring clarity for internal teams and external customers.
  • Work with product management and user experience teams to define requirements for inference service interfaces, including configuration, monitoring, and event logging.
  • Document and track defects, enhancements, and release notes using tools like Jira and Git, ensuring version control and traceability.
  • Participate in release planning and prioritization discussions to align infrastructure development with customer needs and business objectives.

Minimum Requirements

Master’s degree or foreign equivalent in Computer Science or related field and 1 year of experience as Software Developer, Student/Intern (Software Developer), Member of Technical Staff (Software Engineer), Software Engineer, or related occupation.

Required Skills:

  • Docker and Kubernetes
  • Java or C++
  • ActiveMQ and Kafka
  • Python or Groovy
  • JavaScript or TypeScript
  • Linux
  • SQL, OracleDB, and Redis
  • Git

Salary Range

$169,600 - $175,000 per year

Skills

Kubernetes, Docker, Python, Java, C++, Kafka, ActiveMQ, Linux, Redis, SQL, Git, TypeScript, JavaScript, Groovy, OracleDB
