# Sr. Machine Learning Engineer
**Company:** [Illumio](https://hotfix.jobs/companies/illumio)
**Location:** Sunnyvale, CA
**Salary:** $191K-$220K
**Experience:** 5+ years
**Skills:** Java, Python, Go, Apache Kafka, Spark, Apache Flink, Kubernetes, AutoGen, CrewAI, RAG, Langfuse, Terraform, AWS, Azure, GCP
**Posted:** 2026-04-27
> Architects high-scale distributed systems and agentic AI for a cybersecurity platform, processing massive data volumes with Kafka, Spark/Flink, and Kubernetes. Requires 5–8 years of backend experience in Java/Python/Go and expertise in agentic frameworks, RAG, and cloud infrastructure.
## Job Description
## Your Impact

- **Asynchronous Systems**: Architect and optimize high-throughput, event-driven systems using Apache Kafka to handle real-time data flows.
- **Data Processing at Scale**: Build and maintain large-scale data pipelines using Apache Spark or Flink to provide the high-volume analytics that power our AI.
- **Agentic Systems at Scale**: Design sophisticated AI Agents capable of autonomous planning, memory management, and high-reliability tool-use across distributed environments.
- **Infrastructure & Orchestration**: Lead the architectural design of containerized services on Kubernetes, ensuring high availability and scalability across Cloud Infrastructure (AWS/Azure/GCP).

## Your Toolkit

- 5–8 years of experience in backend engineering using **Java**, **Python**, or **Go**.
- Expertise in distributed systems, asynchronous architectures (**Kafka**), and large-scale data processing (**Spark/Flink**).
- Hands-on experience with agentic frameworks (e.g., **AutoGen**, **CrewAI**, or custom orchestration layers), **RAG**, **MCP**, model fine-tuning, and prompt engineering.
- Experience with agentic observability using **Langfuse** and with evals frameworks for testing and resilience.

**Bonus Points**:
- Advanced IaC: Expertise in building reusable **Terraform** modules and managing complex multi-region cloud deployments.
- Vector DB Optimization: Deep experience with indexing strategies (HNSW vs. IVF) and performance tuning for high-concurrency vector databases at scale.
- AI Ops: Experience with LLM deployment optimization (e.g., **vLLM**, **TensorRT-LLM**) or managing proprietary model inference endpoints.

**Apply:** https://hotfix.jobs/jobs/sr-machine-learning-engineer-at-illumio-1c2c00de-a45e-4eb5-b53e-458fdf68641c
**Canonical:** https://hotfix.jobs/jobs/sr-machine-learning-engineer-at-illumio-1c2c00de-a45e-4eb5-b53e-458fdf68641c