
Thinking Machines Lab

AI research lab building customizable multimodal AI systems

San Francisco, CA · Onsite · 51-200 · ai · Seed
About

Thinking Machines Lab develops AI systems and platforms focused on multimodal models for research and production. It serves researchers, engineers, and enterprises by enabling customizable, collaborative AI through tools like Tinker for fine-tuning. The lab also advances open science and efficiency in large-scale training and inference.

Tech stack
Python · PyTorch · JAX · Rust · Kubernetes · TensorFlow · Spark · Ray · Terraform · Linux · vLLM · Triton · CUDA · React · TypeScript · Swift · Apache Spark · Kafka · dbt · Airflow
Open roles · 30

Reliability Engineer, Supercomputing

Ensure reliability of large GPU supercomputing clusters by diagnosing hardware/firmware/OS issues, automating monitoring, driving firmware rollouts, and working directly with vendors.

350k – 475k · San Francisco, CA · DevOps / SRE · On-site · BMC · Rust

Network Engineer, Supercomputing

Own and debug multi-thousand-GPU network fabric (RDMA/RoCE, NVLink/NVSwitch) for large-scale AI training and inference. Requires backend language proficiency, large-scale cluster experience, and cross-stack ownership.

350k – 475k · San Francisco, CA · DevOps / SRE · On-site · Rust · RDMA

Reception & Workplace Experience Coordinator

Serve as the face of the San Francisco office, managing reception, guest experience, visitor access, and workplace hospitality for employees and visitors.

75k – 95k · San Francisco, CA · Other · On-site · 3+ YOE · reception · hospitality

Assistant Controller

Own core accounting function and financial operations at an AI lab. Lead month-end close, financial reporting, internal controls, audit readiness, and build scalable accounting infrastructure.

325k – 400k · San Francisco, CA · Finance & Accounting · On-site · 15+ YOE · CPA · US GAAP

Associate General Counsel, Corporate & Commercial

Lead the corporate legal function, handling equity financings, governance, and commercial contracting for an AI research lab. Requires 10+ years of experience in both corporate transactions and tech commercial law.

350k – 425k · San Francisco, CA · Legal · On-site · 10+ YOE · M&A · IP Licensing

Compensation Partner

Designs and implements compensation frameworks, advises on pay decisions for offers/promotions/retention, builds equity strategy, and communicates comp transparently to build trust. Requires 7+ years in high-growth environments with strong POV on comp philosophy.

250k – 425k · San Francisco, CA · Finance & Accounting · On-site · 7+ YOE · pay analysis · benchmarking

Software Engineer, Data Infrastructure

Builds and scales data infrastructure for distributed training pipelines, multimodal data catalogs, and petabyte-scale processing systems. Collaborates with researchers using distributed systems like Spark, Ray, Kafka, and cloud data architectures. Requires backend proficiency in Python/Rust and bachelor's in CS/engineering.

350k – 475k · San Francisco, CA · Data Engineering · On-site · Ray · dbt

Site Reliability Engineer (SRE)

Drives end-to-end reliability for the AI fine-tuning platform Tinker, including SLOs, monitoring, incident response, and multi-tenant GPU scheduling. Requires distributed systems experience, software proficiency for reliability, and production incident handling.

350k – 475k · San Francisco, CA · DevOps / SRE · On-site · SLOs · CI/CD

Research, Vision Expertise

Conducts research on visual perception, multimodal learning, and large-scale AI model training. Designs architectures, builds datasets and evaluations, and collaborates on frontier models. Requires ML expertise, Python proficiency, and experimental rigor.

350k – 475k · San Francisco, CA · ML Engineering · On-site · JAX · Python

Research Product Manager

Drives large-scale research products and programs in AI, coordinating cross-functional teams to translate technical ideas into scoped plans and integrate research into production systems. Requires CS/AI degree and experience in research or product management, thriving in technical, ambiguous environments.

175k – 475k · San Francisco, CA · Product Management · On-site · Pre-Training · Multimodality

Research, Pre-Training Science

Conducts research on pre-training methodologies for large AI models, develops new architectures and data strategies, runs large-scale experiments, and publishes findings. Requires strong ML fundamentals, Python proficiency, and experience with deep learning frameworks.

350k – 475k · San Francisco, CA · AI Research · On-site · JAX · Python

Research, Pre-Training Data

Designs and implements methods for sourcing, curating, and analyzing large-scale pre-training datasets for AI models, blending research with production-grade data engineering. Requires Python proficiency, deep learning frameworks, and strong ML fundamentals.

350k – 475k · San Francisco, CA · ML Engineering · On-site · JAX · Python

Research, Post-Training Data

Conducts post-training research for AI models, designing data collection strategies, developing labeling pipelines, modeling human preferences, and iterating on evaluations to improve model alignment, reasoning, and helpfulness. Requires strong Python skills, ML framework proficiency, and experimental rigor.

350k – 475k · San Francisco, CA · ML Engineering · On-site · JAX · RLHF

Research, Post-Training

Develops and tunes post-training recipes for AI models, iterates on evaluations, debugs configurations, scales methodologies, and publishes research to advance collaborative intelligence. Requires Python proficiency, deep learning frameworks, and strong ML fundamentals.

350k – 475k · San Francisco, CA · ML Engineering · On-site · JAX · RLHF

Research Engineer, Infrastructure, Training Systems

Designs and optimizes distributed training systems scaling across thousands of GPUs for large AI models. Requires strong systems engineering, PyTorch/JAX expertise, and collaborative mindset to boost research productivity.

350k – 475k · San Francisco, CA · DevOps / SRE · On-site · JAX · XLA

Forward Deployed Engineer, Tinker

Triages and resolves customer issues for the AI fine-tuning platform Tinker, builds tools and automation, writes documentation, and influences the product roadmap based on direct customer feedback. Requires experience with LLM fine-tuning, debugging distributed systems, and backend languages like Python or Rust.

350k – 475k · San Francisco, CA · Solutions Architecture · On-site · Entry level · GPU · Rust

Research Engineer, Infrastructure, RL Systems

Designs and optimizes infrastructure for scalable reinforcement learning training of large models, improving reliability, observability, and throughput. Collaborates with researchers to productionize RL algorithms; requires strong engineering skills and deep learning framework knowledge.

350k – 475k · San Francisco, CA · DevOps / SRE · On-site · JAX · PPO

Research Engineer, Infrastructure, Numerics

Designs and optimizes distributed training infrastructure for large-scale LLMs, focusing on low-precision numerics, kernel optimizations, and communication frameworks to enable stable, scalable trillion-parameter model training. Requires strong systems engineering, deep learning frameworks knowledge, and collaborative research mindset.

350k – 475k · San Francisco, CA · DevOps / SRE · On-site · JAX · XLA

Research Engineer, Infrastructure, Kernels

Designs and optimizes high-performance ML kernels (CUDA, CuTe, Triton) for large-scale LLM training, focusing on GPU efficiency, low-precision formats, and distributed compute. Collaborates with researchers to bridge algorithms and hardware.

350k – 475k · San Francisco, CA · DevOps / SRE · On-site · JAX · XLA

Research Engineer, Infrastructure, Inference

Designs, optimizes, and scales infrastructure for high-performance AI model inference, focusing on latency, throughput, efficiency, and reliability. Collaborates with researchers to enable production deployment of large-scale models using deep learning frameworks and distributed systems.

350k – 475k · San Francisco, CA · DevOps / SRE · On-site · JAX · Ray

Research, Audio Expertise

Conducts research to advance audio capabilities in AI models, designing and training large-scale multimodal systems, building audio data pipelines, and publishing findings. Requires ML expertise, Python proficiency, and experience with deep learning frameworks.

350k – 475k · San Francisco, CA · AI Research · On-site · JAX · ASR

Infrastructure Engineer, Security

Owns and evolves security infrastructure across compute, storage, networking, and data platforms for foundation models. Architects secure patterns, manages identities/secrets, builds threat models, and automates security checks in Kubernetes/cloud environments. Requires strong systems programming and infra experience.

200k – 475k · San Francisco, CA · Security Engineering · On-site · IAM · Rust

HR Business Partner

Coaches managers on leadership, performance, and team dynamics while designing scalable people systems including compensation, career frameworks, and feedback processes for a high-growth AI research lab.

190k – 300k · San Francisco, CA · People Ops · On-site · 5+ YOE · Employment Law · Feedback Systems

Engineering Manager

Leads a team of senior/staff engineers building scalable ML infrastructure and products, owning system design, reliability, and execution while contributing hands-on and hiring top talent. Requires 8+ years in production systems and 3+ years managing engineers.

400k – 500k · San Francisco, CA · Engineering Management · On-site · 8+ YOE · PyTorch · system design

Designer

Designs end-to-end AI product experiences, from concepts and prototypes to coded UI implementations, while collaborating on model training and contributing to design systems. Requires exceptional visual craft, coding ability to ship features, and interest in AI development.

350k – 475k · San Francisco, CA · Product Design · On-site · React · Swift