ML Infrastructure Engineer

Engineering | San Francisco | Full-time | $200,000 - $280,000

About the Role

Join our ML Infrastructure team to build the systems that train, deploy, and serve our AI models at scale. You'll work at the intersection of machine learning and systems engineering.

What You Will Do

Build and maintain ML training pipelines and infrastructure
Design model serving systems for low-latency inference
Implement monitoring and observability for ML systems
Optimize GPU utilization and reduce training costs
Develop tools for model versioning and experiment tracking
Collaborate with researchers to productionize new models

What We Are Looking For

4+ years of experience in ML engineering or infrastructure
Strong proficiency in Python and experience with PyTorch or JAX
Experience with ML training frameworks and distributed training
Familiarity with model serving (TensorRT, ONNX, vLLM)
Experience with Kubernetes and container orchestration
Understanding of ML fundamentals and neural network architectures

Nice to Have

Experience with LLM fine-tuning and deployment
Background in systems programming (C++, Rust)
Experience with multi-GPU and multi-node training
Contributions to ML open-source projects

Benefits

Competitive salary and meaningful equity
Premium health, dental, and vision insurance
Unlimited PTO with an encouraged minimum
Hybrid work with SF office access
$5,000 annual learning & development budget
Conference attendance and speaking opportunities
