Applied AI Engineering
Machine Learning Engineer, Inference & Serving
New York, NY | Full-time
Own the path from trained models to reliable, cost-efficient inference in production. You will help make the intelligence layer fast enough, robust enough, and affordable enough to support national-scale care delivery.
Responsibilities
- Build and optimize model serving pipelines for low-latency medical product use cases
- Improve routing, batching, caching, and model selection across heterogeneous workloads
- Partner with research and infrastructure teams to operationalize new models and safety layers
- Instrument serving systems for quality, cost, and uptime
- Design production infrastructure that supports rapid iteration without compromising reliability
Requirements
- Strong experience with production ML systems and inference infrastructure
- Familiarity with GPU/accelerator workloads, model serving frameworks, and performance tuning
- Excellent systems thinking and practical debugging ability
- Experience optimizing real-time AI products under cost constraints
- Comfort working across both research and production engineering domains
