Applied AI Engineering

Machine Learning Engineer, Inference & Serving

New York, NY | Full-time

Own the path from trained models to reliable, cost-efficient inference in production. You will help make the intelligence layer fast enough, robust enough, and affordable enough to support national-scale care delivery.

Responsibilities

• Build and optimize model serving pipelines for low-latency medical product use cases
• Improve routing, batching, caching, and model selection across heterogeneous workloads
• Partner with research and infrastructure teams to operationalize new models and safety layers
• Instrument serving systems for quality, cost, and uptime
• Design production infrastructure that supports rapid iteration without compromising reliability

Requirements

• Strong experience with production ML systems and inference infrastructure
• Familiarity with GPU/accelerator workloads, model serving frameworks, and performance tuning
• Excellent systems thinking and practical debugging ability
• Experience optimizing real-time AI products under cost constraints
• Comfort working across both research and production engineering domains
