Research
Research Engineer, Evals & Medical Safety
New York, NY | Full-time
Own the evaluation backbone for the lab. You will build the measurement systems that tell us whether our medical models are actually getting safer, more reliable, and more clinically useful over time.
Responsibilities
- Design automated and human-in-the-loop evals for diagnosis support, triage, escalation, and treatment guidance
- Build benchmark sets from real medical workflows in partnership with clinicians and operations teams
- Measure hallucination, omission, overconfidence, safety regressions, and edge-case failures
- Develop dashboards and release gates for model quality across research and deployment environments
- Work across research, security, and clinical teams to define operating thresholds for real-world use
Requirements
- Strong engineering and data skills with experience in evaluation systems or QA for ML products
- Excellent Python and SQL proficiency
- Ability to reason clearly about safety, reliability, and statistical validity
- Experience with human review workflows, dataset curation, or model assessment is preferred
- Comfort working in a high-ownership role that influences every model launch
