Interhuman AI is building the next generation of social intelligence infrastructure: multimodal AI systems that understand not just what humans say, but how they say it. We're developing models that interpret behavioral signals such as hesitation, engagement, confusion, and interest across voice, facial expressions, body language, and natural language, all in real time. We're looking for an AI Scientist to join our core team and lead the development of models that capture the nuances of human communication.
Design, train, and iterate on speech and language models that extract social and emotional signals from live conversation.
Own the full model development lifecycle—from data curation and architecture design through training, evaluation, and production deployment.
Build evaluation frameworks and benchmarks that capture the subtleties of human interaction that standard metrics miss.
Stay at the frontier of multimodal research and translate relevant advances into our production stack.
Collaborate closely with engineering to ensure models meet real-time latency and scalability requirements.
PhD in Machine Learning, Computer Science, or a related field with a focus on speech processing and/or NLP.
Track record of building and shipping models—publications are great, but we care equally about what you've built.
Strong proficiency in Python and deep experience with PyTorch (or JAX/TensorFlow).
Familiarity with the current landscape of speech and multimodal models (e.g., Whisper, audio-LLMs, speech encoders, vision-language models).
You thrive with ambiguity. You can scope your own work, prioritize ruthlessly, and know when to ask for input.
Clear communicator—you can explain a complex architecture to both engineers and non-technical stakeholders.
Competitive salary and meaningful equity in an early-stage, venture-backed company.
Direct influence on technical direction—your work shapes the product, not just a feature.
A small, focused team where your contributions are visible and impactful from day one.
Flexibility on location and working arrangements.
At Interhuman AI, we're pioneering multimodal AI that reads the full bandwidth of human communication (facial expressions, vocal tone, body language, and words) to interpret social signals in real time. We're building infrastructure for AI interactions that feel adaptive, emotionally aware, and genuinely human.
We're a small, focused team backed by top investors, with a working MVP and a vision to become foundational infrastructure for the next era of conversational AI.
If you want to do work that matters, at the edge of what's possible, we'd love to hear from you.
