

Research Intern Generative AI for Multimodal Interactions

🚀 Off-cycle Internship
💻 Remote
🤑 $63K - $166K

AI generated summary

  • You need to be pursuing a PhD in EE, CS, or a related field with an ML/AI focus, with 1+ year of experience in deep learning, AI frameworks, Python/C++, and Linux. Familiarity with vision-language systems, NLP, Intel technologies, and ROS, along with top-tier publications, is preferred.
  • You will develop efficient multimodal architectures for real-time task guidance in manufacturing, improve language retrieval pipelines, and build systems for scalable deployment.


Data, Software Engineering


  • ISR is looking for an intern who is passionate about building novel multimodal interaction agents.


  • The candidate must be pursuing a PhD degree in Electrical Engineering, Computer Engineering, Electrical & Computer Engineering, Computer Science, or a related field with an ML/AI focus.
  • 1+ year of experience in the following areas:
      • Deep learning, specifically computer vision and/or natural language processing
      • AI frameworks (e.g., PyTorch, TensorFlow)
      • Python and C++, and Linux
  • Preferred Qualifications:
      • 1+ year of experience with vision-language systems and RAG pipelines
      • Knowledge of novel paradigms beyond supervised learning, such as weakly supervised multimodal reasoning, human-in-the-loop approaches, uncertainty estimation, and anomaly detection
      • 1+ year of experience with NLP techniques such as natural language understanding, semantic role labeling, and information extraction
      • Familiarity with Intel OpenVINO and RealSense technologies
      • Familiarity with ROS (Robot Operating System)
      • Publications in top-tier conferences and journals in machine learning-related fields (NeurIPS, ICML, AAAI, ACL, etc.)

Education requirements

Currently Studying

Areas of Responsibility

Software Engineering


  • This work will involve developing efficient multimodal architectures and pipelines for vision/language interaction systems that support real-time task guidance, in particular (but not limited to) in the manufacturing domain.
  • In this context, the intern will explore methods to improve LLM-based retrieval pipelines over complex document content (e.g., free-form text, tables, and figures), prompting approaches (e.g., ReAct, multimodal ReAct, thought prompting, etc.), and conversational human/assistant dialogues, in order to generate responses that are factually correct and may involve multi-step reasoning.
  • You will build and evaluate systems and proofs of concept that allow scalable approaches to be deployed in real settings. This research will enable validation of the ideas and publication of the findings in both internal and external venues.


Work type

Full time

Work mode

Remote


Application deadline

Jul 1, 2024


63000 - 166000 USD