

• Starts May 19

Data Science Intern

🚀 Summer Internship

New York

AI-generated summary

  • Candidates must have a degree in a quantitative field, a strong foundation in statistics and computation, and experience with machine learning, Git, and Linux; experience with NLP and containers is a plus.
  • The Data Science intern at Medidata Solutions will turn business challenges into data pipelines and model frameworks, use statistical tools and programming languages such as Python and SQL to work with data independently, apply machine learning and natural language processing techniques to drive projects to completion, and communicate clearly to inform decision-making.


Software Engineering, New York


  • The central mission of Medidata’s Platform Data Sciences (PDS) team is to collaborate with numerous platform organizations to bring machine learning, artificial intelligence, and data science expertise to their solutions. Among those solutions, Medidata’s family of Regulated Content Management (RCM) products is centered on the storage and submission of electronic documents. The PDS team is currently partnering with the RCM team to add new machine learning capabilities to the RCM portfolio. Starting with the Electronic Trial Master File (eTMF) solution, the current project uses machine learning and natural language processing to classify documents by document type, so that they can be filed automatically under the eTMF file plan. The summer intern will work with the data science team to help develop the machine learning pipeline and evaluate newly generated predictive models.


  • Bachelor's, Master's, or PhD in Math, Statistics, Computer Science, Physics, Engineering, Bioinformatics, or another quantitative field, with a strong foundation in statistical methodology and computation
  • Experience with machine learning techniques (classification, deep learning, etc.)
  • Experience using Git version control
  • Experience or interest in NLP is a plus
  • Experience in a Linux environment; experience with containers is a plus



Area of Responsibilities

Software Engineering


  • Ability to translate business challenges into data pipelines and model frameworks, owning and driving projects to success
  • Strong communication skills for explaining highly technical methods to diverse audiences and shaping decision-making collaboratively
  • Fluency in statistical tools and programming languages that make you self-sufficient in handling data (e.g., Python, SQL, Bash scripting)
  • Knowledge of machine learning and natural language processing (e.g., tokenization, classification, embeddings) and the ability to apply them to the project


Work type

Full time



Start date

May 19, 2024

