• Starts May 19

Data Science Intern

🚀 Summer Internship

New York

Software Engineering


  • The central mission of Medidata’s Platform Data Sciences (PDS) team is to collaborate with numerous platform organizations, bringing machine learning, artificial intelligence, and data science expertise to their solutions. Among these, Medidata’s family of Regulated Content Management (RCM) solutions centers on the storage and submission of electronic documents. The PDS team is currently partnering with the RCM team to introduce new machine learning capabilities into the RCM portfolio. Beginning with the Electronic Trial Master File (eTMF) solution, the current project aims to use machine learning and natural language processing to classify documents into distinct document types, enabling automated filing of these documents under the eTMF file plan. The summer intern will work with the data science team to help develop the machine learning pipeline and evaluate newly generated predictive models.


  • Bachelor’s, Master’s, or PhD in Math, Statistics, Computer Science, Physics, Engineering, Bioinformatics, or another quantitative field, with a strong foundation in statistical methodology and computation
  • Experience with machine learning techniques (classification, deep learning, etc.)
  • Experience using Git version control
  • Experience or interest in NLP is a plus
  • Experience working in a Linux environment; experience with containers is a plus

  • Ability to translate business challenges into data pipelines and modeling frameworks, owning and driving projects to successful completion
  • Strong communication skills, including the ability to explain highly technical methods to diverse audiences and shape decisions collaboratively
  • Fluency in statistical tools and programming languages that let you handle data self-sufficiently (e.g., Python, SQL, Bash scripting)
  • Knowledge of machine learning and natural language processing, and the ability to apply them to the project (tokenization, classification, embeddings, etc.)


Full time

May 19, 2024
