
Research Engineer Intern - Doubao (Seed) - Machine Learning System - 2025 Summer (MS)


ByteDance

2mo ago

  • Internship
    Full-time
    Summer Internship
  • Research & Development
  • San Jose

AI generated summary

  • You must be pursuing an MS in a technical discipline, be skilled in machine learning algorithms and Python, and have a basic understanding of GPUs/ASICs; experience with distributed training frameworks, AI compiler stacks, and designing large-scale systems is a strong plus.
  • You will research and develop machine learning systems, drive cross-layer optimization across systems, AI algorithms, and hardware (GPUs and ASICs), and improve the efficiency and stability of large-scale distributed training jobs.

Requirements

  • Currently pursuing an MS in Software Development, Computer Science, Computer Engineering, or a related technical discipline.
  • Familiar with machine learning algorithms, platforms, and frameworks such as PyTorch and JAX.
  • Have a basic understanding of how GPUs and/or ASICs work.
  • Expert in at least one or two programming languages in a Linux environment: C/C++, CUDA, Python.
  • Must obtain work authorization in country of employment at the time of hire, and maintain ongoing work authorization during employment.

Preferred Qualifications

The following experience will be a big plus:

  • GPU-based high-performance computing; RDMA high-performance networking (MPI, NCCL, ibverbs).
  • Optimization of distributed training frameworks such as DeepSpeed, FSDP, Megatron, and GSPMD.
  • AI compiler stacks such as torch.fx, XLA and MLIR.
  • Large-scale data processing and parallel computing.
  • Experience designing and operating large-scale systems in cloud computing or machine learning.
  • In-depth experience with CUDA programming and performance tuning (CUTLASS, Triton).

Responsibilities

  • Research and develop our machine learning systems, including heterogeneous computing architecture, management, scheduling, and monitoring.
  • Manage cross-layer optimization across systems, AI algorithms, and machine learning hardware (GPUs, ASICs).
  • Implement both general-purpose training framework features and model-specific optimizations (e.g., LLMs, diffusion models).
  • Improve the efficiency and stability of extremely large-scale distributed training jobs.

FAQs

What is the duration of the internship for the Research Engineer Intern - Doubao (Seed) - Machine Learning System - 2025 Summer (MS) position?

The internship runs for 12 weeks beginning in May/June 2025.

Industry: Technology
Employees: 10,001+
Founded: 2012

Mission & Purpose

ByteDance is a global incubator of platforms at the cutting edge of commerce, content, entertainment, and enterprise services; over 2.5bn people interact with ByteDance products, including TikTok. Creation is the core of ByteDance's purpose. Our products are built to help imaginations thrive, and this is doubly true of the teams that make our innovations possible. Together, we inspire creativity and enrich life, a mission we work toward every day. At ByteDance, we create together and grow together. That's how we drive impact for ourselves, our company, and the users we serve. We are committed to building a safe, healthy, and positive online environment for all our users.