Senior ML Software Engineer, Scaling & Performance Optimisation



10d ago

  • Job
    Senior (5-8 years)
  • Software Engineering
  • London

AI generated summary

  • You need Linux, distributed training, Python/C++, ML frameworks, profiling skills, and a passion for efficiency. CUDA/XLA experience and GPU/TPU knowledge are a plus. Stay sharp with ML trends and take on challenging projects.
  • You will scale ML models on diverse hardware, optimize system performance, design distributed solutions, research deep learning literature for algorithmic improvements, and write high-quality code for performance breakthroughs.


  • Understanding of Linux systems, performance analysis tools, and hardware optimisation techniques.
  • Experience with distributed training frameworks (Ray, Dask, PyTorch Lightning, etc.).
  • Expertise with Python and/or C/C++.
  • Development with machine learning frameworks (JAX, TensorFlow, PyTorch, etc.).
  • Passion for profiling, identifying bottlenecks, and delivering efficient solutions.

Highly Desirable:

  • Track record of successfully scaling ML models.
  • Experience writing custom CUDA kernels or XLA operations.
  • Understanding of GPU/TPU architectures and their implications for efficient ML systems.
  • Solid grasp of the fundamentals of modern deep learning.
  • Actively following ML trends and a desire to push boundaries.

Example Projects:

  • Profile algorithm traces, identifying opportunities for custom XLA operations and CUDA kernel development.
  • Implement and apply state-of-the-art (SOTA) architectures (Mamba, Griffin, Hyena) to research and applied projects.
  • Adapt algorithms for large-scale distributed architectures across HPC clusters.
  • Employ memory-efficient techniques within models for increased parameter counts and longer context lengths.


  • Scaling Expertise: Design and implement strategies to efficiently scale machine learning models across diverse hardware platforms (GPU/TPU).
  • Performance Optimisation: Analyse and profile ML systems under heavy load, pinpointing bottlenecks, and implementing targeted optimisations.
  • Distributed Systems Architecture: Create robust distributed training and inference solutions for maximum computational efficiency.
  • Algorithmic Optimisation: Research and understand the latest deep learning literature to implement and optimise state-of-the-art algorithms and architectures, ensuring compute efficiency and performance.
  • Low-Level Mastery: Write high-quality Python, C/C++, XLA, Pallas, Triton, and/or CUDA code to achieve performance breakthroughs.


What qualifications are required for the Senior ML Software Engineer position focusing on Scaling & Performance Optimisation?

We are looking for candidates with a strong background in Machine Learning, experience in large-scale ML development, system-level analysis skills, and a passion for performance optimisation. A solid understanding of algorithms, hardware performance tuning, and the ability to think creatively about scaling AI solutions are essential for this role.

What specific responsibilities will the Senior ML Software Engineer have in this position?

The Senior ML Software Engineer will be responsible for analysing and improving the performance of our AI solutions at a system level. This will involve working closely with hardware to optimise performance, diving deep into algorithm optimisation, and ensuring that our ML models are scalable and efficient.

How does this role differ from a traditional Machine Learning Engineer position?

This role focuses specifically on scaling and performance optimisation, meaning that the Senior ML Software Engineer will be tasked with making our ambitious AI solutions a practical reality. This involves diving deep into hardware performance tuning, system-level analysis, and algorithm optimisation to maximise the efficiency and scalability of our ML models.

Accelerate the transition to an AI-first world that benefits everyone

Mission & Purpose

InstaDeep is a leading global technology company offering a range of AI solutions, from optimized pattern recognition and GPU-accelerated insights to self-learning decision-making systems.

  • Decision-making systems: Life and business are all about decisions. InstaDeep harnesses the power of reinforcement learning to create systems that can make decisions on their own, based on their own autonomous training. Many fields can benefit greatly from this technology, be it robotics, mobility, logistics, finance or healthcare.
  • GPU-accelerated insights: When you try to deploy AI in your business, compute power is key. A multi-GPU setup can be messy and complicated. With NVIDIA's DGX-1 (one of the most powerful AI machines on the market), InstaDeep can help you achieve insane computing power to solve even the most intensive AI problems.
  • Optimized Deep Learning: Deep learning delivers high-performance AI for pattern recognition, yet it is notoriously time-consuming to fine-tune. InstaDeep speeds up this process to save you time and money on your computer vision, natural language processing or predictive analytics project.