Member of Technical Staff, Model Efficiency

Cohere

15d ago

  • Full-time
  • Mid Level
  • Data
  • Toronto, +3

AI generated summary

  • You need significant experience in high-performance ML algorithms, large language models, and a drive to solve challenging research problems. Experience in model compression, GPU programming, LLM inference, and ML framework internals is a big plus.
  • You will optimize Large Language Models for faster inference by improving model architecture and ML frameworks, tackling bottlenecks with innovative solutions.

Requirements

  • Significant experience in developing high-performance machine learning algorithms or machine learning infrastructure
  • Hands-on experience with large language models
  • A bias for action and results
  • An appetite to solve challenging machine learning research problems
  • It is a big plus if you also have considerable experience in one of these areas:
      • Model compression techniques: quantization, pruning, sparsity, low-rank compression, knowledge distillation, etc.
      • GPU/accelerator programming or high-performance computing
      • LLM inference performance modeling
      • Machine learning framework internals

Responsibilities

  • Large Language Models (LLMs) have demonstrated remarkable performance across various tasks. However, the substantial computational and memory requirements of LLM inference pose challenges for deployment. The model efficiency team is responsible for increasing the inference efficiency of our foundation models by improving model architecture and optimizing ML frameworks.
  • As an engineer on this team, you’ll improve key model-serving metrics, including latency and throughput, by profiling the system, identifying bottlenecks, and devising innovative solutions.

FAQs

What is the focus of the Model Efficiency team?

The Model Efficiency team is focused on increasing the inference efficiency of large language models by improving model architecture and optimizing ML frameworks.

Where are the offices located for this position?

Our offices are located in Toronto, San Francisco, New York, and London. We also embrace a remote-friendly environment, strategically distributing teams based on interests, expertise, and time zones for collaboration and flexibility.

What qualifications are needed to be a good fit for the Model Efficiency team?

To be a good fit for the Model Efficiency team, you need significant experience developing high-performance machine learning algorithms or infrastructure, hands-on experience with large language models, a bias for action and results, and an appetite for solving challenging machine learning research problems.

What areas of expertise are considered a big plus for this position?

Considerable experience with model compression techniques, GPU/accelerator programming, LLM inference performance modeling, or machine learning framework internals is considered a big plus for this position.

At Cohere, our mission is to build machines that understand the world, and to make them safely accessible to all.

Industry: Technology
Employees: 51-200
Founded: 2019

Mission & Purpose

Cohere provides unprecedented access to affordable, easy-to-deploy large language models. Our platform gives computers the ability to read and write - whether you want to better understand what your customers are saying, or you want to write compelling copy that speaks to your target audience, Cohere can help.