Member of Technical Staff, Model Efficiency

Cohere

15d ago

  • Full-time
  • Mid Level
  • Data
  • Toronto, +3

AI generated summary

  • You need significant experience in high-performance ML algorithms, large language models, and a drive to solve challenging research problems. Experience in model compression, GPU programming, LLM inference, and ML framework internals is a big plus.
  • You will optimize Large Language Models for faster inference by improving model architecture and ML frameworks, tackling bottlenecks with innovative solutions.

Requirements

  • Significant experience in developing high-performance machine learning algorithms or machine learning infrastructure
  • Hands-on experience with large language models
  • A bias for action and results
  • An appetite to solve challenging machine learning research problems
  • It is a big plus if you also have considerable experience in one of these areas:
      • Model compression techniques: quantization, pruning, sparsity, low-rank compression, knowledge distillation, etc.
      • GPU/accelerator programming or high-performance computing
      • LLM inference performance modeling
      • Machine learning framework internals

Responsibilities

  • Large Language Models (LLMs) have demonstrated remarkable performance across various tasks. However, the substantial computational and memory requirements of LLM inference pose challenges for deployment. The model efficiency team is responsible for increasing the inference efficiency of our foundation models by improving model architecture and optimizing ML frameworks.
  • As an engineer on this team, you’ll improve key model-serving metrics, including latency and throughput, by profiling the system, identifying bottlenecks, and devising innovative solutions.

FAQs

What is the focus of the Model Efficiency team?

The Model Efficiency team is focused on increasing the inference efficiency of large language models by improving model architecture and optimizing ML frameworks.

Where are the offices located for this position?

Our offices are located in Toronto, San Francisco, New York, and London. We also embrace a remote-friendly environment, strategically distributing teams based on interests, expertise, and time zones for collaboration and flexibility.

What qualifications are needed to be a good fit for the Model Efficiency team?

To be a good fit for the Model Efficiency team, you need significant experience developing high-performance machine learning algorithms or infrastructure, hands-on experience with large language models, a bias for action and results, and an appetite for solving challenging machine learning research problems.

What areas of expertise are considered a big plus for this position?

Considerable experience with model compression techniques, GPU/accelerator programming, LLM inference performance modeling, or machine learning framework internals is considered a big plus for this position.

At Cohere, our mission is to build machines that understand the world, and to make them safely accessible to all.

Industry: Technology
Employees: 51-200
Founded: 2019

Mission & Purpose

Cohere provides unprecedented access to affordable, easy-to-deploy large language models. Our platform gives computers the ability to read and write - whether you want to better understand what your customers are saying, or you want to write compelling copy that speaks to your target audience, Cohere can help.