
Many people were unfamiliar with the Chinese artificial intelligence company DeepSeek until this week’s news cycle. The company, which develops open-source large language models (LLMs), introduced its R1 model, which delivers generative AI capabilities with speed and accuracy comparable to industry-leading models such as OpenAI’s o1 and Google’s Gemini 2.0, but at a fraction of the cost and with far less compute.

To put it in perspective, DeepSeek reportedly spent $5.6 million over three months to develop R1. Meanwhile, United States firms have typically spent half a billion dollars developing similar models. The cost implications have certainly captured everyone’s attention, sparking debates about what this means for the future of AI. 

Only a few days after the initial announcement, OpenAI and Microsoft are accusing DeepSeek of using a technique called distillation: querying a model such as OpenAI’s GPT or Meta’s Llama with billions of questions and using the answers to fine-tune DeepSeek’s own LLM. This approach could violate the terms of service of the models involved.
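To make the distillation idea concrete, here is a minimal sketch of what such a data-collection pipeline could look like. The names here (query_teacher, distillation_data.jsonl) are hypothetical stand-ins, not anyone’s actual code; in practice the teacher call would be an API request to a hosted model.

```python
# Minimal sketch of distillation-style data collection (hypothetical names).
# A "teacher" model is queried with many prompts; its answers become
# supervised fine-tuning data for a smaller "student" model.
import json

def query_teacher(prompt: str) -> str:
    """Stand-in for a call to a hosted teacher model's API."""
    return "teacher answer for: " + prompt

prompts = [
    "Explain gradient descent in one sentence.",
    "Summarize the causes of inflation.",
    # ...a real distillation pipeline would use billions of prompts
]

# Collect (prompt, answer) pairs as fine-tuning examples.
with open("distillation_data.jsonl", "w") as f:
    for p in prompts:
        record = {"prompt": p, "completion": query_teacher(p)}
        f.write(json.dumps(record) + "\n")

# The resulting JSONL file would then feed a standard supervised
# fine-tuning job for the student model.
```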

DeepSeek did come up with a number of innovative approaches:

Selective Parameter Activation (Mixture of Experts)

Rather than engaging every parameter for every request, DeepSeek activates only the parts of its neural network needed for a specific task, so each request is processed by a small subset of the model’s total parameters.
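The toy sketch below shows the core routing step of a mixture-of-experts layer: a router scores the available experts and only the top-k actually run. The sizes and names (router_w, experts) are illustrative, not DeepSeek’s actual architecture.

```python
# Toy sketch of sparse expert routing: only the top-k scoring "experts"
# (small sub-networks) run for each input, so most parameters stay idle.
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, k = 16, 8, 2          # hidden size, expert count, experts used per token

router_w = rng.normal(size=(d, n_experts))                     # routing weights
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # one weight matrix per expert

def moe_forward(x: np.ndarray) -> np.ndarray:
    scores = x @ router_w                 # score each expert for this token
    top_k = np.argsort(scores)[-k:]       # pick the k best experts
    weights = np.exp(scores[top_k])
    weights /= weights.sum()              # softmax over the selected experts only
    # Only the chosen experts do any computation; the other 6 are skipped.
    return sum(w * (x @ experts[i]) for i, w in zip(top_k, weights))

out = moe_forward(rng.normal(size=d))
print(out.shape)  # (16,) -- produced while running only 2 of 8 experts
```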

8-bit Computation Optimization

Most AI models use 32-bit floating-point (FP32) precision.

DeepSeek instead runs most computations at 8-bit precision, selectively falling back to FP32 only where it is needed, yielding massive reductions in compute and memory requirements without major accuracy loss.
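The sketch below illustrates the trade-off using int8 quantization, since NumPy has no FP8 kernels; int8 is a stand-in for the FP8 formats used on GPUs, and quantize_8bit and matmul_mixed are illustrative names, not DeepSeek code.

```python
# Illustrative sketch of mixed precision: weights are stored as 8-bit
# integers (4x less memory than FP32) plus a per-tensor FP32 scale.
import numpy as np

def quantize_8bit(w: np.ndarray):
    scale = np.abs(w).max() / 127.0  # FP32 scale kept alongside the tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def matmul_mixed(x: np.ndarray, q: np.ndarray, scale: float) -> np.ndarray:
    # NumPy has no 8-bit matmul kernel, so we dequantize and accumulate in
    # FP32 here; real FP8/INT8 GPU kernels multiply in 8-bit and accumulate
    # in higher precision, which is where the compute savings come from.
    acc = x.astype(np.float32) @ q.astype(np.float32)
    return acc * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
x = rng.normal(size=(1, 64)).astype(np.float32)

q, s = quantize_8bit(w)
err = np.abs(matmul_mixed(x, q, s) - x @ w).max()
print(f"8-bit storage, FP32 where it matters; max error here: {err:.4f}")
```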

Multi-Token Prediction (Parallel Token Generation)

Traditional models predict one token at a time. DeepSeek predicts multiple tokens at once, speeding up inference while reportedly maintaining 85-90% accuracy, which translates into faster responses and lower inference costs.
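The sketch below contrasts the two decoding loops. Here predict_block is a hypothetical stand-in for a model that emits k tokens per forward pass; the point is simply that fewer passes are needed for the same output length.

```python
# Toy illustration of multi-token decoding: instead of one forward pass
# per token, each pass proposes a block of k tokens.

def predict_block(context: list[str], k: int) -> list[str]:
    """Stand-in: a real model would run one forward pass and emit k tokens."""
    return [f"tok{len(context) + i}" for i in range(k)]

def generate(n_tokens: int, k: int) -> tuple[list[str], int]:
    context, passes = [], 0
    while len(context) < n_tokens:
        context.extend(predict_block(context, k))
        passes += 1
    return context[:n_tokens], passes

_, single = generate(12, k=1)  # classic one-token-at-a-time decoding
_, multi = generate(12, k=4)   # multi-token: 4 tokens per forward pass
print(single, multi)           # 12 passes vs 3 -- same output length
```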

Multi-Head Latent Attention (MLA) – A New Compression Method

DeepSeek compresses token data before storing it in memory.

The model trains on these compressed representations, discarding unnecessary data while keeping the essential information, which means fewer GPUs are needed thanks to lower memory demands.
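Here is a rough sketch of that caching idea: key/value data is projected down to a small latent vector before being cached, and expanded back only when attention needs it. The dimensions and weight names are illustrative, not DeepSeek’s actual configuration.

```python
# Rough sketch of latent attention caching: store one small latent per
# token instead of a full-size key vector and value vector.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent = 512, 64  # cache one 64-dim latent, not two 512-dim vectors

W_down = rng.normal(size=(d_model, d_latent))  # compress before caching
W_up_k = rng.normal(size=(d_latent, d_model))  # expand to keys on demand
W_up_v = rng.normal(size=(d_latent, d_model))  # expand to values on demand

kv_cache = []  # holds 64-dim latents rather than full K and V

def cache_token(h: np.ndarray) -> None:
    kv_cache.append(h @ W_down)  # store only the compressed latent

def attend(q: np.ndarray) -> np.ndarray:
    latents = np.stack(kv_cache)                # (seq_len, d_latent)
    K, V = latents @ W_up_k, latents @ W_up_v   # reconstruct full K/V when needed
    scores = q @ K.T / np.sqrt(d_model)
    p = np.exp(scores - scores.max())
    p /= p.sum()                                # attention weights
    return p @ V

for _ in range(10):
    cache_token(rng.normal(size=d_model))
print(attend(rng.normal(size=d_model)).shape)   # (512,) from a much smaller cache
```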

When the news first broke, markets responded with panic: Nvidia, a technology company that makes AI chips, systems and software, lost about 17% of its value (close to $593 billion in market capitalization) in a single day. Political leaders have also weighed in, highlighting the geopolitical significance of this development, particularly in the context of US-China relations.

What does this really mean when you cut through all the noise?

  1. We’ll see organizations like OpenAI enforce their terms of service and protect their IP.
  2. LLM vendors will learn from DeepSeek’s approach and could incorporate some of these approaches in how they build their models. Looking ahead, we’re likely to see smaller, domain-specific LLMs that organizations can deploy in their own private clouds.
  3. Decreasing costs, heightened accuracy, and flexible deployment options are rapidly expanding the reach of AI. As models become cheaper to develop and easier to host, even on modest hardware, businesses and individuals can more readily integrate AI into their everyday workflows. Consumers who have never interacted with conversational AI will soon find it woven into their online experiences, raising expectations across industries.
  4. LLM vendors will remain competitive. Companies developing applications on top of these models will greatly benefit as LLMs continue to improve, becoming faster and more affordable.

If you have any questions about this news or want to learn more about how we incorporate large language models into our platform, get in touch.