DeepSeek’s new AI model appears to be one of the best “open” contenders yet
A Chinese laboratory has created what appears to be one of the most powerful “open” AI models to date.
The model, DeepSeek V3, was developed by artificial intelligence firm DeepSeek and released on Wednesday under a permissive license that allows developers to download and modify it for most applications, including commercial ones.
DeepSeek V3 can handle a range of text-based workloads and tasks, such as programming, translation, and writing articles and emails, from a descriptive prompt.
According to DeepSeek’s internal benchmark testing, DeepSeek V3 outperforms both downloadable “open” models and “closed” AI models that can only be accessed through an API. In a subset of coding competitions hosted on Codeforces, a competitive programming platform, DeepSeek V3 outperforms other models, including Meta’s Llama 3.1 405B, OpenAI’s GPT-4o, and Alibaba’s Qwen 2.5 72B.
DeepSeek V3 also crushes the competition on Aider Polyglot, a test designed to measure, among other things, whether a model can successfully write new code that integrates with existing code.
DeepSeek-V3!
60 tokens per second (3x faster than V2!)
API compatibility intact
Fully open-source models and papers
671B MoE parameters
37B activated parameters
Trained on 14.8T high-quality tokens. Beats Llama 3.1 405B in almost every benchmark https://t.co/OiHu17hBSI pic.twitter.com/jVwJU07dqf
– Chubby ♨️ (@kimonismus) December 26, 2024
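The “671B MoE parameters, 37B activated parameters” line in the tweet above refers to a mixture-of-experts design: the model holds many expert sub-networks, and a router sends each token through only a few of them, so only a fraction of the total weights are used per token. Below is a toy sketch of that routing idea; the expert count, top-k value, and dimensions are illustrative stand-ins, not DeepSeek V3’s actual architecture.

```python
# Toy mixture-of-experts (MoE) routing: the model holds many expert
# sub-networks, but each token passes through only the top-k of them,
# so only a fraction of total parameters are "activated" per token.
# All sizes here are tiny, hypothetical stand-ins, not DeepSeek V3's.
import numpy as np

n_experts = 16   # total experts
top_k = 2        # experts used per token
d = 8            # hidden dimension

rng = np.random.default_rng(0)
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
router = rng.standard_normal((d, n_experts))

def moe_forward(x):
    """Route one token vector through its top-k experts only."""
    scores = x @ router                    # token's affinity with each expert
    chosen = np.argsort(scores)[-top_k:]   # indices of the top-k experts
    weights = np.exp(scores[chosen])
    weights /= weights.sum()               # softmax over the chosen experts
    # Only top_k of the n_experts weight matrices are touched for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

token = rng.standard_normal(d)
out = moe_forward(token)
print(out.shape, f"({top_k}/{n_experts} experts activated)")
```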
DeepSeek claims that DeepSeek V3 was trained on a dataset of 14.8 trillion tokens. In data science, tokens are used to represent pieces of raw data — one million tokens equal about 750,000 words.
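As a rough illustration of that tokens-to-words ratio, the sketch below counts both for a sample sentence. It uses OpenAI’s tiktoken library as a stand-in tokenizer; DeepSeek uses its own vocabulary, so its exact counts will differ.

```python
# Counting tokens vs. words for a sample sentence.
# Assumption: OpenAI's tiktoken tokenizer as a stand-in; DeepSeek V3
# uses its own vocabulary, so its exact counts will differ.
import tiktoken

text = "DeepSeek V3 was trained on a dataset of 14.8 trillion tokens."
enc = tiktoken.get_encoding("cl100k_base")

tokens = enc.encode(text)
words = text.split()

print(f"{len(words)} words -> {len(tokens)} tokens")
# English prose averages roughly 0.75 words per token, which is where
# the "1 million tokens is about 750,000 words" rule of thumb comes from.
```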
It’s not just the training set that is huge. DeepSeek V3 is massive in size: 671 billion parameters, or 685 billion as listed on the AI development platform Hugging Face. (Parameters are the internal variables a model uses to make predictions or decisions.) That’s about 1.6 times the size of Llama 3.1 405B, which has 405 billion parameters.
The number of parameters is often (but not always) correlated with skill; models with more parameters tend to outperform models with fewer parameters. But larger models also require more powerful hardware to run. An unoptimized version of DeepSeek V3 would need a bank of high-end GPUs to answer questions at reasonable speeds.
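A back-of-the-envelope calculation shows why. The sketch below assumes FP8 weights (one byte per parameter) and 80 GB of memory per GPU; both figures are assumptions for illustration, not numbers from the article.

```python
# Back-of-the-envelope memory estimate for serving the full model.
# Assumptions (not from the article): weights stored in FP8 (1 byte per
# parameter) and 80 GB of memory per high-end GPU.
import math

total_params = 671e9     # 671 billion parameters
bytes_per_param = 1      # FP8
gpu_memory_gb = 80       # one high-end accelerator

weights_gb = total_params * bytes_per_param / 1e9
gpus_needed = math.ceil(weights_gb / gpu_memory_gb)

print(f"~{weights_gb:.0f} GB of weights -> at least {gpus_needed} GPUs")
# ~671 GB of weights -> at least 9 GPUs, before counting activations
# and the KV cache, hence "a bank of high-end GPUs."
```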
Although it’s not the most practical model, DeepSeek V3 is an achievement in some respects. DeepSeek was able to train the model using a data center of Nvidia H800 GPUs in just about two months; Chinese companies were recently restricted by the US Department of Commerce from purchasing those GPUs. The company also claims it spent only $5.5 million to train DeepSeek V3, a fraction of the cost of developing models like OpenAI’s GPT-4.
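That cost claim is at least plausible as a rental-price calculation. The sketch below assumes a 2,048-GPU H800 cluster, about 57 days of wall-clock training, and roughly $2 per GPU-hour; all three numbers are assumptions for illustration, not figures from the article.

```python
# Rough sanity check on the $5.5 million training-cost claim.
# Assumptions (not from the article): a 2,048-GPU H800 cluster, about
# 57 days of wall-clock training, and roughly $2 per GPU-hour rental.
gpus = 2048
days = 57
price_per_gpu_hour = 2.0

gpu_hours = gpus * days * 24
cost = gpu_hours * price_per_gpu_hour
print(f"{gpu_hours / 1e6:.2f}M GPU-hours -> ${cost / 1e6:.1f}M")
# 2.80M GPU-hours -> $5.6M, in the ballpark of the company's claim.
```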
The downside is that the model’s political views are a bit filtered. Ask DeepSeek V3 about Tiananmen Square, for example, and it won’t answer.

That’s likely because DeepSeek, as a Chinese company, is subject to benchmarking by China’s internet regulator to ensure that its models’ responses “embody core socialist values.” Many Chinese AI systems decline to respond to topics that might draw regulatory scrutiny, such as speculation about the Xi Jinping regime.
DeepSeek, which in late November unveiled DeepSeek-R1, its answer to OpenAI’s “reasoning” model, is a curious organization. It is backed by High-Flyer Capital Management, a Chinese quantitative hedge fund that uses artificial intelligence to inform its trading decisions.
High-Flyer builds its own server clusters for model training; one of the most recent reportedly has 10,000 Nvidia A100 GPUs and cost 1 billion yuan (about $138 million). Founded by computer science graduate Liang Wenfeng, High-Flyer aims to achieve “superintelligent” AI through its DeepSeek organization.
In an interview earlier this year, Wenfeng described closed-source AI such as OpenAI’s as a “temporary” moat. “[It] hasn’t stopped others from catching up,” he noted.
Indeed.