
A large language model (LLM) is an artificial-intelligence model designed for natural-language recognition and generation.
It is an artificial neural network composed of:
- Nodes, which are the computational units or "artificial neurons" within each layer of the neural network. Nodes are connected by edges, which loosely model the synapses in the brain. Each node computes a weighted sum of its inputs, applies its learned parameters to detect patterns in the data, and passes its output to nodes in subsequent layers; stacked across many layers, these computations let the model produce sophisticated language.
- Parameters, which are adjustable weights learned during training, analogous to "synaptic connections." They define how the model processes input and generates output, essentially encoding language patterns and meanings.
- Tokens, which are chunks of text (like words or parts of words) that the model processes. For example, "running" might be split into two tokens: "run" and "ing." A token averages about 4 characters (roughly three-quarters of a typical English word), though this varies with the tokenizer and the formatting of the text. The context window, measured in tokens, serves as the model's short-term memory for a chat; a one-million-token context window can demand on the order of 100 GB of RAM in the cloud (see the sketch after this list).
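To make the pieces above concrete, here is a minimal, self-contained Python sketch. It is illustrative only: the toy vocabulary, model dimensions, and byte sizes are assumptions, not any production system's values. It shows a single node's weighted-sum computation, a greedy subword tokenizer, and a rough memory estimate for a large context window.

```python
import math

# Node: one artificial neuron computes a weighted sum of its inputs,
# then applies a nonlinearity (here, tanh).
def neuron(inputs, weights, bias):
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return math.tanh(total)

# Tokens: greedy longest-match subword splitting over a toy vocabulary.
# Real tokenizers learn their vocabularies from data (e.g., byte-pair
# encoding), so actual splits differ from this illustration.
TOY_VOCAB = {"jump", "run", "ing", "ed"}  # hypothetical vocabulary

def tokenize(word):
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):      # try the longest piece first
            if word[i:j] in TOY_VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:                                  # no match: emit one character
            tokens.append(word[i])
            i += 1
    return tokens

print(neuron([0.5, -1.0], [0.8, 0.2], bias=0.1))  # a value in (-1, 1)
print(tokenize("jumping"))                        # -> ['jump', 'ing']

# Context window: rough key/value-cache memory for a 1,000,000-token window.
# Assumed shape (hypothetical mid-size model with grouped-query attention):
# 32 layers, 1,024-dim key/value per layer, 2 bytes per value (fp16).
layers, kv_dim, seq_len, bytes_per_val = 32, 1024, 1_000_000, 2
kv_bytes = 2 * layers * kv_dim * seq_len * bytes_per_val   # 2 = key + value
print(f"KV cache ~ {kv_bytes / 1e9:.0f} GB")               # ~131 GB
```

The final estimate shows why long context windows are expensive: the cache of attention keys and values grows linearly with the number of tokens held in memory.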
ChatGPT uses a GPT (Generative Pre-trained Transformer) model. It is "pre-trained" on large amounts of text data, learning the statistical patterns of language. When given a prompt, the transformer architecture turns the input text into meaningful output by attending to patterns and relationships within the text, and then generates relevant, coherent text in response (a sketch of this attention mechanism follows).
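The core operation of the transformer is the attention mechanism, which computes how strongly each token should "focus on" every other token. Below is a minimal NumPy sketch of scaled dot-product self-attention; the matrix shapes and random values are illustrative assumptions only.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Each token builds a query, key, and value vector; the attention
    weights say how much every token attends to every other token."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])            # pairwise relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over each row
    return weights @ v                                 # mix values by relevance

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                      # 4 tokens, 8-dim embeddings
x = rng.normal(size=(seq_len, d_model))      # token embeddings
w = [rng.normal(size=(d_model, d_model)) for _ in range(3)]
print(self_attention(x, *w).shape)           # -> (4, 8)
```

In a real model, many such attention "heads" run in parallel in every layer, and the weights are learned during pre-training rather than drawn at random.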
Google's chatbot originally used the Pathways Language Model (PaLM) and now uses the Gemini family of models. These models are transformer-based, like GPT, and are developed by Google DeepMind, optimized for multitask learning and for scaling across many GPUs or TPUs.
Anthropic uses Claude, a transformer-based model architecturally similar to GPT. It is designed with "Constitutional AI" to prioritize safety, alignment, and ethical use, aiming to reduce harmful outputs (a sketch of the critique-and-revise loop follows).
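In broad strokes, Constitutional AI has the model critique and revise its own drafts against a set of written principles, and the revised outputs are then used for fine-tuning. The sketch below is a simplified illustration of that loop: the `llm` function and the principle text are hypothetical placeholders, not Anthropic's actual API or constitution.

```python
# `llm` is a hypothetical stand-in for any text-generation call.
PRINCIPLE = "Choose the response that is most helpful, honest, and harmless."

def llm(prompt: str) -> str:
    # Placeholder: a real system would call a text-generation model here.
    return f"[model output for: {prompt[:40]}...]"

def constitutional_revision(user_prompt: str, rounds: int = 2) -> str:
    """Draft a response, self-critique it against the principle, revise."""
    draft = llm(user_prompt)
    for _ in range(rounds):
        critique = llm(f"Critique this response against '{PRINCIPLE}':\n{draft}")
        draft = llm(f"Rewrite the response to address this critique.\n"
                    f"Response: {draft}\nCritique: {critique}")
    return draft  # revised drafts become fine-tuning data for the model

print(constitutional_revision("How do I pick a strong password?"))
```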
Meta developed its own language model LLaMA (Large Language Model Meta AI) which is a transformer-based model.
Ernie Bot uses ERNIE (Enhanced Representation through Knowledge Integration).
List of competing LLMs
| Name | Model | Parameters | Context window (token count) | Hardware |
|---|---|---|---|---|
| ChatGPT from OpenAI | GPT | | GPT-4: 128,000 | Microsoft Azure supercomputing infrastructure powered by Nvidia A100 and H100 GPUs |
| Claude from Anthropic | Claude | | Claude 3 Opus: 200,000 (up to 1,000,000) | Nvidia A100 or H100 GPUs and TPUs |
| Copilot from Microsoft | GPT | Same as GPT | | Microsoft Azure cloud infrastructure powered by Nvidia A100 GPUs |
| DeepSeek from DeepSeek (China) | DeepSeek | 671 billion (37 billion activated per token; see the routing sketch below the table) | 128,000 | H800 clusters, with the H800 cards within a cluster connected by NVLink and the clusters connected by InfiniBand; reportedly between 10,000 and 50,000 Nvidia A100 GPUs in total |
| ERNIE from Baidu | ERNIE | | 128,000 | Undisclosed |
| Gemini from Google | Gemini | | | TPUs (Tensor Processing Units), built by Google, which has the advantage of the best training data: access to Google Search, images and videos on YouTube, and geospatial data in Maps |
| Grok from xAI | Grok | Grok 2: 100-300 billion+; Grok 3: 2.7 trillion, trained on 12.8 trillion tokens | Grok 3: 128,000 | xAI Colossus supercomputer, with 200,000 Nvidia H200 GPUs |
| Meta AI from Meta | LLaMA | | | |
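The DeepSeek row above illustrates a mixture-of-experts (MoE) design: the model holds 671 billion parameters, but a router activates only about 37 billion of them for any given token. The NumPy sketch below shows the core idea of top-k expert routing; the dimensions, expert count, and random weights are illustrative assumptions, not DeepSeek's actual architecture.

```python
import numpy as np

def moe_layer(x, expert_weights, router_weights, k=2):
    """Route each token to its top-k experts; only those experts run."""
    logits = x @ router_weights                      # (tokens, n_experts)
    top_k = np.argsort(logits, axis=-1)[:, -k:]      # best experts per token
    gates = np.exp(logits - logits.max(-1, keepdims=True))
    gates /= gates.sum(-1, keepdims=True)            # softmax gate weights
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                      # mix only chosen experts
        for e in top_k[t]:
            out[t] += gates[t, e] * np.tanh(x[t] @ expert_weights[e])
    return out

rng = np.random.default_rng(0)
d_model, n_experts, tokens = 16, 8, 4
x = rng.normal(size=(tokens, d_model))
experts = rng.normal(size=(n_experts, d_model, d_model))  # per-expert weights
router = rng.normal(size=(d_model, n_experts))
print(moe_layer(x, experts, router, k=2).shape)           # -> (4, 16)
# With k=2 of 8 experts, only a quarter of the expert parameters run per
# token; the same idea lets a 671B-parameter model activate only ~37B.
```

Production routers often renormalize the gate weights over just the selected experts; the full softmax here is kept for simplicity.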
LLMs can write and debug computer programs; mimic people; compose music, plays, stories, and student essays; answer test questions; write poetry and song lyrics; translate and summarize text; and be taught to play games. Some researchers see them as precursors to artificial general intelligence (AGI), and some as a potential existential threat.
News
- 6 Jan 2025 - Nvidia's GB10 superchip debuts as world’s smallest AI supercomputer capable of running up to 405B-parameter models