Large Language Models: From Architecture to Hallucination
Source Publication: Scientific Publication
Primary Authors: Green A.

Imagine a vast computational engine designed for a single purpose: continuation. This system has processed immense datasets, governed by specific scaling laws that dictate its growth and capability. When you provide a prompt, the system does not 'know' the answer in a human sense. It calculates probabilities.
If you input a sequence, the system scans its parameters to predict the most likely continuation. It operates as a probabilistic machine, generating text based on patterns learned during its creation. This, in essence, is the mechanism behind Large Language Models.
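This next-token mechanism can be sketched in a few lines. The logits below are hypothetical numbers invented for illustration, not values from any real model; the point is only that the system converts raw scores into a probability distribution and picks a continuation from it.

```python
import math

# Hypothetical logits a model might assign to candidate next tokens
# after the prompt "The capital of France is" (illustrative values).
logits = {"Paris": 9.1, "Lyon": 4.3, "the": 2.0, "a": 1.2}

def softmax(scores):
    """Convert raw logits into a probability distribution."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - m) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

probs = softmax(logits)
# The model does not "know" the answer; it ranks candidates by
# probability and emits the most likely (or a sampled) continuation.
next_token = max(probs, key=probs.get)
```

Real systems sample from this distribution with temperature and other decoding controls, but the underlying operation is the same ranking of continuations.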
How Large Language Models Process Context
Older models read text linearly, often struggling to retain information over long passages. Modern systems rely on the Transformer architecture, whose self-attention mechanism lets the model handle complex dependencies across an entire input at once.
If a sentence is long and complex, the Transformer enables the model to weigh the importance of different elements simultaneously, rather than sequentially. By examining how these architectural choices function, we see how the model maintains context across vast inputs, allowing it to scale effectively without losing the thread of the data.
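The "weighing elements simultaneously" described above is scaled dot-product attention. The sketch below is a minimal pure-Python version for a single head, with made-up toy vectors; production implementations are batched tensor operations, but the arithmetic is the same.

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention for one head.

    Each row is a token vector. Every query attends to every key
    simultaneously, so distant tokens are weighed as easily as
    adjacent ones, rather than sequentially.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        # Softmax the scores into attention weights.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # The token's new representation: a weighted mix of values.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Toy example: three 2-dimensional token vectors attending to themselves.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = attention(tokens, tokens, tokens)
```

Because every query-key pair is scored in one pass, the model maintains context across the whole sequence instead of carrying it forward step by step.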
Training and Instruction Tuning
Raw models merely predict text based on statistical likelihood. To make them useful tools, they require specific training paradigms. The source highlights Reinforcement Learning from Human Feedback (RLHF) and instruction tuning as key methods.
In this phase, the model is guided to align with human preferences. If the model outputs text, human feedback or derived signals help steer it. Emerging approaches like Direct Preference Optimisation (DPO) further refine this process, optimising the model to prefer specific types of responses over others, effectively tuning the raw engine into a helpful assistant.
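The DPO objective mentioned above can be written down compactly. The sketch below computes the loss for a single preference pair under the standard DPO formulation; the log-probability inputs and the `beta` value are assumptions for illustration, not values from the source.

```python
import math

def dpo_loss(policy_chosen, policy_rejected,
             ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are log-probabilities of the preferred ("chosen") and
    dispreferred ("rejected") responses under the policy being
    trained and under a frozen reference model. The loss shrinks as
    the policy favours the chosen response more strongly than the
    reference model does.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # Negative log-sigmoid of the scaled margin.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# A policy that has learned the preference (positive margin) scores a
# lower loss than one that is indistinguishable from the reference.
learned = dpo_loss(-1.0, -5.0, -2.0, -2.0)
neutral = dpo_loss(-2.0, -2.0, -2.0, -2.0)
```

Unlike RLHF, this optimises the preference signal directly with a classification-style loss, with no separate reward model or reinforcement-learning loop.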
Hallucinations and Knowledge Integration
A critical issue remains: hallucination. This occurs when the model generates outputs that are structurally sound but factually incorrect. To address this, developers are moving beyond simple 'context stuffing' toward Retrieval-Augmented Generation (RAG).
If a user asks a specific question, a RAG-enabled system does not rely solely on its internal training. Instead, it retrieves relevant information from external sources and integrates that knowledge before generating an answer. While this aids verification, ensuring accuracy remains a central focus of current development efforts.
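The retrieve-then-generate flow can be sketched as follows. The document list, the word-overlap scorer, and the prompt template are all stand-ins invented for this example: real systems rank passages by embedding similarity in a vector store and pass the assembled prompt to an LLM.

```python
# Toy corpus standing in for an external knowledge source.
DOCUMENTS = [
    "The Transformer architecture was introduced in 2017.",
    "RLHF aligns language models with human preferences.",
    "Retrieval-Augmented Generation grounds answers in external sources.",
]

def retrieve(question, documents, k=1):
    """Rank documents by word overlap with the question (a crude
    stand-in for embedding similarity) and return the top-k passages."""
    q_words = set(question.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(question, documents):
    """Prepend retrieved evidence so the model answers from the
    supplied context rather than from its parameters alone."""
    context = "\n".join(retrieve(question, documents))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

prompt = build_prompt("What is retrieval-augmented generation", DOCUMENTS)
```

The generation step then conditions on the retrieved context, which makes the answer checkable against its sources, though retrieval quality itself becomes the new accuracy bottleneck.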