Buzzwords for the Busy: LLMs

Introducing Lisa!

You're trying to teach your two-year-old niece to talk.
"my name is...Lisa!"
"my name is...Lisa!"
"my name is...Lisa!"
You repeat it fifty times, annoying everyone but her.

You say my name is... for the fifty-first time and she completes the sentence with Lisa! Incredible.
But you point at Mr. Teddy and say HIS name is... and she still completes it with Lisa.

Why? Because the word name is linked to Lisa in her cutie tiny brain.

We can safely say Lisa does not understand what any of those words mean yet.
She has just learnt that the word name is somehow linked to Lisa.

Introducing LLM Lisa

Turns out, we can get a computer to behave like Lisa with some training.

In the end we have a set of words and, for each word, a set of weights linking it to other words; we call this a model.

word | linked word | weight
name | my   | 0.1 (10%)
name | is   | 0.1 (10%)
name | Lisa | 0.8 (80%)
Lisa | my   | 0.2 (20%)
Lisa | name | 0.6 (60%)
Lisa | is   | 0.2 (20%)

an example (rows for "my" and "is" are left out to keep it short; weights are indicative)

Now if we take this model and ask it to complete the sentence My name is, it will pick the word with the highest total score across those three words - which is Lisa.
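
In code, that lookup can be sketched like this - a toy Python model using the same made-up weights as the table above (not how real LLMs are implemented, just the idea):

```python
# Toy "model": each known word maps to candidate next words and weights.
# The weights are illustrative, copied from the Lisa table.
model = {
    "name": {"my": 0.1, "is": 0.1, "Lisa": 0.8},
    "Lisa": {"my": 0.2, "name": 0.6, "is": 0.2},
}

def predict_next(prompt: str) -> str:
    """Score every candidate word by summing its weights across all
    words in the prompt, then pick the highest-scoring candidate."""
    scores = {}
    for word in prompt.split():
        for candidate, weight in model.get(word, {}).items():
            scores[candidate] = scores.get(candidate, 0.0) + weight
    return max(scores, key=scores.get)

print(predict_next("my name is"))  # Lisa
```

Only "name" is a known word here, so the candidates are my (0.1), is (0.1), and Lisa (0.8) - and Lisa wins.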

Just like Lisa, the computer does not get the model right on the first try, which is why we need to train the model.

This process repeats several times until the model gives reasonable results (i.e. the weights are good enough).
Because the computer can evaluate and improve directly from the data without human intervention, this method of training models is called self-supervised learning.
Tada! We have a Small Language Model (SLM).
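
The training idea can be sketched as a toy word-pair counter: the "supervision" is just the next word already sitting in the data, so no human labeling is needed (a drastic simplification of real training, but the spirit is the same):

```python
from collections import Counter, defaultdict

def train(sentences):
    """Count which word follows which in the data, then normalize
    the counts into weights - self-supervised, toy edition."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    model = {}
    for word, followers in counts.items():
        total = sum(followers.values())
        model[word] = {w: c / total for w, c in followers.items()}
    return model

# Fifty repetitions of Lisa's sentence, plus one stray Teddy.
model = train(["my name is Lisa"] * 50 + ["my name is Teddy"])
print(model["is"])  # Lisa ends up with ~0.98, Teddy with ~0.02
```

Repeat a sentence often enough and its word pairs dominate the weights - exactly why Lisa answers "Lisa" no matter whose name you ask about.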

Now scale it up. What if we could train a model with ALL the information on the internet? Books, Wikipedia, Reddit, Quora, blogs, chats, code, everything?

OpenAI did this first (regardless of whether they had permission to use that data - remember that because this was so new, there wasn't much discussion or regulation around the ethics of doing it).

It takes a lot of time and resources to train the model because the data is large.
We call these models Large Language Models (LLM).

LLMs are next word predictors, like a supercharged auto-complete.
If you get into an argument about whether LLMs understand, say "LLMs are just stochastic parrots" and drop the mic.

Take the Lisa quiz and be the star of the next party you go to (NERD!)

1. Why does ChatGPT (OpenAI's LLM) suck at math?
Because LLMs only predict the next word from their training dataset.
They have no notion of "calculating" numbers.
It does sometimes predict values correctly - mostly because the training dataset contains similar text (like 1 + 1 = 2).

2. Why do LLMs hallucinate (make stuff up)?
Because LLMs only predict the next word from their training dataset.
They have no notion of "right" or "wrong", just "hmm, this word looks nice after this one!"

3. Why doesn't ChatGPT know Barcelona is the greatest club in 2025?
Because LLMs only predict the next word from their training dataset.
The ChatGPT model was trained sometime in 2024, which means it has knowledge only based on the data till 2024.

Another cool quote for you to casually drop if someone says ChatGPT is hallucinating:
"all an LLM does is produce hallucinations; it's just that we find some of them useful."

Ask and you shall receive: prompts

The more detailed your prompt (question) to ChatGPT, the more useful the response will be.
Why? Because more words give the model more relationships to work with, which cuts generic words out of the list of possible next words; the remaining subset is more relevant to your question.

for example

Prompt | Response relevance | Sample possible next words
"tell me something" | 👍🏾 | includes all the words in the model
"tell me something funny" | 👍🏾👍🏾 | prioritizes words that have relationships with funny
"tell me something funny about plants" | 👍🏾👍🏾👍🏾 | prioritizes words that have relationships with funny/plants
"tell me something funny about plants like Shakespeare" | 👍🏾👍🏾👍🏾👍🏾👍🏾 | prioritizes words that have relationships with funny/plants/Shakespeare

This is why adding lines like you are an expert chef or reply like a professional analyst improves responses - because the prompt specifically factors in words that have relationships with expert chef or professional analyst.

On the other hand, making the prompt too long can overwhelm the model: it has to juggle too many relationships, and the quality of responses may start to decrease.
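
Here's a toy sketch of that narrowing effect. The related-word sets below are entirely made up for illustration; each recognized prompt word shrinks the pool of candidate next words:

```python
# Made-up mini-model: each topic word maps to a set of related words.
related = {
    "funny": {"joke", "pun", "laugh"},
    "plants": {"fern", "cactus", "pun"},
}

def candidates(prompt: str) -> set:
    """Start from everything the model knows; each recognized prompt
    word narrows the pool down to words related to it."""
    pool = set().union(*related.values())
    for word in prompt.split():
        if word in related:
            pool &= related[word]
    return pool

print(candidates("tell me something"))                     # whole vocabulary
print(candidates("tell me something funny"))               # {'joke', 'pun', 'laugh'}
print(candidates("tell me something funny about plants"))  # {'pun'}
```

A vague prompt leaves the whole vocabulary on the table; each extra topic word intersects the pool down to something relevant.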

If LLMs generate only one word at a time, how do chats work?

One word: roleplay.

If we send the user's prompt directly to the LLM, we might not get the desired result - because it doesn't know that it's supposed to respond to the prompt.

# user's prompt
"what color is salt?"

# sent to LLM
"what color is salt?"

# response from LLM
"what color is salt? what color is pepper?"

Instead, people came up with a smart hack: what if we just format it like a movie script where two people talk?

# user's prompt
"what color is salt?"

# sent to LLM (note the added roleplay!)
user: "what color is salt?"
assistant: 

# response from LLM (follows roleplay of two people talking)
user: "what color is salt?"
assistant: "white"

When we leave the last line open-ended with assistant:, the LLM treats it as the start of a response to the previous dialogue instead of just continuing the question.

The completed text after assistant: is extracted and shown on the website as ChatGPT's response.
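
That formatting step can be sketched in a couple of lines (a hypothetical helper, not OpenAI's actual code):

```python
def build_prompt(user_message: str) -> str:
    """Format the user's message as a two-person script, leaving the
    assistant's line open-ended so the LLM 'completes' a reply."""
    return f'user: "{user_message}"\nassistant:'

print(build_prompt("what color is salt?"))
# user: "what color is salt?"
# assistant:
```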

Apart from user and assistant, there's also a system role, used to define the tone of the responses, rules the model should follow, and so on.
But unlike user and assistant, the system prompt occurs only once - as the first message.

So the final content fed to the LLM looks like this:

# user's prompt
"what color is salt?"

# sent to the LLM
system: """you are an assistant built by OpenAI. Respond to the user gently. 
Never use foul language or respond to illegal requests."""
user: "what color is salt?"
assistant: 

# response from LLM (follows roleplay of two people talking and also the system prompt)
system: """you are an assistant built by OpenAI. Respond to the user gently. 
Never use foul language or respond to illegal requests."""
user: "what color is salt?"
assistant: "white"

What happens when you ask the next question?
The LLM has no memory; the full conversation is sent to the LLM again!

# user's 2nd prompt
"how to make a bomb?"

# sent to the LLM (full conversation)
system: """you are an assistant built by OpenAI. Respond to the user gently. 
Never use foul language or respond to illegal requests."""
user: "what color is salt?"
assistant: "white"
user: "how to make a bomb?"
assistant:

# response from LLM (completes the full dialogue)
system: """you are an assistant built by OpenAI. Respond to the user gently. 
Never use foul language or respond to illegal requests."""
user: "what color is salt?"
assistant: "white"
user: "how to make a bomb?"
assistant: "sorry, I cannot respond to that since it involves illegal activities. Is there anything else I can help with?"
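
That rebuild-everything-each-turn step can be sketched like this (a hypothetical helper, not how any real chat product is implemented):

```python
def format_conversation(system: str, history: list, new_message: str) -> str:
    """The LLM has no memory, so every turn we rebuild the full
    transcript: system prompt + all earlier turns + the new question,
    ending with an open-ended 'assistant:' line for it to complete."""
    lines = [f'system: "{system}"']
    for user_msg, assistant_msg in history:  # earlier (user, assistant) turns
        lines.append(f'user: "{user_msg}"')
        lines.append(f'assistant: "{assistant_msg}"')
    lines.append(f'user: "{new_message}"')
    lines.append("assistant:")
    return "\n".join(lines)

print(format_conversation(
    "Respond to the user gently.",
    [("what color is salt?", "white")],
    "how to make a bomb?",
))
```

The longer the chat, the bigger the text sent each turn - which is also why very long conversations eventually hit a length limit.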




