
How Should We Use LLMs?

This post is a summary of Andrej Karpathy's video "How I use LLMs," released on his YouTube channel on February 28, 2025. I briefly summarize each chapter and add my own thoughts (marked "Me") at the end of each section.


Intro

The beginning of the video introduces the most well-known LLM services as of late February 2025: first OpenAI's ChatGPT (where he used to work), followed by Gemini, Meta AI, Copilot, Claude, Grok, and others.

He explains the very basic operating principles of LLMs. He likens an LLM to a large zip file containing almost all the information on the internet.

  • pre-training
    • It's like compressing the entire internet into a lossy and probabilistic zip file by breaking it down into tokens.
    • A very expensive process. It costs tens of millions of dollars and takes months.
    • For this reason, pre-training is not performed frequently, and as a result, the model has a knowledge cut-off.
  • post-training
    • He compares it to putting a "smiley face" on the pre-trained model: training it to take on the persona of an assistant that responds to user questions.
    • Training the model using a dataset of human-created conversations. Through this process, the model learns the style of an assistant answering questions.

Pre-training is the stage where the model acquires a wide range of knowledge, and post-training is the stage where the model is given a persona to respond to users in a useful way based on the knowledge it has acquired.
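
To make the "lossy, probabilistic zip file" metaphor concrete, here is a toy illustration of my own (not from the video). It uses single characters as "tokens" and bigram counts instead of a neural network, but the objective, predicting the next token from the prefix, is the same one that drives both pre-training and post-training.

```python
from collections import Counter, defaultdict

# Toy next-token predictor: characters as "tokens", bigram counts as the
# "model". Real LLMs use subword tokens and a transformer, but the training
# objective (predict the next token given the prefix) is the same.

def train(text):
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1          # how often nxt follows prev
    return counts

def predict_next(counts, prev):
    # Recall is probabilistic: we pick the most frequent continuation.
    return counts[prev].most_common(1)[0][0]

corpus = "the internet compressed into a probabilistic zip file"
model = train(corpus)
print(predict_next(model, "t"))  # e.g. 'h': lossy recall of the corpus
```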

He starts by asking ChatGPT a simple question. How much caffeine is in one shot of Americano? He's not sure if it's the correct answer, but it's probably right. He then searches the internet and confirms that ChatGPT's answer is roughly accurate.

He then takes a problem he is actually trying to solve, asks several models (ChatGPT, Claude, Gemini, Grok, and DeepSeek), and compares whether each gave him the answer he wanted.

Me: It's surprising, and at the same time it makes sense, that he subscribes to almost all of the LLM services. As an AI researcher, he probably wants to use the various services firsthand.


Thinking model

The "thinking model," which is further trained through reinforcement learning, can perform "thinking" similar to the human thought process in problem-solving, such as trying various ideas, backtracking, and re-examining assumptions.

The thinking model may show higher accuracy, especially in complex problems such as math and coding, but it may take longer to respond.

It is efficient to selectively use the thinking model and the general model depending on the difficulty of the problem.

Me: Currently, the general model and the Thinking model are separated, but this distinction will disappear in the future. (Claude 3.7 Sonnet uses a single model, but users can choose to turn the thinking mode on or off.)

Me: As a developer using the LLM API, I have to decide when to use the general mode and when to use the thinking mode. This is not just an engineering-level concern, but a point that the entire product team needs to consider. I haven't developed a service that requires deep consideration of this yet, but I think I will in the near future.
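
As a sketch of what that decision might look like in code (my own illustration; the model names and the difficulty flag are assumptions, not anything from the video):

```python
from openai import OpenAI

client = OpenAI()

def answer(question, hard):
    # A real product would classify difficulty automatically or let the
    # user choose; here the caller just passes a flag. Model names are
    # illustrative; check your provider's current lineup.
    model = "o3-mini" if hard else "gpt-4o-mini"
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

print(answer("What is 30 x 9?", hard=False))                   # general model
print(answer("Prove that sqrt(2) is irrational.", hard=True))  # thinking model
```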


Tool use

Next, he explains Tool use. Due to the pre-training described earlier, LLMs cannot learn data after a certain point in time, resulting in a knowledge cutoff. Tool use is a way to overcome this cutoff.

  • Internet search
  • Python interpreter
    • For a question like 30 x 9, the LLM immediately answers 270, but this is not the result of an actual numerical calculation; it is simply next-token prediction, which is what an LLM fundamentally does.
    • But when asked to calculate something too complex to predict token by token, ChatGPT uses the Python interpreter tool.
    • Instead of answering directly, the LLM writes Python code; the code is executed, and the result is returned to the LLM (a minimal sketch of this loop follows the list).
    • OpenAI trained ChatGPT to use tools in these situations (i.e., in these situations, don't answer directly; use a tool to find the answer).
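
Here is a minimal sketch of that loop. It is my own illustration, not code from the video: a stand-in function plays the model, and the host executes whatever code it emits.

```python
import contextlib
import io

# Stand-in for a real LLM API call: a tool-trained model would decide on
# its own to emit code here instead of guessing the answer.
def fake_model(prompt):
    return "print(987654321 * 123456789)"

def run_python(code):
    # Execute the model's code and capture stdout. Real products run this
    # in a sandbox, not with a bare exec().
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code)
    return buf.getvalue().strip()

code = fake_model("What is 987654321 * 123456789?")
result = run_python(code)
# The tool output is appended to the conversation so the model can use it
# in its final answer.
print(f"Tool result fed back to the LLM: {result}")
```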

The types and scope of available tools vary by LLM service. ChatGPT supports a Python interpreter, but Grok and Gemini do not, and Claude appears to write and execute code through JavaScript.

Two or more tools can also be combined: when asked to analyze data retrieved from an internet search, ChatGPT fulfills the request by writing and executing Python code on the retrieved data.

However, even when using tools in this way, the final response may be based on incorrect values. Therefore, the final result still needs to be checked by a human.

Me: It's interesting that ChatGPT uses Python as its interpreter (tool), while Claude uses JavaScript. I wonder if this is related to the fact that Claude's Artifacts can create very nice React apps. I also wonder whether this could change depending on which programming language is emphasized during the training phase.

Me: There is much room for further development/improvement in Tool use. I think it would be better if technologies like Claude MCP (Model Context Protocol) were made into a common protocol regardless of the LLM vendor.


Claude artifacts & diagram

One of Claude's unique features is Artifacts. When you ask Claude to create an app in natural language, it creates the app in real time, mostly through React code, and allows you to use the app immediately on the right side of the screen.

Claude can also create diagrams based on Mermaid syntax. It builds a diagram from the user's natural language request, and this too can be viewed immediately on the right side of the screen.

Me: Artifacts are effective for creating very simple proof-of-concept apps. I use them often, and I feel the accuracy of the generated apps is increasing while the error rate is decreasing.


Cursor: Composer, writing code

If you mainly write code at work, you rarely go to ChatGPT to ask for code. ChatGPT has no context about your project, so pasting that context in and requesting code is very slow.

Instead, Karpathy uses a tool called Cursor. Cursor is an IDE that opens the project you are working on and can access the entire codebase. In other words, it can supply the LLM with appropriate context without manual effort.

Through Composer, Cursor opens the necessary code files and even modifies them directly in response to the user's natural language request. If user approval is required (e.g., creating a directory), the user must manually confirm, and once approved, Composer automatically proceeds with the subsequent work. In other words, it's like Composer and a person sitting side-by-side at a computer desk, sharing a keyboard, and taking turns reading and writing code.

Me: I had a disappointing experience with Cursor about a year ago, so I hadn't been using it since. I reinstalled it while writing this post, and the usability is completely different now; I think I might subscribe after the two-week free trial ends.
I had been combining LLMs with coding via IntelliJ + GitHub Copilot or VSCode + Cline + GitHub Copilot. I plan to use Cursor diligently for two weeks, compare it against those combinations, and choose one.


Audio (speech) I/O

When querying LLMs in natural language on desktop and mobile devices, Karpathy mostly uses voice. On mobile in particular, about 80% of his queries are made by voice.

We'll look at two voice modes.

1. Using voice instead of typing in the chat interface

  • Touch the 'microphone' icon in the mobile app chat interface.
    • Desktop apps don't have this feature, so he uses a separate voice ➡️ text conversion app (e.g., SuperWhisper).
  • The LLM recognizes the voice and converts it to text. The converted text becomes the input of the LLM.
  • Two tasks are performed sequentially.
    1. Voice ➡️ text conversion,
    2. Text ➡️ LLM input
  • Andrej Karpathy calls this Fake audio (see the sketch after this list).

2. Advanced voice mode

  • Use voice (audio) itself as LLM input.
    • The model understands audio chunks and predicts the next audio chunk (Andrej Karpathy calls this True audio).
  • In ChatGPT, it is served under the name Advanced voice mode.
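
A sketch of the "fake audio" pipeline as two separate API calls (my own illustration using OpenAI's audio and chat endpoints; the file name and model names are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# Step 1: voice -> text, using a separate speech-to-text model.
with open("question.m4a", "rb") as f:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

# Step 2: the transcribed text becomes ordinary LLM input. The model never
# hears audio; that is what makes this "fake" audio.
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": transcript.text}],
)
print(resp.choices[0].message.content)
```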

Me: Personally, I'm skeptical about whether voice will completely replace typing from a UX perspective, because some fine-grained text manipulation is only possible by typing. Highlighting a specific character in a sentence or restructuring a sentence is hard to do by voice. But as the technology advances, the importance of such fine-grained edits may decrease, and perhaps someday voice will replace typing. Or perhaps voice and typing will coexist forever.


NotebookLM, podcast generation

You can upload sources such as web links, YouTube videos, PDFs, and Markdown files, and then ask questions that are answered based on those sources.

Also, one of the main features is the ability to generate podcast audio. Based on the uploaded source, two podcast hosts conduct a conversation.

Me: I personally use this service frequently. I feel that the service quality is improving over time. It is very convenient to be able to upload a long document and immediately ask questions in natural language.

Me: I also use it when I want to deeply understand a document. I upload the document, generate a podcast, and then listen to the podcast. Sometimes I listen to the podcast before reading the document, and sometimes I listen to the podcast after reading the document. Both are effective.


Image: See, OCR, DALL-E

He explains how images can be represented as tokens so that LLMs can understand and process them. The LLM, that is, the transformer neural network itself, does not distinguish whether an input token came from text, image, or audio (a toy sketch follows the list below).

  1. In the encoder stage, data of each modality is converted into a sequence of token IDs.
  2. The LLM receives this sequence and predicts the next token ID sequence.
  3. The decoder restores this predicted token ID sequence into data of the corresponding modality (text, image, audio).
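
A toy sketch of that pipeline (entirely my own stand-ins; real systems use learned tokenizers and codebooks). The point is that by the time tokens reach the transformer, they are just IDs.

```python
# Toy encoders: every modality becomes a flat sequence of integer IDs.
def encode_text(s):
    return [ord(c) for c in s]            # one "token" per character

def encode_image(pixels):
    # Real systems quantize image patches into a learned codebook of token
    # IDs; here we just shift pixel values into a separate ID range.
    return [100_000 + p for p in pixels]

def toy_llm(tokens):
    # Stand-in for next-token prediction: the transformer sees only IDs and
    # is indifferent to which modality they came from.
    return tokens[-3:]

mixed = encode_text("caption: ") + encode_image([12, 240, 7])
print(mixed)          # text and image tokens, interleaved in one sequence
print(toy_llm(mixed))
```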

Use case - Image input

  • Upload an image to the LLM. Specifically, upload an image to the chat interface in a service like ChatGPT.
  • Ask natural language questions related to the image and receive answers.

Use case - Image output

Generating images from natural language prompts (e.g., with DALL-E).

Me: Image input is one of the LLM usage patterns I use most. I draw diagrams or architecture charts myself, then upload screenshots of them to verify that they are well constructed and factually accurate.

Me: I recently started using image output. When writing blog posts, I often struggle with choosing a cover image, so I create one using tools like ChatGPT or ImageFX.


ChatGPT memory & Custom instruction

Memory

By default, each conversation starts with an empty context window. Every time you start a new conversation, all context from previous conversations is cleared; this follows from the basic operating principle of LLMs.

ChatGPT provides the ability to manage user personal memory based on conversation content. The user can directly ask to remember specific conversation content, or ChatGPT can store necessary information in memory based on its own judgment.

After that, even if you start a new conversation, you can have a personalized conversation based on that memory.
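
My guess at how such a feature could be wired up (an assumption on my part, not OpenAI's actual implementation): remembered facts are injected into the system prompt of every new conversation, so the "memory" survives the otherwise empty context window.

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical memory store: in a real product these entries would be
# written by the model or by explicit user requests to "remember this".
memory = ["User's name is Jinyoung.", "User writes a dev blog."]

def new_conversation(user_msg):
    system = "You are a helpful assistant.\nKnown facts about the user:\n"
    system += "\n".join(f"- {fact}" for fact in memory)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user_msg},
        ],
    )
    return resp.choices[0].message.content
```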

Custom instruction

Set ChatGPT's response style, personality, specific preferences, etc.

Me: I think it would be good if other LLM services besides ChatGPT also introduced the Memory function.

Me: Custom instruction can be used not only in ChatGPT but also in Claude - Projects. However, I feel that ChatGPT can be set more richly.


Custom GPTs

There are times when you need to give the LLM a significant amount of context before starting a conversation: the purpose of the conversation, the desired response format, the topic, and so on.

If that context should apply to every conversation, Custom instructions will do. But if you want the context only for a specific kind of conversation, you can create a Custom GPT.

After creating a Custom GPT with the context you want, if you want to have a conversation on that topic, just select the created Custom GPT and start the conversation.
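
Conceptually (my own analogy, not the actual Custom GPTs implementation), a Custom GPT behaves like a saved preset: the instructions are written once, and every conversation started from it is pre-loaded with that context.

```python
# Hypothetical presets: each "Custom GPT" is just a reusable system prompt.
CUSTOM_GPTS = {
    "translator": (
        "You are a translation assistant. For each word or phrase the user "
        "sends, reply with a translation, an example sentence, and notes on "
        "common usage."
    ),
}

def start_chat(preset, first_msg):
    # Every new conversation begins pre-loaded with the preset's context.
    return [
        {"role": "system", "content": CUSTOM_GPTS[preset]},
        {"role": "user", "content": first_msg},
    ]

print(start_chat("translator", "How do you say 'thank you' in Korean?"))
```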

Me: I thought about what other LLM services have similar features to Custom GPTs, and I think Claude's Projects are somewhat similar. Projects can also be customized with prompts, and when a user starts a conversation from that Project, the context is set. However, I think ChatGPT's Custom GPTs can be used more usefully.
