Intro to LLM augmentation #71
Open: EricThomson wants to merge 3 commits into `main` from `ai/llm_augmentation`

# Knowledge Augmentation for LLMs
Modern large language models are astonishingly good at generating fluent, human-like text. But as we learned in [Week 1](../05_AI_intro/README.md) of the AI module, they are ultimately very sophisticated autocomplete systems. They produce language that sounds right, but they were not trained to generate *truth*.

This is why hallucinations happen: the model fills gaps with confident guesses, misremembers facts, or invents details that fit the conversation. Even when the model is trying to be helpful, it may prioritize what it thinks you want to hear over what is actually correct. And because LLMs are trained on data from the past, with a fixed cutoff date, they rapidly fall out of sync with current information.

When we want an LLM to behave more like a truth-tracking assistant than a storyteller, we need to augment its knowledge. Over the last few years, three main approaches have emerged: prompt engineering, fine-tuning, and retrieval-augmented generation (RAG). Each approach has its place, and professional AI systems often use more than one. If you want an LLM to be more truthful, more up-to-date, and more grounded in real data, these are the tools you will reach for.

In this quick overview, we'll discuss each method; in the rest of this week's hands-on lessons, we will do a deeper dive into RAG.

## Prompt engineering
The simplest way to give a model new information, [which we discussed last week](../05_AI_intro/prompt_engineering.md), is *prompt engineering*. In this context, it is sometimes called *prompt injection*.

Here, you attach the required information or facts directly to the prompt: you simply tell the model what it needs to know before asking it to complete a task. This is sometimes called "stuffing" or "in-context learning." If the information is short and changes often, this can be surprisingly effective. It is also simple and fast to set up. The downside is that prompts can become long and unwieldy, and the model can forget or ignore parts of them as the conversation grows. These prompts also consume tokens, which can become costly: pasting huge document stores into the prompt is not cost effective.

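To make this concrete, here is a minimal sketch of prompt stuffing in Python. The context string, question, and `ask_with_context` helper are hypothetical, and the example assumes the OpenAI Python client; the same pattern works with any chat-style LLM API.

```python
# Minimal sketch of prompt "stuffing" / in-context learning.
# The context string and helper below are illustrative, not part of the lesson code.
from openai import OpenAI  # assumes the openai package is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask_with_context(context: str, question: str) -> str:
    """Stuff the relevant facts directly into the prompt before asking."""
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


context = "Our lab meeting moved from Tuesdays to Thursdays starting June 2024."
print(ask_with_context(context, "When does the lab meet?"))
```

Everything the model needs arrives in the prompt itself; nothing about the model changes.
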
## Fine-tuning
A more heavyweight approach is fine-tuning. Here, we literally retrain the neural network itself, updating the weights that embody the LLM's knowledge. This is useful when you want the model to adopt a particular style or internalize patterns that are too complex to express in a prompt. The tradeoff is cost and rigidity: fine-tuning requires many training examples, the training runs can be expensive, and the new knowledge is baked into the weights until you retrain. OpenAI has a fine-tuning API, and [it is fairly easy to use to build your own custom LLM](https://www.datacamp.com/tutorial/fine-tuning-openais-gpt-4-step-by-step-guide) if you have training data; the main concern is cost.

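As a rough sketch of what that looks like with the OpenAI Python client (the file name, examples, and model name here are placeholders, and real fine-tuning needs far more data):

```python
# Sketch of OpenAI fine-tuning: write chat-formatted examples to a JSONL file,
# upload it, then start a fine-tuning job. Examples and file name are placeholders.
import json

from openai import OpenAI

client = OpenAI()

examples = [
    {
        "messages": [
            {"role": "user", "content": "Summarize: neurons fire action potentials."},
            {"role": "assistant", "content": "Neurons communicate with brief electrical spikes."},
        ]
    },
    # ... many more examples ...
]

with open("train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")

training_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",
)
print(job.id)  # once the job finishes, you get a custom model name to query
```
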
## RAG
The third approach sits between the other two in terms of complexity: retrieval-augmented generation, or RAG. Instead of trying to store new knowledge inside the model's parameters, RAG retrieves relevant information from an external knowledge store (such as a database) and inserts it into the prompt, so the LLM has that information on hand when answering the query.

The retrieval step can be as simple as keyword search, or it can be more sophisticated semantic search over a database. RAG systems are powerful because they combine an LLM's reasoning with a searchable knowledge store: the store can be updated at any time (and kept on premises) without retraining the model.

If your business has a large corpus of information that the model needs to draw on to produce more accurate responses, RAG is a great avenue.

### Where we are headed
In this week's lessons we will focus on RAG systems, partly because of their importance in industry. We will start with the most brittle version, keyword-based retrieval, to highlight the basic logic of RAG. Then we will move to naive semantic RAG, where chunks of text are converted to embeddings and stored in a vector database. Finally, we will use a framework (LlamaIndex) to recreate the same idea in a more production-ready form with much less code, which is closer to what you will see in industry.

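As a preview of the semantic version, here is a rough sketch of embedding-based retrieval. The documents are placeholders, the embedding call assumes the OpenAI client, and a plain Python list stands in for the vector database that the later lessons will introduce:

```python
# Rough sketch of naive semantic retrieval: embed each chunk, embed the query,
# and return the chunk with the highest cosine similarity.
import numpy as np
from openai import OpenAI

client = OpenAI()


def embed(text: str) -> np.ndarray:
    response = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(response.data[0].embedding)


chunks = [
    "RAG retrieves documents and adds them to the prompt.",
    "Fine-tuning updates the model's weights on new examples.",
]
chunk_vectors = [embed(chunk) for chunk in chunks]  # in practice, stored in a vector DB


def most_similar(query: str) -> str:
    """Return the chunk whose embedding is most similar to the query embedding."""
    q = embed(query)
    sims = [np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)) for v in chunk_vectors]
    return chunks[int(np.argmax(sims))]


print(most_similar("How do I keep the model up to date without retraining?"))
```
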
# Additional materials
There are lots of resources on the problem of hallucination in LLMs. For instance:

- https://www.nngroup.com/articles/ai-hallucinations/

For more on prompt engineering, fine-tuning, and RAG, see the following:

- [Article: RAG vs fine-tuning vs prompt engineering](https://www.k2view.com/blog/rag-vs-fine-tuning-vs-prompt-engineering/#RAG-vs-fine-tuning-vs-prompt-engineering-compared)
- [Video: RAG vs fine-tuning vs prompt engineering](https://www.youtube.com/watch?v=zYGDpG-pTho)

" If you want an LLM to be more truthful, more up-to-date, and more grounded in real data, these are the tools you will reach for."
I think this part could be rephrased because I do not understand it
Reply: Thanks, that is helpful -- I will expand on that!