Introduction to large language models (Lesson 01) #33
Conversation
Converted back to draft, this is way too wordy so I'm going to trim it down a bit before subjecting someone to reviewing this. 😄
roshansuresh left a comment:
Overall the intro lesson reads really well! The intro table of contents shouldn't include Abstraction layers there I think. I've left some small comments in the chapters.
> - **1. Natural language processing**: an overview of the general field of natural language processing
> - **2. Introduction to LLMs**: what are large language models, and how do they work?
> - **3. LLM architecture**: a deeper dive into what makes LLMs tick.
> - **4. Demo: Visualizing embeddings** Visualizing how meanings are represented in LLMs.
For clarity, I would have it say "Demo - Visualizing embeddings:"
> Ultimately, NLP aims to make human-computer interaction as intuitive as human-to-human exchanges, so it can be used in fields as diverse as healthcare diagnostics, explaining complex legal documents, and personalized education.
> ### NLP Methods
Perhaps we can call this "NLP Methods - A History" or something?
> There are really *two* different learning modes for LLMs. First, by training on huge bodies of text in the next-word-prediction task, we end up with what are often called *foundational*, *pretrained*, or *base* models. These are general purpose models that embody information from extremely broad sources.
> However, these foundational models don't work well in special-purpose jobs like personal assistants, chatbots, etc. A *second* second training step fine-tunes these models on smaller, labeled datasets for specific applications.
This is actually a really important point. I think we should reiterate this in the chatbots lesson
Also, a typo in second second
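For anyone following the thread, a minimal sketch of what that second (fine-tuning) step typically looks like in PyTorch. This is illustrative only and not from the lesson: the model and dataset are placeholders, and the model is assumed to return an object with a `.loss` attribute when given labels, as in common instruction-tuning setups.

```python
# Toy sketch: continue training a *pretrained* model on a small labeled dataset.
import torch
from torch.utils.data import DataLoader

def fine_tune(pretrained_model, labeled_dataset, epochs=3, lr=1e-5):
    optimizer = torch.optim.AdamW(pretrained_model.parameters(), lr=lr)
    loader = DataLoader(labeled_dataset, batch_size=8, shuffle=True)
    pretrained_model.train()
    for _ in range(epochs):
        for batch in loader:
            optimizer.zero_grad()
            loss = pretrained_model(**batch).loss  # supervised loss on labeled examples
            loss.backward()
            optimizer.step()
    return pretrained_model
```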
> If you have ever been using ChatGPT and it has asked you to rank two responses, this is OpenAI collecting data for future rounds of RLHF.
> The result with fine-tuning is the production of specialized models built on top of the same foundation. While in this course we will not go through the process of building your own LLM, the excellent book [Build a Large Language Model from Scratch](https://www.manning.com/books/build-a-large-language-model-from-scratch) by Sebastian Raschka, walks you through this in detailk using PyTorch if you are interested. The above picture is adapted from his book.
typo in "detailk*
> In the next section we will dig into the details about how LLMS actually work. As we said, it isn't just that they are *large*, but their *architecture*, that makes them so powerful.
typo in "LLMS" (maybe fine but its probably better to have it be LLMs)
> For a quick overview of LLM function, [check out this video](https://www.youtube.com/watch?v=5sLYAQS9sWQ).
> ## 4. Demo: Visualizing embeddings
> In the following demonstration we will visualize text embeddings based on their semantic similarity. Instead of similareity of single words like `apple` and `phone`, we will look at similarities among entire *sentences*.
typo in "instead of similareity"
> We first need to load our OpenAI API key so you can use the embedding model that is part of their suite of models. We will discuss this more in the lesson on chat completions
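As a side note for reviewers, the embedding call being described presumably looks something like the sketch below. This assumes the official `openai` Python package with an `OPENAI_API_KEY` set in the environment; the sentences and model name are made-up examples, not taken from the lesson.

```python
# Fetch sentence embeddings from OpenAI's embeddings endpoint.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

sentences = [
    "A group of toys comes to life when their owner leaves the room.",
    "An ex-cop battles thieves in a Los Angeles skyscraper on Christmas Eve.",
]
response = client.embeddings.create(
    model="text-embedding-3-small",  # illustrative model choice
    input=sentences,
)
vectors = [item.embedding for item in response.data]
print(len(vectors), len(vectors[0]))  # number of sentences, embedding dimension
```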
> > TODO: handle this discussion better -- put in README for this week and discuss handling of api keys, where to put it, that it will recursively seach parent dsirectories up to root). Please be sure not to share your API key!
You may want to remove this
> ```python
> plt.show()
> ```
> This mapping shows that similar movies tend to cluster together in the embedding space. E.g., Christmas movies in one region of the space, romcons in another (note Love Actually is a Christmas Romcom). Feel free to drop new summaries in to see where they fall in this movie map.
Do we show the plot here or just ask them to run this code themselves?
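For context, the plot in question is presumably produced by something like the following hedged sketch: PCA is used here for the 2-D reduction, though the lesson may use a different method, and `vectors` and `sentences` are assumed to come from the embedding step quoted above.

```python
# Reduce the embedding vectors to 2-D and scatter-plot them as a "movie map".
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

coords = PCA(n_components=2).fit_transform(np.array(vectors))

fig, ax = plt.subplots()
ax.scatter(coords[:, 0], coords[:, 1])
for (x, y), label in zip(coords, sentences):
    ax.annotate(label[:30], (x, y), fontsize=8)  # truncate long summaries for readability
ax.set_title("Text summaries in embedding space")
plt.show()
```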
> > To fill in later: brief motivational preview here. Briefly explain why this lesson matters, what students will be able to do by the end, and what topics will be covered. Keep it tight and motivating.
> > For an introduction to the course, and a discussion of how to set up your environment, please see the [Welcome](../README.md) page.
We can remove the prior statement
> 2. [OpenAI Chat Completions API](02_open_ai_api.md)
>    Intro and overview of openai api chat completions endpoint. Go over required params (messages/model), but also the important optional params (max_tokens, temperature, top_p etc). Mention responses endpoint (more friendly to tools/agents). Discuss and demonstrate use of moderations endpoint.
> 3. [Abstraction layers](03_abstractions.md)
We don't have a separate Abstraction layers chapter as far as I know
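Since the outline above mentions the required and optional Chat Completions parameters, here is a minimal sketch of what that chapter's example call might look like. It assumes the `openai` package and an `OPENAI_API_KEY` in the environment; the model name and prompts are illustrative, not from the lesson.

```python
from openai import OpenAI

client = OpenAI()

# Required params: model and messages; common optional params shown below.
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful teaching assistant."},
        {"role": "user", "content": "In one sentence, what is an embedding?"},
    ],
    max_tokens=60,    # cap on generated tokens
    temperature=0.7,  # sampling randomness
    top_p=1.0,        # nucleus-sampling cutoff
)
print(completion.choices[0].message.content)

# The moderations endpoint mentioned in the outline can screen text for policy issues.
moderation = client.moderations.create(input="Some user-supplied text to screen")
print(moderation.results[0].flagged)
```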
Overview of natural language processing and LLMs. Closes #27
Will mainly contribute `python-200/lessons/05_AI_intro/01_intro_nlp_llms.md`.