I just built a system called Cortex with my AI partner Claude. It's a four-layer cognitive memory system for AI agents. One piece of it automatically saves session context (what was worked on, files edited, user intent) into a vector database so future sessions can semantically search for past work. Here's the concrete system I built, so you have real examples to reference:

**The pipeline:**

1. When a Claude Code session closes, a Stop hook fires
2. A Python script parses the session transcript (a JSONL file with every user message, assistant response, and tool call)
3. It extracts key info: what the user asked for, what files were modified, what tools were used, what project it was in
4. It builds a plain-text summary of the session (~500-1000 chars)
5. That summary text gets stored in ChromaDB
6. ChromaDB runs it through an embedding model (all-MiniLM-L6-v2), which produces a 384-dimensional vector
7. The vector goes into an HNSW index for fast nearest-neighbor search
8. Metadata (wing, room, session_id, file list, timestamps) gets stored in SQLite alongside the vector
9. Later, when I search "what was I doing with the Legion Go 2 migration", that query text goes through the SAME embedding model, becomes a 384-dim vector, and ChromaDB finds the closest stored vectors by cosine distance

**What I want you to teach me:**

I'm a software engineer with 20 years' experience (C#/.NET, infrastructure, networking), but I'm new to ML/AI internals. I understand the system I built at an architecture level, but I want to actually understand what's happening under the hood. Walk me through these topics in order, building on each one. Use my Cortex system as the running example throughout so it's concrete, not abstract.

1. **Tokenization** — What is WordPiece? Why does "a24f45e7" become ["a", "##24", "##f", "##45", "##e", "##7"]? What do the ## symbols mean? Why not just split on spaces? How big is the vocabulary? What happens to words it's never seen?
2. **Embeddings** — What is an embedding layer? How does a token ID (an integer) become a 384-dimensional vector? What do those 384 numbers actually represent? Are they learned during training or computed on the fly? What does it mean for two vectors to be "close"?
3. **Self-Attention / Transformers** — How does the model know that "Legion" + "Go" together means a gaming device, not the verb "go"? What is self-attention doing mechanically? What are the 6 layers doing differently from each other? Keep it intuitive but don't oversimplify.
4. **Mean Pooling** — Why average all the token vectors into one? What information is lost? Are there alternatives (CLS token, max pooling)? Why does all-MiniLM-L6-v2 use mean pooling specifically?
5. **The embedding model itself (all-MiniLM-L6-v2)** — What was it trained on? How was it trained to make similar sentences produce similar vectors? What is contrastive learning / sentence-pair training? Why this model specifically vs larger ones? What are the tradeoffs at 384 dimensions vs 768 or 1536?
6. **Cosine Distance** — Why cosine and not Euclidean? What does the angle between two 384-dim vectors actually mean conceptually? When would cosine fail?
7. **HNSW (Hierarchical Navigable Small World)** — How does it find nearest neighbors in O(log N) instead of comparing every vector? What's the graph structure? What are the layers in "hierarchical"? How does it trade accuracy for speed?
8. **ChromaDB specifically** — How does it orchestrate all of this? Where does SQLite fit vs the vector index? When I do a filtered search (wing="sessions"), what order do the operations happen in? Why can't a second process see vectors written by the first process until restart (the stale HNSW index issue I hit)?
9. **The full pipeline end-to-end** — Walk through my exact flow one more time with everything you've taught me, so I can see how all the pieces connect. Text in, searchable memory out.
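To ground topics 2, 4, and 6 while you explain them, here's a toy, stdlib-only sketch of the core math (embedding lookup, mean pooling, cosine similarity). Every number and mapping is invented for illustration — made-up 4-dimensional vectors for three pretend tokens, not the real learned ~30k × 384 table:

```python
import math

# Toy "embedding table": token ID -> made-up 4-dim vector.
# (The real model looks rows up in a learned vocab_size x 384 matrix.)
EMBEDDINGS = {
    0: [0.9, 0.1, 0.0, 0.2],   # pretend token: "legion"
    1: [0.8, 0.2, 0.1, 0.3],   # pretend token: "go"
    2: [0.0, 0.9, 0.8, 0.1],   # pretend token: "migration"
}

def embed_sentence(token_ids):
    """Mean pooling: average the per-token vectors into one sentence vector."""
    dim = len(next(iter(EMBEDDINGS.values())))
    sums = [0.0] * dim
    for tid in token_ids:
        for i, x in enumerate(EMBEDDINGS[tid]):
            sums[i] += x
    return [s / len(token_ids) for s in sums]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot product over magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(y * y for y in b))
    return dot / (mag_a * mag_b)

query = embed_sentence([0, 1])     # pretend query: "legion go"
doc_a = embed_sentence([0, 1, 2])  # pretend doc: "legion go migration"
doc_b = embed_sentence([2])        # pretend doc: "migration"

# The doc that overlaps the query more sits at a smaller angle to it.
print(cosine_similarity(query, doc_a) > cosine_similarity(query, doc_b))  # True
```

Feel free to correct anywhere this toy picture diverges from what the real model does.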
**Style:** I learn best with concrete examples, analogies to things I already know (databases, networking, data structures), and building from simple to complex. Don't dumb it down but don't assume I know ML jargon without defining it first. If something has a simple intuition AND a mathematical reality, give me both. I like knowing why things were designed the way they were, not just what they do.
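And if it helps to have working toy code to critique or extend for topic 1, here's my stdlib-only guess at the greedy longest-match-first idea behind WordPiece. The vocabulary is tiny and entirely invented (the real one has tens of thousands of entries), and the unknown-word handling is simplified:

```python
# Tiny invented vocabulary; "##" marks a piece that continues a word.
VOCAB = {"a", "the", "session", "migration",
         "##24", "##f", "##45", "##e", "##7", "##s"}

def wordpiece(word, unk="[UNK]"):
    """Greedy longest-match-first subword split, WordPiece-style."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, shrinking until a vocab hit.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces get ## prefixed
            if candidate in VOCAB:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matches: treat the whole word as unknown
        pieces.append(piece)
        start = end
    return pieces

print(wordpiece("a24f45e7"))  # ['a', '##24', '##f', '##45', '##e', '##7']
print(wordpiece("sessions"))  # ['session', '##s']
```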