<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>lettuceresearch</title><description>A personal research archive for AI work in progress — published openly, discussed in the open.</description><link>https://lettuceresearch.com/</link><language>en-us</language><item><title>Training across many GPUs without losing your mind</title><link>https://lettuceresearch.com/articles/distributed-training/</link><guid isPermaLink="true">https://lettuceresearch.com/articles/distributed-training/</guid><description>When one GPU is too slow, the trick is wonderfully simple: put a full copy of your model on every GPU, feed each copy a different slice of the data, and then average their lessons so all the copies stay perfectly in sync. Here is the whole idea, built up slowly, assuming you have never touched PyTorch.</description><pubDate>Fri, 19 Jun 2026 00:00:00 GMT</pubDate><category>PyTorch for LLMs</category><category>pytorch</category><category>distributed</category><category>ddp</category><category>multi-gpu</category><category>interactive</category></item><item><title>Weight initialization and the trick of sharing one matrix</title><link>https://lettuceresearch.com/articles/weight-init/</link><guid isPermaLink="true">https://lettuceresearch.com/articles/weight-init/</guid><description>Before a network learns anything, every one of its numbers has to start somewhere. The starting values turn out to matter enormously: pick them too big or too small and learning stalls before it begins. Here is how to choose them well, plus a lovely space-saving trick called weight tying, all explained assuming you have never touched PyTorch.</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>PyTorch for LLMs</category><category>pytorch</category><category>initialization</category><category>weight-tying</category><category>interactive</category></item><item><title>Saving and loading checkpoints so you never lose a training run</title><link>https://lettuceresearch.com/articles/gradient-checkpointing/</link><guid isPermaLink="true">https://lettuceresearch.com/articles/gradient-checkpointing/</guid><description>A checkpoint is much more than the model&apos;s weights. To pick up a training run exactly where it stopped, you also need the optimizer&apos;s memory, the step counter, and the random number state. We will build that complete bundle together, save it, load it back, and see why each piece matters, all assuming you have never touched PyTorch.</description><pubDate>Wed, 17 Jun 2026 00:00:00 GMT</pubDate><category>PyTorch for LLMs</category><category>pytorch</category><category>checkpoints</category><category>training</category><category>interactive</category></item><item><title>How a language model writes one word at a time</title><link>https://lettuceresearch.com/articles/generation/</link><guid isPermaLink="true">https://lettuceresearch.com/articles/generation/</guid><description>A language model does not write whole sentences in one go. It guesses the next word, adds it, then guesses again, over and over. A few friendly knobs let you steer how adventurous those guesses are, and you can turn each one yourself and watch a sentence appear.</description><pubDate>Tue, 16 Jun 2026 00:00:00 GMT</pubDate><category>PyTorch for LLMs</category><category>pytorch</category><category>generation</category><category>sampling</category><category>inference</category><category>interactive</category></item><item><title>Fitting a bigger model into a smaller machine</title><link>https://lettuceresearch.com/articles/scaling-up/</link><guid isPermaLink="true">https://lettuceresearch.com/articles/scaling-up/</guid><description>Sometimes the model you want to train is simply too big for the memory you have. Two gentle tricks come to the rescue: gradient accumulation lets you act as if you trained on a large batch while only ever holding a tiny one, and activation checkpointing trades a little extra computing time for a much smaller memory footprint. We build both up slowly, assuming you have never touched PyTorch.</description><pubDate>Mon, 15 Jun 2026 00:00:00 GMT</pubDate><category>PyTorch for LLMs</category><category>pytorch</category><category>memory</category><category>checkpointing</category><category>scaling</category><category>interactive</category></item><item><title>Mixed precision and the art of using fewer bits</title><link>https://lettuceresearch.com/articles/mixed-precision/</link><guid isPermaLink="true">https://lettuceresearch.com/articles/mixed-precision/</guid><description>Numbers inside a neural network are usually stored in a big, careful format. If we switch most of them to a smaller, lighter format, training gets roughly twice as fast and uses about half the memory. The whole skill is knowing which few numbers we must leave in the careful format, and this walks you through it slowly, assuming you have never touched PyTorch.</description><pubDate>Sun, 14 Jun 2026 00:00:00 GMT</pubDate><category>PyTorch for LLMs</category><category>pytorch</category><category>mixed-precision</category><category>bf16</category><category>autocast</category><category>interactive</category></item><item><title>The training loop, where a model actually learns</title><link>https://lettuceresearch.com/articles/training-loop/</link><guid isPermaLink="true">https://lettuceresearch.com/articles/training-loop/</guid><description>This is the heartbeat of every neural network. One step is just five small moves done in a fixed order, repeated again and again, and slowly the model gets better. We will walk through each move gently, assuming you have never written a line of PyTorch.</description><pubDate>Sat, 13 Jun 2026 00:00:00 GMT</pubDate><category>PyTorch for LLMs</category><category>pytorch</category><category>training</category><category>optimization</category><category>interactive</category></item><item><title>Learning rate schedules, and why the speed of learning changes over time</title><link>https://lettuceresearch.com/articles/lr-schedules/</link><guid isPermaLink="true">https://lettuceresearch.com/articles/lr-schedules/</guid><description>A model learns by taking small steps, and how big those steps are matters more than almost anything else. Start too fast and everything falls apart; stay slow forever and you never arrive. Here is the gentle idea of changing your step size as training goes on, built up slowly, assuming you have never touched PyTorch.</description><pubDate>Fri, 12 Jun 2026 00:00:00 GMT</pubDate><category>PyTorch for LLMs</category><category>pytorch</category><category>learning-rate</category><category>scheduling</category><category>training</category><category>interactive</category></item><item><title>Optimizers, and how a model actually learns with AdamW</title><link>https://lettuceresearch.com/articles/optimizers/</link><guid isPermaLink="true">https://lettuceresearch.com/articles/optimizers/</guid><description>A model learns by nudging its numbers in the right direction, over and over. The optimizer is the part that decides how big each nudge should be, and AdamW is the one almost everyone reaches for. Here is the whole idea, built up slowly, assuming you have never touched PyTorch.</description><pubDate>Thu, 11 Jun 2026 00:00:00 GMT</pubDate><category>PyTorch for LLMs</category><category>pytorch</category><category>optimizers</category><category>adamw</category><category>interactive</category></item><item><title>Cross-entropy loss made friendly</title><link>https://lettuceresearch.com/articles/cross-entropy/</link><guid isPermaLink="true">https://lettuceresearch.com/articles/cross-entropy/</guid><description>Every time a language model guesses the next word, we need one honest number that says how good that guess was. Cross-entropy is that number, and the rule it follows is simple: being confident and right is cheap, while being confident and wrong is expensive. Here is the whole idea, built up slowly, assuming you have never touched PyTorch.</description><pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate><category>PyTorch for LLMs</category><category>pytorch</category><category>cross-entropy</category><category>loss</category><category>interactive</category></item><item><title>Dropout, and why we switch neurons off on purpose</title><link>https://lettuceresearch.com/articles/dropout-and-regularization/</link><guid isPermaLink="true">https://lettuceresearch.com/articles/dropout-and-regularization/</guid><description>When a network leans too hard on a few neurons, it memorises instead of truly learning. Dropout gently breaks that habit by hiding random neurons while the model trains, and one small rescaling trick keeps everything in balance. Here is the whole idea, built up slowly, assuming you have never touched PyTorch.</description><pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate><category>PyTorch for LLMs</category><category>pytorch</category><category>regularization</category><category>dropout</category><category>interactive</category></item><item><title>Self-attention from scratch</title><link>https://lettuceresearch.com/articles/self-attention/</link><guid isPermaLink="true">https://lettuceresearch.com/articles/self-attention/</guid><description>Imagine every word in a sentence quietly asking the words before it, who here matters to me right now. Self-attention is exactly that conversation, written out in math. We will build it up one gentle step at a time, assuming you have never touched PyTorch.</description><pubDate>Mon, 08 Jun 2026 00:00:00 GMT</pubDate><category>PyTorch for LLMs</category><category>pytorch</category><category>attention</category><category>transformers</category><category>interactive</category></item><item><title>Activations and softmax, the curves that bring a network to life</title><link>https://lettuceresearch.com/articles/activations/</link><guid isPermaLink="true">https://lettuceresearch.com/articles/activations/</guid><description>A deep network needs a little bend in it, or every layer just collapses back into one. Here we meet the small functions that add that bend, and then meet softmax, the trick that turns raw scores into honest probabilities. We build both up slowly, assuming you have never opened PyTorch.</description><pubDate>Sun, 07 Jun 2026 00:00:00 GMT</pubDate><category>PyTorch for LLMs</category><category>pytorch</category><category>activations</category><category>softmax</category><category>interactive</category></item><item><title>LayerNorm and RMSNorm, the gentle reset button for deep networks</title><link>https://lettuceresearch.com/articles/normalization/</link><guid isPermaLink="true">https://lettuceresearch.com/articles/normalization/</guid><description>Stack enough layers and the numbers flowing through a network tend to drift, growing huge or shrinking to nothing until training falls apart. Normalization is a small, reliable trick that resets those numbers to a sane size at every layer. Here is the whole idea, built up slowly, assuming you have never touched PyTorch.</description><pubDate>Sat, 06 Jun 2026 00:00:00 GMT</pubDate><category>PyTorch for LLMs</category><category>pytorch</category><category>normalization</category><category>layernorm</category><category>rmsnorm</category><category>interactive</category></item><item><title>nn.Linear, the workhorse behind almost everything</title><link>https://lettuceresearch.com/articles/linear-layers/</link><guid isPermaLink="true">https://lettuceresearch.com/articles/linear-layers/</guid><description>Inside a language model, the same simple building block shows up again and again: the attention projections, the feed forward block, the final layer that picks the next word. They are all one humble layer called nn.Linear. Here is the whole idea, built up slowly, assuming you have never touched PyTorch.</description><pubDate>Fri, 05 Jun 2026 00:00:00 GMT</pubDate><category>PyTorch for LLMs</category><category>pytorch</category><category>linear</category><category>projections</category><category>interactive</category></item><item><title>Embeddings, the lookup table that turns words into vectors</title><link>https://lettuceresearch.com/articles/embeddings/</link><guid isPermaLink="true">https://lettuceresearch.com/articles/embeddings/</guid><description>Before a language model can do any math, it has to turn words into numbers. An embedding is just a trainable lookup table that hands each word its own little list of numbers, and the model slowly tunes those numbers as it learns. Here is the whole idea, built up gently, assuming you have never touched PyTorch.</description><pubDate>Thu, 04 Jun 2026 00:00:00 GMT</pubDate><category>PyTorch for LLMs</category><category>pytorch</category><category>embedding</category><category>transformers</category><category>interactive</category></item><item><title>nn.Module, the building block you assemble models from</title><link>https://lettuceresearch.com/articles/modules/</link><guid isPermaLink="true">https://lettuceresearch.com/articles/modules/</guid><description>Every layer and every full model in PyTorch is built from one friendly class called nn.Module. It quietly keeps track of all the numbers your model needs to learn, lets you save and load the whole thing in one line, and switches the entire model between training and using mode with a single call. Here is the whole idea, built up slowly, assuming you have never touched PyTorch.</description><pubDate>Wed, 03 Jun 2026 00:00:00 GMT</pubDate><category>PyTorch for LLMs</category><category>pytorch</category><category>nn-module</category><category>parameters</category><category>interactive</category></item><item><title>Autograd, the engine that figures out gradients for you</title><link>https://lettuceresearch.com/articles/autograd/</link><guid isPermaLink="true">https://lettuceresearch.com/articles/autograd/</guid><description>You only ever write the easy part, the forward calculation. PyTorch quietly remembers everything you did and then hands you every gradient for free. We will build a tiny example, press one button, and watch the numbers flow backward. No PyTorch experience needed.</description><pubDate>Tue, 02 Jun 2026 00:00:00 GMT</pubDate><category>PyTorch for LLMs</category><category>pytorch</category><category>autograd</category><category>gradients</category><category>backprop</category><category>interactive</category></item><item><title>Tensors, the one object that PyTorch is built on</title><link>https://lettuceresearch.com/articles/tensors-the-atom-of-everything/</link><guid isPermaLink="true">https://lettuceresearch.com/articles/tensors-the-atom-of-everything/</guid><description>Every number inside a language model lives in a tensor, so if you understand this one object you understand most of PyTorch. We will build the idea up slowly by poking at live ones: change a shape, swap a stride, hide the future, and watch the numbers move. No prior PyTorch needed.</description><pubDate>Mon, 01 Jun 2026 00:00:00 GMT</pubDate><category>PyTorch for LLMs</category><category>pytorch</category><category>tensors</category><category>foundations</category><category>interactive</category></item></channel></rss>