Vol. 312 · Feb 26, 2026

The smartest five minutes in ML.

Overnight arXiv papers. Kaggle shakeups. Production war stories. Distilled while you slept — lands before the first coffee cools.

Start Your Morning Briefing · Read Yesterday's Digest ↗

14,200+ ML practitioners reading daily

train.py

$ python train.py --model llama3-finetune --lr 2e-5

Epoch 1/5 loss: 2.341 val_loss: 2.198 acc: 0.61

Epoch 2/5 loss: 1.876 val_loss: 1.754 acc: 0.71

Epoch 3/5 loss: 1.423 val_loss: 1.389 acc: 0.78

Epoch 4/5 loss: 1.102 val_loss: 1.087 acc: 0.83

Epoch 5/5 loss: 0.891 val_loss: 0.904 acc: 0.87

✓ Checkpoint saved → ./ckpt/llama3-ft-ep5.pt

$ wandb sync ./runs/llama3-finetune-0226

Syncing run "llama3-ft-0226" to W&B ...

$ python eval.py --split test --batch 32

Loading checkpoint from ./ckpt/llama3-ft-ep5.pt

Running evaluation on 4,218 samples ...

BLEU-4: 38.7 ROUGE-L: 0.621 BERTScore: 0.891

$ git add . && git commit -m "feat: add LoRA adapter"

[main 3f8a2c1] feat: add LoRA adapter — 4 files changed



Today's Edition

14 papers · 6 tools · 3 threads

~4 min read


arXiv · Kaggle · HuggingFace · GitHub · Papers With Code · Weights & Biases · MLflow · OpenReview · Distill.pub · The Gradient

01 / Overnight Papers · Fresh off the arXiv press · Feb 26

Six papers worth your attention this morning — each distilled to the one finding that changes how you think about the problem.

arXiv · cs.LG · 4 min

Flash Attention 3 cuts memory by 40% on H100s, and the math is surprisingly clean

Tri Dao's team rewrote the tiling algorithm to exploit hardware asynchrony. The key insight: overlapping GEMM and softmax reduces HBM round-trips by 2×. Production benchmarks show 1.8× throughput on 8k context.

Tri Dao et al. · #attention
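The tiling idea behind FlashAttention-style kernels rests on the online-softmax trick: attention over a long row of keys can be computed one tile at a time, keeping only a running max, a running denominator, and an unnormalized output. A minimal pure-Python sketch for a single query row follows; the function names and toy vectors are hypothetical, and it does not model the GEMM/softmax overlap, hardware asynchrony, or HBM traffic the card describes.

```python
import math

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def naive_attention_row(q, keys, values):
    """Reference: softmax(q . K^T) @ V for one query, with all scores in memory."""
    scores = [_dot(q, k) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]

def tiled_attention_row(q, keys, values, tile_size=2):
    """Same result, but K/V are visited one tile at a time and only a running
    max, a running softmax denominator, and an unnormalized output are kept."""
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), tile_size):
        k_tile = keys[start:start + tile_size]
        v_tile = values[start:start + tile_size]
        scores = [_dot(q, k) for k in k_tile]
        new_max = max(running_max, max(scores))
        # Rescale earlier contributions so they are relative to the new max.
        scale = math.exp(running_max - new_max)  # exp(-inf) == 0.0 on the first tile
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_tile):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    # Normalize once, after every tile has been seen.
    return [a / running_sum for a in acc]

q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5], [0.3, 0.3]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
print(naive_attention_row(q, keys, values))
print(tiled_attention_row(q, keys, values, tile_size=2))
```

Because the rescaling step is exact, the tile size changes only memory use, not the result, which is what lets real kernels keep a tile in fast on-chip memory.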

arXiv · cs.CL · 3 min

Mixture-of-Depths makes transformer inference 3× cheaper by skipping easy tokens

DeepMind · #efficiency

HuggingFace · 5 min

Phi-4 hits GPT-4 parity on MMLU with 14B params — the distillation recipe is open

Microsoft Research · #llm

arXiv · stat.ML · 6 min

Why your RAG pipeline retrieves the wrong chunks 31% of the time (and a fix)

Ablations of late vs. early chunking across 12 corpora. The culprit: meaning that spans a chunk boundary gets cut in two. A sliding window with 15% overlap drops the retrieval miss rate from 31% to 8%.

Jina AI · #rag
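The card gives the overlap ratio but not the chunk size, the tokenizer, or whether the window is measured in tokens or characters. The sketch below assumes a pre-tokenized document and a hypothetical 200-token chunk size; it shows plain sliding-window chunking with 15% overlap, not Jina AI's late-chunking method.

```python
def sliding_window_chunks(tokens, chunk_size=200, overlap_ratio=0.15):
    """Split a token list into fixed-size chunks whose start positions are
    spaced so consecutive chunks share about overlap_ratio * chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")
    overlap = round(chunk_size * overlap_ratio)
    stride = max(1, chunk_size - overlap)  # never step by zero
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this chunk already reaches the end of the document
    return chunks

tokens = list(range(500))  # stand-in for 500 token ids
chunks = sliding_window_chunks(tokens)
print([(c[0], c[-1]) for c in chunks])  # [(0, 199), (170, 369), (340, 499)]
```

A sentence that straddles position 185, for example, now appears whole in both the first and second chunk, so at least one of them can match a query about it.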

arXiv · cs.CV · 2 min

Segment Anything 2 video mode now runs in real time on a single RTX 3090

Meta AI · #vision

arXiv · cs.LG · 4 min

RLHF overfits: reward hacking shows up at 1,000 preference pairs

Anthropic · #alignment

02 / Workshop Bench · Tooling · Releases · Deployment

Index cards from the workshop bench — the releases and snippets your pipeline needs to know about before the sprint ends.

GitHub · Release · 3 min

LangChain 0.3 drops callback hell: the new streaming API is actually pleasant

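The new .stream() interface replaces nested callbacks with iterators, and .astream() provides the async-iterator version. Migration guide: swap chain.run() for chain.invoke() when you want one result, or loop over chain.stream() (async for over chain.astream() in async code) when you want output as it arrives. Breaking change: output parsers now require an explicit schema.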

LangChain · #llmops
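The shift from callbacks to iterators is easiest to see side by side. The sketch below does not import LangChain; `run_with_callback`, `astream`, and the fixed `TOKENS` list are hypothetical stand-ins for a model's output, used only to contrast a producer that pushes tokens into a handler with a consumer that pulls them via `async for`.

```python
import asyncio
from typing import AsyncIterator, Callable

TOKENS = ["Hello", ",", " world", "!"]  # hypothetical stand-in for model output

def run_with_callback(on_token: Callable[[str], None]) -> None:
    """Callback style: the producer pushes each token into a handler."""
    for tok in TOKENS:
        on_token(tok)

async def astream() -> AsyncIterator[str]:
    """Iterator style: the consumer pulls tokens with `async for`."""
    for tok in TOKENS:
        await asyncio.sleep(0)  # yield control, as a real network read would
        yield tok

async def collect() -> str:
    parts = []
    async for tok in astream():
        parts.append(tok)  # ordinary control flow, no handler object needed
    return "".join(parts)

collected = []
run_with_callback(collected.append)
print("".join(collected))      # Hello, world!
print(asyncio.run(collect()))  # Hello, world!
```

With the iterator version, early exit, error handling, and post-processing are plain `break`, `try`, and loop code in the caller rather than logic spread across callback handlers.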

GitHub · v2.1 · 2 min

MLflow 2.1 adds native LLM tracing — finally see what your chain is actually doing

Databricks · #observability

GitHub · New · 4 min

Outlines: constrained generation that makes LLMs output valid JSON, every time

No more retrying until the JSON parses. Outlines uses finite-state machine masking to zero out the probability of any token that would break the format. It works with any HF model and adds ~3 ms of overhead per token.

.txt / dottxt-ai · #structured-gen
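A toy sketch of FSM-guided masking, under loud assumptions: the "schema" is just two allowed strings, the vocabulary and the preference scores standing in for model logits are hypothetical, and this is not the Outlines API. It does follow the same shape: precompute, for each FSM state (here, each node of a character trie), which vocabulary tokens are legal, then set every illegal token's logit to negative infinity before picking one.

```python
import json
import math

EOS = "<eos>"
# Hypothetical toy setup: the only outputs the "schema" accepts, and a tiny vocabulary.
VALID_OUTPUTS = ['{"ok": true}', '{"ok": false}']
VOCAB = ['{"', 'ok', '": ', 'true', 'false', '}', 'maybe', ' ', EOS]

def build_index(valid_outputs, vocab):
    """Map every FSM state to the set of token ids allowed from it.
    States are the prefixes of valid outputs (the nodes of a character trie)."""
    states = {out[:i] for out in valid_outputs for i in range(len(out) + 1)}
    return {
        state: {
            tid for tid, tok in enumerate(vocab)
            if (tok == EOS and state in valid_outputs)
            or (tok != EOS and state + tok in states)
        }
        for state in states
    }

def constrained_greedy(logits_fn, index, max_steps=20):
    text = ""
    for _ in range(max_steps):
        logits = logits_fn(text)
        allowed = index[text]
        if not allowed:
            raise RuntimeError(f"no valid continuation from {text!r}")
        # Mask: illegal tokens get -inf, so argmax can only pick a legal one.
        masked = [logit if i in allowed else -math.inf for i, logit in enumerate(logits)]
        best = max(range(len(masked)), key=masked.__getitem__)
        if VOCAB[best] == EOS:
            return text
        text += VOCAB[best]
    raise RuntimeError("max_steps reached before EOS")

# Hypothetical "model" that prefers the wrong tokens ('maybe', ' ') at every step.
PREFS = {'maybe': 5.0, ' ': 4.0, 'false': 3.0, 'true': 2.0,
         '{"': 1.0, 'ok': 1.0, '": ': 1.0, '}': 1.0, EOS: 0.0}

def toy_logits(text):
    return [PREFS[tok] for tok in VOCAB]

index = build_index(VALID_OUTPUTS, VOCAB)
result = constrained_greedy(toy_logits, index)
print(result, json.loads(result))
```

Even though the toy model ranks 'maybe' highest at every step, the mask leaves it no legal slot, so the output always parses.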

GitHub · Patch · 2 min

vLLM 0.4 fixes the KV cache fragmentation bug that caused OOM on long contexts

vLLM Team · #inference

GitHub · Alpha · 3 min

Marimo: reactive Python notebooks where every cell reruns correctly on change

Marimo · #notebooks

HF · Dataset · 3 min

FineWeb-Edu: 1.3T tokens of filtered educational text — better than FineWeb for reasoning

HuggingFace · #pretraining

Past Issue · Feb 19, 2026

Three cards from last Thursday, fully readable

We don't blur the preview. You should know exactly what you're subscribing to.

arXiv · cs.LG · 5 min read

Mamba-2 closes the quality gap with Transformers on long-context tasks

Albert Gu's follow-up to Mamba addresses the weakness in associative recall. The new state space duality framework lets Mamba-2 match attention quality at 16k context while using 4× less memory. The key: structured matrix multiplication replaces the linear scan.

Albert Gu · CMU · #ssm
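The "matrix multiplication replaces the linear scan" point can be checked numerically. The sketch below assumes a single channel with a scalar decay a_t per step (the scalar-times-identity restriction that state space duality relies on) and random hypothetical inputs. It computes y once with the recurrence h_t = a_t h_{t-1} + B_t x_t, y_t = C_t · h_t, and once as one masked matrix product y = (L ∘ C Bᵀ) x, where L[t][s] = a_{s+1} ⋯ a_t for s ≤ t and 0 otherwise. The quadratic form here is only illustrative; as we understand it, Mamba-2's kernel mixes both views chunk by chunk.

```python
import random

def ssm_scan(a, B, C, x):
    """Linear scan: h_t = a_t * h_{t-1} + B_t * x_t ; y_t = <C_t, h_t>."""
    h = [0.0] * len(B[0])
    ys = []
    for a_t, b_t, c_t, x_t in zip(a, B, C, x):
        h = [a_t * hi + bi * x_t for hi, bi in zip(h, b_t)]
        ys.append(sum(ci * hi for ci, hi in zip(c_t, h)))
    return ys

def ssm_masked_matmul(a, B, C, x):
    """Same map as one matrix: M[t][s] = (a_{s+1} * ... * a_t) * <C_t, B_s>
    for s <= t and 0 above the diagonal; then y = M x."""
    T = len(x)
    M = [[0.0] * T for _ in range(T)]
    for t in range(T):
        for s in range(t + 1):
            decay = 1.0
            for k in range(s + 1, t + 1):
                decay *= a[k]
            M[t][s] = decay * sum(ci * bi for ci, bi in zip(C[t], B[s]))
    return [sum(M[t][s] * x[s] for s in range(T)) for t in range(T)]

random.seed(0)
T, N = 6, 3  # sequence length, state size (hypothetical)
a = [random.uniform(0.5, 1.0) for _ in range(T)]
B = [[random.gauss(0, 1) for _ in range(N)] for _ in range(T)]
C = [[random.gauss(0, 1) for _ in range(N)] for _ in range(T)]
x = [random.gauss(0, 1) for _ in range(T)]
print(ssm_scan(a, B, C, x))
print(ssm_masked_matmul(a, B, C, x))
```

The scan is sequential in t, while the matrix form is a batch of dot products that maps onto tensor-core matmuls, which is the hardware argument behind the duality.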

GitHub · v1.4 · 3 min read

Instructor 1.4 adds async batch inference — 10× throughput for structured extraction

The new instructor.batch() API sends requests in parallel and deduplicates identical schemas. For extraction pipelines processing thousands of documents, this changes the unit economics. Migration: wrap your existing extract() calls in batch().

Jason Liu · #structured-gen
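The card's `instructor.batch()` call is not reproduced here. This is a generic asyncio sketch of the two ideas it names: sending requests concurrently (bounded by a semaphore) and skipping duplicates. Reading "deduplicates" as "identical inputs are sent only once" is our assumption, and `extract` is a hypothetical stand-in for one LLM extraction call.

```python
import asyncio

CALL_LOG = []  # records which inputs actually reached the (fake) model

async def extract(document: str) -> dict:
    """Hypothetical stand-in for one structured-extraction LLM call."""
    CALL_LOG.append(document)
    await asyncio.sleep(0.01)  # simulate network latency
    return {"chars": len(document)}

async def batch_extract(documents, max_concurrency=8):
    """Run extractions concurrently, at most max_concurrency in flight,
    and send each distinct document only once."""
    semaphore = asyncio.Semaphore(max_concurrency)
    unique_docs = list(dict.fromkeys(documents))  # dedupe, keep first-seen order

    async def guarded(doc):
        async with semaphore:
            return await extract(doc)

    results = await asyncio.gather(*(guarded(d) for d in unique_docs))
    by_doc = dict(zip(unique_docs, results))
    return [by_doc[d] for d in documents]  # restore the caller's order

docs = ["invoice 17", "receipt", "invoice 17", "contract"]
out = asyncio.run(batch_extract(docs))
print(out)
```

The semaphore keeps a large batch from exceeding provider rate limits; throughput gains come from overlapping the waits, not from making any single call faster.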

X Thread · 4 min read

"Our model had a 94% accuracy. Our users had a 0% trust. Here's what we learned."

Head of ML at Weights & Biases on the gap between offline metrics and user adoption. The finding: users couldn't explain why the model made a decision, so they ignored it. Solution: add a one-sentence rationale to every prediction. Adoption went from 12% to 71%.

Weights & Biases · #trust

Read the full Feb 19 issue ↗

03 / Postcards · Career · Debates · Deadlines

Threads, competition shakeups, and conference deadlines — the community conversations worth joining before the 10 a.m. standup.

X Thread · 5 min

"We replaced our feature store with Redis + a 200-line Python script. 6 months later: no regrets."

Shreya Shankar's thread on pragmatic ML infrastructure is the most-shared post in the ML community this week. 847 retweets. The replies are equally good.

Shreya Shankar · #mlops

X Thread · 4 min

The A/B test that showed our model was right and our metric was wrong

Eugene Yan · #evaluation

Kaggle · Shakeup · 6 min

LLM Science Exam: 2nd place solution used no LLMs — just Wikipedia TF-IDF

Post-deadline shakeup moved the leaderboard by 847 positions. The key insight: retrieval quality mattered more than model size. Full writeup with code.

cdeotte · #competition
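For readers who have not built one, TF-IDF retrieval fits in a few dozen lines. The sketch below is the generic technique only (sparse TF-IDF vectors, L2 normalization, cosine similarity), not the solution's actual preprocessing or index; the three passages stand in for Wikipedia text and are hypothetical. The smoothed idf, ln((1 + n) / (1 + df)) + 1, mirrors scikit-learn's default.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def _vectorize(counts, idf):
    """TF-IDF weights for known terms, L2-normalized so dot product = cosine."""
    vec = {t: c * idf[t] for t, c in counts.items() if t in idf}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def build_tfidf(docs):
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return idf, [_vectorize(Counter(toks), idf) for toks in tokenized]

def retrieve(query, docs, idf, vectors, k=1):
    q = _vectorize(Counter(tokenize(query)), idf)
    scores = [sum(w * vec.get(t, 0.0) for t, w in q.items()) for vec in vectors]
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in ranked[:k]]

docs = [  # hypothetical stand-ins for Wikipedia passages
    "Mitochondria are organelles that produce ATP through cellular respiration.",
    "The photoelectric effect shows that light behaves as discrete photons.",
    "Plate tectonics explains the movement of Earth's lithosphere.",
]
idf, vectors = build_tfidf(docs)
query = "Which organelle produces ATP?"
print(retrieve(query, docs, idf, vectors))
```

The retrieved passage is then pasted into the prompt (or a smaller model's input) as context; note that exact-term matching misses "organelle" vs. "organelles" here, which is why real pipelines often add stemming.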

Community · 1 min

ICLR 2026 deadline extended to March 3rd — you have one more weekend

ICLR · #deadline

Kaggle · Active · 2 min

Child Mind Institute competition: $50K prize, 3 days left, public LB flip risk is high

Kaggle · #competition

X Thread · 4 min

How Notion's ML team reduced embedding inference cost by 70% without changing models

Notion Engineering · #production

Free · No spam · Unsubscribe anytime

Join 14,200 practitioners who read Digest before the standup.

One email, five minutes, every weekday morning. arXiv papers, Kaggle shakeups, and production war stories — curated by engineers for engineers.


14,200+ subscribers · 312 issues published · 4.8★ avg rating · ~4 min average read time
