🦜 Towards Data Science - Medium
@towardsdatascience.com.source.rss----7f60cf5620c9---4@rss-parrot.net
I'm an automated parrot! I relay a website's RSS feed to the Fediverse. Every time a new post appears in the feed, I toot about it. Follow me to get all new posts in your Mastodon timeline!
Brought to you by the RSS Parrot.
---
Your home for data science. A Medium publication sharing concepts, ideas and codes. - Medium
Your feed and you don't want it here? Just e-mail the birb.
Setting Up Your Own Large Language Model
https://towardsdatascience.com/setting-up-your-own-large-language-model/
Published: July 4, 2026 15:00
Still a long way to go, but the future is promising
The post Setting Up Your Own Large Language Model appeared first on Towards Data Science.
Stop Returning Text from RAG: The Typed Answer Contract That Prevents Hallucination
https://towardsdatascience.com/stop-returning-text-from-rag-the-typed-answer-contract-that-prevents-hallucination/
Published: July 4, 2026 13:00
Enterprise Document Intelligence [Vol.1 #8A] - The schema is the contract: every field is a question the pipeline asks the model, and every answer is checkable
AI Agents Explained: What Is a ReAct Loop and How Does It Work?
https://towardsdatascience.com/ai-agents-explained-what-is-a-react-loop-and-how-does-it-work/
Published: July 3, 2026 16:30
How agents reason, act, and observe their way to a final answer, one step at a time
Long Context vs. Short Context Model: When Does a Long Context Model Win?
https://towardsdatascience.com/long-context-vs-short-context-model-when-does-a-long-context-model-win/
Published: July 3, 2026 15:00
Balancing context capability against cost, speed, and data
LLM Wikis Are Over-Engineered — I Replaced Mine With a Pure Python Compiler
https://towardsdatascience.com/llm-wikis-are-over-engineered-i-replaced-mine-with-a-pure-python-compiler/
Published: July 3, 2026 13:30
Most "LLM wikis" use agents, embeddings, and repeated model calls to organize local notes. I built a deterministic alternative: a pure Python compiler that turns messy markdown into a linked, linted wiki using only the standard library. Along the way, I…
The Untaught Lessons of RAG Retrieval: Cosine Is Not the Foundation
https://towardsdatascience.com/the-untaught-lessons-of-rag-retrieval-cosine-is-not-the-foundation/
Published: July 3, 2026 12:00
Enterprise Document Intelligence [Vol.1 #7ter] - Six positions on the retrieval brick that contradict the cosine-first reflex of mainstream RAG
Tokenminning: How to Get More from Your Chatbot for Less
https://towardsdatascience.com/tokenminning-how-to-get-more-from-your-chatbot-for-less/
Published: July 2, 2026 16:30
Tokenmaxxing is out. Real patterns for reducing costs without sacrificing AI effectiveness
Design Loops, Not Prompts
https://towardsdatascience.com/design-loops-not-prompts/
Published: July 2, 2026 15:00
But don't let the model check itself
Time-Series LLMs, Explained with t0-alpha
https://towardsdatascience.com/time-series-llms-explained-with-t0-alpha/
Published: July 2, 2026 13:30
t0-alpha is a decoder-style patch transformer for probabilistic time-series forecasting. Raw series are split into 32-step patches, embedded, processed through causal time-attention and group-attention layers, and decoded into future quantiles rather than…
The Untaught Lessons of RAG Question Parsing: Structure Before You Search
https://towardsdatascience.com/the-untaught-lessons-of-rag-question-parsing-structure-before-you-search/
Published: July 2, 2026 12:00
Enterprise Document Intelligence [Vol.1 #6ter] - Six positions on the question-parsing brick that contradict the mainstream RAG playbook
Why Powerful ML Is Deceptively Easy — Part 2
https://towardsdatascience.com/why-powerful-ml-is-deceptively-easy-part-2/
Published: July 1, 2026 16:30
The next leakage problem is not only temporal. It is spatial, structural, and coverage-related. (Image: AI-generated illustration created with DALL·E.)
Persistent Latent Memory for Multi-Hop LLM Agents: How a 6G Handover Paper Closes the Agent Cold-Start
https://towardsdatascience.com/persistent-latent-memory-for-multi-hop-llm-agents-how-a-6g-handover-paper-closes-the-agent-cold-start/
Published: July 1, 2026 15:00
Every hand-off in your multi-agent pipeline is an expensive tokenization round-trip. Discover how Inductive Latent Context Persistence (ILCP) transfers a compressed hidden state so downstream agents never have to re-create the same context.
What Can We Do When Memory Becomes the New Bottleneck in Data Engineering?
https://towardsdatascience.com/when-memory-becomes-the-new-bottleneck-in-data-engineering-what-can-we-do/
Published: July 1, 2026 13:30
How Pandas chunking, Dask, and Polars help process millions of records when adding more compute isn't an option.
Build and Run Your Own AI Agent in the Cloud
https://towardsdatascience.com/build-and-run-your-own-ai-agent-in-the-cloud/
Published: July 1, 2026 12:00
Build and deploy an agent on AWS with Strands and AgentCore
Context Engineering for RAG: The Four Typed Inputs Behind Every RAG Answer
https://towardsdatascience.com/context-engineering-for-rag-the-four-typed-inputs-behind-every-rag-answer/
Published: June 30, 2026 16:30
Enterprise Document Intelligence [Vol.1 #7bis] - Tobi Lütke and Andrej Karpathy named the practice in 2025. For a single document, each brick emits typed pieces that converge on one LLM call. Corpus, conversation, and tool extensions are follow-up work.
Surviving the Data Science Behavioral Interview
https://towardsdatascience.com/surviving-the-data-science-behavioral-interview/
Published: June 30, 2026 15:00
In the age of AI, standing out here matters more than ever. Here are three tips to walk into your next interview with confidence.
How to Maximize Codex Exec Command
https://towardsdatascience.com/how-to-maximize-codex-exec-command/
Published: June 30, 2026 13:30
Build a more powerful coding agent setup with a model ensemble
Stop Choosing Between Local and Cloud LLMs: A Field Guide to Hybrid Patterns
https://towardsdatascience.com/stop-choosing-between-local-and-cloud-llms-a-field-guide-to-hybrid-patterns/
Published: June 30, 2026 12:00
A hands-on walkthrough of a hybrid local-cloud workflow using Gemma 4 and GPT-5.4, with reasoning and structured outputs
How Far Can Classical NLP Go? From Bag-of-Words to Stacking on Spooky Author Identification
https://towardsdatascience.com/how-far-can-classical-nlp-go-from-bag-of-words-to-stacking-on-spooky-author-identification/
Published: June 29, 2026 17:34
An end-to-end classical NLP experiment on Kaggle’s Spooky Author Identification task: from Vowpal Wabbit and TF-IDF/NB-SVM baselines to a tuned stacked ensemble, with a compact representation survey of Bag-of-Words, BM25, Word2Vec, and FastText for…
Prompt Engineering Fails Quietly — Prompt Regression Is Why
https://towardsdatascience.com/prompt-engineering-fails-quietly-prompt-regression-is-why/
Published: June 29, 2026 15:00
Small prompt changes can silently break critical behavior in production. This article introduces a practical framework to detect hidden regressions before users notice.
I Completed Five Years in Analytics Consulting: 5 Lessons That Changed How I Work
https://towardsdatascience.com/i-completed-five-years-in-analytics-consulting-5-lessons-that-changed-how-i-work/
Published: June 29, 2026 13:30
The tools I use for analytics and reporting have changed more than I expected, yet my questions for any analytics project haven't moved much.
How to Choose Between Small and Frontier Models
https://towardsdatascience.com/how-to-choose-between-small-and-frontier-models/
Published: June 29, 2026 12:00
The rise of small language models
Tail Control: The Counterintuitive Engineering of Reliable Agentic Workflows
https://towardsdatascience.com/tail-control-the-counterintuitive-engineering-of-reliable-agentic-workflows/
Published: June 28, 2026 15:00
Behind a customer's API, a high-quality answer isn't enough. It has to be usable, which means on time. Delivering that consistently is a problem about variance, not speed, and the fixes are counterintuitive.
I Pitted XGBoost Against Logistic Regression on 358 Matches. The Boring Model Won.
https://towardsdatascience.com/i-pitted-xgboost-against-logistic-regression-on-358-matches-the-boring-model-won/
Published: June 28, 2026 13:00
A concrete bias–variance lesson: why the smallest model had the best cross-validated fit, and how to know when to reach for the big hammer.
We Built a Routing Layer to Cut Our AI Costs. It Broke the Product.
https://towardsdatascience.com/we-built-a-routing-layer-to-cut-our-ai-costs-it-broke-the-product/
Published: June 27, 2026 15:00
A team cut their AI inference bill by more than half. Three months later, customer satisfaction was dropping and the cost savings were tied to the quality loss. Cost-optimization routing layers are a Pareto trap, and here's the detection methodology that…
How to Build a Powerful LLM Knowledge Base
https://towardsdatascience.com/how-to-build-a-powerful-llm-knowledge-base/
Published: June 27, 2026 13:00
Use coding agents to power your knowledge base