Juan C Olamendy

#large-language-models

Articles tagged with #large-language-models

Deploying vLLM with Docker: The Complete Guide to Production-Ready LLM Inference
Your GPU is sitting idle while your LLM inference requests queue up, one by one, painfully slow. You know there's a better way. You've heard about continuous batching, PagedAttention, and throughput numbers that seem too good to be true. Welcome to v...
Feb 3, 2026 · 10 min read
Debug and Inject Data/Session to the Context in ADK
Ever stared at your AI agent wondering why on earth it gave that response? You're not alone. Debugging LLM-powered agents feels like shouting into a void—you send a prompt, something mysterious happens, and out comes... chaos. Here's the problem: mos...
Jan 22, 2026 · 13 min read
Building Persistent Sessions with Google ADK: A Comprehensive Guide
Imagine having a conversation with someone who forgets everything you told them the moment you say goodbye. Every time you meet, you start from scratch, repeating your name, preferences, and context all over again. Frustrating, right? This is exactly...
Dec 3, 2025 · 21 min read
The Statistical Reality of LLM Evaluation: What Works, What Doesn't, and When It Matters
Your LLM scored 85% on your test set. How confident are you in that number? What if I told you it might actually be anywhere between 70% and 95%? Most engineering teams ship LLM systems based on evaluation numbers that look precise but hide massive u...
Nov 25, 2025 · 8 min read
Context Engineering: The Invisible Discipline Keeping AI Agents from Drowning in Their Own Memory
Your AI agent just crashed. Not because of a bug in the code. Not because the model failed. But because you fed it too much information. Every LLM has a memory limit, but most engineers treat it like infinite storage. They dump entire conversation hi...
Nov 17, 2025 · 18 min read
The Runner-Session Architecture in Google ADK: Orchestrating Stateful AI Conversations
Your AI agent remembers that a user's favorite color is blue. It recalls their love for pizza and their excitement about The Matrix. But here's the question that separates amateur implementations from production-grade systems: How does this memory ac...
Oct 20, 2025 · 18 min read
