
PROJECT LOG.

A running record of AI systems built, experiments run, and architectures designed — from LLM pipelines to production inference engines.

RAG Feb 2026
Hybrid Search Architecture for Production RAG

Combining dense vector retrieval with BM25 sparse search and a cross-encoder re-ranker to push retrieval recall above 94% on domain-specific corpora.

Agents Jan 2026
Multi-Agent Orchestration: Lessons from 6 Months in Production

What actually breaks in multi-agent systems at scale — loop detection, tool call reliability, memory contention, and the surprising cost of over-planning.

Infrastructure Dec 2025
GPU Cluster Autoscaling for Bursty LLM Workloads

Designing an autoscaling strategy on AWS that handles 10× traffic spikes without pre-warming costs eating your margin: spot instances, queue-depth triggers, and warm-pool tuning.

LLMs Nov 2025
QLoRA Fine-tuning on Consumer Hardware: A Practical Guide

Fine-tuning a 13B model to 99% of full fine-tune quality on a single 24GB GPU — quantisation choices, rank selection, and gradient accumulation tricks.

Research Oct 2025
Why Chain-of-Thought Prompting Degrades Under Latency Constraints

An empirical analysis of CoT reasoning quality vs. token budget constraints — and a structured prompt compression technique that preserves 88% of reasoning depth.

RAG Sep 2025
GraphRAG vs. Naive RAG: A Head-to-Head on Legal Documents

Benchmarking knowledge-graph-augmented retrieval against standard vector search on 10,000-document legal corpora — where graph wins, where it doesn't.

Agents Aug 2025
Designing Reliable Tool Use for LLM Agents

Building deterministic, type-safe tool interfaces for LLM agents — schema design, retry logic, and observability patterns that make debugging tractable.

Infrastructure Jul 2025
Streaming LLM Responses at Scale with Server-Sent Events

End-to-end streaming architecture from GPU inference to browser — backpressure handling, connection pooling, and graceful degradation under load.
