May 17, 2026 · 4 min read · positioning, ai, career

Why I now call myself a Full-Stack AI Systems Engineer

A title change isn't a re-skin. It's a commitment to a specific category of work — RAG systems, LLM agents, and cloud-native architecture — and to refusing the kind of AI work that doesn't actually do anything.

Most people calling themselves an "AI engineer" right now ship demo work — a Streamlit toy wrapped around a single gpt-4o call, a notebook that runs once for a screenshot, a Chrome extension that summarises tabs. That's fine as a hobby. It's not what I want on my invoices. "Full-Stack AI Systems Engineer" is a deliberately narrower label: production systems that take actions, hold state, integrate with auth and billing, and survive an audit. A title is a filter for what work you take, not a peacock feather. This post is why I picked this one.

What I mean by "AI systems"

Not chatbots. The shape of work I care about is agents that take actions in real workflows — book the appointment, draft the SOAP note, escalate the call, write the row to Postgres. That requires explicit state. I build these as LangGraph graphs: nodes are tools or sub-agents, edges are conditional routes, and the entire run is checkpointed so a dropped call or a 502 from a downstream API doesn't lose the user's place. Chained prompts in a while loop look like agents in a YouTube tutorial; they fall apart the moment a real customer interrupts mid-sentence.
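To make the checkpointing idea concrete, here is a minimal, library-free sketch. It is not LangGraph's API; the class CheckpointedGraph, the node names, the router function, and the in-memory store are all hypothetical stand-ins chosen for illustration. It shows the shape of the guarantee: after every node the run's state and next step are persisted, so a failure partway through resumes from the last completed node instead of starting over.

```python
import json
from typing import Callable, Dict, Optional

State = Dict[str, object]
Node = Callable[[State], State]
Router = Callable[[str, State], Optional[str]]


class CheckpointedGraph:
    """Toy state graph: nodes transform state, a router picks the next node,
    and a checkpoint is written after every completed node."""

    def __init__(self, nodes: Dict[str, Node], router: Router, store: Dict[str, str]):
        self.nodes = nodes
        self.router = router
        self.store = store  # stand-in for a durable store such as Postgres

    def run(self, thread_id: str, initial_state: State, entry: str) -> State:
        saved = self.store.get(thread_id)
        if saved is not None:
            # Resume from the last checkpoint rather than from the entry node.
            snapshot = json.loads(saved)
            node_name, state = snapshot["next"], snapshot["state"]
        else:
            node_name, state = entry, dict(initial_state)

        while node_name is not None:
            state = self.nodes[node_name](state)
            node_name = self.router(node_name, state)
            self.store[thread_id] = json.dumps({"next": node_name, "state": state})
        return state


calls = {"collect_details": 0, "book_slot": 0}


def collect_details(state: State) -> State:
    calls["collect_details"] += 1
    return {**state, "customer": "Sam"}


def book_slot(state: State) -> State:
    calls["book_slot"] += 1
    if calls["book_slot"] == 1:
        # Simulate a 502 from a downstream API on the first attempt.
        raise ConnectionError("simulated 502 from downstream API")
    return {**state, "booked": True}


def confirm(state: State) -> State:
    return {**state, "confirmed": True}


def router(node: str, state: State) -> Optional[str]:
    # Conditional edges: only confirm once the slot is actually booked.
    if node == "collect_details":
        return "book_slot"
    if node == "book_slot":
        return "confirm" if state.get("booked") else None
    return None


store: Dict[str, str] = {}
graph = CheckpointedGraph(
    {"collect_details": collect_details, "book_slot": book_slot, "confirm": confirm},
    router,
    store,
)

try:
    graph.run("call-1", {}, entry="collect_details")
except ConnectionError:
    pass  # the run died mid-graph; the checkpoint still holds the user's place

# Second attempt resumes at book_slot; collect_details is not repeated.
final_state = graph.run("call-1", {}, entry="collect_details")
```

The dict-backed store is only there to keep the sketch self-contained. In a real deployment the checkpoint would live in something durable, and the framework's own checkpointer would replace this hand-rolled loop.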

Not embeddings demos. Production RAG means a retrieval evaluation harness (recall@k on a labelled set, not vibes), reranking, citation tracking back to source spans, and guardrails against prompt injection from retrieved chunks. If the answer can't cite where it came from, it's not RAG — it's a vector-flavoured hallucinator.
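As a rough illustration of what "recall@k on a labelled set" means in practice, here is a small sketch. The query IDs, document IDs, and function names are invented for the example; a real harness would load labels from a curated dataset and pull the ranked lists from the actual retriever.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(
    runs: Dict[str, List[str]], labels: Dict[str, Set[str]], k: int
) -> float:
    """Average recall@k over every labelled query."""
    scores = [recall_at_k(runs[query], labels[query], k) for query in labels]
    return sum(scores) / len(scores)


# Hypothetical labelled set: query -> IDs of documents a human marked relevant.
labels = {
    "q1": {"doc_a", "doc_b"},
    "q2": {"doc_c"},
}

# Hypothetical retriever output: query -> ranked document IDs, best first.
runs = {
    "q1": ["doc_a", "doc_x", "doc_b", "doc_y"],
    "q2": ["doc_z", "doc_y", "doc_c"],
}

print(mean_recall_at_k(runs, labels, k=2))  # 0.25
print(mean_recall_at_k(runs, labels, k=3))  # 1.0
```

Tracking this number across retriever changes (chunking, embedding model, reranker) is what turns "it feels better" into a regression test.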

Not isolated services. The "full-stack" half of the label is load-bearing. The same agent that handles a call also needs auth, Stripe metering, a Next.js admin dashboard, Sentry, deploy pipelines, and a way for the customer to see why it did what it did. The canonical example is TradeAgent AI — a 24/7 voice receptionist for tradespeople, built on a LangGraph state machine with Deepgram for STT, ElevenLabs for TTS, and Twilio for multi-channel routing. I'm not going to re-write the case study here; the long version lives at /projects/tradeagent-ai. The point is: it's not one model call, it's a graph of them, plus everything around them.

What "cloud-native" actually means in this title

The boring half. Every AI feature I ship runs in Docker containers, deployed via GitHub Actions into Kubernetes (or a managed equivalent: Cloud Run, Fly, Render, whatever fits the budget), with Prometheus + Grafana for metrics, structured JSON logs shipped to a central store, Sentry for error tracking, and feature flags so I can dark-launch a new agent node without redeploying. None of that is interesting on its own. All of it is what separates a system that merely runs from one you can still debug at 2am.

A current example is AgentAudit OS — observability and compliance tooling specifically for LLM systems. Every agent action is written to a hash-chained audit log (tamper-evident, append-only), and the same events fan out as SIEM exports to Splunk, Sentinel, and QRadar. On top of that sits a multi-framework compliance mapper: EU AI Act, NIST AI RMF, ISO 42001, SR 11-7, HIPAA. The full breakdown is at /projects/agentaudit-os. I built it because every regulated-vertical client asked the same question in the first call: can you prove what the agent did?
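For readers who have not met the term, a hash-chained log can be sketched in a few lines. This is a generic illustration of the technique, not AgentAudit OS's actual schema or code; the field names, the all-zero genesis value, and the sample events are assumptions made for the example.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS_HASH = "0" * 64  # assumed placeholder for the first entry's predecessor


def _digest(prev_hash: str, event: Dict[str, Any]) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(log: List[Dict[str, Any]], event: Dict[str, Any]) -> Dict[str, Any]:
    """Append an event, linking it to the hash of the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    entry = {"event": event, "prev_hash": prev_hash, "hash": _digest(prev_hash, event)}
    log.append(entry)
    return entry


def verify_chain(log: List[Dict[str, Any]]) -> bool:
    """Recompute every link; any edited, reordered, or removed entry breaks it."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log: List[Dict[str, Any]] = []
append_event(audit_log, {"agent": "scheduler", "action": "book_appointment", "slot": "10:00"})
append_event(audit_log, {"agent": "scheduler", "action": "send_confirmation"})
```

Because each entry's hash covers the previous entry's hash, changing any past event invalidates every link after it. Tamper-evidence in a real system also depends on anchoring the latest hash somewhere the writer cannot quietly rewrite, which this sketch leaves out.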

The freelance lens

I've been working with international clients on Upwork since 2021 — top-rated, mostly US and EU founders — and that shapes what I take and what I refuse. "AI features" that are really copywriting tools dressed in chat UI: no. "Add AI to my SaaS" with no defined action the AI is supposed to take: no. AI agents that book real appointments and survive a Twilio reconnect: yes. MedScribe AI, which drafts SOAP notes from a clinician-patient conversation and waits for the clinician to approve before anything is committed to the EHR: yes, because the human-in-the-loop is the product, not a regret. That one's at /projects/medscribe-ai.

What this means for who I work with

If you're a founder shipping an AI product, you need one engineer who can do retrieval evaluation, write the React dashboard around it, run the Kubernetes cluster it lives on, and pass a SOC 2 readiness check, all without four contractors and a Slack channel of finger-pointing. That's the engagement I'm built for. If you're a recruiter scanning this, read the title as a focused specialisation, not a generalist resume: I'm not the right hire for a pure-frontend role or a pure-MLOps role. And if you're at a company in a regulated vertical (healthcare, fintech, anything touching the EU AI Act), you need your AI to be auditable from day one, not retrofitted under deadline. That's most of what I do now.

Closer

The work is the brand. Three AI systems shipped this year — TradeAgent, AgentAudit, MedScribe — define this title more than the title defines me; the label only matters because there are three production references behind it. If any of that lines up with what you need, the engagement shapes and pricing are at /services.