Hamza Khan

Generative AI Engineer

Building production-ready GenAI systems with strong engineering foundations.

I work across RAG, tool-connected assistants, evaluation, contextual memory, and observability to build AI products that are useful, measurable, and reliable in real environments.

AI Twin

Ask about my work

A conversational layer for discussing architecture, evaluation, systems, and project experience.

GitHub
Suggested prompts

How was the RAG app evaluated?

What guardrails did you add?

How do you approach production AI?

About

Applied AI engineering with full-stack depth.

My background combines production software engineering with recent deep work in Generative AI. I have built systems across agritech, geospatial, construction, and AI product workflows, with a focus on turning technical capability into practical, measurable delivery.

RAG Systems

Production retrieval built for real users

Grounded assistants with retrieval, source-aware answers, and architectures designed for reliability instead of demos.

Evaluation

Quality loops that make AI accountable

Offline datasets, baseline scoring, regression checks, and practical feedback loops for improving output quality over time.

Observability

Visibility into what the system is doing

Tracing, structured logs, metrics, and operational thinking that make debugging and iteration faster in production.

Experience

A path from shipped product work to applied GenAI systems.

2026 - Present

Independent Generative AI Engineer

Building production-style GenAI systems across RAG, evaluation, guardrails, contextual memory, and observability.

2024 - 2025

AI-First Software Developer

Shipped AI-powered geospatial product features for agricultural monitoring workflows used across multiple farms and clients.

2022 - 2024

Software Engineer / Full-Stack Developer

Delivered data-heavy interfaces, upload workflows, and cloud-connected spatial products across construction and real estate.

Signals

Evidence over claims.

0.78 faithfulness baseline established through RAG evaluation

50% longer supported conversations via context management and memory strategies

45% faster map rendering by moving to COG tile workflows

Full-stack delivery across AI, geospatial, cloud, and product-facing systems

Projects

Case studies from real systems.

These projects are presented as engineering case studies rather than generic portfolio cards. The focus is on architecture, evaluation, observability, product usefulness, and what was actually built.

FastAPI, Pinecone, OpenAI, Ragas, AWS

Production RAG System

A production-oriented retrieval system with offline evaluation, retrieval-vs-generation benchmarking, Prometheus and Grafana observability, and grounded answer generation.

0.78 faithfulness baseline

0.62 answer relevancy baseline

0.75 context recall baseline

Built to show what a production-minded RAG stack looks like beyond a demo, with measurable quality and operational visibility.
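As a minimal sketch of how a regression check against these baselines might look: the metric names mirror the three baselines listed for this project, while the function name `find_regressions`, the 0.02 tolerance, and the candidate scores are assumptions made for illustration, not the project's actual code.

```python
# Baseline scores quoted for the Production RAG System.
BASELINES = {
    "faithfulness": 0.78,
    "answer_relevancy": 0.62,
    "context_recall": 0.75,
}


def find_regressions(scores, baselines=BASELINES, tolerance=0.02):
    """Return metrics that are missing or fall more than `tolerance` below baseline.

    The tolerance is a hypothetical allowance for run-to-run noise in
    LLM-judged metrics; pick a value that suits the evaluation setup.
    """
    regressions = {}
    for metric, baseline in baselines.items():
        new_score = scores.get(metric)
        # A missing metric counts as a regression so gaps are never silent.
        if new_score is None or new_score < baseline - tolerance:
            regressions[metric] = (baseline, new_score)
    return regressions


if __name__ == "__main__":
    # Hypothetical scores from a new evaluation run.
    candidate = {
        "faithfulness": 0.81,
        "answer_relevancy": 0.57,
        "context_recall": 0.75,
    }
    for metric, (baseline, new_score) in find_regressions(candidate).items():
        print(f"REGRESSION {metric}: baseline={baseline} new={new_score}")
```

A check like this could run in CI after each offline evaluation, failing the build whenever the returned dictionary is non-empty.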

LangGraph, FastAPI, Postgres, Pinecone, Phoenix

Agentic Legal Review Backend

A production-style multi-agent backend for legal review and workflow automation, combining retrieval, orchestration, persistence, human approval, and offline evaluation.

Multi-agent review orchestration

Postgres-backed run persistence

Human-in-the-loop revision flow

Built to demonstrate a more realistic agentic backend with traceability, workflow state, review lifecycle persistence, and evaluation support.

Portfolio Expansion

More Applied AI Systems

This portfolio structure is set up to expand with more GenAI, evaluation, and internal tooling work as additional systems are published.

More live systems to add

More architecture breakdowns

More project write-ups coming

The goal is to turn this site into a stronger technical portfolio rather than a generic personal landing page.

Blogs

Writing on practical GenAI engineering.

This area will grow into articles, technical notes, and architecture breakdowns as I publish more work. The entries below are drafts in progress.

Building RAG Beyond the Demo Stage

Thoughts on grounding, evaluation, and why a working prototype is not the same thing as a production-ready assistant.

Drafting / Coming soon

Contextual Memory for Long-Running AI Interactions

How memory, trimming, and compaction patterns change the usability of assistants over longer conversations.

Drafting / Coming soon

What I Measure in Applied GenAI Systems

A practical view on evals, observability, and the signals that matter when you want AI systems to be trusted.

Drafting / Coming soon

Contact

Open to thoughtful AI engineering work.