Build a Document Q&A App with LangChain + Claude: An End-to-End Streamlit Tutorial

Wire LangChain, langchain-anthropic, and ChromaDB into a Streamlit chat UI that answers questions over uploaded PDFs, with retrieval evaluation included.

20 May 2026 | Updated 20 May 2026 | ~14 min read

What you’ll build

A working document question-answering app that ingests a small library of PDFs, embeds them into a local Chroma vector store, retrieves the most relevant passages for any user question, and asks Claude Sonnet 4.6 to answer using only that retrieved context. The whole thing runs behind a Streamlit chat interface with file upload and conversation history.

By the end of this walkthrough you will have one Python file (app.py) plus a requirements.txt, running locally at http://localhost:8501, capable of being deployed for free on Streamlit Community Cloud. The pattern is the standard retrieval-augmented generation (RAG) loop documented in the LangChain RAG tutorial: load, split, embed, store, retrieve, generate.¹

Four pieces do the heavy lifting. LangChain orchestrates the chain and provides the document loaders, splitters, and retriever interface.² The langchain-anthropic package wraps Claude with a ChatAnthropic class that drops into any LangChain chain.³ Chroma stores the embeddings and answers similarity queries; it runs in-process, persists to a local directory, and needs no separate server.⁴ Streamlit supplies the UI primitives (chat messages, file uploader, session state) without requiring you to write a line of HTML.⁵

What you’ll need

Python 3.10 or later. The LangChain 1.x line targets 3.10 and above.²
An Anthropic API key. Sign in at console.anthropic.com, create a key, and budget for a few cents of Sonnet 4.6 tokens to follow the tutorial end to end. Per Anthropic’s pricing page, Sonnet 4.6 lists at $3 per million input tokens and $15 per million output tokens.⁶
A handful of PDFs to test against. A product manual, a research paper, a contract — anything you have lying around. Three to five files is enough for the walkthrough.
Roughly 45 minutes if you copy-paste the code, longer if you read every line.
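
To put the "few cents" budget in context, the listed rates can be turned into a rough per-question estimate. The sketch below is an illustration only: the function name and the token counts in the example are assumptions, not measurements from this app.

```python
# Rates quoted in this tutorial for Sonnet 4.6 (USD per million tokens).
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the approximate USD cost of one API call at the rates above."""
    return (
        input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000


# Hypothetical question: ~1,500 prompt tokens (system prompt plus four chunks)
# and ~300 answer tokens. Both numbers are assumptions for illustration.
per_question = estimate_cost_usd(1_500, 300)
print(f"~${per_question:.4f} per question")
```

At those assumed sizes a question costs a little under one cent, so a hundred questions would stay under a dollar.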

The completed file sits at around 120 lines. No background in vector databases or embeddings is assumed; the relevant terms are defined inline.

Step 1: Install the packages

Create a fresh project directory and a virtual environment, then install the dependencies:

mkdir doc-qa-app && cd doc-qa-app
python -m venv .venv
source .venv/bin/activate   # on Windows: .venv\Scripts\activate

pip install \
    langchain==1.2.18 \
    langchain-anthropic==0.4.1 \
    langchain-chroma==0.2.6 \
    langchain-community==0.4.4 \
    chromadb==1.3.2 \
    streamlit==1.50.0 \
    pypdf==6.1.3 \
    sentence-transformers==3.4.1

A note on what each package does. The langchain and langchain-community packages provide the chain primitives and the PDF loader. langchain-anthropic adds the ChatAnthropic class — version 0.4.1 on PyPI as of May 2026 supports Claude Sonnet 4.6.⁷ langchain-chroma and chromadb together give you the local vector store. pypdf is the PDF parsing backend that PyPDFLoader calls into.⁸ sentence-transformers provides a free local embedding model so the tutorial can run without a separate embeddings API key; you can swap in OpenAI or Anthropic embeddings later if you prefer.

Pin those versions verbatim if you want the tutorial code to run on the first try. LangChain ships frequently: the 1.x line is split across langchain, langchain-core, langchain-classic, and langgraph, each with its own release cadence, so loose pins drift fast.⁹
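
If you want to confirm that the environment really matches the pins, a small check with the standard library's importlib.metadata is one option. This is a sketch: the PINS dictionary simply repeats the versions from the install command, and find_pin_mismatches is a hypothetical helper name, not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install command in Step 1.
PINS = {
    "langchain": "1.2.18",
    "langchain-anthropic": "0.4.1",
    "langchain-chroma": "0.2.6",
    "langchain-community": "0.4.4",
    "chromadb": "1.3.2",
    "streamlit": "1.50.0",
    "pypdf": "6.1.3",
    "sentence-transformers": "3.4.1",
}


def find_pin_mismatches(pins, get_version=version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for package, expected in pins.items():
        try:
            installed = get_version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for package, (expected, installed) in find_pin_mismatches(PINS).items():
        print(f"{package}: expected {expected}, found {installed or 'not installed'}")
```

Running the file inside the virtual environment prints nothing when every pin matches.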

Step 2: Set up the API key

Store the Anthropic API key in a .env file at the project root and load it at startup. Create .env:

ANTHROPIC_API_KEY=sk-ant-api03-...your-key...

Add .env to .gitignore immediately. Then install python-dotenv and pull the key into the environment from your Python entry point:

pip install python-dotenv==1.1.1

You will load this in app.py in a moment. The ChatAnthropic class reads ANTHROPIC_API_KEY from the environment automatically, so once the variable is set there is no further configuration.³
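
Because the key is read implicitly, a missing variable tends to surface only when the first API call fails. One way to fail earlier with a readable message is a small guard like the sketch below; require_api_key is a hypothetical helper, and the environ parameter exists only so the function can be exercised without touching the real environment.

```python
import os
from collections.abc import Mapping


def require_api_key(
    environ: Mapping[str, str] = os.environ,
    name: str = "ANTHROPIC_API_KEY",
) -> str:
    """Return the key from the environment or raise a readable error."""
    key = environ.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Add it to .env (and call load_dotenv() first) "
            "or export it in the shell."
        )
    return key
```

In app.py, a call to require_api_key() placed right after load_dotenv() would stop the app at startup instead of at the first question.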

Step 3: Wire the RAG chain

Create app.py and start with the document-ingestion plumbing. This block defines a function that takes a list of uploaded PDF files, splits them into chunks, embeds them with a local sentence-transformers model, and writes the embeddings to a Chroma collection.

import os
import tempfile
from pathlib import Path

import streamlit as st
from dotenv import load_dotenv
from langchain_anthropic import ChatAnthropic
from langchain_chroma import Chroma
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
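# LangChain 1.x moved the splitter to langchain-text-splitters and the legacy
# chain helpers to langchain-classic; install both if pip did not pull them in.
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_classic.chains import create_retrieval_chain
from langchain_classic.chains.combine_documents import create_stuff_documents_chain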
from langchain_core.prompts import ChatPromptTemplate

load_dotenv()

PERSIST_DIR = "./chroma_db"
EMBED_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
CHUNK_SIZE = 1000
CHUNK_OVERLAP = 150


def build_vectorstore(uploaded_files):
    """Load uploaded PDFs, split into chunks, embed, and persist to Chroma."""
    documents = []
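    for uploaded in uploaded_files:
        # getvalue() returns the whole buffer even if an earlier rerun read it.
        with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
            tmp.write(uploaded.getvalue())
            tmp_path = tmp.name
        try:
            pages = PyPDFLoader(tmp_path).load()
        finally:
            # Remove the temp file even if parsing fails.
            Path(tmp_path).unlink(missing_ok=True)
        # Record the uploaded filename; otherwise "source" is the temp path.
        for page in pages:
            page.metadata["source"] = uploaded.name
        documents.extend(pages)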

    splitter = RecursiveCharacterTextSplitter(
        chunk_size=CHUNK_SIZE,
        chunk_overlap=CHUNK_OVERLAP,
    )
    chunks = splitter.split_documents(documents)

    embeddings = HuggingFaceEmbeddings(model_name=EMBED_MODEL)
    vectorstore = Chroma.from_documents(
        documents=chunks,
        embedding=embeddings,
        persist_directory=PERSIST_DIR,
    )
    return vectorstore

A few things worth understanding before moving on. PyPDFLoader returns one LangChain Document per PDF page, with the page text in page_content and the source path plus page number in metadata.⁸ The RecursiveCharacterTextSplitter then splits each page into overlapping chunks roughly 1,000 characters long with a 150-character overlap — the LangChain docs recommend this splitter as the default for generic text because it tries paragraph, sentence, and word boundaries in order before falling back to character splits.¹⁰ The chunk size and overlap are knobs to tune later if retrieval quality is off.
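
To see how chunk size and overlap interact, it can help to look at a deliberately simplified splitter. The sketch below uses a fixed sliding window over characters. It is not the boundary-aware algorithm that RecursiveCharacterTextSplitter uses, and sliding_chunks is a hypothetical name introduced only for illustration.

```python
def sliding_chunks(text: str, chunk_size: int = 1000, overlap: int = 150) -> list[str]:
    """Split text into fixed-size windows that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2000))
pieces = sliding_chunks(sample)
# The tail of one chunk is repeated at the head of the next.
assert pieces[0][-150:] == pieces[1][:150]
```

With the tutorial's settings each window advances 850 characters, so a sentence that straddles a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.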

HuggingFaceEmbeddings with all-MiniLM-L6-v2 runs the embedding model locally on CPU. It is small (about 22 million parameters), fast on a laptop, and good enough for tutorial-grade retrieval. Production systems often upgrade to a hosted embeddings API for quality, but this keeps the tutorial free to run.
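
One thing the ingestion function does not address is what happens when the same PDFs are indexed twice into the same persist directory: without explicit IDs, the second build may add a second copy of every chunk. A common mitigation is to derive a stable ID from each chunk's origin and text. The helper below is a hypothetical sketch of that idea using only hashlib; whether re-adding an existing ID overwrites or raises depends on the vector store version, so treat that part as an assumption to verify.

```python
import hashlib


def chunk_id(source: str, page: int, text: str) -> str:
    """Derive a stable ID from where a chunk came from and what it says."""
    digest = hashlib.sha256()
    for part in (source, str(page), text):
        digest.update(part.encode("utf-8"))
        digest.update(b"\x00")  # separator so adjacent fields cannot blur together
    return digest.hexdigest()


# The same chunk always maps to the same ID; a different page gives a new one.
same = chunk_id("manual.pdf", 3, "Warranty lasts two years.")
again = chunk_id("manual.pdf", 3, "Warranty lasts two years.")
other = chunk_id("manual.pdf", 4, "Warranty lasts two years.")
```

In build_vectorstore these values could be passed through the ids argument of Chroma.from_documents, one per chunk. Two chunks with identical text on the same page would collide, so a real implementation would need to handle that case.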

Step 4: Build the retrieval chain

Below the ingestion function, add the retrieval chain. This is the core of the RAG loop: given a question, find the most relevant chunks, feed them to Claude with a system prompt that instructs the model to answer only from the provided context.

SYSTEM_PROMPT = (
    "You are a helpful assistant that answers questions about the user's "
    "documents. Use the following retrieved context to answer the question. "
    "If the answer is not in the context, say you do not know. Cite the "
    "source filename and page number from the metadata when possible.\n\n"
    "Context:\n{context}"
)


def build_qa_chain(vectorstore):
    """Wire the retriever and Claude into a retrieval chain."""
    retriever = vectorstore.as_retriever(
        search_type="similarity",
        search_kwargs={"k": 4},
    )

    llm = ChatAnthropic(
        model="claude-sonnet-4-6",
        temperature=0,
        max_tokens=1024,
    )

    prompt = ChatPromptTemplate.from_messages([
        ("system", SYSTEM_PROMPT),
        ("human", "{input}"),
    ])

    combine_docs_chain = create_stuff_documents_chain(llm, prompt)
    qa_chain = create_retrieval_chain(retriever, combine_docs_chain)
    return qa_chain

What each call does. vectorstore.as_retriever returns a retriever interface that performs similarity search against the Chroma index; k=4 asks for the top four chunks per query.¹¹ ChatAnthropic is the LangChain wrapper around the Anthropic API, and model="claude-sonnet-4-6" selects the current Sonnet release per Anthropic’s models overview.¹² temperature=0 makes sampling as close to greedy as the API allows, so the same question over the same retrieved chunks yields near-identical answers; that repeatability matters for a Q&A app, although it is not a strict guarantee of determinism.
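
Conceptually, a top-k similarity search scores every stored vector against the query vector and keeps the best k. The toy sketch below shows that idea with cosine similarity on three-dimensional vectors. It is an illustration under stated assumptions: the vectors and chunk names are made up, and Chroma's actual distance metric and index structure depend on how the collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed, k=4):
    """Return the k (chunk_id, score) pairs most similar to query_vec."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in indexed.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; all-MiniLM-L6-v2 produces 384 dimensions.
index = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.0],
    "warranty": [0.7, 0.2, 0.1],
}
best_two = top_k([1.0, 0.0, 0.0], index, k=2)
print([chunk for chunk, _ in best_two])
```

Raising k widens the net at the cost of a longer prompt; lowering it does the opposite.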

create_stuff_documents_chain is the simplest document-combination strategy — it concatenates retrieved chunks into the system prompt’s {context} slot and asks the model to answer in one pass. For larger contexts you would graduate to map-reduce or refine chains, but stuff fits comfortably inside Sonnet 4.6’s 200K-token window for four 1,000-character chunks.
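
The "stuff" strategy amounts to string concatenation followed by template substitution, which also makes the context budget easy to estimate. The sketch below mimics that with plain Python. The blank-line separator and the four-characters-per-token ratio are assumptions (a rule of thumb, not a tokenizer), and the helper names are hypothetical.

```python
SYSTEM_TEMPLATE = (
    "Use the following retrieved context to answer the question.\n\n"
    "Context:\n{context}"
)


def stuff_context(chunk_texts, separator="\n\n"):
    """Join retrieved chunk texts into one string for the {context} slot."""
    return separator.join(chunk_texts)


def rough_token_count(text, chars_per_token=4):
    """Very rough size estimate; a real tokenizer will give different numbers."""
    return len(text) // chars_per_token


chunks = ["x" * 1000 for _ in range(4)]  # four maximum-size chunks
system_prompt = SYSTEM_TEMPLATE.format(context=stuff_context(chunks))
print(rough_token_count(system_prompt), "tokens (rough estimate)")
```

Four full-size chunks come to roughly a thousand tokens under that assumption, which is why the single-pass approach has plenty of headroom here.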

create_retrieval_chain glues the retriever and the combine-documents chain together. The resulting chain takes a question via the input key and returns a dict with answer, context (the retrieved chunks), and input.
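
Given that result shape, pulling a tidy source list out of the context entry is a few lines of code. The sketch below uses a stand-in class instead of a real LangChain Document so that it runs without any dependencies; FakeDocument and unique_sources are hypothetical names, and the sample values are invented.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in exposing the two attributes this sketch reads from a Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def unique_sources(context_docs):
    """Return (source, page) pairs in retrieval order, without repeats."""
    seen = []
    for doc in context_docs:
        key = (doc.metadata.get("source", "unknown"), doc.metadata.get("page"))
        if key not in seen:
            seen.append(key)
    return seen


# Same keys the chain is described as returning: input, context, answer.
result = {
    "input": "What is the warranty period?",
    "context": [
        FakeDocument("...", {"source": "manual.pdf", "page": 3}),
        FakeDocument("...", {"source": "manual.pdf", "page": 3}),
        FakeDocument("...", {"source": "terms.pdf", "page": 0}),
    ],
    "answer": "(model output would appear here)",
}
print(unique_sources(result["context"]))
```

Two chunks from the same page collapse into one entry, which is usually what a reader wants to see next to an answer.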

Step 5: Add the Streamlit chat UI

The last block wires the Streamlit interface. Streamlit’s st.chat_message, st.chat_input, and st.file_uploader together give you a working chat app with file upload in about fifty lines.

st.set_page_config(page_title="Document Q&A", page_icon=":books:")
st.title("Document Q&A with Claude")
st.caption("Upload PDFs, ask questions, get answers grounded in your documents.")

if "messages" not in st.session_state:
    st.session_state.messages = []
if "qa_chain" not in st.session_state:
    st.session_state.qa_chain = None

with st.sidebar:
    st.header("1. Upload documents")
    uploaded_files = st.file_uploader(
        "Drop one or more PDFs",
        type=["pdf"],
        accept_multiple_files=True,
    )
    if uploaded_files and st.button("Build knowledge base"):
        with st.spinner("Embedding documents..."):
            vectorstore = build_vectorstore(uploaded_files)
            st.session_state.qa_chain = build_qa_chain(vectorstore)
        st.success(f"Indexed {len(uploaded_files)} file(s). Ask away.")

for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

if question := st.chat_input("Ask a question about your documents"):
    if st.session_state.qa_chain is None:
        st.warning("Upload PDFs and build the knowledge base first.")
    else:
        st.session_state.messages.append({"role": "user", "content": question})
        with st.chat_message("user"):
            st.markdown(question)

        with st.chat_message("assistant"):
            with st.spinner("Thinking..."):
                result = st.session_state.qa_chain.invoke({"input": question})
            answer = result["answer"]
            st.markdown(answer)

            with st.expander("Retrieved context"):
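                for i, doc in enumerate(result["context"], start=1):
                    source = doc.metadata.get("source", "unknown")
                    page = doc.metadata.get("page")
                    # PyPDFLoader numbers pages from 0; show 1-based numbers.
                    page_label = page + 1 if isinstance(page, int) else "?"
                    st.markdown(f"**Chunk {i}**: `{source}` (page {page_label})")
                    text = doc.page_content
                    # Add an ellipsis only when the preview is actually truncated.
                    st.text(text[:500] + ("..." if len(text) > 500 else ""))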

        st.session_state.messages.append({"role": "assistant", "content": answer})

A few Streamlit-specific notes. st.session_state is how Streamlit persists data across reruns — every interaction with the UI re-executes the script top to bottom, so the chain and message list have to live in session state to survive.¹³ The walrus operator on st.chat_input is the idiomatic Streamlit pattern: the input returns None on every rerun where the user has not typed something, so assigning and checking in one expression keeps the conditional clean.

The st.file_uploader widget accepts multiple files and returns them as UploadedFile objects with a .read() method — the ingestion function above writes each to a temp file because PyPDFLoader expects a file path.¹⁴ The with st.expander("Retrieved context") block surfaces the chunks Claude was given, which is the single most useful debug affordance in a RAG app — if the answer is wrong, you want to know whether the retriever missed or the model hallucinated.

Step 6: Run it

From the project directory:

streamlit run app.py

The browser opens to http://localhost:8501. Drop a couple of PDFs into the sidebar uploader, click Build knowledge base, wait for the spinner to finish, then ask a question in the chat input at the bottom.

The first embedding pass takes a few seconds per PDF: the sentence-transformers model loads on first use, then chunks process at a few hundred per second on a laptop CPU. Subsequent questions are fast. Each one costs a similarity search over the index that returns four chunks, plus one Sonnet 4.6 call, and typically lands in under three seconds end to end.

Expand the Retrieved context dropdown under each answer to see which chunks were retrieved and which files they came from. This is the loop to use when iterating on chunk size, overlap, or k — bad answers usually come down to bad retrieval, and you can see that directly.
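
Eyeballing the expander works for a handful of questions. For a more repeatable check while tuning chunk size, overlap, or k, one simple option is a hit rate over a small hand-labelled set: for each question, record which (source, page) should be retrieved and count how often it shows up in the top k. The sketch below assumes a hypothetical retrieve callable that returns (source, page) pairs; the labelled data is invented.

```python
def hit_rate_at_k(labelled_questions, retrieve, k=4):
    """Fraction of questions whose expected (source, page) is in the top k."""
    if not labelled_questions:
        return 0.0
    hits = 0
    for question, expected in labelled_questions:
        retrieved = retrieve(question)[:k]
        if expected in retrieved:
            hits += 1
    return hits / len(labelled_questions)


# Invented example: canned retrieval results keyed by question.
canned = {
    "How long is the warranty?": [("manual.pdf", 3), ("terms.pdf", 0)],
    "What is the return window?": [("manual.pdf", 7)],
}
labelled = [
    ("How long is the warranty?", ("terms.pdf", 0)),
    ("What is the return window?", ("terms.pdf", 2)),
]
score = hit_rate_at_k(labelled, lambda q: canned[q], k=4)
print(f"hit rate at 4: {score:.2f}")
```

With a real retriever, the callable could wrap retriever.invoke(question) and map each returned document to its source and page metadata. The expected pairs have to match whatever values your metadata actually contains.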

Step 7: Deploy to Streamlit Community Cloud

For sharing the app with non-technical users, the free Streamlit Community Cloud is the path of least friction. The requirements per the deployment docs: a public GitHub repository containing app.py and requirements.txt, plus your Anthropic API key added as a secret via the Community Cloud dashboard.¹⁵

Push the code. Create requirements.txt (listed after these commands) first so that git add can find it:

git init
printf '.venv\n.env\nchroma_db/\n__pycache__/\n' > .gitignore
git add app.py requirements.txt .gitignore
git commit -m "Initial commit: document Q&A app"
git branch -M main
git remote add origin git@github.com:YOUR_USERNAME/doc-qa-app.git
git push -u origin main

Generate requirements.txt from the pinned versions in Step 1:

langchain==1.2.18
langchain-anthropic==0.4.1
langchain-chroma==0.2.6
langchain-community==0.4.4
chromadb==1.3.2
streamlit==1.50.0
pypdf==6.1.3
sentence-transformers==3.4.1
python-dotenv==1.1.1

Then sign in at share.streamlit.io, click New app, point it at your repository, and add ANTHROPIC_API_KEY under the app’s Secrets panel. The build takes a couple of minutes the first time; subsequent pushes redeploy in seconds.

A caveat the Streamlit docs surface: the free tier sleeps after roughly a week of inactivity and is rate-limited.¹⁵ For anything you want available reliably, either keep traffic warm or move to a paid host such as Render, Fly.io, or a self-managed container — the code is the same; only the deploy target differs.

What this tutorial didn’t cover

Three categories of production hardening are deliberately out of scope here, and worth flagging so the gap is visible:

Better embeddings. The MiniLM model is fine for a few PDFs of generic text. For domain-specific corpora (medical, legal, code), hosted embeddings from OpenAI, Cohere, or Voyage usually retrieve better. The swap is one import and one constructor call.
Persistent multi-user state. This app stores the vector index on the local filesystem and the chat history in browser-tab session state. A real product needs per-user collections in Chroma (or a hosted vector database such as Pinecone, Weaviate, or Qdrant) and a persistent message store.
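Source citations in the answer text. The system prompt asks Claude to cite source filenames and page numbers, but by default create_stuff_documents_chain formats each chunk with its page_content only, so the metadata never reaches the model unless you pass a document_prompt that includes it. Beyond that, the prompt engineering needed to make citations reliable across edge cases (multi-chunk answers, missing metadata, contested facts) is its own project. The Retrieved context expander is the honest fallback.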

For each of those, the LangChain ecosystem has either a documented pattern or a third-party integration. Treat this app as the load-bearing skeleton, not the production answer.

Troubleshooting

A few failure modes that catch first-time builders.

ANTHROPIC_API_KEY not found at runtime: confirm .env is in the same directory as app.py and that load_dotenv() runs before ChatAnthropic is instantiated.
ChromaDB sqlite errors on certain Linux distributions: install the pysqlite3-binary shim per the Chroma troubleshooting notes; the workaround is documented at the top of docs.trychroma.com.⁴
PDF parsing returns empty text: the document is scanned-image PDF without an OCR layer. PyPDFLoader reads text only; for image PDFs, swap in UnstructuredPDFLoader or pre-process the files with an OCR tool such as Tesseract or ocrmypdf.
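Long pause the first time you click Build knowledge base: the sentence-transformers model downloads on first use (about 90 MB), and that first use happens inside build_vectorstore, not when you ask a question. The download caches under ~/.cache/huggingface; subsequent runs skip it.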
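
For the empty-text case in particular, a quick heuristic can flag a likely scanned PDF before it is embedded. The sketch below works on a plain list of page strings, such as the page_content values a loader returns. The function names and both thresholds are assumptions chosen for illustration, not tuned values.

```python
def find_sparse_pages(page_texts, min_chars=20):
    """Return indexes of pages whose extracted text is shorter than min_chars."""
    return [i for i, text in enumerate(page_texts) if len(text.strip()) < min_chars]


def looks_scanned(page_texts, min_chars=20, threshold=0.8):
    """Heuristic: treat the file as scanned if most pages have almost no text."""
    if not page_texts:
        return True  # nothing was extracted at all
    sparse = find_sparse_pages(page_texts, min_chars)
    return len(sparse) / len(page_texts) >= threshold


pages = ["", "A full paragraph of extracted text.", "  \n"]
print(find_sparse_pages(pages))
print(looks_scanned(pages))
```

A warning in the sidebar when looks_scanned returns True would tell the user to run OCR first instead of leaving them with an index that cannot answer anything.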

The retrieved-context expander is the single best debugging tool for answer-quality issues. If the answer is wrong, look at the chunks first — most “Claude is hallucinating” reports turn out to be “the retriever sent the wrong chunks.”

Where to go next

The LangChain documentation organises everything beyond the basic RAG loop under three headings worth reading in order:¹

Retrieval quality: hybrid search (BM25 + vector), query rewriting, parent-document retrieval, contextual compression. These are the levers when chunk-size tuning runs out of room.
Agents and tool use: replacing the static create_retrieval_chain with a LangGraph agent that decides when to retrieve, when to ask follow-up questions, and when to call external tools.
Observability: wiring LangSmith for trace inspection so you can see exactly which chunks were retrieved and what the model did with them on every call.

The skeleton in this tutorial — load, split, embed, store, retrieve, generate, render — is the foundation those advanced patterns build on. Get it working end to end first, then add complexity where the data tells you to.

How this article was made: an autonomous AI pipeline researched, drafted, fact-checked, and reviewed this piece, aggregating publicly-available information from the sources consulted below. AI (artificial intelligence) can make mistakes, so please cross-check the consulted sources before acting on anything here. Neural Tech Daily is not liable for decisions or outcomes based on this article.

Sources consulted

Cited Sources

1. LangChain RAG tutorial — the canonical load/split/embed/store/retrieve/generate pattern (accessed 2026-05-20)
2. LangChain Python framework overview — version line and supported Python versions (accessed 2026-05-20)
3. langchain-anthropic provider page — ChatAnthropic class and ANTHROPIC_API_KEY environment variable (accessed 2026-05-20)
4. Chroma Getting Started — in-process persistent client and local-directory storage (accessed 2026-05-20)
5. Streamlit Build Conversational Apps tutorial — chat_message, chat_input, session_state patterns (accessed 2026-05-20)
6. Anthropic Claude API pricing — Sonnet 4.6 input/output token rates (accessed 2026-05-20)
7. langchain-anthropic on PyPI — current release version (accessed 2026-05-20)
8. PyPDFLoader documentation — one Document per page, metadata fields (accessed 2026-05-20)
9. LangChain GitHub releases — multi-package version cadence (accessed 2026-05-20)
10. RecursiveCharacterTextSplitter API — recommended default for generic text (accessed 2026-05-20)
11. LangChain Chroma vector store integration — as_retriever and search_kwargs (accessed 2026-05-20)
12. Anthropic models overview — claude-sonnet-4-6 model identifier (accessed 2026-05-20)
13. Streamlit documentation home — session_state and rerun model (accessed 2026-05-20)
14. Streamlit st.file_uploader API — multi-file upload and UploadedFile interface (accessed 2026-05-20)
15. Streamlit Community Cloud deployment — free tier requirements, secrets, sleep behaviour (accessed 2026-05-20)