Build a Document Q&A App with LangChain + Claude: An End-to-End Streamlit Tutorial
Wire LangChain, langchain-anthropic, and ChromaDB into a Streamlit chat UI that answers questions over uploaded PDFs, with a retrieved-context view for checking what the retriever returned.
What you’ll build
A working document question-answering app that ingests a small library of PDFs, embeds them into a local Chroma vector store, retrieves the most relevant passages for any user question, and asks Claude Sonnet 4.6 to answer using only that retrieved context. The whole thing runs behind a Streamlit chat interface with file upload and conversation history.
By the end of this walkthrough you will have one Python file (app.py) plus a requirements.txt, running locally at http://localhost:8501, capable of being deployed for free on Streamlit Community Cloud. The pattern is the standard retrieval-augmented generation (RAG) loop documented in the LangChain RAG tutorial: load, split, embed, store, retrieve, generate. 1
Four pieces do the heavy lifting. LangChain orchestrates the chain and provides the document loaders, splitters, and retriever interface. 2 The langchain-anthropic package wraps Claude with a ChatAnthropic class that drops into any LangChain chain. 3 Chroma stores the embeddings and answers similarity queries; it runs in-process, persists to a local directory, and needs no separate server. 4 Streamlit gives you the UI primitives (chat messages, file uploader, session state) without writing a line of HTML. 5
What you’ll need
- Python 3.10 or later. The LangChain 1.x line targets 3.10 and above. 2
- An Anthropic API key. Sign in at console.anthropic.com, create a key, and budget for a few cents of Sonnet 4.6 tokens to follow the tutorial end to end. Per Anthropic’s pricing page, Sonnet 4.6 lists at $3 per million input tokens and $15 per million output tokens. 6
- A handful of PDFs to test against. A product manual, a research paper, a contract: anything you have lying around. Three to five files is enough for the walkthrough.
- Roughly 45 minutes if you copy-paste the code, longer if you read every line.
The completed file sits at around 120 lines. No background in vector databases or embeddings is assumed; the relevant terms are defined inline.
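To make "a few cents" concrete, here is a rough cost sketch built on the list prices quoted above. The token counts are illustrative assumptions, not measurements; real usage depends on your chunks, your questions, and how long the answers run.

```python
# List prices quoted in this tutorial (USD per million tokens).
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one Claude call."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    return input_cost + output_cost


# Assumed workload: about 1,500 input tokens (prompt plus four chunks) and
# about 300 output tokens per question. Both numbers are guesses.
per_question = estimate_cost(input_tokens=1_500, output_tokens=300)
print(f"per question: ${per_question:.4f}")
print(f"50 questions: ${per_question * 50:.2f}")
```

Under those assumptions a question costs a little under one cent, so a session of a few dozen questions stays well below a dollar.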
Step 1: Install the packages
Create a fresh project directory and a virtual environment, then install the dependencies:
mkdir doc-qa-app && cd doc-qa-app
python -m venv .venv
source .venv/bin/activate # on Windows: .venv\Scripts\activate
pip install \
  langchain==1.2.18 \
  langchain-anthropic==0.4.1 \
  langchain-chroma==0.2.6 \
  langchain-community==0.4.4 \
  chromadb==1.3.2 \
  streamlit==1.50.0 \
  pypdf==6.1.3 \
  sentence-transformers==3.4.1
A note on what each package does. The langchain and langchain-community packages provide the chain primitives and the PDF loader. langchain-anthropic adds the ChatAnthropic class; version 0.4.1 on PyPI as of May 2026 supports Claude Sonnet 4.6. 7 langchain-chroma and chromadb together give you the local vector store. pypdf is the PDF parsing backend that PyPDFLoader calls into. 8 sentence-transformers provides a free local embedding model so the tutorial can run without a separate embeddings API key; you can swap in hosted embeddings from OpenAI or Voyage later if you prefer. (Anthropic does not offer an embedding model of its own.)
Pin those versions verbatim if you want the tutorial code to run on the first try. LangChain ships frequently, and the 1.x line splits across langchain, langchain-core, langchain-classic, and langgraph, each with its own version cadence, so loose pins drift fast. 9
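If you want to confirm that your environment actually matches the pins, a small check along these lines can help. This is a sketch using only the standard library; check_pins is a hypothetical helper name, and the PINS dictionary is a subset copied from the install command above.

```python
from importlib import metadata
from typing import Optional


def check_pins(pins: dict[str, str]) -> dict[str, Optional[str]]:
    """Return {package: installed_version} for every pin that does not match.

    A value of None means the package is not installed at all.
    """
    mismatches: dict[str, Optional[str]] = {}
    for package, wanted in pins.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = installed
    return mismatches


# Pins copied from the install command in Step 1 (subset shown).
PINS = {
    "langchain": "1.2.18",
    "langchain-anthropic": "0.4.1",
    "chromadb": "1.3.2",
    "streamlit": "1.50.0",
}

for package, installed in check_pins(PINS).items():
    print(f"{package}: expected {PINS[package]}, found {installed}")
```

An empty result means every listed package is installed at the pinned version; anything printed is a candidate explanation for an import error later on.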
Step 2: Set up the API key
Store the Anthropic API key in a .env file at the project root and load it at startup. Create .env:
ANTHROPIC_API_KEY=sk-ant-api03-...your-key...
Add .env to .gitignore immediately. Then install python-dotenv and pull the key into the environment from your Python entry point:
pip install python-dotenv==1.1.1
You will load this in app.py in a moment. The ChatAnthropic class reads ANTHROPIC_API_KEY from the environment automatically, so once the variable is set there is no further configuration. 3
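Because the key is read implicitly, a missing variable only surfaces when the first request fails. One option is to fail fast with a readable message. The sketch below uses only the standard library; require_api_key is a hypothetical helper, not part of LangChain or the Anthropic SDK.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ,
                    name: str = "ANTHROPIC_API_KEY") -> str:
    """Return the key from `env`, or raise a readable error if it is absent."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Add it to .env or export it in your shell."
        )
    return key


# Demonstration with an explicit mapping instead of the real environment.
print(require_api_key({"ANTHROPIC_API_KEY": "sk-ant-example"}))
```

In app.py you would call it once, after load_dotenv() and before the chain is built, so a misconfigured environment stops the app at startup rather than mid-conversation.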
Step 3: Wire the RAG chain
Create app.py and start with the document-ingestion plumbing. This block defines a function that takes a list of uploaded PDF files, splits them into chunks, embeds them with a local sentence-transformers model, and writes the embeddings to a Chroma collection.
import os
import tempfile
from pathlib import Path

import streamlit as st
from dotenv import load_dotenv
from langchain_anthropic import ChatAnthropic
from langchain_chroma import Chroma
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_core.prompts import ChatPromptTemplate
# In the LangChain 1.x line, text splitters ship in langchain-text-splitters
# and the legacy chain helpers in langchain-classic, not under langchain.*.
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_classic.chains import create_retrieval_chain
from langchain_classic.chains.combine_documents import create_stuff_documents_chain

load_dotenv()  # reads ANTHROPIC_API_KEY from .env into the environment

PERSIST_DIR = "./chroma_db"
EMBED_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
CHUNK_SIZE = 1000
CHUNK_OVERLAP = 150

def build_vectorstore(uploaded_files):
    """Load uploaded PDFs, split into chunks, embed, and persist to Chroma."""
    documents = []
    for uploaded in uploaded_files:
        # PyPDFLoader needs a file path, so spill each upload to a temp file.
        # getvalue() returns the full contents even if the file was read before.
        with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
            tmp.write(uploaded.getvalue())
            tmp_path = tmp.name
        try:
            pages = PyPDFLoader(tmp_path).load()
        finally:
            Path(tmp_path).unlink(missing_ok=True)
        for page in pages:
            # Record the original filename instead of the temp path.
            page.metadata["source"] = uploaded.name
        documents.extend(pages)

    splitter = RecursiveCharacterTextSplitter(
        chunk_size=CHUNK_SIZE,
        chunk_overlap=CHUNK_OVERLAP,
    )
    chunks = splitter.split_documents(documents)

    embeddings = HuggingFaceEmbeddings(model_name=EMBED_MODEL)
    vectorstore = Chroma.from_documents(
        documents=chunks,
        embedding=embeddings,
        persist_directory=PERSIST_DIR,
    )
    return vectorstore
A few things worth understanding before moving on. PyPDFLoader returns one LangChain Document per PDF page, with the page text in page_content and the source path plus a zero-based page number in metadata. 8 The RecursiveCharacterTextSplitter then splits each page into overlapping chunks of at most roughly 1,000 characters, with up to 150 characters shared between neighbouring chunks. The LangChain docs recommend this splitter as the default for generic text because it tries paragraph breaks, then line breaks, then spaces, before falling back to raw character splits. 10 The chunk size and overlap are knobs to tune later if retrieval quality is off.
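To see what the two knobs do in isolation, here is a deliberately simplified sketch. It is not the algorithm RecursiveCharacterTextSplitter uses: it cuts at fixed character offsets and ignores paragraph and word boundaries, which is exactly the behaviour the real splitter tries to avoid. The function name sliding_chunks is made up for this illustration.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows that share `chunk_overlap` characters.

    Simplified illustration only: unlike RecursiveCharacterTextSplitter, it
    cuts at exact character offsets and ignores paragraph or word boundaries.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


page = "x" * 2500  # stand-in for the text of one PDF page
chunks = sliding_chunks(page, chunk_size=1000, chunk_overlap=150)
print([len(c) for c in chunks])  # [1000, 1000, 800]
```

Each new window starts 850 characters after the previous one, so a sentence that straddles a cut still appears whole in at least one chunk as long as it is shorter than the overlap.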
HuggingFaceEmbeddings with all-MiniLM-L6-v2 runs the embedding model locally on CPU. It is small (about 22 million parameters), fast on a laptop, and good enough for tutorial-grade retrieval. Production systems often upgrade to a hosted embeddings API for quality, but this keeps the tutorial free to run.
Step 4: Build the retrieval chain
Below the ingestion function, add the retrieval chain. This is the core of the RAG loop: given a question, find the most relevant chunks, feed them to Claude with a system prompt that instructs the model to answer only from the provided context.
SYSTEM_PROMPT = (
    "You are a helpful assistant that answers questions about the user's "
    "documents. Use the following retrieved context to answer the question. "
    "If the answer is not in the context, say you do not know. Cite the "
    "source filename and page number from the metadata when possible.\n\n"
    "Context:\n{context}"
)


def build_qa_chain(vectorstore):
    """Wire the retriever and Claude into a retrieval chain."""
    retriever = vectorstore.as_retriever(
        search_type="similarity",
        search_kwargs={"k": 4},
    )
    llm = ChatAnthropic(
        model="claude-sonnet-4-6",
        temperature=0,
        max_tokens=1024,
    )
    prompt = ChatPromptTemplate.from_messages([
        ("system", SYSTEM_PROMPT),
        ("human", "{input}"),
    ])
    combine_docs_chain = create_stuff_documents_chain(llm, prompt)
    qa_chain = create_retrieval_chain(retriever, combine_docs_chain)
    return qa_chain
What each call does. vectorstore.as_retriever returns a retriever interface that performs similarity search against the Chroma index; k=4 asks for the top four chunks per query. 11 ChatAnthropic is the LangChain wrapper around the Anthropic API, and model="claude-sonnet-4-6" selects the current Sonnet release per Anthropic’s models overview. 12 temperature=0 makes the output as repeatable as the API allows, which matters for a Q&A app where you want the same question to yield the same answer; it reduces run-to-run variation but does not strictly guarantee identical text.
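Conceptually, a top-k similarity search ranks every stored vector against the query vector and keeps the best k. The sketch below shows that idea with cosine similarity over toy three-dimensional vectors. It is an illustration, not Chroma's implementation: Chroma uses an approximate index, and its configured distance metric may not be cosine.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(
        index,
        key=lambda chunk_id: cosine_similarity(query, index[chunk_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy 3-dimensional "embeddings"; real MiniLM vectors have 384 dimensions.
index = {
    "warranty": [0.9, 0.1, 0.0],
    "returns": [0.7, 0.3, 0.1],
    "battery": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.0, 0.0], index, k=2))  # ['warranty', 'returns']
```

Raising k widens the net at the cost of more prompt tokens and more potentially irrelevant text in the context.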
create_stuff_documents_chain is the simplest document-combination strategy — it concatenates retrieved chunks into the system prompt’s {context} slot and asks the model to answer in one pass. For larger contexts you would graduate to map-reduce or refine chains, but stuff fits comfortably inside Sonnet 4.6’s 200K-token window for four 1,000-character chunks.
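"Stuffing" is simple enough to sketch by hand: format each chunk, join the pieces, and drop the result into the prompt's {context} slot. The version below is an illustration with a hypothetical Chunk class and a hypothetical stuff_context helper, not LangChain's code. It also writes the source and page into the text, because the model can only cite a filename or page number that actually appears in the context string; it is worth confirming how your chain formats each chunk.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Minimal stand-in for a LangChain Document (hypothetical, for illustration)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def stuff_context(chunks: list[Chunk], separator: str = "\n\n") -> str:
    """Join chunks into one context string, labelling each with its origin."""
    parts = []
    for chunk in chunks:
        source = chunk.metadata.get("source", "unknown")
        page = chunk.metadata.get("page", "?")
        parts.append(f"[source: {source}, page: {page}]\n{chunk.page_content}")
    return separator.join(parts)


SYSTEM_TEMPLATE = "Answer only from this context.\n\nContext:\n{context}"

chunks = [
    Chunk("The warranty lasts two years.", {"source": "manual.pdf", "page": 3}),
    Chunk("Returns are accepted within 30 days.", {"source": "policy.pdf", "page": 0}),
]
print(SYSTEM_TEMPLATE.format(context=stuff_context(chunks)))
```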
create_retrieval_chain glues the retriever and the combine-documents chain together. The resulting chain takes a question via the input key and returns a dict with answer, context (the retrieved chunks), and input.
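Since the retrieved chunks come back alongside the answer, you can post-process them without a second retrieval call. As a sketch, the hypothetical helper below collects the unique (source, page) pairs in rank order. The fake_result dictionary is hand-built to mirror the shape described above, with SimpleNamespace standing in for retrieved documents.

```python
from types import SimpleNamespace


def cited_sources(result: dict) -> list[tuple]:
    """Return unique (source, page) pairs from result["context"], in rank order."""
    seen = []
    for doc in result.get("context", []):
        pair = (doc.metadata.get("source", "unknown"), doc.metadata.get("page"))
        if pair not in seen:
            seen.append(pair)
    return seen


# Hand-built stand-in for the dict the chain returns; SimpleNamespace mimics
# the .metadata attribute of a retrieved document.
fake_result = {
    "input": "How long is the warranty?",
    "answer": "Two years.",
    "context": [
        SimpleNamespace(metadata={"source": "manual.pdf", "page": 3}),
        SimpleNamespace(metadata={"source": "manual.pdf", "page": 3}),
        SimpleNamespace(metadata={"source": "policy.pdf", "page": 0}),
    ],
}
print(cited_sources(fake_result))  # [('manual.pdf', 3), ('policy.pdf', 0)]
```

A list like this could be rendered under each answer as a compact "sources" line, independent of whether the model chose to cite anything in its own text.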
Step 5: Add the Streamlit chat UI
The last block wires the Streamlit interface. Streamlit’s st.chat_message, st.chat_input, and st.file_uploader together give you a working chat app with file upload in under thirty lines.
st.set_page_config(page_title="Document Q&A", page_icon=":books:")
st.title("Document Q&A with Claude")
st.caption("Upload PDFs, ask questions, get answers grounded in your documents.")

if "messages" not in st.session_state:
    st.session_state.messages = []
if "qa_chain" not in st.session_state:
    st.session_state.qa_chain = None

with st.sidebar:
    st.header("1. Upload documents")
    uploaded_files = st.file_uploader(
        "Drop one or more PDFs",
        type=["pdf"],
        accept_multiple_files=True,
    )
    if uploaded_files and st.button("Build knowledge base"):
        with st.spinner("Embedding documents..."):
            vectorstore = build_vectorstore(uploaded_files)
            st.session_state.qa_chain = build_qa_chain(vectorstore)
        st.success(f"Indexed {len(uploaded_files)} file(s). Ask away.")

for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

if question := st.chat_input("Ask a question about your documents"):
    if st.session_state.qa_chain is None:
        st.warning("Upload PDFs and build the knowledge base first.")
    else:
        st.session_state.messages.append({"role": "user", "content": question})
        with st.chat_message("user"):
            st.markdown(question)
        with st.chat_message("assistant"):
            with st.spinner("Thinking..."):
                result = st.session_state.qa_chain.invoke({"input": question})
            answer = result["answer"]
            st.markdown(answer)
            with st.expander("Retrieved context"):
                for i, doc in enumerate(result["context"], start=1):
                    source = doc.metadata.get("source", "unknown")
                    page = doc.metadata.get("page", "?")
                    st.markdown(f"**Chunk {i}** - `{source}` (page {page})")
                    st.text(doc.page_content[:500] + "...")
        st.session_state.messages.append({"role": "assistant", "content": answer})
A few Streamlit-specific notes. st.session_state is how Streamlit persists data across reruns — every interaction with the UI re-executes the script top to bottom, so the chain and message list have to live in session state to survive. 13 The walrus operator on st.chat_input is the idiomatic Streamlit pattern: the input returns None on every rerun where the user has not typed something, so assigning and checking in one expression keeps the conditional clean.
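The rerun model is easier to reason about with a toy simulation. The sketch below involves no Streamlit at all: run_script stands in for one top-to-bottom execution of app.py, a plain dict stands in for st.session_state, and the chat_input argument plays the role of the value st.chat_input would return. All of these names are assumptions made for the illustration.

```python
def run_script(session_state: dict, chat_input):
    """One simulated rerun: the whole 'script' executes from the top."""
    local_counter = 0  # plain variables reset on every rerun
    if "messages" not in session_state:
        session_state["messages"] = []  # initialised once, then persists

    # Same assign-and-test shape as `if question := st.chat_input(...)`.
    if question := chat_input:
        local_counter += 1
        session_state["messages"].append(question)
    return local_counter


state = {}  # stands in for st.session_state
run_script(state, None)                # page load, nothing typed
run_script(state, "What is covered?")  # user submits a question
run_script(state, None)                # rerun triggered by another widget
run_script(state, "For how long?")
print(state["messages"])  # ['What is covered?', 'For how long?']
```

The local counter never exceeds 1 because it is recreated on every run, while the message list keeps growing because it lives in the persistent state. That is the same reason the chain itself is stored in session state rather than in a module-level variable.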
The st.file_uploader widget accepts multiple files and returns them as UploadedFile objects with a .read() method — the ingestion function above writes each to a temp file because PyPDFLoader expects a file path. 14 The with st.expander("Retrieved context") block surfaces the chunks Claude was given, which is the single most useful debug affordance in a RAG app — if the answer is wrong, you want to know whether the retriever missed or the model hallucinated.
Step 6: Run it
From the project directory:
streamlit run app.py
The browser opens to http://localhost:8501. Drop a couple of PDFs into the sidebar uploader, click Build knowledge base, wait for the spinner to finish, then ask a question in the chat input at the bottom.
The first embedding pass takes a few seconds per PDF: the sentence-transformers model loads on first use, then chunks process at a few hundred per second on a laptop CPU. Subsequent questions are fast. A similarity search that returns four chunks plus a Sonnet 4.6 call typically lands in under three seconds end to end.
Expand the Retrieved context dropdown under each answer to see which chunks were retrieved and which files they came from. This is the loop to use when iterating on chunk size, overlap, or k — bad answers usually come down to bad retrieval, and you can see that directly.
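Eyeballing the expander works for a handful of questions. If you want a number to compare settings against, one common and simple measure is hit rate at k: the fraction of test questions for which a chunk you know to be relevant appears in the top k results. The sketch below assumes you have written down a few questions with their expected chunk ids by hand; the ids and data are invented for the example.

```python
def hit_rate_at_k(retrieved: dict[str, list[str]],
                  expected: dict[str, str], k: int) -> float:
    """Fraction of questions whose expected chunk id is in the top-k results."""
    if not expected:
        return 0.0
    hits = sum(
        1 for question, chunk_id in expected.items()
        if chunk_id in retrieved.get(question, [])[:k]
    )
    return hits / len(expected)


# Hand-labelled expectations: question -> id of the chunk that holds the answer.
expected = {
    "q1": "manual.pdf:3",
    "q2": "policy.pdf:0",
    "q3": "manual.pdf:7",
}
# Ranked ids the retriever returned for each question (made-up data).
retrieved = {
    "q1": ["manual.pdf:3", "manual.pdf:4"],
    "q2": ["manual.pdf:1", "policy.pdf:0"],
    "q3": ["policy.pdf:2", "manual.pdf:1"],
}
for k in (1, 2):
    print(f"hit rate at {k}: {hit_rate_at_k(retrieved, expected, k):.2f}")
```

Re-run the same labelled questions after each change to chunk size, overlap, or k; a setting that raises the hit rate is retrieving the right passages more often.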
Step 7: Deploy to Streamlit Community Cloud
For sharing the app with non-technical users, the free Streamlit Community Cloud is the path of least friction. The requirements per the deployment docs: a public GitHub repository containing app.py and requirements.txt, plus your Anthropic API key added as a secret via the Community Cloud dashboard. 15
Create requirements.txt from the pinned versions in Steps 1 and 2:
langchain==1.2.18
langchain-anthropic==0.4.1
langchain-chroma==0.2.6
langchain-community==0.4.4
chromadb==1.3.2
streamlit==1.50.0
pypdf==6.1.3
sentence-transformers==3.4.1
python-dotenv==1.1.1
Then push the code:
git init
printf '%s\n' .venv .env chroma_db/ __pycache__/ > .gitignore
git add app.py requirements.txt .gitignore
git commit -m "Initial commit: document Q&A app"
git branch -M main
git remote add origin git@github.com:YOUR_USERNAME/doc-qa-app.git
git push -u origin main
Then sign in at share.streamlit.io, click New app, point it at your repository, and add ANTHROPIC_API_KEY under the app’s Secrets panel. The build takes a couple of minutes the first time; subsequent pushes redeploy in seconds.
A caveat the Streamlit docs surface: the free tier sleeps after roughly a week of inactivity and is rate-limited. 15 For anything you want available reliably, either keep traffic warm or move to a paid host such as Render, Fly.io, or a self-managed container — the code is the same; only the deploy target differs.
What this tutorial didn’t cover
Three categories of production hardening are deliberately out of scope here, and worth flagging so the gap is visible:
- Better embeddings. The MiniLM model is fine for a few PDFs of generic text. For domain-specific corpora (medical, legal, code), hosted embeddings from OpenAI, Cohere, or Voyage usually retrieve better. The swap is one import and one constructor call.
- Persistent multi-user state. This app stores the vector index on the local filesystem and the chat history in browser-tab session state. A real product needs per-user collections in Chroma (or a hosted vector database such as Pinecone, Weaviate, or Qdrant) and a persistent message store.
- Source citations in the answer text. The system prompt asks Claude to cite source filenames and page numbers, but the prompt-engineering needed to make citations reliable across edge cases — multi-chunk answers, missing metadata, contested facts — is its own project. The Retrieved context expander is the honest fallback.
For each of those, the LangChain ecosystem has either a documented pattern or a third-party integration. Treat this app as the load-bearing skeleton, not the production answer.
Troubleshooting
A few failure modes that catch first-time builders.
- ANTHROPIC_API_KEY not found at runtime: confirm .env is in the same directory as app.py and that load_dotenv() runs before ChatAnthropic is instantiated.
- ChromaDB sqlite errors on certain Linux distributions: install the pysqlite3-binary shim per the Chroma troubleshooting notes; the workaround is documented at the top of docs.trychroma.com. 4
- PDF parsing returns empty text: the document is a scanned-image PDF without an OCR layer. PyPDFLoader reads text only; for image PDFs, swap in UnstructuredPDFLoader or pre-process the files with an OCR tool such as Tesseract or ocrmypdf.
- Long pause during the first knowledge-base build: the sentence-transformers model downloads on first use (about 90 MB). The download caches under ~/.cache/huggingface; subsequent runs skip it.
The retrieved-context expander is the single best debugging tool for answer-quality issues. If the answer is wrong, look at the chunks first — most “Claude is hallucinating” reports turn out to be “the retriever sent the wrong chunks.”
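For the empty-text case specifically, a quick check on the extracted pages can flag a scanned PDF before it reaches the vector store. This is a sketch: empty_page_ratio is a hypothetical helper, and the 20-character threshold is an arbitrary assumption you would tune for your documents.

```python
def empty_page_ratio(page_texts: list[str], min_chars: int = 20) -> float:
    """Share of pages whose extracted text is shorter than `min_chars`."""
    if not page_texts:
        return 1.0  # nothing extracted at all counts as fully empty
    empty = sum(1 for text in page_texts if len(text.strip()) < min_chars)
    return empty / len(page_texts)


# In the app, the list would come from the loaded pages, for example
# [doc.page_content for doc in pages]. Here it is hard-coded.
sample_pages = ["", "   \n", "A real paragraph of extracted text from page three."]
ratio = empty_page_ratio(sample_pages)
if ratio > 0.5:
    print(f"{ratio:.0%} of pages have almost no text; the PDF may need OCR.")
```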
Where to go next
The LangChain documentation organises everything beyond the basic RAG loop under three headings worth reading in order: 1
- Retrieval quality: hybrid search (BM25 + vector), query rewriting, parent-document retrieval, contextual compression. These are the levers when chunk-size tuning runs out of room.
- Agents and tool use: replacing the static create_retrieval_chain with a LangGraph agent that decides when to retrieve, when to ask follow-up questions, and when to call external tools.
- Observability: wiring LangSmith for trace inspection so you can see exactly which chunks were retrieved and what the model did with them on every call.
The skeleton in this tutorial — load, split, embed, store, retrieve, generate, render — is the foundation those advanced patterns build on. Get it working end to end first, then add complexity where the data tells you to.
How this article was made: an autonomous AI pipeline researched, drafted, fact-checked, and reviewed this piece, aggregating publicly-available information from the sources consulted below. AI (artificial intelligence) can make mistakes, so please cross-check the consulted sources before acting on anything here. Neural Tech Daily is not liable for decisions or outcomes based on this article.
Sources consulted
Cited Sources
- 1. LangChain RAG tutorial: the canonical load/split/embed/store/retrieve/generate pattern
- 2. LangChain Python framework overview: version line and supported Python versions
- 3. langchain-anthropic provider page: ChatAnthropic class and ANTHROPIC_API_KEY environment variable
- 4. Chroma Getting Started: in-process persistent client and local-directory storage
- 5. Streamlit Build Conversational Apps tutorial: chat_message, chat_input, session_state patterns
- 6. Anthropic Claude API pricing: Sonnet 4.6 input/output token rates
- 7. langchain-anthropic on PyPI: current release version
- 8. PyPDFLoader documentation: one Document per page, metadata fields
- 9. LangChain GitHub releases: multi-package version cadence
- 10. RecursiveCharacterTextSplitter API: recommended default for generic text
- 11. LangChain Chroma vector store integration: as_retriever and search_kwargs
- 12. Anthropic models overview: claude-sonnet-4-6 model identifier
- 13. Streamlit documentation home: session_state and rerun model
- 14. Streamlit st.file_uploader API: multi-file upload and UploadedFile interface
- 15. Streamlit Community Cloud deployment: free tier requirements, secrets, sleep behaviour
Further Reading
- Anthropic: Claude Sonnet 4.6 launch announcement
- Chroma: Python client API reference