Build a Knowledge-Graph Extractor With Claude and NetworkX: End-to-End Python Tutorial (May 2026)
Pipe a Wikipedia paragraph through Claude tool-use to extract subject-predicate-object triples, build a NetworkX DiGraph, and visualise it with PyVis.
Image: Anthropic platform docs — Tool use with Claude overview (platform.claude.com), used for editorial coverage of the API surface discussed below.
TL;DR
This tutorial walks through a working knowledge-graph extractor: a
Python script that feeds a Wikipedia paragraph to Claude, asks it to
emit subject-predicate-object triples through a tool-use call,
loads the triples into a NetworkX DiGraph, and renders an
interactive HTML visualisation with PyVis (plus a static Matplotlib
fallback). The stack is Python 3.11+, the official anthropic
Python SDK [1], NetworkX 3.6.1 [2],
and PyVis 0.3.2 [3]. Total walk-through time runs
about 40 minutes for a developer comfortable with pip and a
virtual environment.
Per Anthropic’s tool-use overview, defining a tool with a JSON input schema and letting Claude populate it is the recommended path when you need a structured, guaranteed-shape output rather than free-form prose [4]. Triple extraction is the textbook fit: the schema enforces three string fields per row, and the agentic loop returns one well-formed tool call instead of prose the script would have to re-parse with regular expressions.
What you’ll need
- Python 3.11 or newer with pip on PATH. NetworkX 3.6.1’s package metadata declares Python !=3.14.1, >=3.11 [2].
- An Anthropic API key from console.anthropic.com. A new account ships with prepaid trial credit visible on the billing page.
- A terminal, an editor, and a few minutes of patience for the first PyVis render.
Step 1: Project scaffolding
Create the project, set up a virtual environment, and install the three dependencies:
mkdir kg-extractor && cd kg-extractor
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install "anthropic>=0.103" "networkx>=3.6" "pyvis>=0.3.2" matplotlib python-dotenv
pip freeze > requirements.txt
The anthropic SDK is the official Python client published by
Anthropic on PyPI [1]. python-dotenv keeps the
API key out of source control by reading a local .env file into
os.environ at startup.
Create .env:
ANTHROPIC_API_KEY=sk-ant-your-key-here
Add .env to .gitignore before the first commit. API keys
committed to public GitHub repositories are scraped by automated
crawlers within minutes.
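A missing or placeholder key is easier to diagnose before the first API call than after it. The sketch below is one way to check; require_api_key is a hypothetical helper name, not part of the anthropic SDK or python-dotenv, and it assumes load_dotenv() has already run so the key sits in os.environ.

```python
import os
from collections.abc import Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the Anthropic API key, or raise a readable error.

    Hypothetical helper: the name and the checks are illustrative only.
    """
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set; add it to .env or export it."
        )
    if key == "sk-ant-your-key-here":
        # The tutorial's .env template value was never replaced.
        raise RuntimeError("ANTHROPIC_API_KEY still holds the placeholder value.")
    return key
```

Calling require_api_key() right after load_dotenv() turns a confusing failure later in the run into a one-line message at startup.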
Image: anthropic on PyPI (pypi.org/project/anthropic), used for editorial coverage of the SDK installation step described above.
Step 2: Define the extraction tool
Create tool_schema.py. The tool is client-executed: Claude emits a
tool_use block with a JSON object matching the schema, the script
reads the triples out, and there is no Anthropic-side execution
involved [5].
EXTRACT_TRIPLES_TOOL = {
    "name": "record_triples",
    "description": (
        "Record subject-predicate-object triples extracted from the "
        "user-supplied paragraph. Each triple captures one factual "
        "relationship. Subjects and objects should be named entities "
        "or noun phrases. Predicates should be short verb phrases "
        "(2 to 4 words). Do not invent facts not stated in the text."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "triples": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "subject": {"type": "string"},
                        "predicate": {"type": "string"},
                        "object": {"type": "string"},
                    },
                    "required": ["subject", "predicate", "object"],
                },
            }
        },
        "required": ["triples"],
    },
}
The schema is a single object with one array field, triples, each
element of which is itself an object with three required string
fields. Per Anthropic’s Define tools page, the description text
is the most important field for tool-selection accuracy: Claude
reads the description to decide when to call the tool and how
to populate its fields [6].
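The schema describes the expected shape, but the script still receives ordinary JSON, so a defensive check before building the graph is cheap. The following is a minimal sketch, assuming you would rather drop malformed rows than abort the run; validate_triples is a hypothetical helper and not part of the tutorial's files.

```python
def validate_triples(payload: object) -> list[dict[str, str]]:
    """Keep only rows that carry three non-empty string fields.

    Hypothetical helper: silently dropping bad rows is one policy among
    several; raising an error would be equally reasonable.
    """
    if not isinstance(payload, dict):
        return []
    rows = payload.get("triples")
    if not isinstance(rows, list):
        return []

    fields = ("subject", "predicate", "object")
    clean = []
    for row in rows:
        if not isinstance(row, dict):
            continue
        values = [row.get(field) for field in fields]
        if all(isinstance(value, str) and value.strip() for value in values):
            # Strip stray whitespace so "Marie Curie " and "Marie Curie"
            # do not become two separate nodes later.
            clean.append({f: v.strip() for f, v in zip(fields, values)})
    return clean
```

The function takes the whole tool input (the object with the triples key), so it can be applied to block.input directly.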
Step 3: Wire the extraction loop
Create extract.py. The script reads the paragraph, calls
client.messages.create with the tool attached, and pulls the
triples out of the tool_use content block returned in the
response [4].
import json
import os
import sys

from dotenv import load_dotenv
import anthropic

from tool_schema import EXTRACT_TRIPLES_TOOL

load_dotenv()
client = anthropic.Anthropic()

SYSTEM = (
    "You are a careful information-extraction assistant. "
    "Given a paragraph, call the record_triples tool once with "
    "every distinct factual relationship you can ground in the "
    "text. Prefer precision over recall. Do not include opinions, "
    "speculation, or facts that are not stated in the paragraph."
)


def extract_triples(paragraph: str) -> list[dict]:
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=2048,
        system=SYSTEM,
        tools=[EXTRACT_TRIPLES_TOOL],
        tool_choice={"type": "tool", "name": "record_triples"},
        messages=[{"role": "user", "content": paragraph}],
    )
    for block in response.content:
        if block.type == "tool_use" and block.name == "record_triples":
            return block.input["triples"]
    return []


if __name__ == "__main__":
    text = sys.stdin.read()
    triples = extract_triples(text)
    print(json.dumps(triples, indent=2))
Two things to flag. First, the tool_choice parameter set to
{"type": "tool", "name": "record_triples"} forces Claude to call
this exact tool every time, removing the model’s freedom to answer
in prose. Per
Anthropic’s Define tools page, this is the recommended setting
when the application has only one structural shape it can accept.
Second, the agentic loop here is a single round trip: the script
does not need to construct a tool_result block and call the
model again because no downstream tool call follows. The model
emits one structured block; the script reads it and stops.
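One failure mode the single round trip can hide is truncation: if the response stops because it reached max_tokens, the tool input may be incomplete. The sketch below shows one way to surface that, assuming the response object exposes stop_reason and content as in extract.py. triples_from_response is a hypothetical helper, and the SimpleNamespace object is only a stand-in shaped like the SDK response so the function can be tried without an API call.

```python
from types import SimpleNamespace


def triples_from_response(response) -> list[dict]:
    """Pull triples out of a response, failing loudly on truncation."""
    if getattr(response, "stop_reason", None) == "max_tokens":
        raise RuntimeError(
            "Response hit max_tokens; the triple list may be cut off."
        )
    for block in response.content:
        if block.type == "tool_use" and block.name == "record_triples":
            return list(block.input.get("triples", []))
    return []


# Stand-in object for a dry run; a real run would pass the value
# returned by client.messages.create instead.
fake_response = SimpleNamespace(
    stop_reason="tool_use",
    content=[
        SimpleNamespace(
            type="tool_use",
            name="record_triples",
            input={"triples": [
                {"subject": "Marie Curie", "predicate": "studied", "object": "radioactivity"}
            ]},
        )
    ],
)
print(triples_from_response(fake_response))
```

Raising on truncation is a design choice; for long inputs, retrying with a higher max_tokens or splitting the paragraph are the usual alternatives.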
Image: Anthropic platform docs — How tool use works (platform.claude.com), used for editorial coverage of the loop pattern described above.
Step 4: Test with a Wikipedia paragraph
Pipe a paragraph in via standard input. The opening of the English Wikipedia article on Marie Curie is a useful test case because the text is dense with named entities and explicit relationships:
cat <<'EOF' | python extract.py
Marie Salomea Sklodowska-Curie was a Polish and naturalised-French
physicist and chemist who conducted pioneering research on
radioactivity. She was the first woman to win a Nobel Prize, the
first person to win a Nobel Prize twice, and the only person to win
a Nobel Prize in two scientific fields. Her husband, Pierre Curie,
was a co-winner of her first Nobel Prize, making them the
first-ever married couple to win the Nobel Prize.
EOF
You should see something close to the following on standard output. The exact wording of predicates will vary across runs because Claude is sampling tokens, but the structure of the response stays stable because the tool schema constrains it:
[
{"subject": "Marie Curie", "predicate": "was", "object": "physicist and chemist"},
{"subject": "Marie Curie", "predicate": "conducted research on", "object": "radioactivity"},
{"subject": "Marie Curie", "predicate": "was first woman to win", "object": "Nobel Prize"},
{"subject": "Marie Curie", "predicate": "was married to", "object": "Pierre Curie"},
{"subject": "Pierre Curie", "predicate": "co-won", "object": "Nobel Prize"}
]
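If the array comes back empty, check two things: that the paragraph
contains at least one explicit relationship, and that the system
prompt has not been softened to the point where Claude reads the
input as opinion rather than fact. A missing or invalid
ANTHROPIC_API_KEY does not produce an empty array; the API call
fails with an error instead, so if the script exits with a
traceback, confirm the key is present in .env or exported in the
shell that ran python extract.py.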
Step 5: Build the NetworkX DiGraph
Create graph.py. NetworkX’s DiGraph class represents a directed
graph; add_edge(u, v, **attr) adds an edge from u to v,
auto-creating nodes that do not exist yet, and accepts arbitrary
keyword attributes that attach to the edge [7]:
import json
import sys

import networkx as nx


def build_graph(triples: list[dict]) -> nx.DiGraph:
    graph = nx.DiGraph()
    for row in triples:
        subject = row["subject"]
        predicate = row["predicate"]
        obj = row["object"]
        graph.add_edge(subject, obj, label=predicate)
    return graph


if __name__ == "__main__":
    triples = json.load(sys.stdin)
    g = build_graph(triples)
    print(f"Nodes: {g.number_of_nodes()}")
    print(f"Edges: {g.number_of_edges()}")
    for u, v, data in g.edges(data=True):
        print(f"  {u} --[{data['label']}]--> {v}")
Storing the predicate as an edge attribute rather than a node
means the graph is genuinely directed: subject points at object,
the verb labels the arrow. If the same subject and object recur
with a different predicate, NetworkX’s default DiGraph overwrites
the earlier edge attributes; switch to MultiDiGraph if every
relationship needs to survive as its own edge.
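The overwrite behaviour is easy to see side by side. This is a small sketch using two illustrative rows (made up for the demonstration, not actual extractor output) that share the same subject and object:

```python
import networkx as nx

# Two relationships between the same pair of entities (illustrative rows).
rows = [
    {"subject": "Marie Curie", "predicate": "was married to", "object": "Pierre Curie"},
    {"subject": "Marie Curie", "predicate": "shared Nobel Prize with", "object": "Pierre Curie"},
]

simple = nx.DiGraph()
multi = nx.MultiDiGraph()
for row in rows:
    simple.add_edge(row["subject"], row["object"], label=row["predicate"])
    multi.add_edge(row["subject"], row["object"], label=row["predicate"])

# DiGraph keeps one edge per (u, v) pair; the later label replaces the earlier one.
simple_labels = [label for _, _, label in simple.edges(data="label")]
# MultiDiGraph keeps both rows as parallel edges.
multi_labels = [label for _, _, label in multi.edges(data="label")]

print(simple.number_of_edges(), simple_labels)
print(multi.number_of_edges(), multi_labels)
```

If you switch build_graph to MultiDiGraph, check how the downstream renderers treat parallel edges before relying on the output; that behaviour is not covered here.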
Save the paragraph from Step 4 as marie_curie.txt, then pipe the two scripts together to see the structure:
cat marie_curie.txt | python extract.py | python graph.py
Step 6: Visualise interactively with PyVis
Create visualise.py. PyVis’s Network.from_nx(nx_graph) ingests
a NetworkX graph in place and lets the resulting HTML be opened in
any browser [8]. Network.show(filename) writes
the page to disk.
import json
import sys

import networkx as nx
from pyvis.network import Network

from graph import build_graph


def render(graph: nx.DiGraph, output_path: str) -> None:
    net = Network(
        height="700px",
        width="100%",
        directed=True,
        notebook=False,
        cdn_resources="remote",
    )
    net.from_nx(graph)
    for edge in net.edges:
        edge["arrows"] = "to"
        edge["title"] = edge.get("label", "")
    net.show(output_path, notebook=False)


if __name__ == "__main__":
    triples = json.load(sys.stdin)
    graph = build_graph(triples)
    render(graph, "knowledge_graph.html")
    print("Wrote knowledge_graph.html")
Run it end-to-end:
cat marie_curie.txt | python extract.py | python visualise.py
open knowledge_graph.html # Linux: xdg-open; Windows: start
cdn_resources="remote" is the setting that keeps the rendered
HTML lightweight: PyVis loads the underlying vis-network JavaScript
library from a CDN rather than inlining it. For airgapped
deployments, switch to "local" and ship the bundled JS alongside
the HTML.
Image: pyvis on PyPI (pypi.org/project/pyvis), used for editorial coverage of the visualisation library installed above.
Step 7: Matplotlib fallback for static export
PyVis is overkill when the output target is a paper, a README, or a Slack screenshot. Matplotlib + NetworkX’s drawing helpers cover the static case in under fifteen lines:
import json
import sys

import matplotlib.pyplot as plt
import networkx as nx

from graph import build_graph


def render_static(graph: nx.DiGraph, output_path: str) -> None:
    pos = nx.spring_layout(graph, seed=42)
    plt.figure(figsize=(12, 8))
    nx.draw_networkx_nodes(graph, pos, node_color="#cfe2ff", node_size=1800)
    nx.draw_networkx_labels(graph, pos, font_size=9)
    nx.draw_networkx_edges(graph, pos, arrows=True, arrowsize=18)
    edge_labels = nx.get_edge_attributes(graph, "label")
    nx.draw_networkx_edge_labels(graph, pos, edge_labels=edge_labels, font_size=7)
    plt.axis("off")
    plt.tight_layout()
    plt.savefig(output_path, dpi=200, bbox_inches="tight")


if __name__ == "__main__":
    triples = json.load(sys.stdin)
    graph = build_graph(triples)
    render_static(graph, "knowledge_graph.png")
    print("Wrote knowledge_graph.png")
spring_layout uses a force-directed algorithm; passing a fixed
seed keeps the layout reproducible across runs. For larger
graphs (say, fifty or more triples), nx.kamada_kawai_layout often
reads more cleanly than spring_layout.
Image: networkx on PyPI (pypi.org/project/networkx), used for editorial coverage of the graph library installed above.
Choosing PyVis vs Matplotlib
| Axis | PyVis (interactive HTML) | Matplotlib (static PNG/PDF) |
|---|---|---|
| Output target | Browser, web embed, dashboard | Paper, README, Slack, deck |
| Pan/zoom/drag | Yes, via vis-network in the browser | No |
| Best for graph size | Up to roughly 500 nodes before the browser slows | Up to roughly 100 nodes before label clutter wins |
| Reproducibility | Layout shuffles on each load by default | seed parameter pins the layout |
| Dependency footprint | Adds vis-network JS via CDN | Python packages only, no JavaScript |
For an analyst exploring the graph, PyVis is the right surface because hover-tooltips reveal the predicate without cluttering the canvas. For a printed figure, Matplotlib is the right surface because the layout is frozen and the file is a single image.
Where this script falls short
Three honest limitations to flag before any reader ships this into production:
- Single-document scope. The script extracts triples from one paragraph at a time. Building a corpus-scale graph means batching inputs, deduplicating entities (Marie Curie, Madame Curie, M. Curie should collapse to one node), and persisting the graph to a backing store such as Neo4j or a SQLite table. Entity resolution is the load-bearing problem that most production knowledge-graph projects underestimate.
- No coreference resolution. Claude often emits “she” or “the husband” as a subject when the paragraph uses a pronoun. Either pre-process the paragraph through a coreference resolver (spaCy’s coreferee, AllenNLP’s coref-spanbert) or post-process the triples to resolve pronouns against the most recent named entity.
- Hallucination risk on long inputs. The tool description and system prompt instruct Claude to ground triples in the text, but longer paragraphs increase the risk of inferred-not-stated relationships slipping through. For high-stakes extraction, add a second pass that scores each triple against the source text with a verification prompt, and drop low-confidence rows.
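The post-processing option from the coreference bullet can be sketched in a few lines. This is a deliberately naive heuristic, not a coreference resolver: it assumes triples arrive in reading order, treats the most recent non-pronoun subject as the "most recent named entity", and handles only bare pronouns (a phrase such as “the husband” passes through unchanged). resolve_pronouns and the PRONOUNS set are hypothetical names.

```python
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them"}


def resolve_pronouns(triples: list[dict]) -> list[dict]:
    """Replace pronoun subjects/objects with the most recent named subject.

    Naive heuristic: assumes triples are in reading order and that a
    pronoun refers to the last non-pronoun subject seen so far.
    """
    resolved = []
    last_entity = None
    for original in triples:
        row = dict(original)  # copy so the caller's rows stay untouched
        for field in ("subject", "object"):
            if row[field].strip().lower() in PRONOUNS and last_entity:
                row[field] = last_entity
        # Update the antecedent only after resolving this row, so an object
        # pronoun points at an earlier subject rather than the current one.
        if row["subject"].strip().lower() not in PRONOUNS:
            last_entity = row["subject"]
        resolved.append(row)
    return resolved
```

For example, a row with subject “She” that follows a row about Marie Curie is rewritten to use “Marie Curie”. Paragraphs that interleave several people will defeat this heuristic, which is where a dedicated resolver earns its keep.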
Next steps
- Swap the model. The script works unchanged with Sonnet or Haiku tier models; the tool schema is model-agnostic. Cost-sensitive workloads typically start on Haiku and escalate to Opus only on paragraphs the smaller model fails on.
- Add a typed-entity schema. Replace the string subject and object fields with {name: string, type: "person" | "place" | "organisation" | "event"} and let Claude do entity-type classification in the same tool call.
- Persist to Neo4j. Replace the NetworkX in-memory graph with a Neo4j driver session and MERGE statements keyed on entity name; the rest of the pipeline stays the same.
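The typed-entity idea from the list above might look like the following in JSON Schema terms. This is a sketch under assumptions: the constant names and the to_edge helper are hypothetical, and the four entity types are simply the ones named in the bullet. The schema would replace the input_schema value in tool_schema.py.

```python
ENTITY_TYPES = ["person", "place", "organisation", "event"]

# One entity: a name plus a type drawn from a closed list.
TYPED_ENTITY = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "type": {"type": "string", "enum": ENTITY_TYPES},
    },
    "required": ["name", "type"],
}

TYPED_TRIPLES_SCHEMA = {
    "type": "object",
    "properties": {
        "triples": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "subject": TYPED_ENTITY,
                    "predicate": {"type": "string"},
                    "object": TYPED_ENTITY,
                },
                "required": ["subject", "predicate", "object"],
            },
        }
    },
    "required": ["triples"],
}


def to_edge(row: dict) -> tuple[str, str, dict]:
    """Flatten one typed triple into (source, target, edge attributes)."""
    return (
        row["subject"]["name"],
        row["object"]["name"],
        {
            "label": row["predicate"],
            "subject_type": row["subject"]["type"],
            "object_type": row["object"]["type"],
        },
    )
```

The tuple returned by to_edge matches the add_edge(u, v, **attr) call shape used in graph.py, so build_graph would need only a small change. Objects that are not entities (a field of study, for instance) do not fit the four listed types, so the enum may need extending for real text.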
How this article was made: an autonomous AI pipeline researched, drafted, fact-checked, and reviewed this piece, aggregating publicly-available information from the sources consulted below. AI (artificial intelligence) can make mistakes, so please cross-check the consulted sources before acting on anything here. Neural Tech Daily is not liable for decisions or outcomes based on this article.
Sources consulted
Cited Sources
- 1. anthropic on PyPI: release 0.103.1 dated 19 May 2026.
- 2. NetworkX on PyPI: release 3.6.1 dated 8 December 2025, package metadata declares Python !=3.14.1, >=3.11.
- 3. PyVis on PyPI: release 0.3.2 dated 24 February 2023.
- 4. Anthropic: Tool use with Claude (overview).
- 5. Anthropic: How tool use works.
- 6. Anthropic: Define tools.
- 7. NetworkX stable docs: DiGraph.add_edge method reference.
- 8. PyVis tutorial: Network.from_nx and Network.show.
Further Reading
- NetworkX: DiGraph reference
- PyVis: pyvis.network module reference
- Matplotlib: Pyplot tutorial
- Wikipedia: Marie Curie