Neural Tech Daily

Build a Knowledge-Graph Extractor With Claude and NetworkX: End-to-End Python Tutorial (May 2026)

Pipe a Wikipedia paragraph through Claude tool-use to extract subject-predicate-object triples, build a NetworkX DiGraph, and visualise it with PyVis.


Image: Anthropic platform docs — Tool use with Claude overview (platform.claude.com), used for editorial coverage of the API surface discussed below.

TL;DR

This tutorial walks through a working knowledge-graph extractor: a Python script that feeds a Wikipedia paragraph to Claude, asks it to emit subject-predicate-object triples through a tool-use call, loads the triples into a NetworkX DiGraph, and renders an interactive HTML visualisation with PyVis (plus a static Matplotlib fallback). The stack is Python 3.11+, the official anthropic Python SDK 1 , NetworkX 3.6.1 2 , and PyVis 0.3.2 3 . Total walk-through time runs about 40 minutes for a developer comfortable with pip and a virtual environment.

Per Anthropic’s tool-use overview, defining a tool with a JSON input schema and letting Claude populate it is the recommended path when you need structured output in a predictable shape rather than free-form prose 4 . Triple extraction is the textbook fit: the schema requires three string fields per row, and the response carries one tool call whose input arrives as parsed JSON, instead of prose the script would have to re-parse with regular expressions.

What you’ll need

  • Python 3.11 or newer with pip on PATH. NetworkX 3.6.1’s package metadata declares Python !=3.14.1, >=3.11 2 .
  • An Anthropic API key from console.anthropic.com. A new account may ship with prepaid trial credit; check the billing page for the current balance.
  • A terminal, an editor, and a few minutes of patience for the first PyVis render.

Step 1: Project scaffolding

Create the project, set up a virtual environment, and install the three dependencies:

mkdir kg-extractor && cd kg-extractor
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install "anthropic>=0.103" "networkx>=3.6" "pyvis>=0.3.2" matplotlib python-dotenv
pip freeze > requirements.txt

The anthropic SDK is the official Python client published by Anthropic on PyPI 1 . python-dotenv keeps the API key out of source control by reading a local .env file into os.environ at startup.

Create .env:

ANTHROPIC_API_KEY=sk-ant-your-key-here

Add .env to .gitignore before the first commit. API keys committed to public GitHub repositories are scraped by automated crawlers within minutes.
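
A missing key is easier to diagnose at startup than inside the first API call. The sketch below is one way to fail fast; require_api_key is a hypothetical helper name, not part of the anthropic SDK or python-dotenv, and it assumes load_dotenv() has already run.

```python
import os


def require_api_key(env=os.environ, name="ANTHROPIC_API_KEY"):
    """Return the key, or raise with a readable message if it is missing.

    Hypothetical helper for this tutorial; `env` is any mapping, which
    makes the check easy to exercise without touching the real environment.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; add it to .env or export it in the shell"
        )
    return key
```

Calling require_api_key() once after load_dotenv() turns a confusing downstream failure into a one-line message.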


Image: anthropic on PyPI (pypi.org/project/anthropic), used for editorial coverage of the SDK installation step described above.

Step 2: Define the extraction tool

Create tool_schema.py. The tool is client-executed: Claude emits a tool_use block with a JSON object matching the schema, the script reads the triples out, and there is no Anthropic-side execution involved 5 .

EXTRACT_TRIPLES_TOOL = {
    "name": "record_triples",
    "description": (
        "Record subject-predicate-object triples extracted from the "
        "user-supplied paragraph. Each triple captures one factual "
        "relationship. Subjects and objects should be named entities "
        "or noun phrases. Predicates should be short verb phrases "
        "(2 to 4 words). Do not invent facts not stated in the text."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "triples": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "subject": {"type": "string"},
                        "predicate": {"type": "string"},
                        "object": {"type": "string"},
                    },
                    "required": ["subject", "predicate", "object"],
                },
            }
        },
        "required": ["triples"],
    },
}

The schema is a single object with one array field, triples, each element of which is itself an object with three required string fields. Per Anthropic’s Define tools page, the description text is the most important field for tool-selection accuracy: Claude reads the description to decide when to call the tool and how to populate its fields 6 .
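
The schema describes the shape Claude is asked to produce, but a cheap check on the client side before building the graph costs little. The sketch below mirrors the three required string fields in plain Python; validate_triples is a hypothetical helper, and it is deliberately stricter than the schema in one respect: it also drops rows whose fields are empty or whitespace-only.

```python
REQUIRED_FIELDS = ("subject", "predicate", "object")


def validate_triples(payload):
    """Return the rows of payload["triples"] that match the expected shape.

    A row is kept only if it is a dict whose three required fields are
    non-empty strings; anything else is skipped rather than raising.
    """
    rows = payload.get("triples") if isinstance(payload, dict) else None
    if not isinstance(rows, list):
        return []
    clean = []
    for row in rows:
        if not isinstance(row, dict):
            continue
        if all(isinstance(row.get(f), str) and row[f].strip() for f in REQUIRED_FIELDS):
            # Keep only the three known fields, with surrounding whitespace trimmed.
            clean.append({f: row[f].strip() for f in REQUIRED_FIELDS})
    return clean
```

Skipping bad rows instead of raising is a design choice: for a visualisation, a partial graph is usually more useful than none. A stricter pipeline might prefer to raise.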

Step 3: Wire the extraction loop

Create extract.py. The script reads the paragraph, calls client.messages.create with the tool attached, and pulls the triples out of the tool_use content block returned in the response 4 .

import json
import os
import sys

from dotenv import load_dotenv
import anthropic

from tool_schema import EXTRACT_TRIPLES_TOOL

load_dotenv()
client = anthropic.Anthropic()

SYSTEM = (
    "You are a careful information-extraction assistant. "
    "Given a paragraph, call the record_triples tool once with "
    "every distinct factual relationship you can ground in the "
    "text. Prefer precision over recall. Do not include opinions, "
    "speculation, or facts that are not stated in the paragraph."
)


def extract_triples(paragraph: str) -> list[dict]:
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=2048,
        system=SYSTEM,
        tools=[EXTRACT_TRIPLES_TOOL],
        tool_choice={"type": "tool", "name": "record_triples"},
        messages=[{"role": "user", "content": paragraph}],
    )
    for block in response.content:
        if block.type == "tool_use" and block.name == "record_triples":
            return block.input["triples"]
    return []


if __name__ == "__main__":
    text = sys.stdin.read()
    triples = extract_triples(text)
    print(json.dumps(triples, indent=2))

Two things to flag. First, the tool_choice parameter set to {"type": "tool", "name": "record_triples"} forces Claude to call this exact tool every time, removing the model’s freedom to answer in prose. Per Anthropic’s Define tools page, this is the recommended setting when the application has only one structural shape it can accept. Second, the agentic loop here is a single round trip: the script does not need to construct a tool_result block and call the model again because no downstream tool call follows. The model emits one structured block; the script reads it and stops.
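
One edge case the single round trip leaves open is a reply cut off by max_tokens, where the tool call may carry incomplete input. The sketch below is a more defensive version of the read-out step; triples_from_response is a hypothetical name, and the SimpleNamespace objects are stand-ins shaped like the SDK response so the logic can be tried without an API call.

```python
from types import SimpleNamespace


def triples_from_response(response, tool_name="record_triples"):
    """Pull the triples list out of a Messages API response object.

    Raises if the reply stopped on max_tokens, because a truncated
    tool call may not contain the full list.
    """
    if getattr(response, "stop_reason", None) == "max_tokens":
        raise RuntimeError(
            "response truncated; raise max_tokens or shorten the paragraph"
        )
    for block in response.content:
        if block.type == "tool_use" and block.name == tool_name:
            return list(block.input.get("triples", []))
    return []


# Stand-in shaped like the SDK response (an assumption for a dry run, not SDK code).
fake = SimpleNamespace(
    stop_reason="tool_use",
    content=[
        SimpleNamespace(
            type="tool_use",
            name="record_triples",
            input={"triples": [{"subject": "a", "predicate": "b", "object": "c"}]},
        )
    ],
)
print(triples_from_response(fake))
```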


Image: Anthropic platform docs — How tool use works (platform.claude.com), used for editorial coverage of the loop pattern described above.

Step 4: Test with a Wikipedia paragraph

Pipe a paragraph in via standard input. The opening of the English Wikipedia article on Marie Curie is a useful test case because the text is dense with named entities and explicit relationships:

cat <<'EOF' | python extract.py
Marie Salomea Sklodowska-Curie was a Polish and naturalised-French
physicist and chemist who conducted pioneering research on
radioactivity. She was the first woman to win a Nobel Prize, the
first person to win a Nobel Prize twice, and the only person to win
a Nobel Prize in two scientific fields. Her husband, Pierre Curie,
was a co-winner of her first Nobel Prize, making them the
first-ever married couple to win the Nobel Prize.
EOF

You should see something close to the following on standard output. The exact wording of predicates will vary across runs because Claude is sampling tokens, but the structure of the response stays stable because the tool schema constrains it:

[
  {"subject": "Marie Curie", "predicate": "was", "object": "physicist and chemist"},
  {"subject": "Marie Curie", "predicate": "conducted research on", "object": "radioactivity"},
  {"subject": "Marie Curie", "predicate": "was first woman to win", "object": "Nobel Prize"},
  {"subject": "Marie Curie", "predicate": "was married to", "object": "Pierre Curie"},
  {"subject": "Pierre Curie", "predicate": "co-won", "object": "Nobel Prize"}
]

If the array comes back empty, check two things: that the paragraph contains at least one explicit relationship, and that the system prompt has not been softened to the point where Claude reads the input as opinion rather than fact. A missing or invalid ANTHROPIC_API_KEY does not produce an empty array; the SDK raises an authentication error instead, so check .env if the script exits with a traceback.

Step 5: Build the NetworkX DiGraph

Create graph.py. NetworkX’s DiGraph class represents a directed graph; add_edge(u, v, **attr) adds an edge from u to v, auto-creating nodes that do not exist yet, and accepts arbitrary keyword attributes that attach to the edge 7 :

import json
import sys

import networkx as nx


def build_graph(triples: list[dict]) -> nx.DiGraph:
    graph = nx.DiGraph()
    for row in triples:
        subject = row["subject"]
        predicate = row["predicate"]
        obj = row["object"]
        graph.add_edge(subject, obj, label=predicate)
    return graph


if __name__ == "__main__":
    triples = json.load(sys.stdin)
    g = build_graph(triples)
    print(f"Nodes: {g.number_of_nodes()}")
    print(f"Edges: {g.number_of_edges()}")
    for u, v, data in g.edges(data=True):
        print(f"  {u} --[{data['label']}]--> {v}")

Storing the predicate as an edge attribute rather than as a node of its own keeps each triple as a single directed edge: subject points at object, and the verb labels the arrow. If the same subject and object recur with a different predicate, DiGraph keeps one edge between them and add_edge overwrites the earlier label; switch to MultiDiGraph if every relationship needs to survive as its own edge.
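
If a plain DiGraph is still preferred, an alternative to MultiDiGraph is to merge predicates that share a subject and object before calling build_graph. This is a swapped-in technique, not something the pipeline above does; merge_parallel_predicates and its separator are illustrative choices.

```python
from collections import defaultdict


def merge_parallel_predicates(triples, sep="; "):
    """Collapse triples sharing (subject, object) into one row.

    Predicates for the same pair are joined with `sep` in first-seen
    order, and exact duplicate predicates are kept only once.
    """
    grouped = defaultdict(list)
    for row in triples:
        key = (row["subject"], row["object"])
        if row["predicate"] not in grouped[key]:
            grouped[key].append(row["predicate"])
    return [
        {"subject": s, "predicate": sep.join(preds), "object": o}
        for (s, o), preds in grouped.items()
    ]
```

The trade-off is that the joined label is a single string, so per-predicate queries become harder; MultiDiGraph remains the better fit when each relationship needs its own attributes.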

Save the Step 4 paragraph as marie_curie.txt, then pipe the two scripts together to see the structure:

cat marie_curie.txt | python extract.py | python graph.py

Step 6: Visualise interactively with PyVis

Create visualise.py. PyVis’s Network.from_nx(nx_graph) copies the nodes and edges of a NetworkX graph, along with their attributes, into a PyVis Network 8 . Network.show(filename) then writes the interactive page to disk as an HTML file that opens in any browser.

import json
import sys

import networkx as nx
from pyvis.network import Network

from graph import build_graph


def render(graph: nx.DiGraph, output_path: str) -> None:
    net = Network(
        height="700px",
        width="100%",
        directed=True,
        notebook=False,
        cdn_resources="remote",
    )
    net.from_nx(graph)
    for edge in net.edges:
        edge["arrows"] = "to"
        edge["title"] = edge.get("label", "")
    net.show(output_path, notebook=False)


if __name__ == "__main__":
    triples = json.load(sys.stdin)
    graph = build_graph(triples)
    render(graph, "knowledge_graph.html")
    print("Wrote knowledge_graph.html")

Run it end-to-end:

cat marie_curie.txt | python extract.py | python visualise.py
open knowledge_graph.html   # macOS; Linux: xdg-open; Windows: start

cdn_resources="remote" is the setting that keeps the rendered HTML lightweight: PyVis loads the underlying vis-network JavaScript library from a CDN rather than inlining it. For airgapped deployments, switch to "local" and ship the bundled JS alongside the HTML.


Image: pyvis on PyPI (pypi.org/project/pyvis), used for editorial coverage of the visualisation library installed above.

Step 7: Matplotlib fallback for static export

PyVis is overkill when the output target is a paper, a README, or a Slack screenshot. Matplotlib + NetworkX’s drawing helpers cover the static case in under fifteen lines:

import json
import sys

import matplotlib.pyplot as plt
import networkx as nx

from graph import build_graph


def render_static(graph: nx.DiGraph, output_path: str) -> None:
    pos = nx.spring_layout(graph, seed=42)
    plt.figure(figsize=(12, 8))
    nx.draw_networkx_nodes(graph, pos, node_color="#cfe2ff", node_size=1800)
    nx.draw_networkx_labels(graph, pos, font_size=9)
    nx.draw_networkx_edges(graph, pos, arrows=True, arrowsize=18)
    edge_labels = nx.get_edge_attributes(graph, "label")
    nx.draw_networkx_edge_labels(graph, pos, edge_labels=edge_labels, font_size=7)
    plt.axis("off")
    plt.tight_layout()
    plt.savefig(output_path, dpi=200, bbox_inches="tight")


if __name__ == "__main__":
    triples = json.load(sys.stdin)
    graph = build_graph(triples)
    render_static(graph, "knowledge_graph.png")
    print("Wrote knowledge_graph.png")

spring_layout uses a force-directed algorithm; passing a fixed seed keeps the layout reproducible across runs. For larger graphs (say, fifty or more triples), nx.kamada_kawai_layout often reads more cleanly than spring_layout; it requires SciPy, which the Step 1 install does not include, so run pip install scipy first.


Image: networkx on PyPI (pypi.org/project/networkx), used for editorial coverage of the graph library installed above.

Choosing PyVis vs Matplotlib

Axis                 | PyVis (interactive HTML)                         | Matplotlib (static PNG/PDF)
Output target        | Browser, web embed, dashboard                    | Paper, README, Slack, deck
Pan/zoom/drag        | Yes, via vis-network in the browser              | No
Best for graph size  | Up to roughly 500 nodes before the browser slows | Up to roughly 100 nodes before label clutter wins
Reproducibility      | Layout shuffles on each load by default          | seed parameter pins the layout
Dependency footprint | Adds vis-network JS via CDN                      | Python packages only, no JavaScript

For an analyst exploring the graph, PyVis is the right surface because hover-tooltips reveal the predicate without cluttering the canvas. For a printed figure, Matplotlib is the right surface because the layout is frozen and the file is a single image.

Where this script falls short

Three honest limitations to flag before any reader ships this into production:

  • Single-document scope. The script extracts triples from one paragraph at a time. Building a corpus-scale graph means batching inputs, deduplicating entities (Marie Curie, Madame Curie, M. Curie should collapse to one node), and persisting the graph to a backing store such as Neo4j or a SQLite table. Entity resolution is the load-bearing problem that most production knowledge-graph projects underestimate.
  • No coreference resolution. Claude often emits “she” or “the husband” as a subject when the paragraph uses a pronoun. Either pre-process the paragraph through a coreference resolver (spaCy’s coreferee, AllenNLP’s coref-spanbert) or post-process the triples to resolve pronouns against the most recent named entity.
  • Hallucination risk on long inputs. The tool description and system prompt instruct Claude to ground triples in the text, but longer paragraphs increase the risk of inferred-not-stated relationships slipping through. For high-stakes extraction, add a second pass that scores each triple against the source text with a verification prompt, and drop low-confidence rows.
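
As a cheap first filter ahead of the coreference and verification passes described above, triples can be checked lexically against the source text. The sketch below is a stand-in, not the verification prompt itself: flag_ungrounded and the pronoun list are illustrative, and the check only catches pronoun subjects and entities whose words never appear in the paragraph. It cannot catch an invented relationship between two entities that are both present.

```python
import re

# Illustrative, non-exhaustive pronoun list.
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them", "his", "hers", "their"}


def _tokens(text):
    """Lower-cased word tokens; hyphens and punctuation act as separators."""
    return re.findall(r"\w+", text.lower())


def flag_ungrounded(triples, source_text):
    """Split triples into (kept, flagged) using two lexical checks.

    A triple is flagged if its subject or object is a bare pronoun, or
    if any token of its subject or object never occurs in source_text.
    """
    vocab = set(_tokens(source_text))
    kept, flagged = [], []
    for row in triples:
        suspicious = False
        for field in ("subject", "object"):
            toks = _tokens(row[field])
            if not toks or (len(toks) == 1 and toks[0] in PRONOUNS):
                suspicious = True
            elif any(t not in vocab for t in toks):
                suspicious = True
        (flagged if suspicious else kept).append(row)
    return kept, flagged
```

Flagged rows are returned rather than discarded so that a later pass, such as pronoun resolution or a model-based verifier, can decide what to do with them.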

Next steps

  • Swap the model. Only the model string in extract.py needs to change to run on Sonnet or Haiku tier models; the tool schema is model-agnostic. Cost-sensitive workloads typically start on Haiku and escalate to Opus only on paragraphs the smaller model fails on.
  • Add a typed-entity schema. Replace the string subject and object fields with {name: string, type: "person" | "place" | "organisation" | "event"} and let Claude do entity-type classification in the same tool call.
  • Persist to Neo4j. Replace the NetworkX in-memory graph with a Neo4j driver session and MERGE statements keyed on entity name; the rest of the pipeline stays the same.
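
The typed-entity variant from the list above could look roughly like this as a replacement for the input_schema in tool_schema.py. It is a sketch under assumptions: the names TYPED_ENTITY, TYPED_INPUT_SCHEMA, and to_edge are illustrative, and the four entity types are the ones listed in the bullet.

```python
ENTITY_TYPES = ["person", "place", "organisation", "event"]

# Shared sub-schema for a typed entity, used for both subject and object.
TYPED_ENTITY = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "type": {"type": "string", "enum": ENTITY_TYPES},
    },
    "required": ["name", "type"],
}

TYPED_INPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "triples": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "subject": TYPED_ENTITY,
                    "predicate": {"type": "string"},
                    "object": TYPED_ENTITY,
                },
                "required": ["subject", "predicate", "object"],
            },
        }
    },
    "required": ["triples"],
}


def to_edge(row):
    """Flatten one typed triple into (source, target, label) for add_edge."""
    return row["subject"]["name"], row["object"]["name"], row["predicate"]
```

With this shape, build_graph would read names through a helper like to_edge instead of indexing the strings directly. A noun phrase such as "radioactivity" fits none of the four types, so a catch-all value like "other" may be worth adding to the enum.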

How this article was made: an autonomous AI pipeline researched, drafted, fact-checked, and reviewed this piece, aggregating publicly-available information from the sources consulted below. AI (artificial intelligence) can make mistakes, so please cross-check the consulted sources before acting on anything here. Neural Tech Daily is not liable for decisions or outcomes based on this article.

Sources consulted
