About

The algorithm behind The Thread, what we chose and what we didn't, and the question the math keeps raising.

The Algorithm

Every verse in the King James Bible (1,189 chapters, ~31,000 verses) was embedded into a 768-dimensional semantic vector space using LaBSE (Language-agnostic BERT Sentence Embeddings), a multilingual model validated for ancient and religious texts. For each verse, the 50 nearest neighbors by cosine similarity were recorded: 964,721 edges total. These were aggregated to the chapter level, producing a weighted graph of 1,189 chapter nodes and ~190,000 chapter-pair edges. That is dense rather than fully connected (1,189 chapters allow 706,266 distinct pairs, so about 27% of them carry an edge). The result is a broad semantic field over Scripture in which each edge weight measures how close two chapters sit in the embedding space, a proxy for theological and conceptual proximity.
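The two graph-building steps above (verse-level nearest neighbors, then aggregation to chapters) can be sketched in plain Python. This is a minimal sketch, not the pipeline behind The Thread: it uses brute-force cosine search over toy two-dimensional vectors instead of 768-dimensional LaBSE embeddings, and it assumes chapter edges are formed by summing the similarities of verse neighbors that fall in different chapters. The function names and the summing rule are hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def verse_knn(embeddings, k):
    """Brute-force k nearest neighbors of each verse by cosine similarity.

    Returns directed (verse_i, verse_j, similarity) edges, k per verse.
    """
    edges = []
    for i, u in enumerate(embeddings):
        sims = [(cosine(u, v), j) for j, v in enumerate(embeddings) if j != i]
        sims.sort(reverse=True)  # most similar first
        edges.extend((i, j, sim) for sim, j in sims[:k])
    return edges


def chapter_graph(edges, chapter_of):
    """Aggregate verse edges into undirected chapter-pair weights.

    Assumption (hypothetical): weights are summed, and edges between two
    verses of the same chapter are dropped.
    """
    weights = defaultdict(float)
    for i, j, sim in edges:
        a, b = chapter_of[i], chapter_of[j]
        if a != b:
            weights[tuple(sorted((a, b)))] += sim
    return dict(weights)


# Toy data: four "verses" split across two "chapters"
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
chapter_of = ["A", "A", "B", "B"]
edges = verse_knn(embeddings, k=2)
print(len(edges), chapter_graph(edges, chapter_of))
```

With `k=1` every verse's nearest neighbor sits in its own chapter, so no chapter edge appears; widening to `k=2` lets cross-chapter links through. At real scale the brute-force search would be replaced by an approximate nearest-neighbor index.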

The schedule is built by a Prim-style maximum-spanning-tree traversal of this graph. Starting from six seed chapters, the algorithm maintains a frontier of unvisited chapters, each scored by its strongest edge to any chapter already read. Each day, the three highest-scoring chapters are selected one at a time, and the frontier is rescored after every pick. Every new chapter is chosen because it resonates more strongly than any other unread chapter with something that came before.
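For readers who want to see the frontier rule run end to end, here is a toy maximum-spanning-tree (Prim-style) traversal that groups the visit order into days of three. It is a sketch under simplifying assumptions: a tiny hand-made graph with made-up weights, no testament phases, no Psalm/Proverb slot, and a hypothetical `daily_bundles` function. A chapter's score is its single strongest edge to the read set, which is what Prim's algorithm tracks, not a sum over everything read.

```python
import heapq


def daily_bundles(graph, seeds, per_day=3):
    """Repeatedly read the unvisited chapter whose strongest edge to any
    already-read chapter is largest; group the post-seed order into days.

    graph: dict mapping chapter -> list of (weight, neighbor), undirected.
    """
    visited = set(seeds)
    best = {}   # strongest known edge from the read set to each frontier chapter
    heap = []   # (-weight, chapter): heapq is a min-heap, so negate for max

    def expand(ch):
        for weight, nb in graph.get(ch, []):
            if nb not in visited and weight > best.get(nb, float("-inf")):
                best[nb] = weight
                heapq.heappush(heap, (-weight, nb))

    for s in seeds:
        expand(s)

    order = []
    while heap:
        _, ch = heapq.heappop(heap)
        if ch in visited:  # stale entry left over from an earlier, weaker push
            continue
        visited.add(ch)
        order.append(ch)
        expand(ch)  # rescore the frontier before the next pick
    return [order[i:i + per_day] for i in range(0, len(order), per_day)]


def undirected(edges):
    """Build an adjacency dict from (a, b, weight) triples."""
    graph = {}
    for a, b, w in edges:
        graph.setdefault(a, []).append((w, b))
        graph.setdefault(b, []).append((w, a))
    return graph


g = undirected([
    ("S", "A", 0.9), ("S", "B", 0.5), ("A", "C", 0.8),
    ("B", "C", 0.7), ("C", "D", 0.6), ("B", "E", 0.4),
])
print(daily_bundles(g, ["S"]))  # [['A', 'C', 'B'], ['D', 'E']]
```

Note that B is reached through C (weight 0.7), not through the seed (0.5): reading C raised B's score, which is how each pick depends on everything read so far.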

The plan is not curated. It is computed. The pairings, the sequences, the cross-testament connections: none of that was touched by hand.

The core traversal

The seeds establish the starting point. They are six NT chapters, chosen for their theological density and graph centrality:

SEED_USFM = [
    ("LUK", 24),  # Jesus opens the Scriptures on the road to Emmaus
    ("JHN",  1),  # the Word made flesh — the NT's Genesis moment
    ("ROM",  3),  # the doctrinal core of atonement
    ("ROM",  4),  # justification by faith, Abraham as the prototype
    ("ROM",  5),  # Adam and Christ, death and life
    ("HEB",  1),  # the OT-to-NT hinge, Christ as fulfillment of everything
]

From there, Prim’s algorithm walks the graph. Each visited chapter expands the frontier; each day selects the three unvisited chapters with the strongest edge to anything already read:

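import heapq

def update_neighbors(ch):
    # Rescore the frontier after reading `ch`. G maps chapter -> [(weight, neighbor), ...].
    # Psalms and Proverbs never enter the frontier; they fill the daily companion slot.
    for weight, neighbor in G.get(ch, []):
        if neighbor in visited or is_psalm_proverb(neighbor):
            continue
        # Keep only the strongest edge from the read set to this chapter
        if weight > best_score.get(neighbor, float("-inf")):
            best_score[neighbor] = weight
            push(neighbor, weight)  # onto nt_heap or ot_heap as (-weight, neighbor)

def pop_best(phase):
    # Phase 1: NT chapters only
    # Phase 2: globally best from either heap
    # Heaps hold (-weight, chapter), so heap[0] is the strongest edge; entries for
    # chapters already visited are stale and skipped (lazy deletion).
    while True:
        if phase == 1:
            neg_w, ch = heapq.heappop(nt_heap)
        else:
            if not nt_heap and not ot_heap:
                raise IndexError("pop_best: frontier is empty")
            # An empty heap reports +inf so it never wins the comparison
            nt_top = nt_heap[0] if nt_heap else (float("inf"), None)
            ot_top = ot_heap[0] if ot_heap else (float("inf"), None)
            if nt_top[0] <= ot_top[0]:
                neg_w, ch = heapq.heappop(nt_heap)
            else:
                neg_w, ch = heapq.heappop(ot_heap)
        if ch not in visited:
            return ch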

The six seed chapters (Luke 24, John 1, Romans 3–5, Hebrews 1) are placed on days 1 and 2, three per day. The main traversal then runs for 365 days (days 3–367). Each iteration pulls three main chapters plus one Psalm or Proverb, chosen as the one with the strongest graph edge to that day’s bundle:

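for day in range(1, 366):  # 365 iterations: calendar days 3-367, after the two seed days
    bundle = []
    for _ in range(3):
        ch = pop_best(phase)  # phase 1 while NT chapters remain unread, then 2
        visited.add(ch)
        bundle.append(ch)
        update_neighbors(ch)  # rescore the frontier before the next pick

    psalm_ch = pick_psalm(bundle)  # strongest edge to any chapter in bundle
    schedule.append(bundle + [psalm_ch])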
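The `pick_psalm` step reduces to a single max over candidate Psalms and Proverbs. The version below is a sketch: its three-argument signature, the graph format (chapter mapped to a list of `(weight, neighbor)` pairs, as in `update_neighbors`), and the `pool` argument are assumptions. How the real generator treats Psalms and Proverbs that were already assigned is not described here, so this version simply chooses from whatever pool the caller passes.

```python
def pick_psalm(bundle, graph, pool):
    """Return the chapter in `pool` with the strongest edge to any chapter
    in `bundle`. Hypothetical helper; ties go to the earliest candidate.
    """
    in_bundle = set(bundle)

    def strongest_link(candidate):
        # Largest edge weight from the candidate into today's bundle (0.0 if none)
        return max(
            (w for w, nb in graph.get(candidate, []) if nb in in_bundle),
            default=0.0,
        )

    return max(pool, key=strongest_link)


# Made-up weights for illustration only; not values from The Thread's graph
graph = {
    ("PSA", 22): [(0.82, ("MAT", 27)), (0.40, ("ISA", 53))],
    ("PSA", 110): [(0.75, ("HEB", 1)), (0.60, ("MAT", 22))],
    ("PRO", 8): [(0.70, ("JHN", 1))],
}
bundle = [("MAT", 27), ("HEB", 1), ("JHN", 1)]
print(pick_psalm(bundle, graph, list(graph)))  # ('PSA', 22)
```

Scoring each candidate by its single strongest link mirrors the frontier rule of the main traversal; averaging over the bundle would be a different, equally plausible design that favors Psalms related to all three chapters at once.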

The full source is on GitHub.


What We Chose

The seeds (Luke 24, John 1, Romans 3–5, Hebrews 1) were chosen for their theological weight and their position at the center of the NT’s semantic graph. But their influence fades quickly. By day 30 of the main traversal, the visited set holds 96 chapters and the seeds make up about 6% of it. After roughly 100 chapters, their influence is negligible. Seed with Matthew 5, John 3, Ephesians 2, and Colossians 1 and the schedule looks nearly identical by day 40. The spanning tree Prim’s algorithm builds does not depend on where it starts (given distinct edge weights), and in practice the visiting order converges too: the high-degree NT hubs get visited early regardless of the seeds, because they have the strongest edges to everything else.
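The start-independence of Prim’s algorithm can be checked on a small example. With distinct edge weights the maximum spanning tree is unique, so Prim’s algorithm returns the same set of edges from every starting node even though the order in which nodes are reached differs. The graph and weights below are made up; this illustrates the general property, not The Thread’s data. Only the tree is guaranteed to match; how quickly the visiting order converges is an empirical property of the real graph.

```python
import heapq


def prim_max_tree(graph, start):
    """Prim's algorithm for a maximum spanning tree.

    graph: dict node -> list of (weight, neighbor), undirected.
    Returns (visit order, set of tree edges as frozensets).
    """
    visited = {start}
    order = [start]
    tree = set()
    heap = [(-w, start, nb) for w, nb in graph[start]]  # negate for max-heap
    heapq.heapify(heap)
    while heap:
        _, src, dst = heapq.heappop(heap)
        if dst in visited:
            continue
        visited.add(dst)
        order.append(dst)
        tree.add(frozenset((src, dst)))
        for w, nb in graph[dst]:
            if nb not in visited:
                heapq.heappush(heap, (-w, dst, nb))
    return order, tree


def undirected(edges):
    """Build an adjacency dict from (a, b, weight) triples."""
    graph = {}
    for a, b, w in edges:
        graph.setdefault(a, []).append((w, b))
        graph.setdefault(b, []).append((w, a))
    return graph


g = undirected([
    ("A", "B", 0.9), ("A", "C", 0.3), ("B", "C", 0.8), ("C", "D", 0.7),
    ("B", "D", 0.2), ("D", "E", 0.6), ("A", "E", 0.1),
])
order_a, tree_a = prim_max_tree(g, "A")
order_e, tree_e = prim_max_tree(g, "E")
print(order_a, order_e)  # ['A', 'B', 'C', 'D', 'E'] ['E', 'D', 'C', 'B', 'A']
print(tree_a == tree_e)  # True
```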

The NT-first structure is the one real decision we encoded. Keeping the OT locked out of Phase 1 entirely was Josh Howerton’s insight, theology turned into a conditional. Without it, high-degree OT hubs such as Isaiah and Genesis would be pulled into the opening weeks almost immediately, because their edges into the NT neighborhood are strong. We wanted the NT read first, so we said so. The graph did the rest.
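The Phase 1 gate amounts to one conditional on which chapters are eligible. A minimal sketch, assuming the frontier is a plain dict of scores rather than the two heaps in `pop_best`; `next_chapter`, `is_nt`, `NT_BOOKS`, and the scores are hypothetical, made up for illustration.

```python
NT_BOOKS = {"MAT", "MRK", "LUK", "JHN", "ROM", "HEB"}  # truncated list, illustration only


def is_nt(chapter):
    book, _ = chapter
    return book in NT_BOOKS


def next_chapter(scores, nt_unread):
    """Pick the best-scoring frontier chapter.

    scores: dict chapter -> strongest edge to the read set.
    nt_unread: how many NT chapters are still unread. While it is positive
    (Phase 1), OT chapters are ineligible no matter how strong their edges.
    """
    if nt_unread > 0:
        candidates = [ch for ch in scores if is_nt(ch)]
    else:
        candidates = list(scores)
    if not candidates:
        raise ValueError("no eligible chapter on the frontier")
    return max(candidates, key=scores.get)


# Made-up scores: an OT chapter has the strongest edge to the read set
scores = {("ISA", 53): 0.91, ("MAT", 27): 0.88, ("GEN", 22): 0.85}
print(next_chapter(scores, nt_unread=120))  # ('MAT', 27)
print(next_chapter(scores, nt_unread=0))    # ('ISA', 53)
```

The OT chapter keeps its score while gated, so once Phase 2 opens it competes on equal terms with everything else on the frontier.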

Holding Psalms and Proverbs apart from the main traversal and assigning one as a daily companion was also a deliberate choice. It keeps the wisdom literature from being consumed too early and gives every day an emotional register, regardless of where the main bundle lands.

Those are the decisions. Six starting chapters. One structural constraint. One slot assignment. Everything else (the pairings, the ordering, the cross-testament connections) came from the graph. We didn’t touch it.


Acknowledgment

The NT-first “Sixth Sense” structure comes from Pastor Josh Howerton of Lakepointe Church. In a podcast (and a TikTok), he described it this way:

“Remember Jesus said in Luke 24, everything that happened before was about him. You guys remember the first time you watched Sixth Sense? It’s the only movie I’ve ever watched twice back to back immediately… there’s that moment where you realize, oh, he was dead the whole time. And then you go back and you immediately rewatch the movie and now you’re interpreting the entire movie through the lens of what was revealed at the end. The whole movie takes on a completely different light. That’s how you’re supposed to read your Bible.”

The Thread is an attempt to take that insight and let the text itself, and the math, decide the rest.


What the Math Keeps Telling Us

When you study the output of this algorithm carefully (the chapter pairings, the ordering within books, the way themes cluster and resurface), something becomes difficult to ignore.

The graph did not know it was handling sacred text. It measured distances between vectors. It walked the path of least semantic resistance. It had no theology, no commentary, no tradition to consult.

And yet the schedule it produced is one that a careful theologian would recognize as meaningful. Chapters arrive in sequences that illuminate each other. Books are split across days in ways that track their internal theological movement rather than their page order. Connections surface that span centuries and genres, not because anyone planned them, but because the concepts were genuinely proximate in the embedding space, which means they were genuinely proximate in meaning.

This is either a remarkable coincidence or it is evidence of something the Christian tradition has always claimed: that the Bible is not a library of loosely related documents but a single, unified text with a coherent internal structure: one story, one Author, moving toward one end.

The algorithm is not imposing structure. It is recovering it.

Forty authors. Fifteen centuries. Three languages. One graph.

The claim is not that the algorithm understands Scripture. The claim is that Scripture has a structure that can be recovered — even by something that doesn’t.


The Project

The Thread was designed and built by Zach Graves in early 2026. The embedding pipeline, KNN graph construction, and Prim’s traversal are all custom-built; the project took roughly ten weeks from concept and research to the first generated schedule.

The reading plan is available inside ThatSermon, an iPhone app for recording and reflecting on sermons. The Thread is the daily reading backbone: four chapters a day, with the sermon notes you take on Sunday woven into the same experience.

The generator (thread_plan.py) is open source. The code that produced every row in the schedule is on GitHub. If you want to run it yourself, fork the embeddings, or build on the graph, it’s there.