Git Was Never Built for Machines – And Yet, It Became Their Library


When Linus Torvalds created Git in 2005, he solved a very human problem: how do distributed teams of engineers coordinate changes to a shared codebase without stepping on each other’s toes? The design was elegant, the model brilliant – and entirely anthropocentric. Every concept in Git, from commits to branches to pull requests, was designed to reflect human reasoning about software change.

Nobody thought about machines. Nobody had to.

GitHub today: 420 million repositories – the world’s largest unintentional AI training dataset, built by humans for humans, read by machines for everything.

The Unintended Consequence of 20 Years of Open Source

Fast forward to today. GitHub hosts over 420 million repositories. The platform has become, without anyone planning it that way, the single largest structured dataset of human reasoning about software in existence. Not raw text, not unstructured web crawls – but versioned, annotated, context-rich artifacts of how engineers think, communicate, decide, and build.

When OpenAI trained Codex, when DeepMind built AlphaCode, when Anthropic trained Claude – they all fed on this corpus. The commit messages, the PR discussions, the inline comments, the issue threads, the README files. Git history is, in effect, the autobiography of software development, and AI learned to read it before most organizations realized what they were giving away.

The irony is profound. A version control system designed for human collaboration became the substrate for training non-human intelligence. Git was the library. GitHub was the librarian. And the AI models were the quiet students who read everything, remembered everything, and said nothing.

What Does a Machine Actually Learn from a Repository?

This is where it gets interesting – and where most CTO conversations I have still miss the point. The common assumption is that AI learns “code” from GitHub. That is technically true but deeply incomplete. What AI systems actually absorb is something far richer: the relationship between intent and implementation.

A commit message that reads “fix edge case in authentication flow when user has expired token” paired with a fifteen-line diff teaches a model not just syntax, but causality. The issue thread that preceded it, the review comments that shaped it, the test that was added afterward – together, they represent a chain of engineering reasoning that no textbook ever captured at scale.

This is fundamentally different from learning from documentation. Documentation describes what code does. Repositories reveal how engineers think about what code should do, why it changes, and under what conditions decisions get revisited. The difference is the difference between reading a law and watching a courtroom.

The Next Evolution: Repos as AI-Native Artifacts

Here is where we stand at an inflection point that most engineering organizations have not yet grasped. If repositories are already the de facto training substrate for AI, the logical next step is to design repositories with that fact in mind. Not reactively, not accidentally – but intentionally.

In the coming years, we will see repositories evolve from pure version control archives into structured knowledge bases that explicitly address AI agents as consumers. This means several concrete developments that are already beginning to emerge in practice.

The first is the emergence of a dedicated specification layer. Today, most codebases contain implicit intent buried in comments, naming conventions, and tribal knowledge that lives only in the heads of the engineers who wrote it. Tomorrow’s repositories will carry explicit machine-readable specifications – linked directly from the code they describe – that articulate not just what a module does, but what it is trying to achieve and what constraints govern its evolution. Formats such as OpenSpec are early experiments in this direction.
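As a purely illustrative sketch (the file name, field names, and layout below are invented for this post, not taken from OpenSpec or any existing standard), such a specification might look like:

```yaml
# spec/session_refresh.spec.yaml (hypothetical format, for illustration only)
module: auth/session_refresh
intent: >
  Renew expired access tokens transparently, so that a user holding a
  valid refresh token never sees an authentication error.
constraints:
  - Refresh requests must be idempotent.
  - Tokens must never appear in log output.
assumptions:
  - The identity provider enforces refresh-token expiry server-side.
linked_code:
  - src/auth/refresh.py
  - tests/auth/test_refresh.py
```

The value is not the format itself but the fact that intent and constraints become queryable by tools and agents instead of living in someone’s head.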

The second shift is what might be called the Intent Layer. Beyond the specification of individual components, future repositories will carry structured metadata that describes the reasoning behind architectural decisions. Why was this approach chosen over the alternatives? What trade-offs were consciously accepted? What assumptions does this design rely on? This is the kind of context that AI agents need to reason correctly about a codebase – not just to generate code that compiles, but to generate code that fits.

The third development is the rise of agent-aware commit protocols. If AI agents are both reading and writing to repositories – which they already are in many of our development pipelines today – the commit structure itself needs to evolve. Automated commits should carry provenance metadata: which model generated this, from which specification, against which test harness, with what confidence. Human commits will increasingly need corresponding context flags that distinguish deliberate design choices from pragmatic workarounds.
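Git already has a native mechanism that could carry this provenance today: commit trailers, the same key-value convention used for Co-authored-by and supported by git interpret-trailers. The trailer keys below are hypothetical, invented for illustration; no standard for them exists yet:

```text
fix: handle expired refresh token during session renewal

Regenerate the session when the refresh token has expired instead of
returning a 401 to the client.

Generated-by: example-code-model-v2
Source-spec: spec/session_refresh.spec.yaml
Test-harness: tests/auth/test_refresh.py
Review-status: human-approved
```

Because trailers are plain lines at the end of the message, existing tooling can parse them without any change to Git itself.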

The Strategic Implication Nobody Is Talking About

There is a competitive dimension here that deserves more executive attention than it currently receives. Organizations that deliberately enrich their repositories with machine-readable intent and specification data are not just improving their own AI-assisted development workflows. They are producing higher-quality training data for the next generation of models. If open-source development continues to feed the pre-training pipelines of frontier AI systems, then the quality of the reasoning encoded in public repositories will shape the quality of the AI that the entire industry relies on.

This creates an asymmetry: companies that treat their internal codebases as structured knowledge assets – not just as source code archives – will build internal AI capabilities that reflect higher-quality reasoning. The gap between organizations that have thought seriously about repository architecture as an AI substrate and those that have not will become a measurable capability difference within this decade.

Git was never built for machines. But machines have made themselves at home, and now the question is whether we redesign the house accordingly.

The answer, for any organization serious about AI-driven development, is yes. And the time to start is now – not when the next model generation arrives, but before it does, while there is still time to shape what it learns from you.


This post is part of a series on the structural shifts in software development brought on by AI integration. Previous entries covered specification-driven development and multi-agent orchestration in enterprise codebases.

From Procedural to Intent – The 35-Year Arc of Programming Paradigms


In the early nineties, when I started my career in IT, writing software meant writing every single instruction by hand. C was king. You told the machine exactly what to do, byte by byte, pointer by pointer. There was beauty in that precision, and also a brutal limitation. The mental overhead consumed enormous human energy, and most of it was wasted on mechanics rather than meaning.

The arc of programming abstraction – from machine instructions to human intent. Each decade, one less layer of mechanical translation.

Did Object Orientation solve the problem?

Object orientation promised salvation. Java and C++ shifted the abstraction one level up. Suddenly we talked about things rather than operations. Classes, interfaces, polymorphism. In theory this modelled the real world better. In practice it generated enormous amounts of boilerplate, heated debates about design patterns, and the rise of the “architect” as a separate species. The productivity gains were real but also came with new complexity layers. We traded one set of problems for another.

Then came Python, and with it the age of expressiveness. Write less, mean more. The scripting world invaded enterprise development. Ruby on Rails showed that a small team could build in weeks what previously took months. The abstraction level climbed again.

So what is the next step?

Looking at where we are in 2025, I believe we are witnessing the most fundamental shift in the entire history of programming. The abstraction is no longer about how to express logic in a language. It is about expressing intent in human language, and letting a model translate that into executable code. This is not an incremental evolution. This is a paradigm change comparable to the jump from assembly to high-level languages.

The trajectory is clear: from machine instructions to structured code to objects to functions to natural language. Each step removed one layer of mechanical translation from the developer’s mind. AI removes the last one.

Intent becomes executable. The translation layer that once required years of training is now handled by language models.

What this means for teams building software today is profound. The question shifts from “who writes the best code” to “who formulates the best intent.” The language model handles the rest. Developers who understand this shift will adapt. Those who think it is just a fancy autocomplete will be left behind, and probably soon.

The 35-year arc of computing has always been about raising the level of abstraction. We are now at the top of that arc, and the view is remarkable.

The Next Abstraction Layer: From Procedural to AI-Driven Development


Since the early days of computing, software development has followed a very consistent pattern: every decade or two, a new paradigm emerges that raises the abstraction level by one significant step. We moved from punch cards to assembler, from assembler to C, from C to object-oriented languages like Java and C++, and then from there to higher-level scripting and systems languages like Python and Rust. Each of these transitions shared the same fundamental characteristic — they allowed developers to think less about how the machine does something, and more about what needs to be done.

Does AI break this pattern, or does it continue it?

In my view, it continues it — but at a scale and speed we have not seen before.


When C appeared in the early 1970s, it was a revolution. Programmers could abstract over registers and memory addresses with structured control flow. With Java and C++ in the 1990s the next step happened: objects, encapsulation, inheritance. The programmer could now model the world in concepts rather than instructions. A Car object had methods and state. The machine details were pushed even further down. Python and its contemporaries took this further, removing memory management entirely; prototypes that would have taken weeks in C could be built in hours.
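The conceptual jump behind the Car example fits in a few lines. A minimal Python sketch (the class, methods, and numbers are invented here for illustration):

```python
class Car:
    """A domain concept with state and behavior, not registers and addresses."""

    def __init__(self, fuel_liters: float):
        self.fuel_liters = fuel_liters  # state lives inside the object

    def drive(self, km: float, consumption_per_100km: float = 8.0) -> None:
        """Reduce the fuel level according to the distance driven."""
        needed = km * consumption_per_100km / 100
        if needed > self.fuel_liters:
            raise ValueError("not enough fuel")
        self.fuel_liters -= needed

car = Car(fuel_liters=40.0)
car.drive(100)          # consumes 8 liters at the default rate
print(car.fuel_liters)  # 32.0
```

Nothing here mentions how the machine stores the number; the programmer reasons entirely in the domain’s vocabulary.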

Each of these epochs shared one common denominator — the developer still wrote every line, still translated intention into instruction, just at a higher level.

This is exactly the step AI is taking now.

The translation from intention to implementation was always the developer’s core job. You had an idea, you had a requirement, and your skill was to bridge that gap in code. LLMs are now beginning to perform this translation automatically. Not perfectly, not without oversight, but in a direction that is unmistakable.

We are moving from imperative thinking (tell the machine step by step what to do) to intentional thinking (tell the system what outcome you want). The shift is profound. It is not about writing less code; it is about changing who writes it and at what level of abstraction humans need to operate.
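That movement is visible in miniature inside a single language, long before any model enters the picture. A toy Python contrast (the function names and numbers are invented for illustration):

```python
# Imperative: spell out every step the machine should take.
def total_imperative(prices):
    result = 0.0
    for p in prices:
        if p > 0:          # filter by hand
            result += p    # accumulate by hand
    return result

# Intentional: state the desired outcome and delegate the "how".
def total_intentional(prices):
    return sum(p for p in prices if p > 0)

prices = [20.0, -5.0, 4.0]
print(total_imperative(prices), total_intentional(prices))  # 24.0 24.0
```

An LLM pushes the same delegation one level further: the outcome is stated in natural language, and the entire function becomes the generated artifact.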

Is this the end of the developer?

I would argue no, but the role will shift dramatically, in the same way that the introduction of C did not eliminate hardware engineers but changed what skills were needed and where the value was created. The developers of the next decade will be architects of intent, not writers of loops. The skill set moves from syntax mastery and algorithmic thinking towards domain expertise, system design, and the ability to validate and guide AI-generated output.


From my personal experience leading large engineering teams, I already see this shift in practice. The question is no longer “can you write the code?” but “do you understand the system well enough to judge the code that was generated?” Quality, correctness, security and maintainability remain a human responsibility. The generation part is moving to the machine.

Where are we today?

We are probably in the MS-DOS phase of this transition. The tools are real, the output is impressive, but the workflow, the standards, the guardrails and the enterprise-grade reliability are still being developed. Companies that understand the abstraction shift happening now will be the ones architecting the platforms of the next decade. The others will be the ones migrating legacy prompt-less codebases in 2035.

The lesson from history is clear: abstraction always wins. The only question is how fast you adapt.