Software Is an Encoding


"Programs must be written for people to read, and only incidentally for machines to execute." (Harold Abelson and Gerald Jay Sussman, Structure and Interpretation of Computer Programs)

Code is documentation. Not a supplement to documentation — the thing itself.

Encoding a Model

The previous article argued that the real work in software is modeling. The model captures the rules, the data flows, the distinctions that matter. Code is the translation.

But translation isn't passive. When you encode a model in code, you're making communicative choices. How precisely does the code represent the model? How easily can someone read the rules back out? These aren't aesthetic questions. They determine whether your code can stand alone as the authoritative record of how a system works.

Precision and Size

Information theory offers a useful lens here, at least loosely. Any message has a core and a periphery. The core is the information that matters. The periphery is everything else: greetings, context, protocol, style. Consider an email asking a colleague to send you a file. The core is "send file X." The greeting, the pleasantries, and the sign-off might matter for the relationship, but they don't change the content.

Two properties determine message quality. The first is precision: how unambiguous is the message? A perfectly precise message has exactly one interpretation; an ambiguous one has many. The second is density: how much of the message is core? The less you can remove without losing meaning, the denser the message.

These pull against each other. Reducing ambiguity often means adding words. Legal documents are the extreme case — long and full of particular language, because they're trying to minimize the number of valid interpretations. The goal is a contract with exactly one reading.

Good code faces the same tradeoff. Terse code can be ambiguous. Verbose code buries the signal in noise. The aim is the same as legal drafting: maximum precision at minimum size.
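To make the tradeoff concrete, here is a minimal sketch of one rule encoded three ways. The rule itself ("orders of 100.00 or more ship free; otherwise shipping costs 7.50") and every name in the code are invented for illustration.

```python
from decimal import Decimal

# Hypothetical rule, invented for this example:
# "Orders of 100.00 or more ship free; otherwise shipping costs 7.50."

# Terse: small, but the reader has to guess what s, t, 100 and 7.5 mean.
def s(t):
    return 0 if t >= 100 else 7.5


# Verbose: behaves the same, but the rule is buried in ceremony.
def calculate_shipping_verbose(order_total):
    shipping_cost = None
    is_free = False
    if order_total >= 100:
        is_free = True
    else:
        is_free = False
    if is_free == True:  # deliberately redundant, to show the noise
        shipping_cost = 0
    else:
        shipping_cost = 7.5
    return shipping_cost


# Precise and small: the named constants carry the rule.
FREE_SHIPPING_THRESHOLD = Decimal("100.00")
STANDARD_SHIPPING_FEE = Decimal("7.50")


def shipping_fee(order_total: Decimal) -> Decimal:
    """Orders at or above the threshold ship free."""
    if order_total >= FREE_SHIPPING_THRESHOLD:
        return Decimal("0.00")
    return STANDARD_SHIPPING_FEE
```

All three functions compute the same thing. Only the last one lets a reader recover the rule, including the boundary case at exactly 100.00, without asking anyone.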

Structure Reduces Ambiguity

The way out of the tradeoff is better language. Math is the canonical example. The expression f(x) = x² is tiny and completely unambiguous. It achieves this through structure — well-defined primitives combined according to strict rules. The structure does the disambiguation work, so individual statements can stay small.

This is one goal of good programming language design. Find the right primitives. Define precise composition rules. Then higher-level constructs can be both dense and unambiguous — they inherit precision from their foundations.

It's also a goal of good code within whatever language you use. Well-named things, clean decomposition, expressive use of the language — these increase information density. The code says more with less.
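As a rough illustration of naming and decomposition, the sketch below encodes one eligibility rule twice. The `Account` type, its fields, and the 30-day threshold are all hypothetical, chosen only to show the contrast.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    # Hypothetical fields, invented for this example.
    age_days: int
    balance_cents: int
    flagged: bool


# Opaque: correct, but the rule has to be reverse-engineered.
def chk(a):
    return a.age_days >= 30 and a.balance_cents >= 0 and not a.flagged


# Decomposed: each name states one piece of the rule.
MIN_ACCOUNT_AGE_DAYS = 30


def is_established(account: Account) -> bool:
    return account.age_days >= MIN_ACCOUNT_AGE_DAYS


def is_in_good_standing(account: Account) -> bool:
    return account.balance_cents >= 0 and not account.flagged


def can_request_credit(account: Account) -> bool:
    return is_established(account) and is_in_good_standing(account)
```

The second version is longer in characters but denser in meaning: `can_request_credit` reads as the rule itself, and each supporting definition can be checked on its own.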

Code as Source of Truth

Here's the test: is the code the authoritative record of how the system works?

If the code is imprecise — vague names, tangled control flow, rules scattered across undocumented branches — it can't be the source of truth. The gap between what the code says and what the system actually does gets filled by humans. Someone who was there when it was built. A product owner who holds the rules in their head. A team that schedules meetings to reconstruct intent.

Human memory is neither precise nor accurate. The original developers leave. The product owner gets promoted. Three years later, nobody knows why the billing logic branches the way it does, and everyone's afraid to touch it.

The advantage of encoding rules in code is that code can be precise and accurate, indefinitely. That advantage only materializes if the code actually says what it means. If it takes a person to interpret the code correctly, you've outsourced your source of truth to the least reliable storage medium available.

AI coding assistants are now consumers of the code too. When you ask an assistant to extend or modify a system, it reads the codebase to understand the rules. Precise code helps the AI reason correctly about what exists and what to add. Vague code leaves gaps — and AI, like humans, fills those gaps with assumptions. The quality of AI-assisted development is downstream of the quality of the encoding.

What Language?

Language choice matters more than it's usually given credit for.

Languages are abstractions stacked on top of what the machine actually does. At the bottom, a computer is a state machine: it holds a state, takes inputs, and produces a new state. Every language above that level translates those operations into something more expressive.

The question is how well a language maps to the domain you're modeling. A language that fits the problem lets you encode rules directly. One that doesn't forces you to express the model in roundabout ways — and roundabout encodings are imprecise ones.

State machine languages map naturally to a huge range of software problems, because most software is managing state. Explicit states, explicit transitions, explicit inputs. The rules are visible in the structure. Compare that to code where state is scattered across mutable variables and control flow is implicit — reading back the model from that encoding is hard, and modifying it safely is harder.
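Even in a general-purpose language, the same idea can be approximated with an explicit transition table. The sketch below is one way to do it in Python; the order lifecycle, its states, and its events are hypothetical, not taken from any particular system.

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    SUBMITTED = "submitted"
    PAID = "paid"
    CANCELLED = "cancelled"


class Event(Enum):
    SUBMIT = "submit"
    PAY = "pay"
    CANCEL = "cancel"


# Every legal transition is listed here; anything absent is illegal.
TRANSITIONS: dict[tuple[State, Event], State] = {
    (State.DRAFT, Event.SUBMIT): State.SUBMITTED,
    (State.DRAFT, Event.CANCEL): State.CANCELLED,
    (State.SUBMITTED, Event.PAY): State.PAID,
    (State.SUBMITTED, Event.CANCEL): State.CANCELLED,
}


def step(state: State, event: Event) -> State:
    """Return the next state, or raise if the transition is not defined."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"{event.name} is not allowed in state {state.name}"
        ) from None
```

The table is the model. A reader can see at a glance that a paid order cannot be cancelled, and changing that rule means adding one line rather than hunting through flags and conditionals.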

The choice of abstraction is a choice about how clearly your model can be expressed. Choose a language that fits the problem, and precision gets easier. Choose one that fights the problem, and you're working against yourself.

Plain English Is Also an Encoding

Increasingly, business rules aren't only encoded in code. Spec documents, product requirements, markdown files that guide coding assistants — these are encodings too. Plain English used to be a communication medium between humans. Now it's also an input to automated systems.

The precision/size tradeoff still applies. Natural language has expressive power that formal languages lack — you can convey intent, context, nuance. But it carries more inherent ambiguity. A vague spec produces vague code, whether a human or an AI is doing the translation.

The most useful natural language specs read almost like formal specifications: terms defined clearly, edge cases stated explicitly, conditions expressed without wiggle room. The medium is different. The goal is the same — a message with one interpretation.
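Here is a sketch of what that can look like, with a made-up late-fee rule. The spec appears as a comment, and the code mirrors it clause by clause. The rule, the 5% rate, and all names are assumptions made for the example.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical spec, written to leave one interpretation:
#   1. A payment is "late" if it is received strictly after the due date.
#      A payment received on the due date is not late.
#   2. The late fee is 5% of the outstanding balance.
#   3. The fee is rounded to the nearest cent, with halves rounded up.
#   4. If the outstanding balance is zero, no fee applies.

LATE_FEE_RATE = Decimal("0.05")
CENT = Decimal("0.01")
NO_FEE = Decimal("0.00")


def late_fee(balance: Decimal, due: date, received: date) -> Decimal:
    is_late = received > due                      # clause 1
    if not is_late or balance == 0:               # clauses 1 and 4
        return NO_FEE
    fee = balance * LATE_FEE_RATE                 # clause 2
    return fee.quantize(CENT, rounding=ROUND_HALF_UP)  # clause 3
```

A vaguer spec such as "charge a small fee for late payments" would leave the due-date boundary, the rate, and the rounding to whoever does the translation, human or AI.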

The Implication

If code is an encoding of a model, then its quality as code is inseparable from its communicative value. Clever code that nobody can read is a bad encoding. Verbose code that buries the rules in noise is a bad encoding. Code that requires a human expert to interpret is a bad encoding.

The measure of good code isn't that it runs. It's that it says what it means — precisely enough that the model can be read back out by anyone who needs to understand the system. The computer executes it. The humans have to live with it.