Just Because You Wrote It Doesn’t Mean You Own It

Randy isn’t really a programmer. It’s just that he’s obsessed with fantasy role-playing, and has turned to computers in order to achieve the kind of gameplay realism that would otherwise be impossible.

More specifically: he aims to simulate the collective metabolism of ancient Native American hunter-gatherer tribes. Inland, they would have had to work very hard just to stay alive, whereas populations living by the ocean, where food was plentiful and nutritious, could relax and spend more time creating intricate art. That type of thing.

It wasn’t really Randy’s idea initially. He stumbled upon it while working as a clerk at the university library, when Andrew started ordering up books on the subject.

Andrew was just as fanatic about the subject, but not at all into computers. Instead he worked the anthropological angle. He’d disappear into the wilderness for days, living off of insects, rodents and whatever else he could scavenge. He was dead set on experiencing what it must have been like to have been an ancient tribesman.

The reason Randy can try his hand at turning Andrew’s problem into a computer simulation, by the way, is that he met Charlene. She’s an arts student, and as such she can give Randy access to her account on the local university’s UNIX mainframe.

Learning to code on it is hard. Randy would have much preferred one of those PCs that are just starting to come into the mainstream, but he can’t afford to buy one on his meagre librarian salary, so he’ll have to make do.

Hence Randy disappears from the face of the earth, only to emerge 18 months later with a piece of software that actually works. A fully functional digital roleplaying game, at least a full decade before that really became a thing.

Then his creation gets discovered by an anonymous corporate entity offering to pay him a thousand dollars plus royalties on future sales.

Ecstatic, he tells Andrew-the-anthropologist, whom Randy thinks of as a friend. But Andrew turns on Randy. Smelling money, he suddenly claims that since Randy’s game is inspired by Andrew’s research, Andrew is entitled to fifty percent of whatever proceeds the game generates. Given that his wealthy father owns a law firm, Andrew has the resources to enforce his wish.

Then it gets worse still. A pack of lawyers working for the university gets wind of the impending deal. Since the code was written and stored on machines belonging to the university, they threaten to sue Randy unless they too get a cut.

In the end, Randy is left with less than nothing. Not only is he ruined by legal fees, there’s also no way in hell that anyone will want to buy his game anymore. In the words of Neal Stephenson, in whose novel Cryptonomicon Randy’s story appears:

The software was never sold to anyone, and indeed could not have been; it was so legally encumbered by that point that it would have been like trying to sell someone a rusty Volkswagen that had been dismantled and its parts hidden in attack dog kennels all over the world.

It was the only time in [Randy’s] life when he had ever thought about suicide. He did not think about it very hard, or very seriously, but he did think about it.

I love Neal Stephenson’s books not just for their vibrant prose, but also because, unlike many other sci-fi writers, he never skims over the details that make up most of what it’s like to actually live and breathe technology.

Because it’s only after something becomes a “breakthrough” that it starts looking glamorous and shiny. A vast graveyard of inventions speaks to the annoyingly mundane reasons why even the most brilliant technologies often never make it out of the lab.

Randy’s story played out in the 1980s, when the legal framework for intellectual property hadn’t yet adjusted to modern technologies. Surely things have improved since?

On some level, yes. We now have established practices for software licensing, proprietary and otherwise. There’s also an understanding that the copyright of a piece of code belongs to whoever wrote it. Progress has been made: we’ve caught up with the era of shrink-wrapped software and even, to some extent, with that of the Internet as the dominant distribution platform.

Predictably, however, tech has moved on and is again far ahead of the legal framework. The implications are really far worse than what happened to poor Randy, if you think about it.

Here’s what I have in mind:

When Microsoft bought GitHub in 2018 and then started taking control of OpenAI the following year*, they got away with a move that would have made a Bond villain blush.

*GitHub was acquired for 7.5 billion dollars. OpenAI is technically still not fully controlled by Microsoft, which only owns 49 percent of its shares, but it’s been reported that Microsoft has the right to 75 percent of OpenAI’s profits until it has recouped its investment. The size of that investment is speculated to be in the range of eleven billion dollars, mostly paid for in the currency of access to Microsoft-controlled compute power.

If you know anything about OpenAI, you know that it’s the source of the wildly popular ChatGPT. It is also, however, the maker of Codex, an artificial intelligence model that is used to power GitHub’s developer tool Copilot. (In fact, Codex is a version of GPT-3 fine-tuned on code, and GPT-3 is the model family from which ChatGPT is derived.)

While ChatGPT is getting all the limelight, Copilot moves behind the scenes in ways that might have far-reaching consequences.

The AI race these days is all about having access to copious amounts of data on which to train your model. Copilot (or rather Codex) is trained on the billions of lines of code hosted on GitHub. If you plug it into your IDE, every line of code you write will touch GitHub’s (that is, Microsoft’s) servers, which will duly get back to you with suggestions on where to go from there. Copilot is known to have had security flaws, but that doesn’t stop more and more coders from taking it for a spin, and many of them seem to be blown away by what Copilot can do.

There’s at least one problem with this from an intellectual property point of view: Copilot was trained on open source code, without asking the copyright holders for permission*.

*A company that didn’t own GitHub would probably never even try to get away with this. The fact that Microsoft does, however, likely means it can claim that whoever uploaded code onto its servers also gave whoever controls those servers permission to read said code, and technically speaking, read rights are all it takes to train a model. That’s just me speculating, though. It’ll be interesting to see what really happens when the jury is ready to announce its verdict.

What that means is that the value Microsoft captures by selling access to Copilot is built on the back of thousands of unpaid volunteer coders’ contributions to open source projects. Contributions they made – if they’re anything like the coders I know – to make the world a better place, not to enrich one of the largest corporations in the world. I wonder what Randy would have made of that.