asciijpg background

a s c i i j p g

The digital encoding of…most human knowledge occurred over only the past 30 years or so, and really, mostly in the past 15. (Volume-wise, the past 15 years are likely responsible for 75% or more of this work.) The digitization of human knowledge and experience happened in an urgent and almost furious way. Why the rush? It’s kind of funny, because we didn’t really know why at the time, although it certainly seemed like a necessity. Oh, ostensibly it was about commerce and sharing and connecting and convenience. But I think very few people, if any, knew that it would be a requirement for building AI systems, as we’re now seeing that it is and was. Until maybe only 7-10 years ago, there really was no consensus on what data was required to bootstrap human-level AI. Even data- and DNN-focused researchers kind of assumed we would need lots of labels. And symbolic AI researchers thought we would need more rules and systems, even if part of what they were doing was to “read” the internet.

Well, now we know that DNNs and plain-ol’ digitized text and images are enough, at least for this first round and possibly for the next several. Yes, distributed computing was necessary. Yes, it seems GPUs (which were also quite serendipitously developed and produced) were also a key ingredient. But the digitization of human knowledge in text and images seems to be unmistakably a significant requirement, and amazingly, it was done in a quite distributed way. It’s almost as if we all knew that we would need this. Of course, the benefits of this shared endeavour may not be distributed as evenly as the work of digitization was…

asciijpg is an homage of sorts to what I think are ultimately the key encodings that enabled self-supervised AI systems like GPT-4. OK, technically text is now more frequently encoded as UTF-8, but the granddaddies are ascii and jpg. There are other contenders for the most important encoding standards; html, of course, is terribly important. But in its simplest reduction, the unified global digitization of human knowledge has really happened on two simple standards, for text and image digitization respectively: ascii and jpg (jpeg).
