Launching an AI model on an old PDP-11: an enthusiast used a 6 MHz CPU and 64 KB of RAM

Microsoft Veteran Demonstrates a Transformer Running on a Vintage Computer

*Dave Plummer, a well-known Windows developer, shows that modern AI models can be trained even on decades-old hardware.*

What Was Done
- Hardware: PDP‑11/44, a 47‑year‑old computer with a 6 MHz processor and 64 KB of RAM.

- Model: “Attention 11” – a transformer network written in PDP‑11 assembly by Damien Buret.

- Training Task: output an eight‑number sequence in reverse order.

The model does not need to memorize examples; it must learn the rule of “reversing” the sequence.
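The task data can be illustrated with a short sketch (hypothetical Python for clarity; the actual Attention 11 implementation is written in PDP‑11 assembly, and the vocabulary size here is an assumption):

```python
import random

def make_example(length=8, vocab=16):
    """Generate one (input, target) pair for the sequence-reversal task."""
    seq = [random.randrange(vocab) for _ in range(length)]
    return seq, list(reversed(seq))

inp, target = make_example()
# The model must learn the mapping seq -> reversed(seq) as a rule:
# with 16**8 possible inputs, memorizing individual pairs is far
# beyond what 64 KB of RAM could hold.
```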

How It Works
1. Initialization – the model starts with random weights; accuracy is near zero.

2. Training – each step performs a forward pass (8‑bit fixed point) and weight updates.

3. Gradual Pattern Acquisition – after several hundred iterations the attention mechanism “discovers” the rule, and the model shifts from guessing to real knowledge.
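The 8‑bit fixed‑point arithmetic mentioned in step 2 can be sketched as follows (a Python illustration assuming a Q4.4 format; the exact format used in Attention 11 is not stated in the article):

```python
FRAC_BITS = 4  # assumed Q4.4: 4 integer bits, 4 fractional bits

def to_fixed(x: float) -> int:
    """Convert a float to an 8-bit signed fixed-point value."""
    v = round(x * (1 << FRAC_BITS))
    return max(-128, min(127, v))  # saturate to the int8 range

def fx_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values, rescaling back to Q4.4."""
    v = (a * b) >> FRAC_BITS
    return max(-128, min(127, v))

def to_float(a: int) -> float:
    """Convert a fixed-point value back to a float."""
    return a / (1 << FRAC_BITS)

# Example: 0.5 * 1.5 = 0.75 in Q4.4
a, b = to_fixed(0.5), to_fixed(1.5)
print(to_float(fx_mul(a, b)))  # -> 0.75
```

Working in 8‑bit fixed point keeps every multiply within the 16‑bit integer arithmetic a PDP‑11 handles natively, at the cost of limited precision and range.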

> “We see a simplified anatomy of learning itself… the machine ultimately crosses an invisible line—from guessing to knowledge.” – Plummer

Results
- Accuracy: 100 % on the reverse sequence task.

- Speed: about 350 training steps, taking ~3.5 minutes on the PDP‑11/44 with cache memory.

What This Means for Modern AI
Plummer emphasizes that the fundamental principles of learning—repeated arithmetic operations and error correction—are fully realized even in such a simple system.

“This old machine doesn’t think mystically; it simply updates a few thousand numbers. The essence of modern AI is scaling this process.”
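"Updating a few thousand numbers" boils down to a plain gradient step; a generic sketch (not the Attention 11 code) looks like this:

```python
def sgd_step(weights, grads, lr=0.01):
    """One training step: nudge each weight against its gradient."""
    return [w - lr * g for w, g in zip(weights, grads)]

# Three weights, three gradients -> three slightly corrected weights.
w = sgd_step([0.5, -0.2, 0.1], [1.0, -0.5, 0.0])
```

Scaling this same loop from a few thousand weights to billions is, in Plummer's framing, the essence of modern AI.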

Thus, the author demonstrated that the basic transformer mechanism works the same way regardless of the hardware it runs on.
