For most of AI's history, building an intelligent system meant writing rules. A lot of them. If the patient has a fever and a cough, check for these conditions. If the order total exceeds this threshold, flag it for review. The programmer — or more often, a team of human experts — would sit down and try to encode everything a system needed to know as explicit if-then logic.
This approach, called rule-based AI (and usually grouped under the broader label of symbolic AI), worked well enough in narrow, well-defined domains. In the 1970s and 80s, so-called expert systems proliferated across industries; by one widely cited estimate, two-thirds of Fortune 500 companies were using them by the mid-1980s. They could diagnose certain diseases, configure computer hardware, and navigate specific legal questions with genuine competence.
The problem was the ceiling.
Rules work when the world is tidy. But most interesting problems aren't tidy. Teaching a rule-based system to recognize a handwritten digit, understand a sentence, or identify a face requires capturing something that resists being written down — the kind of intuitive pattern recognition that humans do effortlessly and can barely explain. The earliest successful expert systems, like XCON (used by Digital Equipment Corporation to configure computers), proved too expensive to maintain as the real world kept introducing edge cases the rules hadn't anticipated. They were brittle: give them an unusual input and they'd fail in ways no human expert would.
The shift that changed things wasn't a new rule. It was the decision to stop writing rules altogether.
Machine learning inverts the approach. Instead of a programmer specifying what to look for, you feed the system examples — thousands, eventually millions — and let it figure out the patterns on its own. The rules aren't written in advance; they're discovered from data. A system trained on labeled images of cats and dogs doesn't know what a cat is because someone defined "cat." It knows because it has seen enough cats to recognize the pattern, even in photos it's never encountered before.
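To make the contrast concrete, here is a minimal sketch in Python that reuses the order-flagging example from earlier. Everything in it is made up for illustration: the 1000.0 cutoff, the eight labeled orders, and the function names are assumptions, and real systems learn far richer patterns than a single threshold. The point is only where the cutoff comes from: a person in the first case, the examples in the second.

```python
# Rule-based: a person decides the threshold in advance.
def rule_based_flag(order_total):
    return order_total > 1000.0  # hypothetical cutoff picked by an expert


# Learned: the threshold is chosen from labeled examples.
def learn_threshold(examples):
    """Return the cutoff that misclassifies the fewest labeled examples.

    examples: list of (order_total, was_flagged) pairs.
    """
    totals = sorted(total for total, _ in examples)
    # Candidate cutoffs: midpoints between neighboring order totals.
    candidates = [(a + b) / 2 for a, b in zip(totals, totals[1:])]

    def errors(cutoff):
        # Count examples where "total > cutoff" disagrees with the label.
        return sum((total > cutoff) != label for total, label in examples)

    return min(candidates, key=errors)


# Made-up training data: (order total, did a human reviewer flag it?)
examples = [
    (120.0, False), (340.0, False), (560.0, False), (610.0, False),
    (790.0, True), (880.0, True), (1500.0, True), (2300.0, True),
]

learned_cutoff = learn_threshold(examples)


def learned_flag(order_total):
    return order_total > learned_cutoff


print(learned_cutoff)         # cutoff discovered from the data
print(rule_based_flag(850.0)) # what the hand-written rule says
print(learned_flag(850.0))    # what the learned rule says
```

With this data the learned cutoff lands between the largest unflagged order and the smallest flagged one, so an 850.0 order is flagged by the learned rule but not by the hand-written one. Nobody typed that number in; change the examples and the cutoff moves with them.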
This sounds almost too simple, but it changed which problems were within reach. Tasks that rule-based systems had handled poorly or not at all, such as speech recognition, image classification, and language translation, became tractable once you had enough data and a system that could learn from it. The shift began gaining momentum in the 1990s as computational power grew and data became more available, and it accelerated sharply from there.
What you've read about in Machine Learning is the foundation of that shift. What comes next is where it gets structural — how learning systems are actually built, and what happens when you make them deeper.


