Comments on William Calvin's book
"The Cerebral Code"

by Ted Kaehler

Draft of 16 Apr 98. Please comment.

In his book, The Cerebral Code, William Calvin presents his theory of how the brain thinks thoughts. As an engineer, I am used to two kinds of explanations: those I understand well enough to derive the details of how they actually work, and the other kind, which don't really explain anything. I am overjoyed that The Cerebral Code contains the first real explanation of how thoughts occur on our neural hardware. To my engineer's mind, it has the distinct ring of truth about it. I think it will be shown to be essentially correct. But correct or not about human brains, it may be able to serve as the basis for building an artificially intelligent program.

The Cerebral Code proposes that the surface of the neocortex is covered with hexagons, each half a millimeter across. The hexagons are 'virtual', and are not anatomical features. A hexagon is the unit repeating cell of a group of minicolumns. A set of spatiotemporal firings of the minicolumns in a hexagon is a 'thought'. It is a melody of firings. Each hexagon is listening to its six neighbors. When a neighboring pattern of firings matches more closely with what the hexagon 'knows', it switches to that pattern. There is Darwinian selection among the patterns. Imperfect copying and competition for the scarce resource of hexagons cause survival of the fittest among rival thoughts. This occurs on the millisecond time scale. Through this Darwinian selection of patterns, even random noise can be 'improved' into a coherent new idea.

But don't trust my feeble explanation; read the book. The Cerebral Code has its own homepage. And the full text of the book is on the web! You may start with the prologue.

The Cerebral Code describes a computing system. It is heavily parallel in its computation. Like Eurisko by Doug Lenat, it is a Darwin Machine. Like Production Systems by Alan Newell et al., it has its current attention focused on a small collection of 'symbols', and looks that combination up in a massive database of 'rules'. Like an ecosystem, it orbits around basins of attraction, and sometimes jumps over to a new mode.

The big question is whether there is a distinct computational level above neural nets or not. If there is, then the neural nets are just implementing another computing medium, and we can simulate that medium with a von Neumann machine. If not, then nothing short of modeling every synapse will do the trick. A working array of hexagons is itself, of course, a new level, but the question is what it takes to simulate that level. Either there is a 'symbolic' computational model for a hexagon, or you are forced to use neural nets to implement them. The advantage of a distinct higher level is that we can throw away the contortions that nerve cells must go through to implement the higher level. Here are some arguments for the possibility of a higher level simulation:

Some parameters of the system:

A hexagon is 0.5 mm across. A minicolumn is 0.03 mm across. How many minicolumns are in a hexagon? If both were squares at the same spacing, a minicolumn would cover 0.0009 sq mm, a hexagon would cover 0.25 sq mm, and there would be 0.25/0.0009 = 278 minicolumns in a hexagon. The true shapes don't change this: at the same center spacing, a hexagonal lattice packs 2/sqrt(3), or about 1.15, times as densely as a square lattice, but both the minicolumns and the large hexagons gain the same factor, so it cancels out. There are about 280 minicolumns per hexagon. On page 120 of 'How Brains Think', Calvin says there are about 100 minicolumns in a hexagon. Which is right?

A minicolumn is closely coupled with other minicolumns on a circle whose diameter is 0.5 mm. How many minicolumns are on this circle? Circumference is D*pi = 1.57 mm. That leaves room for 52 whole minicolumn diameters, or about 9 per adjoining hexagon. Suppose the circle is sloppy, and it makes substantial contact with parts of three times as many minicolumns. That is 157 around the circle, or an average of 26 per neighboring hexagon.

Calvin mentions that the orientation of the hexagons at one place is probably not fixed. How many unique orientations are possible? For two angles to be distinct, each minicolumn in one hexagon must correspond to a different minicolumn in the next hexagon at the two angles. If two angles are close enough that they hit the same minicolumn 0.5 mm away, then they are effectively the same angle. We only need to consider 60 degrees, for one adjoining hexagon, before things repeat. We just calculated that a minicolumn contacts 26 others in the next hexagon, so there can't be more than 26 distinct angles. We used liberal assumptions about contact, so this is probably an upper limit.
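These back-of-envelope figures are easy to check mechanically. Here is a small Python sketch of the three calculations above; the 0.5 mm and 0.03 mm widths are Calvin's figures, the rest is plain geometry, and the 'three times as many' sloppy-contact assumption is carried over from the text:

    import math

    hex_width = 0.5    # mm across flats, per Calvin
    mini_width = 0.03  # mm across, per Calvin

    # Minicolumns per hexagon. The lattice packing factor (2/sqrt(3) for
    # hexagons vs. squares) applies at both scales, so it cancels:
    print((hex_width / mini_width) ** 2)       # ~278

    # Minicolumns touched by the 0.5 mm excitation circle:
    circumference = math.pi * hex_width        # ~1.57 mm
    print(circumference / mini_width)          # ~52 whole diameters, ~9 per neighbor
    print(3 * circumference / mini_width)      # sloppy contact: ~157, ~26 per neighbor

    # Distinct orientations: at most one neighbor's worth of contacts
    # before everything repeats at 60 degrees:
    print(3 * circumference / mini_width / 6)  # ~26 angles, an upper limit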

Does anyone know what might control which angle is being used? Howard and I are both skeptical that a rotated set of hexagons plays any role at all. It makes more sense to stick with one orientation. (But then you'd expect to see some morphological manifestation of it at multiples of 60 degrees.)

How many hexagons in a brain? The area of the cortex is about the area of four sheets of paper. Until a more authoritative reference comes along, we will use this "Office Supply Measure" of cortical area. A sheet of paper is 8.5 in by 11 in, or 21.6 cm by 27.9 cm, which is about 600 sq cm. Four sheets is about 2400 sq cm, or 2.4x10^5 sq mm. The area of a hexagon 0.5 mm across is (sqrt(3)/2)*(0.5)^2, or about 0.22 sq mm. That gives 1.1x10^6, or about a million, hexagons in a neocortex. On page 120 of 'How Brains Think', Calvin says there are about a million hexagons in a cortex, so the Office Supply Measure agrees with him.

In a single hexagon, there are 280 minicolumns (about 300 million per cortex), each with 100 cells (about 30 billion per cortex), or 28,000 cells per hexagon. Each cell has between 2000 and 10000 input synapses. Let's say 6000 on average, or 168 million synapses per hexagon (about 2x10^14 per cortex). If we allow 256 levels of potentiation per synapse (8 bits worth, or one byte), we have 168 megabytes of storage per hexagon (about 200,000 gigabytes per cortex).
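The same arithmetic, carried through for the whole cortex in one place. Everything here is an estimate from the text; the byte-per-synapse figure is the 8-bit potentiation assumption above:

    import math

    sheet_cm2 = 21.6 * 27.9                  # one 8.5 x 11 in sheet, ~600 sq cm
    cortex_mm2 = 4 * sheet_cm2 * 100         # four sheets, ~2.4e5 sq mm
    hex_mm2 = (math.sqrt(3) / 2) * 0.5 ** 2  # regular hexagon, ~0.217 sq mm

    hexagons = cortex_mm2 / hex_mm2          # ~1.1 million
    cells_per_hex = 280 * 100                # 28,000
    synapses_per_hex = cells_per_hex * 6000  # 1.68e8, ~168 MB at a byte each

    print(f"{hexagons:.2e} hexagons")                           # ~1.1e6
    print(f"{synapses_per_hex * hexagons:.2e} synapses")        # ~1.9e14
    print(f"{synapses_per_hex * hexagons / 1e9:.0f} GB total")  # ~190,000 GB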

These large numbers are far beyond what we can simulate today. I'd very much like to know which of these numbers change for a less intelligent animal like a rat. Is the number of synapses per cell the same? Is the number of cells per minicolumn the same? Is the number of columns per hexagon the same? (A rat's cortical area is about the size of a postage stamp.)

Let's Build a Learning Machine

The goal is to write a simulation program that learns voraciously. Give it something to manipulate and some input as feedback, and it will explore the space, learn how to make things happen repeatably, and get interested in problems it makes up for itself.

Calvin has given us a giant hint in 'The Cerebral Code'. Let's exploit it. I have my own additional list of things that might lead to a good learning program.

We do not have to simulate every neuron, or even the neural circuitry. We only have to model the distinct computational level at which whole hexagons operate. What are the properties of this 'Hexagon Level' computation?

Each hexagon is performing a mapping. The output is this hexagon's next 280-bit 'answer'. The inputs are the 280 quantities from each of six near neighbors. Calvin thinks that the next six hexagons have input too -- the ones that are two hops away in a straight line. The most important input is the hexagon's own 280-bit output from the previous time period (its current state). And there is input from the outside world that enters the bottom of a hexagon. Taken together, we have a function from about 13*280 bits to 280 bits. That's a pretty big lookup table.
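To make the 'Hexagon Level' concrete, here is a minimal sketch in Python. The full 13*280-bit lookup table is far too big to store, so this sketch substitutes a nearest-match against a small repertoire of known patterns; that substitution, and all the names here, are my assumptions, not Calvin's mechanism:

    import random

    BITS = 280  # one bit per minicolumn, per the estimate above

    def hamming(a, b):
        """Bit positions where two 280-bit patterns differ."""
        return bin(a ^ b).count("1")

    def mutate(pattern, flips=3):
        """Imperfect copying: flip a few random bits in transit."""
        for _ in range(flips):
            pattern ^= 1 << random.randrange(BITS)
        return pattern

    class Hexagon:
        """One virtual hexagon, updated once per time period."""

        def __init__(self, repertoire):
            self.repertoire = repertoire  # patterns this hexagon 'knows'
            self.state = random.getrandbits(BITS)

        def step(self, neighbor_states, external=0):
            # Candidates: the (up to 12) neighbor patterns plus our own
            # previous state, each arriving as an imperfect copy.
            candidates = [mutate(p) for p in neighbor_states + [self.state]]
            # Adopt whichever candidate best matches something known --
            # this is where rival patterns compete for the hexagon.
            self.state = min(candidates,
                             key=lambda c: min(hamming(c, r)
                                               for r in self.repertoire))
            self.state ^= external  # outside input perturbs the result
            return self.state

Running an array of these would give the Darwinian dynamics: a pattern that matches many repertoires copies itself into more hexagons, and its mutated copies compete on the same footing.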

I cannot overemphasize how important memory is in intelligence. It has been vastly overlooked. This is a holdover from the days of small computer memories and tiny hard disks. If you've spent time with an elderly person who is not able to form new memories, you know how smart they are in the moment, but how unproductive a long conversation is. Remembering what just happened is a fundamental base upon which intelligence must rest. I know of no computer program that does it well. (It is astounding that personal computers are considered useful, when they have no idea of what just happened!) There is probably a spectrum of useful levels of memory -- transcripts of inputs, current state, long-term rules, info needed to see if we are stuck in a loop, etc.

"Be surprised but not too surprised." This is a useful goal for a system. If it is not surprised by its input, then nothing new is happening, and it is not learning. If everything coming in is unexpected, then it does not know what is going on at all. There is probably some good ratio, where most things are recognised, but some are new.

Web as input

Symbols.

Alan Newell's Production Systems, but with an organic messiness, and a Darwinian process for generating new rules.

Lenat's Beings.

Consider a large collection of rules (agents) with bank accounts. Some of them have a huge bank account because they are right, and are used all the time. Others have tiny bank accounts because they are rarely used, wrong, or beside the point. But there is a group with intermediate success, neither proven nor broke. The processor should spend all of its effort on these. The process of learning is the process of testing these rules, trying to generalize them, and forcing their bank accounts toward success or bust.

Rules can get wealthy. Once they are, no bank account is kept; only the rate of gain is measured. As long as that rate stays up, the rule stands.
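A sketch of that economy, in the spirit of classifier-system credit assignment. The stake and the thresholds dividing poor, intermediate, and wealthy rules are arbitrary assumptions:

    from dataclasses import dataclass

    @dataclass
    class Rule:
        name: str
        balance: float = 1.0  # credit accumulated by being right
        uses: int = 0

        def settle(self, was_right, stake=0.1):
            """Each firing is a bet: gain the stake if right, lose it if not."""
            self.uses += 1
            self.balance += stake if was_right else -stake

    def worth_testing(rules, poor=0.2, rich=10.0):
        """Spend effort on rules of intermediate success: the poor are
        probably wrong or beside the point, the wealthy are already proven."""
        return [r for r in rules if poor < r.balance < rich]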

An economy of rules! Degrees of recognition. LTP by recording which other rules fired in the N time periods just before I fired. I am activated if 3 rules in the last 4 time slices are in my activation set, or some other combination that I have learned, such as 1 rule 5-10 slices ago and 3 rules in the last 2 slices.

A rule fires if its absolute character pattern has just come in, and if it is activated. It has a table of the conditions under which it gets activated. Activation can also come from input alone.

What is in the queue? Input characters, names of rules that just fired, and tokens tossed in by those rules. How do we treat a permanent condition, like what day it is? It is not in the queue, but "present". Once a rule is activated, it can query the state, and a token is dumped in if the state is right. (This is turning out very much like Alan Newell's Production Systems. What he was missing was the evolutionary, unplanned, out-of-control aspect of it.)
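A sketch of the firing test just described. The queue contents, the activation set, and the '3 of the last 4 time slices' condition come from the text above; every name and data structure here is my assumption:

    from collections import deque

    class TimingRule:
        """Fires when its pattern arrives while its activation condition holds."""

        def __init__(self, name, pattern, activators, needed=3, window=4):
            self.name = name
            self.pattern = pattern             # absolute pattern to match
            self.activators = set(activators)  # rules whose firing activates me
            self.needed = needed               # e.g. 3 activator firings...
            self.window = window               # ...in the last 4 time slices

        def activated(self, history):
            """history holds one set of fired-rule names per recent time slice."""
            recent = list(history)[-self.window:]
            hits = sum(len(self.activators & fired) for fired in recent)
            return hits >= self.needed

        def try_fire(self, queue_item, history):
            return queue_item == self.pattern and self.activated(history)

    # The queue would hold input characters, names of rules that just fired,
    # and tokens those rules tossed in; history is a sliding window of slices.
    history = deque(maxlen=10)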

I seed it with recognizers of the best quality I can make!!! It decides when to use them.

First, just predict a stream of web pages. Then reward and punishment come by putting the answer into the stream.


Calvin has a glossary of terms on his web site. Here is an example link into it: Postsynaptic



This page is... http://www.squeakland.org/~ted/calvin.html

Ted Kaehler (email to my first name at SqueakLand.org)