Abstract

Once upon a time, about five hundred million years ago, a tadpole-like creature swam the ocean depths. It was the juvenile phase of an ancient sea squirt, sometimes referred to as a tunicate. Its job at this developmental stage was to find a suitable rock on the ocean floor to anchor down upon and morph into an adult sea squirt. This adult form would spend the rest of its life attached to that rock, passively feeding on what passing currents would bring its way. Not a bad life. But that's not the end of our story; it's just the beginning. Because a very strange twist occurs in this tale. Some say it's apocryphal, others say an exaggeration, but I think it might be true. Give or take.

One day an enterprising juvenile tunicate decided not to settle down on a rock; it just kept swimming. And swimming. It got bigger, and bigger. It developed eyes and gills. And got bigger still. It developed fins and sexual reproduction. It got bigger, and fins developed into feet, and it walked upon the land, trading gills for lungs. It spent time walking on all fours and time living in trees. It came down from the trees, walked upright, and learned to talk to other descendants of juvenile tunicates. And then one day it looked back out upon the ocean and said, “Thanks.” Thanks to that enterprising juvenile tunicate who decided not to eat its own nervous system. Wait, what! Not do what?

Wasn't she supposed to say something like “Thanks, oh ancient one, for your bold and courageous swimming”? Not really. If it were that simple there would be no introduction like this for a review of a book like A Thousand Brains. As I mentioned earlier, there's a twist to this tale.

The twist comes in the strange turn of events following the typical juvenile tunicate's successful rock anchoring. In the words of British neuroscientist Daniel Wolpert, “It digests its own nervous system for food!” He says this in the opening to his 2011 TED talk “The Real Reason for Brains.” And he explains that we have brains for one reason and one reason only, and that is to have adaptable and complex movements; there is no other reason to have a brain. You see, even tadpole-style swimming requires a brain. Not a lot of brain. A rudimentary nervous system like that of the juvenile tunicate gets the job done. But once you don't need to move, Wolpert says, you don't need that brain. Thus the tunicate's first meal.

Hawkins doesn't reference Wolpert in his book, although Wolpert is mentioned in several Numenta papers. But they are kindred spirits. Wolpert, by his own declaration, is a movement chauvinist. Hawkins, by his inspired dedication, is an evangelist. As you will see when you read the book, “to be” is to be moving. That's my way of capturing the defining feature of Hawkins's model: we have to physically “graze” the world in order to perceptually grasp it. The story Hawkins weaves is compelling. But all good stories yearn for an origin, a genesis. So I recruited J.T., the Juvenile Tunicate. Perhaps mythical, but in any case a likeable character. An ancestor who refused to sit on a rock and eat its own brain. An ancestor who used that rudimentary nervous system to set out across deep waters, swimming unknowns and surfing uncertainties. Thanks, J.T.

The full title of Hawkins's book is A Thousand Brains: A New Theory of Intelligence.
There are 263 pages sectioned into three parts—“A New Understanding of the Brain,” “Machine Intelligence,” and “Human Intelligence”—for a total of 16 chapters and a summary of “Final Thoughts,” all introduced with a foreword by none other than Richard Dawkins. Dawkins does not hold back his praise. He opens his commentary by noting parallels between Charles Darwin and Jeff Hawkins. Yes, that Charles Darwin. One parallel is that Darwin did his work, and Hawkins is currently doing his, outside of universities and without government research grants. “Well, you get the parallel,” says Dawkins. Yes. And clearly knowing that that's a lot of parallels to live up to, Dawkins goes on to say that the ideas of both men require book-length treatments. He gets specific about this by calling attention to reference frames, which are key features in Hawkins's theory. These reference frames, as we learn in the book, are used to make predictions and orchestrate movements. And taking reference frames even further, “The very act of thinking is a form of movement” (p. vii). “Bulls-eye!” Dawkins exclaims; the ideas of both men are enough to fill a book. He ends the introduction by noting how a book revealing that the brain works in such a way is “nothing short of exhilarating.”

So will the Bay Area really be the next Galápagos? And will brighter-than-a-bird's-brain computer chips be Hawkins's finches? History will decide. But in the meantime it is just possible that Hawkins is enough outside the constraints of normal science to pull together threads that are not normally found in the same fabric. Keep in mind that this thing we call thinking has been elusive for a long time. Long enough and elusive enough to drive many a thinker into “get thee to a nunnery” exasperation. In this case the “nunnery” is a place just outside Plato's cave. The place where pure things exist. “Pure” being a stand-in for stuff we can't explain.

But maybe we can explain thinking. Maybe the features of thinking are just the same features as planning, predicting, executing, and verifying movements. And just maybe a full court press at trying to build a machine to emulate that citadel of thought, the neocortex, will shine new light on an old problem. I'm reviewing this book because I think that is worth a shot.

This first section is part personal story and part technical description. The personal part, a quest initially to understand a framework for how the brain works and, subsequently, to use that understanding to reconceptualize computers, began in 1979. That's when Hawkins read a Scientific American article by molecular biologist Francis Crick called “Thinking About the Brain.” In that article Crick called attention to the fact that scientists had collected a large number of facts about the brain. But in spite of this knowledge accumulation the brain's workings were still mysterious, and, furthermore, there was a conspicuous absence of any sort of broad unifying framework of ideas. The young Hawkins was inspired by Crick's essay. He thought that this mystery could be solved in his lifetime, and that is exactly what he set out to do.

The technical part is woven in and around this personal story. Fundamentally it recasts the brain as an organ of prediction.
As Hawkins and Ahmad said in a 2016 paper, “We propose that the most fundamental operation of all neocortical tissue is learning and recalling sequences of patterns.” And they evoked the sentiments of Karl Lashley that this is “the most important and also the most neglected problem of cerebral physiology” (Lashley, 1951, p. 114). Hawkins takes us through this understanding of the brain in Chapters 1 through 7. We learn that the brain is doing one thing over and over again: It is forming models of the world by sampling it sequentially. This is basically what Hawkins's new theory of intelligence is about. Intelligence is the ability to form models of the world. One thing that is important to understand here is that this definition of intelligence is independent of goals or drives. Hawkins is not trying to reverse engineer a person. So intelligence is like a map. It is a tool for achieving a goal, but it has no desires or aspirations of its own. Later in this review we will look at the details of how sequence memory is acquired at learning time and at how prediction is initiated at remembering time.

Hawkins attended Cornell University, where he received a bachelor's degree in electrical engineering in 1979. After that he worked for a short time at Intel but moved on in 1982 to a smaller, more agile company called GRiD Systems. He also took some time to apply as a graduate student to MIT's A.I. lab. He was told that his proposal to create intelligent machines based on brain theory was pointless because the brain was just a messy computer. Whatever sense or no-sense that made, it did not deter Hawkins. Back at GRiD he participated in pioneering many of the technologies that are present in the mobile and handheld devices we have today.

But the work at GRiD did not offer the opportunities to explore the unifying features of the brain that were still first and foremost in his mind. In 1986 Hawkins enrolled in a neuroscience PhD program at the University of California, Berkeley. As he tells the story, his ideas and ambitions were well received. His approach to understanding the brain was viewed as sound, and there was wide agreement that it was one of the most important goals in modern science. There was one problem, however, that he says he did not foresee. He was told that to get through the program he would have to work for a professor doing what that professor was doing. And no one at Berkeley was doing what he wanted to do. Disappointment here and there never seemed to bother Hawkins all that much. So for the next 2 years he took advantage of the setting by spending his days in the university's libraries getting what he calls a “first-class albeit unconventional education.” Hawkins then returned to GRiD and created one of the first tablet computers, called the GRiDPad. The rest, as the saying goes, is history. In 1992 Hawkins founded Palm Computing, beginning a 10-year span of innovation including the familiar PalmPilot and the Treo. With these successes behind him, he was faced with a dilemma: continue rolling out ever more game-changing innovations or take a break to solve one of the world's most profound problems. He opted for the latter. Why not?

In 2002, with the help and encouragement of a few neuroscience friends, he founded the Redwood Neuroscience Institute (RNI). With 10 full-time scientists, all interested in large-scale theories of the brain, RNI became a gathering place for regular lectures open to the public and hours of discussion and debate.
During the next 3 years it attracted over a hundred visiting scholars. But the structure of RNI did not facilitate focusing everyone's efforts on the very specific questions that motivated Hawkins. He felt he needed an organization in which he could lead his own research team. It was decided to move the RNI to Berkeley—yes, Berkeley—where it continues today as The Redwood Center for Theoretical Neuroscience. Then, in 2005 Hawkins founded Numenta, an independent research company. There were two goals: to develop a theory of how the neocortex works and to apply that theory to machine learning and machine intelligence. It might be a little too early to say “and the rest is history,” but it is the right time to turn to the substance of Hawkins's book.

Alexa, next section, please.

It is said that a well-known British physicist once told Wolfgang Köhler that all their great discoveries came from the three B's: the Bath, the Bus, and the Bed. This refers, no doubt, to “aha” moments like Archimedes' bathtub, Poincaré's understanding of Fuchsian functions as his foot hit the first steps of a trolley car, and August Kekulé's benzene ring dream. Add to these Jeff Hawkins's coffee cup. It was a late February day in 2016, and he was sitting in his office holding a Numenta coffee cup in his hands when . . . well, perhaps I should let him tell the story.

The operant term in that story is “reference frame.” This is the Secular Grail Hawkins had been searching for since reading Crick's call for a framework back in 1979. It wasn't a standalone “aha” moment, however; it was preceded by two other supporting revelations. One happened in 1986, when he noticed that he could be surprised at even small changes in the locations of common objects on his desk, one of which was a cup, coincidentally. Or, if his stapler made a different sound, for example, he would notice. Likewise for the clock on the wall and the cursor on the screen. Thus he confirmed for himself what others are coming around to believing these days, and that is that the brain is an organ of prediction and is vigilant for violations of its predictions. Or, more specifically for Hawkins, prediction is a ubiquitous function of the neocortex.

The other discovery came on the heels of answering the next logical question: “Just how does a brain make these predictions?” And the answer to that question required finding the commonality between predictions the brain makes under two different situations. The first one, referred to as the melody prediction problem, concerns predictions we make when the world changes around us. You might be listening to a sequence of notes in a song, for example. And the other situation is about predictions we make when it is our own actions that change the world. This could be the simple sequence of reaching for a cup. These are two very different situations, yet they share an important neurological commonality: The brain is ahead of the game, forming expectations about outcomes of each sequential segment before they occur.

For the resolution to this question Hawkins drew upon the work of Johns Hopkins neuroscientist Vernon Mountcastle. In a 1978 essay called “An Organizing Principle for Cerebral Function: The Unit Module and the Distributed System,” Mountcastle proposed that the entire neocortex was an evolutionary appendage: a change in scale from a smaller, more primitive organ with very little change in function.
The differences from region to region in the neocortex, in other words, are minor compared to the similarities. While visual experiences seem very different from tactile ones, and both of those are quite different from hearing, the physical structures of the cortical regions are much the same. The roughly 2.5-mm-thick neocortex is made up of six layers, and those six layers are transected by functional columns that receive stimuli from small regions of the sensory organs. Those columns are, in turn, composed of smaller mini-columns that respond to specific features of the stimuli coming from those small sensory regions. Mountcastle didn't specify what the functions of these columns were, but his intuition was that they were all carrying out the same basic algorithm. Hawkins, inspired by Mountcastle's intuition, took advantage of a Thanksgiving holiday in 2010 to figure it all out. I'll let the reader consult page 43 of A Thousand Brains for the morsels of that discovery and, for now, just relay the take-home message that every cell in every column participates in a never-ending cycle of learning and predicting.

So thanks to a 2010 Thanksgiving holiday, Hawkins and his team had the answer to the melody prediction problem. That understanding came 24 years after his earlier realization that the brain was basically an organ of prediction (the arc of knowledge bends slowly). But it was followed in a mere 6 years by a resolution to the mystery about the brain that he had hoped to resolve in his lifetime, the problem posed in Crick's 1979 essay of there being no framework uniting the many facts known about the brain.

The details of these discoveries are presented in Chapter 4, “The Brain Reveals Its Secrets.” In his summary of that chapter Hawkins says that the goal was to introduce the reader to the idea that every cortical column in the neocortex creates reference frames. As he tells it, the resolution came to him suddenly in his office, and “I was so excited that I jumped out of my chair and ran to tell my colleague Subutai Ahmad” (p. 52). In racing the 20 feet to Subutai's desk he recounts that he almost knocked over his spouse, Janet, who was coming to join him for lunch. But he didn't knock anyone over. Instead, his brain predicted what would happen if he carried out certain nonfunctional sequences and quickly organized a proper apology and generated an invitation to share a frozen yogurt. So the very process that Hawkins was studying saved a marriage, shared a yogurt, and allowed the idea some time for seasoning before being presented to a friend and colleague. Sometimes things just work out.

The three discoveries just described are detailed in a sequence of Numenta publications between 2016 and 2019. How the brain forms predictions in the presence of changing stimuli is explored in a 2016 paper called “Why Neurons Have Thousands of Synapses, a Theory of Sequence Memory in Neocortex.” That was followed in 2017 by a further exploration of predictive learning in the context of an active agent. That paper is called “A Theory of How Columns in the Neocortex Enable Learning the Structure of the World.” And the long-sought framework in which these two predictive behaviors function is proposed in a 2019 paper called “A Framework for Intelligence and Cortical Function Based on Grid Cells in the Neocortex” (Hawkins, Lewis, Klukas, Purdy, & Ahmad, 2019). For details, I highly recommend reading these papers.
The book provides an excellent overview of Numenta's agenda, but the papers have the facts: current models of the neocortex, with graphic illustrations, descriptions of how these models are implemented in computer designs, and discussions of their degrees of success and failure.

At the time of this writing, the mechanisms of the 2017 and 2019 papers are works in progress. They explore the complex issues of “Location,” how the brain keeps track of the changing relationship between the sensing organ and the object properties being sensed, and “Framework,” how the brain maintains a sense of unity amid these ever-changing relationships. I mention this ongoing process of exploration and discovery only in passing, since the present goal is to review Hawkins's book as it is. That being said, I encourage readers to view the videos, podcasts, and meet-ups that are made available on Numenta's website. Numenta does not hide the sometimes zig-zaggy starboard-and-port tack-and-advance process of discovery. Contrary to what von Bismarck might have said, sometimes it is worthwhile to see your sausage being made. Enjoy it or not, in Numenta's case it is a feature, not a flaw. It is a rare opportunity to see scientific knowledge accumulation in action, flaws, failures, surprises, successes, flights of elation, and all.

The kind of cognitions that Hawkins would like to implement in machines are best described by the umbrella term general intelligence. And so, as opposed to AI—artificial intelligence—it is AGI, artificial general intelligence, that Hawkins is hoping to emulate. A defining feature of general intelligence is flexibility. And flexibility is not a common feature of extant AI. Current AI tends to excel at specific tasks: winning at games like Chess, Go, and Jeopardy, for example, or carrying out routine operations like factory floor assembly or well-defined behaviors, as seen in self-driving cars. To accomplish these tasks, the machines are laboriously trained on large datasets. And after training they can only do that one thing. And if the task requirements of that one thing change, then the machine has to be trained again. That takes time, money, people, and resources. And taking time, spending money, and using people and resources are not typically associated with flexibility. But learning on the fly, acquiring new behaviors while doing that, not forgetting previously learned behaviors in the process, and doing it all on a lower budget—that could reasonably be thought of as “flexible.” Not being confined to fixed behaviors, learning while not forgetting, and doing this in sync with a changing world would, I think, qualify as intelligent in a very general way. Implement this in a machine and voilà, artificial general intelligence.

Hawkins characterizes general intelligence with four attributes: learning continuously, learning via movement, having many models of the world, and using reference frames to store knowledge. The details of the neural architecture supporting these attributes are beyond the scope of this review. But there is a defining feature common to all four of them that I think is worth understanding in detail. And that feature is learning continuously, which means acquiring new sequences and remembering them.
This feature was mentioned earlier in reference to Hawkins and Ahmad's assertion that learning sequences of patterns is the most fundamental operation of the neocortex.

There are two settings in which sequence learning takes place: when the learner is more passive and the world outside the body is changing, and when the learner is intentionally exploring that world. In the tutorial that follows I will look at only the first, when the learner is more passive. But this first kind of sequence learning underwrites the second and is a common and nonoptional feature of all four attributes of general intelligence. So once this basic property is understood, all the other attributes can be appreciated on their own terms. With that in mind, I would like to turn to illustrating the most important skill of the neocortex: learning and remembering sequences.

Figure 1 shows the basic architecture of the neocortex that underwrites sequential learning and memory. The basic graphic I will use to illustrate the dynamics of sequential learning is an expanded version of one of the rectangles shown in Figure 1d. Each rectangle represents one of six horizontal layers in the neocortex. And each gray dot represents a neuron within that layer. The model that describes the dynamics of these layers is called hierarchical temporal memory (HTM), a set of capabilities that ensures the acquisition and preservation of temporal sequences, regulated by both hierarchical and horizontal reciprocating loops.

Figure 1a shows the outer convoluted surface of the neocortex. Jeff Hawkins points out that if you could unfurl this, it would be about the size of a dinner napkin. And as Figure 1b indicates, it is about 2.5 mm thick. The small objects in Figure 1b represent neurons. The differences in their size and shape point to the diversity of cell types. And the geometric patterns evident in Figure 1b indicate a characteristic distribution of vertical columns. These columns transect the six horizontal layers. Figure 1c depicts an expanded view of these columns, referred to as mini-columns in the literature. These are the columns that Vernon Mountcastle noted had such a high degree of uniformity throughout the neocortex. Figure 1d, as mentioned above, is a graphic depiction of the neocortex with the rectangles representing the six layers and the small dots representing neurons. Figure 1e is a drawing of a typical excitatory pyramidal cell, and Figure 1f represents this neuron's logical equivalent in the HTM model.

The structure of the neocortical pyramidal neuron is critical to the brain's ability to learn sequences, sequences being defined here as repeating temporal patterns of neuronal action potentials. There are three zones of influence that independently affect the likelihood of a cell generating an action potential. The three zones of influence are shown in Figure 1e. Two of these are referred to as distal because they are located at some distance from the axon hillock, where action potentials are initiated. The third zone is called proximal because of its proximity to this part of the cell body. All three zones consist of dendritic branches that have reception sites for axons from cells in other regions of the brain.

An enlargement of a segment containing some of these reception sites is shown in the box on the right-hand side of Figure 1e. The term apical dendrite refers to dendritic branching that is located at the furthest distance from the cell body. The afferent fibers coming to this region are believed to be bringing feedback to the cell. The basal dendrites, closer to the cell body, are believed to be carrying more local contextual information. And the proximal dendrites are typically thought of as bringing feedforward inputs to the cell. Prototypical feedforward inputs would be afferents from the sensory organs of the body: vision, touch, and hearing, for example.

Figure 1f is the HTM model of this pyramidal neuron. The circles beneath the horizontal bars feeding into the OR gates represent synaptic locations on the dendritic branches. Each circle represents the terminal end of an axon from a cell in a different part of the brain; it could be from within the same layer or from a more distant region. In the example illustrated here, these inputs are horizontal connections from within the same layer as the target cell. Dark dots indicate active locations on that branch, that is, sites whose presynaptic cells are firing. Therefore, a particular distribution of active sites along a dendritic segment represents a unique spatial pattern of activity arriving at that cell.
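To make the logic of Figure 1f easier to hold in mind, here is a minimal sketch, in Python, of a neuron modeled this way. It is my own illustration rather than Numenta's code; the class names, the use of plain sets for synapses, and the default threshold value are assumptions chosen for readability.

    # Illustrative sketch only: an HTM-style neuron with one proximal
    # (feedforward) zone and many distal dendritic segments.  Names and
    # numbers are assumptions, not Numenta's implementation.

    class DendriticSegment:
        """A distal segment: a small set of synapses acting as a
        coincidence detector for one pattern of presynaptic cells."""

        def __init__(self, presynaptic_cells, activation_threshold=15):
            # Roughly the 15-20 coincident synapses Hawkins and Ahmad (2016)
            # describe; the exact number here is illustrative.
            self.presynaptic_cells = set(presynaptic_cells)
            self.activation_threshold = activation_threshold

        def is_active(self, firing_cells):
            # The segment "depolarizes" when enough of its synapses see
            # presynaptic activity at once (the OR-gate inputs of Figure 1f).
            overlap = len(self.presynaptic_cells & set(firing_cells))
            return overlap >= self.activation_threshold


    class HTMNeuron:
        """A pyramidal cell: distal segments depolarize it, while proximal
        feedforward input is what makes it fire."""

        def __init__(self, proximal_inputs):
            self.proximal_inputs = set(proximal_inputs)  # feedforward afferents
            self.distal_segments = []                    # contextual detectors

        def is_predicted(self, firing_cells):
            # Any active distal segment puts the cell into a depolarized,
            # "predictive" state without making it fire.
            return any(seg.is_active(firing_cells)
                       for seg in self.distal_segments)

        def fires(self, feedforward_pattern):
            # Only proximal (feedforward) input drives an action potential.
            return bool(self.proximal_inputs & set(feedforward_pattern))

The design point to notice is the asymmetry: distal segments can only bias the cell toward firing, while the feedforward input actually triggers it. That asymmetry is what the learning walkthrough below exploits.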
Figure 2c represents a cortical macro column in one layer of the neocortex. The vertically aligned dots represent 21 mini-columns in that macro column. I place this alongside renderings of neocortical mini-columns, Figure 2a, and their HTM counterparts, Figure 2b, just as a reminder of the origin of this graphic. The feature I want to call attention to is the horizontal connectivity throughout this layer in the brain. Cells in these layers have large-scale horizontal projections to other cells in the layer. If that connectivity were visually represented, then the figure would be a mass of curved lines, so I have drawn only two. Cells (2, 1), (4, 6), and (13, 5) have axons projecting to the dendrites of cell (7, 4). Cells (10, 1), (16, 5), and (21, 1) have axons projecting to the dendrites of cell (15, 2). The significance of this resides in how stimulation of cells in widely separated columns can simultaneously affect the firing likelihood of a particular cell in a different column.
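A rough way to picture that layer in code is to index cells by (mini-column, cell) pairs, as the figure does, and to record which distant cells feed a given cell's distal dendrites. In the sketch below, only the two example projections come from the figure; the helper function and everything else are illustrative assumptions on my part.

    # Illustrative indexing of the layer in Figure 2c: the macro column has
    # 21 mini-columns, and each cell is addressed as a (column, cell) pair.

    # lateral_inputs[target cell] -> distant cells whose axons reach one of
    # the target cell's distal dendritic segments
    lateral_inputs = {
        (7, 4):  [(2, 1), (4, 6), (13, 5)],
        (15, 2): [(10, 1), (16, 5), (21, 1)],
    }

    def depolarized_cells(firing_cells):
        """Cells whose distal segment sees all of its presynaptic partners
        firing at once and is therefore pushed toward threshold."""
        firing = set(firing_cells)
        return [target for target, sources in lateral_inputs.items()
                if set(sources) <= firing]

    # If cells in columns 2, 4, and 13 fire together, cell (7, 4) is
    # depolarized even though it sits in a distant column.
    print(depolarized_cells([(2, 1), (4, 6), (13, 5)]))   # -> [(7, 4)]

Nothing in this wiring makes a cell fire by itself; as in the neuron sketch above, lateral input only depolarizes, which is what makes the Hebbian learning described next possible.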
In this section we will look at how “neurons that fire together wire together.” That phrase is credited to Canadian neuroscientist Donald O. Hebb (1949). The mechanism that enables the firing together is the temporal overlap of a depolarization in the dendrites and a depolarization in the proximal zone of a cell. The cells that are “wired together” are the distant cells that stimulated the dendritic segment and the cell to which that segment belongs.

At time T1, consider the pyramidal neuron in Figure 3 to be cell (9, 3) in Figure 4a. In this example the axons from cells (3, 1), (13, 2), and (18, 6) converge in close proximity to each other on the same dendritic segment. I have indicated this by darkening the synaptic locations in the dendritic box enlargement of Figure 3. As mentioned earlier, there is massive horizontal connectivity throughout the layer; for simplicity I am showing just one convergence.

At time T2 a feedforward pattern arrives from a sensory organ. It could be a single musical note. This is indicated by the three upwardly pointing arrows labeled “pattern 1.” The three arrows index three features of that musical note. They could be the fundamental frequency and two of its harmonics, for example. Each of these features arrives at the proximal zone of all of the cells in its respective mini-column, causing all of them to fire. This firing is shown by changing the gray dots to black ones in Figure 4b.

Hawkins and Ahmad (2016) note that learning a new pattern requires about 15–20 active synapses collocated along a short dendritic segment. For illustrative purposes in Figure 4, I have shown only three, reflecting the three-feature pattern of columns 3, 13, and 18.

The temporal confluence of these three afferents can be strong enough to cause the membrane of that dendritic segment to depolarize. I have indicated this dendritic depolarization of cell (9, 3) with an outlined gray dot. This local depolarization of the cell membrane can cause the neighboring membrane segment to depolarize, which in turn can cause the next membrane segment to depolarize, and so on. Thus a membrane depolarization can spread away from the initial dendritic site and expand out across the cell body. I have indicated this with light gray shading in Figure 3. This depolarization can reach all the way to the axon hillock, but typically it will not cause the cell to fire.

During this time it is possible that a feedforward signal will arrive from a peripheral organ at the proximal zone of the cell. It could be another note in a musical sequence, for example. This is indicated by the upwardly pointing arrows in Figure 4c labeled “pattern 2.” I have tagged this event as T2' to indicate that the arrival of pattern 2 falls within the time window of the previous dendritic depolarization; that is, the dendritic depolarization is still active. This feedforward pattern 2 signal can initiate a local depolarization, causing all cells in those columns to fire. In Figure 3 I've indicated this depolarization with a darker shade of gray in the proximal zone of the cell and in Figure 4c by larger dashed circles around a black cell. This event will generate both an action potential traveling away along the axon and a membrane depolarization spreading backward along the cell body and out toward the dendrites. This passive spreading of depolarization away from the axon hillock, into the cell body, and toward the dendrites is sometimes referred to as backpropagation, or backprop for short. What happens next is the Hebbian learning that modifies the active dendritic segment to become a coincidence detector and turns that cell into a predictor of its own state change.

If the spreading depolarization due to the action potential reaches the recently depolarized dendritic segment while that segment is still active, then a chemical process known as synaptic plasticity ensues. This means that the dendritic segment now has a lower depolarization threshold for patterns similar to the one that caused its depolarization in the first place. Specifically, the cell will respond sooner to this input than it did previously.
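To pull the walkthrough together, here is a schematic Python sketch of the cycle just described for cell (9, 3): at T2 the pattern 1 context depolarizes one of its segments, at T2' the pattern 2 feedforward input makes the cell fire, and Hebbian plasticity then strengthens exactly the synapses that coincided. The permanence values, the learning increment, and the three-synapse threshold (matching the figure's three afferents rather than the 15–20 a real segment would need) are my illustrative assumptions, not Numenta's implementation.

    # Schematic sketch of one prediction-then-learning step.  Permanence
    # values, the learning increment, and the tiny threshold are assumed
    # for illustration; real HTM implementations differ in detail.

    LEARNING_INCREMENT = 0.1    # how much a coincident synapse is strengthened
    CONNECTED_PERMANENCE = 0.5  # permanence at or above which a synapse counts

    class Segment:
        def __init__(self, synapses):
            # synapses: {presynaptic cell -> permanence (connection strength)}
            self.synapses = dict(synapses)

        def is_depolarized(self, firing_cells, threshold=3):
            # T2: enough connected synapses see presynaptic firing, so the
            # segment depolarizes its cell (a predictive state, not a spike).
            hits = sum(1 for cell, perm in self.synapses.items()
                       if cell in firing_cells and perm >= CONNECTED_PERMANENCE)
            return hits >= threshold

        def reinforce(self, firing_cells):
            # T2': the cell fired while this segment was still depolarized,
            # so Hebbian plasticity strengthens the synapses that were
            # active -- "neurons that fire together wire together."
            for cell in self.synapses:
                if cell in firing_cells:
                    self.synapses[cell] = min(
                        1.0, self.synapses[cell] + LEARNING_INCREMENT)

    # Cell (9, 3) has one segment listening to cells (3, 1), (13, 2), (18, 6).
    segment = Segment({(3, 1): 0.5, (13, 2): 0.5, (18, 6): 0.5})

    # Pattern 1 makes those three cells fire; the segment depolarizes cell (9, 3).
    context = {(3, 1), (13, 2), (18, 6)}
    depolarized = segment.is_depolarized(context)

    # Pattern 2 then arrives at the proximal zone and cell (9, 3) fires, so
    # the still-active segment is reinforced.
    if depolarized:
        segment.reinforce(context)

    print(depolarized, segment.synapses)

After this reinforcement, the strengthened synapses sit further above the connected threshold, so on the next presentation of a similar context the segment reaches its threshold more readily and the cell is depolarized, predicting its own firing, before the feedforward input arrives. That is the lowered depolarization threshold described above, expressed here in permanence values.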
