Category Archives: Intelligence

Resurrection of the Blade Runner

I spent one afternoon in 2014 wandering the underground concourse stretching from Tokyo’s Shiodome to the old Shimbashi station. It took me less than half an hour to get to Shimbashi station; it took one and a half hours to find my way back. I realized that I wasn’t quite as good as I had thought at throwing down my mental bread crumbs as I wandered from my hotel. I had somehow gotten turned around, as they say, heading southeast when I thought I was going northwest. As I made my way through the maze, up and down escalators, reaching points that were clearly taking me further from my destination and sometimes into dead ends, I thought about the movie Blade Runner and how much Tokyo resembled the futuristic depiction of the city in the movie. There were no flying food carts, and although it was rainy it wasn’t nearly as dismal as the scenes in the movie. What reminded me of the futuristic version of the city was its density: the way the buildings were efficiently laid side by side with the pathways laid end to end between them, and the constant streams of people flowing in and out of the gated train entrances, pouring down off the streets like water flowing into a storm sewer. At 37 million people, Tokyo is the world’s largest city. It is amazing that at any given time of day there are so many people walking from one place to another: walking from the train to work, walking from work to the train, going to lunch, headed to dinner, going from point A to point B.

As humans, we spend a lot of time and energy relocating ourselves. If we want to construct a new building or start a new company we need to locate and assemble a group of people suited for the various tasks that make up the larger project at hand. Not only do we need to do this initially, but it becomes a recurring activity. As our project progresses we may find that we need additional resources. We may find that some of our resources are not performing as expected and need to be replaced. These organizational activities are perhaps the greatest inefficiency in the execution of our day-to-day endeavors.

Machines don’t have these problems, at least not the same way humans do. Of course they break down from time to time, but in general they perform consistently. They don’t need to commute to the office every day only to return home at night. As a matter of fact, in many cases they are capable of working 24 hours a day, seven days a week. In the movie Blade Runner, as in many sci-fi flicks, the machines known as Replicants were in many ways superior to humans, almost flawless in their physical abilities and at least as clever. But there is another aspect that is not addressed in this film. Within certain limits, machines can be taught more efficiently than humans. Humans learn slowly, which is why it is rather unusual for a highly trained person to move from one career to another. The learning curve for a person to become a qualified doctor generally prevents anyone from entering the field unless they chose this profession at a young age. Machines, on the other hand, can be trained rather quickly. In fact, the learning achieved by one machine can be transferred to another almost immediately. This presents an opportunity for a far more efficient use of resources than is possible with humans. Rather than endlessly moving human resources from one city to another or back and forth between home and office, we can position machines in a geography where they will most likely be utilized and train them as needed. Of course, not all machines are created equal; it is not feasible to simply create millions of machines which can be used universally for any task we call upon them to undertake. But even with that limitation, this is far beyond how we are able to utilize human resources today. In fact, we already do this in a useful way. If we need to tackle a project we might download an application to our laptop which helps us do this. When we decide to become healthier we might download the latest fitness app to our phone.
We don’t go out and buy a new phone or laptop every time we want to expand its capabilities. As devices become more powerful and flexible we will see them take on more tasks in our daily life. And slowly we will begin seeing the same thing in autonomous machines. Right now this is not the case. Our more specialized devices tend to be somewhat limited in how much they can expand their capabilities, but that is changing. It is what is referred to in the industry as convergence: the need for many specialized machines giving way to the need for relatively few.

Imagine being able to start a new project requiring one hundred human resources but instead having the option of simply resetting and configuring one hundred machines which can perform the same task as well or better – machines that don’t go home at night or on weekends, machines that don’t take a break for lunch, machines that don’t get sick when a friend comes in from out of town. Imagine a scenario where those one hundred machines all sit in a single room the size of your living room. And all of those machines can exchange ideas, learning, and capabilities almost as easily as we share a document today. Also, this exchange would not be limited to those hundred machines in the same room; machines could transfer their knowledge and experience to other machines across the globe, and beyond. What advantage would this provide to any company that took this approach? And what disadvantage would a human have applying for this job?

Intelligence By The Numbers (Part II) – Statistics

In Intelligence By The Numbers – Part I we considered how to combine multiple measurements to evaluate the potential of an athlete, or the intelligence of a being, by looking at the overall magnitude of a 3-dimensional vector. In our simplistic examples we glossed over a few assumptions. We treated the numeric values which represent height, weight, and strength as equally important and also did not allow for the fact that the numbers used to represent these qualities do not cover the same ranges. For example, we used a strength index where 500 represents the strongest human alive. When selecting athletes we can easily compare an athlete with a strength of 400 to one with a strength of 200, and we would pick the first over the second because they are twice as strong. But this simple numerical approach doesn’t work when we introduce other variables and simply add them together. Let’s say our first athlete is 68 inches tall and our second athlete is 78 inches tall. The first athlete is of average height for a male while the second athlete is significantly taller. If we take the naïve approach of combining the numbers for strength and height we get 468 and 278. Simply comparing these two numbers we might conclude that the first athlete is far superior to the second. But what information do these numbers truly convey? By blindly combining these two values, the meaning is diluted at best, perhaps almost lost. The situation only gets worse if we add a third variable. Let’s assume weights of 120 lbs and 180 lbs. This gives us an aggregate number of 588 for the first athlete and 458 for the second. Again, the first athlete seems to be far superior to the second based solely on our primitive numerical scheme. But what our method has done is to choose an athlete who is of average height and weighs only 120 lbs over an athlete who is well over 6 ft tall and weighs 180 lbs.
The problem, or at least one of them, is that the numbers used to measure height, weight, and strength have different ranges. Height generally lies somewhere between 60 and 80, whereas weight (for an adult male athlete) may lie between 120 and 300. And of course our fictitious strength index seems to lie between 200 and 500. Because of this, when we combine these three attributes using simple addition, the meaning of those numerical values is lost. There is of course another concern regarding the importance of each of these traits, and this will vary depending on the specific sport we are choosing candidates for or even the specific role, or position, we are trying to fill. But we will continue to ignore this issue for now. For now, let’s just focus on how to take three or more seemingly disparate values and combine them in some way that is meaningful.
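
The arithmetic above can be reproduced in a few lines. What follows is a minimal Python sketch using the numbers from the text; the dictionary layout and the `naive_score` and `share_of_total` helpers are illustrative names introduced here, not part of any established method.

```python
# Figures taken from the example above: strength index, height in inches,
# weight in pounds. The dictionary layout is an illustrative assumption.
athlete_1 = {"strength": 400, "height": 68, "weight": 120}
athlete_2 = {"strength": 200, "height": 78, "weight": 180}


def naive_score(athlete, attributes):
    """Add the raw numbers together, ignoring units and ranges."""
    return sum(athlete[name] for name in attributes)


def share_of_total(athlete, attribute, attributes):
    """Fraction of the naive score contributed by a single attribute."""
    return athlete[attribute] / naive_score(athlete, attributes)


two = ("strength", "height")
three = ("strength", "height", "weight")

print(naive_score(athlete_1, two), naive_score(athlete_2, two))      # 468 278
print(naive_score(athlete_1, three), naive_score(athlete_2, three))  # 588 458

# Strength alone accounts for roughly two thirds of athlete 1's total,
# simply because its numeric range is the largest.
print(round(share_of_total(athlete_1, "strength", three), 2))        # 0.68
```

The last line makes the problem concrete: the attribute with the largest numeric range dominates the sum regardless of how important it actually is.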

To solve this problem we once again turn to a standard tool of mathematics: statistics. In statistical analysis, it is often important to understand just how much the values in a set of numbers are spread out; in other words, we want to know how much the numbers vary. The variance of a set of numbers (typically called samples in statistics) can be thought of as a measure of how much the numbers vary. A variance of 0 indicates that all of the numbers in the sample are exactly the same. A small variance means that most of the sample values are ‘very close’ to one another. A high variance indicates that the values are more spread out.

Ok, so that’s great, but what good is the value of a variance anyway? Well, what we really want is what we can derive from the variance, namely, the standard deviation. The standard deviation is the square root of the variance, and it gives us a tool which helps us categorize all of the sample data. In a so-called normal distribution (the familiar bell curve, in which values are distributed symmetrically around the mean) we find that about 68% of the values (the full set of values is typically referred to as the population) lie within one standard deviation of the mean [The mean is what many refer to as the average]. You may recognize this from some grading schemes where the idea was to take the average grade in a class and that became the center of the ‘C’ range, with A’s and B’s above it and of course D’s and F’s below it. The standard deviation is often represented using the lower case Greek letter σ (sigma). For this reason we often refer to the bounds of this center region as -1 σ and 1 σ. The region above that is bounded by 1 σ and 2 σ and contains only 13.6% of a normally distributed population. The region between 2 σ and 3 σ contains only 2.1% of the population, and the region above 3 σ represents only 0.1% of the population. What does all of this do for us? It allows us to come up with a standard way of looking at any sample of data and determine how to categorize the values in a meaningful way. For example, we know that 99.7% of the data lies between -3 σ and 3 σ of the mean value. What we need to do now is to find a way to apply this concept to multiple variables simultaneously.
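
The percentages quoted above can be checked directly from the cumulative distribution function of the standard normal curve. This is a small Python sketch using only the standard library; `normal_cdf` and `fraction_between` are helper names chosen here for illustration.

```python
import math


def normal_cdf(z):
    """Probability that a standard normal value is less than or equal to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fraction_between(low, high):
    """Share of a normally distributed population between two sigma values."""
    return normal_cdf(high) - normal_cdf(low)


bands = {
    "within 1 sigma of the mean": fraction_between(-1, 1),
    "between 1 and 2 sigma": fraction_between(1, 2),
    "between 2 and 3 sigma": fraction_between(2, 3),
    "above 3 sigma": 1.0 - normal_cdf(3),
    "within 3 sigma of the mean": fraction_between(-3, 3),
}

for label, fraction in bands.items():
    # Prints 68.3%, 13.6%, 2.1%, 0.1% and 99.7% respectively.
    print(f"{label}: {fraction:.1%}")
```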

When we want to compare multiple variables at the same time, you might think it is as simple as rescaling each variable independently. In other words, we just force the values onto a scale that goes, for example, from 0 to 100. This approach distorts the data in a very significant way. What we find is that the distances between our test cases are not preserved. Whereas in the original scaling it seemed obvious which athlete was the better candidate, normalizing the values this way makes it less obvious and may even make them indistinguishable. The challenge lies primarily in choosing the appropriate unit of measurement for each value. One way to do this is to find the standard deviation of each variable and then express values in standard deviation units from the mean. Taken a step further, we scale the data based on the covariance matrix, which represents how the data is distributed both within and among the various dimensions of our samples. This approach gives us a much more usable result. It can be generalized to many dimensions and is generally referred to as the Mahalanobis distance. It is widely used in data analytics, specifically in cluster analysis.
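
To make the idea of standard deviation units concrete, here is a minimal Python sketch for the two-variable case of height and weight. The population means, standard deviations, and the correlation of 0.5 are invented for illustration and are not taken from any real data set; `z_score` and `mahalanobis_2d` are hypothetical helper names. For two variables the Mahalanobis distance from the mean has a closed form in terms of z-scores and the correlation, which avoids any matrix library.

```python
import math

# Hypothetical population statistics (assumptions, not measured data).
MEAN = {"height": 70.0, "weight": 200.0}
SD = {"height": 4.0, "weight": 40.0}


def z_score(value, mean, sd):
    """Distance from the mean, measured in standard deviations."""
    return (value - mean) / sd


def mahalanobis_2d(height, weight, rho=0.5):
    """Mahalanobis distance of (height, weight) from the population mean.

    rho is the assumed correlation between height and weight. With rho = 0
    this reduces to the ordinary length of the z-score vector.
    """
    zh = z_score(height, MEAN["height"], SD["height"])
    zw = z_score(weight, MEAN["weight"], SD["weight"])
    squared = (zh * zh - 2.0 * rho * zh * zw + zw * zw) / (1.0 - rho * rho)
    return math.sqrt(squared)


# The two athletes from the example: (68 in, 120 lb) and (78 in, 180 lb).
print(round(mahalanobis_2d(68, 120), 2))
print(round(mahalanobis_2d(78, 180), 2))
```

Note that this distance only says how unusual a combination of values is relative to the population; it does not by itself say whether unusual means better.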

So what is the purpose of all of this and what has it got to do with whether a machine is capable of intelligence? The point is simply to demonstrate that human qualities and characteristics can be measured, compared, and classified. This is certainly not groundbreaking news; we have been doing this for decades in many areas including the estimation of lifespans in various populations, evaluating the health of newborns, and of course the Intelligence Quotient or IQ. Although the subject of much controversy, the aim of the IQ test is to classify the potential abilities of an individual. Despite the shortcomings of current methods, much of the work done in this area is applicable in assessing the intelligence of machines. I will revisit this later and will use this as a starting point in an attempt to classify machines into several categories of intelligence.

Intelligence By The Numbers – Numerical Representation of Humans and Machines

In this post I lay out the first part of a mathematical foundation for comparing the capabilities of machines to humans. Let me begin by saying that this is not intended to be a formal mathematical proof that machines can be more intelligent than humans and I have made no attempt at approaching this with any sort of mathematical rigor. The purpose of this exposition is simply to focus attention on existing mathematical concepts in the context of the ongoing discussion regarding the metrics we use to measure intelligence. I will later use this to show why I believe that machines will far outperform us no matter what criteria we might adopt in measuring intelligence. The notion of intelligence and intelligent behavior is an abstract one and any attempts to quantify intelligence in humans, animals, or other entities, both living and non-living, have fallen short of expectations. Mathematics has long served as a powerful tool for abstracting concepts and manipulating those concepts in useful ways. We have successfully deployed it as a tool for working in domains where we have a solid understanding, such as chemistry, as well as domains whose true nature we are still discovering, such as quantum field theory. Perhaps the most famous example of using mathematics to represent concepts which were not yet understood is Einstein’s theories of relativity. In this case we see an example of using mathematics to build a theory which predicts the existence of phenomena which were formerly unknown. By representing the world around us in abstract mathematical terms, we are able to use the rich set of mathematical tools at our disposal to test hypotheses and follow them to their logical conclusion.

Representing Humans with Vectors

I begin by introducing a simple concept used in many domains throughout business and science alike: the vector. The notion of a vector is a simple one. It is a series of numbers which comprise a collection treated as a single unit. Although vectors can be manipulated and compared completely independently of any association with real world values, for our purposes we will assume that they represent something ‘real’. Choosing a simple example will make it easier to introduce some concepts typical of vectors. For example, we might use a vector to represent a person with respect to two values such as height and weight. Most of us are familiar with graphing values on an X/Y axis so we will use this to review some simple concepts before moving on to more useful examples. Let’s start with a vector to represent a person (P1) with a height of 72 inches and weight of 220 pounds and another person (P2) with a height of 76 inches and weight of 180 pounds, represented as:

P1 = [72, 220]  P2 = [76, 180]

These two vectors represent our simplistic categorization as depicted in 2-dimensional space shown below.

[Figure: 2-Dimensional Vector Representation (axes: Height, Weight)]

We can easily see that P1 is greater in terms of weight and that P2 is greater in height. We can also see that taking the two attributes combined (if we assume that height and weight are equally important) P1 is greater than P2 because the line representing P1 is longer than the line representing P2. This measurement is known as the magnitude of a vector. We say that the magnitude of P1 is greater than the magnitude of P2.
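
The magnitude is simply the length of the line from the origin to the point, given by the Pythagorean theorem. A short Python sketch using the two vectors from the text (the function name `magnitude_2d` is chosen here for illustration):

```python
import math

# Height (inches) and weight (pounds) for the two people in the text.
p1 = (72, 220)
p2 = (76, 180)


def magnitude_2d(vector):
    """Length of a 2-dimensional vector: sqrt(x^2 + y^2)."""
    x, y = vector
    return math.hypot(x, y)


print(round(magnitude_2d(p1), 2))  # 231.48
print(round(magnitude_2d(p2), 2))  # 195.39
```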

Now let’s take this comparison and apply it to a situation. Assume we are trying to choose between two athletes for a sports team such as a basketball or football team. In our simple world we only have two data points: height and weight. Also, we will assume that height and weight are equally important on our fantasy sports team. We can see that athlete 1 is heavier at 220 pounds but athlete 2 is taller at 76 inches. But as shown above, athlete 1 has a greater overall magnitude when we combine the two values. So by these very simple criteria athlete 1 is our clear choice. But even in our simple version of the world we would soon realize that there is more to selecting the better athlete than just comparing height and weight, so we begin looking for better criteria to improve our selection process. Perhaps one of the most obvious is strength. There are many ways to measure strength, but since we want to keep our examples simple, we will assume there is some test that gives us a strength index that goes from 0 to 500, where 500 is the strongest human alive. We find that our first athlete has a strength index of 300 and the other a strength index of 450. Our numerical representation of our two athletes now looks like this:

P1 = [72, 220, 300]  P2 = [76, 180, 450]

Visually we can represent this in 3-dimensional space as shown below.

[Figure: 3-Dimensional Vector]

Just as in our 2-dimensional example, we can evaluate each vector in terms of magnitude, or overall length of the line shown from the origin to the endpoint. Given this added dimension in the two vectors, we can clearly see that the second athlete, P2, is our better choice.
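
The same calculation extends to any number of dimensions: square each component, add the squares, and take the square root. A minimal Python sketch with the 3-dimensional vectors from the text (`magnitude` is an illustrative name):

```python
import math


def magnitude(vector):
    """Length of a vector with any number of components."""
    return math.sqrt(sum(component * component for component in vector))


# Height (inches), weight (pounds), and strength index from the text.
p1 = (72, 220, 300)
p2 = (76, 180, 450)

print(round(magnitude(p1), 2))  # 378.92
print(round(magnitude(p2), 2))  # 490.59
```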

There are many details and considerations to take into account such as the number of data points beyond our simple 3-dimensional example and how to account for the fact that not all data points should be treated equally. For example, we may want to construct a model where strength is three times as important as weight. I will address these issues and more later on. But for now we have enough to put forth the basis for evaluating the level of intelligence in a human or a machine.
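
As a preview of the weighting idea, one simple way to make strength three times as important as weight is to multiply each component by a weight before computing the magnitude. This is only one possible interpretation, sketched here in Python under that assumption; the weights `(1, 1, 3)` and the name `weighted_magnitude` are illustrative.

```python
import math


def weighted_magnitude(vector, weights):
    """Magnitude after scaling each component by its importance weight."""
    if len(vector) != len(weights):
        raise ValueError("vector and weights must have the same length")
    return math.sqrt(sum((w * c) ** 2 for w, c in zip(weights, vector)))


# Height, weight, strength index from the text.
p1 = (72, 220, 300)
p2 = (76, 180, 450)

# Assumed weights: height and weight count once, strength counts three times.
weights = (1, 1, 3)

print(round(weighted_magnitude(p1, weights), 2))  # 929.29
print(round(weighted_magnitude(p2, weights), 2))  # 1364.07
```

With all weights set to 1 the function returns the ordinary magnitude, so the unweighted comparison is just a special case.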

If we take our simple example and translate it into the domain of intelligence we might choose to evaluate the three criteria previously suggested in Criteria for Intelligence: speed intelligence, collective intelligence, and quality intelligence. As in the above example where we simplified the notion of the athlete’s strength into a strength index, we will assume we have come up with a method of representing these three rather complex aspects of intelligence with a numerical index. For brevity’s sake, I will refer to these three values which correspond to the three different types of intelligence as speed, breadth, and quality. At this point we haven’t established any real meaning for these measurements so the values are arbitrary. Using the same notation as before to represent these values as a vector for two different people we have:

P1 = [72, 220, 300]  P2 = [76, 180, 450]

Once again we can represent the intelligence of our two subjects in 3-dimensional space as shown below:

[Figure: Vector Representation of Three Aspects of Intelligence in a Person (axes: Speed, Breadth, Quality)]

By the given criteria, our second subject P2 is clearly the more intelligent overall, as shown by the magnitude of the vector.

The above examples show, albeit in a very simplistic manner, a straightforward and well-defined way of evaluating multiple characteristics in order to compare two or more subjects using the mathematical structure known as a vector. This concept, along with other mathematical tools which I will introduce later on, has been used for hundreds of years in diverse fields ranging from mathematics and physics to business and finance. I intend to apply these resources in the ongoing effort to explore and ultimately give some clarity to the many questions that fall under the general heading of “What is Intelligence?”, as well as to further my case for the ultimate superiority of machines in the not so distant future.

Quality Intelligence – Machines that can Predict the Future

The notion of quality intelligence, unlike speed intelligence, is difficult to define in measurable terms. It can best be described by example. Consider the case where the ability of one person to perform a certain task is clearly superior to another. For example, one person may be good at math while another just doesn’t “get it”. One person may have a natural ability to learn to play a musical instrument with ease while another finds learning the same instrument an exercise in frustration. One person may be good at recognizing patterns, such as constellations in the sky, whereas another has difficulty seeing them. In these examples the issue isn’t how fast one performs the task; it isn’t a matter of speed intelligence. It also isn’t a matter of how much information we can process or how much we know; it isn’t a matter of collective intelligence. These tasks require a specific kind or aspect of intelligence: the ability to think or to process information in a way which is specific to a particular task. The need for developing very specific cognitive skills is often what differentiates one individual from another, especially in highly skilled areas. Mastery of advanced skills requires a particular and often unique approach. We often hear people talk about the ability to “see” a problem in a specific way. It is one of the key characteristics which differentiates top performers in science, business, or the arts.

So far we have only considered abilities and domains that are well known to us. But what about abilities that are unknown to us because they are out of our reach? Nick Bostrom suggests “… the idea of possible but non-realized cognitive talents, talents that no actual human possesses…”. What these abilities might be we can only guess. Might a person or other entity be able to foresee the future? Not by literally being able to see the future, but by observing the obvious consequences of the present, not evident to the comparatively less intelligent population around them. While predicting the future may sound like any one of many films that come out of Hollywood, we have readily come to accept that we can now predict the weather with a fair amount of accuracy, at least for a few days. As our ability to understand the systems that manifest themselves as wind and rain, hot and cold, has grown, we have learned to “see” tomorrow’s weather before it happens. In medicine, we are constantly striving to understand the system of human physiology and the malfunctions that we know generally as “disease”. As physicians, chemists, and other scientists increase their collective intelligence, it in many cases increases their quality intelligence. While the knowledge that arises out of collective intelligence may be necessary to discover the cause and ultimately the cure for a particular malady, it is not necessarily sufficient. For that leap in advancement, we need quality intelligence. We need to be able to “see” a complex system of biology, the human body, and its interaction with another complex system, the environment, in a particular way. It is this “seeing” that leads to an abstract concept we call “understanding”. It is this understanding that allows us to see that a person is becoming ill before they show symptoms.

As humans and machines become more knowledgeable, more intelligent, and more powerful, the ability to “see” events before they happen will become more and more prevalent. In a competitive environment, the individual who is better at seeing what will happen next has a distinct advantage over everyone else. In the future, whoever can see more clearly into the future will be ahead of the rest of the pack. That individual will be one of the top people... or one of the top machines.

Collective Intelligence

Nick Bostrom has described collective intelligence as “joint problem-solving capacity”. There are many examples of problems that have been solved by more than a single individual and which probably could not have been solved by a single person. When a new drug is discovered and developed it is not done by a single person but by a team of people working toward a collective goal, sometimes in collaboration with other teams performing similar, related research. The Human Genome Project was a massive research project that was only able to achieve its goal through the collective research of twenty teams across six countries over thirteen years. In our daily lives we often tackle jobs at work that require the cooperation of several people or large teams to solve problems and implement solutions. We are surrounded by evidence that our problem-solving potential is increased by combining the skills of multiple individuals. This increase in potential is realized for two reasons.

First, we have an additive effect. Think of a simple example where our goal is to identify all red objects in a warehouse. A single individual may be able to pick out and identify each object in 1 second. Therefore they can sort through 60 objects per minute, 3600 objects in an hour. But if we can enlist ten people to take on this task, assuming they are all able to sort through objects at the same rate, we can sort through 36,000 in an hour. In other words, we can accomplish the task ten times as fast! This is of course a rather simple uninteresting example, but this is the way many tasks are accomplished. The pyramids of Egypt were built this way, and many intellectual tasks are as well.

The second reason we can accomplish difficult tasks more easily with a group is due to the breadth of skills required to complete the task. We live in an increasingly complex world and the problems we are faced with are constantly increasing in complexity as well. As the tasks become more complex, solving them requires an ever-increasing array of problem-solving skills. While this phenomenon has been evolving for a long time, the first good example with a significant impact is found in Henry Ford’s approach to building the automobile. Rather than relying on a single individual or even a few individuals with the requisite skills to build a car, he broke the overall tasks into smaller tasks requiring a degree of skill that could be developed in the workers in a short amount of time. Today we see an even greater divergence of skills required in our world. Think of the number of people that contribute to the treatment of a patient during their stay at a hospital. Even going to a store to buy something sometimes involves several people. We ask where we can find what we are looking for and are directed to the correct department. Once there we can’t find what we are looking for and have to ask someone who works in that department. We then ask them about some feature of the product, they don’t know the answer and go find someone who knows more about that product. Finally, we go to the register where we pay for what we have selected. While this may not seem like something that exemplifies the height of human intelligence, it demonstrates how much of what we undertake relies on the knowledge of multiple individuals. This is a simple form of collective intelligence.

There is one more aspect of collective intelligence that is important to recognize. In the past we see examples of brilliant individuals such as Thomas Edison or Alexander Graham Bell who were great innovators. But even these great names relied on the discoveries and knowledge gained from others, some their contemporaries and some from the past. More than any other species, humans have the ability to learn from their predecessors. This ability is referred to by Michael Tomasello and Steven Mithen as cultural learning. In its simplest form it can be seen as a child learning not to cross the street without looking for oncoming cars. It can be seen as the fundamentals of reading, writing, and arithmetic that we learn in our first few years of school. It is seen in our education where we learn the necessary skills for our careers. Although we don’t often think of it, when we attend a year of school to learn a trade or skill we are learning skills and knowledge that took hundreds of years to accumulate. The long term effect of this type of knowledge transfer is incredible. It is what has allowed us to develop rocket ships that fly to the moon and understand the complex system we know as the human body. It is what has led to the development of computers that are so advanced, so fast, and so intelligent that they may soon surpass us in intelligence in its every form.

Intelligence – Very, Very Fast Intelligence

Speed superintelligence is quite simply the ability to do what a human can do, but much faster.

Through training, we are able to get faster by optimizing our neural circuitry. The result is that our reactions become faster, allowing our performance to increase. This is most evident when we pair a well-trained athlete with an athlete of a relatively low skill level. To the beginner, who is barely able to keep up, everything seems to be progressing at a breakneck pace. To the well-trained expert, time seems to be moving at a slow, almost relaxed pace. We often say that the pro makes it all seem effortless.

This optimization through training yields an improvement by a factor of two or three. In a machine, we will realize improvements by a factor of hundreds or thousands. In specific problem areas, we already have machines which are faster than humans by a factor approaching one million.

Imagine a scenario where you are competing with a machine for a job at a bank. It might be a position processing loans or mortgages, or perhaps managing a portfolio of investments. Both of these jobs already leverage computers for accessing and processing information. But today we still rely on human loan officers and portfolio managers to make decisions about which loans or mortgages to approve, which stocks to buy, and when to sell them. But what if we had enough confidence to let the computers make all of these decisions on their own, without supervision? How much more quickly could these computers process loans or make buying and selling decisions on investments? These automated financial agents could easily make decisions and process transactions at a rate that would allow them to take the place of tens or even hundreds of their human counterparts. In fact, they are used extensively in the financial sector today and continue to become more pervasive. Automated trading has been used to accelerate trades for years and has now become available even to the individual investor.

From the perspective of speed and efficiency, machines clearly are more intelligent than humans already. They have become so good at what they do that they make their task seem effortless.

Criteria for Intelligence

Before delving further into how machines might become intelligent it is helpful to define, or at least describe, what is meant by intelligence when referring to machines. I. J. Good, who worked with Alan Turing during World War II and is credited with originating the idea of an ‘intelligence explosion’, a forerunner of the oft-cited term ‘singularity’, described what he called an “ultraintelligent” machine as “a machine that can far surpass all the intellectual activities of any man however clever”. Good and many others tend to focus not just on what constitutes intelligence but specifically on how and when we will know that machines have surpassed the capabilities of humans. Regardless of whether we are defining intelligence or ultraintelligence, the type of criteria should be the same. This gives us the basis for one possible criterion for intelligence, viz. the ability to perform a task or activity as well as an average person.

In Superintelligence: Paths, Dangers, Strategies Nick Bostrom states: “…we use the term ‘superintelligence’ to refer to intellects that greatly outperform the best current human minds across many very general cognitive domains.” He goes on to suggest that it is helpful to decompose this notion into three categories of superintelligence: speed superintelligence, collective superintelligence, and quality superintelligence.

Speed superintelligence is defined as “a system that can do all that a human intellect can do, but much faster.”

Collective superintelligence is defined as “a system composed of a large number of smaller intellects such that the system’s overall performance across many very general domains vastly outstrips that of any current cognitive system.”

Quality superintelligence is defined as “a system that is at least as fast as a human mind and vastly qualitatively smarter.”

Each of these three definitions covers a different form of intelligence and is in fact the product of a different type of system. The next three posts will cover these in more detail.

The Beginning of Real Machine Intelligence

In my post The End of Artificial Intelligence I raised the question “What is real intelligence?” I went on to refine this question to “What criteria constitute intelligence on the same level as we humans?” This is not an easy problem to address, primarily because not everyone means the same thing when referring to the quality of intelligence and it can be used to mean different things in different contexts. For example, if a person makes a lot of mistakes we might declare that they are not very intelligent. If a person is very good at their job, especially when that job requires learning something complex and some degree of mental difficulty, such as solving problems or managing a financial portfolio, we refer to that person as smart, astute, or intelligent. Yet, if I were to create a machine that is capable of doing that same task, managing a financial portfolio for example, most of us would hesitate to say that this machine was intelligent. We might say it is a good system, perhaps even advanced. We might say things like “this machine is very capable.” But to confer intelligence on a machine is not a leap most of us are comfortable making. So exactly what is it we are looking for before we can confidently pronounce a machine to be intelligent? I will attempt to lay out a few basic criteria. Whether these criteria are all necessary for intelligence I leave as an open question along with whether these criteria are sufficient for intelligence. For the moment, my goal is only to come up with some minimal and rather basic notions for consideration. In other words, where does intelligence begin? At what point are we willing to say that a machine possesses intelligence?

In my next three posts I will present the things that I feel are probably the minimum characteristics of any intelligent entity. This will form a basis for discussing how to apply those criteria along with caveats, concerns, and other possible factors to consider.

The Future of Cyber-Warfare and Cyber-Security – Part I

I am speaking at the Cloud Expo in Santa Clara this week on The Future of Security in the Cloud so this week I have decided to lay out what I believe will be a few of the biggest concerns in internet security over the next few decades. I will return to my quest for intelligence in a few weeks.

The world we live in is changing rapidly and the pace is only going to accelerate. The impact of these changes will be immense. The changes in technology and the way we use technology will impact each and every one of us. And one of the biggest will be the impact on our online security: the ubiquity of online devices coupled with the use of intelligent machines will change the security landscape forever. Cyber Warfare has been around for years now. It has probably existed since we started connecting computers to the ARPANET back in 1969. The forerunner of the internet, the federally funded project known as ARPANET, was used to send the first message across a distributed network. Back then we didn’t have firewalls; there was nothing to monitor suspicious activity. It was a trusted environment where nobody had any reason to believe there was any threat of having their data compromised or their facility broken into by pranksters, criminals, or government agencies. Cryptology had been advancing since before World War II, far beyond the complexity of the ciphers used in ancient Rome. Over the last three decades of the twentieth century we saw incredible advances in technology giving rise to the current generation of the internet and what we now know as the World Wide Web. At the same time we have seen the development of complex machines capable of waging war from thousands of miles away, conducting surveillance from hundreds of miles above the earth, and reaching deep into the innermost thoughts of companies and private citizens via the information stored on their computers.

At this rate we will see virtually no limit to what can be known about anyone or anything. If someone wants to know what food you have in your refrigerator they will skim that information off of the next generation net. What some people refer to as SkyNet, as homage to the Terminator trilogy and suggestive of the perils it may bring to humankind, will bring with it vast potential. Potential power, potential productivity, and potential abuse. Guarding our digital assets by guarding the single endpoint like the drawbridge to the castle will no longer be feasible (it’s actually not working all that well now). As long as our defenses rely on trying to identify the bad guys and stop them as they come through the door, failure will be inevitable. Almost all security breaches in the past few years have been due to vulnerabilities in the web application or mobile application (best estimates suggest about 86%). That means that most breaches could be avoided by simply writing application software that didn’t have bugs in it. Of course this isn’t as easy as it sounds. It is generally assumed that all software has flaws; there is no such thing as bug-free software. After all, we are only human.

But what if software wasn’t written by humans? And what if networks weren’t configured by humans? We have already seen widespread increases in the use of computers and other machines to increase quality in many industries from heavy manufacturing to electronics. Surely we can bring the many years of knowledge and experience gained from quality engineering in other fields to the software industry. As computer driven engineering becomes more pervasive, we are able to build products of all shapes and sizes and of great degrees of complexity with better results: better quality, better time from drawing board to production, better flexibility and customizability. We have used computers for years to gain productivity in software design and implementation and even testing. The advantages are undeniably astounding. One person writing code line by line, at a rate of less than one hundred lines of code per day, might take a year to write even a relatively simple non-graphical application. Today, through extensive use of modular, well-specified APIs, one person with a good understanding of software development can design and create a small but useful application in a day.

As this continued evolution of creating applications through the use of highly automated and very mature toolsets begins to integrate design, implementation, and testing we will see a new level of maturity in the field of application security assurance. No longer will we need to write code while checking a list of do’s and don’ts for secure coding. The need to have someone test our code as a last gateway before it gets rolled off the production line will be an anecdote in history. No doubt this sounds incredible to you. Sending your newest mobile application up to the online store without running it through the final Quality Assurance (QA) test to be blessed is like lion taming while wearing a blindfold, right? But if we know the software is built right, we really don’t need to test it one more time, do we? After all, our QA process doesn’t do anything but test for known vulnerabilities. We have a long list of ‘things that could be wrong’ and we try to identify whether any of these mistakes have made it into our application. Isn’t there a better way to accomplish this?

Now I must confess to a bit of sleight of hand here. I’ve been saying that in the future it won’t be necessary to send software to that final all-inclusive QA testing before releasing it. But I didn’t say that software wouldn’t require testing, only that it won’t be tested in the way we test software today. The key is to validate our code as it is written. Think of it this way. If we create an application and it has a security vulnerability in it that we can identify during our QA testing, then that vulnerability exists because of a specific fragment of code. Before that fragment of code was introduced to our application the vulnerability didn’t exist. As soon as we add that fragment to the application the vulnerability does exist. So all we have to do is to identify that fragment of code as soon as it is added to our application. It’s that simple. And this simple but arduous task is precisely what computers are good at, and getting better at all the time. For any known security vulnerability (remember that’s all we have been testing for) we simply check every fragment of code as it is added to our application. It couldn’t be simpler. OK, once again I have made a statement that is not quite accurate. Security vulnerabilities aren’t generally a result of a single self-contained fragment of code. They are more often due to the way multiple fragments of code are connected to each other; in other words, a vulnerability depends on the context of the code fragment. But that doesn’t change anything except the number of fragment combinations the computer needs to identify and the level of difficulty in specifying these combinations. As the ability to automate software development improves, including testing for security flaws, we will see less and less need for the type of human involvement in writing and testing code that we rely on today. In fact, computers will be much faster and produce better results, making the manual aspects of software development an anachronism.
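
To make the idea of checking each fragment as it is added a little more concrete, here is a rough sketch in Python. It is only an illustration under some loud assumptions: the rule names and patterns are made up for this post, and real scanning engines do far more than pattern matching. Each hypothetical rule lists patterns that must all be present before it fires, so a one-pattern rule stands in for a self-contained fragment and a two-pattern rule stands in for a vulnerability that depends on context.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A known vulnerability, described as patterns that must ALL be present."""
    name: str
    patterns: tuple[str, ...]


# Hypothetical rules, invented purely for illustration.
RULES = (
    # Fires on a single self-contained fragment.
    Rule("eval-call", (r"\beval\(",)),
    # Fires only when two fragments occur in the same application (context).
    Rule("raw-input-in-query", (r"\binput\(", r"\.execute\(")),
)


def findings(code: str, rules: tuple[Rule, ...] = RULES) -> set[str]:
    """Return the names of all rules whose patterns all appear in the code."""
    return {
        rule.name
        for rule in rules
        if all(re.search(pattern, code) for pattern in rule.patterns)
    }


def introduced_by(codebase: list[str], fragment: str) -> set[str]:
    """Return the known vulnerabilities that exist only once fragment is added."""
    before = findings("\n".join(codebase))
    after = findings("\n".join(codebase + [fragment]))
    return after - before


if __name__ == "__main__":
    app = ["name = input('user: ')"]
    for new_code in ("print(name)", "cur.execute('SELECT * FROM t WHERE n = ' + name)"):
        problems = introduced_by(app, new_code)
        if problems:
            print("rejected:", new_code, sorted(problems))
        else:
            app.append(new_code)
```

Comparing the findings before and after each addition is what pins a vulnerability on the fragment that introduced it. The hard part, as noted above, is specifying the combinations, and that is exactly where this toy version falls short.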

There may be someone reading this who has written some sort of code-checking program or perhaps a full-blown scanning engine to search for security vulnerabilities in code. If that is you, right now you are saying, “That’s not possible. It isn’t that easy! This is a very complex problem.” You are correct. This, like many challenges that arise in developing good robust software, is a problem that seems beyond the reach of a fully automated solution. But there is only one reason for this. We are only human. As the responsibility for creating software and the tools that test it shifts more and more to computers, we will see a shift in the ability to create bug-free software.

Automating application security testing will not be an option; it will be a necessity. Face it, computers are better at some things than humans. Hiring security testers to manually test your application will be a thing of the past. They will not be able to keep pace with current technological advances. The only way to thoroughly test applications is by leveraging the application security expertise of a human empowered by best-of-breed automated software testing tools.

Automation will be key to success. Next week I will move on to Cyber Warfare.