In Intelligence By The Numbers – Part I we considered how to combine multiple measurements to assess the potential of an athlete, or the intelligence of a being, by looking at the overall magnitude of a 3-dimensional vector. In our simplistic examples we glossed over a few problems. We treated the numeric values which represent height, weight, and strength as equally important, and we did not allow for the fact that the numbers used to represent these qualities do not cover the same ranges. For example, we used a strength index where 500 represents the strongest human alive. When selecting athletes we can easily compare an athlete with a strength of 400 to one of 200, and we would pick the first over the second because they are twice as strong. But this simple numerical approach doesn’t work when we introduce other variables and simply add them together.
Let’s say our first athlete is 68 inches tall and our second athlete is 78 inches tall. The first athlete is of roughly average height for a male while the second athlete is significantly taller. If we take the naïve approach of combining the numbers for strength and height we get 468 and 278. Simply comparing these two numbers we might conclude that the first athlete is far superior to the second. But what information do these numbers truly convey? When we blindly combine the two values, their meaning is diluted at best and perhaps almost lost. The situation only gets worse if we add a third variable. Let’s assume weights of 120 lbs and 180 lbs. This gives us an aggregate number of 588 for the first athlete and 458 for the second. Again, the first athlete seems to be far superior to the second based solely on our primitive numerical scheme. But what our method has done is to choose an athlete who is of average height and weighs only 120 lbs over an athlete who is well over 6 ft tall and weighs 180 lbs.
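The arithmetic above is easy to reproduce. The following is a minimal sketch in Python; the dictionary layout and the function name naive_score are my own choices for illustration, while the numbers are the ones used in the example.

```python
# Athlete measurements taken from the example above:
# strength index, height in inches, weight in pounds.
athletes = {
    "first": {"strength": 400, "height": 68, "weight": 120},
    "second": {"strength": 200, "height": 78, "weight": 180},
}


def naive_score(athlete):
    """Add the raw numbers together, ignoring units and ranges."""
    return sum(athlete.values())


for name, athlete in athletes.items():
    total = naive_score(athlete)
    # Show how much of the total comes from strength alone.
    strength_share = athlete["strength"] / total
    print(f"{name}: total {total}, strength share {strength_share:.0%}")
```

Roughly two thirds of the first athlete's total comes from the strength index alone, which is why the sum tells us so little about height or weight.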
The problem, or at least one of them, is that the numbers used to measure height, weight, and strength have different domains. Height generally lies somewhere between 60 and 80 inches, whereas weight (for an adult male athlete) may lie between 120 and 300 lbs. And of course our fictitious strength index seems to lie between 200 and 500. Because of this, when we combine these three attributes using simple addition, the meaning of the individual values is lost. There is of course another concern regarding the relative importance of each of these traits, which will vary depending on the specific sport we are choosing candidates for, or even the specific role or position we are trying to fill. We will continue to ignore that issue. For now, let’s just focus on how to take three or more seemingly disparate values and combine them in some way that is meaningful.
To solve this problem we once again turn to a standard branch of mathematics: statistics. In statistical analysis, it is often important to understand just how much the values in a set of numbers are spread out; in other words, we want to know how much the numbers vary. The variance of a set of numbers (typically called a sample in statistics) is a measure of exactly that. A variance of 0 indicates that all of the numbers in the sample are exactly the same. A small variance means that most of the sample values are ‘very close’ to one another. A high variance indicates that the values are more spread out.
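To make the three cases concrete, here is a small sketch in Python. It assumes the population form of the variance (the average squared distance from the mean, dividing by the number of values); the sample form divides by one less than that. The three lists of heights are invented for illustration.

```python
def variance(samples):
    """Population variance: the average squared distance from the mean."""
    mean = sum(samples) / len(samples)
    return sum((x - mean) ** 2 for x in samples) / len(samples)


# Hypothetical heights in inches, chosen only to show the three cases.
identical = [70, 70, 70, 70]
close = [69, 70, 70, 71]
spread_out = [60, 65, 75, 80]

print(variance(identical))   # 0.0
print(variance(close))       # 0.5
print(variance(spread_out))  # 62.5
```

All three lists have the same mean of 70, so the variance is the only thing that tells them apart.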
Ok, so that’s great, but what good is the value of a variance anyway? Well, what we really want is what we can derive from the variance, namely, the standard deviation. The standard deviation is the square root of the variance, and it gives us a tool which helps us categorize all of the sample data. In a so-called normal distribution (the idealized bell curve, in which values are distributed symmetrically around the mean) we find that about 68% of the values lie within one standard deviation of the mean [The mean is what many refer to as the average]. You may recognize this from some grading schemes where the idea was to take the average grade in a class and make it the center of the ‘C’ range, with A’s and B’s above it and of course D’s and F’s below it. The standard deviation is often represented using the lower case Greek letter σ (sigma). For this reason we often refer to the bounds of this center region as -1 σ and 1 σ. The region above that is bounded by 1 σ and 2 σ and contains only 13.6% of a normally distributed population. The region between 2 σ and 3 σ contains only 2.1% of the population, and the region above 3 σ represents only 0.1% of the population. What does all of this do for us? It allows us to come up with a standard way of looking at any sample of data and to categorize the values in a meaningful way. For example, we know that about 99.7% of the data lies between -3 σ and 3 σ of the mean value. What we need to do now is to find a way to apply this concept to multiple variables simultaneously.
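These percentages can be checked directly. The sketch below uses NormalDist from Python's standard statistics module; the helper name share_between is my own, and the figures apply only to an ideal normal distribution, not to any real sample.

```python
from statistics import NormalDist

# A standard normal distribution: mean 0, standard deviation 1,
# so positions are already expressed in units of sigma.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_between(low, high):
    """Fraction of a normal population lying between low and high sigma."""
    return standard_normal.cdf(high) - standard_normal.cdf(low)


bands = {
    "within 1 sigma of the mean": share_between(-1, 1),
    "between 1 and 2 sigma above": share_between(1, 2),
    "between 2 and 3 sigma above": share_between(2, 3),
    "more than 3 sigma above": 1 - standard_normal.cdf(3),
    "within 3 sigma of the mean": share_between(-3, 3),
}

for label, share in bands.items():
    print(f"{label}: {share:.1%}")
```

Because the curve is symmetric, each band above the mean has a mirror image of the same size below it.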
When we want to compare multiple variables at the same time, it is tempting to think it is as simple as rescaling each variable on its own; in other words, we just force the values of each one onto a common scale that goes, for example, from 0 to 100. This approach distorts the data in a very significant way. What we find is that the distances between our test cases are not preserved. Whereas in the original scaling it seemed obvious which athlete was the better candidate, normalizing the values makes it less obvious and may even make the athletes indistinguishable. The challenge lies primarily in choosing the appropriate unit of measurement for each value. One way to do this is to find the standard deviation of each variable and then express each value in terms of standard deviation units from the mean. Taken across all of the variables at once, this amounts to scaling the data by its covariance matrix, which describes how the data is distributed both within and among the various dimensions of our samples. This approach gives us a much more usable result. It can be generalized to many dimensions, and the resulting measure is generally referred to as the Mahalanobis distance. It is widely used in data analytics, notably in cluster analysis.
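As a rough illustration, the sketch below computes standard deviation units and a Mahalanobis distance for just two variables, strength and height, so that the covariance matrix is 2x2 and can be inverted by hand. The six-athlete sample is invented (only its first two rows come from the example earlier), and the function names are my own. It is one way to carry out the calculation, not a definitive implementation.

```python
import math

# Hypothetical sample of (strength index, height in inches).
# The first two rows are the athletes from the earlier example;
# the rest are made up so that there is a population to compare against.
sample = [(400, 68), (200, 78), (300, 72), (350, 70), (250, 75), (320, 71)]


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Population covariance; covariance(xs, xs) is the variance of xs."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def z_score(value, values):
    """Distance of value from the mean, in standard deviation units."""
    return (value - mean(values)) / math.sqrt(covariance(values, values))


def mahalanobis(point, data):
    """Mahalanobis distance of a 2-D point from the mean of data."""
    xs = [p[0] for p in data]
    ys = [p[1] for p in data]
    dx, dy = point[0] - mean(xs), point[1] - mean(ys)
    sxx, syy, sxy = covariance(xs, xs), covariance(ys, ys), covariance(xs, ys)
    # Determinant of the covariance matrix; it is zero (and the distance
    # undefined) if the two variables are perfectly correlated.
    det = sxx * syy - sxy * sxy
    # d^2 = [dx dy] * inverse(covariance matrix) * [dx dy]^T,
    # written out explicitly for the 2x2 case.
    d_squared = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d_squared)


strengths = [p[0] for p in sample]
heights = [p[1] for p in sample]
for strength, height in sample[:2]:
    print(
        f"strength z = {z_score(strength, strengths):+.2f}, "
        f"height z = {z_score(height, heights):+.2f}, "
        f"Mahalanobis distance = {mahalanobis((strength, height), sample):.2f}"
    )
```

Note that the distance is unsigned: it says how far an athlete is from the typical member of the sample once the spread and correlation of the variables are accounted for, not whether that difference is desirable.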
So what is the purpose of all of this and what has it got to do with whether a machine is capable of intelligence? The point is simply to demonstrate that human qualities and characteristics can be measured, compared, and classified. This is certainly not groundbreaking news; we have been doing this for decades in many areas including the estimation of lifespans in various populations, evaluating the health of newborns, and of course the Intelligence Quotient or IQ. Although the subject of much controversy, the aim of the IQ test is to classify the potential abilities of an individual. Despite the shortcomings of current methods, much of the work done in this area is applicable in assessing the intelligence of machines. I will revisit this later and will use this as a starting point in an attempt to classify machines into several categories of intelligence.