Statistics in Real Life: Rating Cars
Chapter One
Fun with Cars
Imagine that you work for the publisher of CONSUMER REPORTS (CR). When I did, we tested about 50 cars per year for the magazine. Each car had about 100 attributes that we had to factor into our published ratings. The ratings were printed, in summary, as five colored balls.
A solid black ball is a POOR rating, with a value of 1.
A half-black ball is a FAIR rating, with a value of 2.
An empty ball is GOOD, valued at 3.
A half-red ball is VERY GOOD, valued at 4.
A full red ball is EXCELLENT, valued at 5.
Soon we will get to some simple examples, but first, let’s wonder about that scale: 1 to 5. Halfway between 1 and 5 is 3, which we call GOOD. But we call 4 VERY GOOD. Why do we not have a single word for it, as we do for POOR, FAIR, GOOD and EXCELLENT?
PRETTY GOOD won’t be an improvement. How about GREAT? What word would you suggest for halfway between GOOD and EXCELLENT? Nice? Better?
Now, how about the rating for 0 – ZERO? Consumers Union, publisher of CR, determined that if any attribute of a product or service was dangerous to human health, it was UNACCEPTABLE. Thus, there is no spread of points between 0 and 1. There is a spread above zero, but zero is unique. There should be no shading of a rating of ZERO.
Wait, what was that? A spread of points? Yes, when you have a hundred attributes to add up for a rating you cannot just publish the total of all the 1s, 2s, 3s, 4s and 5s. You have to add the numbers and divide by the number of numbers. You call that the average rating, right? So if the total is 476 for 100 rating factors, the RATING is 4.76, which, when rounded off, earns a 5, which rates a full red ball. Obviously we practice rounding off, so a car with a 4.45 average would get only a VERY GOOD rating. And we confess that our 5 point scale is really one of 500 points.
No, wait! If we say the scale begins with 0.001 and ends with 5.000 – just what is the spread? So, if we count all the numbers: 0.001, 0.002, …5.000, we have an awful lot of choices for our rating system.
Some organizations, such as the SAE and the NHTSA, use a ten-point scale: 1 to 10. A problem here is that the middle point is 5.5. So it is better to have a scale whose first and last numbers are both odd or both even, so that the middle rating is a whole number. Or, one could remember the NOT ACCEPTABLE rating, and make the scale 0 to 10. Now the middle is 5. Oh yeah, now you need eleven fingers. Dumb idea. Stick with the 0, plus 1 to 5 scale, but don’t allow the 0 to be counted.
Now we find another problem, and we have not reached the hard ones yet. One car got a 4.76 and the lesser car got a 4.45, with a real difference of only 0.31 points yet the balls indicate a difference of one whole point, three times the actual difference. Is that fair?
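Here is a small Python sketch of the averaging and rounding described above. The function names are made up for illustration, and the round-half-up rule is an assumption; the text only says we "round off".

```python
import math

# Ball names for the 1-to-5 scale described above.
BALLS = {1: "POOR", 2: "FAIR", 3: "GOOD", 4: "VERY GOOD", 5: "EXCELLENT"}

def average_rating(total_points, factor_count):
    """Add the numbers and divide by the number of numbers."""
    return total_points / factor_count

def ball_for(average):
    """Round to the nearest whole ball (assumed rule: halves round up)."""
    # floor(x + 0.5) is used because Python's round() sends ties to the even number.
    return BALLS[int(math.floor(average + 0.5))]

better = average_rating(476, 100)   # the 4.76 car
lesser = 4.45                       # the 4.45 car

print(better, ball_for(better))
print(lesser, ball_for(lesser))
# The averages differ by much less than the one whole ball that gets printed.
print(round(better - lesser, 2))
```

The two cars land on different balls even though their averages are less than a third of a point apart, which is the problem the bar graph was added to fix.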
Here is a recent example. You can see that after years of criticism about the problem above, we at CU added the bar graph to the ball-ratings. Now it is obvious how closely rated some vehicles are.
Now for something a little harder
First let us think about ratings and rankings. The car with a rating of 4.76 does rank above the one with the 4.45 rating. But let’s talk about the cars in the chart.
You say that the Volvo is ranked 15th in the group. Does that mean it is one-fifteenth as good as the one ranked number one at the top? The top three are all EXCELLENT. The rest, except for the Volvo, are VERY GOOD. So, ranking is really pretty meaningless.
But wait, I see something else!
Notice the last column. The Mercedes-Benz and the Infiniti I35 rate the same. Would you choose the more reliable car? I would.
Now we have entered the realm of weighting the attributes and ratings.
The reason I chose to emphasize the reliability over the performance was that my personal preference is for the more reliable car. Maybe that is because I am much older than you are. Or maybe I really am focusing on the next-to-last column. That Mercedes is $8,475 more costly. You might lean to the Mercedes because it is more glamorous, a consideration that CR does not rate.
So look at a matrix you might use to judge which car is most desirable for yourself.
Attribute | Range | Value (a) | Importance (b), 1 to 5 | Rating, (a) times (b) |
Performance | 1 to 5 | 4.9 | 3 | 14.7 |
Reliability | 1 to 5 | 1 | 5 | 5 |
Price | $1 to $100,000 | 42,320 | 5 | 211,600 |
Glamour appeal | Subjective | "Looks Nice" | 2 | ??? |
My rating value > | | | | ??? |
(a) This value comes from testing and analysis.
(b) This number is a composite of the test crew's opinions.
OK. Now it is getting hard.
Obviously we cannot add apples and oranges like that. There are two columns to add. We must take care of things like price, which we consider better when lower, not higher as we do performance. We also must invent a logical scale for an attribute such as “glamour”. We must normalize the metrics. All must be manipulated to fit the 1 to 5 rating metric. Clearly the first two do. Price must be put into a 1 to 5 scale and inverted. The 42,320 is divided by 100,000 to get 0.423. Simple: it is a little over four tenths of the most I would consider paying. But a $60,000 car would get a value of 0.600. Is that better? No, that is not so good. Well, invert the numbers (divide 1 by each of them). Now we have 2.36 and 1.67. Is that better? Well, now the cheaper car rates better than the expensive one.
Let’s see now – a car costing as little as $5000 would get a 0.05 or an inverted 20. A car costing the maximum would rate 1, and an inverted 1. Alright. Now the range is from 1 to 20.
So we normalize that into a 1 to 5 scale. All we need to do is divide the numbers by 4 and we have a scale of 0.25 to 5. That is not a 1 to 5 scale, but it does not make $100,000 a zero – not acceptable. It is just PRETTY POOR in this context.
To summarize: Divide the raw value by the maximum expected value. Invert it. Then divide by whatever it takes to make the largest inverted value equal 5 (here, 4). This is really just like the numbers and balls situation.
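The three steps just summarized can be sketched in Python. The function and constant names are invented for this illustration; the $100,000 ceiling and the divisor of 4 come from the text above.

```python
MAX_PRICE = 100_000   # the most I would consider paying
SCALE_DIVISOR = 4     # turns the inverted 1-to-20 range into 0.25-to-5

def normalize_price(price, max_price=MAX_PRICE, divisor=SCALE_DIVISOR):
    """Divide by the maximum, invert, then squeeze into the 0.25-to-5 range."""
    fraction = price / max_price   # e.g. 42,320 becomes 0.4232
    inverted = 1 / fraction        # cheaper cars now score higher
    return inverted / divisor

for price in (5_000, 42_320, 60_000, 100_000):
    print(price, round(normalize_price(price), 2))
```

A $5,000 car lands at the top of the scale and a $100,000 car at 0.25, so the most expensive car is rated poorly without ever reaching the forbidden zero.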
ATTRIBUTE | METRIC | VALUE | IMPORTANCE | RATING |
Performance | 1 to 5 | 4.9 | 3 | 14.7 |
Reliability | 1 to 5 | 1 | 10 | 10 |
Price | 0.25 to 5 | 0.59 | 6 | 3.54 |
Glamour | Subjective | Neat! | 2 | ??? |
My rating value> | ||||
This is what a trial run looks like. The 0.59 value is based on a normalization of the price.
That looks pretty good. I can’t add 14.7, 10 and 3.54 to ???, can I? So now we have to develop a scale of words that can be converted to numbers. Well, just for fun try this:
Rotten = 0 (Not acceptable)
Old fashioned = 1
Plain = 2
Nice = 3
Neat = 4
WOOOW! = 5
Now, for the Mercedes:
ATTRIBUTE | METRIC | VALUE | IMPORTANCE | RATING |
Performance | 1 to 5 | 4.9 | 3 | 14.7 |
Reliability | 1 to 5 | 1 | 10 | 10 |
Price | 0.25 to 5 | 0.59 | 6 | 3.54 |
Glamour | Subjective | 4 | 2 | 8 |
My rating value> | 36.24 |
OK, so now I think the Mercedes is a 36.24.
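If you would rather let a computer do the adding, here is a minimal Python sketch of the Mercedes table. The dictionary layout and the function name are my own assumptions for illustration; the values, weights and word scale are the ones shown above.

```python
# Word scale for glamour, as listed above.
GLAMOUR = {
    "Rotten": 0, "Old fashioned": 1, "Plain": 2,
    "Nice": 3, "Neat": 4, "WOOOW!": 5,
}

def weighted_total(rows):
    """Sum of VALUE times IMPORTANCE over every attribute."""
    return sum(value * importance for value, importance in rows.values())

# Each entry is (VALUE, IMPORTANCE), copied from the Mercedes table.
mercedes = {
    "Performance": (4.9, 3),
    "Reliability": (1, 10),
    "Price": (0.59, 6),
    "Glamour": (GLAMOUR["Neat"], 2),
}

print(round(weighted_total(mercedes), 2))  # 36.24
```

Changing a weight or a value in the dictionary and running it again shows right away how much each attribute moves the total.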
And, now for the Infiniti:
ATTRIBUTE | METRIC | VALUE | IMPORTANCE | RATING |
Performance | 1 to 5 | 4.9 (same) | 3 | 14.7 |
Reliability | 1 to 5 | 5 | 10 | 50 |
Price | 0.25 to 5 | 0.74 | 6 | 4.44 |
Glamour | Subjective | 3 | 2 | 6 |
My rating value> | 75.14 |
The 0.74 comes from the Infiniti's price. It costs $8,475 less than the Mercedes, or $33,845, which inverts to 2.95 and, divided by 4, gives 0.74.
Now we need one more table for this UPSCALE SEDAN class of cars. This one is for the perfect car.
ATTRIBUTE | METRIC | VALUE | IMPORTANCE | RATING |
Performance | 1 to 5 | 5 | 3 | 15 |
Reliability | 1 to 5 | 5 | 10 | 50 |
Price | 0.25 to 5 | 5 | 6 | 30 |
Glamour | Subjective | 5 | 2 | 10 |
My rating value> | 105 |
So, now we know the perfect car would get 105 points if I had my choices.
Here are the ratings
Vehicle | Points | Divide by 105 | Multiply by 5 | Balls |
Mercedes | 36.24 | 0.345 | 1.73 | Fair |
Infiniti | 75.14 | 0.716 | 3.58 | Very Good |
Perfect car | 105 | 1 | 5 | Excellent |
That is how we do it. Each month we usually have 4 or 5 cars to rate, considering about 200 attributes, the maximum we can think of. All of these calculations are put into spreadsheets. The weights are set for each class of cars (Sedans, SUVs and so forth). All we need to do is plug in the names, prices, reliability numbers (from another source) and our ratings.
Let’s try that one more time. This time rate the Mercedes from the perspective of a wealthy young woman.
ATTRIBUTE | RANGE | VALUE | IMPORTANCE | RATING |
Performance | 1 to 5 | 4.9 | 5 | 24.5 |
Reliability | 1 to 5 | 1 | 3 | 3 |
Price | 0.25 to 5 | 0.59 | 2 | 1.18 |
Glamour | Subjective | 4 | 10 | 40 |
Her rating value> | 68.68 |
See? Price and reliability mean less, to her, than performance and glamour.
By the way, to be honest, the above examples contain falsehoods. Consumers Union does not use price or glamour in ratings. Instead, these things affect which class of cars the vehicle is rated within. This was fun to do anyway.
Chapter Two
Statistics in Real Life: Rating Automobile Safety
So far, we have used the words rating and ranking. The National Highway Traffic Safety Administration (NHTSA), which makes and enforces the Federal Motor Vehicle Safety Standards, uses other words. The idea is to prioritize things. That does not mean the same as IMPORTANCE. NHTSA rates countermeasures (solutions) for safety problems. Then they apply weights to the solutions to determine which ones have the best payoff. Payoff is influenced by the benefit-cost ratio. Payoff can be skewed by political considerations – such as when a congressional committee demands something.
Here are some considerations to rate and weigh solutions to a problem:
Frequency of occurrence
AAA times per 100,000 vehicles, or BBB times per year, or …
Severity of the consequence
The Abbreviated Injury Scale (AIS) goes from 1 to 6. This scale is the reverse of the goodness scale. The number 1 means a minor injury; 6 means a maximal injury, one that is almost surely fatal: a failure.
Dollar value of the yearly loss of life and property
This is part of the benefit-cost ratio.
Probability of finding a solution
This is for any solution that works. See later.
Manufacturing cost of the solution
This is the other part of the b-c ratio.
Lead time to enact the regulation
If there is a lot of resistance from manufacturers, the rule-making process can take many years.
Lead time to implement the solution into production
The implementation involves design, testing and tooling time. To overcome realistic objections, a phase-in of as many as 4 years is now allowed, with 10 percent of all production the first year, 40 percent the next, then 75 percent and finally all vehicles.
You can throw an awful lot of garbage into this pot. You haven’t even begun to assign weights, or to try to normalize these things so that each consideration is put into a 1 to 5 scale for weighing.
Now you know how hard it is to get things done in the government. The Administration (NHTSA) originates the accident research and does experiments to propose solutions. Sometimes Congress butts in. They did this to demand that air bags be installed as well as seatbelts, to provide optimum occupant protection. Then the Courts can get involved if the car makers are really upset with the rulemaking. Some car makers really hated to be told to put in air bags.
There are two kinds of mandates here. Consumers Union mandates that a rating of zero is to make the product (or service) NOT ACCEPTABLE. The other mandate is the command: DO IT OR ELSE! Congress said to the NHTSA, “Require air bags to be standard equipment soon, or we will cut your funding.”
Or, a Court could issue a mandate (judgment), too: “You are bad. Go to jail.”
Guess what? This process we have been describing is at the heart of the whole rulemaking process. Everyone can argue about ratings and weightings. Many different disciplines are involved, such as financial, engineering, biomechanical, political, legal and statistical. Some persons represent the sciences, others the public (consumers) and others the politicians.
No matter how big the safety project, this is a framework for rational discussion among those participating. The biomechanical engineer and the lawyer can each have some input on attributes in their specialty. Everyone can make suggestions about the importance of each rating. Usually, it becomes the task of the chairman (Task Force Leader) to arbitrate the final weights.
Chapter Three
Statistics in Real Life: Personal Rating System
Now, for an exercise you can do that will be useful throughout your life.
Construct a matrix like this one. Make copies. Use them to judge your friends rationally.
*** My personal friendship rating for _________________ ***
ROW NUMBER | ATTRIBUTES | IMPORTANCE TO ME (ANY NUMBER) (A) | RATING (B) | WEIGHTED RATING (A) * (B) |
1 | Good looks | |||
2 | Personality | |||
3 | Clothing | |||
4 | Sociability | |||
5 | Likes my family | |||
6 | My family likes him | |||
More? | ||||
My friend's rating> | SUM > |
You decide what attributes you want to consider about your friends. Then decide what level of importance to assign to each attribute. Make copies of the chart as it is so far. Then cover up the third column and rate each friend.
Be honest and you will learn a lot. So, now you know how MATH fits into everyday life.
Chapter Four
Statistics in Real Life: A short cut
Maybe this is a short cut, maybe not. You will see.
Now let us imagine that we have five work projects to do soon. How can we decide which one ought to be done first? Second? Without saying here what the projects consist of, let’s just call them A, B, C, D and E.
Line up the project names like this.
A, B, C, D, and E
Now mentally compare project A with project B. Which one ought to be first – considering only that pair? Say that B is your choice here. So the order of projects is:
B, A, C, D, and E
Do that again, comparing A and C. Say that C is also more important than A.
B, C, A, D, and E
Let’s say that after you compare project A with D and E, there is no change.
B, C, A, D, and E
You have judged project A against the other four. Now compare project B with C, D and E. You already decided about B vs. A. You are not done. Run the comparisons again with C vs. D and E. Then once more, compare D and E.
Now you see that this short cut really is not so short. To do this correctly, you need to compare every pair. So A should be compared to B, C, D and E. And then B should be compared to C, D and E. Next C is tested against D and E, and finally D and E compared.
That is 4 + 3 + 2 + 1 = 10 comparisons to make. So you see that you can really save time with the rating system we started with. In this experiment you were attempting to rank multiple pairs by mentally comparing all the attributes of each project. Do you think you could recall all those attributes for ten comparisons?
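To see where the ten comes from, here is a short Python sketch that lists every pair. It uses the standard library's itertools.combinations; the variable names are just for illustration.

```python
from itertools import combinations

projects = ["A", "B", "C", "D", "E"]

# Every unordered pair of projects, each listed exactly once.
pairs = list(combinations(projects, 2))
for first, second in pairs:
    print(first, "vs.", second)

print(len(pairs))  # 10

# The same count from the formula n * (n - 1) / 2.
n = len(projects)
print(n * (n - 1) // 2)
```

With ten projects instead of five, the same formula gives 45 comparisons, so the pairwise method gets slow quickly as the list grows.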
Maybe you could. But if your boss should ask you why you decided to do project C before doing project A, you will be more convincing if you have your rating sheet. If the boss wants, he can change the ratings or the rankings. He could even ask you to consider some other attributes.
See, this kind of statistical analysis helps on the job. Imagine you are a stocker at the supermarket. Which products on the shelves do you restock first? Maybe you don’t do this analysis. Probably your boss did it. He has done it so often he can skip to just ranking the needs. Then he can tell you what to do.
The ranking short cut is best if you have only three or four things to consider.
Appendix
Kinds of ratings
Linear – straight line, uniform progression. Most common for things you measure, like the temperature or length.
Non-linear – a curve of some sort. The bell curve of probability is an example. Another very common example is the S-curve. This plots things for which the numbers level off at a fixed minimum at the bottom and at some finite maximum at the top. Think about your body temperature. The bottom number is room temperature. A deadly fever is at the top.
Continuous – numbers like 1.0, 1.1, and 1.2 which can be divided into even smaller increments like 1.00, 1.10, 1.12, 1.20 and so forth.
Discrete – integer numbers like 0, 1, 2, 3 and so forth. These are useful for counting things. Negative numbers are useful, too. You think of some examples, OK?
Subjective – words expressed like Fair, Good, and Excellent to describe a car’s ride quality. These ratings must be transformed into numbers like 2, 3 and 5.
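For the curious, the S-curve described under Non-linear can be sketched in Python with the logistic function. That choice is an assumption made for illustration, since no particular formula is named here, and the parameter names are invented.

```python
import math

def s_curve(x, low=1.0, high=5.0, midpoint=0.0, steepness=1.0):
    """Logistic S-curve: levels off near `low` at the bottom and `high` at the top."""
    return low + (high - low) / (1 + math.exp(-steepness * (x - midpoint)))

for x in (-6, -2, 0, 2, 6):
    print(x, round(s_curve(x), 2))
```

Far to the left the result hugs 1, far to the right it hugs 5, and in the middle it climbs steadily, which is the shape described above.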