<?xml version="1.0" encoding="UTF-8" standalone="no"?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" version="2.0"><channel><title>free Learning Center</title><description>Free Learning Center as a medium to learn about developments in Information Technology, Databases, Data Mining, Tips and Tricks, Linux, Online Business, Internet Marketing, Articles and other interesting materials</description><managingEditor>noreply@blogger.com (Andy)</managingEditor><pubDate>Fri, 20 Mar 2026 04:45:21 -0700</pubDate><generator>Blogger http://www.blogger.com</generator><openSearch:totalResults xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/">11</openSearch:totalResults><openSearch:startIndex xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/">1</openSearch:startIndex><openSearch:itemsPerPage xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/">25</openSearch:itemsPerPage><link>http://freelearningcenter.blogspot.com/</link><language>en-us</language><itunes:explicit>no</itunes:explicit><itunes:subtitle>Free Learning Center as a medium to learn about developments in Information Technology, Databases, Data Mining, Tips and Tricks, Linux, Online Business, Internet Marketing, Articles and other interesting materials</itunes:subtitle><itunes:owner><itunes:email>noreply@blogger.com</itunes:email></itunes:owner><item><title>An Overview of Data Mining Techniques</title><link>http://freelearningcenter.blogspot.com/2011/12/overview-of-data-mining-techniques.html</link><category>Data Mining</category><category>Information Technology</category><author>noreply@blogger.com (Andy)</author><pubDate>Thu, 15 Dec 2011 23:21:00 -0800</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-6316360081793516006.post-2297802068996031184</guid><description>This overview provides a description of some of the most
common data mining algorithms in use today.&amp;nbsp;&amp;nbsp; We have broken the
discussion into two sections, each with a specific theme:

&lt;ul&gt;
&lt;li&gt;
    &lt;div class="MsoBodyText"&gt;
Classical Techniques: Statistics, Neighborhoods and
    Clustering&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
    &lt;div class="MsoBodyText"&gt;
Next Generation Techniques: Trees, Networks and Rules&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="MsoBodyText"&gt;
Each section will describe a number of data mining
algorithms at a high level, focusing on the "big picture" so that the
reader will be able to understand how each algorithm fits into the landscape of
data mining techniques.&amp;nbsp;&amp;nbsp; Overall, six broad classes of data mining
algorithms are covered.&amp;nbsp; Although there are a number of other algorithms and many
variations of the techniques described, one of the algorithms from this
group of six is almost always used in real world deployments of data mining
systems.
&lt;/div&gt;
&lt;a name='more'&gt;&lt;/a&gt;&lt;br /&gt;

&lt;h3&gt;
&lt;span style="font-size: medium;"&gt;I. Classical Techniques: Statistics, Neighborhoods and Clustering&lt;/span&gt;&lt;/h3&gt;
&lt;h3 style="mso-list: l10 level2 lfo13;"&gt;
&lt;span style="font-size: medium;"&gt;1.1. The Classics&lt;/span&gt;&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
These two sections have been broken up based on when the
data mining technique was developed and when it became technically mature enough
to be used for business, especially for aiding in the optimization of customer
relationship management systems.&amp;nbsp; Thus this section contains descriptions
of techniques that have classically been used for decades, while the next section
covers techniques that have only been widely used since the early 1980s.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
This section should help the reader understand the rough
differences among the techniques, and provide at least enough information to be
dangerous and well armed enough not to be baffled by the vendors of different data
mining tools.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
The main techniques that we will discuss here are the
ones that are used 99.9% of the time on existing business problems.&amp;nbsp; There
are certainly many other ones as well as proprietary techniques from particular
vendors - but in general the industry is converging to those techniques that
work consistently and are understandable and explainable.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level2 lfo12;"&gt;
1.2. Statistics&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
By strict definition "statistics" or
statistical techniques are not data mining.&amp;nbsp; They were being used long
before the term data mining was coined to apply to business applications.&amp;nbsp;
However, statistical techniques are driven by the data and are used to discover
patterns and build predictive models.&amp;nbsp; From the user's perspective, you
will be faced with a conscious choice when solving a "data mining"
problem as to whether you wish to attack it with statistical methods or other
data mining techniques.&amp;nbsp; For this reason it is important to have some idea
of how statistical techniques work and how they can be applied.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo12;"&gt;
What is different between statistics and
data mining?&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
I flew the Boston to Newark shuttle recently and sat next to a professor from one
of the Boston area universities.&amp;nbsp; He was going to discuss the genetic makeup
of drosophila (fruit flies) with a pharmaceutical company in New Jersey.&amp;nbsp; He had
compiled the world's largest database on the genetic makeup of the fruit fly and had
made it available to other researchers on the internet through Java applications
accessing a larger relational database.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
He explained to me that they were now not only storing
the information on the flies but also doing "data mining", adding
as an aside, "which seems to be very important these days, whatever that
is".&amp;nbsp; I mentioned that I had written a book on the subject and he was
interested in knowing what the difference was between "data mining"
and statistics.&amp;nbsp; There was no easy answer.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
The techniques used in data mining, when successful, are
successful for precisely the same reasons that statistical techniques are
successful (e.g. clean data, a well-defined target to predict and good
validation to avoid overfitting).&amp;nbsp; And for the most part the techniques are
used in the same places for the same types of problems (prediction,
classification, discovery).&amp;nbsp; In fact some of the techniques that are
classically defined as "data mining", such as CART and CHAID, arose from
statisticians.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
So what is the difference?&amp;nbsp; Why aren't we as excited
about "statistics" as we are about data mining?&amp;nbsp; There are
several reasons.&amp;nbsp; The first is that the classical data mining techniques
such as CART, neural networks and nearest neighbor techniques tend to be more
robust both to messier real world data and to being used by
less expert users.&amp;nbsp; But that is not the only reason.&amp;nbsp; The other reason
is that the time is right.&amp;nbsp; Because computers are now used for closed loop
business data storage and generation, large quantities of data
are available to users.&amp;nbsp; If there were no data, there would be no
interest in mining it.&amp;nbsp; Likewise, computer hardware has
improved by several orders of magnitude in storing and
processing data, which makes some of the most powerful data mining techniques
feasible today.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
The bottom line though, from an academic standpoint at
least, is that there is little practical difference between a statistical
technique and a classical data mining technique.&amp;nbsp; Hence we have included a
description of some of the most useful statistical techniques in this section.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo12;"&gt;
What is statistics?&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
Statistics is a branch of mathematics concerning the
collection and the description of data.&amp;nbsp; Usually statistics is considered
to be one of those scary topics in college right up there with chemistry and
physics.&amp;nbsp; However, statistics is probably a much friendlier branch of
mathematics because it really can be used every day.&amp;nbsp; Statistics was in
fact born from very humble beginnings of real world problems from business,
biology, and gambling!&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Knowing statistics will help the average business person make
better everyday decisions by allowing them to gauge
risk and uncertainty when all the facts either aren’t known or can’t be
collected.&amp;nbsp; Even with all the data stored in the largest of data warehouses,
business decisions are still just better informed guesses.&amp;nbsp; The more and
better the data, and the better the understanding of statistics, the better the
decision that can be made.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Statistics has been around for a long time: easily a
century, and arguably many centuries, going back to when the ideas of probability began to gel.&amp;nbsp;
It could even be argued that the data collected by the ancient Egyptians,
Babylonians, and Greeks were all statistics long before the field was officially
recognized.&amp;nbsp; Today data mining has been defined independently of statistics,
though “mining data” for patterns and predictions is really what statistics
is all about.&amp;nbsp; Some of the techniques that are classified under data mining,
such as CHAID and CART, really grew out of the statistical profession more than
anywhere else, and the basic ideas of probability, independence, causality
and overfitting are the foundation on which both data mining and statistics are
built.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo12;"&gt;
Data, counting and probability&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
One thing that is always true about statistics is that
there is always data involved, and usually enough data that the average
person cannot keep track of it all in their head.&amp;nbsp; This is
certainly more true today than it was when the basic ideas of probability and
statistics were being formulated and refined early in the twentieth century.&amp;nbsp; Today
people have to deal with up to terabytes of data and have to make sense of it
and glean the important patterns from it.&amp;nbsp; Statistics can help greatly in
this process by helping to answer several important questions about your data:&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;
    &lt;div class="MsoListBullet" style="mso-list: l11 level1 lfo15;"&gt;
What patterns are
    there in my database?&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
    &lt;div class="MsoListBullet" style="mso-list: l11 level1 lfo15;"&gt;
What is the
    chance that an event will occur?&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
    &lt;div class="MsoListBullet" style="mso-list: l11 level1 lfo15;"&gt;
Which patterns
    are significant?&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
    &lt;div class="MsoListBullet" style="mso-list: l11 level1 lfo15;"&gt;
What is a high
    level summary of the data that gives me some idea of what is contained in my
    database?&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="MsoBodyText"&gt;
Certainly statistics can do more than answer these
questions but for most people today these are the questions that statistics can
help answer.&amp;nbsp; Consider for example that a large part of statistics is
concerned with summarizing data, and more often than not, this summarization has
to do with counting.&amp;nbsp;&amp;nbsp; One of the great values of statistics is in
presenting a high level view of the database that provides some useful
information without requiring every record to be understood in detail.&amp;nbsp;
This aspect of statistics is the part that people run into every day when they
read the daily newspaper and see, for example, a pie chart reporting the number
of US citizens of different eye colors, or the average number of annual doctor
visits for people of different ages.&amp;nbsp;&amp;nbsp; Statistics at this level is
used in the reporting of important information from which people may be able to
make useful decisions.&amp;nbsp; There are many different parts of statistics,
but the idea of collecting data and counting it is often at the base of even
the more sophisticated techniques.&amp;nbsp; The first step then in understanding
statistics is to understand how the data is collected into a higher level
form; one of the most notable ways of doing this is with the histogram.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo12;"&gt;
Histograms&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
One of the best ways to summarize data is to provide a
histogram of the data.&amp;nbsp; In the simple example database shown in Table 1.1
we can create a histogram of eye color by counting the number of occurrences of
different colors of eyes in our database.&amp;nbsp; For this example database of 10
records this is fairly easy to do and the results are only slightly more
interesting than the database itself.&amp;nbsp;&amp;nbsp; However, for a database of
many more records this is a very useful way of getting a high level
understanding of the database.&lt;br /&gt;
&lt;/div&gt;
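&lt;div class="MsoBodyText"&gt;
The counting described above can be sketched in a few lines of Python.&amp;nbsp; This is
only an illustration and not part of the original discussion: it assumes the eye
colors of the first six records of Table 1.1 (IDs 1 to 6), uses the standard
library's Counter, and the variable names are hypothetical.&lt;/div&gt;

```python
from collections import Counter

# Eye colors of the first six records of Table 1.1 (IDs 1 to 6).
# Assumption: only these six rows are used for the illustration.
eye_colors = ["Brown", "Green", "Brown", "Green", "Blue", "Brown"]

# A histogram of a categorical column is a count of each distinct value.
histogram = Counter(eye_colors)

# Dividing each count by the total gives the relative frequency,
# a simple estimate of the chance of seeing each eye color.
total = sum(histogram.values())
relative = {color: count / total for color, count in histogram.items()}

# Print a plain text histogram, most common value first.
for color, count in histogram.most_common():
    print(f"{color:6} {'#' * count} ({count})")
```

&lt;div class="MsoBodyText"&gt;
For these six records the counts are Brown 3, Green 2 and Blue 1.&amp;nbsp; The same
counting applies unchanged to a database of many more records, which is where a
histogram becomes genuinely useful.&lt;/div&gt;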
&lt;div align="center"&gt;

  &lt;center&gt;
  &lt;table border="1" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="border-collapse: collapse; border: none; mso-border-alt: solid gray .75pt; mso-border-insideh: .75pt solid gray; mso-border-insidev: .75pt solid gray; mso-padding-alt: 0in 5.4pt 0in 5.4pt; mso-yfti-tbllook: 55;"&gt;
    &lt;tbody&gt;
&lt;tr style="mso-yfti-irow: 0;"&gt;
      &lt;td style="background: #D9D9D9; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-pattern: gray-15 auto; mso-shading: windowtext; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
ID&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-pattern: gray-15 auto; mso-shading: windowtext; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Name&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
Prediction&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
Age&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
Balance&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Income&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Eyes&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Gender&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 1;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
1&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Amy&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
No&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
62&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$0&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Medium&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Brown&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
F&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 2;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
2&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Al&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
No&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
53&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$1,800&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Medium&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Green&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
M&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 3;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
3&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Betty&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
No&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
47&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$16,543&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
High&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Brown&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
F&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 4;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
4&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Bob&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
Yes&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
32&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$45&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Medium&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Green&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
M&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 5;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
5&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Carla&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
Yes&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
21&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$2,300&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
High&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Blue&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
F&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 6;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
6&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Carl&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
No&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
27&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$5,400&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
High&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Brown&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
M&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 7;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
7&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Donna&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
Yes&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
50&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$165&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Low&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Blue&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
F&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 8;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
8&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Don&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
Yes&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
46&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$0&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
High&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Blue&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
M&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 9;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
9&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Edna&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
Yes&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
27&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$500&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Low&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Blue&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
F&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 10; mso-yfti-lastrow: yes;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
10&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Ed&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
No&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
68&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$1,200&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Low&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Blue&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
M&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/center&gt;
&lt;/div&gt;
&lt;div align="center" class="MsoCaption" style="text-align: center;"&gt;
Table 1.1 &lt;i&gt; An
Example Database of Customers with Different Predictor Types&lt;/i&gt;&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
The histogram shown in Figure 1.1 depicts a simple
predictor (eye color) that will have only a few different values whether
there are 100 customer records in the database or 100 million.&amp;nbsp; There are,
however, other predictors that have many more distinct values and can create a
much more complex histogram.&amp;nbsp; Consider, for instance, the histogram of ages
of the customers in the population.&amp;nbsp; In this case the histogram can be more
complex but can also be enlightening.&amp;nbsp; Suppose you found that the
histogram of your customer data looked as it does in Figure 1.2.&lt;/div&gt;
&lt;div align="center" class="MsoBodyText"&gt;
&lt;img border="0" height="285" src="http://www.thearling.com/text/dmtechniques/dmtech1.gif" width="435" /&gt;&lt;/div&gt;
&lt;div align="center" class="MsoCaption" style="text-align: center;"&gt;
Figure 1.1 &lt;i&gt; This
histogram shows the number of customers with various eye colors.&amp;nbsp; This
summary can quickly show important information about the database such as that
blue eyes are the most frequent.&lt;/i&gt;&lt;/div&gt;
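&lt;div class="MsoBodyText"&gt;
As a minimal sketch of what a histogram is computing, the snippet below counts how often each value occurs. It uses only the eye colors listed in rows 6 through 10 of Table 1.1; reading that column as eye color is our assumption, and the variable names are illustrative.&lt;/div&gt;
&lt;pre&gt;
```python
from collections import Counter

# Eye colors as they appear in rows 6-10 of Table 1.1
# (treating that column as eye color is an assumption).
eye_colors = ["Brown", "Blue", "Blue", "Blue", "Blue"]

# A histogram is simply a count of records per distinct value.
histogram = Counter(eye_colors)

# Print a text version of the histogram, most frequent value first.
for color, count in histogram.most_common():
    print(f"{color:6} {'#' * count} ({count})")
```
&lt;/pre&gt;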
&lt;div align="center" class="MsoBodyText" style="text-align: center;"&gt;

&lt;img border="0" height="246" src="http://www.thearling.com/text/dmtechniques/dmtech2.gif" width="529" /&gt;
&lt;/div&gt;
&lt;div align="center" class="MsoCaption" style="text-align: center;"&gt;
Figure 1.2&amp;nbsp; &lt;i&gt;This histogram shows the number of customers of different ages and quickly tells
the viewer that the majority of customers are over the age of 50.&lt;/i&gt;&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
By looking at this second histogram the viewer is in many
ways looking at all of the data in the database for a particular predictor or
data column.&amp;nbsp; The histogram also makes it possible to build an
intuition about other important factors, such as the average age of the
population and the maximum and minimum age.&amp;nbsp;
These values are called summary statistics.&amp;nbsp; Some of the most frequently
used summary statistics include:&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;
    &lt;div class="MsoListBullet" style="mso-list: l11 level1 lfo15;"&gt;
Max - the maximum
    value for a given predictor.&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
    &lt;div class="MsoListBullet" style="mso-list: l11 level1 lfo15;"&gt;
Min - the minimum
    value for a given predictor.&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
    &lt;div class="MsoListBullet" style="mso-list: l11 level1 lfo15;"&gt;
Mean - the
    average value for a given predictor.&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
    &lt;div class="MsoListBullet" style="mso-list: l11 level1 lfo15;"&gt;
Median - the
    value for a given predictor that divides the database as nearly as possible
    into two databases of equal numbers of records.&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
    &lt;div class="MsoListBullet" style="mso-list: l11 level1 lfo15;"&gt;
Mode - the most
    common value for the predictor.&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
    &lt;div class="MsoListBullet" style="mso-list: l11 level1 lfo15;"&gt;
Variance - the
    measure of how spread out the values are from the average value.&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
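&lt;div class="MsoBodyText"&gt;
The statistics listed above can be sketched with Python's standard statistics module. The five ages are the values shown in rows 6 through 10 of Table 1.1 (we are assuming that column holds age); the dictionary name is illustrative.&lt;/div&gt;
&lt;pre&gt;
```python
import statistics

# Ages from rows 6-10 of Table 1.1 (assumed to be the age column).
ages = [27, 50, 46, 27, 68]

summary = {
    "max": max(ages),
    "min": min(ages),
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "mode": statistics.mode(ages),
    # Sample variance: squared distances from the mean, divided by n - 1.
    "variance": statistics.variance(ages),
}

for name, value in summary.items():
    print(f"{name:8} {value}")
```
&lt;/pre&gt;
&lt;div class="MsoBodyText"&gt;
For these five values the mean is 43.6, the median is 46, the mode is 27 and the sample variance is about 298.3.&lt;/div&gt;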
&lt;div class="MsoBodyText"&gt;
When there are many values for a given predictor the
histogram begins to look smoother and smoother (compare the difference between
the two histograms above).&amp;nbsp; Sometimes the shape of the distribution of data
can be calculated by an equation rather than just represented by the histogram.&amp;nbsp;
This is what is called a data distribution.&amp;nbsp; Like a histogram, a data
distribution can be described by a variety of statistics.&amp;nbsp; In classical
statistics the belief is that there is some “true” underlying shape to the
data distribution that would be formed if all possible data were collected.&amp;nbsp;
The shape of the data distribution can be calculated for some simple examples.
The statistician’s job then is to take the limited data that has been
collected and from that make a best guess at what the “true” or at least
most likely underlying data distribution might be.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Many data distributions are well described by just two
numbers, the mean and the variance.&amp;nbsp; The mean is something most people are
familiar with; the variance, however, can be harder to grasp.&amp;nbsp; The easiest way
to think about it is that it measures the average distance of each predictor
value from the mean value over all the records in the database.&amp;nbsp; If the
variance is high it implies that the values are all over the place and very
different.&amp;nbsp; If the variance is low most of the data values are fairly close
to the mean.&amp;nbsp; To be precise, the actual definition of the variance uses the
square of the distance rather than the distance itself, and the
average is taken by dividing the sum of the squared distances by one less than
the total number of records.&amp;nbsp; In terms of prediction, a user could guess the
value of a predictor without knowing anything else just by knowing the mean, and
could also gain some basic sense of how variable the guess might be based on the
variance.&lt;/div&gt;
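&lt;div class="MsoBodyText"&gt;
To make the difference between the intuitive "average distance" and the precise definition concrete, here is a small sketch that computes both by hand. It reuses the five ages from rows 6 through 10 of Table 1.1 (assumed to be the age column); the variable names are our own.&lt;/div&gt;
&lt;pre&gt;
```python
# Ages from rows 6-10 of Table 1.1 (assumed to be the age column).
ages = [27, 50, 46, 27, 68]

n = len(ages)
mean = sum(ages) / n

# Intuitive version: the average plain distance from the mean.
mean_abs_distance = sum(abs(age - mean) for age in ages) / n

# Precise version: square each distance, then divide the sum
# by one less than the number of records.
squared_distances = [(age - mean) ** 2 for age in ages]
variance = sum(squared_distances) / (n - 1)

print(f"mean={mean:.1f}")
print(f"average distance={mean_abs_distance:.2f}")
print(f"variance={variance:.1f}")
```
&lt;/pre&gt;
&lt;div class="MsoBodyText"&gt;
Because the distances are squared, the variance is expressed in squared units (here, years squared), which is one reason it feels less intuitive than the mean.&lt;/div&gt;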
&lt;h3 style="mso-list: l10 level3 lfo12;"&gt;
Statistics for Prediction

&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
In this book the term “prediction” is used for a
variety of types of analysis that may elsewhere be more precisely called
regression.&amp;nbsp; We have done so in order to simplify some of the concepts and
to emphasize the common and most important aspects of predictive modeling.&amp;nbsp;
Nonetheless regression is a powerful and commonly used tool in statistics and it
will be discussed here.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo12;"&gt;
Linear regression 

&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
In statistics prediction is usually synonymous with
regression of some form.&amp;nbsp; There are a variety of different types of
regression in statistics but the basic idea is that a model is created that maps
values from predictors in such a way that the lowest error occurs in making a
prediction.&amp;nbsp; The simplest form of regression is simple linear regression,
which contains just one predictor and a prediction.&amp;nbsp; The relationship
between the two can be mapped on a two dimensional space and the records plotted
with the prediction values along the Y axis and the predictor values along the X
axis.&amp;nbsp; The simple linear regression model can then be viewed as the line
that minimizes the error between the actual prediction value and the point
on the line (the prediction from the model).&amp;nbsp; Graphically this would look
as it does in Figure 1.3.&amp;nbsp; The model is a line that maps each predictor value to a
prediction value.&amp;nbsp; Of the many possible lines that could be drawn through
the data the one that minimizes the distance between the line and the data
points is the one that is chosen for the predictive model.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
On average if you guess the value on the line it should
represent an acceptable compromise amongst all the data at that point giving
conflicting answers.&amp;nbsp; Likewise if there is no data available for a
particular input value the line will provide the best guess at a reasonable
answer based on similar data.&lt;/div&gt;
&lt;div align="center" class="MsoBodyText" style="text-align: center;"&gt;

&lt;img border="0" height="291" src="http://www.thearling.com/text/dmtechniques/dmtech3.gif" width="440" /&gt;
&lt;/div&gt;
&lt;div align="center" class="MsoCaption" style="text-align: center;"&gt;
Figure 1.3 &lt;i&gt; Linear
regression is similar to the task of finding the line that minimizes the total
distance to a set of data.&lt;/i&gt;&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
The predictive model is the line shown in Figure 1.3.&amp;nbsp;
The line will take a given value for a predictor and map it into a given value
for a prediction.&amp;nbsp; The actual equation would look something like
Prediction = a + b * Predictor, which is just the equation for a line, Y =
a + bX.&amp;nbsp; As an example, for a bank the predicted average consumer bank
balance might equal $1,000 + 0.01 * customer’s annual income.&amp;nbsp; The trick,
as always with predictive modeling, is to find the model that best minimizes the
error.&amp;nbsp; The most common way to calculate the error is the square of the
difference between the predicted value and the actual value.&amp;nbsp; Calculated
this way, points that are very far from the line will have a great effect on
moving the choice of line towards themselves in order to reduce the error.&amp;nbsp;
The values of a and b in the regression equation that minimize this error can be
calculated directly from the data relatively quickly.&lt;/div&gt;
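&lt;div class="MsoBodyText"&gt;
The direct calculation of a and b can be sketched with the standard least squares formulas: the slope is the covariance of X and Y divided by the variance of X, and the intercept makes the line pass through the two means. The income and balance figures below are hypothetical values invented for illustration, not data from the text.&lt;/div&gt;
&lt;pre&gt;
```python
# Hypothetical (annual income, bank balance) pairs, invented for illustration.
incomes = [20000, 35000, 50000, 65000, 80000]
balances = [1150, 1400, 1450, 1700, 1800]

n = len(incomes)
mean_x = sum(incomes) / n
mean_y = sum(balances) / n

# Least squares slope: co-variation of X and Y divided by variation of X.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(incomes, balances))
sxx = sum((x - mean_x) ** 2 for x in incomes)
b = sxy / sxx

# The fitted line always passes through the point (mean_x, mean_y).
a = mean_y - b * mean_x


def predict(income):
    """Prediction = a + b * Predictor."""
    return a + b * income


# Total squared error of the fitted line on the data.
squared_error = sum((predict(x) - y) ** 2 for x, y in zip(incomes, balances))

print(f"a={a:.2f} b={b:.5f} squared_error={squared_error:.1f}")
```
&lt;/pre&gt;
&lt;div class="MsoBodyText"&gt;
With these made-up numbers the fitted line comes out close to the bank example in the text: an intercept a little under $1,000 and a slope of roughly 0.01.&lt;/div&gt;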
&lt;h3 style="mso-list: l10 level3 lfo12;"&gt;
What if the pattern in my data doesn't
look like a straight line?

&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
Regression can become more complicated than the simple
linear regression we’ve introduced so far.&amp;nbsp; It can get more complicated
in a variety of different ways in order to better model particular database
problems.&amp;nbsp; There are, however, four main modifications that can be
made:&lt;/div&gt;
&lt;div class="MsoNormal" style="margin-left: 57.0pt; mso-list: l12 level1 lfo17; text-indent: -.25in;"&gt;
1.&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;
More predictors than just one can be used.

&lt;/div&gt;
&lt;div class="MsoNormal" style="margin-left: 57.0pt; mso-list: l12 level1 lfo18; text-indent: -.25in;"&gt;
2.&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;
Transformations can be applied to the predictors.

&lt;/div&gt;
&lt;div class="MsoNormal" style="margin-left: 57.0pt; mso-list: l12 level1 lfo19; text-indent: -.25in;"&gt;
3.&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;
Predictors can be multiplied together and used as terms in the equation.

&lt;/div&gt;
&lt;div class="MsoNormal" style="margin-left: 57.0pt; mso-list: l12 level1 lfo20; text-indent: -.25in;"&gt;
4.&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;
Modifications can be made to accommodate response predictions that just have
yes/no or 0/1 values.

&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Adding more predictors to the linear equation can produce
more complicated lines that take more information into account and hence make a
better prediction.&amp;nbsp; This is called multiple linear regression and might
have an equation like the following if 5 predictors were used (X1, X2, X3, X4,
X5):&lt;/div&gt;
&lt;div align="center" class="MsoNormal"&gt;
Y = a + b1(X1) + b2(X2) + b3(X3) + b4(X4) +
b5(X5)

&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
This equation still describes a linear model, but it is now a flat surface
(a hyperplane) in a 6 dimensional space rather than a line in a two dimensional space.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
By transforming the predictors by squaring, cubing or
taking their square root it is possible to use the same general regression
methodology and now create much more complex models that are no longer simply
shaped like lines.&amp;nbsp; This is called non-linear regression.&amp;nbsp; A model of
just one predictor might look like this: Y = a + b1(X1) + b2(X1&lt;sup&gt;2&lt;/sup&gt;).&amp;nbsp;
In many real world cases analysts will perform a wide variety of transformations
on their data just to try them out.&amp;nbsp; If they do not contribute to a useful
model their coefficients in the equation will tend toward zero and then they can
be removed.&amp;nbsp; The other transformation of predictor values that is often
performed is combining them by multiplying or dividing one by another.&amp;nbsp;
For instance a new predictor
created by dividing hourly wage by the minimum wage might be a much more
effective predictor than hourly wage by itself.&lt;/div&gt;
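&lt;div class="MsoBodyText"&gt;
The following sketch illustrates how a transformed predictor fits into the same regression machinery: the single predictor X is expanded into the terms 1, X and X squared, and the coefficients are found by solving the least squares normal equations. The data is hypothetical (generated from Y = 2 + 3X + 0.5X squared), and the helper names solve and fit are our own; this is one straightforward way to do the calculation, not necessarily how a statistics package does it.&lt;/div&gt;
&lt;pre&gt;
```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    rows = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Clear this column in every other row.
        for r in range(n):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [rows[i][n] / rows[i][i] for i in range(n)]


def fit(features, targets):
    """Least squares coefficients, one per feature column."""
    k = len(features[0])
    # Normal equations: (F^T F) coefficients = F^T y
    ftf = [[sum(f[i] * f[j] for f in features) for j in range(k)] for i in range(k)]
    fty = [sum(f[i] * y for f, y in zip(features, targets)) for i in range(k)]
    return solve(ftf, fty)


# Hypothetical data generated from Y = 2 + 3X + 0.5X^2.
xs = [0, 1, 2, 3, 4, 5]
ys = [2 + 3 * x + 0.5 * x * x for x in xs]

# Transform the single predictor into three terms: 1, X and X squared.
features = [[1.0, x, x * x] for x in xs]
a, b1, b2 = fit(features, ys)

print(f"a={a:.3f} b1={b1:.3f} b2={b2:.3f}")
```
&lt;/pre&gt;
&lt;div class="MsoBodyText"&gt;
The same fit function handles multiple linear regression: each additional predictor, product of predictors or ratio of predictors is just one more column in the feature list.&lt;/div&gt;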
&lt;div class="MsoBodyText"&gt;
When trying to predict a customer response that is just
yes or no (e.g. they bought the product or they didn’t or they defaulted or
they didn’t) the standard form of a line doesn’t work.&amp;nbsp; Since there are
only two possible values to be predicted it is relatively easy to fit a line
through them.&amp;nbsp; However, that model would be the same no matter what
predictors were being used or what particular data was being used.&amp;nbsp;
Typically in these situations a transformation of the prediction values is made
in order to provide a better predictive model.&amp;nbsp; This type of regression is
called logistic regression and because so many business problems are response
problems, logistic regression is one of the most widely used statistical
techniques for creating predictive models.&lt;/div&gt;
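&lt;div class="MsoBodyText"&gt;
As a rough sketch of the idea, logistic regression passes the linear equation a + bX through an S-shaped function so that the output can be read as the probability of a "yes". The example below fits a and b by simple gradient descent, which is one of several possible fitting methods and is our choice here. The records, learning rate and iteration count are all hypothetical.&lt;/div&gt;
&lt;pre&gt;
```python
import math


def sigmoid(z):
    # Squashes any number into the range 0 to 1 so it reads as a probability.
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical records: one predictor and a yes/no (1/0) response.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

a, b = 0.0, 0.0        # intercept and slope, starting from zero
learning_rate = 0.1    # assumed step size

for _ in range(5000):
    # Difference between predicted probability and actual response.
    errors = [sigmoid(a + b * x) - y for x, y in zip(xs, ys)]
    # Step both coefficients against the gradient of the average log loss.
    a -= learning_rate * sum(errors) / len(xs)
    b -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / len(xs)


def predict_probability(x):
    """Estimated probability that the response is yes for predictor value x."""
    return sigmoid(a + b * x)


print(f"a={a:.3f} b={b:.3f}")
print(f"P(yes | x=1)={predict_probability(1):.2f}")
print(f"P(yes | x=8)={predict_probability(8):.2f}")
```
&lt;/pre&gt;
&lt;div class="MsoBodyText"&gt;
Unlike a straight line, the fitted curve never predicts a value below 0 or above 1, which is what makes it suitable for response problems.&lt;/div&gt;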
&lt;h3 style="mso-list: l10 level2 lfo12;"&gt;
1.3. Nearest Neighbor&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
Clustering and the Nearest Neighbor prediction technique
are among the oldest techniques used in data mining.&amp;nbsp; Most people have an
intuition that they understand what clustering is - namely that like records are
grouped or clustered together.&amp;nbsp; Nearest neighbor is a prediction technique
that is quite similar to clustering - its essence is that in order to predict
the prediction value for one record, you look for records with similar predictor
values in the historical database and use the prediction value from the record
that is “nearest” to the unclassified record.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo12;"&gt;
A simple example of clustering&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
A simple example of clustering would be the clustering
that most people perform when they do the laundry - grouping the permanent
press, dry cleaning, whites and brightly colored clothes.&amp;nbsp; The grouping is
important because the items in each group have important
attributes in common about the way they behave (and can be ruined) in the wash.&amp;nbsp;
To “cluster” your laundry most of your decisions are relatively
straightforward.&amp;nbsp; There are of course difficult decisions to be made about
which cluster your white shirt with red stripes goes into (since it is mostly
white but has some color and is permanent press).&amp;nbsp; When clustering is used
in business the clusters are often much more dynamic - even changing weekly or
monthly - and many more of the decisions concerning which cluster a record falls
into can be difficult.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo12;"&gt;
A simple example of nearest neighbor&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
A simple example of the nearest neighbor prediction
algorithm is to look at the people in your neighborhood (in this case
those people that are in fact geographically near to you).&amp;nbsp; You may notice
that, in general, you all have somewhat similar incomes.&amp;nbsp; Thus if your
neighbor has an income greater than $100,000 chances are good that you too have
a high income.&amp;nbsp; Certainly the chances that you have a high income are
greater when all of your neighbors have incomes over $100,000 than if all of
your neighbors have incomes of $20,000.&amp;nbsp; Within your neighborhood there may
still be a wide variety of incomes possible among even your “closest”
neighbors but if you had to predict someone’s income based on only knowing
their neighbors your best chance of being right would be to predict the
incomes of the neighbors who live closest to the unknown person.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
The nearest neighbor prediction algorithm works in very
much the same way except that “nearness” in a database may consist of a
variety of factors not just where the person lives.&amp;nbsp;&amp;nbsp; It may, for
instance, be far more important to know which school someone attended and what
degree they attained when predicting income.&amp;nbsp; The better definition of
“near” might in fact be other people that you graduated from college with
rather than the people that you live next to.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Nearest Neighbor techniques are among the easiest to use
and understand because they work in a way similar to the way that people think -
by detecting closely matching examples.&amp;nbsp; They also perform quite well in
terms of automation, as many of the algorithms are robust with respect to dirty
data and missing data.&amp;nbsp; Lastly they are particularly adept at performing
complex ROI calculations because the predictions are made at a local level where
business simulations could be performed in order to optimize ROI.&amp;nbsp; Their
accuracy is similar to that of other techniques, so measures
of accuracy such as lift are as good as from any other.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo12;"&gt;
How to use Nearest Neighbor for Prediction&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
One of the essential elements underlying the concept of
clustering is that one particular object (whether it is a car, a food or a
customer) can be closer to another object than can some third object.&amp;nbsp; It
is interesting that most people have an innate sense of ordering placed on a
variety of different objects.&amp;nbsp; Most people would agree that an apple is
closer to an orange than it is to a tomato and that a Toyota Corolla is closer
to a Honda Civic than to a Porsche.&amp;nbsp; This sense of ordering on many
different objects helps us place them in time and space and to make sense
of the world.&amp;nbsp; It is what allows us to build clusters - both in databases
on computers and in our daily lives.&amp;nbsp; This definition of nearness
that seems to be ubiquitous also allows us to make predictions.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
The nearest neighbor prediction algorithm simply stated
is:&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Objects that are “near” to each other will have
similar prediction values as well.&amp;nbsp; Thus if you know the prediction value
of one of the objects you can predict it for its nearest neighbors.&lt;/div&gt;
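&lt;div class="MsoBodyText"&gt;
The rule above can be sketched in a few lines of Python.&amp;nbsp; This is only a minimal illustration: the function name, the use of straight-line (Euclidean) distance and the sample customer values are all assumptions made for the example, not part of any particular product.&lt;/div&gt;

```python
import math

def nearest_neighbor_predict(query, records):
    """Return the prediction value of the record closest to `query`.

    `records` is a list of (features, prediction) pairs whose features are
    equal-length tuples of numbers. Euclidean distance is an assumption.
    """
    nearest_features, nearest_prediction = min(
        records, key=lambda rec: math.dist(query, rec[0])
    )
    return nearest_prediction

# Hypothetical customers: (age, balance in thousands of dollars) and a label.
known = [((62, 0.0), "No"), ((53, 1.8), "No"), ((27, 5.5), "Yes")]

# The unclassified record borrows the label of its closest known record.
print(nearest_neighbor_predict((30, 5.0), known))
```

&lt;div class="MsoBodyText"&gt;
In practice the predictors would usually be rescaled first so that a column with large values, such as balance in dollars, does not dominate the distance.&lt;/div&gt;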
&lt;h3 style="mso-list: l10 level3 lfo12;"&gt;
Where has the nearest neighbor technique
been used in business?&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
One of the classical places that nearest neighbor has
been used for prediction has been in text retrieval.&amp;nbsp; The problem to be
solved in text retrieval is one where the end user defines a document (e.g. a Wall
Street Journal article or a technical conference paper) that is interesting to
them and asks the system to “find more documents like this one”,
effectively defining a target of “this is the interesting document” or
“this is not interesting”.&amp;nbsp; The prediction problem is that only a very
few of the documents in the database actually have values for this prediction
field (namely only the documents that the reader has had a chance to look at so
far).&amp;nbsp; The nearest neighbor technique is used to find other documents that
share important characteristics with those documents that have been marked as
interesting.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo12;"&gt;
Using nearest neighbor for stock market
data&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
As with almost all prediction algorithms, nearest
neighbor can be used in a variety of places.&amp;nbsp; Its successful use
mostly depends on pre-formatting the data so that nearness can be
calculated and individual records can be defined.&amp;nbsp; In the text
retrieval example this was not too difficult - the objects were documents.
Defining records is not always that easy. Consider what it might be
like in a time series problem - say for predicting the stock market.&amp;nbsp; In
this case the input data is just a long series of stock prices over time without
any particular record that could be considered to be an object.&amp;nbsp; The
value to be predicted is just the next value of the stock price.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
The way that this problem is solved for both nearest
neighbor techniques and for some other types of prediction algorithms is to
create training records by taking, for instance, 10 consecutive stock prices and
using the first 9 as predictor values and the 10th as the prediction value.&amp;nbsp;
Doing things this way, if you had 100 data points in your time series you could
create 10 different training records.&amp;nbsp;&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
You could create even more than 10 training records by
creating a new record starting at every data point.&amp;nbsp; For instance,
you could take the first 10 data points and create a record.&amp;nbsp; Then you
could take the 10 consecutive data points starting at the second data point,
then the 10 consecutive data points starting at the third data point, and so on.&amp;nbsp; Even
though some of the data points would overlap from one record to the next,
the prediction value would always come from a different data point.&amp;nbsp; In our example of 100
initial data points, 91 different training records (one for each starting point
from the 1st through the 91st) could be created this way, as
opposed to the 10 training records created via the other method.&lt;/div&gt;
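&lt;div class="MsoBodyText"&gt;
Both ways of cutting a time series into training records can be sketched with one small Python function.&amp;nbsp; The function name and the placeholder price list are assumptions made for illustration.&amp;nbsp; With a window of 10 over 100 points, stepping by 10 gives 10 records, while stepping by 1 gives 100 - 10 + 1 = 91 records, one for every starting point that still has a full window.&lt;/div&gt;

```python
def make_training_records(series, window=10, step=1):
    """Split a series into (predictors, target) records.

    Each record covers `window` consecutive points: the first window - 1
    values are the predictors and the last value is the one to predict.
    """
    records = []
    for start in range(0, len(series) - window + 1, step):
        chunk = series[start:start + window]
        records.append((chunk[:-1], chunk[-1]))
    return records

# Placeholder values standing in for 100 consecutive stock prices.
prices = list(range(100))

print(len(make_training_records(prices, step=10)))  # 10 non-overlapping records
print(len(make_training_records(prices, step=1)))   # 91 overlapping records
```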
&lt;h3 style="mso-list: l10 level3 lfo12;"&gt;
Why voting is better - K Nearest Neighbors&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
One of the improvements that is usually made to the basic
nearest neighbor algorithm is to take a vote from the “K” nearest neighbors
rather than just relying on the sole nearest neighbor to the unclassified
record.&amp;nbsp; In Figure 1.4 we can see that unclassified example C has a nearest
neighbor that is a defaulter and yet is surrounded almost exclusively by records
that are good credit risks.&amp;nbsp; In this case the nearest neighbor to record C
is probably an outlier - which may be incorrect data or some non-repeatable
idiosyncrasy.&amp;nbsp; In either case it is more than likely that C is a
non-defaulter yet would be predicted to be a defaulter if the sole nearest
neighbor were used for the prediction.&lt;/div&gt;
&lt;div align="center" class="MsoNormal" style="page-break-after: avoid; text-align: center;"&gt;

&lt;img border="0" height="327" src="http://www.thearling.com/text/dmtechniques/dmtech4.gif" width="423" /&gt;
&lt;/div&gt;
&lt;div align="center" class="MsoCaption" style="text-align: center;"&gt;
Figure 1.4&amp;nbsp; &lt;i&gt;The nearest neighbors are shown graphically for three unclassified records: A,
B, and C.&lt;/i&gt;&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
In cases like these a vote of the 9 or 15 nearest
neighbors would provide better prediction accuracy for the system than would
just the single nearest neighbor.&amp;nbsp; Usually this is accomplished by simply
taking the majority or plurality of predictions from the K nearest neighbors if
the prediction column is binary or categorical, or by taking the average value of
the prediction column from the K nearest neighbors if it is numeric.&lt;/div&gt;
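&lt;div class="MsoBodyText"&gt;
A minimal Python sketch of K nearest neighbor voting follows.&amp;nbsp; The function names, the Euclidean distance and the sample points are assumptions chosen for illustration; the points are laid out so that the single nearest neighbor is a lone defaulter surrounded by good credit risks, much like record C in the figure.&lt;/div&gt;

```python
import math
from collections import Counter

def k_nearest(query, records, k):
    """Return the k (features, value) records closest to `query`."""
    return sorted(records, key=lambda rec: math.dist(query, rec[0]))[:k]

def knn_classify(query, records, k):
    """Plurality vote among the k nearest neighbors (categorical target)."""
    votes = Counter(value for _, value in k_nearest(query, records, k))
    return votes.most_common(1)[0][0]

def knn_regress(query, records, k):
    """Average of the k nearest neighbors' values (numeric target)."""
    values = [value for _, value in k_nearest(query, records, k)]
    return sum(values) / len(values)

# Hypothetical records: one outlying defaulter among good credit risks.
points = [
    ((1.0, 1.0), "default"),
    ((1.5, 1.2), "good"),
    ((0.4, 1.3), "good"),
    ((1.2, 0.3), "good"),
    ((0.6, 0.5), "good"),
]
record_c = (1.0, 1.05)

print(knn_classify(record_c, points, k=1))  # relies on the sole nearest neighbor
print(knn_classify(record_c, points, k=5))  # lets the neighborhood vote
```

&lt;div class="MsoBodyText"&gt;
An odd value of K is a common choice for a two-class problem because it avoids tied votes.&lt;/div&gt;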
&lt;h3 style="mso-list: l10 level3 lfo12;"&gt;
How can the nearest neighbor tell you how
confident it is in the prediction?&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
Another important aspect of any system that is used to
make predictions is that the user be provided not only with the prediction but
also with some sense of the confidence in that prediction (e.g. the prediction is
defaulter, with a 60% chance of being correct).&amp;nbsp; The nearest
neighbor algorithm provides this confidence information in a number of ways:&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
The distance to the nearest neighbor provides a level of
confidence.&amp;nbsp; If the neighbor is very close or an exact match then there is
much higher confidence in the prediction than if the nearest record is a great
distance from the unclassified record.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
The degree of homogeneity amongst the predictions within
the K nearest neighbors can also be used.&amp;nbsp; If all the nearest neighbors
make the same prediction then there is much higher confidence in the prediction
than if half the records made one prediction and the other half made another
prediction.&lt;/div&gt;
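&lt;div class="MsoBodyText"&gt;
Both confidence signals can be reported alongside the prediction.&amp;nbsp; The sketch below is one possible way to do so in Python, assuming Euclidean distance and a simple vote share as the measure of homogeneity; the function name and return format are hypothetical.&lt;/div&gt;

```python
import math
from collections import Counter

def knn_with_confidence(query, records, k):
    """Return (prediction, agreement, nearest_distance) for `query`.

    agreement is the share of the k neighbors that voted for the winning
    prediction; nearest_distance is the distance to the closest neighbor.
    """
    neighbors = sorted(records, key=lambda rec: math.dist(query, rec[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    prediction, count = votes.most_common(1)[0]
    agreement = count / len(neighbors)
    nearest_distance = math.dist(query, neighbors[0][0])
    return prediction, agreement, nearest_distance

# Hypothetical labeled records in a two-predictor feature space.
labeled = [
    ((0, 0), "good"),
    ((0, 1), "good"),
    ((1, 0), "default"),
    ((5, 5), "default"),
]

print(knn_with_confidence((0, 0), labeled, k=3))
```

&lt;div class="MsoBodyText"&gt;
A small nearest distance together with a high agreement suggests a trustworthy prediction; a large distance or a split vote suggests caution.&lt;/div&gt;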
&lt;h3 style="mso-list: l10 level2 lfo12;"&gt;
1.4. Clustering&lt;/h3&gt;
&lt;h3 style="mso-list: l10 level3 lfo12;"&gt;
Clustering for Clarity&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
Clustering is the method by which like records are
grouped together.&amp;nbsp; Usually this is done to give the end user a high-level
view of what is going on in the database.&amp;nbsp; Clustering is sometimes used to
mean segmentation - which most marketing people will tell you is useful for
coming up with a bird’s-eye view of the business.&amp;nbsp; Two of these clustering
systems are the PRIZM™ system from Claritas Corporation and MicroVision™
from Equifax Corporation.&amp;nbsp; These companies have grouped the population by
demographic information into segments that they believe are useful for direct
marketing and sales.&amp;nbsp; To build these groupings they use information such as
income, age, occupation, housing and race collected in the US Census.&amp;nbsp; Then
they assign memorable “nicknames” to the clusters.&amp;nbsp; Some examples are
shown in Table 1.2.
&lt;/div&gt;
&lt;div align="center"&gt;

  &lt;center&gt;
  &lt;table border="1" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="border-collapse: collapse; border: none; mso-border-alt: solid gray .75pt; mso-border-insideh: .75pt solid gray; mso-border-insidev: .75pt solid gray; mso-padding-alt: 0in 5.4pt 0in 5.4pt; mso-yfti-tbllook: 183;"&gt;
    &lt;tbody&gt;
&lt;tr style="mso-yfti-irow: 0;"&gt;
      &lt;td style="background: #D9D9D9; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 127.8pt;" valign="top" width="170"&gt;
        &lt;div class="TableCell"&gt;
Name&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 75.8pt;" valign="top" width="101"&gt;
        &lt;div class="TableCell"&gt;
Income&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 71.8pt;" valign="top" width="96"&gt;
        &lt;div class="TableCell"&gt;
Age&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 88.85pt;" valign="top" width="118"&gt;
        &lt;div class="TableCell"&gt;
Education&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 78.55pt;" valign="top" width="105"&gt;
        &lt;div class="TableCell"&gt;
Vendor&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 1;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 127.8pt;" valign="top" width="170"&gt;
        &lt;div class="TableCell"&gt;
Blue Blood Estates&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 75.8pt;" valign="top" width="101"&gt;
        &lt;div class="TableCell"&gt;
Wealthy&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.8pt;" valign="top" width="96"&gt;
        &lt;div class="TableCell"&gt;
35-54&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 88.85pt;" valign="top" width="118"&gt;
        &lt;div class="TableCell"&gt;
College&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 78.55pt;" valign="top" width="105"&gt;
        &lt;div class="TableCell"&gt;
Claritas Prizm™&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 2;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 127.8pt;" valign="top" width="170"&gt;
        &lt;div class="TableCell"&gt;
Shotguns and Pickups&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 75.8pt;" valign="top" width="101"&gt;
        &lt;div class="TableCell"&gt;
Middle&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.8pt;" valign="top" width="96"&gt;
        &lt;div class="TableCell"&gt;
35-64&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 88.85pt;" valign="top" width="118"&gt;
        &lt;div class="TableCell"&gt;
High School&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 78.55pt;" valign="top" width="105"&gt;
        &lt;div class="TableCell"&gt;
Claritas Prizm™&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 3;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 127.8pt;" valign="top" width="170"&gt;
        &lt;div class="TableCell"&gt;
Southside City&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 75.8pt;" valign="top" width="101"&gt;
        &lt;div class="TableCell"&gt;
Poor&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.8pt;" valign="top" width="96"&gt;
        &lt;div class="TableCell"&gt;
Mix&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 88.85pt;" valign="top" width="118"&gt;
        &lt;div class="TableCell"&gt;
Grade School&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 78.55pt;" valign="top" width="105"&gt;
        &lt;div class="TableCell"&gt;
Claritas Prizm™&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 4;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 127.8pt;" valign="top" width="170"&gt;
        &lt;div class="TableCell"&gt;
Living Off the Land&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 75.8pt;" valign="top" width="101"&gt;
        &lt;div class="TableCell"&gt;
Middle-Poor&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.8pt;" valign="top" width="96"&gt;
        &lt;div class="TableCell"&gt;
School Age Families&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 88.85pt;" valign="top" width="118"&gt;
        &lt;div class="TableCell"&gt;
Low&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 78.55pt;" valign="top" width="105"&gt;
        &lt;div class="TableCell"&gt;
Equifax MicroVision™&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 5;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 127.8pt;" valign="top" width="170"&gt;
        &lt;div class="TableCell"&gt;
University USA&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 75.8pt;" valign="top" width="101"&gt;
        &lt;div class="TableCell"&gt;
Very low&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.8pt;" valign="top" width="96"&gt;
        &lt;div class="TableCell"&gt;
Young - Mix&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 88.85pt;" valign="top" width="118"&gt;
        &lt;div class="TableCell"&gt;
Medium to High&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 78.55pt;" valign="top" width="105"&gt;
        &lt;div class="TableCell"&gt;
Equifax MicroVision™&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 6; mso-yfti-lastrow: yes;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 127.8pt;" valign="top" width="170"&gt;
        &lt;div class="TableCell"&gt;
Sunset Years&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 75.8pt;" valign="top" width="101"&gt;
        &lt;div class="TableCell"&gt;
Medium&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.8pt;" valign="top" width="96"&gt;
        &lt;div class="TableCell"&gt;
Seniors&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 88.85pt;" valign="top" width="118"&gt;
        &lt;div class="TableCell"&gt;
Medium&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 78.55pt;" valign="top" width="105"&gt;
        &lt;div class="TableCell"&gt;
Equifax MicroVision™&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/center&gt;
&lt;/div&gt;
&lt;div align="center" class="MsoCaption" style="text-align: center;"&gt;
Table 1.2 &lt;i&gt; Some
Commercially Available Cluster Tags&lt;/i&gt;&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
This clustering information is then used by the end user
to tag the customers in their database.&amp;nbsp; Once this is done the business
user can get a quick high-level view of what is happening within each cluster.
Once the business user has worked with these codes for some time they also begin
to build intuitions about how these different customer clusters will react to
the marketing offers particular to their business.&amp;nbsp; For instance, some
of these clusters may relate to their business and some of them may not.&amp;nbsp;
But given that their competition may well be using these same clusters to
structure their business and marketing offers, it is important to be aware of how
your customer base behaves in regard to these clusters.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo12;"&gt;
Finding the ones that don't fit in -
Clustering for Outliers&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
Sometimes clustering is performed not so much to keep
records together as to make it easier to see when one record sticks out from the
rest.&amp;nbsp; For instance:&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Most wine distributors that sell inexpensive wine in Missouri
and ship a certain volume of product produce a certain level of profit.&amp;nbsp;
There is a cluster of distributors that can be formed with these characteristics.&amp;nbsp;
One distributor stands out, however, as producing significantly lower profit.&amp;nbsp;
On closer examination it turns out that the distributor was delivering product
to but not collecting payment from one of their customers.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
A sale on men’s suits is being held in all branches of
a department store in southern California.&amp;nbsp; All stores with these
characteristics have seen at least a
100% jump in revenue since the start of the sale except one.&amp;nbsp; It turns out
that this store had, unlike the others, advertised via radio rather than
television.&lt;/div&gt;
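&lt;div class="MsoBodyText"&gt;
Once a cluster has been formed, finding the record that sticks out can be as simple as ranking the members by how far they sit from the centre of the cluster.&amp;nbsp; The Python sketch below is one hedged way to do this for a single measure, using the median and the median absolute deviation; the function name, the choice of statistic and the revenue figures are assumptions invented for illustration.&lt;/div&gt;

```python
import statistics

def most_unusual(values_by_name):
    """Return (name, score) for the record farthest from the cluster median.

    The score is the deviation in units of the median absolute deviation.
    Assumes the deviations are not all zero, otherwise the division fails.
    """
    values = list(values_by_name.values())
    center = statistics.median(values)
    spread = statistics.median([abs(v - center) for v in values])
    name = max(values_by_name, key=lambda n: abs(values_by_name[n] - center))
    return name, abs(values_by_name[name] - center) / spread

# Hypothetical revenue jump (percent) for stores in the same cluster.
jumps = {"A": 110, "B": 125, "C": 105, "D": 130, "E": 12}

print(most_unusual(jumps))
```

&lt;div class="MsoBodyText"&gt;
A high score does not explain why the store is different; as in the examples above, it only indicates which record deserves a closer look.&lt;/div&gt;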
&lt;h3 style="mso-list: l10 level3 lfo12;"&gt;
How is clustering like the nearest
neighbor technique?&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
The nearest neighbor algorithm is basically a refinement
of clustering in the sense that they both use distance in some feature space to
create either structure in the data or predictions.&amp;nbsp; The nearest neighbor
algorithm is a refinement since part of the algorithm usually is a way of
automatically determining the weighting of the importance of the predictors and
how the distance will be measured within the feature space.&amp;nbsp; Clustering is
one special case of this where the importance of each predictor is considered to
be equivalent.&lt;/div&gt;
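&lt;div class="MsoBodyText"&gt;
The relationship can be made concrete with a weighted distance function.&amp;nbsp; In the Python sketch below, which assumes a weighted Euclidean distance and uses hypothetical names, leaving the weights out treats every predictor as equally important (the clustering case described above), while supplying weights lets some predictors count for more than others.&lt;/div&gt;

```python
import math

def weighted_distance(a, b, weights=None):
    """Weighted Euclidean distance between two equal-length records.

    With weights=None every predictor gets weight 1.0, so all predictors
    are treated as equally important.
    """
    if weights is None:
        weights = [1.0] * len(a)
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

# Equal weighting of both predictors.
print(weighted_distance((0, 0), (3, 4)))           # 5.0

# Ignore the second predictor entirely by giving it a weight of zero.
print(weighted_distance((0, 0), (3, 4), [1, 0]))   # 3.0
```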
&lt;h3 style="mso-list: l10 level3 lfo12;"&gt;
How to put clustering and nearest neighbor
to work for prediction&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
To see clustering and nearest neighbor prediction in use
let’s go back to our example database and now look at it in two ways.&amp;nbsp;
First let’s try to create our own clusters - which if useful we could use
internally to help to simplify and clarify large quantities of data (and maybe
if we did a very good job sell these new codes to other business users).&amp;nbsp;
Secondly let’s try to create predictions based on the nearest neighbor.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
First take a look at the data.&amp;nbsp; How would you
cluster the data in Table 1.3?&lt;/div&gt;
&lt;div align="center"&gt;

  &lt;center&gt;
  &lt;table border="1" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="border-collapse: collapse; border: none; mso-border-alt: solid gray .75pt; mso-border-insideh: .75pt solid gray; mso-border-insidev: .75pt solid gray; mso-padding-alt: 0in 5.4pt 0in 5.4pt; mso-yfti-tbllook: 55;"&gt;
    &lt;tbody&gt;
&lt;tr style="mso-yfti-irow: 0;"&gt;
      &lt;td style="background: #D9D9D9; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
ID&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Name&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
Prediction&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
Age&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
Balance&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Income&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Eyes&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Gender&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 1;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
1&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Amy&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
No&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
62&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$0&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Medium&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Brown&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
F&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 2;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
2&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Al&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
No&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
53&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$1,800&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Medium&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Green&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
M&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 3;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
3&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Betty&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
No&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
47&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$16,543&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
High&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Brown&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
F&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 4;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
4&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Bob&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
Yes&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
32&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$45&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Medium&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Green&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
M&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 5;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
5&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Carla&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
Yes&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
21&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$2,300&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
High&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Blue&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
F&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 6;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
6&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Carl&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
No&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
27&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$5,400&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
High&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Brown&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
M&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 7;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
7&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Donna&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
Yes&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
50&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$165&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Low&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Blue&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
F&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 8;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
8&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Don&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
Yes&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
46&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$0&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
High&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Blue&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
M&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 9;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
9&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Edna&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
Yes&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
27&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$500&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Low&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Blue&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
F&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 10; mso-yfti-lastrow: yes;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
10&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Ed&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
No&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
68&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$1,200&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Low&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Blue&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
M&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/center&gt;
&lt;/div&gt;
&lt;div align="center" class="MsoCaption" style="text-align: center;"&gt;
Table 1.3 &lt;i&gt;A Simple Example Database&lt;/i&gt;&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
If these were your friends rather than your customers
(hopefully they could be both) and they were single, you might cluster them
based on their compatibility with each other, creating your own mini
dating service.&amp;nbsp; If you were a pragmatic person who believes that marital
happiness depends mostly on financial compatibility, you might cluster your
database as follows, creating the three clusters shown in Table 1.4.&lt;/div&gt;
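&lt;div class="MsoBodyText"&gt;
As a rough illustration, the pragmatic grouping can be sketched in a few lines of Python.&amp;nbsp; The sketch assumes that "financial compatibility" simply means sharing the same Income value, and it uses only records 2 through 10 of Table 1.3.&amp;nbsp; The names customers and cluster_by are hypothetical, chosen for this example.&amp;nbsp; It groups on a single attribute and is not a general clustering algorithm.&lt;/div&gt;

```python
from collections import defaultdict

# Records 2-10 of Table 1.3: (id, name, age, balance, income, eyes, gender).
# Balances are written as plain integers (dollars).
customers = [
    (2, "Al", 53, 1800, "Medium", "Green", "M"),
    (3, "Betty", 47, 16543, "High", "Brown", "F"),
    (4, "Bob", 32, 45, "Medium", "Green", "M"),
    (5, "Carla", 21, 2300, "High", "Blue", "F"),
    (6, "Carl", 27, 5400, "High", "Brown", "M"),
    (7, "Donna", 50, 165, "Low", "Blue", "F"),
    (8, "Don", 46, 0, "High", "Blue", "M"),
    (9, "Edna", 27, 500, "Low", "Blue", "F"),
    (10, "Ed", 68, 1200, "Low", "Blue", "M"),
]


def cluster_by(records, key_index):
    """Group record names by the value found at key_index.

    Assumption: two people belong to the same cluster exactly when
    they share that one attribute value.
    """
    clusters = defaultdict(list)
    for record in records:
        clusters[record[key_index]].append(record[1])
    return dict(clusters)


# Index 4 is the Income column, giving one cluster per income level.
income_clusters = cluster_by(customers, 4)
for level in ("High", "Medium", "Low"):
    print(level, income_clusters[level])
```

&lt;div class="MsoBodyText"&gt;
Passing a different column index, such as 5 for Eyes, would regroup the same records on that attribute instead.&lt;/div&gt;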
&lt;div class="MsoBodyText"&gt;

&amp;nbsp;
&lt;/div&gt;
&lt;div align="center"&gt;

  &lt;center&gt;
  &lt;table border="1" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="background: #D9D9D9; border-collapse: collapse; border: none; mso-border-alt: solid gray .75pt; mso-border-insideh: .75pt solid gray; mso-border-insidev: .75pt solid gray; mso-padding-alt: 0in 5.4pt 0in 5.4pt; mso-pattern: gray-15 black; mso-shading: white; mso-yfti-tbllook: 55;"&gt;
    &lt;tbody&gt;
&lt;tr style="mso-yfti-irow: 0; mso-yfti-lastrow: yes;"&gt;
      &lt;td style="border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
ID&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Name&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
Prediction&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
Age&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
Balance&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Income&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Eyes&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Gender&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/center&gt;
&lt;/div&gt;
&lt;div class="TableCell" style="margin-bottom: 0; margin-top: 0;"&gt;


&amp;nbsp;&lt;/div&gt;
&lt;div align="center"&gt;

  &lt;center&gt;
  &lt;table border="1" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="border-collapse: collapse; border: none; mso-border-alt: solid gray .75pt; mso-border-insideh: .75pt solid gray; mso-border-insidev: .75pt solid gray; mso-padding-alt: 0in 5.4pt 0in 5.4pt; mso-yfti-tbllook: 55;"&gt;
    &lt;tbody&gt;
&lt;tr style="mso-yfti-irow: 0;"&gt;
      &lt;td style="border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
3&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Betty&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
No&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
47&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$16,543&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
High&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Brown&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
F&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 1;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
5&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Carla&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
Yes&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
21&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$2,300&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
High&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Blue&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
F&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 2;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
6&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Carl&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
No&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
27&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$5,400&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
High&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Brown&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
M&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 3; mso-yfti-lastrow: yes;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
8&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Don&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
Yes&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
46&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$0&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
High&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Blue&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
M&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/center&gt;
&lt;/div&gt;
&lt;div class="TableCell" style="margin-bottom: 0; margin-top: 0;"&gt;


&amp;nbsp;&lt;/div&gt;
&lt;div align="center"&gt;

  &lt;center&gt;
  &lt;table border="1" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="border-collapse: collapse; border: none; mso-border-alt: solid gray .75pt; mso-border-insideh: .75pt solid gray; mso-border-insidev: .75pt solid gray; mso-padding-alt: 0in 5.4pt 0in 5.4pt; mso-yfti-tbllook: 55;"&gt;
    &lt;tbody&gt;
&lt;tr style="mso-yfti-irow: 0;"&gt;
      &lt;td style="border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
1&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Amy&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
No&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
62&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$0&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Medium&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Brown&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
F&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 1;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
2&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Al&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
No&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
53&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$1,800&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Medium&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Green&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
M&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 2; mso-yfti-lastrow: yes;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
4&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Bob&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
Yes&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
32&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$45&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Medium&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Green&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
M&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/center&gt;
&lt;/div&gt;
&lt;div class="TableCell" style="margin-bottom: 0; margin-top: 0;"&gt;

&amp;nbsp; 
&lt;/div&gt;
&lt;div align="center"&gt;

  &lt;center&gt;
  &lt;table border="1" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="border-collapse: collapse; border: none; mso-border-alt: solid gray .75pt; mso-border-insideh: .75pt solid gray; mso-border-insidev: .75pt solid gray; mso-padding-alt: 0in 5.4pt 0in 5.4pt; mso-yfti-tbllook: 55;"&gt;
    &lt;tbody&gt;
&lt;tr style="mso-yfti-irow: 0;"&gt;
      &lt;td style="border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
7&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Donna&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
Yes&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
50&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$165&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Low&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Blue&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
F&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 1;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
9&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Edna&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
Yes&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
27&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$500&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Low&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Blue&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
F&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 2; mso-yfti-lastrow: yes;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
10&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Ed&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
No&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
68&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$1,200&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Low&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Blue&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
M&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/center&gt;
&lt;/div&gt;
&lt;div align="center" class="TableCell"&gt;

&amp;nbsp;Table 1.4.&amp;nbsp; &lt;i&gt;A Simple Clustering of the Example Database&lt;/i&gt;&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo12;"&gt;
Is there another "correct" way to
cluster?&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
If, on the other hand, you are more of a romantic, you might
note some incompatibilities between 46-year-old Don and 21-year-old Carla (even
though they both make very good incomes). You might instead consider age
and some physical characteristics to be most important in creating clusters of
friends. For example, you could cluster your friends based on
their ages and on the color of their eyes, as shown in Table 1.5.
Here three clusters are created in which the people in each cluster are about the
same age, and some attempt has been made to keep people of like eye color
together in the same cluster.&lt;/div&gt;
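&lt;div class="MsoBodyText"&gt;
The contrast between the two groupings can be sketched in a few lines of Python. This is only an illustration, not the method used to build the tables: it uses the nine records whose rows appear in full in Table 1.4 above, and the age rule (sort by age, then cut into three groups of equal size) is an assumption made for the example. Eye color is carried along so that each age group can be inspected, but it is not used to form the groups. The names FRIENDS, cluster_by_income and cluster_by_age are hypothetical.&lt;/div&gt;

```python
from collections import defaultdict

# Nine records copied from Table 1.4: (id, name, age, income, eyes).
FRIENDS = [
    (5, "Carla", 21, "High", "Blue"),
    (6, "Carl", 27, "High", "Brown"),
    (8, "Don", 46, "High", "Blue"),
    (1, "Amy", 62, "Medium", "Brown"),
    (2, "Al", 53, "Medium", "Green"),
    (4, "Bob", 32, "Medium", "Green"),
    (7, "Donna", 50, "Low", "Blue"),
    (9, "Edna", 27, "Low", "Blue"),
    (10, "Ed", 68, "Low", "Blue"),
]


def cluster_by_income(friends):
    """Group names by income level, as in Table 1.4."""
    clusters = defaultdict(list)
    for _id, name, _age, income, _eyes in friends:
        clusters[income].append(name)
    return dict(clusters)


def cluster_by_age(friends, n_clusters=3):
    """Sort by age and cut into groups of near-equal size (assumed rule)."""
    ordered = sorted(friends, key=lambda f: f[2])
    size = -(-len(ordered) // n_clusters)  # ceiling division
    return [
        [(name, age, eyes) for _id, name, age, _income, eyes in ordered[i:i + size]]
        for i in range(0, len(ordered), size)
    ]


if __name__ == "__main__":
    print(cluster_by_income(FRIENDS))
    for group in cluster_by_age(FRIENDS):
        print(group)
```

&lt;div class="MsoBodyText"&gt;
Grouping by income puts Don and Carla in the same cluster, while grouping by age separates them. The same records yield different clusters depending on which attributes you decide matter.&lt;/div&gt;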
&lt;div align="center"&gt;

  &lt;center&gt;
  &lt;table border="1" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="background: #D9D9D9; border-collapse: collapse; border: none; mso-border-alt: solid gray .75pt; mso-border-insideh: .75pt solid gray; mso-border-insidev: .75pt solid gray; mso-padding-alt: 0in 5.4pt 0in 5.4pt; mso-pattern: gray-15 black; mso-shading: white; mso-yfti-tbllook: 55;"&gt;
    &lt;tbody&gt;
&lt;tr style="mso-yfti-irow: 0; mso-yfti-lastrow: yes;"&gt;
      &lt;td style="border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
ID
        
        &lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Name
        
        &lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
Prediction
        
        &lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
Age 
        
        &lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
Balance
        
        &lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Income
        
        &lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Eyes
        
        &lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Gender
        
        &lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/center&gt;
&lt;/div&gt;
&lt;div class="TableCell" style="margin-bottom: 0; margin-top: 0;"&gt;

&amp;nbsp;&lt;/div&gt;
&lt;div align="center"&gt;

  &lt;center&gt;
  &lt;table border="1" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="border-collapse: collapse; border: none; mso-border-alt: solid gray .75pt; mso-border-insideh: .75pt solid gray; mso-border-insidev: .75pt solid gray; mso-padding-alt: 0in 5.4pt 0in 5.4pt; mso-yfti-tbllook: 55;"&gt;
    &lt;tbody&gt;
&lt;tr style="mso-yfti-irow: 0;"&gt;
      &lt;td style="border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
5&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Carla&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
Yes&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
21&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$2,300&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
High&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Blue&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
F&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 1;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
9&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Edna&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
Yes&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
27&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$500&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Low&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Blue&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
F&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 2;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
6&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Carl&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
No&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
27&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$5,400&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
High&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Brown&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
M&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 3; mso-yfti-lastrow: yes;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
4&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Bob&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
Yes&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
32&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$45&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Medium&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Green&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
M&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/center&gt;
&lt;/div&gt;
&lt;div class="TableCell" style="margin-bottom: 0; margin-top: 0;"&gt;

&amp;nbsp;&lt;/div&gt;
&lt;div align="center"&gt;

  &lt;center&gt;
  &lt;table border="1" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="border-collapse: collapse; border: none; mso-border-alt: solid gray .75pt; mso-border-insideh: .75pt solid gray; mso-border-insidev: .75pt solid gray; mso-padding-alt: 0in 5.4pt 0in 5.4pt; mso-yfti-tbllook: 55;"&gt;
    &lt;tbody&gt;
&lt;tr style="mso-yfti-irow: 0;"&gt;
      &lt;td style="border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
8&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Don&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
Yes&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
46&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$0&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
High&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Blue&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
M&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 1;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
7&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Donna&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
Yes&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
50&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$165&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Low&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Blue&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
F&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 2; mso-yfti-lastrow: yes;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
10&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Ed&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
No&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
68&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$1,200&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Low&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Blue&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
M&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/center&gt;
&lt;/div&gt;
&lt;div class="TableCell" style="margin-bottom: 0; margin-top: 0;"&gt;

&amp;nbsp;&lt;/div&gt;
&lt;div align="center"&gt;

  &lt;center&gt;
  &lt;table border="1" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="border-collapse: collapse; border: none; mso-border-alt: solid gray .75pt; mso-border-insideh: .75pt solid gray; mso-border-insidev: .75pt solid gray; mso-padding-alt: 0in 5.4pt 0in 5.4pt; mso-yfti-tbllook: 55;"&gt;
    &lt;tbody&gt;
&lt;tr style="mso-yfti-irow: 0;"&gt;
      &lt;td style="border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
3&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Betty&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
No&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
47&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$16,543&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
High&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Brown&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
F&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 1;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
2&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Al&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
No&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
53&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$1,800&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Medium&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Green&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
M&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 2; mso-yfti-lastrow: yes;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 28.4pt;" valign="top" width="38"&gt;
        &lt;div class="TableCell"&gt;
1&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 47.65pt;" valign="top" width="64"&gt;
        &lt;div class="TableCell"&gt;
Amy&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 71.1pt;" valign="top" width="95"&gt;
        &lt;div class="TableCell"&gt;
No&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 40.95pt;" valign="top" width="55"&gt;
        &lt;div class="TableCell"&gt;
62&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 60.2pt;" valign="top" width="80"&gt;
        &lt;div class="TableCell"&gt;
$0&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Medium&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Brown&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.0pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
F&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/center&gt;
&lt;/div&gt;
&lt;div align="center" class="TableCell"&gt;

&amp;nbsp;Table 1.5 &lt;i&gt; A More "Romantic" Clustering of the Example Database
to Optimize for Your Dating Service&lt;/i&gt;&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo12;"&gt;
There is no best way to cluster.&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
This example, though simple, raises some important
questions about clustering.&amp;nbsp; For instance: Is it possible to say whether
the first clustering performed above (by financial status) was better
or worse than the second clustering (by age and eye color)?&amp;nbsp; Probably not,
since the clusters were constructed for no particular purpose except to note
similarities between some of the records and to show that the view of the database could
be somewhat simplified by using clusters.&amp;nbsp; Even so, the two clusterings
differ because they were driven by slightly different
motivations (financial vs. romantic).&amp;nbsp; In general the reasons for
clustering are just this ill-defined, because clusters are used more often
for exploration and summarization than they are for prediction.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo12;"&gt;
How are tradeoffs made when determining
which records fall into which clusters?&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
Notice that for the first clustering example there was a
pretty simple rule by which the records could be broken up into clusters:
namely, by income.&amp;nbsp; In the second clustering example the
dividing lines were less clear, since two predictors were used to form the clusters (age and eye
color).&amp;nbsp; Thus the first cluster is dominated by younger people with
somewhat mixed eye colors, whereas the latter two clusters have a mix of older
people where eye color has been used to separate them out (the second cluster is
entirely blue-eyed people).&amp;nbsp; In this case these tradeoffs were made
arbitrarily, but when clustering much larger numbers of records they
are explicitly defined by the clustering algorithm.&lt;/div&gt;
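&lt;div class="MsoBodyText"&gt;
One way to picture how an algorithm might make these tradeoffs explicit is a distance measure that blends the two predictors.&amp;nbsp; The sketch below is illustrative only: the distance function, the eye_weight parameter, and the choice to scale age by its range are assumptions made for this example, not a method described in this overview.&amp;nbsp; The ages and eye colors are those listed in Table 1.5.&lt;/div&gt;
&lt;pre&gt;
```python
# Age and eye color for the ten example records, as listed in Table 1.5.
RECORDS = {
    "Amy": (62, "Brown"), "Al": (53, "Green"), "Betty": (47, "Brown"),
    "Bob": (32, "Green"), "Carla": (21, "Blue"), "Carl": (27, "Brown"),
    "Donna": (50, "Blue"), "Don": (46, "Blue"), "Edna": (27, "Blue"),
    "Ed": (68, "Blue"),
}

ages = [age for age, _ in RECORDS.values()]
AGE_RANGE = max(ages) - min(ages)  # 68 - 21 = 47


def distance(a, b, eye_weight):
    """Hypothetical mixed distance: scaled age gap plus an eye-color penalty."""
    age_a, eyes_a = RECORDS[a]
    age_b, eyes_b = RECORDS[b]
    age_gap = abs(age_a - age_b) / AGE_RANGE
    eye_gap = 0.0 if eyes_a == eyes_b else 1.0
    return age_gap + eye_weight * eye_gap


def nearest(name, eye_weight):
    """Return the other record closest to `name` under the chosen weight."""
    others = [other for other in RECORDS if other != name]
    return min(others, key=lambda other: distance(name, other, eye_weight))


print(nearest("Don", eye_weight=0.0))  # eye color ignored, age decides
print(nearest("Don", eye_weight=1.0))  # an eye mismatch costs a full age range
```
&lt;/pre&gt;
&lt;div class="MsoBodyText"&gt;
With eye_weight set to 0.0, Don's closest record is Betty (one year apart, brown eyes); with eye_weight set to 1.0 it is Donna (four years apart, blue eyes), which agrees with the all-blue-eyed cluster in Table 1.5.&amp;nbsp; The weight is the explicit tradeoff.&lt;/div&gt;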
&lt;h3 style="mso-list: l10 level3 lfo12;"&gt;
Clustering is the happy medium between
homogeneous clusters and the fewest number of clusters.&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
In the best possible case clusters would be built where
all records within the cluster had identical values for the particular
predictors that were being clustered on.&amp;nbsp; This would be the optimum in
creating a high-level view, since knowing the predictor values for any member of
the cluster would mean knowing the values for every member of the cluster, no
matter how large the cluster was.&amp;nbsp; Creating homogeneous clusters where all
values for the predictors are the same is difficult to do when there are many
predictors and/or the predictors have many different values (high cardinality).&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
It is possible to guarantee that homogeneous clusters are
created by breaking apart any cluster that is inhomogeneous into smaller
clusters that are homogeneous.&amp;nbsp; In the extreme, though, this usually means
creating clusters with only one record in them, which usually defeats the
original purpose of the clustering.&amp;nbsp; For instance, in our 10-record database
above, 10 perfectly homogeneous clusters could be formed of 1 record each, but
not much progress would have been made in making the original database more
understandable.&lt;/div&gt;
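&lt;div class="MsoBodyText"&gt;
The effect can be checked on the example records.&amp;nbsp; The sketch below groups records that share identical values on the chosen predictors, so every resulting cluster is perfectly homogeneous.&amp;nbsp; The function name homogeneous_clusters and the FIELDS lookup are hypothetical helpers introduced for this illustration; the ages and eye colors are those listed in Table 1.5.&lt;/div&gt;
&lt;pre&gt;
```python
from collections import defaultdict

# (name, age, eye color) for the ten example records, as listed in Table 1.5.
RECORDS = [
    ("Amy", 62, "Brown"), ("Al", 53, "Green"), ("Betty", 47, "Brown"),
    ("Bob", 32, "Green"), ("Carla", 21, "Blue"), ("Carl", 27, "Brown"),
    ("Donna", 50, "Blue"), ("Don", 46, "Blue"), ("Edna", 27, "Blue"),
    ("Ed", 68, "Blue"),
]

# Position of each predictor inside a record tuple (illustrative helper).
FIELDS = {"age": 1, "eyes": 2}


def homogeneous_clusters(predictors):
    """Group names so that every cluster shares identical predictor values."""
    clusters = defaultdict(list)
    for record in RECORDS:
        key = tuple(record[FIELDS[p]] for p in predictors)
        clusters[key].append(record[0])
    return dict(clusters)


for predictors in (["eyes"], ["age", "eyes"]):
    clusters = homogeneous_clusters(predictors)
    print(predictors, "gives", len(clusters), "clusters")
```
&lt;/pre&gt;
&lt;div class="MsoBodyText"&gt;
Clustering on eye color alone gives 3 homogeneous clusters (5 blue, 3 brown, 2 green).&amp;nbsp; Adding age gives 10 clusters of one record each, since no two records share both values.&lt;/div&gt;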
&lt;div class="MsoBodyText"&gt;
The second important constraint on clustering is then
that a reasonable number of clusters be formed.&amp;nbsp; Here, again, "reasonable"
is defined by the user and is difficult to quantify beyond that, except to say
that just one cluster is unacceptable (too much generalization) and that as many
clusters as original records is also unacceptable (no generalization at all).&amp;nbsp; Many clustering
algorithms either let the user choose the number of clusters that they would
like to see created from the database or provide the user with a “knob” by
which they can create fewer or more clusters interactively after
the clustering has been performed.&lt;/div&gt;
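&lt;div class="MsoBodyText"&gt;
The idea of a user-chosen cluster count can be sketched with a single numeric predictor.&amp;nbsp; The code below is a deliberately simple stand-in for a real clustering algorithm and is not a method named in this overview: it sorts the ages from Table 1.5 and cuts the sorted list at the k - 1 widest gaps, where k is the number of clusters requested.&amp;nbsp; The function name cluster_ages and the parameter k are assumptions made for this example.&lt;/div&gt;
&lt;pre&gt;
```python
# Ages of the ten example records, as listed in Table 1.5.
AGES = [62, 53, 47, 32, 21, 27, 50, 46, 27, 68]


def cluster_ages(ages, k):
    """Split sorted ages into k groups by cutting at the k - 1 widest gaps."""
    ordered = sorted(ages)
    # Pair each gap with the index of the value on its left side.
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    widest = sorted(gaps, reverse=True)[:k - 1]
    cut_after = sorted(index for _, index in widest)

    clusters, start = [], 0
    for index in cut_after:
        clusters.append(ordered[start:index + 1])
        start = index + 1
    clusters.append(ordered[start:])
    return clusters


# Turning the "knob": the same data viewed at two levels of detail.
print(cluster_ages(AGES, 2))
print(cluster_ages(AGES, 3))
```
&lt;/pre&gt;
&lt;div class="MsoBodyText"&gt;
Asking for 2 clusters separates the four youngest people (ages 21 to 32) from everyone else; asking for 3 further splits the older group into ages 46 to 53 and ages 62 to 68.&amp;nbsp; Asking for 1 or for 10 reproduces the two unacceptable extremes described above.&lt;/div&gt;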
&lt;h3 style="mso-list: l10 level3 lfo12;"&gt;
What is the difference between clustering
and nearest neighbor prediction?&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
The main distinction between clustering and the nearest
neighbor technique is that clustering is an unsupervised learning
technique, whereas nearest neighbor is generally used for prediction and is a supervised
learning technique.&amp;nbsp; Unsupervised learning techniques are unsupervised in
the sense that when they are run there is no particular reason for the creation
of the models, the way there is for supervised learning techniques that are
trying to perform prediction.&amp;nbsp; In prediction, the patterns that are found
in the database and presented in the model are always the most important
patterns in the database for performing some particular prediction.&amp;nbsp; In
clustering there is no particular sense of why certain records are near to each
other or why they all fall into the same cluster.&amp;nbsp; Some of the differences
between clustering and nearest neighbor prediction are summarized in Table 1.6.&lt;br /&gt;

&lt;/div&gt;
&lt;div align="center"&gt;

  &lt;center&gt;
  &lt;table border="1" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="border-collapse: collapse; border: none; mso-border-alt: solid gray .75pt; mso-border-insideh: .75pt solid gray; mso-border-insidev: .75pt solid gray; mso-padding-alt: 0in 5.4pt 0in 5.4pt; mso-yfti-tbllook: 55;"&gt;
    &lt;tbody&gt;
&lt;tr style="height: 17.7pt; mso-yfti-irow: 0;"&gt;
      &lt;td style="background: #666666; border: solid gray 1.0pt; height: 17.7pt; mso-border-alt: solid gray .75pt; mso-pattern: gray-60 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 234.9pt;" valign="top" width="313"&gt;
        &lt;div class="TableCell"&gt;
&lt;span style="color: white;"&gt;&lt;b&gt;Nearest Neighbor
        
        &lt;/b&gt;&lt;/span&gt;&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #666666; border-left: none; border: solid gray 1.0pt; height: 17.7pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-pattern: gray-60 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 3.0in;" valign="top" width="288"&gt;
        &lt;div class="TableCell"&gt;
&lt;span style="color: white;"&gt;&lt;b&gt;Clustering
        
        &lt;/b&gt;&lt;/span&gt;&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 1;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 234.9pt;" valign="top" width="313"&gt;
        &lt;div class="TableCell"&gt;
Used for prediction as well as consolidation.&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 3.0in;" valign="top" width="288"&gt;
        &lt;div class="TableCell"&gt;
Used mostly for consolidating data into a
        high-level view and general grouping of records into like behaviors.&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 2;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 234.9pt;" valign="top" width="313"&gt;
        &lt;div class="TableCell"&gt;
Space is defined by the problem to be solved
        (supervised learning).&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 3.0in;" valign="top" width="288"&gt;
        &lt;div class="TableCell"&gt;
Space is defined as default n-dimensional&amp;nbsp;
        space, or is defined by the user, or is a predefined space driven by
        past experience (unsupervised learning).&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 3; mso-yfti-lastrow: yes;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 234.9pt;" valign="top" width="313"&gt;
        &lt;div class="TableCell"&gt;
Generally only uses distance metrics to determine
        nearness.&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 3.0in;" valign="top" width="288"&gt;
        &lt;div class="TableCell"&gt;
Can use other metrics besides distance to determine
        nearness of two records - for example linking two points together.&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/center&gt;
&lt;/div&gt;
&lt;div align="center" class="MsoCaption" style="text-align: center;"&gt;
Table 1.6&amp;nbsp; &lt;i&gt;Some of the Differences Between the Nearest-Neighbor Data Mining Technique and
Clustering&lt;/i&gt;&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo12;"&gt;
What is an n-dimensional space? Do I
really need to know this?&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
When people talk about clustering or nearest neighbor
prediction they will often talk about a “space” of “N” dimensions.&amp;nbsp;
What they mean is that in order to define what is near and what is far away it
is helpful to have a “space” defined where distance can be calculated.&amp;nbsp;
Generally these spaces behave just like the three dimensional space that we are
familiar with where distance between objects is defined by Euclidean distance
(just like figuring out the length of the hypotenuse of a right triangle).&amp;nbsp;&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
What goes for three dimensions works pretty well for more
dimensions as well, which is a good thing since most real world problems
consist of many more than three dimensions.&amp;nbsp; In fact each predictor (or
database column) that is used can be considered to be a new dimension.&amp;nbsp; In
the example above the five predictors (age, income, balance, eyes and gender) can
all be construed to be dimensions in an n-dimensional space where n, in this
case, equals 5.&amp;nbsp; It is sometimes easier to think about these and other data
mining algorithms in terms of n-dimensional spaces because it allows for some
intuitions to be used about how the algorithm is working.&lt;/div&gt;
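&lt;div class="MsoBodyText"&gt;
As a minimal sketch of how distance generalizes from three dimensions to n, the
Python function below computes Euclidean distance between two records of any
length.&amp;nbsp; The record values, the units, and the coding of eyes and gender as 0/1
numbers are assumptions made purely for illustration; they do not come from the
example data.&lt;/div&gt;
&lt;pre&gt;
```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two records with the same numeric predictors."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical records: (age, income in $1000s, balance in $1000s, eyes, gender)
record_a = (35, 60.0, 2.5, 0, 1)
record_b = (38, 64.0, 2.5, 0, 1)

# Only age (3 apart) and income (4 apart) differ, so the distance is sqrt(9 + 16)
distance = euclidean_distance(record_a, record_b)
```
&lt;/pre&gt;
&lt;div class="MsoBodyText"&gt;
The same function works unchanged for 5 predictors or 50,000, which is the
practical meaning of an n-dimensional space here.&lt;/div&gt;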
&lt;div class="MsoBodyText"&gt;
Moving from three dimensions to five dimensions is not
too large a jump but there are also spaces in real world problems that are far
more complex.&amp;nbsp; In the credit card industry credit card issuers typically
have over one thousand predictors that could be used to create an n-dimensional
space.&amp;nbsp; For text retrieval (e.g. finding useful Wall Street Journal
articles from a large database, or finding useful web sites on the internet) the
predictors (and hence the dimensions) are typically words or phrases that are
found in the document records.&amp;nbsp; In just one&amp;nbsp; year of the Wall Street
Journal there are more than 50,000 different words used - which translates to a
50,000 dimensional space in which nearness between records must be calculated.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo12;"&gt;
How is the space for clustering and
nearest neighbor defined?&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
For clustering the n-dimensional space is usually defined
by assigning one predictor to each dimension.&amp;nbsp; For the nearest neighbor
algorithm predictors are also mapped to dimensions but then those dimensions are
literally stretched or compressed based on how important the particular
predictor is in making the prediction.&amp;nbsp; The stretching of a dimension
effectively makes that dimension (and hence predictor) more important than the
others in calculating the distance.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
For instance if you are a mountain climber and someone
told you that you were 2 miles from your destination the distance is the same
whether it’s 1 mile north and 1 mile up the face of the mountain or 2 miles
north on level ground but clearly the former route is much different from the
latter. The distance traveled straight upward is the most important in figuring
out how long it will really take to get to the destination and you would
probably like to consider this “dimension” to be more important than the
others.&amp;nbsp;&amp;nbsp; In fact you, as a mountain climber, could “weight” the
importance of the vertical dimension in calculating some new distance by
reasoning that every mile upward is equivalent to 10 miles on level ground.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
If you used this rule of thumb to weight the importance
of one dimension over the other it would be clear that in one case you were much
“further away” from your destination (“11 miles”) than in the second
(“2 miles”).&amp;nbsp; In the next section we’ll show how the nearest neighbor
algorithm uses distance measures that similarly weight the important dimensions
more heavily when calculating a distance.&lt;/div&gt;
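&lt;div class="MsoBodyText"&gt;
The mountain climber arithmetic can be sketched in a few lines of Python.&amp;nbsp;
This assumes the distances are simply added across dimensions (a city-block
distance), which is what makes “1 mile north and 1 mile up” come to 2 miles;
the function and variable names are hypothetical.&lt;/div&gt;
&lt;pre&gt;
```python
def weighted_distance(a, b, weights):
    """City-block distance in which each dimension is stretched by its weight."""
    return sum(w * abs(x - y) for x, y, w in zip(a, b, weights))

start = (0, 0)          # (miles north, miles up)
summit = (1, 1)         # 1 mile north and 1 mile up the face of the mountain
level_goal = (2, 0)     # 2 miles north on level ground

unweighted = (1, 1)     # both dimensions count equally
climber = (1, 10)       # every mile upward counts as 10 miles on level ground

plain_summit = weighted_distance(start, summit, unweighted)
plain_level = weighted_distance(start, level_goal, unweighted)
climb_summit = weighted_distance(start, summit, climber)
climb_level = weighted_distance(start, level_goal, climber)
```
&lt;/pre&gt;
&lt;div class="MsoBodyText"&gt;
Without weights both routes come out as 2 miles; with the climber’s weights the
summit route becomes 11 miles while the level route stays at 2.&lt;/div&gt;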
&lt;h3 style="mso-list: l10 level3 lfo12;"&gt;
Hierarchical and Non-Hierarchical
Clustering&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
There are two main types of clustering techniques, those
that create a hierarchy of clusters and those that do not.&amp;nbsp; The
hierarchical clustering techniques create a hierarchy of clusters from small to
big.&amp;nbsp; The main reason for this is that, as was already stated,&amp;nbsp;
clustering is an unsupervised learning technique, and as such, there is no
absolutely correct answer.&amp;nbsp; For this reason and depending on the particular
application of the clustering, fewer or greater numbers of clusters may be
desired.&amp;nbsp; With a hierarchy of clusters defined it is possible to choose the
number of clusters that are desired.&amp;nbsp; At the extreme it is possible to have
as many clusters as there are records in the database.&amp;nbsp;&amp;nbsp; In this case
the records within the cluster are optimally similar to each other (since there
is only one) and certainly different from the other clusters.&amp;nbsp; But of
course such a clustering technique misses the point in the sense that the idea
of clustering is to find useful patterns in the database that summarize it and
make it easier to understand.&amp;nbsp; Any clustering algorithm that ends up with
as many clusters as there are records has not helped the user understand the
data any better.&amp;nbsp; Thus one of the main points about clustering is that
there be many fewer clusters than there are original records.&amp;nbsp; Exactly how
many clusters should be formed is a matter of interpretation.&amp;nbsp; The
advantage of hierarchical clustering methods is that they allow the end user to
choose from either many clusters or only a few.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
The hierarchy of clusters is usually viewed as a tree
where the smallest clusters merge together to create the next highest level of
clusters and those at that level merge together to create the next highest level
of clusters.&amp;nbsp; Figure 1.5 below shows how several clusters might form a
hierarchy.&amp;nbsp; When a hierarchy of clusters like this is created the user can
determine what the right number of clusters is that adequately summarizes the
data while still providing useful information (at the other extreme a single
cluster containing all the records is a great summarization but does not contain
enough specific information to be useful).&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
This hierarchy of clusters is created through the
algorithm that builds the clusters.&amp;nbsp; There are two main types of
hierarchical clustering algorithms:&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;
    &lt;div class="MsoBodyText"&gt;
Agglomerative - Agglomerative clustering techniques
    start with as many clusters as there are records where each cluster contains
    just one record.&amp;nbsp;&amp;nbsp; The clusters that are nearest each other are
    merged together to form the next largest cluster.&amp;nbsp; This merging is
    continued until a hierarchy of clusters is built with just a single cluster
    containing all the records at the top of the hierarchy.&lt;br /&gt;
  &lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
    &lt;div class="MsoBodyText"&gt;
Divisive - Divisive clustering techniques take the
    opposite approach from agglomerative techniques.&amp;nbsp; These techniques
    start with all the records in one cluster and then try to split that cluster
    into smaller pieces and then in turn to try to split those smaller pieces.&amp;nbsp;&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
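&lt;div class="MsoBodyText"&gt;
The agglomerative procedure described above can be sketched as follows.&amp;nbsp;
This is only an illustration under stated assumptions: the records are single
numbers, the distance between two clusters is taken to be the gap between their
closest members (a single-link rule), and the function names and data are
hypothetical.&lt;/div&gt;
&lt;pre&gt;
```python
from itertools import combinations

def single_link(c1, c2):
    """Distance between two clusters: the gap between their closest members."""
    return min(abs(x - y) for x in c1 for y in c2)

def agglomerate(values):
    """Repeatedly merge the two nearest clusters; return every level of the hierarchy."""
    clusters = [[v] for v in values]              # start with one record per cluster
    levels = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        # Find the pair of cluster positions with the smallest single-link distance
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: single_link(clusters[pair[0]], clusters[pair[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        levels.append([list(c) for c in clusters])
    return levels

ages = [21, 23, 40, 42, 70]                       # hypothetical one-dimensional records
levels = agglomerate(ages)
```
&lt;/pre&gt;
&lt;div class="MsoBodyText"&gt;
Each entry in &lt;i&gt;levels&lt;/i&gt; is one level of the hierarchy, running from five
single-record clusters down to one cluster holding every record, so the user can
pick whichever level gives a useful number of clusters.&lt;/div&gt;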
&lt;div class="MsoBodyText"&gt;
Of the two, the agglomerative techniques are the more
commonly used for clustering and have more algorithms developed for them.&amp;nbsp;
We’ll talk about these in more detail in the next section. The
non-hierarchical techniques in general are faster to create from the historical
database but require that the user make some decision about the number of
clusters desired or the minimum “nearness” required for two records to be
within the same cluster.&amp;nbsp; These non-hierarchical techniques are often
run multiple times, starting off with some arbitrary or even random clustering
and then iteratively improving the clustering by shuffling some records around.&amp;nbsp;
Alternatively, these techniques sometimes create clusters with only one
pass through the database, adding records to existing clusters when a good match
exists and creating new clusters when no existing cluster is a good candidate for the
given record. Because the clusters that are formed can depend on
these initial choices of starting clusters, or even on how
many clusters are requested, these techniques can be less repeatable than the
hierarchical techniques.&amp;nbsp; They can also create either too many or too few
clusters because the number of clusters is predetermined by the user rather than
determined solely by the patterns inherent in the database.&lt;/div&gt;
&lt;div align="center" class="MsoNormal" style="page-break-after: avoid;"&gt;
&lt;img border="0" height="306" src="http://www.thearling.com/text/dmtechniques/dmtech5.gif" width="399" /&gt;&lt;/div&gt;
&lt;div align="center" class="MsoCaption" style="text-align: center;"&gt;
Figure 1.5 &lt;i&gt;
Diagram showing a hierarchy of clusters.&amp;nbsp; Clusters at the lowest level are
merged together to form larger clusters at the next level of the hierarchy.&lt;/i&gt;&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo12;"&gt;
Non-Hierarchical Clustering&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
There are two main non-hierarchical clustering
techniques.&amp;nbsp; Both of them are very fast to compute on the database but have
some drawbacks.&amp;nbsp; The first are the single pass methods.&amp;nbsp; They derive
their name from the fact that the database needs to be passed through only once in
order to create the clusters (i.e. each record is read from the database only
once).&amp;nbsp; The other class of techniques is called reallocation
methods.&amp;nbsp; They get their name from the movement or “reallocation” of
records from one cluster to another in order to create better clusters.&amp;nbsp;
The reallocation techniques do use multiple passes through the database but are
relatively fast in comparison to the hierarchical techniques.&lt;/div&gt;
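&lt;div class="MsoBodyText"&gt;
A single pass method might look like the sketch below.&amp;nbsp; The specific rule
used here (compare each new record with the first record of every existing
cluster and join the nearest one if it lies within a threshold) is an assumption
chosen for brevity, not a description of any particular product.&lt;/div&gt;
&lt;pre&gt;
```python
def single_pass_clusters(values, threshold):
    """Read each value once; join the nearest cluster if close enough, else start a new one."""
    clusters = []              # each cluster is a list; its first value acts as the leader
    for v in values:
        nearest = min(clusters, key=lambda c: abs(v - c[0]), default=None)
        if nearest is not None and threshold >= abs(v - nearest[0]):
            nearest.append(v)
        else:
            clusters.append([v])
    return clusters

# The same three hypothetical records, read in two different orders
first_order = single_pass_clusters([20, 24, 28], threshold=5)
second_order = single_pass_clusters([24, 20, 28], threshold=5)
```
&lt;/pre&gt;
&lt;div class="MsoBodyText"&gt;
Reading the records as 20, 24, 28 yields two clusters, while reading them as
24, 20, 28 yields one.&amp;nbsp; This order dependence is one of the drawbacks of
single pass methods.&lt;/div&gt;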
&lt;div class="MsoBodyText"&gt;
Some techniques allow the user to request the number of
clusters that they would like to be pulled out of the data.&amp;nbsp; Predefining
the number of clusters rather than having them driven by the data might seem to
be a bad idea as there might be some very distinct and observable clustering of
the data into a certain number of clusters which the user might not be aware of.&amp;nbsp;&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
For instance the user may wish to see their data broken
up into 10 clusters but the data itself partitions very cleanly into 13
clusters.&amp;nbsp; These non-hierarchical techniques will try to shoehorn these
extra three clusters into the existing 10 rather than creating the 13 which best fit
the data.&amp;nbsp; The saving grace for these methods, however, is that, as we have
seen, there is no one right answer for how to cluster, so it is rare that
arbitrarily predefining the number of clusters leaves you with the
wrong answer.&amp;nbsp; One of the advantages of these techniques is that
the user often does have some predefined level of summarization that they
are interested in (e.g. “25 clusters is too confusing, but 10 will help to
give me an insight into my data”).&amp;nbsp; The fact that greater or fewer
numbers of clusters would better match the data is actually of secondary
importance.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo12;"&gt;
Hierarchical Clustering&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
Hierarchical clustering has the advantage over
non-hierarchical techniques in that the clusters are defined solely by the data
(not by the user predetermining the number of clusters) and that the number of
clusters can be increased or decreased by simply moving up and down the
hierarchy.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
The hierarchy is created by starting either at the top
(one cluster that includes all records) and subdividing (divisive clustering) or
by starting at the bottom with as many clusters as there are records and merging
(agglomerative clustering).&amp;nbsp; Usually the merging and subdividing are done
two clusters at a time.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
The main distinction between the techniques is their
ability to favor long, scraggly clusters that are linked together record by
record, or to favor the detection of the more classical, compact or spherical
cluster that was shown at the beginning of this section.&amp;nbsp; It may seem
strange to want to form these long, snaking, chain-like clusters, but in some
cases they are the patterns that the user would like to have detected in the
database.&amp;nbsp; These are the times when the underlying space looks quite
different from the spherical clusters and the clusters that should be formed are
not based on the distance from the center of the cluster but instead based on
the records being “linked” together.&amp;nbsp; Consider the example shown in Figure 1.6
or in Figure 1.7.&amp;nbsp; In these cases there are two clusters that are not very
spherical in shape but could be detected by the single link technique.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
When looking at the layout of the data in Figure 1.6&amp;nbsp;
there appear to be two relatively flat clusters running parallel to each other along
the income axis.&amp;nbsp; Neither the complete link nor Ward’s method would,
however, return these two clusters to the user.&amp;nbsp; These techniques rely on
creating a “center” for each cluster and picking these centers so that the
average distance of each record from this center is minimized. Points that are
very distant from these centers would necessarily fall into a different cluster.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
What makes these clusters “visible” in this simple
two dimensional space is the fact that each point in a cluster is tightly linked
to some other point in the cluster.&amp;nbsp; For the two clusters we see, the
largest distance between any point and its nearest neighbor within a cluster is
less than the smallest distance between two points in different clusters.&amp;nbsp; That
is to say that for any point in this space, the nearest point to it is always
going to be another point in the same cluster.&amp;nbsp; The center of gravity
of a cluster could be quite distant from a given point, but every point is
linked to every other point in its cluster by a series of small distances.&lt;/div&gt;
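&lt;div class="MsoBodyText"&gt;
The sketch below checks this property on made-up data: two flat rows of points,
three units apart, with neighbors one unit apart along each row.&amp;nbsp; The
coordinates and helper names are hypothetical and are not taken from the
figures.&lt;/div&gt;
&lt;pre&gt;
```python
import math

def dist(p, q):
    """Euclidean distance between two points in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Two hypothetical flat clusters running parallel along the income axis
low_row = [(x, 0.0) for x in range(10)]      # (income, age) in arbitrary units
high_row = [(x, 3.0) for x in range(10)]

def nearest_neighbor_gap(cluster):
    """Largest distance from any point to its nearest neighbor in the same cluster."""
    return max(min(dist(p, q) for q in cluster if q != p) for p in cluster)

def between_gap(c1, c2):
    """Smallest distance between points of different clusters (the single-link distance)."""
    return min(dist(p, q) for p in c1 for q in c2)

def center(cluster):
    """Center of gravity of a cluster."""
    n = len(cluster)
    return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)

within = nearest_neighbor_gap(low_row)             # spacing along a row
between = between_gap(low_row, high_row)           # gap separating the rows
own_end = dist(low_row[0], center(low_row))        # end of a row to its own center
other_mid = dist(high_row[4], center(low_row))     # a point of the other row to that center
```
&lt;/pre&gt;
&lt;div class="MsoBodyText"&gt;
Here &lt;i&gt;within&lt;/i&gt; is smaller than &lt;i&gt;between&lt;/i&gt;, so linking nearest points keeps
each row together.&amp;nbsp; At the same time &lt;i&gt;own_end&lt;/i&gt; is larger than
&lt;i&gt;other_mid&lt;/i&gt;: the end of a row is farther from its own center than a point
in the other row is, which is why a method built around cluster centers tends to
split such rows.&lt;/div&gt;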
&lt;div align="center" class="MsoNormal" style="page-break-after: avoid; text-align: center;"&gt;

&lt;img border="0" height="267" src="http://www.thearling.com/text/dmtechniques/dmtech6.gif" width="394" /&gt;
&lt;/div&gt;
&lt;div align="center" class="MsoCaption" style="text-align: center;"&gt;
Figure 1.6&amp;nbsp; &lt;i&gt;An example of elongated clusters which would not be recovered by the complete
link or Ward's methods but would be by the single-link method.&lt;/i&gt;&lt;/div&gt;
&lt;div align="center" class="MsoNormal" style="page-break-after: avoid; text-align: center;"&gt;

&lt;img border="0" height="282" src="http://www.thearling.com/text/dmtechniques/dmtech7.gif" width="393" /&gt;
&lt;/div&gt;
&lt;div align="center" class="MsoCaption" style="text-align: center;"&gt;
Figure 1.7&amp;nbsp; &lt;i&gt;An example of nested clusters which would not be recovered by the complete link
or Ward's methods but would be by the single-link method.&lt;/i&gt;&lt;/div&gt;
&lt;h3 style="mso-list: l10 level2 lfo12;"&gt;
1.5. Choosing the Classics&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
There is no particular rule that would tell you when to
choose a particular technique over another one.&amp;nbsp; Sometimes those decisions
are made relatively arbitrarily based on the availability of data mining
analysts who are most experienced in one technique over another.&amp;nbsp;&amp;nbsp; And
even choosing classical techniques over some of the newer techniques is more
dependent on the availability of good tools and good analysts.&amp;nbsp; Whichever
techniques are chosen, whether classical or next generation, all of the techniques
presented here have been available and tried for more than two decades.&amp;nbsp; So
even the next generation is a solid bet for implementation.&lt;br /&gt;

&lt;/div&gt;
&lt;h3&gt;
II. Next Generation Techniques: Trees, Networks and Rules&lt;/h3&gt;
&lt;h3 style="mso-list: l10 level2 lfo21;"&gt;
2.1. The Next Generation&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
The data mining techniques in this section represent the
most often used techniques that have been developed over the last two decades of
research.&amp;nbsp; They also represent the vast majority of the techniques that are
being spoken about when data mining is mentioned in the popular press.&amp;nbsp;
These techniques can be used for either discovering new information within large
databases or for building predictive models.&amp;nbsp; Though the older decision
tree techniques such as CHAID are currently widely used, newer techniques such
as CART are gaining wider acceptance.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level2 lfo21;"&gt;
2.2. Decision Trees&lt;/h3&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
What is a Decision Tree?&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
A decision tree is a predictive model that, as its name
implies, can be viewed as a tree.&amp;nbsp; Specifically each branch of the tree is
a classification question and the leaves of the tree are partitions of the
dataset with their classification.&amp;nbsp; For instance if we were going to
classify customers who churn (don’t renew their phone contracts) in the
Cellular Telephone Industry a decision tree might look something like that found
in Figure 2.1.&lt;/div&gt;
&lt;div align="center" class="MsoCaption" style="text-align: center;"&gt;

&lt;img border="0" height="334" src="http://www.thearling.com/text/dmtechniques/dmtech8.gif" width="444" /&gt;
&lt;br /&gt;
Figure 2.1&amp;nbsp; &lt;i&gt;A decision tree is a predictive model that makes a prediction
on the basis of a series of decisions, much like the game of 20 questions.&lt;/i&gt;&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
You may notice some interesting things about the tree:&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;
    &lt;div class="MsoBodyText"&gt;
It divides up the data on each branch point without
losing any of the data (the number of total records in a given parent node is
equal to the sum of the records contained in its two children).&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
    &lt;div class="MsoBodyText"&gt;
The number of churners and non-churners is conserved as
you move up or down the tree.&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
    &lt;div class="MsoBodyText"&gt;
It is pretty easy to understand how the model is being
built (in contrast to the models from neural networks or from standard
statistics).&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
    &lt;div class="MsoBodyText"&gt;
It would also be pretty easy to use this model if you
actually had to target those customers that are likely to churn with a targeted
marketing offer.&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="MsoBodyText"&gt;
You may also build some intuitions about your customer
base.&amp;nbsp; E.g.&amp;nbsp; “customers who have been with you for a couple of years
and have up to date cellular phones are pretty loyal”.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
Viewing decision trees as segmentation
with a purpose&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
From a business perspective decision trees can be viewed
as creating a segmentation of the original dataset (each segment would be one of
the leaves of the tree).&amp;nbsp; Segmentation of customers, products, and sales
regions is something that marketing managers have been doing for many years. In
the past this segmentation has been performed in order to get a high level view
of a large amount of data - with no particular reason for creating the
segmentation except that the records within each segmentation were somewhat
similar to each other.&amp;nbsp;&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
In this case the segmentation is done for a particular
reason - namely for the prediction of some important piece of information.&amp;nbsp;
The records that fall within each segment fall there because they are
similar with respect to the information being predicted, not just because they
are similar in some loosely defined sense.&amp;nbsp; These predictive
segments that are derived from the decision tree also come with a description of
the characteristics that define the predictive segment.&amp;nbsp;&amp;nbsp; Thus, although the
decision trees and the algorithms that create them may be complex, the results
can be presented in an easy to understand way that can be quite useful to the
business user.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
Applying decision trees to Business&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
Because of their tree structure and ability to easily
generate rules, decision trees are the favored technique for building
understandable models.&amp;nbsp; Because of this clarity they also allow for more
complex profit and ROI models to be added easily on top of the predictive
model.&amp;nbsp; For instance once a customer population is found with high
predicted likelihood to attrite a variety of cost models can be used to see if
an expensive marketing intervention should be used because the customers are
highly valuable or a less expensive intervention should be used because the
revenue from this sub-population of customers is marginal.&amp;nbsp;&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Because of their high level of automation and the ease of
translating decision tree models into SQL for deployment in relational databases
the technology has also proven to be easy to integrate with existing IT
processes, requiring little preprocessing and cleansing of the data, or
extraction of a special purpose file specifically for data mining.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
Where can decision trees be used?&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
Decision trees are a data mining technology that has been
around in a form very similar to the technology of today for almost twenty years
now, and early versions of the algorithms date back to the 1960s.&amp;nbsp; Often
these techniques were originally developed for statisticians to automate
the process of determining which fields in their database were actually useful
or correlated with the particular problem that they were trying to understand.&amp;nbsp;
Partially because of this history, decision tree algorithms tend to automate the
entire process of hypothesis generation and then validation much more completely
and in a much more integrated way than any other data mining techniques.&amp;nbsp;
They are also particularly adept at handling raw data with little or no
pre-processing.&amp;nbsp; Perhaps also because they were originally developed to
mimic the way an analyst interactively performs data mining they provide a
simple to understand predictive model based on rules (such as “90% of the time
credit card customers of less than 3 months who max out their credit limit are
going to default on their credit card loan.”).&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Because decision trees score so highly on so many of the
critical features of data mining they can be used in a wide variety of business
problems for both exploration and for prediction.&amp;nbsp; They have been used for
problems ranging from credit card attrition prediction to time series prediction
of the exchange rate of different international currencies.&amp;nbsp; There are also
some problems where decision trees will not do as well.&amp;nbsp; Some very simple
problems where the prediction is just a simple multiple of the predictor can be
solved much more quickly and easily by linear regression.&amp;nbsp; Usually the
models to be built and the interactions to be detected are much more complex in
real world problems and this is where decision trees excel.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
Using decision trees for Exploration&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
The decision tree technology can be used for exploration
of the dataset and business problem.&amp;nbsp; This is often done by looking at the
predictors and values that are chosen for each split of the tree.&amp;nbsp; Often
these predictors provide usable insights or propose questions that need to
be answered.&amp;nbsp; For instance if you ran across the following in your database
for cellular phone churn you might seriously wonder about the way your telesales
operators were making their calls and maybe change the way that they are
compensated: “IF customer lifetime &amp;lt; 1.1 years AND sales channel =
telesales THEN chance of churn is 65%.”&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
Using decision trees for Data
Preprocessing&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
Another way that the decision tree technology has been
used is for preprocessing data for other prediction algorithms.&amp;nbsp; Because
the algorithm is fairly robust with respect to a variety of predictor types
(e.g. number, categorical etc.) and because it can be run relatively quickly
decision trees can be used on the first pass of a data mining run to create a
subset of possibly useful predictors that can then be fed into neural networks,
nearest neighbor and normal statistical routines - which can take a considerable
amount of time to run if there are large numbers of possible predictors to be
used in the model.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
Decision trees for Prediction&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
Although some forms of decision trees were initially
developed as exploratory tools to refine and preprocess data for more standard
statistical techniques like logistic regression, they have also been used,
and are increasingly being used, for prediction.&amp;nbsp; This is interesting
because many statisticians will still use decision trees for exploratory
analysis, effectively building a predictive model as a by-product, but then ignore
the predictive model in favor of techniques that they are most comfortable with.&amp;nbsp;
Sometimes veteran analysts will do this, excluding the predictive model even when
it is superior to that produced by other techniques.&amp;nbsp; With a host of new
products and skilled users now appearing this tendency to use decision trees
only for exploration now seems to be changing.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
The first step is Growing the Tree&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
The first step in the process is that of growing the
tree.&amp;nbsp; Specifically the algorithm seeks to create a tree that works as
perfectly as possible on all the data that is available.&amp;nbsp; Most of the time
it is not possible to have the algorithm work perfectly.&amp;nbsp; There is always
noise in the database to some degree (there are variables that are not being
collected that have an impact on the target you are trying to predict).&amp;nbsp;&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
The name of the game in growing the tree is in finding
the best possible question to ask at each branch point of the tree.&amp;nbsp; At the
bottom of the tree you will come up with nodes that you would like to be all of
one type or the other.&amp;nbsp; Thus the question: “Are you over 40?” probably
does not sufficiently distinguish between those who are churners and those who
are not - let’s say it is 40%/60%.&amp;nbsp; On the other hand there may be a
series of questions that do quite a nice job in distinguishing those cellular
phone customers who will churn and those who won’t.&amp;nbsp; Maybe the series of
questions would be something like: “Have you been a customer for less than a
year, do you have a telephone that is more than two years old and were you
originally landed as a customer via telesales rather than direct sales?”&amp;nbsp;
This series of questions defines a segment of the customer population in which
90% churn.&amp;nbsp; These are then relevant questions to be asking in relation to
predicting churn.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
The difference between a good question and
a bad question&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
The difference between a good question and a bad question
has to do with how much the question can organize the data - or in this case,
change the likelihood of a churner appearing in the customer segment.&amp;nbsp; If
we started off with our population being half churners and half non-churners,
then a question that did not organize the data to some degree, into one segment
that was more likely to churn than the other, would not be a very useful
question to ask.&amp;nbsp; On the other hand, if we
asked a question that was very good at distinguishing between churners and
non-churners - say one that split 100 customers into one segment of 50 churners and
another segment of 50 non-churners - then this would be considered a good
question.&amp;nbsp; In fact it would have decreased the “disorder” of the original
segment as much as was possible.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Decision tree algorithms follow a very similar process
when they build trees.&amp;nbsp; These algorithms look at all possible
distinguishing questions that could break up the original training
dataset into segments that are nearly homogeneous with respect to the different
classes being predicted. Some decision tree algorithms may use heuristics in
order to pick the questions or even pick them at random.&amp;nbsp; CART picks the
questions in a very unsophisticated way: It tries them all.&amp;nbsp; After it has
tried them all, CART picks the best one, uses it to split the data into two more
organized segments, and then again asks all possible questions on each of those
new segments individually.&lt;/div&gt;
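&lt;div class="MsoBodyText"&gt;
The "try them all" search can be sketched in a few lines of Python.&amp;nbsp; This is
a minimal illustration and not the CART implementation: it assumes categorical
predictors only, asks yes/no equality questions, and scores each question by the
weighted Gini impurity of the two resulting segments.&amp;nbsp; The function names
and the toy records are hypothetical.&lt;/div&gt;
&lt;pre&gt;
```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure segment, 0.5 for a 50/50 two-class mix."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_question(rows, labels):
    """Try every (predictor, value) equality question and keep the best one.

    rows is a list of dicts, labels the matching list of classes.
    Returns (weighted_impurity, predictor, value), or None if nothing splits.
    """
    best = None
    n = len(rows)
    for predictor in rows[0]:
        for value in sorted({row[predictor] for row in rows}):
            yes = [lab for row, lab in zip(rows, labels) if row[predictor] == value]
            no = [lab for row, lab in zip(rows, labels) if row[predictor] != value]
            if not yes or not no:
                continue  # the question does not actually split the segment
            score = (len(yes) * gini(yes) + len(no) * gini(no)) / n
            if best is None or best[0] > score:
                best = (score, predictor, value)
    return best


# Hypothetical toy data: tenure separates churners perfectly, age band does not.
rows = [
    {"tenure": "under_1yr", "age_band": "over_40"},
    {"tenure": "under_1yr", "age_band": "under_40"},
    {"tenure": "over_1yr", "age_band": "over_40"},
    {"tenure": "over_1yr", "age_band": "under_40"},
]
labels = ["churn", "churn", "stay", "stay"]
print(best_question(rows, labels))  # (0.0, 'tenure', 'over_1yr')
```
&lt;/pre&gt;
&lt;div class="MsoBodyText"&gt;
Growing a full tree would repeat the same search on each of the two segments the
winning question produces.&lt;/div&gt;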
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
When does the tree stop growing?&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
If the decision tree algorithm just continued growing the
tree like this it could conceivably create more and more questions and branches
in the tree so that eventually there was only one record in each segment. To let
the tree grow to this size is both computationally expensive and
unnecessary.&amp;nbsp; Most decision tree algorithms stop growing the tree when one
of three criteria is met:&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;
    &lt;div class="MsoListBullet" style="mso-list: l11 level1 lfo15;"&gt;
The segment
    contains only one record.&amp;nbsp; (There is no further question that you could
    ask which could further refine a segment of just one.)&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
    &lt;div class="MsoListBullet" style="mso-list: l11 level1 lfo15;"&gt;
All the records
    in the segment have identical characteristics.&amp;nbsp; (There is no reason to
    continue asking further questions since all the remaining
    records are the same.)&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
    &lt;div class="MsoListBullet" style="mso-list: l11 level1 lfo15;"&gt;
The improvement
    is not substantial enough to warrant making the split.&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
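&lt;div class="MsoBodyText"&gt;
The three stopping criteria above could be checked with a small helper such as the
following.&amp;nbsp; It is only a sketch: the function name, the way the improvement
is measured and the &lt;i&gt;min_gain&lt;/i&gt; threshold are assumptions made for
illustration, and real tools expose their own settings.&lt;/div&gt;
&lt;pre&gt;
```python
def stop_reason(rows, best_gain, min_gain=0.01):
    """Return why growth should stop for this segment, or None to keep splitting.

    rows: list of dicts holding the predictor values of the records in the segment.
    best_gain: improvement offered by the best available split (any measure).
    min_gain: hypothetical threshold below which a split is not worth making.
    """
    if len(rows) == 1:
        return "single record"
    if all(row == rows[0] for row in rows):
        return "identical records"
    if min_gain > best_gain:
        return "improvement too small"
    return None


same = {"age": 27, "eyes": "Blue"}
print(stop_reason([same], 0.5))                             # single record
print(stop_reason([same, dict(same)], 0.5))                 # identical records
print(stop_reason([same, {"age": 31, "eyes": "Brown"}], 0.001))  # improvement too small
print(stop_reason([same, {"age": 31, "eyes": "Brown"}], 0.2))    # None
```
&lt;/pre&gt;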
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
Why would a decision tree algorithm stop
growing the tree if there wasn’t enough data?&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
Consider the following example shown in Table 2.1 of a
segment that we might want to split further which has just two examples.&amp;nbsp;
It has been created out of a much larger customer database by selecting only
those customers aged 27 with blue eyes and salaries between $80,000 and $81,000.
&amp;nbsp;
&lt;/div&gt;
&lt;div align="center"&gt;

  &lt;center&gt;
  &lt;table border="1" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="border-collapse: collapse; border: none; mso-border-alt: solid gray .75pt; mso-border-insideh: .75pt solid gray; mso-border-insidev: .75pt solid gray; mso-padding-alt: 0in 5.4pt 0in 5.4pt; mso-yfti-tbllook: 183;"&gt;
    &lt;tbody&gt;
&lt;tr style="mso-yfti-irow: 0;"&gt;
      &lt;td style="background: #D9D9D9; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Name&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 37.05pt;" valign="top" width="49"&gt;
        &lt;div class="TableCell"&gt;
Age&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 42.3pt;" valign="top" width="56"&gt;
        &lt;div class="TableCell"&gt;
Eyes&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 56.55pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
Salary&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 70.05pt;" valign="top" width="93"&gt;
        &lt;div class="TableCell"&gt;
Churned?&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 1;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Steve&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 37.05pt;" valign="top" width="49"&gt;
        &lt;div class="TableCell"&gt;
27&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 42.3pt;" valign="top" width="56"&gt;
        &lt;div class="TableCell"&gt;
Blue&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.55pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
$80,000&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 70.05pt;" valign="top" width="93"&gt;
        &lt;div class="TableCell"&gt;
Yes&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 2; mso-yfti-lastrow: yes;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .65in;" valign="top" width="62"&gt;
        &lt;div class="TableCell"&gt;
Alex&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 37.05pt;" valign="top" width="49"&gt;
        &lt;div class="TableCell"&gt;
27&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 42.3pt;" valign="top" width="56"&gt;
        &lt;div class="TableCell"&gt;
Blue&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 56.55pt;" valign="top" width="75"&gt;
        &lt;div class="TableCell"&gt;
$80,000&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 70.05pt;" valign="top" width="93"&gt;
        &lt;div class="TableCell"&gt;
No&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/center&gt;
&lt;/div&gt;
&lt;div align="center" class="MsoCaption" style="text-align: center;"&gt;
Table 2.1&amp;nbsp; &lt;i&gt;Decision tree algorithm segment.&amp;nbsp; This segment cannot be split further
except by using the predictor "name".&lt;/i&gt;&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
In this case all of the predictors that a question could be
asked about (age, eyes, salary) have the same value for both customers,
except for name.&amp;nbsp; It would then be possible to ask a question like:
“Is the customer’s name Steve?” and create segments which would be
very good at breaking apart those who churned from those who did not.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
The problem is that we all have an intuition that the
name of the customer is not going to be a very good indicator of whether that
customer churns or not.&amp;nbsp; It might work well for this particular two-record
segment but it is unlikely that it will work for other customer databases or
even the same customer database at a different time.&amp;nbsp; This particular
example has to do with overfitting the model - in this case fitting the model
too closely to the idiosyncrasies of the training data.&amp;nbsp; This can be fixed
later on, but clearly stopping the building of the tree short of either one-record
segments or very small segments in general is a good idea.&lt;/div&gt;
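&lt;div class="MsoBodyText"&gt;
The segment in Table 2.1 can be written out to show why only the name could split
it, and how a minimum segment size would block that split.&amp;nbsp; The helper names
and the &lt;i&gt;min_records&lt;/i&gt; value of 20 are hypothetical choices for this
sketch, not settings taken from any particular product.&lt;/div&gt;
&lt;pre&gt;
```python
# The two records of Table 2.1.
segment = [
    {"name": "Steve", "age": 27, "eyes": "Blue", "salary": 80000, "churned": True},
    {"name": "Alex", "age": 27, "eyes": "Blue", "salary": 80000, "churned": False},
]


def splitting_predictors(records, target="churned"):
    """Predictors whose values differ inside the segment, so they could split it."""
    names = [key for key in records[0] if key != target]
    return [key for key in names if len({rec[key] for rec in records}) > 1]


def allow_split(records, min_records=20):
    """Hypothetical guard: refuse to split segments smaller than min_records."""
    return len(records) >= min_records


print(splitting_predictors(segment))  # ['name']
print(allow_split(segment))           # False
```
&lt;/pre&gt;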
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
Decision trees aren’t necessarily
finished after the tree is grown.&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
After the tree has been grown to a certain size
(depending on the particular stopping criteria used in the algorithm) the CART
algorithm has still more work to do.&amp;nbsp; The algorithm then checks to see if
the model has been overfit to the data.&amp;nbsp; It can do this using either
a cross validation approach or a test set validation approach.&amp;nbsp; Basically
it uses the same mind-numbingly simple approach it used to find the best questions
in the first place - namely trying many different simpler versions of the tree
on a held aside test set.&amp;nbsp; The tree that does the best on the held aside
data is selected by the algorithm as the best model.&amp;nbsp; The nice thing about
CART is that this testing and selection is all an integral part of the algorithm
as opposed to the after-the-fact approach that other techniques use.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
ID3 and an enhancement - C4.5&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
In the late 1970s J. Ross Quinlan introduced a decision
tree algorithm named ID3.&amp;nbsp; It was one of the first decision tree algorithms
yet at the same time built solidly on work that had been done on inference
systems and&amp;nbsp; concept learning systems from that decade as well as the
preceding decade.&amp;nbsp; Initially ID3 was used for tasks such as learning good
game playing strategies for chess end games.&amp;nbsp; Since then ID3 has been
applied to a wide variety of problems in both academia and industry and has been
modified, improved and borrowed from many times over.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
ID3 picks predictors and their splitting values based on
the gain in information that the split or splits provide.&amp;nbsp; Gain represents
the difference between the amount of information that is needed to correctly
make a prediction before a split is made and after the split has been made.&amp;nbsp;
If the amount of information required is much lower after the split is made then
that split has decreased the disorder of the original single segment.&amp;nbsp; Gain
is defined as the difference between the entropy of the original segment and the
accumulated entropies of the resulting split segments.&lt;/div&gt;
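&lt;div class="MsoBodyText"&gt;
As a worked example of gain, the sketch below computes entropy in bits and
subtracts the entropies of the split segments from the entropy of the original
segment.&amp;nbsp; It assumes the usual convention of weighting each split segment
by its share of the records; the function names are illustrative.&lt;/div&gt;
&lt;pre&gt;
```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of the class labels in one segment."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    remainder = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - remainder


parent = ["churn"] * 50 + ["stay"] * 50

# A question that separates the classes completely removes all the disorder.
print(information_gain(parent, [["churn"] * 50, ["stay"] * 50]))  # 1.0

# A question that leaves both segments half and half gains nothing.
mixed = ["churn"] * 25 + ["stay"] * 25
print(information_gain(parent, [mixed, mixed]))  # 0.0
```
&lt;/pre&gt;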
&lt;div class="MsoBodyText"&gt;
ID3 was later enhanced in the version called C4.5.&amp;nbsp;
C4.5 improves on ID3 in several important areas:&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;
    &lt;div class="MsoListBullet" style="mso-list: l11 level1 lfo15;"&gt;
predictors with
    missing values can still be used&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
    &lt;div class="MsoListBullet" style="mso-list: l11 level1 lfo15;"&gt;
predictors with
    continuous values can be used&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
    &lt;div class="MsoListBullet" style="mso-list: l11 level1 lfo15;"&gt;
pruning is
    introduced&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
    &lt;div class="MsoListBullet" style="mso-list: l11 level1 lfo15;"&gt;
rule derivation&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="MsoBodyText"&gt;
Many of these techniques, plus some others, appear in the CART algorithm,
so we will introduce them as part of the discussion of CART.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
CART - Growing a forest and picking the
best tree&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
CART stands for Classification and Regression Trees and
is a data exploration and prediction algorithm developed by Leo Breiman, Jerome
Friedman, Richard Olshen and Charles Stone and is nicely detailed in their 1984
book “Classification and Regression Trees” [Breiman, Friedman, Olshen and
Stone 1984].&amp;nbsp; These researchers from Stanford University and the
University of California at Berkeley showed how this new algorithm could be
used on a variety of different problems, including the detection of chlorine
from the data contained in a mass spectrum.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Predictors are picked as they decrease the disorder of
the data.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
In building the CART tree each predictor is picked based
on how well it teases apart the records with different predictions.&amp;nbsp;&amp;nbsp;
For instance, one measure that is used to determine whether a given split point
for a given predictor is better than another is the entropy metric.&amp;nbsp; The
measure originated from the work done by Claude Shannon and Warren Weaver on
information theory in 1949.&amp;nbsp; They were concerned with how information could
be efficiently communicated over telephone lines.&amp;nbsp; Interestingly, their
results also prove useful in creating decision trees.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
CART Automatically Validates the Tree&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
One of the great advantages of CART is that the algorithm
has the validation of the model and the discovery of the optimally general model
built deeply into the algorithm.&amp;nbsp;&amp;nbsp; CART accomplishes this by building
a very complex tree and then pruning it back to the optimally general tree based
on the results of cross validation or test set validation.&amp;nbsp;&amp;nbsp;&amp;nbsp; The
tree is pruned back based on the performance of the various pruned versions of
the tree on the test set data.&amp;nbsp; The most complex tree rarely fares the best
on the held aside data as it has been overfitted to the training data.&amp;nbsp; By
using cross validation the tree that is most likely to do well on new, unseen
data can be chosen.&lt;/div&gt;
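&lt;div class="MsoBodyText"&gt;
The selection step can be illustrated with a deliberately simplified sketch.&amp;nbsp;
It is not CART's actual pruning procedure: it assumes the candidate trees already
exist as plain prediction functions and simply keeps the one with the lowest error
on held aside records.&amp;nbsp; The two toy trees and the records are hypothetical.&lt;/div&gt;
&lt;pre&gt;
```python
def error_rate(predict, records, labels):
    """Share of held aside records that a candidate tree gets wrong."""
    wrong = sum(1 for rec, lab in zip(records, labels) if predict(rec) != lab)
    return wrong / len(records)


def pick_best_tree(candidates, records, labels):
    """candidates: list of (name, predict) pairs, simplest first.

    min() keeps the first of any tied candidates, so ties go to the simpler tree.
    """
    return min(candidates, key=lambda item: error_rate(item[1], records, labels))


def pruned_tree(rec):
    return "churn" if rec["tenure_years"] == 0 else "stay"


def full_tree(rec):
    # Overfitted: an extra branch keyed on a name seen in the training data.
    if rec["name"] == "Steve":
        return "churn"
    return pruned_tree(rec)


held_out = [
    {"name": "Steve", "tenure_years": 5},
    {"name": "Maria", "tenure_years": 0},
    {"name": "Chen", "tenure_years": 3},
]
held_out_labels = ["stay", "churn", "stay"]

candidates = [("pruned", pruned_tree), ("full", full_tree)]
print(pick_best_tree(candidates, held_out, held_out_labels)[0])  # pruned
```
&lt;/pre&gt;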
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
CART Surrogates handle missing data&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
The CART algorithm is relatively robust with respect to
missing data.&amp;nbsp; If the value is missing for a particular predictor in a
particular record that record will not be used in making the determination of
the optimal split when the tree is being built.&amp;nbsp;&amp;nbsp; In effect CART will
utilize as much information as it has on hand in order to make the decision for
picking the best possible split.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
When CART is being used&amp;nbsp; to predict on new data,
missing values can be handled via surrogates.&amp;nbsp; Surrogates are split values
and predictors that mimic the actual split in the tree and can be used when the
data for the preferred predictor is missing.&amp;nbsp; For instance though shoe size
is not a perfect predictor of height&amp;nbsp; it could be used as a surrogate to
try to mimic a split based on height when that information was missing from the
particular record being predicted with the CART model.&lt;/div&gt;
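&lt;div class="MsoBodyText"&gt;
A surrogate can be pictured as a fallback question at a node.&amp;nbsp; The sketch
below assumes numeric threshold splits and a single surrogate per node; the
predictor names, the thresholds and the default direction are all made up for
illustration.&lt;/div&gt;
&lt;pre&gt;
```python
def route(record, primary, surrogate):
    """Send a record left or right at one tree node.

    primary and surrogate are (predictor, threshold) pairs. A record goes
    "right" when its value exceeds the threshold. The surrogate is consulted
    only when the primary predictor is missing (None) for this record.
    """
    for predictor, threshold in (primary, surrogate):
        value = record.get(predictor)
        if value is not None:
            return "right" if value > threshold else "left"
    return "left"  # hypothetical default when both values are missing


primary = ("height_cm", 175)
surrogate = ("shoe_size_eu", 43)

print(route({"height_cm": 160, "shoe_size_eu": 45}, primary, surrogate))   # left
print(route({"height_cm": None, "shoe_size_eu": 45}, primary, surrogate))  # right
```
&lt;/pre&gt;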
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
CHAID&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
Another decision tree technology, as popular as CART,
is CHAID or Chi-Square Automatic Interaction Detector.&amp;nbsp; CHAID is similar to
CART in that it builds a decision tree but it differs in the way that it chooses
its splits. Instead of the entropy or Gini metrics for choosing optimal splits
the technique relies on the chi square test used in contingency tables to
determine which categorical predictor is furthest from independence with the
prediction values.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Because CHAID relies on the contingency tables to form
its test of significance for each predictor all predictors must either be
categorical or be coerced into a categorical form via binning (e.g. break up
possible people ages into 10 bins from 0-9, 10-19, 20-29 etc.).&amp;nbsp; Though
this binning can have deleterious consequences the actual accuracy performances
of CART and CHAID have been shown to be comparable in real world direct
marketing response models.&lt;/div&gt;
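&lt;div class="MsoBodyText"&gt;
The two ingredients described here, binning and the chi square statistic of a
contingency table, can be sketched as follows.&amp;nbsp; This is not a CHAID
implementation: it leaves out the significance test and the merging of categories,
and it assumes every row and column of the table has at least one record.&amp;nbsp;
The counts are invented.&lt;/div&gt;
&lt;pre&gt;
```python
def age_bin(age):
    """Coerce a numeric age into a ten-year bin label such as '20-29'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def chi_square(table):
    """Pearson chi square statistic for a contingency table given as rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


print(age_bin(27))  # 20-29

# Rows are the categories of one predictor, columns are churned / stayed.
print(chi_square([[25, 25], [25, 25]]))  # 0.0  (independent of churn)
print(chi_square([[40, 10], [10, 40]]))  # 36.0 (far from independence)
```
&lt;/pre&gt;
&lt;div class="MsoBodyText"&gt;
Comparing raw statistics like this is only fair between tables of the same shape;
a fuller treatment would convert each statistic to a significance level first.&lt;/div&gt;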
&lt;h3 style="mso-list: l10 level2 lfo21;"&gt;
2.3. Neural Networks&lt;/h3&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
What is a Neural Network?&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
When data mining algorithms are talked about these days
most of the time people are talking about either decision trees or neural
networks.&amp;nbsp; Of the two, neural networks have probably been of greater
interest through the formative stages of data mining technology.&amp;nbsp; As we
will see neural networks do have disadvantages that can be limiting in their
ease of use and ease of deployment, but they do also have some significant
advantages.&amp;nbsp; Foremost among these advantages is their highly accurate
predictive models that can be applied across a large number of different types
of problems.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
To be more precise with the term “neural network” one
might better speak of an “artificial&amp;nbsp; neural network”.&amp;nbsp; True
neural networks are biological systems (a.k.a. brains) that detect
patterns, make predictions and learn.&amp;nbsp; The artificial ones are computer
programs implementing sophisticated pattern detection and machine learning
algorithms on a computer to build predictive models from large historical
databases.&amp;nbsp; Artificial neural networks derive their name from their
historical development which started off with the premise that machines could be
made to “think” if scientists found ways to mimic the structure and
functioning of the human brain on the computer.&amp;nbsp; Thus historically neural
networks grew out of the community of Artificial Intelligence rather than from
the discipline of statistics.&amp;nbsp; Despite the fact that scientists are still
far from understanding the human brain let alone mimicking it, neural networks
that run on computers can do some of the things that people can do.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
It is difficult to say exactly when the first “neural
network” on a computer was built.&amp;nbsp; During World War II a seminal paper
was published by McCulloch and Pitts which first outlined the idea that simple
processing units (like the individual neurons in the human brain) could be
connected together in large networks to create a system that could solve
difficult problems and display behavior that was much more complex than the
simple pieces that made it up. Since that time much progress has been made in
finding ways to apply artificial neural networks to real world prediction
problems and in improving the performance of the algorithm in general.&amp;nbsp; In
many respects the greatest breakthroughs in neural networks in recent years have
been in their application to more mundane real world problems like customer
response prediction or fraud detection rather than the loftier goals that were
originally set out for the techniques such as overall human learning and
computer speech and image understanding.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
Don’t Neural Networks Learn to make
better predictions?&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
Because of the origins of the techniques and because of
some of their early successes the techniques have enjoyed a great deal of
interest.&amp;nbsp;&amp;nbsp; To understand how neural networks can detect patterns in a
database an analogy is often made that they “learn” to detect these patterns
and make better predictions in a similar way to the way that human beings do.&amp;nbsp;
This view is encouraged by the way the historical training data is often
supplied to the network - one record (example) at a time.&amp;nbsp;&amp;nbsp; Neural
networks do “learn” in a very real sense but under the hood the algorithms
and techniques that are being deployed are not truly different from the
techniques found in statistics or other data mining algorithms.&amp;nbsp; It is, for
instance, unfair to assume that neural networks could outperform other
techniques because they “learn” and improve over time while the other
techniques are static.&amp;nbsp; The other techniques in fact “learn” from
historical examples in exactly the same way, but oftentimes the examples
(historical records) to learn from are processed all at once in a more efficient
manner than in neural networks, which often modify their model one record at a time.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
Are Neural Networks easy to use?&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
A common claim for neural networks is that they are
automated to a degree where the user does not need to know that much about how
they work, or predictive modeling or even the database in order to use them.&amp;nbsp;
The implicit claim is also that most neural networks can be unleashed on your
data straight out of the box without having to rearrange or modify the data very
much to begin with.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Just the opposite is often true.&amp;nbsp; There are many
important design decisions that need to be made in order to effectively use a
neural network such as:&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;
    &lt;div class="MsoBodyText"&gt;
How should the nodes in the network be connected?&amp;nbsp;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
    &lt;div class="MsoBodyText"&gt;
How many neuron like processing units should be used?&amp;nbsp;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
    &lt;div class="MsoBodyText"&gt;
When should “training” be stopped in order to
    avoid overfitting?&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="MsoBodyText"&gt;
There are also many important steps required for
preprocessing the data that goes into a neural network - most often there is a
requirement to normalize numeric data between 0.0 and 1.0 and categorical
predictors may need to be broken up into virtual predictors that are 0 or 1 for
each value of the original categorical predictor.&amp;nbsp; And, as always,
understanding what the data in your database means and a clear definition of the
business problem to be solved are essential to ensuring eventual success.&amp;nbsp;
The bottom line is that neural networks provide no short cuts.&lt;/div&gt;
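&lt;div class="MsoBodyText"&gt;
The two preprocessing steps mentioned above can be sketched in plain Python.&amp;nbsp;
The function names are illustrative, and the handling of a constant column
(mapping it to 0.0) is an assumption of this sketch.&lt;/div&gt;
&lt;pre&gt;
```python
def min_max_scale(values):
    """Rescale numeric values linearly so the smallest is 0.0 and the largest 1.0."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # constant column: nothing to spread out
    return [(v - low) / (high - low) for v in values]


def one_hot(values):
    """Turn one categorical predictor into one 0/1 virtual predictor per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


print(min_max_scale([20, 30, 40]))  # [0.0, 0.5, 1.0]

categories, rows = one_hot(["blue", "white", "black", "blue"])
print(categories)  # ['black', 'blue', 'white']
print(rows)        # [[0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
```
&lt;/pre&gt;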
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
Applying Neural Networks to Business&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
Neural networks are very powerful predictive modeling
techniques but some of the power comes at the expense of ease of use and ease of
deployment.&amp;nbsp; As we will see in this section, neural networks create very
complex models that are almost always impossible to fully understand even by
experts.&amp;nbsp; The model itself is represented by numeric values in a complex
calculation that requires all of the predictor values to be in the form of a
number.&amp;nbsp; The output of the neural network is also numeric and needs to be
translated if the actual prediction value is categorical (e.g. predicting the
demand for blue, white or black jeans for a clothing manufacturer requires that
the values blue, black and white for color be
converted to numbers).&lt;/div&gt;
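&lt;div class="MsoBodyText"&gt;
Translating a numeric output back into a category might look like the following.&amp;nbsp;
The sketch assumes a network with one output value per category and picks the
category with the largest value; the output numbers are invented.&lt;/div&gt;
&lt;pre&gt;
```python
def decode(outputs, categories):
    """Map numeric network outputs back to the category with the largest value."""
    best_index = max(range(len(outputs)), key=lambda i: outputs[i])
    return categories[best_index]


colors = ["black", "blue", "white"]
print(decode([0.1, 0.7, 0.2], colors))  # blue
```
&lt;/pre&gt;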
&lt;div class="MsoBodyText"&gt;
Because of the complexity of these techniques much effort
has been expended in trying to increase the clarity with which the model can be
understood by the end user.&amp;nbsp;&amp;nbsp; These efforts are still in their infancy
but are of tremendous importance since most data mining techniques including
neural networks are being deployed against real business problems where
significant investments are made based on the predictions from the models (e.g.
consider trusting the predictive model from a neural network that dictates which
one million customers will receive a $1 mailing).&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
There are two ways that these shortcomings in
understanding the meaning of the neural network model have been successfully
addressed:&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;
    &lt;div class="MsoListBullet" style="mso-list: l11 level1 lfo15;"&gt;
The neural
    network is packaged up into a complete solution such as fraud prediction.&amp;nbsp;
    This allows the neural network to be carefully crafted for one particular
    application and once it has been proven successful it can be used over and
    over again without requiring a deep understanding of how it works.&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
    &lt;div class="MsoListBullet" style="mso-list: l11 level1 lfo15;"&gt;
The neural
    network is packaged up with expert consulting services.&amp;nbsp; Here the neural
    network is deployed by trusted experts who have a track record of success.&amp;nbsp;
    Either the experts are able to explain the models or they are trusted that
    the models do work.&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="MsoBodyText"&gt;
The first tactic has seemed to work quite well because
when the technique is used for a well defined problem many of the difficulties
in preprocessing the data can be automated (because the data structures have
been seen before) and interpretation of the model is less of an issue since
entire industries begin to use the technology successfully and a level of trust
is created.&amp;nbsp; There are several vendors who have deployed this strategy
(e.g. HNC’s Falcon system for credit card fraud prediction and Advanced
Software Applications’ ModelMAX package for direct marketing).&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Packaging up neural networks with expert consultants is
also a viable strategy that avoids many of the pitfalls of using neural
networks, but it can be quite expensive because it is human intensive.&amp;nbsp; One
of the great promises of data mining is, after all, the automation of the
predictive modeling process.&amp;nbsp; These neural network consulting teams are
little different from the analytical departments many companies already have in
house.&amp;nbsp; Since there is not a great difference in the overall predictive
accuracy of neural networks over standard statistical techniques the main
difference becomes the replacement of the statistical expert with the neural
network expert. Either with statistics or neural network experts the value of
putting easy to use tools into the hands of the business end user is still not
achieved.&amp;nbsp;&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
Where to Use Neural Networks&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
Neural networks are used in a wide variety of
applications.&amp;nbsp; They have been used in all facets of business from detecting&amp;nbsp;
the fraudulent use of credit cards and credit risk prediction to increasing the
hit rate of targeted mailings.&amp;nbsp; They also have a long history of
application in other areas, ranging from military uses such as the automated driving of an
unmanned vehicle at 30 miles per hour on paved roads to biological simulations
such as learning the correct pronunciation of English words from written text.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
Neural Networks for clustering&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
Neural networks of various kinds can be used for
clustering and prototype creation.&amp;nbsp; The Kohonen network described in this
section is probably the most common network used for clustering and segmentation
of the database.&amp;nbsp; Typically the networks are used in an unsupervised
learning mode to create the clusters.&amp;nbsp; The clusters are created by forcing
the system to compress the data by creating prototypes or by algorithms that
steer the system toward creating clusters that compete against each other for
the records that they contain, thus ensuring that the clusters overlap as little
as possible.&lt;/div&gt;
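&lt;div class="MsoBodyText"&gt;
The idea of prototypes competing for records can be sketched with a bare
winner-take-all update.&amp;nbsp; This is a simplification and not a full Kohonen
network: it leaves out the neighborhood function that also moves the winner's
neighbors, and the learning rate, epoch count, starting prototypes and records are
arbitrary values chosen for illustration.&lt;/div&gt;
&lt;pre&gt;
```python
def nearest(prototypes, record):
    """Index of the prototype closest to the record (squared Euclidean distance)."""
    dists = [sum((p - x) ** 2 for p, x in zip(proto, record)) for proto in prototypes]
    return dists.index(min(dists))


def train(prototypes, records, rate=0.5, epochs=20):
    """Unsupervised competitive learning: only the winning prototype moves."""
    protos = [list(p) for p in prototypes]
    for _ in range(epochs):
        for record in records:
            w = nearest(protos, record)
            protos[w] = [p + rate * (x - p) for p, x in zip(protos[w], record)]
    return protos


records = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
trained = train([[0.2, 0.2], [0.8, 0.8]], records)

# Each record is assigned to the cluster of its nearest prototype.
print([nearest(trained, r) for r in records])  # [0, 0, 1, 1]
```
&lt;/pre&gt;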
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
Neural Networks for Outlier Analysis&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
Sometimes clustering is performed not so much to keep
records together as to make it easier to see when one record sticks out from the
rest.&amp;nbsp; For instance:&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Most wine distributors that sell inexpensive wine in Missouri
and ship a certain volume of product produce a certain level of profit.&amp;nbsp;
There is a cluster of stores that can be formed with these characteristics.&amp;nbsp;
One store stands out, however, as producing significantly lower profit.&amp;nbsp;&amp;nbsp;
On closer examination it turns out that the distributor was delivering product
to but not collecting payment from one of their customers.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
A sale on men’s suits is being held in all branches of
a department store for southern California.&amp;nbsp;&amp;nbsp; All stores with these characteristics have seen at least a
100% jump in revenue since the start of the sale except one.&amp;nbsp; It turns out
that this store had, unlike the others, advertised via radio rather than
television.&lt;/div&gt;
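&lt;div class="MsoBodyText"&gt;
Once a cluster has been formed, spotting the record that sticks out can be as
simple as measuring how far each member lies from the cluster center.&amp;nbsp; The
sketch below uses plain statistics as a stand-in rather than a neural network, and
the store names, profit figures and threshold of two standard deviations are
invented for illustration.&lt;/div&gt;
&lt;pre&gt;
```python
import statistics


def outliers(values_by_name, threshold=2.0):
    """Names whose value lies more than threshold standard deviations from the mean."""
    values = list(values_by_name.values())
    center = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # every member is identical, so nothing sticks out
    return [
        name
        for name, value in values_by_name.items()
        if abs(value - center) / spread > threshold
    ]


# Hypothetical profit per store within one cluster of similar stores.
profits = {
    "store_a": 100, "store_b": 102, "store_c": 98, "store_d": 101,
    "store_e": 99, "store_f": 100, "store_g": 103, "store_h": 40,
}
print(outliers(profits))  # ['store_h']
```
&lt;/pre&gt;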
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
Neural Networks for feature extraction&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
One of the important problems in all of data mining is
that of determining which predictors are the most relevant and the most
important in building models that are most accurate at prediction.&amp;nbsp; These
predictors may be used by themselves or they may be used in conjunction with
other predictors to form “features”.&amp;nbsp; A simple example of a feature in
problems that neural networks are working on is the feature of a vertical line
in a computer image.&amp;nbsp; The predictors, or raw input data, are just the
colored pixels that make up the picture.&amp;nbsp; Recognizing that the predictors
(pixels) can be organized in such a way as to create lines, and then using the
line as the input predictor can prove to dramatically improve the accuracy of
the model and decrease the time to create it.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Some features, like lines in computer images, are things
that humans are already pretty good at detecting; in other problem domains it is
more difficult to recognize the features.&amp;nbsp; One novel way that neural
networks have been used to detect features relies on the idea that features are a sort of
compression of the training database. For instance you could describe an image
to a friend by rattling off the color and intensity of each pixel on every point
in the picture or you could describe it at a higher level in terms of lines,
circles - or maybe even at a higher level of features such as trees, mountains
etc.&amp;nbsp; In either case your friend eventually gets all the information that
they need in order to know what the picture looks like, but certainly describing
it in terms of high level features requires much less communication of
information than the “paint by numbers” approach of describing the color on
each square millimeter of the image.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
If we think of features in this way, as an efficient way
to communicate our data, then neural networks can be used to automatically
extract them.&amp;nbsp;&amp;nbsp; The neural network shown in Figure 2.2 is used to
extract features by requiring the network to learn to recreate the input data at
the output nodes by using just 5 hidden nodes.&amp;nbsp; Consider that if you were
allowed 100 hidden nodes, recreating the data for the network would be
rather trivial - simply pass the input node value directly through the
corresponding hidden node and on to the output node.&amp;nbsp; But as there are
fewer and fewer hidden nodes, that information has to be passed through the
hidden layer in a more and more efficient manner since there are fewer hidden
nodes to help pass along the information. 
&lt;/div&gt;
&lt;div align="center" class="MsoBodyText"&gt;
&lt;img border="0" height="314" src="http://www.thearling.com/text/dmtechniques/dmtech9.gif" width="553" /&gt; 
&lt;/div&gt;
&lt;div align="center" class="MsoCaption" style="text-align: center;"&gt;
Figure 2.2 &lt;i&gt; Neural
networks can be used for data compression and feature extraction.&lt;/i&gt;&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
In order to accomplish this the neural network tries to
have the hidden nodes extract features from the input nodes that efficiently
describe the record represented at the input layer.&amp;nbsp; This forced
“squeezing” of the data through the narrow hidden layer forces the neural
network to extract only those predictors and combinations of predictors that are
best at recreating the input record.&amp;nbsp; The link weights used to create the
inputs to the hidden nodes are effectively creating features that are
combinations of the input nodes values.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
What does a neural net look like?&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
A neural network is loosely based on how some people
believe that the human brain is organized and how it learns.&amp;nbsp; Given that,
there are two main structures of consequence in the neural network:&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
The node - which loosely corresponds to the neuron in the
human brain.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
The link - which loosely corresponds to the connections
between neurons (axons, dendrites and synapses) in the human brain.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
In Figure 2.3 there is a drawing of a simple neural
network.&amp;nbsp; The round circles represent the nodes and the connecting lines
represent the links.&amp;nbsp; The neural network functions by accepting predictor
values at the left and performing calculations on those values to produce new
values in the node at the far right.&amp;nbsp; The value at this node represents the
prediction from the neural network model.&amp;nbsp; In this case the network takes
in values for predictors for age and income and predicts whether the person will
default on a bank loan.&lt;/div&gt;
&lt;div align="center" class="MsoBodyText"&gt;

&lt;img border="0" height="184" src="http://www.thearling.com/text/dmtechniques/dmtech10.gif" width="432" /&gt;
&lt;/div&gt;
&lt;div align="center" class="MsoCaption" style="text-align: center;"&gt;
Figure 2.3 &lt;i&gt; A
simplified view of a neural network for prediction of loan default.&lt;/i&gt;&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
How does a neural net make a prediction?&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
In order to make a prediction the neural network accepts
the values for the predictors on what are called the input nodes.&amp;nbsp; These
become the values for those nodes. Those values are then multiplied by values
that are stored in the links (sometimes called weights and in some ways similar to
the weights that were applied to predictors in the nearest neighbor method).&amp;nbsp;
The products are then added together at the node at the far right (the output
node),&amp;nbsp;a special thresholding function is applied, and the resulting number
is the prediction.&amp;nbsp; In this case if the resulting number is 0 the record is
considered to be a good credit risk (no default); if the number is 1 the record
is considered to be a bad credit risk (likely default).&amp;nbsp;&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
A simplified version of the calculations made in Figure 2.3
might look like what is shown in Figure 2.4.&amp;nbsp; Here the age of 47 is
normalized to fall between 0.0 and 1.0 and has the value 0.47 and the income is
normalized to the value 0.65. This simplified neural network makes the
prediction of no default for a 47 year old making $65,000.&amp;nbsp; The links are
weighted at 0.7 and 0.1 and the resulting value after multiplying the node
values by the link weights and adding the products (0.47 × 0.7 + 0.65 × 0.1) is 0.39.&amp;nbsp; The network has been trained to learn
that an output value of 1.0 indicates default and that 0.0 indicates
non-default.&amp;nbsp; The output value calculated here (0.39) is closer to 0.0 than
to 1.0 so the record is assigned a non-default prediction.&lt;/div&gt;
&lt;div align="center" class="MsoNormal" style="page-break-after: avoid; text-align: center;"&gt;

&lt;img border="0" height="317" src="http://www.thearling.com/text/dmtechniques/dmtech11.gif" width="471" /&gt;
&lt;/div&gt;
&lt;div align="center" class="MsoCaption" style="text-align: center;"&gt;
Figure 2.4 &lt;i&gt; The
normalized input values are multiplied by the link weights and added together at
the output.&lt;/i&gt;&lt;/div&gt;
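&lt;div class="MsoBodyText"&gt;
The arithmetic in Figure 2.4 can be sketched in a few lines of Python.&amp;nbsp; This is only an illustration: the function name, the normalization by 100 and 100,000 (chosen because it reproduces the values 0.47 and 0.65) and the 0.5 cut-off standing in for "closer to 0.0 than to 1.0" are all assumptions, not part of the original model.&lt;/div&gt;

```python
def predict_default(age, income, w_age=0.7, w_income=0.1, threshold=0.5):
    # Assumed normalization: divide so that 47 -> 0.47 and 65000 -> 0.65
    x_age = age / 100.0
    x_income = income / 100000.0
    # Multiply each input node value by its link weight and add at the output node
    output = x_age * w_age + x_income * w_income
    # Assumed thresholding: outputs nearer 1.0 mean default, nearer 0.0 mean no default
    label = "default" if output >= threshold else "no default"
    return output, label


output, label = predict_default(47, 65000)
print(round(output, 2), label)
```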
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
How is the neural net model created?&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
The neural network model is created by presenting it with
many examples of the predictor values from records in the training set (in this
example age and income are used) and the prediction value from those same
records.&amp;nbsp; By comparing the correct answer obtained from the training record
and the predicted answer from the neural network it is possible to slowly change
the behavior of the neural network by changing the values of the link weights.&amp;nbsp;
In some ways this is like having a grade school teacher ask questions of her
student (a.k.a. the neural network) and, if the answer is wrong, verbally
correct the student.&amp;nbsp; The greater the error the harsher the verbal
correction, so that large errors are given greater attention at correction
than are small errors.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
For the actual neural network it is the weights of the
links that actually control the prediction value for a given record.&amp;nbsp; Thus
the particular model that is being found by the neural network is in fact fully
determined by the weights and the architectural structure of the network.&amp;nbsp;
For this reason it is the link weights that are modified each time an error is
made.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
How complex can the neural network model
become?&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
The models shown in the figures above have been designed
to be as simple as possible in order to make them understandable. In practice no
networks are as simple as these. Networks with many more links and many more
nodes are possible.&amp;nbsp; This was the case in the architecture of a neural
network system called NETtalk that learned how to pronounce written English
words.&amp;nbsp; Each node in this network was connected to every node in the level
above it and below it resulting in 18,629 link weights that needed to be learned
in the network.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
In this network there was a row of nodes in between the
input nodes and the output nodes.&amp;nbsp; These are called hidden nodes or the
hidden layer because the values of these nodes are not visible to the end user
the way that the values of the output nodes (which contain the prediction) and the input
nodes (which just contain the predictor values) are.&amp;nbsp; There are even more
complex neural network architectures that have more than one hidden layer.&amp;nbsp;
In practice, however, one hidden layer seems to suffice.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
Hidden nodes are like trusted advisors to
the output nodes&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
The meaning of the input nodes and the output nodes is
usually pretty well understood - and is usually defined by the end user based
on the particular problem to be solved and the nature and structure of the
database.&amp;nbsp; The hidden nodes, however, do not have a predefined meaning and
are determined by the neural network as it trains.&amp;nbsp;&amp;nbsp; This poses two
problems:&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;
    &lt;div class="MsoListBullet" style="mso-list: l11 level1 lfo15;"&gt;
It is difficult
    to trust the prediction of the neural network if the meaning of these nodes
    is not well understood.&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
    &lt;div class="MsoListBullet" style="mso-list: l11 level1 lfo15;"&gt;
Since the
    prediction is made at the output layer and the difference between the
    prediction and the actual value is calculated there, how is this error
    correction fed back through the hidden layers to modify the link weights
    that connect them?&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="MsoBodyText"&gt;
The meaning of these hidden nodes is not necessarily well
understood, but sometimes after the fact they can be examined to see when they
are active and when they are not, and some meaning can be derived from them.&amp;nbsp;&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
The learning that goes on in the hidden
nodes.&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
The learning procedure for the neural network has been
defined to work for the weights in the links connecting the hidden layer.&amp;nbsp;&amp;nbsp;
A good metaphor for how this works is to think of a military operation in some
war where there are many layers of command with a general ultimately responsible
for making the decisions on where to advance and where to retreat.&amp;nbsp;&amp;nbsp;
The general probably has several lieutenant generals advising him and each
lieutenant general probably has several major generals advising him.&amp;nbsp;&amp;nbsp;
This hierarchy continues downward through colonels, with privates at the bottom
of the hierarchy.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
This is not too far from the structure of a neural
network with several hidden layers and one output node.&amp;nbsp; You can think of
the inputs coming from the hidden nodes as advice.&amp;nbsp; The link weight
corresponds to the trust that the general has in his advisors.&amp;nbsp; Some
trusted advisors have very high weights and some advisors may not be trusted and
in fact have negative weights.&amp;nbsp; The other part of the advice from the
advisors has to do with how competent the particular advisor is for a given
situation.&amp;nbsp; The general may have a trusted advisor, but if that advisor has
no expertise in aerial invasion and the question at hand has to do with a
situation involving the air force, then however well trusted the advisor is, the
advisor himself may not have any strong opinion one way or another.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
In this analogy the link weight of a neural network to an
output unit is like the trust or confidence that a commander has in his advisors
and the actual node value represents how strong an opinion this particular
advisor has about this particular situation.&amp;nbsp; To make a decision the
general considers how trustworthy and valuable the advice is and how
knowledgeable and confident each advisor is in making their suggestion and then
taking all of this into account the general makes the decision to advance or
retreat.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
In the same way the output node will make a decision (a
prediction) by taking into account all of the input from its advisors (the nodes
connected to it).&amp;nbsp; In the case of the neural network this decision is reached
by multiplying the link weight by the output value of the node and summing these
values across all nodes.&amp;nbsp; If the prediction is incorrect the nodes that had
the most influence on making the decision have their weights modified so that
the wrong prediction is less likely to be made the next time.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
This learning in the neural network is very similar to
what happens when the wrong decision is made by the general.&amp;nbsp; The
confidence that the general has in all of those advisors that gave the wrong
recommendation is decreased - and all the more so for those advisors who were
very confident and vocal in their recommendation.&amp;nbsp; On the other hand any
advisors who were making the correct recommendation but whose input was not
taken as seriously would be taken more seriously the next time.&amp;nbsp; Likewise
any advisor that was reprimanded for giving the wrong advice to the general
would then go back to his advisors and determine which of them he had trusted
more than he should have in making his recommendation and who he should have
listened more closely to.&amp;nbsp;&amp;nbsp;&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
Sharing the blame and the glory throughout
the organization&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
This feedback can continue in this way down throughout
the organization - at each level giving increased emphasis to those advisors who
had advised correctly and decreased emphasis to those who had advised
incorrectly.&amp;nbsp; In this way the entire organization becomes better and better
at supporting the general in making the correct decision more of the time.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
A very similar method of training takes place in the
neural network.&amp;nbsp; It is called “back propagation” and refers to the
propagation of the error backwards from the output nodes (where the error is
easy to determine as the difference between the actual value from the
training database and the prediction from the neural network) through the
hidden layers toward the input layer.&amp;nbsp; At each level the link weights
between the layers are updated so as to decrease the chance of making the same
mistake again.&lt;/div&gt;
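&lt;div class="MsoBodyText"&gt;
A minimal sketch of one way this blame-sharing can be written down is given here, assuming a tiny network with two inputs, two hidden nodes and one output node, sigmoid activations, a squared-error measure and no bias terms.&amp;nbsp; The starting weights, the learning rate and the function names are invented for illustration; real systems differ in many details.&lt;/div&gt;

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    # Each hidden node sums its weighted inputs; the output node sums the weighted hidden values
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output


def backprop_step(x, target, w_hidden, w_out, rate=0.5):
    hidden, output = forward(x, w_hidden, w_out)
    # Error signal at the output node, where prediction and actual value can be compared
    delta_out = (output - target) * output * (1.0 - output)
    # Blame passed back to each hidden node in proportion to its link weight
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    # Gradient descent update of the link weights at both levels
    new_w_out = [w - rate * delta_out * h for w, h in zip(w_out, hidden)]
    new_w_hidden = [[w - rate * d * xi for w, xi in zip(ws, x)]
                    for ws, d in zip(w_hidden, delta_hidden)]
    return new_w_hidden, new_w_out


# Hypothetical record (normalized age and income) whose correct answer is 0.0 (no default)
x, target = [0.47, 0.65], 0.0
w_hidden = [[0.3, -0.2], [0.5, 0.4]]
w_out = [0.6, -0.1]

_, before = forward(x, w_hidden, w_out)
for _ in range(100):
    w_hidden, w_out = backprop_step(x, target, w_hidden, w_out)
_, after = forward(x, w_hidden, w_out)
print(before, after)
```

&lt;div class="MsoBodyText"&gt;
After repeated corrections on this single record the output moves toward the target value, which is the behavior the general-and-advisors analogy describes.&lt;/div&gt;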
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
Different types of neural networks&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
There are literally hundreds of variations on the back
propagation feedforward neural networks that have been briefly described here.&amp;nbsp;
Most have to do with changing the architecture of the neural network to
include recurrent connections where the output from the output layer is
connected back as input into the hidden layer.&amp;nbsp; These recurrent nets are
sometimes used for sequence prediction where the previous outputs from the
network need to be stored someplace and then fed back into the network to
provide context for the current prediction.&amp;nbsp; Recurrent networks have also
been used for decreasing the amount of time that it takes to train the neural
network.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Another twist on the neural net theme is to change the
way that the network learns.&amp;nbsp; Back propagation is effectively utilizing a
search technique called gradient descent&amp;nbsp; to search for the best possible
improvement in the link weights to reduce the error.&amp;nbsp; There are, however,
many other ways of doing search in a high dimensional space including Newton’s
method and conjugate gradient as well as simulating the physics of&amp;nbsp;
cooling metals in a process called simulated annealing or in simulating the
search process that goes on in biological evolution and using genetic algorithms
to optimize the weights of the neural networks.&amp;nbsp; It has even been suggested&amp;nbsp;
that creating a large number of neural networks with randomly weighted links and
picking the one with the lowest error rate would be the best learning procedure.&amp;nbsp;&amp;nbsp;&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Despite all of these choices, the back propagation
learning procedure is the most commonly used.&amp;nbsp; It is well understood,
relatively simple, and seems to work in a large number of problem domains.&amp;nbsp;
There are, however, two other neural network architectures that are used
relatively often.&amp;nbsp; Kohonen feature maps are often used for unsupervised
learning and clustering and Radial Basis Function networks are used for
supervised learning and in some ways represent a hybrid between nearest neighbor
and neural network classification.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
Kohonen Feature Maps&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
Kohonen feature maps were developed in the 1970’s and
were originally created to simulate certain brain functions.&amp;nbsp; Today they are
used mostly to perform unsupervised learning and clustering.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Kohonen networks are feedforward neural networks
generally with no hidden layer.&amp;nbsp; The networks generally contain only an
input layer and an output layer but the nodes in the output layer compete
amongst themselves to display the strongest activation to a given record,&amp;nbsp;
which is sometimes called “winner take all”.&amp;nbsp;&amp;nbsp;&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
The networks originally came about when some of the
puzzling yet simple behaviors of real neurons were taken into account,&amp;nbsp;
namely that the physical locality of the neurons seems to play an important role in
the behavior and learning of neurons.&amp;nbsp;&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
When these networks were run in order to simulate the
real world visual system, it became clear that the organization that was automatically
being constructed on the data was also very useful for segmenting and clustering
the training database.&amp;nbsp; Each output node represented a cluster and nearby
clusters were nearby in the two dimensional output layer.&amp;nbsp; Each record in
the database would fall into one and only one cluster (the most active output
node) but the other clusters in which it might also fit would be shown and
likely to be next to the best matching cluster.&amp;nbsp;&lt;/div&gt;
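&lt;div class="MsoBodyText"&gt;
The "winner take all" competition and the role of locality can be sketched as follows.&amp;nbsp; For brevity this example assumes a one-dimensional row of three output nodes rather than a two dimensional layer, squared Euclidean distance as the measure of activation, and made-up learning rates; none of these choices come from the original description.&lt;/div&gt;

```python
def best_matching_node(record, prototypes):
    # The winner is the output node whose weight vector is closest to the record
    distances = [sum((p - r) ** 2 for p, r in zip(proto, record))
                 for proto in prototypes]
    return distances.index(min(distances))


def train_step(record, prototypes, rate=0.5, radius=1):
    winner = best_matching_node(record, prototypes)
    for i, proto in enumerate(prototypes):
        # Only the winner and its neighbors on the output layer are allowed to learn
        if abs(i - winner) > radius:
            continue
        # Assumed schedule: neighbors move half as far as the winner
        strength = rate if i == winner else rate / 2.0
        prototypes[i] = [p + strength * (r - p) for p, r in zip(proto, record)]
    return winner


# Three output nodes, each holding a prototype for a two-predictor record
prototypes = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
winner = train_step([0.9, 0.8], prototypes)
print(winner, prototypes)
```

&lt;div class="MsoBodyText"&gt;
The record is assigned to the single winning node, while the neighboring node is also pulled toward it, which is why similar clusters end up next to each other.&lt;/div&gt;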
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
How much like a human brain is the neural
network?&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
Since the inception of the idea of neural networks the
ultimate goal for these techniques has been to have them recreate human thought
and learning.&amp;nbsp; This has once again proved to be a difficult task - despite
the power of these new techniques and the similarities of their architecture to
that of the human brain.&amp;nbsp; Many of the things that people take for granted
are difficult for neural networks - like avoiding overfitting and working with
real world data without a lot of preprocessing required.&amp;nbsp; There have also
been some exciting successes.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
Combatting overfitting - getting a model
you can use somewhere else&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
As with all predictive modeling techniques some care must
be taken to avoid overfitting with a neural network.&amp;nbsp; Neural networks can
be quite good at overfitting training data with a predictive model that does not
work well on new data.&amp;nbsp; This is particularly problematic for neural
networks because it is difficult to understand how the model is working.&amp;nbsp;
In the early days of neural networks the predictive accuracy that was often
mentioned first was the accuracy on the training set, and the accuracy on the vaulted or
validation set database was reported as a footnote.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
This is in part due to the fact that unlike decision
trees or nearest neighbor techniques, which can quickly achieve 100% predictive
accuracy on the training database, neural networks can be trained forever and
still not be 100% accurate on the training set.&amp;nbsp; While this is an
interesting fact it is not terribly relevant since the accuracy on the training
set is of little interest and can have little bearing on the validation database
accuracy.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Perhaps because overfitting was more obvious for decision
trees and nearest neighbor approaches more effort was placed earlier on to add
pruning and editing to these techniques.&amp;nbsp; For neural networks
generalization of the predictive model is accomplished via rules of thumb and
sometimes in a more methodical way by using cross validation as is done with
decision trees.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
One way to control overfitting in neural networks is to
limit the number of links.&amp;nbsp; Since the number of links represents the
complexity of the model that can be produced, and since more complex models have
the ability to overfit while less complex ones cannot, overfitting can be
controlled by simply limiting the number of links in the neural network.&amp;nbsp;
Unfortunately there are no good theoretical grounds for picking a certain number
of links.&amp;nbsp;&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Test set validation can be used to avoid overfitting by
building the neural network on one portion of the training database and using
the other portion of the training database to detect what the predictive
accuracy is on vaulted data.&amp;nbsp; This accuracy will peak at some point in the
training and then as training proceeds it will decrease while the accuracy on
the training database will continue to increase.&amp;nbsp; The link weights for the
network can be saved when the accuracy on the held aside data peaks.&amp;nbsp; The
NeuralWare product, and others, provide an automated function that saves out the
network when it is best performing on the test set and even continues to search
after the minimum is reached.&lt;/div&gt;
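&lt;div class="MsoBodyText"&gt;
The idea of saving the network at the point where held aside accuracy peaks can be sketched like this.&amp;nbsp; The per-epoch accuracy figures are invented purely for illustration, and the helper name is hypothetical; this is not the NeuralWare function mentioned above.&lt;/div&gt;

```python
def best_checkpoint(validation_accuracy):
    # Return the training epoch at which accuracy on the held aside data peaked
    best_epoch = max(range(len(validation_accuracy)),
                     key=lambda e: validation_accuracy[e])
    return best_epoch, validation_accuracy[best_epoch]


# Hypothetical accuracies recorded after each pass through the training data
train_acc = [0.60, 0.70, 0.78, 0.85, 0.90, 0.94]  # keeps rising
valid_acc = [0.58, 0.66, 0.72, 0.74, 0.71, 0.69]  # peaks, then falls

epoch, acc = best_checkpoint(valid_acc)
print(epoch, acc)  # the link weights saved at this epoch would be kept
```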
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
Explaining the network&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
One of the indictments against neural networks is that it
is difficult to understand the model that they have built and also how the raw
data affects the output predictive answer.&amp;nbsp; With nearest neighbor
techniques prototypical records are provided to “explain” why the prediction
is made, and decision trees provide rules that can be translated into English
to explain why a particular prediction was made for a particular record.&amp;nbsp;&amp;nbsp;
The complex models of the neural network are captured solely by the link weights
in the network which represent a very complex mathematical equation.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
There have been several attempts to alleviate these basic
problems of the neural network.&amp;nbsp; The simplest approach is to actually look
at the neural network and try to create plausible explanations for the meanings
of the hidden nodes.&amp;nbsp; Sometimes this can be done quite successfully.&amp;nbsp;
In the example given at the beginning of this section the hidden nodes of the
neural network seemed to have extracted important distinguishing features in
predicting the relationship between people by extracting information like
country of origin,&amp;nbsp; features that it would seem that a person would also
extract and use for the prediction.&amp;nbsp; But there were also many other hidden
nodes, even in this particular example, that were hard to explain and didn’t
seem to have any particular purpose,&amp;nbsp;except that they aided the neural
network in making the correct prediction.
&lt;/div&gt;
&lt;h3 style="mso-list: l10 level2 lfo21;"&gt;
2.4. Rule Induction&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
Rule induction is one of the major forms of data mining
and is perhaps the most common form of knowledge discovery in unsupervised
learning systems.&amp;nbsp; It is also perhaps the form of data mining that most
closely resembles the process that most people think about when they think about
data mining, namely “mining” for gold through a vast database.&amp;nbsp; The
gold in this case would be a rule that is interesting - that tells you something
about your database that you didn’t already know and probably weren’t able
to explicitly articulate (aside from saying “show me things that are
interesting”).&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Rule induction on a database can be a massive
undertaking where all possible patterns are systematically pulled out of the
data and then an accuracy and significance are added to them that tell the user
how strong the pattern is and how likely it is to occur again.&amp;nbsp; In general
these rules are relatively simple. For example, in a market basket database of items
scanned in consumer market baskets you might find interesting correlations
such as:&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;
    &lt;div class="MsoListBullet" style="mso-list: l11 level1 lfo15;"&gt;
If bagels are
    purchased then cream cheese is purchased 90% of the time and this pattern
    occurs in 3% of all shopping baskets.&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
    &lt;div class="MsoListBullet" style="mso-list: l11 level1 lfo15;"&gt;
If live plants
    are purchased from a hardware store then plant fertilizer is purchased 60%
    of the time and these two items are bought together in 6% of the shopping
    baskets.&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="MsoBodyText"&gt;
The rules that are pulled from the database are extracted
and ordered to be presented to the user based on the percentage of times that
they are correct and how often they apply.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
The bane of rule induction systems is also their strength -
that they retrieve all possible interesting patterns in the database.&amp;nbsp; This
is a strength in the sense that it leaves no stone unturned but it can also be
viewed as a weakness because the user can easily become overwhelmed with such a
large number of rules that it is difficult to look through all of them.&amp;nbsp;
You almost need a second pass of data mining to go through the list of
interesting rules that have been generated by the rule induction system in the
first place in order to find the most valuable gold nugget amongst them all.
This overabundance of patterns can also be problematic for the simple task of
prediction: because all possible patterns are culled from the database, there may
be conflicting predictions made by equally interesting rules.&amp;nbsp;&amp;nbsp;
Automating the process of culling the most interesting rules and of combining the
recommendations of a variety of rules is well handled by many of the
commercially available rule induction systems on the market today and is also an
area of active research.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
Applying Rule induction to Business&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
Rule induction systems are highly automated and are
probably the best of data mining techniques for exposing all possible predictive
patterns in a database.&amp;nbsp; They can be modified for use in prediction
problems but the algorithms for combining evidence from a variety of rules come
more from rules of thumb and practical experience.&amp;nbsp;&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
In comparing data mining techniques along an axis of
explanation, neural networks would be at one extreme of the data mining
algorithms and rule induction systems at the other end.&amp;nbsp; Neural networks
are extremely proficient at saying exactly what must be done in a prediction
task (e.g. who do I give credit to /&amp;nbsp; who do I deny credit to) with little
explanation.&amp;nbsp; Rule induction systems when used for prediction on the other
hand are like having a committee of trusted advisors each with a slightly
different opinion as to what to do but relatively well grounded reasoning and a
good explanation for why it should be done.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
The business value of rule induction techniques reflects
the highly automated way in which the rules are created, which makes it easy to
use the system, but also the fact that this approach can suffer from an overabundance of
interesting patterns, which can make it complicated to make a prediction
that is directly tied to return on investment (ROI).&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
What is a rule?&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
In rule induction systems the rule itself is of a simple
form of “if this and this and this then this”.&amp;nbsp; For example a rule that
a supermarket might find in their data collected from scanners would be: “if
pickles are purchased then ketchup is purchased”.&amp;nbsp; Or&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;
    &lt;div class="MsoBodyText"&gt;
If paper plates then plastic forks&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
    &lt;div class="MsoBodyText"&gt;
If dip then potato chips&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
    &lt;div class="MsoBodyText"&gt;
If salsa then tortilla chips&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="MsoBodyText"&gt;
In order for the rules to be useful there are two pieces
of information that must be supplied as well as the actual rule:&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;
    &lt;div class="MsoBodyText"&gt;
Accuracy - How often is the rule correct?&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
    &lt;div class="MsoBodyText"&gt;
Coverage - How often does the rule apply?&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="MsoBodyText"&gt;
Just because the pattern in the database is expressed as
a rule does not mean that it is true all the time.&amp;nbsp; Thus just like in other
data mining algorithms it is important to recognize and make explicit the
uncertainty in the rule.&amp;nbsp; This is what the accuracy of the rule means.&amp;nbsp;
The coverage of the rule has to do with how much of the database the rule
“covers” or applies to.&amp;nbsp; Examples of these two measures for a variety of
rules are shown in Table 2.2.&amp;nbsp;&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
In some cases accuracy is called the confidence of the
rule and coverage is called the support.&amp;nbsp; Accuracy and coverage appear to
be the preferred ways of naming these two measurements.&amp;nbsp;&lt;br style="mso-special-character: line-break;" /&gt;
&lt;/div&gt;
&lt;div align="center"&gt;

  &lt;center&gt;
  &lt;table border="1" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="border-collapse: collapse; border: none; mso-border-alt: solid gray .75pt; mso-border-insideh: .75pt solid gray; mso-border-insidev: .75pt solid gray; mso-padding-alt: 0in 5.4pt 0in 5.4pt; mso-yfti-tbllook: 183;"&gt;
    &lt;tbody&gt;
&lt;tr style="mso-yfti-irow: 0;"&gt;
      &lt;td style="background: #D9D9D9; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 297.9pt;" valign="top" width="397"&gt;
        &lt;div class="TableCell"&gt;
Rule&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 72.45pt;" valign="top" width="97"&gt;
        &lt;div class="TableCell"&gt;
Accuracy&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 72.45pt;" valign="top" width="97"&gt;
        &lt;div class="TableCell"&gt;
Coverage&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 1;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 297.9pt;" valign="top" width="397"&gt;
        &lt;div class="TableCell"&gt;
If breakfast cereal purchased then milk purchased.&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 72.45pt;" valign="top" width="97"&gt;
        &lt;div class="TableCell"&gt;
85%&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 72.45pt;" valign="top" width="97"&gt;
        &lt;div class="TableCell"&gt;
20%&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 2;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 297.9pt;" valign="top" width="397"&gt;
        &lt;div class="TableCell"&gt;
If bread purchased then Swiss cheese purchased.&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 72.45pt;" valign="top" width="97"&gt;
        &lt;div class="TableCell"&gt;
15%&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 72.45pt;" valign="top" width="97"&gt;
        &lt;div class="TableCell"&gt;
6%&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 3; mso-yfti-lastrow: yes;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 297.9pt;" valign="top" width="397"&gt;
        &lt;div class="TableCell"&gt;
If 42 years old and purchased pretzels and
        purchased dry roasted peanuts then beer will be purchased.&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 72.45pt;" valign="top" width="97"&gt;
        &lt;div class="TableCell"&gt;
95%&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 72.45pt;" valign="top" width="97"&gt;
        &lt;div class="TableCell"&gt;
0.01%&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/center&gt;
&lt;/div&gt;
&lt;div align="center" class="MsoCaption" style="text-align: center;"&gt;
Table 2.2&amp;nbsp; &lt;i&gt;Examples of Rule Accuracy and Coverage&lt;/i&gt;&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
The rules themselves consist of two halves.&amp;nbsp; The
left hand side is called the antecedent and the right hand side is called the
consequent.&amp;nbsp; The antecedent can consist of just one condition or of multiple
conditions, which must all be true in order for the consequent to be true at the
given accuracy.&amp;nbsp; Generally the consequent is just a single condition
(prediction of purchasing just one grocery store item) rather than multiple
conditions.&amp;nbsp; Rules such as “if x and y then a and b and c” are therefore uncommon.&lt;/div&gt;
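&lt;div class="MsoBodyText"&gt;
The definitions above can be made concrete with a minimal sketch.&amp;nbsp; It assumes that each shopping basket is stored as a set of item names, that coverage is the fraction of baskets in which the antecedent holds, and that accuracy is the fraction of those baskets that also contain the consequent.&amp;nbsp; The names Rule, coverage and accuracy and the toy baskets are hypothetical and chosen only for illustration.&lt;/div&gt;
&lt;pre&gt;
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset  # items that must all be present (the "if" part)
    consequent: str        # the single predicted item (the "then" part)


def coverage(rule, baskets):
    """Fraction of baskets in which the antecedent holds."""
    applies = [b for b in baskets if rule.antecedent.issubset(b)]
    return len(applies) / len(baskets)


def accuracy(rule, baskets):
    """Among baskets where the antecedent holds, the fraction with the consequent."""
    applies = [b for b in baskets if rule.antecedent.issubset(b)]
    if not applies:
        return 0.0  # the rule never applies, so there is nothing to be right about
    correct = [b for b in applies if rule.consequent in b]
    return len(correct) / len(applies)


# Hypothetical toy data: one set of items per shopping basket.
baskets = [
    {"cereal", "milk"},
    {"cereal", "milk", "bread"},
    {"cereal", "juice"},
    {"bread", "butter"},
    {"milk"},
]

rule = Rule(frozenset({"cereal"}), "milk")
print(coverage(rule, baskets))  # 3 of the 5 baskets contain cereal
print(accuracy(rule, baskets))  # 2 of those 3 baskets also contain milk
```
&lt;/pre&gt;
&lt;div class="MsoBodyText"&gt;
Because the antecedent is a set, the same two functions handle rules with several conditions: every item in the antecedent must be in the basket for the rule to apply.&lt;/div&gt;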
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
What to do with a rule&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
When the rules are mined out of the database the rules
can be used either for understanding better the business problems that the data
reflects or for performing actual predictions against some predefined prediction
target.&amp;nbsp; Since there is both a left side and a right side to a rule
(antecedent and consequent) they can be used in several ways for your business.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Target the antecedent.&amp;nbsp; In this case all rules that
have a certain value for the antecedent are gathered and displayed to the user.&amp;nbsp;
For instance a grocery store may request all rules that have nails, bolts or
screws as the antecedent in order to try to understand whether discontinuing the
sale of these low margin items will have any effect on other higher margin items.&amp;nbsp;
For instance maybe people who buy nails also buy expensive hammers but
wouldn’t do so at the store if the nails were not available.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Target the consequent.&amp;nbsp; In this case all rules that
have a certain value for the consequent can be used to understand what is
associated with the consequent and perhaps what affects the consequent.&amp;nbsp;
For instance it might be useful to know all of the interesting rules that have
“coffee” in their consequent.&amp;nbsp; The antecedents of these rules may well be the
items that affect the purchases of coffee and that a store owner may want to put close to the
coffee in order to increase the sale of both items.&amp;nbsp; Or they might be the
rules that the coffee manufacturer uses to determine in which magazine to place
its next coupons.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Target based on accuracy.&amp;nbsp; Sometimes the most
important thing for a user is the accuracy of the rules that are being
generated.&amp;nbsp; Highly accurate rules of 80% or 90% imply strong relationships
that can be exploited even if they have low coverage of the database and only
occur a limited number of times.&amp;nbsp; For instance a rule that has only 0.1%
coverage but 95% accuracy can be applied only one time out of one thousand, but it will
very likely be correct.&amp;nbsp; If this one time is highly profitable then it can
be worthwhile.&amp;nbsp; This, for instance, is how some of the most successful data
mining applications work in the financial markets - looking for that limited
amount of time where a very confident prediction can be made.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Target based on coverage.&amp;nbsp; Sometimes users want to
know what the most ubiquitous rules are, or which rules are most readily
applicable.&amp;nbsp; By looking at rules ranked by coverage they can quickly
get a high level view of what is happening within their database most of the
time.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Target based on “interestingness”.&amp;nbsp; Rules are
interesting when they have high coverage and high accuracy and deviate from the
norm. There have been many ways that rules have been ranked by some measure of
interestingness so that the trade off between coverage and accuracy can be made.&lt;/div&gt;
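&lt;div class="MsoBodyText"&gt;
As a rough illustration of these targeting strategies, the sketch below ranks the three rules of Table 2.2 in different ways.&amp;nbsp; The tuple layout and function names are hypothetical.&amp;nbsp; The combined score (accuracy multiplied by coverage, which is the fraction of all records for which the rule both applies and is correct) is only one assumed way of trading the two measures off, and it does not capture deviation from the norm.&lt;/div&gt;
&lt;pre&gt;
```python
# Rules from Table 2.2 as (description, accuracy, coverage) tuples.
rules = [
    ("cereal then milk", 0.85, 0.20),
    ("bread then Swiss cheese", 0.15, 0.06),
    ("42 years old and pretzels and peanuts then beer", 0.95, 0.0001),
]


def rank_by_accuracy(rules):
    """Most dependable rules first."""
    return sorted(rules, key=lambda r: r[1], reverse=True)


def rank_by_coverage(rules):
    """Most widely applicable rules first."""
    return sorted(rules, key=lambda r: r[2], reverse=True)


def rank_by_product(rules):
    """Assumed combined score: accuracy times coverage."""
    return sorted(rules, key=lambda r: r[1] * r[2], reverse=True)


for name, ranking in [
    ("accuracy", rank_by_accuracy(rules)),
    ("coverage", rank_by_coverage(rules)),
    ("accuracy x coverage", rank_by_product(rules)),
]:
    print(name, [description for description, _acc, _cov in ranking])
```
&lt;/pre&gt;
&lt;div class="MsoBodyText"&gt;
The beer rule comes first when ranking by accuracy but last under the other two orderings, which shows why a single ranking rarely serves every purpose.&lt;/div&gt;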
&lt;div class="MsoBodyText"&gt;
Since rule induction systems are so often used for
pattern discovery and unsupervised learning, it is harder to compare them.&amp;nbsp;
For example, it is very easy for just about any rule induction system to generate
all possible rules.&amp;nbsp; It is, however, much more difficult to devise a way to
present those rules (which could easily number in the hundreds of thousands) in a
way that is most useful to the end user.&amp;nbsp; When interesting rules are found
they usually have been created to find relationships between many different
predictor values in the database not just one well defined target of the
prediction.&amp;nbsp; For this reason it is often much more difficult to assign a
measure of value to the rule aside from its interestingness.&amp;nbsp; For instance
it would be difficult to determine the monetary value of knowing that if people
buy breakfast sausage they also buy eggs 60% of the time.&amp;nbsp; For data mining
systems that are more focused on prediction for things like customer attrition,
targeted marketing response or risk it is much easier to measure the value of
the system and compare it to other systems and other methods for solving the
problem.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
Caveat: Rules do not imply causality&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
It is important to recognize that even though the
patterns produced from rule induction systems are delivered as if-then rules,
they do not necessarily mean that the left hand side of the rule (the “if”
part) causes the right hand side of the rule (the “then” part) to happen.&amp;nbsp;
Purchasing cheese does not cause the purchase of wine even though the rule “if
cheese then wine” may be very strong.&amp;nbsp;&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
This is particularly important to remember for rule
induction systems because the results are presented in the same “if this then that”
form in which many causal relationships are stated.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
Types of databases used for rule induction&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
Typically rule induction is used on databases with either
fields of high cardinality (many different values) or many columns of binary
fields.&amp;nbsp; The classical case of this is the supermarket basket data from
store scanners that contains individual product names and quantities and may
contain tens of thousands of different items with different packaging that
create hundreds of thousands of SKU identifiers (Stock Keeping Units).&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Sometimes in these databases the concept of a record is
not easily defined within the database - consider the typical Star Schema for
many data warehouses that store the supermarket transactions as separate entries
in the fact table.&amp;nbsp; The columns in the fact table are some unique
identifier of the shopping basket (so all items can be noted as being in the
same shopping basket), the quantity, the time of purchase, and whether the item was
purchased with a special promotion (sale or coupon).&amp;nbsp; Thus each item in the
shopping basket has a different row in the fact table.&amp;nbsp; This layout of the
data is not typically the best for most data mining algorithms which would
prefer to have the data structured as&amp;nbsp; one row per shopping basket and each
column to represent the presence or absence of a given item.&amp;nbsp; This can be
an expensive way to store the data, however, since the typical grocery store
contains 60,000 SKUs or different items that could come across the checkout
counter.&amp;nbsp; This structure of the records can also create a very high
dimensional space (60,000 binary dimensions) which would be unwieldy for many
classical data mining algorithms like neural networks and decision trees.&amp;nbsp;
As we’ll see several tricks are played to make this computationally feasible
for the data mining algorithm while not requiring a massive reorganization of
the database.&lt;/div&gt;
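&lt;div class="MsoBodyText"&gt;
The reshaping described above can be sketched as follows.&amp;nbsp; The row layout (basket identifier, item, quantity, promotion flag) and the function names are assumptions made for illustration, not the schema of any particular warehouse.&amp;nbsp; Keeping one set of items per basket avoids materializing tens of thousands of mostly empty columns, while a dense presence/absence row can still be produced on demand.&lt;/div&gt;
&lt;pre&gt;
```python
from collections import defaultdict

# Hypothetical fact-table rows: (basket_id, item, quantity, on_promotion).
fact_rows = [
    (101, "bread", 1, False),
    (101, "milk", 2, False),
    (102, "bagels", 6, True),
    (102, "cream cheese", 1, False),
    (103, "milk", 1, True),
]


def group_into_baskets(rows):
    """Collect the items of each basket into one set per basket identifier."""
    baskets = defaultdict(set)
    for basket_id, item, _quantity, _on_promotion in rows:
        baskets[basket_id].add(item)
    return dict(baskets)


def to_binary_row(basket, all_items):
    """Dense presence/absence row: one 0/1 column per known item."""
    return [1 if item in basket else 0 for item in all_items]


baskets = group_into_baskets(fact_rows)
all_items = sorted({item for items in baskets.values() for item in items})

print(all_items)
for basket_id in sorted(baskets):
    print(basket_id, to_binary_row(baskets[basket_id], all_items))
```
&lt;/pre&gt;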
&lt;h3&gt;
Discovery&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
The claim to fame of these rule induction systems is
much more for knowledge discovery in unsupervised learning systems than it is
for prediction.&amp;nbsp; These systems provide both a very detailed view of the
data, where significant patterns that occur only a small portion of the time
can be found only by looking at the detail data, and a broad overview of
the data, where some systems seek to deliver to the user an overall view of the
patterns contained in the database.&amp;nbsp; These systems thus display a nice
combination of both micro and macro views:&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;
    &lt;div class="MsoBodyText"&gt;
Macro Level - Patterns that cover many situations are
    provided to the user that can be used very often and with great confidence
    and can also be used to summarize the database.&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
    &lt;div class="MsoBodyText"&gt;
Micro Level - Strong rules that cover only a very few
    situations can still be retrieved by the system and proposed to the end
    user.&amp;nbsp; These may be valuable if the situations that are covered are
    highly valuable (maybe they only apply to the most profitable customers) or
    represent a small but growing subpopulation which may indicate a market
    shift or the emergence of a new competitor (e.g. customers are only being
    lost in one particular area of the country where a new competitor is
    emerging).&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
Prediction&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
After the rules are created and their interestingness is
measured, there is also a call for performing prediction with the rules.&amp;nbsp;
Each rule by itself can perform prediction - the consequent is the target and
the accuracy of the rule is the accuracy of the prediction.&amp;nbsp; But because
rule induction systems produce many rules for a given antecedent or consequent,
there can be conflicting predictions with different accuracies.&amp;nbsp; This is an
opportunity for improving the overall performance of the systems by combining
the rules.&amp;nbsp; This can be done in a variety of ways, for example by summing the accuracies
as if they were weights or just by taking the prediction of the rule with the
maximum accuracy.&amp;nbsp;&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Table 2.3 shows how a given consequent or antecedent can
be part of many rules with different accuracies and coverages.&amp;nbsp;&amp;nbsp; From
this example consider the prediction problem of trying to predict whether milk
was purchased based solely on the other items that were in the shopping basket.&amp;nbsp;
If&amp;nbsp; the shopping basket contained only bread then from the table we would
guess that there was a 35% chance that milk was also purchased.&amp;nbsp; If,
however, bread and butter and eggs and cheese were purchased what would be the
prediction for milk then?&amp;nbsp; 65% chance of milk because the relationship
between butter and milk is the greatest at 65%?&amp;nbsp; Or would all of the other
items in the basket increase even further the chance of milk being purchased to
well beyond 65%?&amp;nbsp; Determining how to combine evidence from multiple rules
is a key part of the algorithms for using rules for prediction.
&lt;/div&gt;
&lt;div align="center"&gt;

  &lt;center&gt;
  &lt;table border="1" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="border-collapse: collapse; border: none; mso-border-alt: solid gray .75pt; mso-border-insideh: .75pt solid gray; mso-border-insidev: .75pt solid gray; mso-padding-alt: 0in 5.4pt 0in 5.4pt; mso-yfti-tbllook: 183;"&gt;
    &lt;tbody&gt;
&lt;tr style="mso-yfti-irow: 0;"&gt;
      &lt;td style="background: #D9D9D9; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
Antecedent&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
Consequent&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
Accuracy&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
Coverage&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 1;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
bagels&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
cream cheese&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
80%&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
5%&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 2;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
bagels&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
orange juice&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
40%&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
3%&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 3;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
bagels&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
coffee&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
40%&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
2%&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 4;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
bagels&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
eggs&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
25%&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
2%&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 5;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
bread&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
milk&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
35%&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
30%&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 6;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
butter&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
milk&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
65%&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
20%&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 7;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
eggs&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
milk&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
35%&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
15%&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 8; mso-yfti-lastrow: yes;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
cheese&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
milk&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
40%&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="148"&gt;
        &lt;div class="TableCell"&gt;
8%&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/center&gt;
&lt;/div&gt;
&lt;div align="center" class="MsoCaption" style="text-align: center;"&gt;
Table 2.3 &lt;i&gt;
Accuracy and Coverage in Rule Antecedents and Consequents&lt;/i&gt;&lt;/div&gt;
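&lt;div class="MsoBodyText"&gt;
The two simple combination schemes mentioned above, taking the maximum accuracy and summing the accuracies as if they were weights, can be sketched with the rules of Table 2.3.&amp;nbsp; The function names are hypothetical, and the summed weight is a voting score rather than a probability (it can exceed 100%).&amp;nbsp; Neither scheme settles the question of how the evidence should really be combined; the sketch only shows what each one returns.&lt;/div&gt;
&lt;pre&gt;
```python
# Rules from Table 2.3 as (antecedent, consequent, accuracy) tuples.
rules = [
    ("bagels", "cream cheese", 0.80),
    ("bagels", "orange juice", 0.40),
    ("bagels", "coffee", 0.40),
    ("bagels", "eggs", 0.25),
    ("bread", "milk", 0.35),
    ("butter", "milk", 0.65),
    ("eggs", "milk", 0.35),
    ("cheese", "milk", 0.40),
]


def matching_rules(basket, target):
    """Rules whose antecedent is in the basket and whose consequent is the target."""
    return [r for r in rules if r[0] in basket and r[1] == target]


def max_accuracy(basket, target):
    """Use the prediction of the single most accurate matching rule."""
    matches = matching_rules(basket, target)
    return max((acc for _ante, _cons, acc in matches), default=0.0)


def summed_weight(basket, target):
    """Sum the accuracies as if they were votes; a score, not a probability."""
    return sum(acc for _ante, _cons, acc in matching_rules(basket, target))


basket = {"bread", "butter", "eggs", "cheese"}
print(max_accuracy({"bread"}, "milk"))  # only the bread rule applies
print(max_accuracy(basket, "milk"))     # the butter rule is the most accurate
print(summed_weight(basket, "milk"))    # all four milk rules vote
```
&lt;/pre&gt;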
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
The General Idea&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
The general idea of a rule classification system is that
rules are created that show the relationship between events captured in your
database.&amp;nbsp; These rules can be simple with just one element in the
antecedent or they might be more complicated with many column value pairs in the
antecedent all joined together by a conjunction (item1 and item2 and item3 …
must all occur for the antecedent to be true).&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
The rules are used to find interesting patterns in the
database but they are also used at times for prediction.&amp;nbsp;&amp;nbsp; There are
two main things that are important to understanding a rule:&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Accuracy - Accuracy refers to the probability that if the
antecedent is true then the consequent will be true.&amp;nbsp; High accuracy means
that this is a rule that is highly dependable.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Coverage - Coverage refers to the number of records in
the database that the rule applies to.&amp;nbsp; High coverage means that the rule
can be used very often and also that it is less likely to be a spurious artifact
of the sampling technique or idiosyncrasies of the database.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
The business importance of accuracy and
coverage&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
From a business perspective accurate rules are important
because they imply that there is useful predictive information in the database
that can be exploited - namely that there is something far from independent
between the antecedent and the consequent.&amp;nbsp; The lower the accuracy the
closer the rule comes to just random guessing.&amp;nbsp; If the accuracy is
significantly below what would be expected from random guessing then the
rule may well be useful in negated form, with the antecedent predicting the
absence of the consequent (for instance people who buy denture adhesive are
much less likely than normal to buy fresh corn on the cob).&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
From a business perspective coverage implies how often
you can use a useful rule.&amp;nbsp; For instance you may have a rule that is 100%
accurate but is only applicable in 1 out of every 100,000 shopping baskets.&amp;nbsp;
You can rearrange your shelf space to take advantage of this fact but it will
not make you much money since the event is not very likely to happen.&amp;nbsp;
Table 2.4 displays the trade-off between coverage and accuracy.
&lt;/div&gt;
&lt;div align="center"&gt;

  &lt;center&gt;
  &lt;table border="1" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="border-collapse: collapse; border: none; mso-border-alt: solid black .75pt; mso-border-insideh: .75pt solid black; mso-border-insidev: .75pt solid black; mso-padding-alt: 0in 5.4pt 0in 5.4pt; mso-yfti-tbllook: 183;"&gt;
    &lt;tbody&gt;
&lt;tr style="mso-yfti-irow: 0;"&gt;
      &lt;td style="border-bottom: solid black 1.0pt; border-left: none; border-right: solid black 1.0pt; border-top: none; mso-border-bottom-alt: solid black .75pt; mso-border-right-alt: solid black .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 2.05in;" valign="top" width="197"&gt;
        &lt;div class="MsoNormal"&gt;

        &amp;nbsp;
        &lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid black 1.0pt; mso-border-alt: solid black .75pt; mso-border-left-alt: solid black .75pt; mso-pattern: gray-15 auto; mso-shading: windowtext; padding: 0in 5.4pt 0in 5.4pt; width: 2.05in;" valign="top" width="197"&gt;
        &lt;div class="MsoNormal"&gt;
Accuracy Low&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid black 1.0pt; mso-border-alt: solid black .75pt; mso-border-left-alt: solid black .75pt; mso-pattern: gray-15 auto; mso-shading: windowtext; padding: 0in 5.4pt 0in 5.4pt; width: 2.05in;" valign="top" width="197"&gt;
        &lt;div class="MsoNormal"&gt;
Accuracy High&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 1;"&gt;
      &lt;td style="background: #D9D9D9; border-top: none; border: solid black 1.0pt; mso-border-alt: solid black .75pt; mso-border-top-alt: solid black .75pt; mso-pattern: gray-15 auto; mso-shading: windowtext; padding: 0in 5.4pt 0in 5.4pt; width: 2.05in;" valign="top" width="197"&gt;
        &lt;div class="MsoNormal" style="line-height: normal;"&gt;
Coverage High&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid black 1.0pt; border-left: none; border-right: solid black 1.0pt; border-top: none; mso-border-alt: solid black .75pt; mso-border-left-alt: solid black .75pt; mso-border-top-alt: solid black .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 2.05in;" valign="top" width="197"&gt;
        &lt;div class="MsoNormal" style="line-height: normal;"&gt;
Rule is rarely correct
        but can be used often.&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid black 1.0pt; border-left: none; border-right: solid black 1.0pt; border-top: none; mso-border-alt: solid black .75pt; mso-border-left-alt: solid black .75pt; mso-border-top-alt: solid black .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 2.05in;" valign="top" width="197"&gt;
        &lt;div class="MsoNormal" style="line-height: normal;"&gt;
Rule is often correct
        and can be used often.&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 2; mso-yfti-lastrow: yes;"&gt;
      &lt;td style="background: #D9D9D9; border-top: none; border: solid black 1.0pt; mso-border-alt: solid black .75pt; mso-border-top-alt: solid black .75pt; mso-pattern: gray-15 auto; mso-shading: windowtext; padding: 0in 5.4pt 0in 5.4pt; width: 2.05in;" valign="top" width="197"&gt;
        &lt;div class="MsoNormal" style="line-height: normal;"&gt;
Coverage Low&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid black 1.0pt; border-left: none; border-right: solid black 1.0pt; border-top: none; mso-border-alt: solid black .75pt; mso-border-left-alt: solid black .75pt; mso-border-top-alt: solid black .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 2.05in;" valign="top" width="197"&gt;
        &lt;div class="MsoNormal" style="line-height: normal;"&gt;
Rule is rarely correct
        and can be only rarely used.&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid black 1.0pt; border-left: none; border-right: solid black 1.0pt; border-top: none; mso-border-alt: solid black .75pt; mso-border-left-alt: solid black .75pt; mso-border-top-alt: solid black .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 2.05in;" valign="top" width="197"&gt;
        &lt;div class="MsoNormal" style="line-height: normal;"&gt;
Rule is often correct
        but can be only rarely used.&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/center&gt;
&lt;/div&gt;
&lt;div align="center" class="MsoBodyText" style="text-align: center;"&gt;
Table 2.4&amp;nbsp; &lt;i&gt;Rule coverage versus
accuracy&lt;/i&gt;.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
Trading off accuracy and coverage is like
betting at the track&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
Betting on horses offers an analogy for how coverage and accuracy
relate to making money.&amp;nbsp; Having a high accuracy rule with
low coverage would be like owning a race horse that always won when he raced but
could only race once a year.&amp;nbsp; In betting, you could probably still make a
lot of money on such a horse.&amp;nbsp; In rule induction for retail stores it is
unlikely that finding that one rule between mayonnaise, ice cream and sardines
that seems to always be true will have much of an impact on your bottom line.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
How to evaluate the rule&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
One way to look at accuracy and coverage is to see how
they relate to some simple statistics and how they can be represented
graphically.&amp;nbsp; From statistics coverage is simply the a priori probability
of the antecedent occurring. The accuracy is
just the probability of the consequent conditional on the antecedent.&amp;nbsp; So,
for instance, if we were looking at the following database of supermarket
basket scanner data we would need the following information in order to
calculate the accuracy and coverage for a simple rule (let’s say milk purchase
implies eggs purchased).&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
T = 100 = Total number of shopping baskets in the
database.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
E = 30 = Number of baskets with eggs in them.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
M = 40 = Number of baskets with milk in them.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
B = 20 = Number of baskets with both eggs and milk in
them.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Accuracy is then just the number of baskets with eggs and
milk in them divided by the number of baskets with milk in them.&amp;nbsp; In this
case that would be 20/40 = 50%.&amp;nbsp; The coverage would be the number of
baskets with milk in them divided by the total number of baskets.&amp;nbsp; This
would be 40/100 = 40%.&amp;nbsp; This can be seen graphically in Figure 2.5.&lt;/div&gt;
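&lt;div class="MsoBodyText"&gt;
The arithmetic above can be sketched in a few lines of Python.&amp;nbsp; This is a minimal illustration rather than part of the original text: the function name rule_stats and its parameter names are assumptions made for this example, and the counts are the T, M and B values listed above.&lt;/div&gt;

```python
def rule_stats(total_baskets, antecedent_count, both_count):
    """Return (accuracy, coverage) for a rule 'antecedent implies consequent'.

    accuracy = baskets with antecedent and consequent / baskets with antecedent
    coverage = baskets with antecedent / all baskets
    """
    if total_baskets == 0 or antecedent_count == 0:
        raise ValueError("counts must be positive")
    accuracy = both_count / antecedent_count
    coverage = antecedent_count / total_baskets
    return accuracy, coverage


# Counts from the milk-implies-eggs example: T = 100, M = 40, B = 20.
accuracy, coverage = rule_stats(total_baskets=100, antecedent_count=40, both_count=20)
print(f"accuracy = {accuracy:.0%}, coverage = {coverage:.0%}")
```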
&lt;div align="center" class="MsoNormal" style="page-break-after: avoid; text-align: center;"&gt;

&lt;img border="0" height="349" src="http://www.thearling.com/text/dmtechniques/dmtech12.gif" width="466" /&gt;
&lt;/div&gt;
&lt;div align="center" class="MsoCaption" style="text-align: center;"&gt;
Figure 2.5 &lt;i&gt;
Graphically the total number of shopping baskets can be represented in a space
and the number of baskets containing eggs or milk can be represented by the area
of a circle.&amp;nbsp; The coverage of the rule “If Milk then Eggs” is just the
relative size of the circle corresponding to milk.&amp;nbsp; The accuracy is the
relative size of the overlap between the two to the circle representing milk
purchased.&lt;/i&gt;&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Notice that we haven’t used E, the number of
baskets with eggs, in these calculations. One way that eggs could be used would
be to calculate the expected number of baskets with eggs and milk in them based
on the independence of the events.&amp;nbsp; This would give us some sense of how
unlikely and how special the event is that 20% of the baskets have both eggs and
milk in them.&amp;nbsp; Remember from the statistics section that if two events are
independent (have no effect on one another) that the product of their individual
probabilities of occurrence should equal the probability of the occurrence of
them both together.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
If the purchase of eggs and milk were independent of each
other one would expect that 0.3 x 0.4 = 0.12 or 12% of the time we would see
shopping baskets with both eggs and milk in them.&amp;nbsp; The fact that this
combination of products occurs 20% of the time is out of the ordinary if these
events were independent.&amp;nbsp; That is to say there is a good chance that the
purchase of one affects the other and the degree to which this is the case could
be calculated through statistical tests and hypothesis testing.&lt;/div&gt;
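&lt;div class="MsoBodyText"&gt;
The independence check can be sketched as follows.&amp;nbsp; The ratio of the observed joint probability to the one expected under independence is commonly called lift; that name, like the two function names below, does not appear in the original text and is used here only for illustration.&amp;nbsp; The sketch does not replace a proper statistical test.&lt;/div&gt;

```python
def expected_joint_probability(total, count_a, count_b):
    """Probability of A and B together if the two were independent."""
    return (count_a / total) * (count_b / total)


def lift(total, count_a, count_b, count_both):
    """Observed joint probability divided by the one expected under independence."""
    observed = count_both / total
    return observed / expected_joint_probability(total, count_a, count_b)


# T = 100 baskets, E = 30 with eggs, M = 40 with milk, B = 20 with both.
expected = expected_joint_probability(100, 30, 40)
ratio = lift(100, 30, 40, 20)
print(f"expected = {expected:.2f}, observed = {20 / 100:.2f}, lift = {ratio:.2f}")
```

&lt;div class="MsoBodyText"&gt;
A ratio close to 1 would be consistent with independence; a ratio well above 1, as here, suggests the two purchases tend to occur together.&lt;/div&gt;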
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
Defining “interestingness”&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
One of the biggest problems with rule induction systems
is the sometimes overwhelming number of rules that are produced, most of
which have no practical value or interest.&amp;nbsp; Some of the rules are so
inaccurate that they cannot be used, some have so little coverage that though
they are interesting they have little applicability, and finally many of the
rules capture patterns and information that the user is already familiar with.
To combat this problem researchers have sought to measure the usefulness or
interestingness of rules.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Certainly any measure of interestingness would have
something to do with accuracy and coverage.&amp;nbsp; We might also expect it to
have at least the following four basic behaviors:&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;
    &lt;div class="MsoListBullet" style="mso-list: l11 level1 lfo15;"&gt;
Interestingness =
    0 if the accuracy of the rule is equal to the background accuracy (a priori
    probability of the consequent).&amp;nbsp; Table 2.5 shows an
    example of this, where a rule for attrition is no better than just
    guessing the overall rate of attrition.&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
    &lt;div class="MsoListBullet" style="mso-list: l11 level1 lfo15;"&gt;
Interestingness
    increases as accuracy increases (or decreases with decreasing accuracy) if
    the coverage is fixed.&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
    &lt;div class="MsoListBullet" style="mso-list: l11 level1 lfo15;"&gt;
Interestingness
    increases or decreases with coverage if accuracy stays fixed.&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
    &lt;div class="MsoListBullet" style="mso-list: l11 level1 lfo15;"&gt;
Interestingness
    decreases with coverage for a fixed number of correct responses (remember
    accuracy equals the number of correct responses divided by the coverage).&lt;br /&gt;
  &lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div align="center"&gt;

  &lt;center&gt;
  &lt;table border="1" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="border-collapse: collapse; border: none; mso-border-alt: solid gray .75pt; mso-border-insideh: .75pt solid gray; mso-border-insidev: .75pt solid gray; mso-padding-alt: 0in 5.4pt 0in 5.4pt; mso-yfti-tbllook: 183;"&gt;
    &lt;tbody&gt;
&lt;tr style="mso-yfti-irow: 0;"&gt;
      &lt;td style="background: #D9D9D9; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 171.9pt;" valign="top" width="229"&gt;
        &lt;div class="TableCell"&gt;
Antecedent&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 135.3pt;" valign="top" width="180"&gt;
        &lt;div class="TableCell"&gt;
Consequent&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: .9in;" valign="top" width="86"&gt;
        &lt;div class="TableCell"&gt;
Accuracy&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="background: #D9D9D9; border-left: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-pattern: gray-15 black; mso-shading: white; padding: 0in 5.4pt 0in 5.4pt; width: 70.8pt;" valign="top" width="94"&gt;
        &lt;div class="TableCell"&gt;
Coverage&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 1;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 171.9pt;" valign="top" width="229"&gt;
        &lt;div class="TableCell"&gt;
&amp;lt;no constraints&amp;gt;&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 135.3pt;" valign="top" width="180"&gt;
        &lt;div class="TableCell"&gt;
then customer will attrite&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .9in;" valign="top" width="86"&gt;
        &lt;div class="TableCell"&gt;
10%&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 70.8pt;" valign="top" width="94"&gt;
        &lt;div class="TableCell"&gt;
100%&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 2;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 171.9pt;" valign="top" width="229"&gt;
        &lt;div class="TableCell"&gt;
If customer balance &amp;gt; $3,000&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 135.3pt;" valign="top" width="180"&gt;
        &lt;div class="TableCell"&gt;
then customer will attrite&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .9in;" valign="top" width="86"&gt;
        &lt;div class="TableCell"&gt;
10%&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 70.8pt;" valign="top" width="94"&gt;
        &lt;div class="TableCell"&gt;
60%&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 3;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 171.9pt;" valign="top" width="229"&gt;
        &lt;div class="TableCell"&gt;
If customer eyes = blue&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 135.3pt;" valign="top" width="180"&gt;
        &lt;div class="TableCell"&gt;
then customer will attrite&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .9in;" valign="top" width="86"&gt;
        &lt;div class="TableCell"&gt;
10%&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 70.8pt;" valign="top" width="94"&gt;
        &lt;div class="TableCell"&gt;
30%&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr style="mso-yfti-irow: 4; mso-yfti-lastrow: yes;"&gt;
      &lt;td style="border-top: none; border: solid gray 1.0pt; mso-border-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 171.9pt;" valign="top" width="229"&gt;
        &lt;div class="TableCell"&gt;
If customer social security number = 144 30 8217&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 135.3pt;" valign="top" width="180"&gt;
        &lt;div class="TableCell"&gt;
then customer will attrite&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .9in;" valign="top" width="86"&gt;
        &lt;div class="TableCell"&gt;
100%&lt;/div&gt;
&lt;/td&gt;
      &lt;td style="border-bottom: solid gray 1.0pt; border-left: none; border-right: solid gray 1.0pt; border-top: none; mso-border-alt: solid gray .75pt; mso-border-left-alt: solid gray .75pt; mso-border-top-alt: solid gray .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 70.8pt;" valign="top" width="94"&gt;
        &lt;div class="TableCell"&gt;
0.000001%&lt;/div&gt;
&lt;/td&gt;
    &lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/center&gt;
&lt;/div&gt;
&lt;div align="center" class="MsoBodyText" style="text-align: center;"&gt;
Table 2.5 &lt;i&gt;
Uninteresting rules&lt;/i&gt;&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
There are a variety of measures of interestingness that
are used that have these general characteristics.&amp;nbsp; They are used for
pruning back the total possible number of rules that might be generated and then
presented to the user.&lt;/div&gt;
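&lt;div class="MsoBodyText"&gt;
One simple measure that shows the four behaviors listed above is coverage multiplied by the difference between the rule's accuracy and the background accuracy, a quantity often attributed to Piatetsky-Shapiro.&amp;nbsp; The original text does not name a specific formula, so the choice of measure and the function name interestingness are assumptions made for this sketch.&amp;nbsp; The rows are those of Table 2.5, with a background attrition rate of 10%.&lt;/div&gt;

```python
def interestingness(accuracy, coverage, background_accuracy):
    """Coverage times the gain in accuracy over the background rate.

    Zero when the rule is no better than guessing the background rate;
    grows with accuracy at fixed coverage, and with coverage at fixed
    accuracy above the background rate.
    """
    return coverage * (accuracy - background_accuracy)


# Rows of Table 2.5 as (antecedent, accuracy, coverage); the consequent is
# always "customer will attrite", whose background rate is 10%.
rules = [
    ("no constraints", 0.10, 1.0),
    ("balance over $3,000", 0.10, 0.60),
    ("eyes = blue", 0.10, 0.30),
    ("specific social security number", 1.0, 0.00000001),
]
for antecedent, accuracy, coverage in rules:
    score = interestingness(accuracy, coverage, background_accuracy=0.10)
    print(f"{antecedent}: {score:.10f}")
```

&lt;div class="MsoBodyText"&gt;
Under this measure the first three rules score zero because they do no better than the background rate, and the last scores almost zero because its coverage is so small.&lt;/div&gt;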
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
Other measures of usefulness&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
Another important measure is that of simplicity of the
rule.&amp;nbsp; This is important solely for the end user, as complex
rules, as powerful and as interesting as they might be, may be difficult to
understand or to confirm via intuition.&amp;nbsp; Thus the user has a desire to see
simpler rules and consequently this desire can be manifested directly in the rules
that are chosen and supplied automatically to the user.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Finally a measure of novelty is also required during
the creation of the rules, so that rules that are redundant but strong are less
favored in the search than rules that may not be as strong but cover important
examples that are not covered by other strong rules.&amp;nbsp; For instance there
may be few historical records to provide rules on a little-sold grocery item
(e.g.&amp;nbsp;mint jelly) and they may have low accuracy, but since there are so
few possible rules, even though they are not interesting they will be “novel”
and should be retained and presented to the user for that reason alone.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
Rules vs. Decision trees&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
Decision trees also produce rules but in a very different
way than rule induction systems.&amp;nbsp; The main difference between the rules
that are produced by decision trees and rule induction systems is as follows:&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Decision trees produce rules that are mutually exclusive
and collectively exhaustive with respect to the training database while rule
induction systems produce rules that are not mutually exclusive and might be
collectively exhaustive.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
In plain English this means that for any given record
there will be one and only one rule to cover it when the rules
come from decision trees.&amp;nbsp; There may be many rules that match a given
record from a rule induction system and for many systems it is not guaranteed
that a rule will exist for each and every possible record that might be
encountered (though most systems do create very general default rules to capture
these records).&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
The reason for this difference is the way in which the
two algorithms operate.&amp;nbsp; Rule induction seeks to go from the bottom up and
collect all possible patterns that are interesting and then later use those
patterns for some prediction target.&amp;nbsp; Decision trees on the other hand
work from a prediction target downward in what is known as a “greedy”
search, looking for the best possible split on the next step (i.e.
greedily picking the best one without looking any further than the next step).&amp;nbsp;
Though the greedy algorithm can make choices at the higher levels of the tree
which are less than optimal at the lower levels of the tree it is very good at
effectively squeezing out any correlations between predictors and the
prediction.&amp;nbsp; Rule induction systems on the other hand retain all possible
patterns even if they are redundant or do not aid in predictive accuracy.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
For instance, consider that in a rule induction system,
if there were two columns of data that were highly correlated (or in fact
just simple transformations of each other), they would result in two rules
whereas in a decision tree one predictor would be chosen and then since the
second one was redundant it would not be chosen again.&amp;nbsp; An example might be
the two predictors annual charges and average monthly charges (average monthly
charges being the annual charges divided by 12).&amp;nbsp; If the amount charged was
predictive then the decision tree would choose one of the predictors and use it
for a split point somewhere in the tree.&amp;nbsp; The decision tree effectively
“squeezes” the predictive value out of the predictor and then moves on to the
next. A rule induction system would on the other hand create two rules, perhaps
something like:&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
If annual charges &amp;gt; 12,000 then default = true 90%
accuracy.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
If average monthly charges &amp;gt; 1,000 then default = true
90% accuracy.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
In this case we’ve shown an extreme case where two
predictors were exactly the same, but there can also be less extreme cases.&amp;nbsp;
For instance height might be used rather than shoe size in the decision tree
whereas in a rule induction system both would be presented as rules.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Neither technique is necessarily better,
though having a variety of rules and predictors helps with the prediction when
there are missing values.&amp;nbsp; For instance if the decision tree did choose
height as a split point but that predictor was not captured in the record (a
null value) while shoe size was, the rule induction system would still have a
matching rule to capture this record.&amp;nbsp; Decision trees do have ways of
overcoming this difficulty by keeping “surrogates” at each split point that
work almost as well at splitting the data as does the chosen predictor.&amp;nbsp; In
this case shoe size might have been kept as a surrogate for height at this
particular branch of the tree.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
Another commonality between decision trees
and rule induction systems&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
One other thing that decision trees and rule induction
systems have in common is the fact that they both need to find ways to combine
and simplify rules.&amp;nbsp; In a decision tree this can be as simple as
recognizing that if a lower split on a predictor is more constrained than a
split on the same predictor further up in the tree then only the more
restrictive one needs to be provided to the user. For instance if the
first split of the tree is age &amp;lt;= 50 years and the lowest split for the given
leaf is age &amp;lt;= 30 years then only the latter constraint needs to be captured
in the rule for that leaf.&lt;/div&gt;
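&lt;div class="MsoBodyText"&gt;
The simplification for a single leaf can be sketched as below.&amp;nbsp; This is an illustrative assumption about how such a step might be written, not a description of any particular decision tree tool: the function name simplify_upper_bounds is hypothetical, and only "at most" constraints are handled, whereas a real system would also need to deal with lower bounds and categorical splits.&lt;/div&gt;

```python
def simplify_upper_bounds(path_splits):
    """Keep only the tightest 'at most' bound for each predictor on a path.

    path_splits is a list of (predictor, upper_bound) pairs, read as
    'predictor is at most upper_bound', ordered from the root to the leaf.
    """
    tightest = {}
    for predictor, upper_bound in path_splits:
        current = tightest.get(predictor)
        tightest[predictor] = upper_bound if current is None else min(current, upper_bound)
    return tightest


# Root split: age at most 50; lowest split on the same path: age at most 30.
print(simplify_upper_bounds([("age", 50), ("age", 30)]))
```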
&lt;div class="MsoBodyText"&gt;
Rules from rule induction systems are generally created
by taking a simple high level rule and adding new constraints to it until the
coverage gets so small as to not be meaningful.&amp;nbsp; This means that the rules
actually have families or what is called “cones of specialization” where one
more general rule can be the parent of many more specialized rules.&amp;nbsp;&amp;nbsp;
These cones then can be presented to the user as high level views of the
families of rules and can be viewed in a hierarchical manner to aid in
understanding.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level2 lfo21;"&gt;
2.5. Which Technique and When?&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
Clearly one of the hardest things to do when deciding to
implement a data mining system is to determine which technique to use
when.&amp;nbsp; When are neural networks appropriate and when are decision trees
appropriate?&amp;nbsp; When is data mining appropriate at all as opposed to just
working with relational databases and reporting?&amp;nbsp; When would just using
OLAP and a multidimensional database be appropriate?&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
Some of the criteria that are important in determining
the technique to be used are determined by trial and error.&amp;nbsp; There are
definite differences in the types of problems that are most conducive to each
technique but the reality of real world data and the dynamic way in which
markets, customers and hence the data that represent them are formed means
that the data is constantly changing.&amp;nbsp; These dynamics mean that it no
longer makes sense to build the "perfect" model on the historical data
since whatever was known in the past cannot adequately predict the future
because the future is so unlike what has gone before.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
In some ways this situation is analogous to the business
person who is waiting for all information to come in before they make their
decision.&amp;nbsp; They are trying out different scenarios, different formulae and
researching new sources of information.&amp;nbsp; But this is a task that will never
be accomplished - at least in part because the business, the economy and even the
world are changing in unpredictable and even chaotic ways that could never be
adequately predicted.&amp;nbsp; Better to take a robust model that perhaps is
an under-performer compared to what some of the best data mining tools could
provide with a great deal of analysis and execute it today rather than to wait
until tomorrow when it may be too late.&lt;/div&gt;
&lt;h3 style="mso-list: l10 level3 lfo21;"&gt;
Balancing exploration and exploitation&lt;/h3&gt;
&lt;div class="MsoBodyText"&gt;
There is always a trade-off between exploration
(learning more and gathering more facts) and exploitation (taking immediate
advantage of everything that is currently known).&amp;nbsp; This theme of
exploration versus exploitation is echoed also at the level of collecting data
in a targeted marketing system:&amp;nbsp; from a limited population of
prospects/customers to choose from, how many do you sacrifice to exploration
(trying out new promotions or messages at random) versus optimizing what you
already know?&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
There was for instance no reasonable way that Barnes and
Noble bookstores could in 1995 look at past sales figures and foresee the impact
that Amazon books and others would have based on the internet sales model.&lt;/div&gt;
&lt;div class="MsoBodyText"&gt;
The advent of the internet could not have been predicted
from historic sales and marketing data alone.&amp;nbsp; Instead
perhaps data mining could have been used to detect trends of decreased sales to
certain customer sub-populations - such as to those involved in the high tech
industry that were the first to begin to buy books online at Amazon.&lt;/div&gt;
So caveat emptor - use the data mining tools well but
strike while the iron is hot.&amp;nbsp; The performance of predictive models provided
by data mining tools has a limited half-life.&amp;nbsp; Unlike a good
bottle of wine they do not increase in value with age.&lt;br /&gt;
&lt;br /&gt;
Excerpted from the book&lt;b&gt;  &lt;a href="http://www.amazon.com/exec/obidos/ISBN=0071344446/kurtthearlinsdatA/" target="_blank"&gt;Building
Data Mining Applications for CRM&lt;/a&gt; &lt;/b&gt;&lt;br /&gt;
by Alex Berson, Stephen Smith, and Kurt Thearling</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></item><item><title>Data Preprocessing</title><link>http://freelearningcenter.blogspot.com/2011/12/doing-data-preprocessing.html</link><category>Data Mining</category><category>Information Technology</category><author>noreply@blogger.com (Andy)</author><pubDate>Thu, 15 Dec 2011 22:31:00 -0800</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-6316360081793516006.post-4737733584405281954</guid><description>&lt;span class="" id="result_box" lang="en"&gt;&lt;span class="hps"&gt;Stages&lt;/span&gt; &lt;span class="hps"&gt;in performing&lt;/span&gt; &lt;a href="http://freelearningcenter.blogspot.com/2011/12/concept-of-data-mining.html" target="_blank"&gt;&lt;span class="hps"&gt;data mining&lt;/span&gt;&lt;/a&gt; &lt;span class="hps"&gt;one of them is&lt;/span&gt; &lt;a href="http://freelearningcenter.blogspot.com/2011/12/concept-of-data-mining.html" target="_blank"&gt;&lt;span class="hps"&gt;data&lt;/span&gt; &lt;span class="hps"&gt;preprocessing&lt;/span&gt;&lt;/a&gt;. 
Why does data need to be cleaned before it is processed?&lt;br /&gt; Because the raw data is usually of poor quality. Common problems include:&lt;br /&gt; - &lt;b&gt;Incomplete&lt;/b&gt;: attribute values are missing, or attributes of interest are absent altogether.&lt;br /&gt; - &lt;b&gt;Noisy&lt;/b&gt;: the data contains errors or outliers, that is, values that deviate from what is expected.&lt;br /&gt; - &lt;b&gt;Inconsistent&lt;/b&gt;: codes or names are used in mismatched ways.&lt;br /&gt; Good decisions must be based on good-quality data, and a data warehouse needs consistent integration of quality data.&lt;/span&gt;&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;&lt;br /&gt;
&lt;span class="hps"&gt;Some&lt;/span&gt; &lt;span class="hps"&gt;things to consider&lt;/span&gt; &lt;span class="hps"&gt;to&lt;/span&gt; &lt;span class="hps"&gt;get&lt;/span&gt; &lt;span class="hps"&gt;good data&lt;/span&gt; &lt;span class="hps"&gt;are:&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="" id="result_box" lang="en"&gt; &lt;span class="hps"&gt;&lt;/span&gt;&lt;span class="hps"&gt;Accuracy&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="" id="result_box" lang="en"&gt; &lt;span class="hps"&gt;&lt;/span&gt;&lt;span class="hps"&gt;Completeness&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="" id="result_box" lang="en"&gt; &lt;span class="hps"&gt;&lt;/span&gt;&lt;span class="hps"&gt;Consistency&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="" id="result_box" lang="en"&gt; &lt;span class="hps"&gt;&lt;/span&gt;&lt;span class="hps"&gt;Timeliness&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="" id="result_box" lang="en"&gt; &lt;span class="hps"&gt;&lt;/span&gt;&lt;span class="hps"&gt;Value&lt;/span&gt; &lt;span class="hps"&gt;added&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="" id="result_box" lang="en"&gt; &lt;span class="hps"&gt;&lt;/span&gt;&lt;span class="hps"&gt;Interpretability&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="" id="result_box" lang="en"&gt; &lt;span class="hps"&gt;&lt;/span&gt;&lt;span class="hps"&gt;Accessibility&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="" id="result_box" lang="en"&gt; &lt;span class="hps"&gt;&lt;/span&gt;&lt;span class="hps"&gt;Contextual&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="" id="result_box" lang="en"&gt; &lt;span class="hps"&gt;Representational&lt;/span&gt;&lt;/span&gt;&lt;span class="" id="result_box" lang="en"&gt;&lt;span class="hps"&gt; &lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;span class="" id="result_box" lang="en"&gt; &lt;b&gt;&lt;br /&gt;Techniques used in data preprocessing include&lt;/b&gt;:&lt;br /&gt; • &lt;b&gt;Data cleaning&lt;/b&gt;&lt;br /&gt; Removing incorrect data values, tidying up messy data, and checking for inconsistencies.&lt;br /&gt; • &lt;b&gt;Data integration&lt;/b&gt;&lt;br /&gt; Combining data from multiple sources (databases, data cubes, or files) into a suitable data store.&lt;br /&gt; • &lt;b&gt;Data transformation&lt;/b&gt;&lt;br /&gt; Normalizing and aggregating data so that it is in a uniform form.&lt;br /&gt; • &lt;b&gt;Data reduction&lt;/b&gt;&lt;br /&gt; Representing the data in a smaller form that still yields the same analytical results.&lt;br /&gt; • &lt;b&gt;Data discretization&lt;/b&gt;&lt;br /&gt; Part of data reduction, but significant in its own right, especially for numerical data.&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
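As a rough illustration of three of the techniques above (cleaning, transformation, and discretization), here is a minimal Python sketch. The function names and the sample ages are made up for this example, and filling gaps with the mean is only one of several possible cleaning strategies.&lt;br /&gt;
&lt;pre&gt;
```python
from statistics import mean


def fill_missing(values):
    """Data cleaning: replace missing entries (None) with the mean of the known values."""
    known = [v for v in values if v is not None]
    fallback = mean(known)
    return [fallback if v is None else v for v in values]


def min_max_normalize(values):
    """Data transformation: rescale values linearly onto the range [0, 1]."""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        # All values are identical, so there is nothing to spread out.
        return [0.0 for _ in values]
    return [(v - low) / span for v in values]


def discretize(values, bins):
    """Data discretization: map each value in [0, 1] to one of `bins` equal-width intervals."""
    return [min(int(v * bins), bins - 1) for v in values]


# Hypothetical sample: customer ages with one missing entry.
ages = [20, None, 40, 60]
cleaned = fill_missing(ages)         # [20, 40, 40, 60]
scaled = min_max_normalize(cleaned)  # [0.0, 0.5, 0.5, 1.0]
labels = discretize(scaled, 2)       # [0, 1, 1, 1]
```
&lt;/pre&gt;
Each step feeds the next one, which mirrors the idea that preprocessing is a chain of small repairs rather than a single operation.&lt;br /&gt;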
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgH8SAsdGawfw8goztDOTT4OVskP3Xf0D6AKUuyt0tCRxm2zNdU4olg1ydgDDh-lUGFczdItur2Q06pyWW9Uo4LgVj7-40YOpabYo59lmHhZg2Oqjc7QEjG32s9nbKyI1hCRlKo5omFmzHa/s1600/proses-preprosesing.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="357" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgH8SAsdGawfw8goztDOTT4OVskP3Xf0D6AKUuyt0tCRxm2zNdU4olg1ydgDDh-lUGFczdItur2Q06pyWW9Uo4LgVj7-40YOpabYo59lmHhZg2Oqjc7QEjG32s9nbKyI1hCRlKo5omFmzHa/s400/proses-preprosesing.jpg" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;</description><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" height="72" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgH8SAsdGawfw8goztDOTT4OVskP3Xf0D6AKUuyt0tCRxm2zNdU4olg1ydgDDh-lUGFczdItur2Q06pyWW9Uo4LgVj7-40YOpabYo59lmHhZg2Oqjc7QEjG32s9nbKyI1hCRlKo5omFmzHa/s72-c/proses-preprosesing.jpg" width="72"/><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></item><item><title>The concept of Data Mining</title><link>http://freelearningcenter.blogspot.com/2011/12/concept-of-data-mining.html</link><category>Data Mining</category><category>Information Technology</category><author>noreply@blogger.com (Andy)</author><pubDate>Thu, 15 Dec 2011 22:19:00 -0800</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-6316360081793516006.post-1477314934739531152</guid><description>&lt;span class="" id="result_box" lang="en"&gt;&lt;span class="hps"&gt;What&lt;/span&gt; &lt;span class="hps"&gt;actually&lt;/span&gt; &lt;span class="hps"&gt;motivates&lt;/span&gt; &lt;a href="http://freelearningcenter.blogspot.com/2011/12/data-mining.html" target="_blank"&gt;&lt;b&gt;&lt;span class="hps"&gt;Data mining&lt;/span&gt;&lt;/b&gt;&lt;/a&gt; &lt;span class="hps"&gt;and why&lt;/span&gt; &lt;span class="hps"&gt;data mining&lt;/span&gt; &lt;span class="hps"&gt;is so&lt;/span&gt; &lt;span class="hps"&gt;important 
?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt; The main reason data mining has attracted so much attention in the information industry in recent years is the availability of large amounts of data and the pressing need to turn that data into useful information and knowledge.&lt;br /&gt;&lt;br /&gt; &lt;a href="http://freelearningcenter.blogspot.com/2011/12/data-mining.html" target="_blank"&gt;&lt;b&gt;Data mining&lt;/b&gt;&lt;/a&gt; is the activity of extracting, or mining, knowledge from large amounts of data; this knowledge can be very useful for development. The steps in performing &lt;a href="http://freelearningcenter.blogspot.com/2011/12/data-mining-and-web-mining.html" target="_blank"&gt;data mining&lt;/a&gt; are as follows:&lt;/span&gt;&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZZ_bI9hPblZNHDsp9qhHPF3uk_Kqr-Glo5ApBhOgoJvLXAkVWMoVhjqQat85DvQAj3yfa4qEub1tev0Z27OILrnIPbcS6gYzwLRREgybWYtZfbTzORaEbB-V25Kdkp37FE4rXWl6JhgKg/s1600/step-datamining.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZZ_bI9hPblZNHDsp9qhHPF3uk_Kqr-Glo5ApBhOgoJvLXAkVWMoVhjqQat85DvQAj3yfa4qEub1tev0Z27OILrnIPbcS6gYzwLRREgybWYtZfbTzORaEbB-V25Kdkp37FE4rXWl6JhgKg/s400/step-datamining.jpg" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="- Data cleaning (untuk menghilangkan noise data yang tidak konsisten) Data integration (di mana sumber data yang terpecah dapat disatukan)"&gt;Data cleaning (to remove noise and inconsistent data) and data integration (where multiple data sources are combined)&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="- Data selection (di mana data yang relevan dengan tugas analisis dikembalikan ke dalam database)"&gt;Data selection (where data relevant to the analysis task is retrieved from the database)&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="- Data transformation (di mana data berubah atau bersatu menjadi bentuk yang tepat untuk menambang dengan ringkasan performa atau operasi agresi)"&gt;Data transformation (where data is transformed or consolidated into forms
appropriate for mining, by performing summary or aggregation operations)&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="- Data mining (proses esensial di mana metode yang intelejen digunakan untuk mengekstrak pola data)"&gt;&lt;a href="http://freelearningcenter.blogspot.com/2011/12/data-mining.html" target="_blank"&gt;Data mining&lt;/a&gt; (an essential process in which intelligent methods are applied to extract data patterns)&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="- Pattern evolution (untuk mengidentifikasi pola yang benar-benar menarik yang mewakili pengetahuan berdasarkan atas beberapa tindakan yang menarik)"&gt;Pattern evaluation (to identify the truly interesting patterns
that represent knowledge, based on interestingness measures)&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="- Knowledge presentation (di mana gambaran teknik visualisasi dan pengetahuan digunakan untuk memberikan pengetahuan yang telah ditambang kpada user)."&gt;Knowledge presentation (where visualization and knowledge representation
techniques are used to present the mined knowledge to the
user).&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="Arsitektur dari data mining yang khas memiliki beberapa komponen utama yaitu :"&gt;The architecture of a typical data mining system has several main components:&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="- Database, data warehouse, atau tempat penyimpanan informasi lainnya."&gt;Database, data warehouse, or other information storage.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="- Server database atau data warehouse."&gt;Database or data warehouse server.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="- Knowledge base"&gt;Knowledge base&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="- Data mining engine."&gt;&lt;a href="http://freelearningcenter.blogspot.com/2011/12/data-mining-and-web-mining.html" target="_blank"&gt;Data mining engine&lt;/a&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="- Pattern evolution module."&gt;Pattern evaluation module.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="- Graphical user interface."&gt;Graphical user interface.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="Ada beberapa jenis data dalam data mining yaitu :"&gt;Several types of data are used in data mining:&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="- Relation Database : Sebuah sistem database, atau disebut juga database management system (DBMS), mengandung sekumpulan data yang saling berhubungan, dikenal sebagai sebuah database, dan satu set program perangkat lunak untuk mengatur dan mengakses data tersebut."&gt;Relational &lt;a href="http://freelearningcenter.blogspot.com/2011/12/default-password-in-oracle.html" target="_blank"&gt;Database&lt;/a&gt;: A database system, also called a database management
 system (DBMS), consists of a collection of interrelated data, known as a
database, and a set of software programs to manage and access that data.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="- Data Warehouse : Sebuah data warehouse merupakan sebuah ruang penyimpaan informasi yang terkumpul dari beraneka macam sumber, disimpan dalam skema yang menyatu, dan biasanya terletak pada sebuah site."&gt;Data Warehouse: A data warehouse is a repository of information
gathered from a variety of sources, stored under a unified schema, and
usually residing at a single site.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="Kemudian pola seperti apa yang dapat ditambang ?"&gt;So what kinds of patterns can be mined?&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="Kemudian pola seperti apa yang dapat ditambang ?"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span title="Kegunaan data mining adalah untuk menspesifikasikan pola yang harus ditemukan dalam tugas data mining."&gt;&lt;a href="http://freelearningcenter.blogspot.com/2011/12/data-mining.html" target="_blank"&gt;Data mining&lt;/a&gt; functionalities are used to specify the kinds of patterns to be found in &lt;a href="http://freelearningcenter.blogspot.com/2011/12/data-mining.html" target="_blank"&gt;data mining&lt;/a&gt; tasks. &lt;/span&gt;&lt;span title="Secara umum tugas data mining dapat diklasifikasikan ke dalam dua kategori: deskriptif dan prediktif."&gt;In general, data mining tasks can be classified into two categories: descriptive and predictive. &lt;/span&gt;&lt;span title="Tugas menambang secara deskriptif adalah untuk mengklasifikasikan sifat umum suatu data di dalam database."&gt;Descriptive mining tasks characterize the general properties of the data in the database. &lt;/span&gt;&lt;span title="Tugas data mining secara prediktif adalah untuk mengambil kesimpulan terhadap data terakhir untuk membuat prediksi."&gt;Predictive mining tasks draw inferences from the current data in order to make predictions.&lt;/span&gt;&lt;span title="Konsep/Class Description"&gt;&lt;b&gt;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="Konsep/Class Description"&gt;&lt;b&gt;Concept / Class Description&lt;/b&gt;&lt;/span&gt;&lt;span title="Data dapat diasosiasikan dengan pembagian class atau konsep."&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="Data dapat diasosiasikan dengan pembagian class atau konsep."&gt;Data can be associated with classes or concepts. &lt;/span&gt;&lt;span title="Untuk contohnya, ditoko All Electronics, pembagian class untuk barang yang akan dijual termasuk komputer dan printer, dan konsep untuk konsumen adalah big Spenders dan budget Spender."&gt;For
 example, in the All Electronics store, the classes of goods for sale
include computers and printers, and the concepts for customers
are big spenders and budget spenders. &lt;/span&gt;&lt;span title="Hal tersebut sangat berguna untuk menggambarkan pembagian class secara individual dan konsep secara ringkas, laporan ringkas, dan juga pengaturan harga."&gt;It
 is very useful to describe individual classes and
concepts concisely, for example in summary reports and when setting prices. &lt;/span&gt;&lt;span title="Deskripsi suatu class atau konsep seperti itu disebut class/concept descripition."&gt;Such a description of a class or concept is called a class/concept description.&lt;/span&gt;&lt;span title="Association Analysis"&gt;&lt;b&gt;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="Association Analysis"&gt;&lt;b&gt;Association Analysis&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="Association Analysis"&gt;&lt;b&gt;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;span title="Association analysis adalah penemuan association rules yang menunjukkan nilai kondisi suatu attribute yang terjadi bersama-sama secara terus-menerus dalam memmberikan set data."&gt;Association
 analysis is the discovery of association rules that show
attribute-value conditions occurring together frequently in a given
data set. &lt;/span&gt;&lt;span title="Association analysis secara luas dipakai untuk market basket atau analisa data transaksi."&gt;Association analysis is widely used for market basket or transaction data analysis.&lt;/span&gt;&lt;span title="Klasifikasi dan Predikasi"&gt;&lt;b&gt;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
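To make the idea of an association rule more concrete, the Python sketch below computes the two measures commonly used to rate a rule such as "bread implies milk": support and confidence. The basket contents and function names are invented for illustration and are not taken from any particular tool.&lt;br /&gt;
&lt;pre&gt;
```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items.issubset(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical market baskets, one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Rule under test: customers who buy bread also buy milk.
rule_support = support(baskets, {"bread", "milk"})          # 2 of 4 baskets
rule_confidence = confidence(baskets, {"bread"}, {"milk"})  # 2 of the 3 bread baskets
```
&lt;/pre&gt;
A rule is usually kept only if both numbers clear thresholds chosen by the analyst; those thresholds are a judgment call and are not shown here.&lt;br /&gt;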
&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="Klasifikasi dan Predikasi"&gt;&lt;b&gt;Classification and Prediction&lt;/b&gt;&lt;/span&gt;&lt;span title="Klasifikasi dan prediksi mungkin perlu diproses oleh analisis relevan, yang berusaha untuk mengidentifikasi atribut-atribut yang tidak ditambahkan pada proses klasifikasi dan prediksi."&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="Klasifikasi dan prediksi mungkin perlu diproses oleh analisis relevan, yang berusaha untuk mengidentifikasi atribut-atribut yang tidak ditambahkan pada proses klasifikasi dan prediksi."&gt;Classification
 and prediction may need to be preceded by relevance analysis, which
 attempts to identify attributes that do not contribute to the
classification or prediction process. &lt;/span&gt;&lt;span title="Atribut-atribut ini kemudian dapat di keluarkan."&gt;These attributes can then be excluded.&lt;/span&gt;&lt;span title="Cluster Analysis"&gt;&lt;b&gt;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="Cluster Analysis"&gt;&lt;b&gt;Cluster Analysis&lt;/b&gt;&lt;/span&gt;&lt;span title="Tidak seperti klasifikasi dan prediksi, yang menganalisis objek data dengan kelas yang terlabeli, clustering menganalisis objek data tanpa mencari keterangan pada label kelas yang diketahui."&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="Tidak seperti klasifikasi dan prediksi, yang menganalisis objek data dengan kelas yang terlabeli, clustering menganalisis objek data tanpa mencari keterangan pada label kelas yang diketahui."&gt;Unlike
 classification and prediction, which analyze class-labeled data
 objects, clustering analyzes data objects without consulting
a known class label. &lt;/span&gt;&lt;span title="Pada umumnya, label kelas tidak ditampilkan di dalam latihan data simply, karena mereka tidak tahu bagaimana memulainya."&gt;In general, the class labels are not present in the training data simply because they are not known to begin with. &lt;/span&gt;&lt;span title="Clustering dapat digunakan untuk menghasilkan label-label."&gt;Clustering can be used to generate such labels.&lt;/span&gt;&lt;span title="Outlier Analysis"&gt;&lt;b&gt;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
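As a small illustration of how clustering can produce labels for unlabeled data, here is a Python sketch of one-dimensional k-means. K-means is just one clustering method, chosen here for brevity; the spending figures, the starting centers, and the function name are all assumptions made for this example.&lt;br /&gt;
&lt;pre&gt;
```python
def k_means_1d(points, centers, rounds=10):
    """Group unlabeled numbers around the nearest of the given starting centers."""
    centers = list(centers)
    labels = []
    for _ in range(rounds):
        # Assignment step: each point gets the index of its closest center.
        labels = [
            min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            for p in points
        ]
        # Update step: move each center to the mean of the points assigned to it.
        for i in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                centers[i] = sum(members) / len(members)
    return labels, centers


# Hypothetical monthly spending per customer, with no class labels attached.
spending = [10, 12, 11, 95, 100, 105]
labels, centers = k_means_1d(spending, centers=[0, 50])
# labels  is [0, 0, 0, 1, 1, 1]
# centers is [11.0, 100.0]
```
&lt;/pre&gt;
The two groups that emerge could then be named, for instance, "budget spenders" and "big spenders", which is the sense in which clustering generates labels.&lt;br /&gt;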
&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="Outlier Analysis"&gt;&lt;b&gt;Outlier Analysis&lt;/b&gt;&lt;/span&gt;&lt;span title="§ Outlier dapat dideteksi menggunakan test yang bersifat statistik yang mengambil sebuah distribusi atau probabilitas model untuk data, atau menggunakan langkah-langkah jarak jauh di mana objek yang penting jauh dari cluster lainnya dianggap outlier."&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="§ Sebuah database mungkin mengandung objek data yang tidak mengikuti tingkah laku yang umum atau model dari data."&gt;A database may contain data objects that do not follow the general behavior or model of the data. &lt;/span&gt;&lt;span title="data ini disebut outlier."&gt;These data objects are called outliers. &lt;/span&gt;&lt;span title="§ Outlier dapat dideteksi menggunakan test yang bersifat statistik yang mengambil sebuah distribusi atau probabilitas model untuk data, atau menggunakan langkah-langkah jarak jauh di mana objek yang penting jauh dari cluster lainnya dianggap outlier."&gt;Outliers can be detected using statistical tests that assume a
distribution or probability model for the data, or using distance
measures, where objects that lie far away from every cluster are
considered outliers.&lt;/span&gt;&lt;span title="Evolution Analysis"&gt;&lt;b&gt;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
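The statistical approach to outlier detection can be sketched in a few lines of Python. This example assumes the data is roughly bell-shaped and flags any value whose z-score exceeds a threshold; the threshold of 2.0, the sample readings, and the function name are illustrative choices, not fixed rules.&lt;br /&gt;
&lt;pre&gt;
```python
from statistics import mean, pstdev


def z_score_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        # All values are identical, so nothing can stand out.
        return []
    return [v for v in values if abs(v - center) / spread > threshold]


# Hypothetical sensor readings with one suspicious value.
readings = [10, 11, 9, 10, 12, 10, 50]
outliers = z_score_outliers(readings)  # [50]
```
&lt;/pre&gt;
Note that a single extreme value also inflates the mean and standard deviation it is measured against, so on small samples a distance-based check may be the safer choice.&lt;br /&gt;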
&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="Evolution Analysis"&gt;&lt;b&gt;Evolution Analysis&lt;/b&gt;&lt;/span&gt;&lt;span title="Data analisa evolusi menggambarkan ketetapan model atau kecenderungan objek yang memiliki kebiasaan berubah setiap waktu."&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="Data analisa evolusi menggambarkan ketetapan model atau kecenderungan objek yang memiliki kebiasaan berubah setiap waktu."&gt;Data
 evolution analysis describes and models regularities or
trends for objects whose behavior changes over time. &lt;/span&gt;&lt;span title="Meskipun ini mungkin termasuk karakteristik, diskriminasi, asosiasi, klasifikasi, atau clustering data berdasarkan waktu, kelebihan yang jelas seperti analisa termasuk analisa data time-series, urutan atau pencocockkan pola secara berkala, dan kesamaan berdasarkan analisa data."&gt;Although
 this may include characterization, discrimination, association,
classification, or clustering of time-related data, the distinct features
of such an analysis include time-series data analysis, sequence or
periodicity pattern matching, and similarity-based data
analysis.&lt;/span&gt;&lt;span title="Untuk melakukan data mining yang baik ada beberapa persoalan utama yaitu menyangkut metodologi mining dan interaksi user, performance dan perbedaan tipe database."&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="Untuk melakukan data mining yang baik ada beberapa persoalan utama yaitu menyangkut metodologi mining dan interaksi user, performance dan perbedaan tipe database."&gt;To
 perform &lt;a href="http://freelearningcenter.blogspot.com/2011/12/data-mining.html" target="_blank"&gt;data mining&lt;/a&gt; well, several major issues must be addressed, concerning
mining methodology and user interaction, performance, and different types
 of databases. &lt;/span&gt;&lt;span title="Hal inilah yang sering kali dihadapi disaat kita ingin melakukan data mining."&gt;This is often encountered when we want to do data mining.&lt;/span&gt;&lt;/span&gt;&lt;span class="" id="result_box" lang="en"&gt;&lt;span class=""&gt; &lt;/span&gt;&lt;/span&gt;</description><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" height="72" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZZ_bI9hPblZNHDsp9qhHPF3uk_Kqr-Glo5ApBhOgoJvLXAkVWMoVhjqQat85DvQAj3yfa4qEub1tev0Z27OILrnIPbcS6gYzwLRREgybWYtZfbTzORaEbB-V25Kdkp37FE4rXWl6JhgKg/s72-c/step-datamining.jpg" width="72"/><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total></item><item><title>Application of Data Mining</title><link>http://freelearningcenter.blogspot.com/2011/12/application-of-data-mining.html</link><category>Data Mining</category><category>Information Technology</category><author>noreply@blogger.com (Andy)</author><pubDate>Wed, 14 Dec 2011 19:30:00 -0800</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-6316360081793516006.post-6555928212653431839</guid><description>&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="Sebagai cabang ilmu baru di bidang komputer cukup banyak penerapan yang dapat dilakukann oleh Data Mining."&gt;As a new branch of science in the areas of computers quite a lot of applications that can do by &lt;a href="http://freelearningcenter.blogspot.com/2011/12/data-mining.html" target="_blank"&gt;Data Mining&lt;/a&gt;. &lt;/span&gt;&lt;span title="Apalagi ditunjang kekayaan dan keanekaragaman berbagai bidang ilmu (artificial intelligence, database, statistik, pemodelan matematika, pengolahan citra dsb.) membuat penerapan data mining menjadi makin luas."&gt;Moreover,
 supported by the richness and diversity of the fields it draws on
(artificial intelligence, databases, statistics, mathematical
modeling, image processing, and so on), the range of data mining
applications keeps growing. &lt;/span&gt;&lt;span title="Di bidang apa saja penerapan data mining dapat dilakukan?"&gt;In which fields can data mining be applied? &lt;/span&gt;&lt;span title="Artikel singkat ini berusaha memberikan jawabannya."&gt;This brief article attempts to answer that question.&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="Artikel singkat ini berusaha memberikan jawabannya."&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;&lt;br /&gt;
&lt;u&gt;&lt;span title="Analisa Pasar dan Manajemen"&gt;&lt;span style="font-size: small;"&gt;&lt;b&gt;Market Analysis and Management&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;/u&gt;&lt;br /&gt;
&lt;span title="Untuk analisa pasar, banyak sekali sumber data yang dapat digunakan seperti transaksi kartu kredit, kartu anggota club tertentu, kupon diskon, keluhan pembeli, ditambah dengan studi tentang gaya hidup publik."&gt;For
 market analysis, many data sources can be used, such as credit card
transactions, club membership cards, discount coupons, and customer
complaints, together with studies of public lifestyles. &lt;/span&gt;&lt;span title="Beberapa solusi yang bisa diselesaikan dengan data mining diantaranya:"&gt;Problems that data mining can help solve include:&lt;/span&gt;&lt;span title="• Menembak target pasar"&gt;&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;span title="• Menembak target pasar"&gt;• &lt;b&gt;Targeting the market&lt;/b&gt;&lt;/span&gt;&lt;span title="Data mining dapat melakukan pengelompokan (clustering) dari model-model pembeli dan melakukan klasifikasi terhadap setiap pembeli sesuai dengan karakteristik yang diinginkan seperti kesukaan yang sama, tingkat penghasilan yang sama, kebiasaan membeli dan karakteristik lainnya."&gt;&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;span title="Data mining dapat melakukan pengelompokan (clustering) dari model-model pembeli dan melakukan klasifikasi terhadap setiap pembeli sesuai dengan karakteristik yang diinginkan seperti kesukaan yang sama, tingkat penghasilan yang sama, kebiasaan membeli dan karakteristik lainnya."&gt;&lt;a href="http://freelearningcenter.blogspot.com/2011/12/data-mining.html" target="_blank"&gt;Data mining&lt;/a&gt; can group (cluster) buyers into models and
classify each buyer according to the desired
characteristics, such as shared preferences, similar income levels, buying
habits, and other traits.&lt;/span&gt;&lt;span title="• Melihat pola beli pemakai dari waktu ke waktu"&gt;&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;span title="• Melihat pola beli pemakai dari waktu ke waktu"&gt;• &lt;b&gt;Tracking customers' purchasing patterns over time&lt;/b&gt;&lt;/span&gt;&lt;span title="Data mining dapat digunakan untuk melihat pola beli seseorang dari waktu ke waktu."&gt;&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;span title="Data mining dapat digunakan untuk melihat pola beli seseorang dari waktu ke waktu."&gt;&lt;a href="http://freelearningcenter.blogspot.com/2011/12/data-mining.html" target="_blank"&gt;Data mining&lt;/a&gt; can be used to see how a person's purchasing patterns change over time. &lt;/span&gt;&lt;span title="Sebagai contoh, ketika seseorang menikah bisa saja dia kemudian memutuskan pindah dari single account ke joint account (rekening bersama) dan kemudian setelah itu pola beli-nya berbeda dengan ketika dia masih bujangan."&gt;For
 example, when someone gets married, they might decide to move from a
 single account to a joint account, after which their
 purchasing pattern differs from when they were single.&lt;/span&gt;&lt;span title="• Cross-Market Analysis"&gt;&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;span title="• Cross-Market Analysis"&gt;• &lt;b&gt;Cross-Market Analysis&lt;/b&gt;&lt;/span&gt;&lt;span title="Kita dapat memanfaatkan data mining untuk melihat hubungan antara penjualan satu produk dengan produk lainnya."&gt;&lt;br /&gt;We can use &lt;a href="http://freelearningcenter.blogspot.com/2011/12/data-mining.html" target="_blank"&gt;data mining&lt;/a&gt; to examine the relationship between the sales of one product and those of another. &lt;/span&gt;&lt;span title="Berikut ini saya sajikan beberapa contoh:"&gt;Here are some examples:&lt;/span&gt;&lt;span title="o Cari pola penjualan Coca Cola sedemikian rupa sehingga kita dapat mengetahui barang apa sajakah yang harus kita sediakan untuk meningkatkan penjualan Coca Cola?"&gt;&lt;br /&gt;o Find Coca Cola sales patterns so that we know which items we should stock to increase Coca Cola sales.&lt;/span&gt;&lt;span title="o Cari pola penjualan IndoMie sedemikian rupa sehingga kita dapat mengetahui barang apa saja yang juga dibeli oleh pembeli IndoMie."&gt;&lt;br /&gt;o Find Indomie sales patterns so that we know which other items Indomie buyers also purchase. &lt;/span&gt;&lt;span title="Dengan demikian kita bisa mengetahui dampak jika kita tidak lagi menjual IndoMie."&gt;That way we can estimate the impact if we stop selling Indomie.&lt;/span&gt;&lt;span title="o Cari pola penjualan"&gt;&lt;br /&gt;o Find sales patterns&lt;/span&gt;&lt;span title="• Profil Customer"&gt;&lt;br /&gt;• &lt;b&gt;Customer Profile&lt;/b&gt;&lt;/span&gt;&lt;span title="Data mining dapat membantu Anda untuk melihat profil customer/pembeli/nasabah sehingga kita dapat mengetahui kelompok customer tertentu suka membeli produk apa saja."&gt;&lt;br /&gt;&lt;a href="http://freelearningcenter.blogspot.com/2011/12/data-mining.html" target="_blank"&gt;Data mining &lt;/a&gt;can help you build a profile of your customers 
so that you know which products particular customer groups tend to
 buy.&lt;/span&gt;&lt;span title="• Identifikasi Kebutuhan Customer"&gt;&lt;br /&gt;• &lt;b&gt;Identification of Customer Needs&lt;/b&gt;&lt;/span&gt;&lt;span title="Anda dapat mengidentifikasi produk-produk apa saja yang terbaik untuk tiap kelompok customer dan menyusun faktor-faktor apa saja yang kira-kira dapat menarik customer baru untuk bergabung/membeli."&gt;&lt;br /&gt;You
 can identify which products suit each customer group best and work out
 which factors are likely to attract new customers to join or buy.&lt;/span&gt;&lt;span title="• Menilai Loyalitas Customer"&gt;&lt;br /&gt;• &lt;b&gt;Assessing Customer Loyalty&lt;/b&gt;&lt;/span&gt;&lt;span title="VISA International Spanyol menggunakan data mining untuk melihat kesuksesan program-program customer loyalty mereka."&gt;&lt;br /&gt;VISA International Spain uses &lt;a href="http://freelearningcenter.blogspot.com/2011/12/data-mining.html" target="_blank"&gt;data mining&lt;/a&gt; to assess the success of its customer loyalty programs. &lt;/span&gt;&lt;span title="Anda bisa lihat di www.visa.es/ingles/info/300300.html"&gt;See www.visa.es/ingles/info/300300.html&lt;/span&gt;&lt;span title="• Informasi Summary"&gt;&lt;br /&gt;• &lt;b&gt;Information Summary&lt;/b&gt;&lt;/span&gt;&lt;span title="Anda juga dapat memanfaatkan data mining untuk membuat laporan summary yang bersifat multi-dimensi dan dilengkapi dengan informasi statistik lainnya."&gt;&lt;br /&gt;You
 can also use data mining to create multi-dimensional summary reports 
accompanied by other statistical information.&lt;/span&gt;&lt;br /&gt;
&lt;span title="Anda juga dapat memanfaatkan data mining untuk membuat laporan summary yang bersifat multi-dimensi dan dilengkapi dengan informasi statistik lainnya."&gt;&lt;br /&gt;&lt;/span&gt;&lt;u&gt;&lt;span title="Analisa Perusahaan dan Manajemen Resiko"&gt;&lt;b&gt;Corporate Analysis and Risk Management&lt;/b&gt;&lt;/span&gt;&lt;/u&gt;&lt;br /&gt;
&lt;span title="• Perencanaan Keuangan dan Evaluasi Aset"&gt;• &lt;b&gt;Financial Planning and Asset Evaluation&lt;/b&gt;&lt;/span&gt;&lt;span title="Data Mining dapat membantu Anda untuk melakukan analisis dan prediksi cash flow serta melakukan contingent claim analysis untuk mengevaluasi aset."&gt;&lt;br /&gt;&lt;a href="http://freelearningcenter.blogspot.com/2011/12/data-mining.html" target="_blank"&gt;Data Mining&lt;/a&gt; can help you analyze and predict cash flow and carry out contingent claim analysis to evaluate assets. &lt;/span&gt;&lt;span title="Selain itu Anda juga dapat menggunakannya untuk analisis trend."&gt;You can also use it for trend analysis.&lt;/span&gt;&lt;span title="• Perencanaan Sumber Daya (Resource Planning)"&gt;&lt;br /&gt;• &lt;b&gt;Resource Planning&lt;/b&gt;&lt;/span&gt;&lt;span title="Dengan melihat informasi ringkas (summary) serta pola pembelanjaan dan pemasukan dari masing-masing resource, Anda dapat memanfaatkannya untuk melakukan resource planning."&gt;&lt;br /&gt;By
 looking at summary information and the pattern of 
expenditure and income for each resource, you can carry out 
resource planning.&lt;/span&gt;&lt;span title="• Persaingan (Competition)"&gt;&lt;br /&gt;• &lt;b&gt;Competition &lt;/b&gt;&lt;/span&gt;&lt;span title="o Sekarang ini banyak perusahaan yang berupaya untuk dapat melakukan competitive intelligence."&gt;&lt;br /&gt;o Many companies today are working to build competitive intelligence. &lt;/span&gt;&lt;span title="Data Mining dapat membantu Anda untuk memonitor pesaing-pesaing Anda dan melihat market direction mereka."&gt;&lt;a href="http://freelearningcenter.blogspot.com/2011/12/data-mining.html" target="_blank"&gt;Data Mining&lt;/a&gt; can help you monitor your competitors and see their market direction.&lt;/span&gt;&lt;span title="o Anda juga dapat melakukan pengelompokan customer Anda dan memberikan variasi harga/layanan/bonus untuk masing-masing grup."&gt;&lt;br /&gt;o You can also group your customers and vary the price, service, or bonus offered to each group.&lt;/span&gt;&lt;span title="o Menyusun strategi penetapan harga di pasar yang sangat kompetitif."&gt;&lt;br /&gt;o Develop pricing strategies in highly competitive markets. &lt;/span&gt;&lt;span title="Hal ini diterapkan oleh perusahaan minyak REPSOL di Spanyol dalam menetapkan harga jual gas di pasaran."&gt;The oil company Repsol in Spain applies this when setting the selling price of gas on the market.&lt;/span&gt;&lt;br /&gt;
&lt;span title="Hal ini diterapkan oleh perusahaan minyak REPSOL di Spanyol dalam menetapkan harga jual gas di pasaran."&gt;&lt;br /&gt;&lt;/span&gt;&lt;u&gt;&lt;span title="Telekomunikasi"&gt;&lt;b&gt;Telecommunication&lt;/b&gt;&lt;/span&gt;&lt;/u&gt;&lt;br /&gt;
&lt;span title="Sebuah perusahaan telekomunikasi menerapkan data mining untuk melihat dari jutaan transaksi yang masuk, transaksi mana sajakah yang masih harus ditangani secara manual (dilayani oleh orang)."&gt;A
 telecommunications company applied &lt;a href="http://freelearningcenter.blogspot.com/2011/12/data-mining.html" target="_blank"&gt;data mining&lt;/a&gt; to determine, out of the 
millions of incoming transactions, which ones still had to
 be handled manually (served by a person). &lt;/span&gt;&lt;span title="Tujuannya tidak lain adalah untuk menambah layanan otomatis khusus untuk transaksi-transaksi yang masih dilayani secara manual."&gt;The aim was to add automated services specifically for the transactions that were still handled manually. &lt;/span&gt;&lt;span title="Dengan demikian jumlah operator penerima transaksi manual tetap bisa ditekan minimal."&gt;In this way the number of operators handling manual transactions could be kept to a minimum.&lt;/span&gt;&lt;br /&gt;
&lt;span title="Dengan demikian jumlah operator penerima transaksi manual tetap bisa ditekan minimal."&gt;&lt;br /&gt;&lt;/span&gt;&lt;u&gt;&lt;span title="Keuangan"&gt;&lt;b&gt;Finance&lt;/b&gt;&lt;/span&gt;&lt;/u&gt;&lt;br /&gt;
&lt;span title="Financial Crimes Enforcement Network di Amerika Serikat baru-baru ini menggunakan data mining untuk me-nambang trilyunan dari berbagai subyek seperti property, rekening bank dan transaksi keuangan lainnya untuk mendeteksi transaksi-transaksi keuangan yang mencurigakan (seperti money laundry)."&gt;The Financial
 Crimes Enforcement Network in the United States recently used data 
mining &lt;/span&gt;&lt;span class="" id="result_box" lang="en"&gt;&lt;span class="hps"&gt;to mine&lt;/span&gt;&lt;/span&gt;&lt;span title="Financial Crimes Enforcement Network di Amerika Serikat baru-baru ini menggunakan data mining untuk me-nambang trilyunan dari berbagai subyek seperti property, rekening bank dan transaksi keuangan lainnya untuk mendeteksi transaksi-transaksi keuangan yang mencurigakan (seperti money laundry)."&gt; trillions of data items on subjects such as property, 
bank accounts, and other financial transactions in order to detect 
suspicious financial transactions (such as money laundering). &lt;/span&gt;&lt;span title="Mereka menyatakan bahwa hal tersebut akan susah dilakukan jika menggunakan analisis standar."&gt;They stated that this would be hard to do with standard analysis. &lt;/span&gt;&lt;span title="Anda bisa lihat di www.senate.gov/~appropriations/treasury/testimony/sloan.htm."&gt;See www.senate.gov/~appropriations/treasury/testimony/sloan.htm. &lt;/span&gt;&lt;span title="Mungkin sudah saatnya juga Badan Pemeriksa Keuangan Republik Indonesia menggunakan teknologi ini untuk mendeteksi aliran dana BLBI."&gt;Perhaps
 it is also time for the Supreme Audit Board of the Republic of Indonesia 
to use this technology to trace the flow of BLBI funds.&lt;/span&gt;&lt;br /&gt;
&lt;span title="Mungkin sudah saatnya juga Badan Pemeriksa Keuangan Republik Indonesia menggunakan teknologi ini untuk mendeteksi aliran dana BLBI."&gt;&lt;br /&gt;&lt;/span&gt;&lt;u&gt;&lt;span title="Asuransi"&gt;&lt;b&gt;Insurance&lt;/b&gt;&lt;/span&gt;&lt;/u&gt;&lt;br /&gt;
&lt;span title="Australian Health Insurance Commision menggunakan data mining untuk mengidentifikasi layanan kesehatan yang sebenarnya tidak perlu tetapi tetap dilakukan oleh peserta asuransi."&gt;The Australian
 Health Insurance Commission uses &lt;a href="http://freelearningcenter.blogspot.com/2011/12/data-mining.html" target="_blank"&gt;data mining&lt;/a&gt; to identify health 
services that are not actually necessary but are still being undertaken by 
insurance participants. &lt;/span&gt;&lt;span title="Hasilnya?"&gt;The result? &lt;/span&gt;&lt;span title="Mereka berhasil menghemat satu juta dollar per tahunnya."&gt;They managed to save one million dollars per year. &lt;/span&gt;&lt;span title="Anda bisa lihat di www.informationtimes.com.au/data-sum.htm."&gt;See www.informationtimes.com.au/data-sum.htm. &lt;/span&gt;&lt;span title="Tentu saja ini tidak hanya bisa diterapkan untuk asuransi kesehatan, tetapi juga untuk berbagai jenis asuransi lainnya."&gt;Of course, this can be applied not only to health insurance but also to various other types of insurance.&lt;/span&gt;&lt;br /&gt;
&lt;span title="Tentu saja ini tidak hanya bisa diterapkan untuk asuransi kesehatan, tetapi juga untuk berbagai jenis asuransi lainnya."&gt;&lt;br /&gt;&lt;/span&gt;&lt;u&gt;&lt;span title="Olah Raga"&gt;&lt;b&gt;Sports&lt;/b&gt;&lt;/span&gt;&lt;/u&gt;&lt;br /&gt;
&lt;span title="IBM Advanced Scout menggunakan data mining untuk menganalisis statistik permainan NBA (jumlah shots blocked, assists dan fouls) dalam rangka mencapai keunggulan bersaing (competitive advantage) untuk tim New York Knicks dan Miami Heat."&gt;IBM
 Advanced Scout uses &lt;a href="http://freelearningcenter.blogspot.com/2011/12/data-mining.html" target="_blank"&gt;data mining&lt;/a&gt; to analyze NBA game statistics 
(number of shots blocked, assists, and fouls) in order to give 
the New York Knicks and Miami Heat a competitive 
advantage.&lt;/span&gt;&lt;br /&gt;
&lt;span title="IBM Advanced Scout menggunakan data mining untuk menganalisis statistik permainan NBA (jumlah shots blocked, assists dan fouls) dalam rangka mencapai keunggulan bersaing (competitive advantage) untuk tim New York Knicks dan Miami Heat."&gt;&lt;br /&gt;&lt;/span&gt;&lt;u&gt;&lt;span title="Astronomi"&gt;&lt;b&gt;Astronomy&lt;/b&gt;&lt;/span&gt;&lt;/u&gt;&lt;br /&gt;
&lt;span title="Jet Propulsion Laboratory (JPL) di Pasadena, California dan Palomar Observatory berhasil menemukan 22 quasar dengan bantuan data mining."&gt;The Jet
 Propulsion Laboratory (JPL) in Pasadena, California, and Palomar 
Observatory discovered 22 quasars with the help of &lt;a href="http://freelearningcenter.blogspot.com/2011/12/data-mining.html" target="_blank"&gt;data mining&lt;/a&gt;. &lt;/span&gt;&lt;span title="Hal ini merupakan salah satu kesuksesan penerapan data mining di bidang astronomi dan ilmu ruang angkasa."&gt;This is one of the successful applications of data mining in astronomy and space science. &lt;/span&gt;&lt;span title="Anda bisa lihat di www-aig.jpl.nasa.gov/public/mls/news/SKICAT-PR12-95.html."&gt;See www-aig.jpl.nasa.gov/public/mls/news/SKICAT-PR12-95.html.&lt;/span&gt;&lt;br /&gt;
&lt;span title="Anda bisa lihat di www-aig.jpl.nasa.gov/public/mls/news/SKICAT-PR12-95.html."&gt;&lt;br /&gt;&lt;/span&gt;&lt;u&gt;&lt;span title="Internet Web Surf-Aid"&gt;&lt;b&gt;Internet Web Surf-Aid&lt;/b&gt;&lt;/span&gt;&lt;/u&gt;&lt;br /&gt;
&lt;span title="IBM Surf-Aid menggunakan algoritma data mining untuk mendata akses halaman Web khususnya yang berkaitan dengan pemasaran guna melihat prilaku dan minat customer serta melihat ke-efektif-an pemasaran melalui Web."&gt;IBM
 Surf-Aid uses data mining algorithms to record accesses to Web pages, 
particularly those related to marketing, in order to observe customer 
behavior and interests and to assess the effectiveness of marketing via the
 Web.&lt;/span&gt;&lt;br /&gt;
&lt;span title="IBM Surf-Aid menggunakan algoritma data mining untuk mendata akses halaman Web khususnya yang berkaitan dengan pemasaran guna melihat prilaku dan minat customer serta melihat ke-efektif-an pemasaran melalui Web."&gt;&lt;br /&gt;&lt;/span&gt;&lt;span title="Dengan melihat beberapa aplikasi yang telah disebutkan di atas, terlihat sekali potensi besar dari penerapan Data Mining di berbagai bidang."&gt;Looking
 at the applications mentioned above, the great potential of 
applying data mining in various fields is clear. &lt;/span&gt;&lt;span title="Bahkan beberapa pihak berani menyatakan bahwa Data Mining merupakan salah satu aktifitas di bidang perangkat lunak yang dapat memberikan ROI (return on investment) yang tinggi."&gt;Some
 even go so far as to claim that &lt;a href="http://freelearningcenter.blogspot.com/2011/12/data-mining.html" target="_blank"&gt;data mining&lt;/a&gt; is one of the software
 activities that can deliver a high ROI (return on investment). &lt;/span&gt;&lt;span title="Namun demikian, perlu diingat bahwa Data Mining hanya melihat keteraturan atau pola dari sejarah, tetapi tetap saja sejarah tidak sama dengan masa datang."&gt;However,
 keep in mind that data mining only looks at regularities or patterns in
 historical data, and history is not the same as the future. &lt;/span&gt;&lt;span title="Contoh: jika orang terlalu banyak minum Coca Cola bukan berarti dia pasti akan kegemukan, jika orang terlalu banyak merokok bukan berarti dia pasti akan kena kanker paru-paru atau mati muda."&gt;Example:
 if a person drinks too much Coca Cola, it does not mean they will certainly become overweight; 
if a person smokes too much, it does not mean they will definitely get 
lung cancer or die young. &lt;/span&gt;&lt;span title="Bagaimanapun juga data mining tetaplah hanya alat bantu yang dapat membantu manusia untuk melihat pola, menganalisis trend dsb."&gt;In the end, data mining is still only a tool that helps people see patterns, analyze trends, and so on, &lt;/span&gt;&lt;span title="dalam rangka mempercepat pembuatan keputusan."&gt;in order to speed up decision making.&lt;/span&gt;</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></item><item><title>Data Mining and Web Mining</title><link>http://freelearningcenter.blogspot.com/2011/12/data-mining-and-web-mining.html</link><category>Data Mining</category><category>Information Technology</category><author>noreply@blogger.com (Andy)</author><pubDate>Wed, 14 Dec 2011 19:02:00 -0800</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-6316360081793516006.post-5277171930597402829</guid><description>&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="Data mining (DM) yang juga dikenal sebagai Knowledge Discovery (Frawley et al., 1992) , merupakan salah satu bidang yang berkembang pesat karena besarnya kebutuhan akan nilai tambah dari database skala besar yang makin banyak terakumulasi sejalan dengan pertumbuhan teknologi informasi."&gt;&lt;a href="http://freelearningcenter.blogspot.com/2011/12/data-mining.html" target="_blank"&gt;&lt;b&gt;Data mining&lt;/b&gt;&lt;/a&gt; (DM), also known as Knowledge Discovery (Frawley et al., 1992), 
is a rapidly growing field, driven by the strong demand to extract 
added value from the large-scale databases that keep accumulating as 
information technology grows. &lt;/span&gt;&lt;span title="Secara umum, data mining dapat didefinisikan sebagai suatu rangkaian proses untuk menggali nilai tambah berupa ilmu pengetahuan yang selama ini tidak diketahui secara manual dari suatu kumpulan data (Pramudiono, 2003)."&gt;In
 general, data mining can be defined as a series of processes for extracting
 added value, in the form of knowledge that could not previously be found by hand, from a data set
 (Pramudiono, 2003). &lt;/span&gt;&lt;span title="Web mining merupakan penerapan teknik data mining terhadap web dengan tujuan untuk memperoleh pengetahuan dan informasi lebih dari dalam web."&gt;Web mining is the application of &lt;a href="http://freelearningcenter.blogspot.com/2011/12/data-mining.html" target="_blank"&gt;data mining&lt;/a&gt; techniques to the web in order to obtain more knowledge and information from within the web. &lt;/span&gt;&lt;span title="Web mining dapat dikategorikan ke dalam tiga ruang lingkup yang berbeda, yaitu web content mining, web structure mining dan web usage mining (Srivastava et al., 2000)."&gt;Web
 mining can be divided into three different areas: web 
content mining, web structure mining, and web usage mining (Srivastava et
 al., 2000).&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;&lt;br /&gt;
&lt;span title="Association Rules dan Algoritma Apriori"&gt;&lt;b&gt;Association Rules and Apriori Algorithm&lt;/b&gt;&lt;/span&gt;&lt;span title="Association rules merupakan salah satu teknik data mining yang berfungsi untuk menemukan asosiasi antar variabel, korelasi atau suatu struktur diantara item atau objek-objek didalam database transaksi, database relasional, maupun pada penyimpanan informasi lainnya."&gt;&lt;br /&gt;Association
 rules are a &lt;a href="http://freelearningcenter.blogspot.com/2011/12/data-mining.html" target="_blank"&gt;data mining&lt;/a&gt; technique used to discover 
associations between variables, correlations, or structure among items 
or objects in transaction databases, relational databases, and
 other information stores. &lt;/span&gt;&lt;span title="Sebagai ilustrasi dalam analisis weblog dari association rules adalah sebagai berikut, pola yang mungkin adalah “jika seseorang mengunjungi website CNN, terdapat kemungkinan sebesar 60% orang tersebut mengunjungi website Detik pada bulan yang sama.” Pada ilustrasi tersebut, pola yang ditemukan berpotensi menghasilkan potongan informasi"&gt;As
 an illustration from weblog analysis, a pattern that might be found is: 
"if someone visits the CNN website, 
there is a 60% probability that the same person visits the Detik website
 in the same month." In this illustration, the discovered pattern can potentially 
yield pieces of information &lt;/span&gt;&lt;span title="yang menarik dan dibutuhkan oleh perusahaan yang terkait."&gt;that are interesting and needed by the companies concerned. &lt;/span&gt;&lt;span title="Proses di dalam teknik assocation rules adalah mencari aturan-aturan yang memenuhi minimum support dan confidence."&gt;The association rules technique works by searching for rules that satisfy a minimum support and a minimum confidence. &lt;/span&gt;&lt;span title="Algoritma yang pertama kali digunakan dalam teknik association rules dan yang paling banyak digunakan adalah algoritma apriori (Agrawal &amp;amp; Srikant, 1994)."&gt;One
 of the earliest algorithms for association rules, and the 
most widely used, is the Apriori algorithm (Agrawal &amp;amp; Srikant, 
1994).&lt;/span&gt;&lt;br /&gt;
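To make minimum support and minimum confidence concrete, here is a minimal Python sketch, not the Apriori algorithm itself: it only considers pairs of items, whereas Apriori extends the same idea level by level to larger itemsets. Support is taken as the fraction of transactions that contain both items, and confidence as the fraction of transactions containing the left-hand item that also contain the right-hand item. The function name pair_rules, the site labels, and the toy visit data are all assumptions made for illustration; they do not come from the article.

```python
from itertools import combinations


def pair_rules(transactions, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for every pair rule lhs -> rhs
    that meets both thresholds. Hypothetical helper, pairs only."""
    n = len(transactions)
    item_count = {}   # number of transactions containing each item
    pair_count = {}   # number of transactions containing each pair of items
    for items in transactions:
        unique = sorted(set(items))
        for item in unique:
            item_count[item] = item_count.get(item, 0) + 1
        for pair in combinations(unique, 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1

    rules = []
    for (a, b), together in pair_count.items():
        support = together / n
        if min_support > support:
            continue  # the pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            confidence = together / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Toy weblog: each inner list is the set of sites one visitor opened in a month.
visits = [
    ["cnn", "detik"],
    ["cnn", "detik", "bbc"],
    ["cnn", "detik"],
    ["cnn", "bbc"],
    ["cnn"],
]
rules = pair_rules(visits, min_support=0.4, min_confidence=0.6)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

On this toy data the rule cnn -> detik has support 0.6 and confidence 0.6, which mirrors the 60% pattern in the illustration above; raising min_confidence above 0.6 would drop it.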
&lt;span title="Algoritma yang pertama kali digunakan dalam teknik association rules dan yang paling banyak digunakan adalah algoritma apriori (Agrawal &amp;amp; Srikant, 1994)."&gt;&lt;br /&gt;&lt;/span&gt;&lt;span title="Web Crawler"&gt;&lt;b&gt;Web Crawler&lt;/b&gt;&lt;/span&gt;&lt;span title="Web crawler (yang juga dikenal dengan web spider atau web robot) adalah suatu program atau script otomatis yang menjelajahi WWW dengan menggunakan sebuah metode atau cara yang otomatis."&gt;&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;span title="Web crawler (yang juga dikenal dengan web spider atau web robot) adalah suatu program atau script otomatis yang menjelajahi WWW dengan menggunakan sebuah metode atau cara yang otomatis."&gt;A web
 crawler (also known as a web spider or web robot) is a program or 
automated script that browses the WWW in a methodical, automated 
way. &lt;/span&gt;&lt;span title="Nama-nama yang jarang digunakan pada sebuah web crawler adalah ants, automatic indexers, bots, worms (Kobayashi &amp;amp; Takeda, 2000)."&gt;Less commonly used names for a web crawler are ants, automatic indexers, bots, and worms (Kobayashi &amp;amp; Takeda, 2000).&lt;/span&gt;&lt;br /&gt;
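The core loop of a crawler can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the "web" is a small in-memory dictionary rather than real HTTP requests, and the names crawl, fetch_links, and toy_web are hypothetical. A real crawler would also need to fetch pages over the network, parse links, respect robots.txt, and limit its request rate.

```python
from collections import deque


def crawl(start, fetch_links, max_pages=100):
    """Breadth-first crawl from start; returns pages in the order visited.

    fetch_links is any callable that maps a page address to the list of
    addresses it links to (an assumption standing in for download + parse).
    """
    seen = {start}           # addresses already queued, to avoid revisits
    queue = deque([start])   # frontier of pages still to visit
    order = []
    while queue and max_pages > len(order):
        page = queue.popleft()
        order.append(page)
        for link in fetch_links(page):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# Toy link graph used instead of the real WWW.
toy_web = {
    "a.example/": ["a.example/news", "b.example/"],
    "a.example/news": ["a.example/"],
    "b.example/": ["c.example/"],
    "c.example/": [],
}
visited = crawl("a.example/", lambda page: toy_web.get(page, []))
print(visited)
```

The seen set is what keeps the crawler from looping forever when pages link back to each other, as a.example/news does here.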
&lt;span title="Nama-nama yang jarang digunakan pada sebuah web crawler adalah ants, automatic indexers, bots, worms (Kobayashi &amp;amp; Takeda, 2000)."&gt;&lt;br /&gt;&lt;/span&gt;&lt;span title="Extended Log File Format"&gt;&lt;b&gt;Extended Log File Format&lt;/b&gt;&lt;/span&gt;&lt;span title="Extended Log Format dirancang untuk memenuhi beberapa kebutuhan di bawah ini (Baker &amp;amp; Behlendorf, 1996):"&gt;&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;span title="Extended Log Format dirancang untuk memenuhi beberapa kebutuhan di bawah ini (Baker &amp;amp; Behlendorf, 1996):"&gt;The Extended Log Format is designed to meet the following needs (Baker &amp;amp; Behlendorf, 1996):&lt;/span&gt;&lt;span title="* Memperbolehkan kontrol pada data yang direkam."&gt;&lt;br /&gt;* Permit control over the data recorded.&lt;/span&gt;&lt;span title="* Memenuhi kebutuhan proxy, client dan server dalam format yang umum."&gt;&lt;br /&gt;* Support the needs of proxies, clients, and servers in a common format.&lt;/span&gt;&lt;span title="* Menyediakan penanganan yang sempurna akan masalah penghilangan karakter."&gt;&lt;br /&gt;* Provide robust handling of character escaping issues.&lt;/span&gt;&lt;span title="* Memperbolehkan dalam pertukaran demografis data."&gt;&lt;br /&gt;* Allow the exchange of demographic data. &lt;/span&gt;&lt;span title="* Memperbolehkan dalam menyajikan rekapitulasi data."&gt;&lt;br /&gt;* Allow summary data to be expressed.&lt;/span&gt;</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">3</thr:total></item><item><title>Data Mining</title><link>http://freelearningcenter.blogspot.com/2011/12/data-mining.html</link><category>Data Mining</category><category>Information Technology</category><author>noreply@blogger.com (Andy)</author><pubDate>Tue, 13 Dec 2011 19:35:00 -0800</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-6316360081793516006.post-7438676895158113438</guid><description>&lt;br /&gt;
&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="Data Mining memang salah satu cabang ilmu komputer yang relatif baru."&gt;&lt;a href="http://en.wikipedia.org/wiki/Data_mining" target="_blank"&gt;&lt;b&gt;Data Mining&lt;/b&gt;&lt;/a&gt; is a relatively new branch of computer science. &lt;/span&gt;&lt;span title="Dan sampai sekarang orang masih memperdebatkan untuk menempatkan data mining di bidang ilmu mana, karena data mining menyangkut database, kecerdasan buatan (artificial intelligence), statistik, dsb."&gt;Even
 now people still debate which field of 
science it belongs to, because &lt;a href="http://en.wikipedia.org/wiki/Data_mining" target="_blank"&gt;data mining&lt;/a&gt; involves databases, artificial 
intelligence, statistics, and so on. &lt;/span&gt;&lt;span title="Ada pihak yang berpendapat bahwa data mining tidak lebih dari machine learning atau analisa statistik yang berjalan di atas database."&gt;Some
 argue that data mining is nothing more than machine 
learning or statistical analysis running on top of a database. &lt;/span&gt;&lt;span title="Namun pihak lain berpendapat bahwa database berperanan penting di data mining karena data mining mengakses data yang ukurannya besar (bisa sampai terabyte) dan disini terlihat peran penting database terutama dalam optimisasi query-nya."&gt;Others
 argue that the database plays an important role in data mining 
because &lt;a href="http://en.wikipedia.org/wiki/Data_mining" target="_blank"&gt;data mining&lt;/a&gt; accesses very large volumes of data (up to terabytes),
 and this is where the database matters most, especially in query 
optimization.&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;&lt;br /&gt;
&lt;span title="Lalu apakah data mining itu?"&gt;So what is &lt;a href="http://en.wikipedia.org/wiki/Data_mining" target="_blank"&gt;data mining&lt;/a&gt;? &lt;/span&gt;&lt;span title="Apakah memang berhubungan erat dengan dunia pertambangan…."&gt;Is it really closely related to the world of mining, &lt;/span&gt;&lt;span title="tambang emas, tambang timah, dsb."&gt;such as gold mines, tin mines, and so on? &lt;/span&gt;&lt;span title="Definisi sederhana dari data mining adalah ekstraksi informasi atau pola yang penting atau menarik dari data yang ada di database yang besar."&gt;A
 simple definition of data mining is the extraction of important or 
interesting information or patterns from the data held in large databases.
 &lt;/span&gt;&lt;span title="Dalam jurnal ilmiah, data mining juga dikenal dengan nama Knowledge Discovery in Databases (KDD)."&gt;In scientific journals, data mining is also known as Knowledge Discovery in Databases (KDD). &lt;/span&gt;&lt;span title="Kehadiran data mining dilatar belakangi dengan problema data explosion yang dialami akhir-akhir ini dimana banyak organisasi telah mengumpulkan data sekian tahun lamanya (data pembelian, data penjualan, data nasabah, data transaksi dsb.)."&gt;The
 emergence of data mining is rooted in the recent data explosion problem: 
many organizations have been collecting data for years
 (purchasing data, sales data, customer data, transaction data, 
etc.). &lt;/span&gt;&lt;span title="Hampir semua data tersebut dimasukkan dengan menggunakan aplikasi komputer yang digunakan untuk menangani transaksi sehari-hari yang kebanyakan adalah OLTP (On Line Transaction Processing)."&gt;Almost
 all of this data is entered through computer applications that handle 
day-to-day transactions, most of which are OLTP (On Line Transaction 
Processing) systems. &lt;/span&gt;&lt;span title="Bayangkan berapa transaksi yang dimasukkan oleh hypermarket semacam Carrefour atau transaksi kartu kredit dari sebuah bank dalam seharinya dan bayangkan betapa besarnya ukuran data mereka jika nanti telah berjalan beberapa tahun."&gt;Imagine
 how many transactions a hypermarket such as Carrefour records, or how many 
credit card transactions a bank processes, in a single day, and imagine how large 
their data becomes after several years of operation. &lt;/span&gt;&lt;span title="Pertanyaannya sekarang, apakah data tersebut akan dibiarkan menggunung, tidak berguna lalu dibuang, ataukah kita dapat me-'nambang'-nya untuk mencari 'emas', 'berlian' yaituinformasi yang berguna untuk organisasi kita."&gt;The
 question now is whether that data will be left to pile up, unused and then 
thrown away, or whether we can 'mine' it to find the 'gold' and 'diamonds', 
that is, information that is useful for our organization. &lt;/span&gt;&lt;span title="Banyak diantara kita yang kebanjiran data tapi miskin informasi."&gt;Many of us are flooded with data but poor in information. &lt;/span&gt;&lt;span title="Jika Anda mempunyai kartu kredit, sudah pasti Anda bakal sering menerima surat berisi brosur penawaran barang atau jasa."&gt;If you have a credit card, you will certainly often receive mail containing brochures offering goods or services. &lt;/span&gt;&lt;span title="Jika Bank pemberi kartu kredit Anda mempunyai 1.000.000 nasabah, dan mengirimkan sebuah (hanya satu) penawaran dengan biaya pengiriman sebesar Rp."&gt;If your credit card issuer has 1,000,000 customers and sends each of them one (just one) offer at a mailing cost of Rp &lt;/span&gt;&lt;span title="1.000 per buah maka biaya yang dihabiskan adalah Rp."&gt;1,000 apiece, the amount spent is Rp &lt;/span&gt;&lt;span title="1 Milyar!!"&gt;1 billion! &lt;/span&gt;&lt;span title="Jika Bank tersebut mengirimkan penawaran sekali sebulan yang berarti 12x dalam setahun maka anggaran yang dikeluarkan per tahunnya adalah Rp."&gt;If the bank sends offers once a month, that is 12 times a year, the annual budget is Rp &lt;/span&gt;&lt;span title="12 Milyar!!"&gt;12 billion! &lt;/span&gt;&lt;span title="Dari dana Rp."&gt;Of the Rp &lt;/span&gt;&lt;span title="12 Milyar yang dikeluarkan, berapa persenkah konsumen yang benar-benar membeli?"&gt;12 billion spent, what percentage of consumers actually buy? &lt;/span&gt;&lt;span title="Mungkin hanya 10 %-nya saja."&gt;Maybe only 10%. 
&lt;/span&gt;&lt;span title="Secara harfiah, berarti 90% dari dana tersebut terbuang sia-sia."&gt;Put plainly, that means 90% of the funds are wasted. &lt;/span&gt;&lt;span title="Persoalan di atas merupakan salah satu persoalan yang dapat diatasi oleh data mining dari sekian banyak potensi permasalahan yang ada."&gt;This is one of the many potential problems that data mining can address. &lt;/span&gt;&lt;span title="Data mining dapat menambang data transaksi belanja kartu kredit untuk melihat manakah pembeli-pembeli yang memang potensial untuk membeli produk tertentu."&gt;Data
 mining can mine the data of credit card shopping transactions to see 
what buyers are indeed potential to buy a particular product. &lt;/span&gt;&lt;span title="Mungkin tidak sampai presisi 10%, tapi bayangkan jika kita dapat menyaring 20% saja, tentunya 80% dana dapat digunakan untuk hal lainnya."&gt;Probably not until the precision of 10%, but imagine if we could filter out 20%, 80% of funds must be used for other things.&lt;/span&gt;&lt;span title="Lalu apa beda data mining dengan data warehouse dan OLAP (On-line Analytical Processing)?"&gt;So what different data mining with data warehousing and OLAP (On-line Analytical Processing)? &lt;/span&gt;&lt;span title="Secara singkat bisa dijawab bahwa teknologi yang ada di data warehouse dan OLAP dimanfaatkan penuh untuk melakukan data mining."&gt;Can be answered briefly that existing technology in data warehouse and OLAP fully utilized to perform data mining. &lt;/span&gt;&lt;span title="Teknologi data warehouse digunakan untuk melakukan OLAP, sedangkan data mining digunakan untuk melakukan information discovery yang informasinya lebih ditujukan untuk seorang Data Analyst dan Business Analyst (dengan ditambah visualisasi tentunya)."&gt;Data
 warehouse technology is used to perform OLAP, while data mining is used
 to perform information discovery that the information is intended for a
 Data Analyst and Business Analyst (with added visualization of course).
 &lt;/span&gt;&lt;span title="Dalam prakteknya, data mining juga mengambil data dari data warehouse."&gt;In practice, &lt;a href="http://en.wikipedia.org/wiki/Data_mining" target="_blank"&gt;data mining&lt;/a&gt; also draws its data from a data warehouse. &lt;/span&gt;&lt;span title="Hanya saja aplikasi dari data mining lebih khusus dan lebih spesifik dibandingkan OLAP mengingat database bukan satu-satunya bidang ilmu yang mempengaruhi data mining, banyak lagi bidang ilmu yang turut memperkaya data mining seperti: information science (ilmu informasi), high performance computing, visualisasi,"&gt;However,
 the applications of data mining are more specialized than those of 
OLAP, because databases are not the only field that shapes data mining; 
many other fields enrich it, such as information science, 
high-performance computing, visualization, &lt;/span&gt;&lt;span title="machine learning, statistik, neural networks (jaringan syaraf tiruan), pemodelan matematika, information retrieval dan information extraction serta pengenalan pola."&gt;machine
 learning, statistics, neural networks, mathematical modeling, 
information retrieval, information extraction, and pattern 
recognition. &lt;/span&gt;&lt;span title="Bahkan pengolahan citra (image processing) juga digunakan dalam rangka melakukan data mining terhadap data image/spatial."&gt;Even image processing is used to perform data mining on image and spatial data. &lt;/span&gt;&lt;span title="Dengan memadukan teknologi OLAP dengan data mining diharapkan pengguna dapat melakukan hal-hal yang biasa dilakukan di OLAP seperti drilling/rolling untuk melihat data lebih dalam atau lebih umum, pivoting, slicing dan dicing."&gt;By
 integrating OLAP technology with data mining, users are expected to be 
able to do the things usually done in OLAP, such as drilling down and 
rolling up to see the data in more detail or more generally, pivoting, 
slicing, and dicing. &lt;/span&gt;&lt;span title="Semua hal tersebut diharapkan nantinya dapat dilakukan secara interaktif dan dilengkapi dengan visualisasi."&gt;All of this is expected to be possible interactively and with visualization.&lt;/span&gt;&lt;span title="Data mining tidak hanya melakukan mining terhadap data transaksi saja."&gt;Data mining is not limited to mining transaction data. &lt;/span&gt;&lt;span title="Penelitian di bidang data mining saat ini sudah merambah ke sistem database lanjut seperti object oriented database, image/spatial database, time-series data/temporal database, teks (dikenal dengan nama text mining), web (dikenal dengan nama web mining) dan multimedia"&gt;Research
 in the field of data mining has now extended to advanced database 
systems such as object-oriented databases, image / spatial databases, 
time-series data / temporal databases, text (known as text mining), web 
(known as web mining) and multimedia &lt;/span&gt;&lt;span title="database."&gt;databases. &lt;/span&gt;&lt;span title="Meskipun gaungnya mungkin tidak seramai seperti ketika Client/Server Database muncul, tetapi industri-industri seperti IBM, Microsoft, SAS, SGI, dan SPSS terus gencar melakukan penelitian-penelitian di bidang data mining."&gt;Although
 the buzz may not be as loud as when client/server databases 
first appeared, companies such as IBM, Microsoft, SAS, SGI, and SPSS 
continue to conduct research aggressively in the field of data mining. &lt;/span&gt;&lt;span title="Beberapa penelitian sekarang ini sedang dilakukan untuk memajukan data mining diantaranya adalah peningkatan kinerja jika berurusan dengan data berukuran terabyte, visualisasi yang lebih menarik untuk user, pengembangan bahasa query untuk data mining yang sedapat mungkin mirip dengan SQL."&gt;Some
 of the research now being conducted to advance data mining includes 
performance improvements when dealing with terabyte-sized data, 
visualization that is more attractive to users, and the development of a 
data mining query language that is as similar as possible to SQL. &lt;/span&gt;&lt;span title="Tujuannya tidak lain adalah agar end-user dapat melakukan data mining dengan mudah dan cepat serta mendapatkan hasil yang akurat."&gt;The aim is simply to let end users perform data mining easily and quickly and get accurate results.&lt;/span&gt;</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></item><item><title>How to Improve Website Ranking with easy</title><link>http://freelearningcenter.blogspot.com/2011/12/how-to-improve-website-ranking-with.html</link><category>tips and tricks</category><author>noreply@blogger.com (Andy)</author><pubDate>Tue, 13 Dec 2011 19:22:00 -0800</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-6316360081793516006.post-1778651762324809220</guid><description>&lt;a href="http://en.wikipedia.org/wiki/Search_engine_optimization" target="_blank"&gt;Search engine optimization&lt;/a&gt; (&lt;b&gt;SEO&lt;/b&gt;) involves taking steps to help a website
 rank higher in the search engines for the purpose of increasing 
organic traffic to the site. Organic traffic consists of visitors
 who find their way to a particular site as the result of 
running a search engine query rather than by clicking on an 
advertisement.
&lt;br /&gt;
Each search engine uses a different algorithm to determine search engine
 ranking. One of the first steps in figuring out how to improve web page
 ranking is to develop an understanding of how search algorithms work 
and use that knowledge to develop and market your website.&lt;br /&gt;
&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;&lt;br /&gt;
&lt;span class="" id="result_box" lang="en"&gt;There are several ways to improve website rankings:&lt;/span&gt;&lt;br /&gt;
&lt;h3&gt;


&lt;span style="font-size: small;"&gt;Build Backlinks
&lt;/span&gt;&lt;/h3&gt;
One of the factors that search engines consider when ranking websites 
is the number of backlinks pointing to a website. A backlink is simply a
 link to your website from another site. When you want to improve your 
site's ranking, you should spend time building backlinks to your site. 
There are two types of backlinks: reciprocal and one-way. When your 
website links to a site that also links to yours, the link is 
reciprocal. When a website links to yours, but your site does not also 
include a link to that site, the link is one-way (non-reciprocal). Both 
types of links can be beneficial, but search engines tend to look more 
favorably upon one-way links.
&lt;br /&gt;
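The reciprocal versus one-way distinction can be sketched in a few lines of Python. This is only an illustration: the site names and the list of link pairs are made up, and a real audit would take its link data from a crawl or a backlink report.
&lt;br /&gt;
&lt;pre&gt;
```python
# Hypothetical link data: each pair is (linking_site, linked_site).
links = [
    ("partner.example", "mysite.example"),
    ("mysite.example", "partner.example"),
    ("news.example", "mysite.example"),
]


def classify_backlinks(site, link_pairs):
    """Split the sites linking to `site` into reciprocal and one-way groups."""
    pair_set = set(link_pairs)
    reciprocal, one_way = [], []
    for source, target in link_pairs:
        if target != site:
            continue  # not a backlink to our site
        # Reciprocal if our site also links back to the source.
        if (site, source) in pair_set:
            reciprocal.append(source)
        else:
            one_way.append(source)
    return sorted(set(reciprocal)), sorted(set(one_way))


reciprocal, one_way = classify_backlinks("mysite.example", links)
print("Reciprocal:", reciprocal)
print("One-way:", one_way)
```
&lt;/pre&gt;
A site lands in the reciprocal list only when a link exists in both directions; everything else that points at your site counts as one-way.
&lt;br /&gt;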
The quality of the website that links to your website also plays a role 
in how favorably search engines will rank your website. Focus on 
building links on content-rich websites that are contextually relevant 
to the content of your site.
&lt;br /&gt;
Tips for building backlinks include:
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Submit articles that include links to your website to free article directories.
&lt;/li&gt;
&lt;li&gt;Post comments on relevant blogs that include links to your website.
&lt;/li&gt;
&lt;li&gt;Participate in forums related to the topic of your website and include a link to your site in the signature line.
&lt;/li&gt;
&lt;li&gt;Submit your website to free directory websites.
&lt;/li&gt;
&lt;li&gt;Consider purchasing paid blog posts that include website links from providers such as &lt;a href="http://www.smorty.com/" rel="nofollow"&gt;Smorty&lt;/a&gt;, &lt;a href="http://www.blogsvertise.com/" rel="nofollow"&gt;Blogsvertise&lt;/a&gt;, or &lt;a href="http://www.sponsoredreviews.com/" rel="nofollow"&gt;SponsoredReviews&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;If you own more than one website, include links to your other sites on each of them, as appropriate.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
&lt;span style="font-size: small;"&gt;Frequently Updated Original Content
&lt;/span&gt;&lt;/h3&gt;
Websites with original content that is updated frequently rank 
higher in search engines than those that contain duplicate content 
and/or are rarely updated. Tips for improving web page ranking with
 content include:
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Add an articles section to your website and add original content to it several times each week.
&lt;/li&gt;
&lt;li&gt;Add a blog to your website and update it with original content at least three times weekly.
&lt;/li&gt;
&lt;li&gt;You can write your own articles, or hire a ghostwriter to do it for you.
&lt;/li&gt;
&lt;li&gt;Use &lt;a class="external" href="http://www.copyscape.com/" rel="nofollow" target="_blank"&gt;Copyscape&lt;/a&gt; to make sure that the content you're adding to your site is not duplicate content.&amp;nbsp;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;


&lt;span style="font-size: small;"&gt;Use High Traffic Keywords and Key Phrases
&lt;/span&gt;&lt;/h3&gt;
In order to generate organic traffic, both the content of your website 
and the source code should include high-traffic keywords and key 
phrases. It's important to research words and phrases that people 
actually use to look for content on the Internet when deciding which key
 terms to use in your website. It's also important to consider how much 
competition there is for search engine positioning with the keywords you
 are thinking about using. There are several free keyword research 
resources, such as &lt;a class="external" href="http://freekeywords.wordtracker.com/" rel="nofollow" target="_blank"&gt;Word Tracker&lt;/a&gt; and &lt;a class="external" href="https://adwords.google.com/select/KeywordToolExternal" rel="nofollow" target="_blank"&gt;Google's AdWords keyword tool&lt;/a&gt;.
 If you are operating a commercial website, it's advisable to use a 
subscription-based keyword research site, because you'll have access to 
much more comprehensive research on which to base your keyword 
decisions.
&lt;br /&gt;
The keywords you choose for your website should be used in the site's content as well as in the &lt;a href="http://web-design.lovetoknow.com/Meta_Tag_Optimization_Service" title="Meta Tag Optimization Service"&gt;meta tags&lt;/a&gt;
 that appear in your site's source code. It's smart to use high-traffic 
key terms in your site's page titles, descriptions, and meta keyword 
lists. The keywords and phrases in your source code must be consistent 
with those in the copy on your website.
&lt;br /&gt;
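One way to spot inconsistencies between meta keywords and page copy is a simple script. The sketch below is a minimal illustration rather than a standard SEO tool: the function name, the keyword list, and the sample text are all hypothetical, and it assumes you have already extracted the meta keywords and the visible text from the page.
&lt;br /&gt;
&lt;pre&gt;
```python
import re


def missing_keywords(meta_keywords, page_copy):
    """Return the meta keywords that never appear in the visible page copy."""
    # Normalise the copy to lowercase words separated by single spaces.
    normalised = " ".join(re.findall(r"[a-z0-9']+", page_copy.lower()))
    missing = []
    for keyword in meta_keywords:
        phrase = " ".join(re.findall(r"[a-z0-9']+", keyword.lower()))
        # Pad with spaces so "car" does not match inside "cart".
        if f" {phrase} " not in f" {normalised} ":
            missing.append(keyword)
    return missing


# Hypothetical keyword list and page text, for illustration only.
keywords = ["used computers", "laptop repair", "cheap monitors"]
copy_text = "We sell used computers and offer laptop repair in town."
print(missing_keywords(keywords, copy_text))
```
&lt;/pre&gt;
Any keyword the function returns is declared in the source code but never used in the copy, so either the copy or the keyword list probably needs adjusting.
&lt;br /&gt;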
Avoid overusing key phrases in your content to keep the copy on your 
site from sounding like spam. Using the same phrases over and over in 
the same article or section of your site can actually hurt your site's 
search engine ranking. Your content should flow and read naturally, yet 
be rich in high-traffic keyword terminology.
&lt;br /&gt;
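To get a rough feel for whether a phrase is overused, you can measure what share of the words in a piece of copy it accounts for. The sketch below is an assumption-laden illustration: the function name and sample text are made up, and no search engine publishes a density figure that counts as too high, so treat the number as a prompt to reread the copy rather than as a rule.
&lt;br /&gt;
&lt;pre&gt;
```python
import re


def phrase_density(text, phrase):
    """Return the share of words in `text` taken up by occurrences of `phrase`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = re.findall(r"[a-z0-9']+", phrase.lower())
    if not words or not target:
        return 0.0
    n = len(target)
    # Count every position where the phrase occurs as a whole-word sequence.
    hits = sum(
        1 for i in range(len(words) - n + 1) if words[i:i + n] == target
    )
    return hits * n / len(words)


# Hypothetical sample copy, for illustration only.
sample = "Used computers for sale. Our used computers are tested."
density = phrase_density(sample, "used computers")
print(f"{density:.0%} of the words belong to the phrase")
```
&lt;/pre&gt;
A very high share on a short passage, as in this deliberately repetitive sample, is a hint that the text may read unnaturally.
&lt;br /&gt;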
&lt;br /&gt;</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">3</thr:total></item><item><title>Free Journal Sites</title><link>http://freelearningcenter.blogspot.com/2011/12/free-journal-sites.html</link><category>tips and tricks</category><author>noreply@blogger.com (Andy)</author><pubDate>Tue, 13 Dec 2011 19:05:00 -0800</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-6316360081793516006.post-2074931275097106581</guid><description>&lt;span class="" id="result_box" lang="en"&gt;For those of you who enjoy research or are working on a final project, thesis, or dissertation, references are very important, and finding journals that can be accessed for free is sometimes a major obstacle. Below are the addresses of several free journal sites:&lt;/span&gt;&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;&lt;br /&gt; 1. Citeseer (thousands of journal papers in the field of computer science)&lt;br /&gt; 2. &lt;a href="http://www.doaj.org/" target="_blank"&gt;Directory of Open Access Journals&lt;/a&gt;&lt;br /&gt; 3. &lt;a href="http://www.pubmedcentral.nih.gov/" target="_blank"&gt;PubMed Central&lt;/a&gt; (free digital archive of biomedical and life sciences literature)&lt;br /&gt; 4. &lt;a href="http://scholar.google.co.id/" target="_blank"&gt;Google Scholar&lt;/a&gt; (citation index, abstracts, and full text)&lt;br /&gt; 5. &lt;a href="http://www.arsip.lipi.go.id/" target="_blank"&gt;Scientific Data in LIPI Mirror&lt;/a&gt; (LIPI mirror of international scientific journals)&lt;br /&gt; 6. &lt;a href="http://www.informatik.uni-trier.de/%7Eley/db/" target="_blank"&gt;DBLP Bibliography&lt;/a&gt;&lt;br /&gt; 7. &lt;a href="http://libra.msra.cn/" target="_blank"&gt;Libra Academic Search&lt;/a&gt;&lt;br /&gt; 8. &lt;a href="http://www.jstor.org/" target="_blank"&gt;JSTOR Scholarly Journal Archive&lt;/a&gt;&lt;br /&gt; 9. &lt;a href="http://www.biomedcentral.com/" target="_blank"&gt;BioMed Central&lt;/a&gt; (the Open Access Publisher)&lt;br /&gt; 10. &lt;a href="http://highwire.stanford.edu/" target="_blank"&gt;Highwire Press Stanford University&lt;/a&gt;&lt;br /&gt; 11. &lt;a href="http://itunes.berkeley.edu/" target="_blank"&gt;UC Berkeley on iTunes U&lt;/a&gt; (free study material from UC Berkeley)&lt;br /&gt; 12. &lt;a href="http://ocw.mit.edu/OcwWeb/web/home/home/index.htm" target="_blank"&gt;MIT OpenCourseWare&lt;/a&gt; (free MIT course material)&lt;br /&gt; 13. &lt;a href="http://www.irossco.com/patentsearching.htm" target="_blank"&gt;Patent Searching&lt;/a&gt; (patent document search)</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></item><item><title>How to Enrol Website to Search Engines</title><link>http://freelearningcenter.blogspot.com/2011/12/how-to-enrol-website-to-search-engines.html</link><category>tips and tricks</category><author>noreply@blogger.com (Andy)</author><pubDate>Tue, 
13 Dec 2011 18:49:00 -0800</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-6316360081793516006.post-5689208177272924567</guid><description>&lt;br /&gt;
&lt;span class="" id="result_box" lang="en"&gt;&lt;span title="Sebelum mendaftarkan website ke search engine, ada baiknya kalau Anda mengetahui sekilas tentang cara kerjanya."&gt;Before registering a website with search engines, it helps to have a quick idea of how they work. &lt;/span&gt;&lt;span title="Search engine merupakan sebuah sistem database yang dirancang untuk meng-index alamat-alamat internet (URL, FTP, dll)."&gt;A &lt;b&gt;search engine&lt;/b&gt; is a database system designed to index internet addresses (URL, FTP, etc.). &lt;/span&gt;&lt;span title="Untuk melaksanakan tugasnya, search engine memiliki program khusus yang biasanya disebut spider, bot, atau crawler."&gt;To carry out its task, a search engine has a special program that is usually called a spider, bot, or crawler. &lt;/span&gt;&lt;span title="Pada saat Anda mendaftarkan sebuah alamat web (URL), spider dari search engine akan menerima dan menganalisa URL tersebut."&gt;When you register a web address (URL), the search engine's spider receives and analyzes the URL. &lt;/span&gt;&lt;span title="Dengan algoritma tertentu, spider akan memutuskan apakah web yang anda daftarkan layak diterima atau tidak."&gt;Using certain algorithms, the spider decides whether the site you submitted is acceptable or not. &lt;/span&gt;&lt;span title="Jika layak, spider akan menambahkan URL tersebut ke sistem database mereka."&gt;If it is, the spider adds the URL to the search engine's database. &lt;/span&gt;&lt;span title="Namun jika tidak, terpaksa Anda harus bersabar dan mengulangi pendaftaran dalam periode tertentu."&gt;If not, you will have to be patient and repeat the registration after a certain period. &lt;/span&gt;&lt;span title="Kecepatan crawler setiap search engine berbeda-beda."&gt;Crawler speed differs from one search engine to another. 
&lt;/span&gt;&lt;span title="Oleh sebab itu jika Anda mendaftarkan URL sekarang, kemungkinan baru bisa dilisting dalam 2 minggu hingga 2 bulan."&gt;Therefore, if you register a URL now, it may only be listed after 2 weeks to 2 months. &lt;/span&gt;&lt;span title="Ini disebabkan karena ada ribuan URL baru yang mendaftar setiap harinya."&gt;This is because thousands of new URLs are registered every day. &lt;/span&gt;&lt;span title="Google.com dan AllTheWeb.com adalah dua search engine yang memiliki kapasitas crawler terbesar saat ini."&gt;&lt;a href="http://google.com/"&gt;Google.com&lt;/a&gt; and &lt;a href="http://alltheweb.com/"&gt;AllTheWeb.com&lt;/a&gt; are the two search engines with the largest crawler capacity today.&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;&lt;br /&gt;
&lt;span title="Melakukan Pendaftaran"&gt;&lt;b&gt;Registering&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;span title="Bagaimana melakukan pendaftaran ?"&gt;How do you register? &lt;/span&gt;&lt;span title="Gampang, Anda tinggal mengisi form pendaftaran URL yang disediakan masing-masing search engine."&gt;It is easy: just fill out the URL registration form that each search engine provides. &lt;/span&gt;&lt;span title="Misalnya untuk Google, form pendaftarannya ada di http://www.google.com/addurl.html."&gt;For example, for &lt;a href="http://www.google.com/"&gt;Google&lt;/a&gt;, the registration form is at &lt;a href="http://www.google.com/addurl.html"&gt;http://www.google.com/addurl.html&lt;/a&gt;. &lt;/span&gt;&lt;span title="Setelah itu tunggu beberapa minggu dan lakukan pengecekan apakah website Anda sudah terlisting atau belum."&gt;After that, wait a few weeks and check whether your website has been listed. &lt;/span&gt;&lt;span title="Cara mengeceknya, masuk ke situs search engine yang bersangkutan kemudian ketikkan domain Anda sebagai kata kunci, misalnya andyku.wordpress.com."&gt;To
 check, go to the search engine in question and type your 
domain as the keyword, for example andyku.wordpress.com. &lt;/span&gt;&lt;span title="Jika URL Anda muncul pada hasil pencarian (search result), berarti website Anda sudah terdaftar."&gt;If your URL appears in the search results, your website is listed. &lt;/span&gt;&lt;span title="Bagaimana jika tidak muncul ?"&gt;What if it does not appear? &lt;/span&gt;&lt;span title="Ada dua kemungkinan penyebab."&gt;There are two possible causes. &lt;/span&gt;&lt;span title="Pertama, karena crawler search engine memang belum sempat berkunjung dan membaca isi (content) dari web Anda."&gt;First, the search engine's crawler may not yet have had a chance to visit and read the content of your site. &lt;/span&gt;&lt;span title="Kedua, crawler sudah berkunjung namun tidak dapat membaca content karena sesuatu hal, misalnya pada saat web server Anda down atau kelebihan traffic."&gt;Second,
 the crawler may have visited but been unable to read the content for some 
reason, for example because your web server was down or overloaded with traffic. &lt;/span&gt;&lt;span title="Pada kasus kedua, search engine biasanya akan berusaha membaca ulang URL tersebut dalam jangka waktu tertentu, namun cara terbaik adalah melakukan pendaftaran ulang apabila website Anda sudah benar-benar siap."&gt;In
 the second case, the search engine will usually try to re-read the URL 
after a certain period, but the best approach is to register again once 
your website is completely ready.&lt;/span&gt;&lt;span title="Sekarang kita asumsikan pendaftaran website Anda sudah berhasil, apakah itu sudah cukup ?"&gt;Now assume that your website registration has succeeded; is that enough? &lt;/span&gt;&lt;span title="Tentu saja tidak, Anda perlu melakukan pengujian berikutnya."&gt;Of course not; you need to run one more test. &lt;/span&gt;&lt;span title="Coba ketikkan keyword atau frase umum yang berhubungan dengan isi website."&gt;Try typing a common keyword or phrase related to the content of the website. &lt;/span&gt;&lt;span title="Misalnya jika website Anda menawarkan produk komputer bekas, cobalah cari dengan kata kunci “komputer bekas”."&gt;For example, if your website offers used computers, try searching for the keyword "used computers". &lt;/span&gt;&lt;span title="Jika website Anda muncul di halaman pertama atau kedua, itu sudah bagus."&gt;If your website appears on the first or second page, that is good. &lt;/span&gt;&lt;span title="Namun jika website Anda muncul di halaman-halaman belakang, janganlah berharap traffic terlalu banyak."&gt;But if your website appears on the later pages, do not expect much traffic. &lt;/span&gt;&lt;span title="Sangat kecil kemungkinan visitor akan dapat menemukan URL Anda."&gt;It is very unlikely that visitors will find your URL. &lt;/span&gt;&lt;span title="Mau tidak mau, website Anda harus berjuang memperebutkan posisi 20 besar dari ratusan bahkan jutaan website pesaing."&gt;Like it or not, your website must compete for a top-20 position against hundreds or even millions of competing websites.&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;
&lt;table align="center" border="1" cellpadding="3"&gt;&lt;caption&gt;&lt;strong&gt;Where to Register Your Website&lt;/strong&gt;&lt;/caption&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Search Engine&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;strong&gt;Registration URL&lt;/strong&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;a class="external" href="http://www.google.com/" rel="nofollow" target="_blank"&gt;Google&lt;/a&gt;&lt;/td&gt;&lt;td&gt;&lt;a class="external" href="http://www.google.com/addurl/?continue=/addurl" rel="nofollow" target="_blank"&gt;Register Here&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;a class="external" href="http://www.yahoo.com/" rel="nofollow" target="_blank"&gt;Yahoo&lt;/a&gt;&lt;/td&gt;&lt;td&gt;&lt;a class="external" href="http://search.yahoo.com/info/submit.html" rel="nofollow" target="_blank"&gt;Register Here&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;a class="external" href="http://search.msn.com/" rel="nofollow" target="_blank"&gt;MSN&lt;/a&gt;&lt;/td&gt;&lt;td&gt;&lt;a class="external" href="http://search.msn.com/docs/submit.aspx" rel="nofollow" target="_blank"&gt;Register Here&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;a class="external" href="http://www.dmoz.org/" rel="nofollow" target="_blank"&gt;DMOZ&lt;/a&gt;&lt;/td&gt;&lt;td&gt;&lt;a class="external" href="http://www.dmoz.org/" rel="nofollow" target="_blank"&gt;Add URL to Category&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;br /&gt;&lt;span title="Selamat Mencoba ……."&gt;Good luck!&lt;/span&gt;</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></item><item><title>Default Password in Oracle</title><link>http://freelearningcenter.blogspot.com/2011/12/default-password-in-oracle.html</link><category>tips and tricks</category><author>noreply@blogger.com (Andy)</author><pubDate>Tue, 13 Dec 2011 00:51:00 -0800</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-6316360081793516006.post-6610222515964883419</guid><description>&lt;span style="color: darkblue;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: darkblue;"&gt;
       Here is a sample session of running these scripts to show you how
 the tools work. The test was run against a 9.2.0.1 database on Windows 
XP.
&lt;/span&gt;&lt;br /&gt;
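Before the session output, a rough sketch of the underlying idea may help. The Python below is not how the probe scripts work internally (their implementation is not shown in this post); it simply compares a hypothetical list of username and password pairs against some of the defaults that the session reports.
&lt;br /&gt;
&lt;pre&gt;
```python
# Default username/password pairs taken from the probe output in this post.
KNOWN_DEFAULTS = {
    "SYS": "CHANGE_ON_INSTALL",
    "SYSTEM": "MANAGER",
    "SCOTT": "TIGER",
    "DBSNMP": "DBSNMP",
    "WMSYS": "WMSYS",
    "ORDSYS": "ORDSYS",
    "ORDPLUGINS": "ORDPLUGINS",
    "MDSYS": "MDSYS",
    "CTXSYS": "CHANGE_ON_INSTALL",
}


def find_default_passwords(accounts):
    """Return the usernames whose password equals the known default."""
    flagged = []
    for username, password in accounts:
        default = KNOWN_DEFAULTS.get(username.upper())
        # Assumption: passwords are compared case-insensitively,
        # as in Oracle releases before 11g.
        if default is not None and password.upper() == default:
            flagged.append(username.upper())
    return flagged


# Hypothetical audit input: (username, password) pairs gathered elsewhere.
audit = [("scott", "tiger"), ("system", "S3cure-Pass"), ("dbsnmp", "dbsnmp")]
for name in find_default_passwords(audit):
    print(f"WARNING! The password of {name} is a default password.")
```
&lt;/pre&gt;
Any account this flags should have its password changed or be locked, especially training accounts such as SCOTT.
&lt;br /&gt;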
&lt;pre&gt;&lt;span style="color: darkblue;"&gt;Connected to:
Personal Oracle9i Release 9.2.0.1.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production

SQL&amp;gt; @\petefinnigan.com\password\marcel-jan\osp_exec.sql
*********************************************
*                                           *
*  Welcome to the Oracle Security Probe     *
*                                           *
*********************************************

Connectstring (destination database): sans
Password of oraprobe?: ********
Connected.
Oracle accounts with default passwords
======================================

Username: SYS
Password: CHANGE_ON_INSTALL
-----------------------------------------------
WARNING! The password of SYS is a default password. 
It is well known to hackers
&lt;a name='more'&gt;&lt;/a&gt;
Additional information:
SYS is Oracle's most powerful database management account. 
It allows to read,change and destroy all data in your database.


Username: SYSTEM
Password: MANAGER
-----------------------------------------------
WARNING! The password of SYSTEM is a default password. 
It is well known to hackers

Additional information:
SYSTEM is Oracle's database management account. 
It allows to read, change and destroy all data in your database.


Username: SCOTT
Password: TIGER
-----------------------------------------------
WARNING! The password of SCOTT is a default password. 
It is well known to hackers

Additional information:
This is a training account. It should not be available 
in a production environment.


Username: DBSNMP
Password: DBSNMP
-----------------------------------------------
WARNING! The password of DBSNMP is a default password. 
It is well known to hackers

Additional information:
DBSNMP is an account for the Oracle Intelligent Agent. 
Under certain circumstances it allows to read passwords from memory.


Username: QS_ES
Password: CHANGE_ON_INSTALL
Status: EXPIRED &amp;amp; LOCKED(TIMED)
-----------------------------------------------
WARNING! The password of QS_ES is a default password. 
It is well known to hackers

Additional information:
This is a training account. It should not be available 
in a production environment.


Username: WMSYS
Password: WMSYS
Status: LOCKED
-----------------------------------------------
WARNING! The password of WMSYS is a default password. 
It is well known to hackers

Additional information:



Username: ORDSYS
Password: ORDSYS
Status: EXPIRED &amp;amp; LOCKED
-----------------------------------------------
WARNING! The password of ORDSYS is a default password. 
It is well known to hackers

Additional information:
The account ORDSYS (Oracle Time Series) has a limited number of 
risky system privileges, amongst which those to use external 
libraries and run code on the operating system.


Username: ORDPLUGINS
Password: ORDPLUGINS
Status: EXPIRED &amp;amp; LOCKED
-----------------------------------------------
WARNING! The password of ORDPLUGINS is a default password. 
It is well known to hackers

Additional information:
ORDPLUGINS is an administrative account for Oracle Time Series.


Username: MDSYS
Password: MDSYS
Status: EXPIRED &amp;amp; LOCKED
-----------------------------------------------
WARNING! The password of MDSYS is a default password. 
It is well known to hackers

Additional information:
The account MDSYS (Oracle Spatial administrator) has DBA-like 
privileges, which allow to read, change and destroy all data 
in your database.


Username: CTXSYS
Password: CHANGE_ON_INSTALL
Status: EXPIRED &amp;amp; LOCKED
-----------------------------------------------
WARNING! The password of CTXSYS is a default password. 
It is well known to hackers

Additional information:
CTXSYS (Oracle Text/Intermedia Text/Context option) is 
an account with DBA privileges and therefor allows to read, 
change and destroy all data in your database.


Username: XDB
Password: CHANGE_ON_INSTALL
Status: EXPIRED &amp;amp; LOCKED
-----------------------------------------------
WARNING! The password of XDB is a default password. 
It is well known to hackers

Additional information:



Username: WKSYS
Password: CHANGE_ON_INSTALL
Status: EXPIRED &amp;amp; LOCKED
-----------------------------------------------
WARNING! The password of WKSYS is a default password. 
It is well known to hackers

Additional information:
WKSYS is an administrative account of Oracle9iAS Ultrasearch.


Username: WKPROXY
Password: CHANGE_ON_INSTALL
Status: EXPIRED &amp;amp; LOCKED
-----------------------------------------------
WARNING! The password of WKPROXY is a default password. 
It is well known to hackers

Additional information:
WKPROXY is an administrative account of Oracle9iAS Ultrasearch.


Username: ODM
Password: ODM
Status: EXPIRED &amp;amp; LOCKED
-----------------------------------------------
WARNING! The password of ODM is a default password. 
It is well known to hackers

Additional information:



Username: ODM_MTR
Password: 
Status: EXPIRED &amp;amp; LOCKED
-----------------------------------------------
WARNING! The password of ODM_MTR is a default password. 
It is well known to hackers

Additional information:



Username: OLAPSYS
Password: MANAGER
Status: EXPIRED &amp;amp; LOCKED
-----------------------------------------------
WARNING! The password of OLAPSYS is a default password. 
It is well known to hackers

Additional information:
OLAPSYS is an administrative account for the OLAP Services option.


Username: RMAN
Password: RMAN
Status: EXPIRED &amp;amp; LOCKED
-----------------------------------------------
WARNING! The password of RMAN is a default password. 
It is well known to hackers

Additional information:
RMAN is an account for the Oracle Recovery Manager. 
This account might be misused to write unwanted changes 
to the database to the backups.


Username: QS_CS
Password: CHANGE_ON_INSTALL
Status: EXPIRED &amp;amp; LOCKED
-----------------------------------------------
WARNING! The password of QS_CS is a default password. 
It is well known to hackers

Additional information:
This is a training account. It should not be available 
in a production environment.


Username: QS_CB
Password: CHANGE_ON_INSTALL
Status: EXPIRED &amp;amp; LOCKED
-----------------------------------------------
WARNING! The password of QS_CB is a default password. 
It is well known to hackers

Additional information:
This is a training account. It should not be available 
in a production environment.


Username: QS_CBADM
Password: CHANGE_ON_INSTALL
Status: EXPIRED &amp;amp; LOCKED
-----------------------------------------------
WARNING! The password of QS_CBADM is a default password. 
It is well known to hackers

Additional information:
This is a training account. It should not be available 
in a production environment.


Username: QS_OS
Password: CHANGE_ON_INSTALL
Status: EXPIRED &amp;amp; LOCKED
-----------------------------------------------
WARNING! The password of QS_OS is a default password. 
It is well known to hackers

Additional information:
This is a training account. It should not be available 
in a production environment.


Username: HR
Password: CHANGE_ON_INSTALL
Status: EXPIRED &amp;amp; LOCKED
-----------------------------------------------
WARNING! The password of HR is a default password. 
It is well known to hackers

Additional information:
This is a training account. It should not be available 
in a production environment.


Username: OE
Password: CHANGE_ON_INSTALL
Status: EXPIRED &amp;amp; LOCKED
-----------------------------------------------
WARNING! The password of OE is a default password. 
It is well known to hackers

Additional information:
This is a training account. It should not be available 
in a production environment.


Username: PM
Password: CHANGE_ON_INSTALL
Status: EXPIRED &amp;amp; LOCKED
-----------------------------------------------
WARNING! The password of PM is a default password. 
It is well known to hackers

Additional information:
This is a training account. It should not be available 
in a production environment.


Username: SH
Password: CHANGE_ON_INSTALL
Status: EXPIRED &amp;amp; LOCKED
-----------------------------------------------
WARNING! The password of SH is a default password. 
It is well known to hackers

Additional information:
This is a training account. It should not be available 
in a production environment.


Username: QS_ADM
Password: CHANGE_ON_INSTALL
Status: EXPIRED &amp;amp; LOCKED
-----------------------------------------------
WARNING! The password of QS_ADM is a default password. 
It is well known to hackers

Additional information:
This is a training account. It should not be available 
in a production environment.


Username: QS
Password: CHANGE_ON_INSTALL
Status: EXPIRED &amp;amp; LOCKED
-----------------------------------------------
WARNING! The password of QS is a default password. 
It is well known to hackers

Additional information:
This is a training account. It should not be available 
in a production environment.


Username: QS_WS
Password: CHANGE_ON_INSTALL
Status: EXPIRED &amp;amp; LOCKED
-----------------------------------------------
WARNING! The password of QS_WS is a default password. 
It is well known to hackers

Additional information:
This is a training account. It should not be available 
in a production environment.


SQL&amp;gt; 

&lt;/span&gt;&lt;/pre&gt;</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></item><item><title>Lost Oracle SYS and SYSTEM password ?</title><link>http://freelearningcenter.blogspot.com/2011/12/lost-oracle-sys-and-system-password.html</link><category>tips and tricks</category><author>noreply@blogger.com (Andy)</author><pubDate>Tue, 13 Dec 2011 00:41:00 -0800</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-6316360081793516006.post-7350947555349856856</guid><description>If your administration is as good as anybody's, you are bound to 
lose the not-so-frequently used passwords for the &lt;b&gt;SYS &lt;/b&gt;and &lt;b&gt;SYSTEM &lt;/b&gt;users 
of &lt;b&gt;Oracle&lt;/b&gt;. Here are a few ways I found to reset those passwords:&lt;br /&gt;

&lt;br /&gt;
&lt;strong&gt;Method 1: SQLPLUS (Tested on AIX Oracle 9.2.0.1.0)&lt;/strong&gt;&lt;br /&gt;

Log into the database server as a user belonging to the ‘dba’ (Unix) or 
‘ora_dba’ (Windows) group, typically ‘oracle’, or as an administrator on 
your Windows machine. You can then log into Oracle as the SYS user and&lt;br /&gt;
&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;&lt;br /&gt;
change the &lt;b&gt;SYSTEM &lt;/b&gt;password as follows:&lt;br /&gt;

&lt;pre&gt;&lt;tt&gt;&lt;span style="color: grey;"&gt;$ &lt;/span&gt;&lt;span style="color: blue;"&gt;sqlplus "/ as sysdba"&lt;/span&gt;&lt;/tt&gt;&lt;tt&gt;&lt;span style="color: grey;"&gt;
&lt;/span&gt;&lt;/tt&gt;&lt;tt&gt;&lt;span style="color: grey;"&gt;SQL*Plus: Release 9.2.0.1.0 - Production on Mon Apr 5 15:32:09 2004&lt;/span&gt;&lt;/tt&gt;
&lt;tt&gt;&lt;span style="color: grey;"&gt;
&lt;/span&gt;&lt;/tt&gt;&lt;tt&gt;&lt;span style="color: grey;"&gt;Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.&lt;/span&gt;&lt;/tt&gt;
&lt;tt&gt;&lt;span style="color: grey;"&gt;
&lt;/span&gt;&lt;/tt&gt;&lt;tt&gt;&lt;span style="color: grey;"&gt;Connected to:
Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production
With the OLAP and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production&lt;/span&gt;&lt;/tt&gt;
&lt;tt&gt;&lt;span style="color: grey;"&gt;
&lt;/span&gt;&lt;/tt&gt;&lt;tt&gt;&lt;span style="color: grey;"&gt;SQL&amp;gt; &lt;/span&gt;&lt;span style="color: blue;"&gt;show user&lt;/span&gt;&lt;/tt&gt;
&lt;tt&gt;&lt;span style="color: grey;"&gt;
&lt;/span&gt;&lt;/tt&gt;&lt;tt&gt;&lt;span style="color: grey;"&gt;USER is "SYS"&lt;/span&gt;&lt;/tt&gt;
&lt;tt&gt;&lt;span style="color: grey;"&gt;
&lt;/span&gt;&lt;/tt&gt;&lt;tt&gt;&lt;span style="color: grey;"&gt;SQL&amp;gt; &lt;/span&gt;&lt;span style="color: blue;"&gt;passw system&lt;/span&gt;
Changing password for system
New password:
Retype new password:
Password changed
SQL&amp;gt; &lt;span style="color: blue;"&gt;quit&lt;/span&gt;

&lt;/tt&gt;&lt;/pre&gt;
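Next, we need to change the password of &lt;b&gt;SYS&lt;/b&gt;. Note that &lt;tt&gt;as system&lt;/tt&gt; is not a valid SQL*Plus option (only SYSDBA and SYSOPER are accepted), so the command below fails with SP2-0306 and SQL*Plus prompts for a user name instead. Log in as SYSTEM with the password you just set:&lt;br /&gt;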

&lt;pre&gt;&lt;tt&gt;&lt;span style="color: grey;"&gt;$ &lt;/span&gt;&lt;span style="color: blue;"&gt;sqlplus "/ as system"&lt;/span&gt;&lt;/tt&gt;&lt;tt&gt;&lt;span style="color: grey;"&gt;
&lt;/span&gt;&lt;/tt&gt;&lt;tt&gt;&lt;span style="color: grey;"&gt;SQL*Plus: Release 9.2.0.1.0 - Production on Mon Apr 5 15:36:45 2004&lt;/span&gt;&lt;/tt&gt;
&lt;tt&gt;&lt;span style="color: grey;"&gt;
&lt;/span&gt;&lt;/tt&gt;&lt;tt&gt;&lt;span style="color: grey;"&gt;Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.&lt;/span&gt;&lt;/tt&gt;
&lt;tt&gt;&lt;span style="color: grey;"&gt;
&lt;/span&gt;&lt;/tt&gt;&lt;tt&gt;&lt;span style="color: grey;"&gt;SP2-0306: Invalid option.
Usage: CONN[ECT] [logon] [AS {SYSDBA|SYSOPER}]
where &amp;lt;logon&amp;gt;  ::= &amp;lt;username&amp;gt;[/&amp;lt;password&amp;gt;][@&amp;lt;connect_string&amp;gt;] | /
Enter user-name: system
Enter password:&lt;/span&gt;&lt;/tt&gt;
&lt;tt&gt;&lt;span style="color: grey;"&gt;
&lt;/span&gt;&lt;/tt&gt;&lt;tt&gt;&lt;span style="color: grey;"&gt;Connected to:
Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production
With the OLAP and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production&lt;/span&gt;&lt;/tt&gt;
&lt;tt&gt;&lt;span style="color: grey;"&gt;
&lt;/span&gt;&lt;/tt&gt;&lt;tt&gt;&lt;span style="color: grey;"&gt;SQL&amp;gt; &lt;/span&gt;&lt;span style="color: blue;"&gt;passw sys&lt;/span&gt;
Changing password for sys
New password:
Retype new password:
Password changed
SQL&amp;gt; &lt;span style="color: blue;"&gt;quit&lt;/span&gt;
&lt;/tt&gt;&lt;/pre&gt;
You should now be able to log on as the SYS and SYSTEM users with the passwords you just typed in.&lt;br /&gt;
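&lt;br /&gt;
As a variation on Method 1, both passwords can probably be reset from the single SYSDBA session, without the second login, by using plain &lt;tt&gt;ALTER USER&lt;/tt&gt; statements. This is only a minimal sketch and was not part of the test described above: &lt;tt&gt;new_system_pw&lt;/tt&gt; and &lt;tt&gt;new_sys_pw&lt;/tt&gt; are placeholder passwords, and it assumes that the OS-authenticated &lt;tt&gt;sqlplus "/ as sysdba"&lt;/tt&gt; login works on your server.&lt;br /&gt;
&lt;pre&gt;&lt;tt&gt;-- Run inside one session started with: sqlplus "/ as sysdba"
-- new_system_pw and new_sys_pw are placeholders; choose your own values.
ALTER USER system IDENTIFIED BY new_system_pw;
ALTER USER sys IDENTIFIED BY new_sys_pw;
&lt;/tt&gt;&lt;/pre&gt;
Unlike the &lt;tt&gt;passw&lt;/tt&gt; command, these statements show the new password on screen, so clear your terminal history afterwards if that is a concern.&lt;br /&gt;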

&lt;br /&gt;
&lt;strong&gt;Method 2: Creating pwd file (Tested on Windows Oracle 8.1.7)&lt;/strong&gt;&lt;br /&gt;

&lt;ol&gt;
&lt;li&gt;Stop the Oracle service of the instance whose passwords you want to change.&lt;/li&gt;
&lt;li&gt;Find the &lt;tt&gt;PWD###.ora&lt;/tt&gt; file for this instance, where ### is the SID of your database. It is usually located in &lt;tt&gt;C:\oracle\ora81\database\&lt;/tt&gt;.&lt;/li&gt;
&lt;li&gt;Rename the &lt;tt&gt;PWD###.ora&lt;/tt&gt; file to &lt;tt&gt;PWD###.ora.bak&lt;/tt&gt; so that you can restore it if something goes wrong.&lt;/li&gt;
&lt;li&gt;Create a new password file by issuing the command: &lt;tt&gt;&lt;br /&gt;
orapwd &lt;/tt&gt;&lt;tt&gt;file=C:\oracle\ora81\database\PWD###.ora password=XXXXX&lt;/tt&gt;&lt;br /&gt;
where ### is the SID and XXXXX is the password you would like to use for the SYS and INTERNAL accounts.&lt;/li&gt;
&lt;li&gt;Start the Oracle service for the instance you just fixed. You should
 be able to log in as the SYS user and change other passwords from 
there.&lt;/li&gt;
&lt;/ol&gt;</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></item></channel></rss>