<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" gd:etag="W/&quot;A0INSH4-cSp7ImA9WxNVFE0.&quot;"><id>tag:blogger.com,1999:blog-710383708872670121</id><updated>2009-10-24T11:39:59.059-07:00</updated><title>Visualizing High Dimensional Data</title><subtitle type="html" /><link rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" href="http://jamesxli.blogspot.com/feeds/posts/default" /><link rel="alternate" type="text/html" href="http://jamesxli.blogspot.com/" /><link rel="hub" href="http://pubsubhubbub.appspot.com/" /><link rel="next" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default?start-index=26&amp;max-results=25&amp;redirect=false&amp;v=2" /><author><name>James X. Li</name><uri>http://www.blogger.com/profile/17491271097854752129</uri><email>noreply@blogger.com</email></author><generator version="7.00" uri="http://www.blogger.com">Blogger</generator><openSearch:totalResults>28</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><link rel="self" href="http://feeds.feedburner.com/JamesXLiBlog" type="application/atom+xml" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com" /><entry gd:etag="W/&quot;A0INSHw9fyp7ImA9WxNVFE0.&quot;"><id>tag:blogger.com,1999:blog-710383708872670121.post-2321934728525477511</id><published>2009-10-24T11:15:00.000-07:00</published><updated>2009-10-24T11:39:59.267-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-10-24T11:39:59.267-07:00</app:edited><title>New productivity features in VisuMap 3.0</title><content type="html">I have just posted a short video that shows some new productivity features implemented in VisuMap version 3.0.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.visumap.net/Resources/Demo/ProductivityFeatures.htm"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 220px;" src="http://www.visumap.net/Resources/Demo/ProductivityFeatures.PNG" alt="" id="BLOGGER_PHOTO_ID_5396234976914657842" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/710383708872670121-2321934728525477511?l=jamesxli.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://jamesxli.blogspot.com/feeds/2321934728525477511/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=710383708872670121&amp;postID=2321934728525477511" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/2321934728525477511?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/2321934728525477511?v=2" /><link rel="alternate" type="text/html" href="http://jamesxli.blogspot.com/2009/10/new-productivity-features-in-visumap-30.html" title="New productivity features in VisuMap 3.0" /><author><name>James X. Li</name><uri>http://www.blogger.com/profile/17491271097854752129</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="08887157860508000723" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;AkUFQ3c_fip7ImA9WxNQF0k.&quot;"><id>tag:blogger.com,1999:blog-710383708872670121.post-2860547419568897904</id><published>2009-09-23T16:11:00.001-07:00</published><updated>2009-09-23T16:23:32.946-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-09-23T16:23:32.946-07:00</app:edited><title>A layman's introduction to principal component analysis.</title><content type="html">I have just uploaded a very short introduction for PCA to youtube. The video is probably the shortest introduction to PCA (1.5min). This introduction emphasizes the geometrical aspects, instead of the usual statistical nature. I especially like the statement that the covariance matrix is just a method to measure the average extend of an object (a point cloud) along any axis.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;object height="344" width="425"&gt;&lt;param name="movie" value="http://www.youtube.com/v/BfTMmoDFXyE&amp;amp;hl=en&amp;amp;fs=1&amp;amp;"&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;param name="allowscriptaccess" value="always"&gt;&lt;embed src="http://www.youtube.com/v/BfTMmoDFXyE&amp;amp;hl=en&amp;amp;fs=1&amp;amp;" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" height="344" width="425"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/710383708872670121-2860547419568897904?l=jamesxli.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://jamesxli.blogspot.com/feeds/2860547419568897904/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=710383708872670121&amp;postID=2860547419568897904" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/2860547419568897904?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/2860547419568897904?v=2" /><link rel="alternate" type="text/html" href="http://jamesxli.blogspot.com/2009/09/laymans-introduction-to-principal.html" title="A layman's introduction to principal component analysis." /><author><name>James X. Li</name><uri>http://www.blogger.com/profile/17491271097854752129</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="08887157860508000723" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;AkIDSXsyeip7ImA9WxNRFUw.&quot;"><id>tag:blogger.com,1999:blog-710383708872670121.post-3988197063732338689</id><published>2009-08-31T10:23:00.000-07:00</published><updated>2009-09-09T10:49:38.592-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-09-09T10:49:38.592-07:00</app:edited><title>VisuMap 3.0 Released</title><content type="html">Today we have released VisuMap 3.0. This is a major upgrade from the previous version 2.7.&lt;br /&gt;&lt;br /&gt;The main change in this release is the implementation of the &lt;span style="font-style: italic;"&gt;Atlas&lt;/span&gt; service. An atlas in VisuMap 3.0 is frame-less window  that enables users to organize and compose data views of different types. The following picture shows the snapshot of a sample atlas:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_Wca44kNaYGE/SpwMEys0O0I/AAAAAAAAFbU/wy6eZpTvvFQ/s1600-h/VisuMapAtlas.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 227px;" src="http://2.bp.blogspot.com/_Wca44kNaYGE/SpwMEys0O0I/AAAAAAAAFbU/wy6eZpTvvFQ/s400/VisuMapAtlas.png" alt="" id="BLOGGER_PHOTO_ID_5376185331642284866" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Atlas service in VisuMap is thus similar as dashboard  service in some other visualization centric software applications. It allows users to group visual information according to the subject instead of information type itself. And more importantly for VisuMap users, atlas service enables  users to combine multiple algorithms to generate and compose heterogeneously structured information visualization. For instance, this release includes a script (PartitionAtlas.js) that automatically partitions a dataset in several clusters, and  generates various maps for each cluster.&lt;br /&gt;&lt;br /&gt;Together with this release we have extensive updated the scripting interface. Unfortunately, some old scripts may need minor adjustments to convert to this new release. We  have also extended the data format extensively to accommodate the new service. However, the data format has kept compatible in the sense that old dataset files can be loaded into new version, but datasets of new format can not be read by previous versions.&lt;br /&gt;&lt;br /&gt;Click the following image to watch a short video that shows how to create an atlas interactively:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.visumap.net/Resources/Demo/AtlasDemo.htm"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 664px; height: 249px;" src="http://www.visumap.net/Resources/Demo/AtlasDemo.png" alt="" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.visumap.net/Resources/Demo/PartitionAtlas.htm"&gt;Here&lt;/a&gt; is another video that shows how a script uses kmean algorithm to partition a dataset and generate different data views for the partitions.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.visumap.net/Resources/Demo/AtlasClusters.htm"&gt;This video&lt;/a&gt; shows using pre-configured SOG cluster to create mutiple data views for a dataset.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/710383708872670121-3988197063732338689?l=jamesxli.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://jamesxli.blogspot.com/feeds/3988197063732338689/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=710383708872670121&amp;postID=3988197063732338689" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/3988197063732338689?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/3988197063732338689?v=2" /><link rel="alternate" type="text/html" href="http://jamesxli.blogspot.com/2009/08/visumap-30-released.html" title="VisuMap 3.0 Released" /><author><name>James X. Li</name><uri>http://www.blogger.com/profile/17491271097854752129</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="08887157860508000723" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/_Wca44kNaYGE/SpwMEys0O0I/AAAAAAAAFbU/wy6eZpTvvFQ/s72-c/VisuMapAtlas.png" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;AkUMQX0zcCp7ImA9WxJVFEQ.&quot;"><id>tag:blogger.com,1999:blog-710383708872670121.post-3354639627353773759</id><published>2009-07-01T09:00:00.000-07:00</published><updated>2009-07-01T18:31:20.388-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-07-01T18:31:20.388-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="data clustering" /><category scheme="http://www.blogger.com/atom/ns#" term="self-oranizing map" /><category scheme="http://www.blogger.com/atom/ns#" term="self-organizing graph" /><title>Data Clustering with Self-Organizing Graph</title><content type="html">We have just released VisuMap version 2.7.832. In this release we have added a very interesting clustering service called the&lt;span style="font-style: italic;"&gt; self-organizing graph&lt;/span&gt; (SOG).  SOG is basically an extension to the self-ogranizing map (SOM) that simulates a homogeneous artificial neural network. Unlike SOM, SOG simulates a network of arbitrary structure. The network can be in fact any weighted undirected graph. The network is a kind of parameter for the algorithm: the user defines the network depending on his/her knowledge or interest about the dataset.&lt;br /&gt;&lt;br /&gt;The following picture illustrates an application of SOG. On the left side is a map of a dataset that shows roughly two data point clouds. The right side shows the network we used to classify the dataset. During the learning process each node of the network will become associated with some  data points. After the learning process has finished each data point will be classified as their corresponding nodes in the SOG network.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_Wca44kNaYGE/Sku2NA14vQI/AAAAAAAAFKQ/aL0l7FLlsAA/s1600-h/SogOverview.PNG"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 146px;" src="http://2.bp.blogspot.com/_Wca44kNaYGE/Sku2NA14vQI/AAAAAAAAFKQ/aL0l7FLlsAA/s400/SogOverview.PNG" alt="" id="BLOGGER_PHOTO_ID_5353572916740537602" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;We see that the SOG algorithm not only correctly classified the two data point clouds, but also captured those data points located a the peripheries of the clouds (data points shown as blue triangles).&lt;br /&gt;&lt;br /&gt;The following is a short video tutorial for the SOG clustering service in VisuMap:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.visumap.net/rc2.aspx?f=VideoClips/SogClustering.htm&amp;amp;c=Video"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 252px;" src="http://3.bp.blogspot.com/_Wca44kNaYGE/SkwM4iv-hRI/AAAAAAAAFKY/b63LP_HDZCQ/s400/SogVideo.PNG" alt="" id="BLOGGER_PHOTO_ID_5353668222576919826" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Comparing with traditional clustering algorithms, like k-Mean and hierarchical clustering algorithms, SOG provides more flexible clustering service. With SOG the user can not only specify the number of clusters, but also the size and other geometric or topological relationships between the clusters. In fact, SOG enables the user to specify a model to reflect existing knowledge and interests about the data.&lt;br /&gt;&lt;br /&gt;There are many variants of the self-organizing map that use irregular networks. Those algorithms often employ adaptive strategies to generate the irregular networks depending on the data. Those irregular network are results of those algorithms. Contrary to those algorithms SOG  uses the network as input for the algorithm, the user can express his/her interest and knowledge about the data through the network.&lt;br /&gt;&lt;br /&gt;SOG is our first attempt to provide a new kind of clustering service. While its framework resembles to provide general pattern matching capability. The learning algorithm it employed has only limited capability to capture structural information. In view of this there are still many challenges and potentials for SOG algorithm in the future.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/710383708872670121-3354639627353773759?l=jamesxli.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://jamesxli.blogspot.com/feeds/3354639627353773759/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=710383708872670121&amp;postID=3354639627353773759" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/3354639627353773759?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/3354639627353773759?v=2" /><link rel="alternate" type="text/html" href="http://jamesxli.blogspot.com/2009/07/data-clustering-with-self-organizing.html" title="Data Clustering with Self-Organizing Graph" /><author><name>James X. Li</name><uri>http://www.blogger.com/profile/17491271097854752129</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="08887157860508000723" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/_Wca44kNaYGE/Sku2NA14vQI/AAAAAAAAFKQ/aL0l7FLlsAA/s72-c/SogOverview.PNG" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;CUEMSX8zcCp7ImA9WxVVGUU.&quot;"><id>tag:blogger.com,1999:blog-710383708872670121.post-3089278103497419156</id><published>2009-03-13T15:14:00.001-07:00</published><updated>2009-03-13T15:34:48.188-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-03-13T15:34:48.188-07:00</app:edited><title>Visualizing the real projecitve plane.</title><content type="html">I have just uploaded a video clip to YouTube that shows some ways to explorer the real projective plane with VisuMap. The link to this video is included here:&lt;br /&gt;&lt;br /&gt;&lt;object width="425" height="344"&gt;&lt;param name="movie" value="http://www.youtube.com/v/QQML-6hcqGc&amp;amp;hl=en&amp;amp;fs=1"&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;param name="allowscriptaccess" value="always"&gt;&lt;embed src="http://www.youtube.com/v/QQML-6hcqGc&amp;amp;hl=en&amp;amp;fs=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;br /&gt;&lt;br /&gt;This dataset contains 14,358 data points (4-dimensional vectors) sampled from the surface of RP2 embedded into  R&lt;sup&gt;4&lt;/sup&gt;. The analytical formula is given in the video clip. A smaller sample dataset in VisuMap format can be downloaded at &lt;a href="http://www.visumap.net/rc1.aspx?f=KleinBottle4D.xvmz&amp;amp;g=res"&gt;KleinBottle4D&lt;/a&gt; that also contains the dataset for the Klein-Bottle surface.&lt;br /&gt;&lt;br /&gt;It is interesting to notice that RPM algorithm flattens the 2D sphere by cutting it into two equal sized half; but it flattens the real projective plane in the way that the whole plane is kept in a single connected region.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/710383708872670121-3089278103497419156?l=jamesxli.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://jamesxli.blogspot.com/feeds/3089278103497419156/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=710383708872670121&amp;postID=3089278103497419156" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/3089278103497419156?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/3089278103497419156?v=2" /><link rel="alternate" type="text/html" href="http://jamesxli.blogspot.com/2009/03/visualizing-real-projecitve-plane.html" title="Visualizing the real projecitve plane." /><author><name>James X. Li</name><uri>http://www.blogger.com/profile/17491271097854752129</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="08887157860508000723" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;Ck4FRnwyfCp7ImA9WxNWGUs.&quot;"><id>tag:blogger.com,1999:blog-710383708872670121.post-3138520426595499866</id><published>2009-03-09T10:26:00.000-07:00</published><updated>2009-10-19T07:01:57.294-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-10-19T07:01:57.294-07:00</app:edited><title>Projective and Injective Exploration Methods</title><content type="html">In a recent &lt;a href="http://terrytao.wordpress.com/2009/01/26/245b-notes-6-duality-and-the-hahn-banach-theorem/"&gt;blog post&lt;/a&gt; Terence Tao has put mathematical approaches to study a space X (which can be a space, a manifold etc.) into two fundamental camps:&lt;br /&gt;&lt;br /&gt;1. By looking into a map &lt;span style="font-style: italic;"&gt;f&lt;/span&gt;: Y-&gt;X  that maps certain (better known) space Y into X.&lt;br /&gt;&lt;br /&gt;2. By looking into a map &lt;span style="font-style: italic;"&gt;f&lt;/span&gt;: X-&gt;Y that maps X into certain (better known) space Y.&lt;br /&gt;&lt;br /&gt;I think that these statements not only apply to mathematical studies, they also provide useful framework for the exploration of high dimensional data. In our case, the unknown space X becomes a high dimensional dataset that we wish to understand. The space Y is usually a lower dimensional space that we know better and is normally easy to visualize.&lt;br /&gt;&lt;br /&gt;For the sake of simplicity I would like to refer to methods from the above two camps as &lt;span style="font-style: italic;"&gt;injective &lt;/span&gt;and &lt;span style="font-style: italic;"&gt;projective &lt;/span&gt;methods respectively. That means, injective methods inject known space into the target space, whereas the projective methods project the target space into a better known space.&lt;br /&gt;&lt;br /&gt;Following this point of view, algorithms to visualize high dimensional data can be grouped as injective or projective. Most MDS (&lt;a href="http://www.mathpsyc.uni-bonn.de/doc/delbeke/delbeke.htm"&gt;multidiemsnional scaling&lt;/a&gt;) based visualization algorithms are projective. Algorithm like Sammon map and PCA (principal component analysis) map the high dimensional dataset into a low dimensional Euclidean space. The RPM algorithm (&lt;a href="http://www.visumap.net/index.aspx?p=Resources/RpmOverview"&gt;relational perspective map&lt;/a&gt;) maps high dimensional to flat torus.&lt;br /&gt;&lt;br /&gt;On the other side, most clustering algorithms can be considered as injective algorithm. For instance, the method of vector quantization maps a finite discrete set into the target space, whereas the Kohonen net (or Self-organizing map) maps a discrete set with a mesh topology into the target space. The hierarchical clustering algorithms (like the agglomerative clustering algorithm) basically map binary tree structures into the target space.&lt;br /&gt;&lt;br /&gt;One major benefit of this classification of algorithms is that it can guide us to apply existing algorithms to new data, in the sense that algorithms from the same camp might be able to share some methods/concepts/tricks. For instance, we might be apply an optimization method used by one clustering algorithm to another injective algorithm.&lt;br /&gt;&lt;br /&gt;Furthermore, as be pointed out in Tao's blog post, there is certain "duality" between injective and projective methods. That means, for a given method in one camp you can sometimes find its corresponding counter part in the other camp. This duality can then help us to transfer a method from camp to anther one. As an example for this "duality", the RPM algorithm is basically a dual algorithm of the self-organizing map: they are in different camp, but are dual to each there as they share the flat torus as the image space (i.e. the space Y mentioned at the beginning of this post).&lt;br /&gt;&lt;br /&gt;Similar to the duality concept I have&lt;a href="http://jamesxli.blogspot.com/2007/10/symmetry-between-dimensionality.html"&gt; blogged before&lt;/a&gt; about the "symmetry" between these two camps in more restricted situations, namely when the datasets are multidimensional vector spaces (e.g. data tables). In these cases, we can often transfer a method from one camp to the another one in a straightforward way (e.g. by transposing the data matrix).&lt;br /&gt;&lt;br /&gt;The injective and projective algorithm are realized in VisuMap by clustering and mapping services, which form the two main groups of services in the VisuMap application.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/710383708872670121-3138520426595499866?l=jamesxli.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://jamesxli.blogspot.com/feeds/3138520426595499866/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=710383708872670121&amp;postID=3138520426595499866" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/3138520426595499866?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/3138520426595499866?v=2" /><link rel="alternate" type="text/html" href="http://jamesxli.blogspot.com/2009/03/projective-and-injective-exploration.html" title="Projective and Injective Exploration Methods" /><author><name>James X. Li</name><uri>http://www.blogger.com/profile/17491271097854752129</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="08887157860508000723" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;CEEFQ344fCp7ImA9WxVVEkU.&quot;"><id>tag:blogger.com,1999:blog-710383708872670121.post-147110495038344333</id><published>2009-03-05T09:09:00.000-08:00</published><updated>2009-03-05T11:50:12.034-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-03-05T11:50:12.034-08:00</app:edited><title>Dimension analysis on mulitdimensional scaling</title><content type="html">This post brings forward an inconsistency problem in the study of multidimensional scaling from the perspective of dimensional analysis. This post is rather of philosophical nature. Nevertheless, I find it is worthwhile being aware of this issue&lt;br /&gt;for further theoretical investigations.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://en.wikipedia.org/wiki/Dimensional_analysis"&gt;Dimensional analysis&lt;/a&gt; is a simple and useful rule to check the consistence of a mathematical formula or equation. Roughly speaking, this rule demands that both sides of an equation (that has any physical or chemical bearings) must have the same dimension, or unit of measurement. So for instance the equation E = mC&lt;sup&gt;2&lt;/sup&gt; is consistent with the dimensional analysis, since both sides of the equation have &lt;span style="font-style: italic;"&gt;Energy&lt;/span&gt; as dimension. The equation E=mC is not consistent, since the right side would have &lt;span style="font-style: italic;"&gt;momentum&lt;/span&gt; as dimension.&lt;br /&gt;&lt;br /&gt;Similarly, if a formula is composed of multiple terms by addition, then all terms must have the same dimension. So for instance, the formula: S:=π·r&lt;sup&gt;2&lt;/sup&gt; + r&lt;sup&gt;2&lt;/sup&gt;/2 makes sense, but not S:=π·r&lt;sup&gt;2&lt;/sup&gt; + r/2.&lt;br /&gt;&lt;br /&gt;Now, let's turn to a problem of &lt;a href="http://www.mathpsyc.uni-bonn.de/doc/delbeke/delbeke.htm"&gt;multidimensional scaling&lt;/a&gt; (MDS) from the perspective of the dimensional analysis. Roughly speaking, MDS maps data points from an abstract space with a distance function to a low dimensional, normally, Euclidean space like R&lt;sup&gt;2&lt;/sup&gt;. The goal of a MDS algorithm is to construct the map in a way, so that the distance in the low dimensional space approximates the distance in the original abstract space.&lt;br /&gt;&lt;br /&gt;More formally, let D&lt;sub&gt;&lt;small&gt;ij&lt;/small&gt;&lt;/sub&gt; be the distance between two data points in the original space, d&lt;sub&gt;&lt;small&gt;ij&lt;/small&gt;&lt;/sub&gt; be the distance low dimensional space after MDS mapping; Then a MDS algorithm tries to minimize the following quantity (often called &lt;span style="font-style: italic;"&gt;stress&lt;/span&gt;):&lt;br /&gt;&lt;br /&gt;S := Σ ( D&lt;sub&gt;&lt;small&gt;ij&lt;/small&gt;&lt;/sub&gt; - d&lt;sub&gt;&lt;small&gt;ij&lt;/small&gt;&lt;/sub&gt;)&lt;sup&gt;2&lt;/sup&gt;&lt;br /&gt;&lt;br /&gt;Now, from the view point of dimensional analysis the stress function in its above form is &lt;span style="font-style: italic;"&gt;not &lt;/span&gt;consistent, since D&lt;sub&gt;&lt;small&gt;ij&lt;/small&gt;&lt;/sub&gt; and d&lt;sub&gt;&lt;small&gt;ij&lt;/small&gt;&lt;/sub&gt; have arguably different dimensions as they measure distance in different spaces.&lt;br /&gt;&lt;br /&gt;One work-around to solve this inconsistency issue is to add a special constant or apply a monotonous function on d&lt;sub&gt;&lt;small&gt;ij&lt;/small&gt;&lt;/sub&gt; like:&lt;br /&gt;&lt;br /&gt;S := Σ ( D&lt;sub&gt;&lt;small&gt;ij&lt;/small&gt;&lt;/sub&gt; - &lt;span style="font-style: italic;"&gt;C&lt;/span&gt;·d&lt;sub&gt;&lt;small&gt;ij&lt;/small&gt;&lt;/sub&gt;)&lt;sup&gt;2&lt;/sup&gt;   &amp;nbsp; &amp;nbsp;         or   &amp;nbsp; &amp;nbsp;        S := Σ ( D&lt;sub&gt;&lt;small&gt;ij&lt;/small&gt;&lt;/sub&gt; - &lt;span style="font-style: italic;"&gt;f&lt;/span&gt;(d&lt;sub&gt;&lt;small&gt;ij&lt;/small&gt;&lt;/sub&gt;))&lt;sup&gt;2&lt;/sup&gt;&lt;br /&gt;&lt;br /&gt;The constant &lt;span style="font-style: italic;"&gt;C&lt;/span&gt; or the function &lt;span style="font-style: italic;"&gt;f&lt;/span&gt;  should magically convert the dimension of d&lt;sub&gt;&lt;small&gt;ij&lt;/small&gt;&lt;/sub&gt; to that of D&lt;sub&gt;&lt;small&gt;ij&lt;/small&gt;&lt;/sub&gt;. This kind of ad-hoc techniques look pretty artificial. Most scientists probably won't like this method.&lt;br /&gt;&lt;br /&gt;It should be noticed that the RPM (relational perspecitve map) algorithm avoided this inconsistency by using the following stress function:&lt;br /&gt;&lt;br /&gt;S := Σ D&lt;sub&gt;&lt;small&gt;ij&lt;/small&gt;&lt;/sub&gt;/d&lt;sub&gt;&lt;small&gt;ij&lt;/small&gt;&lt;/sub&gt;&lt;br /&gt;&lt;br /&gt;The dimensional analysis rule does not prohibit composing terms of different dimensions by multiplication. In above formula, both sides of the equation have a completely new dimension, that is the dimension of  D&lt;sub&gt;&lt;small&gt;ij&lt;/small&gt;&lt;/sub&gt; divided by the dimension of d&lt;sub&gt;&lt;small&gt;ij&lt;/small&gt;&lt;/sub&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/710383708872670121-147110495038344333?l=jamesxli.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://jamesxli.blogspot.com/feeds/147110495038344333/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=710383708872670121&amp;postID=147110495038344333" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/147110495038344333?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/147110495038344333?v=2" /><link rel="alternate" type="text/html" href="http://jamesxli.blogspot.com/2009/03/dimension-analysis-on-mulitdimensional.html" title="Dimension analysis on mulitdimensional scaling" /><author><name>James X. Li</name><uri>http://www.blogger.com/profile/17491271097854752129</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="08887157860508000723" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;CUABR3c4cCp7ImA9WxVQEEw.&quot;"><id>tag:blogger.com,1999:blog-710383708872670121.post-4187922184435155092</id><published>2009-01-15T15:04:00.000-08:00</published><updated>2009-01-26T15:49:16.938-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-01-26T15:49:16.938-08:00</app:edited><title>Experiements with kernel PCA</title><content type="html">The newly released VisuMap software version 2.7 has added the kernel PCA (kPCA) service. From the user's perspective kPCA basically provides the same service as MDS methods, namely maping an abstract set of data points with a similarity distance to a multidimensional vector space, so that  the Euclidean distance in the dimensional space somehow reflect the original similarity distance. Thus, the input for kPCA is a dataset with a similarity distance (or matrix), the output is a table of numerical values with one row vector for each data point.&lt;br /&gt;&lt;br /&gt;As there are many effective analysis methods operating on dimensional vector space (like k-mean clustering, support vector machine), kPCA enables us, like MDS, to apply effective methods to a broader range of data.&lt;br /&gt;&lt;br /&gt;In order see how kPCA works, I created a dataset with about 3000 data points which form together a sphere in the 3D space.  With the Gaussian kernel (a way to measure similarity between data points) kPCA of VisuMap created a 50 dimensional datasets within about 100 seconds. That means, kPCA mapped a 3 D dataset into the 50 D space. The first 3 dimensions in the 50 D space basically mirror the original 3 dimensions.  In order to see how other dimensions look like, I fixed the x- and y-axises of a 3D view window to the first two dimensions, then  assign the z-axis to other dimensions one after the other while rotating the 3D view window.  Then following small video clip shows how the spherical dataset looks like with those extra dimensions:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;object width="320" height="266" class="BLOG_video_class" id="BLOG_video-c65880e670073a4c" classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"&gt;&lt;param name="movie" value="http://www.blogger.com/img/videoplayer.swf?videoUrl=http%3A%2F%2Fvp.video.google.com%2Fvideodownload%3Fversion%3D0%26secureurl%3DqAAAAHZQAKfu6jF-JfdYz_38VlhRIFTcND3iXgNxepItZoBmgfDRper5HhICTQqY89t2CAWRdH88cTuIwprOhaga3tJ7EtY4Yk-ucpqTVsGvpC9gjxaOOo7u9V1KFdmYOe2jh8H5FrvnXrnmSHPSCsVTjy3RlEKG6caY2xRqU6xwbcFHtytsRnZ0yr8i_9w1YtNjGSbSLArOalQqZOQOQQ4ijG0WNunIpKMswW1NqE0euQjR%26sigh%3D7V56YiB-amQuMFN60Ydw7A8PTz8%26begin%3D0%26len%3D86400000%26docid%3D0&amp;amp;nogvlm=1&amp;amp;thumbnailUrl=http%3A%2F%2Fvideo.google.com%2FThumbnailServer2%3Fapp%3Dblogger%26contentid%3Dc65880e670073a4c%26offsetms%3D5000%26itag%3Dw320%26sigh%3Dgt3Bi3BkNWQR--7ZpqFf5cWuJI0&amp;amp;messagesUrl=video.google.com%2FFlashUiStrings.xlb%3Fframe%3Dflashstrings%26hl%3Den"&gt;
&lt;param name="bgcolor" value="#FFFFFF"&gt;
&lt;embed width="320" height="266" src="http://www.blogger.com/img/videoplayer.swf?videoUrl=http%3A%2F%2Fvp.video.google.com%2Fvideodownload%3Fversion%3D0%26secureurl%3DqAAAAHZQAKfu6jF-JfdYz_38VlhRIFTcND3iXgNxepItZoBmgfDRper5HhICTQqY89t2CAWRdH88cTuIwprOhaga3tJ7EtY4Yk-ucpqTVsGvpC9gjxaOOo7u9V1KFdmYOe2jh8H5FrvnXrnmSHPSCsVTjy3RlEKG6caY2xRqU6xwbcFHtytsRnZ0yr8i_9w1YtNjGSbSLArOalQqZOQOQQ4ijG0WNunIpKMswW1NqE0euQjR%26sigh%3D7V56YiB-amQuMFN60Ydw7A8PTz8%26begin%3D0%26len%3D86400000%26docid%3D0&amp;amp;nogvlm=1&amp;amp;thumbnailUrl=http%3A%2F%2Fvideo.google.com%2FThumbnailServer2%3Fapp%3Dblogger%26contentid%3Dc65880e670073a4c%26offsetms%3D5000%26itag%3Dw320%26sigh%3Dgt3Bi3BkNWQR--7ZpqFf5cWuJI0&amp;amp;messagesUrl=video.google.com%2FFlashUiStrings.xlb%3Fframe%3Dflashstrings%26hl%3Den" type="application/x-shockwave-flash"&gt;&lt;/embed&gt;&lt;/object&gt;
&lt;br /&gt;&lt;br /&gt;The first few seconds of the video shows how the first 3 dimension re-constructs the original sphere. Then, each time when the z-axis switched to another dimension, the sphere turned in to another geometrical shape. I am not sure how those geometrical shape get formed and how to take advantage of those new dimensions. But, those dimensions surely look interesting and a little mysterious.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/710383708872670121-4187922184435155092?l=jamesxli.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="enclosure" type="video/mp4" href="http://www.blogger.com/video-play.mp4?contentId=c65880e670073a4c&amp;type=video%2Fmp4" length="0" /><link rel="replies" type="application/atom+xml" href="http://jamesxli.blogspot.com/feeds/4187922184435155092/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=710383708872670121&amp;postID=4187922184435155092" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/4187922184435155092?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/4187922184435155092?v=2" /><link rel="alternate" type="text/html" href="http://jamesxli.blogspot.com/2009/01/experiements-with-kernel-pca.html" title="Experiements with kernel PCA" /><author><name>James X. Li</name><uri>http://www.blogger.com/profile/17491271097854752129</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="08887157860508000723" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;A0ADQ307cSp7ImA9WxRVF0s.&quot;"><id>tag:blogger.com,1999:blog-710383708872670121.post-111521978468497127</id><published>2008-10-30T10:19:00.000-07:00</published><updated>2008-11-15T09:22:52.309-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-11-15T09:22:52.309-08:00</app:edited><title>K-Center  and Affinity Propagation Clustering</title><content type="html">Recent release of VisuMap implements a new clustering algorithm called &lt;a href="http://www.psi.toronto.edu/affinitypropagation/"&gt;Affinity Propagation&lt;/a&gt;  (AP).  AP provides similar service as the metric sampling (MS) algorithm that has been available in VisuMap for a while. In this blog I will compare AP with MS from  practical point of view with concrete samples.&lt;br /&gt;&lt;br /&gt;First, a few words about the implementations of these algorithms. MS is basically a variation of the k-mean algorithm that uses the expectation maximization (EM) method to solve the k-medoid problem. Such algorithms are sometimes called k-center or k-medoid algorithm. Simple direct implementation of k-medoid algorithm often just find bad solutions (i.e. local minimums). People normally has to repeat the algorithm many many times to find an acceptable solution. The MS algorithm implemented in VisuMap uses a stochastic mechanism to avoid or reduce such local minimum problem.&lt;br /&gt;&lt;br /&gt;AP uses message passing mechanism to find representative exemplars within dataset with a similarity structure. AP is basically a deterministic algorithm, it does not require random initialization or randomness during the optimization process. However, AP algorithm often suffers from the oscillation problem in the way that the algorithm wanders around several local minimums for quite long time. To reduce the oscillation, we can use a large dumping factor as suggested in the original paper of AP. But a more efficient way seems to be using small randomness when assigning preferences to the data points. By default, the implementation of AP in VisuMap enables this randomness.&lt;br /&gt;&lt;br /&gt;Thus, both MS and AP in VisuMap are of stochasitc nature. I have applied the two algorithms each 500 times on the dataset &lt;a href="http://www.visumap.net/index.aspx?p=Resources/VisuMapDatasets"&gt;SP500  &lt;/a&gt;(each data point contains the weekly price history of a stock in a year) and got all together 1000 different clustering solutions (each solution is a selection of 25 exemplar points). Using the t-SNE algorithm I created the map of these 1000 solutions as shown in the following picture:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_Wca44kNaYGE/SQo5_CuhJfI/AAAAAAAAEbM/qj0KGyKEYiw/s1600-h/SP500.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 256px;" src="http://2.bp.blogspot.com/_Wca44kNaYGE/SQo5_CuhJfI/AAAAAAAAEbM/qj0KGyKEYiw/s400/SP500.png" alt="" id="BLOGGER_PHOTO_ID_5263082869762369010" border="0" /&gt;&lt;/a&gt;In above map each red/green dot represents a solution found by AP/MS algorithm respectively. As a special characteristic of t-SNE algorithm, identical solutions will be arranged as one more circles. That means, dots in a rectangle in above map represent acutally identical solutions.&lt;br /&gt;&lt;br /&gt;We notice first that the two algorithms found very different solutions as red and green dots are clearly separated from each other. There are only two small regions (marked by ovals) where the two algorithms found similar solutions. Secondly, we notice that MS found a large number of different solutions.  AP found just few different solutions. This means that the AP optimization process often converged to common solutions.&lt;br /&gt;&lt;br /&gt;The following table shows the statistics about the solutions found by AP and MS:&lt;center&gt;&lt;table style="border-width: 1px; width: 400px;" border="1" cellpadding="10" cellspacing="0"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="width: 140px;"&gt;&lt;/td&gt;&lt;td&gt;Metric&lt;br /&gt;Sampling&lt;/td&gt;&lt;td&gt;Affinity&lt;br /&gt;Propagation&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Min Error&lt;/td&gt;&lt;td&gt;0.4972&lt;/td&gt;&lt;td&gt;0.4901&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;Max Error&lt;/td&gt;&lt;td&gt;0.5058&lt;/td&gt;&lt;td&gt;0.5035&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;Average Error&lt;/td&gt;&lt;td&gt;0.5004&lt;/td&gt;&lt;td&gt;0.4960&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;Process Time&lt;br /&gt;in Seconds&lt;/td&gt;&lt;td&gt;5&lt;/td&gt;&lt;td&gt;4&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/center&gt;&lt;br /&gt;&lt;br /&gt;In the above table the error of a solution is the average distance of the data points to their cluster center (i.e. exemplars). The average error is calculated over the 500 solutions found by each of the two algorithms. We can see that AP has found, on the average, better solutions than MS, but the improvement is only about 0.4%. So, it is fair to say that these two algorithms performed similarly with respect to this dataset.&lt;br /&gt;&lt;br /&gt;The situation changed a lot when I applied the two algorithms to a different dataset, namely the dataset &lt;a href="http://jamesxli.blogspot.com/2008/09/feature-selection-by-clustering.html"&gt;PharMine171 &lt;/a&gt; discussed in previous blog entry. This dataset contains about 1000 binary 171-dimensional vectors , whereas the SP500 dataset contains 500 40-dimensional vectors with real values. The following is the map of the clustering solutions and the corresponding statistics.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_Wca44kNaYGE/SQpI9JfnlWI/AAAAAAAAEbU/nLOK0ewA5bU/s1600-h/Pharm171.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 301px;" src="http://4.bp.blogspot.com/_Wca44kNaYGE/SQpI9JfnlWI/AAAAAAAAEbU/nLOK0ewA5bU/s400/Pharm171.png" alt="" id="BLOGGER_PHOTO_ID_5263099329893602658" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;center&gt;&lt;table style="border-width: 1px; width: 400px;" border="1" cellpadding="10" cellspacing="0"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="width: 140px;"&gt;&lt;/td&gt;&lt;td&gt;Metric&lt;br /&gt;Sampling&lt;/td&gt;&lt;td&gt;Affinity&lt;br /&gt;Propagation&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Min Error&lt;/td&gt;&lt;td&gt;2.1775&lt;/td&gt;&lt;td&gt;2.1101&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;Max Error&lt;/td&gt;&lt;td&gt;2.2321&lt;/td&gt;&lt;td&gt;2.1686&lt;br /&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Average Error&lt;/td&gt;&lt;td&gt;2.1972&lt;/td&gt;&lt;td&gt;2.1603&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;Process Time&lt;br /&gt;in Seconds&lt;br /&gt;&lt;/td&gt;&lt;td&gt;6&lt;/td&gt;&lt;td&gt;30&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/center&gt;&lt;br /&gt;From above map we can see that AP and MS still produced different set of solutions. AP also produced more different solutions than with the previous dataset. This means that, with respect to this dataset, AP is more sensitive to small variations/noise in the input data.  This phenomenon might be explained as following: Because this dataset contains binary vectors, many similarities between data points might have exact equal values. Those equivalence of similarity will be broken different by small variations in the input data. Since AP indeed uses operation like max() to distinguish even infinitesimal differences, small variations may lead to significantly different solutions.&lt;br /&gt;&lt;br /&gt;We also notice that AP is substantially slower than MS. For larger datasets, the performance difference will be even larger. For instance, for a dataset with about 4000 data points, MS is more than 100 times faster than AP. It should be pointed that both implementations of AP and MS uses the full similarity matrix. It remains to see how much speedup can be achieved by using sparse matrix.&lt;br /&gt;&lt;br /&gt;More interestingly we see that the solutions found by AP is about 3.7% better than those found by MS; and even the worst solution found by AP is better than the best solution found by MS.&lt;br /&gt;&lt;br /&gt;Now, one would ask the question: Does AP always find better solutions than MS? Unfortunately, this is not the case. The following picture shows the 3D presentation of the &lt;a href="http://www.visumap.net/index.aspx?p=Resources/VisuMapDatasets"&gt;spheroid &lt;/a&gt;dataset that contains 1825 data points forming together a spheroid:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_Wca44kNaYGE/SQthz4_YK3I/AAAAAAAAEbc/g7_Z6cfPt_M/s1600-h/Sphorid.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 122px;" src="http://1.bp.blogspot.com/_Wca44kNaYGE/SQthz4_YK3I/AAAAAAAAEbc/g7_Z6cfPt_M/s400/Sphorid.png" alt="" id="BLOGGER_PHOTO_ID_5263408133611989874" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;As be shown in the following test statistics, solutions found by MS is about 3.7% better than those found by AP:&lt;center&gt;&lt;table style="border-width: 1px; width: 400px;" border="1" cellpadding="10" cellspacing="0"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="width: 140px;"&gt;&lt;/td&gt;&lt;td&gt;Metric&lt;br /&gt;Sampling&lt;/td&gt;&lt;td&gt;Affinity&lt;br /&gt;Propagation&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Min Error&lt;/td&gt;&lt;td&gt;5.9029&lt;/td&gt;&lt;td&gt;5.9029&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;Max Error&lt;/td&gt;&lt;td&gt;6.4034&lt;/td&gt;&lt;td&gt;6.6483&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;Average Error&lt;/td&gt;&lt;td&gt;6.0948&lt;/td&gt;&lt;td&gt;6.3199&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;Process Time&lt;br /&gt;in Seconds&lt;/td&gt;&lt;td&gt;9&lt;/td&gt;&lt;td&gt;284&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;&lt;/center&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Conclusions: &lt;/span&gt;&lt;br /&gt;I believe AP is a useful and promising addition to the arsenal of clustering algorithms. Not only that AP can find better solutions, it also finds different solutions as existing algorithms. As we know that there are many different clustering problems. AP doesn't seem to be designed to directly attack the k-medoid problem, but nevertheless it is competitive to those algorithms directly targeting the k-medoid problem. This indicates that AP may have potential to attack a much larger class of clustering problems.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/710383708872670121-111521978468497127?l=jamesxli.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://jamesxli.blogspot.com/feeds/111521978468497127/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=710383708872670121&amp;postID=111521978468497127" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/111521978468497127?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/111521978468497127?v=2" /><link rel="alternate" type="text/html" href="http://jamesxli.blogspot.com/2008/10/comparing-two-clustering-algorithms.html" title="K-Center  and Affinity Propagation Clustering" /><author><name>James X. Li</name><uri>http://www.blogger.com/profile/17491271097854752129</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="08887157860508000723" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/_Wca44kNaYGE/SQo5_CuhJfI/AAAAAAAAEbM/qj0KGyKEYiw/s72-c/SP500.png" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;DEAEQn8-fyp7ImA9WxRXFk8.&quot;"><id>tag:blogger.com,1999:blog-710383708872670121.post-985997992205248012</id><published>2008-10-21T12:39:00.000-07:00</published><updated>2008-10-21T15:05:03.157-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-10-21T15:05:03.157-07:00</app:edited><title>Clustering clusters</title><content type="html">Many clustering algorithms, like k-mean algorithm, are of stochastic nature in the sense that they produce different partitions (of a data set) depending on the random initialization.  In this blog I am going to use the mapping service provided by VisuMap to investigate the partitions generated by a particular clustering algorithm. In other word, I want to visualize the clusters among partitions generated by a clustering algorithm.&lt;br /&gt;&lt;br /&gt;The clustering algorithm I am going to look at is the &lt;span style="font-style: italic;"&gt;metrical sampling&lt;/span&gt; (MS) clustering algorithm which is basically a k-mean algorithm with some randomization extension. For a given set of data points with a distance matrix,  MS algorithm tries to find a set of sample points, so that minimizes the &lt;span style="font-style: italic;"&gt;average error&lt;/span&gt;, that is average distance of the data points to their nearest sample point (also called &lt;span style="font-style: italic;"&gt;exemplar&lt;/span&gt;).  Thus, MS algorithm is one of many available algorithms to solve the so called &lt;span style="font-style: italic;"&gt;k-medoid clustering&lt;/span&gt; problem (which is an NP complete problem according to certain literature).&lt;br /&gt;&lt;br /&gt;For a dataset of N data points, MS algorithm produces a N dimensional binary vectors. Each component is a binary value that decides whether a data point is selected as a sample point. So, when we run the MS algorithm many times with different random initializations, we will get a list of N dimensional binary vectors.  I am interested to see clusters or patterns among those binary vectors. For this study, I picked up the dataset &lt;a href="http://jamesxli.blogspot.com/2008/09/feature-selection-by-clustering.html"&gt;PharMine171&lt;/a&gt; that contains 1025 fingerprints of 171 pharmacologically interesting compounds. I have run MS algorithm on the dataset for 1000 times. The following is a map of the 1000 partitions ( produced with the t-SNE mapping algorithm):&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_Wca44kNaYGE/SP45btNRMUI/AAAAAAAADYI/9kxjSv159nU/s1600-h/PharmMine171.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_Wca44kNaYGE/SP45btNRMUI/AAAAAAAADYI/9kxjSv159nU/s400/PharmMine171.png" alt="" id="BLOGGER_PHOTO_ID_5259704562969817410" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;In above map, each spot represents a partition generated by one run of MS algorithm. Closely located partitions means they share more sample points. The coloring is done manually to highlight the clusters. We can easily see that there are 8 or 9 clusters among the 1000 partitions.&lt;br /&gt;&lt;br /&gt;Now, one question that raises naturally is: Do the partitions in one cluster have similar qualities (i.e. average error)?  In order to answer this questions I opened a bar view window that displays the average error of each partition in an ordered bar view as follows:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_Wca44kNaYGE/SP4-CQfpJFI/AAAAAAAADYQ/bYnRdrmAS-M/s1600-h/ErrorView.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_Wca44kNaYGE/SP4-CQfpJFI/AAAAAAAADYQ/bYnRdrmAS-M/s400/ErrorView.png" alt="" id="BLOGGER_PHOTO_ID_5259709623323665490" border="0" /&gt;&lt;/a&gt;In above picture, each bar (in yellow or red color) represents a partition. The height of the bar indicates their average error. The red bars represent those partitions selected manually on the partition maps. So, when we select a cluster in the partition map, we can easily see the corresponding distribution of average error of that clusters. The following two pictures show two such distributions:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_Wca44kNaYGE/SP5HCg1C5vI/AAAAAAAADYY/5GWdHmlKyI4/s1600-h/ClusterErrors.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://1.bp.blogspot.com/_Wca44kNaYGE/SP5HCg1C5vI/AAAAAAAADYY/5GWdHmlKyI4/s400/ClusterErrors.png" alt="" id="BLOGGER_PHOTO_ID_5259719523313051378" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_Wca44kNaYGE/SP5HcL3foYI/AAAAAAAADYg/jqDDKYa9Lu4/s1600-h/ClusterErrors2.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_Wca44kNaYGE/SP5HcL3foYI/AAAAAAAADYg/jqDDKYa9Lu4/s400/ClusterErrors2.png" alt="" id="BLOGGER_PHOTO_ID_5259719964362776962" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;We notice that the average error in a cluster is spread all over the whole error range (from very bad to very good). This means, similar partitions can have very different qualities. Thus, the answer to above questions is negative.&lt;br /&gt;&lt;br /&gt;More in general, I suppose this study indicates that any algorithm trying to find good solutions for the k-medoid problem need a mechanism to allow global variations in the searching space. The MS algorithm implemented in VisuMap does this through controlled randomness. Another interesting algorithm, named &lt;a href="http://www.psi.toronto.edu/affinitypropagation/"&gt;affinity propagation&lt;/a&gt; seems to do that with help of 2N&lt;sup&gt;2&lt;/sup&gt; addition system states (called messages). Affinity propagation is also supported in VisuMap. I will do some comparison between MS and AP in another blog.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/710383708872670121-985997992205248012?l=jamesxli.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://jamesxli.blogspot.com/feeds/985997992205248012/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=710383708872670121&amp;postID=985997992205248012" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/985997992205248012?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/985997992205248012?v=2" /><link rel="alternate" type="text/html" href="http://jamesxli.blogspot.com/2008/10/clustering-clusters.html" title="Clustering clusters" /><author><name>James X. Li</name><uri>http://www.blogger.com/profile/17491271097854752129</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="08887157860508000723" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/_Wca44kNaYGE/SP45btNRMUI/AAAAAAAADYI/9kxjSv159nU/s72-c/PharmMine171.png" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;DUIDSX49fCp7ImA9WxRSE0k.&quot;"><id>tag:blogger.com,1999:blog-710383708872670121.post-8577764662568854908</id><published>2008-09-13T08:09:00.000-07:00</published><updated>2008-09-13T16:12:58.064-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-09-13T16:12:58.064-07:00</app:edited><title>Feature selection by clustering</title><content type="html">I have&lt;a href="http://jamesxli.blogspot.com/2007/10/symmetry-between-dimensionality.html"&gt; blogged before&lt;/a&gt; that there is a "symmetry" between &lt;span style="font-style: italic;"&gt;data clustering&lt;/span&gt; and &lt;span style="font-style: italic;"&gt;dimensionality reduction&lt;/span&gt;: whereas the former reduces the number of rows; the latter reduces the number of columns in a data table. Thus, we can theoretically convert any clustering algorithm to a dimensionality reduction method by transposing the data table, and vise verse.&lt;br /&gt;&lt;br /&gt;In this blog I am going to demonstrate how to use the &lt;span style="font-style: italic;"&gt;metric sampling&lt;/span&gt; algorithm to reduce dimensionality with a concrete &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_0"&gt;dataset&lt;/span&gt;.  My sample &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_1"&gt;dataset&lt;/span&gt; &lt;a href="http://www.visumap.net/Common/FetchDoc.aspx?f=PharMine171.xvmz&amp;amp;g=res"&gt;(&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_2"&gt;PharMine&lt;/span&gt;171)&lt;/a&gt; contains data for 171 &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_3"&gt;pharmacologically&lt;/span&gt; interesting compounds. Each data point contains 1025 binary vectors which indicate whether the compound contains particular fragments (ie. molecular sub-structurs). Thus, the &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_4"&gt;dataset&lt;/span&gt; is a 171 x 1025 table with binary values and its dimension is 1025.&lt;br /&gt;&lt;br /&gt;In order the reduce the dimensionality we first transpose the table to get a 171 dimensional &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_5"&gt;dataset&lt;/span&gt; with 1025 data points (each data point now represents a fragment). In order to get feeling about the transposed &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_6"&gt;dataset&lt;/span&gt; I have created 2 dimensional map for it (with &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_7"&gt;VisuMap&lt;/span&gt;)  as be shown in the following picture:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_Wca44kNaYGE/SMBRywIMHzI/AAAAAAAADVM/SNaSOnGbzlA/s1600-h/FragmentsMap_tSNE.PNG"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_Wca44kNaYGE/SMBRywIMHzI/AAAAAAAADVM/SNaSOnGbzlA/s400/FragmentsMap_tSNE.PNG" alt="" id="BLOGGER_PHOTO_ID_5242279898613817138" border="0" /&gt;&lt;/a&gt;Map of 1025 binary features&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: left;"&gt;Above map shows clearly some clusters among the 1025 data points. In order to find representative columns in our original &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_8"&gt;dataset&lt;/span&gt;, we need to find representative rows in the new data table. In order to do so, we apply the metric sampling algorithm (available in &lt;a href="http://www.visumap.net/index.aspx?p=Products"&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_9"&gt;VisuMap&lt;/span&gt;&lt;/a&gt;) that selects representative subset of data points. I have configured the algorithm to select 25 data points. The following picture shows how the 25 data  points are distributed among the 1025 data points:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_Wca44kNaYGE/SMvhMszfs0I/AAAAAAAADVs/Y3xaZbJ7kE8/s1600-h/Features25.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_Wca44kNaYGE/SMvhMszfs0I/AAAAAAAADVs/Y3xaZbJ7kE8/s400/Features25.png" alt="" id="BLOGGER_PHOTO_ID_5245533799304901442" border="0" /&gt;&lt;/a&gt;We can then export the 25 data points as a 25 x 171 table, and transpose the table to get an 171 x 25 data table. This table is then our new &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_10"&gt;dataset&lt;/span&gt; with reduced dimension.&lt;br /&gt;&lt;br /&gt;Now, how can we tell that the reduced &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_12"&gt;dataset&lt;/span&gt; approaches the original &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_13"&gt;dataset&lt;/span&gt;? We can do this pretty easily with &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_14"&gt;VisuMap&lt;/span&gt; by creating maps for both &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_15"&gt;datasets&lt;/span&gt;. If the two &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_16"&gt;datasets&lt;/span&gt; contain about the same information, their maps should be visually similar to each other.  The following picture shows the map of the original &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_17"&gt;dataset&lt;/span&gt;  (on the left side) and of the reduced &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_18"&gt;dataset&lt;/span&gt; (on the right side). They are both created with the &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_19"&gt;SMACOF&lt;/span&gt; mapping method. We see that the map of the reduced &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_20"&gt;dataset&lt;/span&gt; reveals clearly the three clusters as the original &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_21"&gt;dataset&lt;/span&gt;, but it shows less details within the clusters.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_Wca44kNaYGE/SMwy9oemeSI/AAAAAAAADV0/88i8eXrAkZQ/s1600-h/MapTwoMaps.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_Wca44kNaYGE/SMwy9oemeSI/AAAAAAAADV0/88i8eXrAkZQ/s400/MapTwoMaps.png" alt="" id="BLOGGER_PHOTO_ID_5245623700398962978" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Notice that the &lt;span style="font-style: italic;"&gt;metric sampling&lt;/span&gt; algorithm implemented in &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_22"&gt;VisuMap&lt;/span&gt; is a variation of the widely known k-&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_23"&gt;medoid&lt;/span&gt; algorithm. The recent release of &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_24"&gt;VisuMap&lt;/span&gt; (version 2.6.807) utilized  a randomization strategy in the optimization process that made the algorithm much more robust and less prone to local minimum problem; which has been a big issue for the k-&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_25"&gt;medoid&lt;/span&gt; algorithm.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/710383708872670121-8577764662568854908?l=jamesxli.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://jamesxli.blogspot.com/feeds/8577764662568854908/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=710383708872670121&amp;postID=8577764662568854908" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/8577764662568854908?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/8577764662568854908?v=2" /><link rel="alternate" type="text/html" href="http://jamesxli.blogspot.com/2008/09/feature-selection-by-clustering.html" title="Feature selection by clustering" /><author><name>James X. Li</name><uri>http://www.blogger.com/profile/17491271097854752129</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="08887157860508000723" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/_Wca44kNaYGE/SMBRywIMHzI/AAAAAAAADVM/SNaSOnGbzlA/s72-c/FragmentsMap_tSNE.PNG" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;CkQBSX8yeSp7ImA9WxRSEU0.&quot;"><id>tag:blogger.com,1999:blog-710383708872670121.post-1307854311663525506</id><published>2008-09-04T11:05:00.000-07:00</published><updated>2008-09-10T19:32:38.191-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-09-10T19:32:38.191-07:00</app:edited><title>On similarity metrics for chemical compounds</title><content type="html">Recently, Yap Chun Wei has posted a &lt;a href="http://voyagememoirs.com/pharmine/2008/08/29/visumap-part-2/"&gt;dataset &lt;/a&gt;on the &lt;a href="http://voyagememoirs.com/pharmine/"&gt;pharmine&lt;/a&gt; blog. The dataset consists of fingerprints of 171 pharmacologically interesting compounds. Just to recapture,  the fingerprint of a compound is here a vector of 1025 binary flags, each flag in the vector indicates whether a particular molecular fragment is present in the compound. There are many ways to calculate fingerprints. Depending on the nature of the problem, you can use different algorithms or different collection of fragments. The mentioned dataset, for instance, used &lt;a href="http://openbabel.org/wiki/Main_Page"&gt;OpenBabel &lt;/a&gt;to calculate those fingerprints.&lt;br /&gt;&lt;br /&gt;The dataset constains 3 different groups of compounds:  penicillins, cephalosporins, fluoroquinolones. Using VisuMap Yap created different maps of these 171 compounds which showed, more or less, the cluster structure of the dataset. I personally find that the PCA map provides the best visualization. The following picture is a PCA map I created with VisuMap for this dataset:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_Wca44kNaYGE/SMBEGqaAh5I/AAAAAAAADVE/rY-IvzzBMzo/s1600-h/PcaWithFrequency.PNG"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://1.bp.blogspot.com/_Wca44kNaYGE/SMBEGqaAh5I/AAAAAAAADVE/rY-IvzzBMzo/s400/PcaWithFrequency.PNG" alt="" id="BLOGGER_PHOTO_ID_5242264847512536978" border="0" /&gt;&lt;/a&gt;Compound Map of 171 compounds&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;In above map, the 3 compound groups are displayed as glyphs in 3 different colors. The coloring are done manually with VisuMap based on known information about these groups. Although, for this dataset, you can get almost exactly these 3 clusters using the k-mean algorithm provided by VisuMap. The bar diagram in the picture shows the presence frequency of the 1025 fragments among the 171 selected compounds. That means, a higher bar indicates that a particular fragment is present in more compounds.&lt;br /&gt;&lt;br /&gt;The above map visualizes the similarity information between the 171 compounds. That means, closely located compounds will have similar fragment collections and therefore similar pharmacological  properties. The similarity information are basically encoded in the fingerprints. Thus, the method to calculate of those fingerprints is naturally critical for this kind of data analysis.&lt;br /&gt;&lt;br /&gt;In order to better understand those fingerprints we can created a map of those 1025 features with VisuMap.  In order to do so, we simply transpose the binary data table (via the menu Edit&gt;Filter Data&gt;Transpose Table in VisuMap), so that each binary feature becomes a new data point; and each compound become a feature in the transposed dataset. We can then pick a mapping algorithm and metrics to create a feature map. The following picture shows such a map created with the t-SNE algorithm and the tanimoto dissimilarity metric:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_Wca44kNaYGE/SMBRywIMHzI/AAAAAAAADVM/SNaSOnGbzlA/s1600-h/FragmentsMap_tSNE.PNG"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_Wca44kNaYGE/SMBRywIMHzI/AAAAAAAADVM/SNaSOnGbzlA/s400/FragmentsMap_tSNE.PNG" alt="" id="BLOGGER_PHOTO_ID_5242279898613817138" border="0" /&gt;&lt;/a&gt;Feature map of 1025 binary features&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;Above picture shows 4 or 5 clear clusters on the left side represented by colored glyphs. The rest are more or less randomly distributed. It turned out that those yellow-square features are those fragments which are NOT present in any of those 171 compounds (all bits are zero). Therefore, they carry no direct information about our compounds. Interestingly, these zero vectors form together a homogeneous cluster in the map.&lt;br /&gt;&lt;br /&gt;Other clusters in above map represent groups of fragments which have high frequency and are informative to distinguish the three compound groups in the original dataset. We can verify this with the help of the bar diagram in VisuMap as follows:&lt;br /&gt;&lt;br /&gt;We first open the feature map in a separate window (via the menu Tools&gt;Map Snapsot). Then open the compound map, and then select all compounds and open the bar view through the context menu "Bar View". The bar view by default displays the frequency in the order as given in the transposed data table. We then sort the bars through the context menu "Sort Values" so that bars are displayed in the order from low frequency to high frequency as depicted in the following two picture:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_Wca44kNaYGE/SMBaE1aKb1I/AAAAAAAADVc/34qF6pRx6YE/s1600-h/SortingBarView.PNG"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_Wca44kNaYGE/SMBaE1aKb1I/AAAAAAAADVc/34qF6pRx6YE/s400/SortingBarView.PNG" alt="" id="BLOGGER_PHOTO_ID_5242289005362048850" border="0" /&gt;&lt;/a&gt;The sorted bar view shows 3  plateaus which correspond to clusters in the feature map and in the compound maps. In order to see the correspondence we select a plateau in the bar view with the mouse, the snapshot window of the feature map will automatically high light those selected features. As we can see in the following picture, the selected plateau of features clearly correspond to a particular cluster in the feature map (the marked cluster at lower left corner).&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_Wca44kNaYGE/SMBUw8tALpI/AAAAAAAADVU/d72IQC_EU70/s1600-h/FragmentsMapWithFreq2.PNG"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_Wca44kNaYGE/SMBUw8tALpI/AAAAAAAADVU/d72IQC_EU70/s400/FragmentsMapWithFreq2.PNG" alt="" id="BLOGGER_PHOTO_ID_5242283166164594322" border="0" /&gt;&lt;/a&gt;Correspondence between high frequency features and feature clusters&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;We notice that some of the clusters in the feature map show some fine sequential structure that may lead to more hints about the internal structure of the fragment collections.&lt;br /&gt;&lt;br /&gt;With the knowledge about informative feature clusters we can, for instance, reduce the number of features significantly without significant loss of information about the clusters in the original dataset. The VisuMap dataset folder &lt;a href="http://www.visumap.net/Common/FetchDoc.aspx?f=PharMine171.xvmz&amp;amp;g=res"&gt;PharMine171.xvmz (zipped XML file)&lt;/a&gt;  includes a reduced dataset with 298 features that characterizes similar similarity information as the original dataset with 1025 features.&lt;br /&gt;&lt;br /&gt;The above VisuMap dataset folder also contains feature maps created with other mapping algorithms. It is interesting to notice that for the feature map dataset, the t-SNE algorithm provides the best visualization, whereas the result of the PCA algorithm is rather disappointing.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/710383708872670121-1307854311663525506?l=jamesxli.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://jamesxli.blogspot.com/feeds/1307854311663525506/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=710383708872670121&amp;postID=1307854311663525506" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/1307854311663525506?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/1307854311663525506?v=2" /><link rel="alternate" type="text/html" href="http://jamesxli.blogspot.com/2008/09/on-similarity-metrics-for-chemical.html" title="On similarity metrics for chemical compounds" /><author><name>James X. Li</name><uri>http://www.blogger.com/profile/17491271097854752129</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="08887157860508000723" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/_Wca44kNaYGE/SMBEGqaAh5I/AAAAAAAADVE/rY-IvzzBMzo/s72-c/PcaWithFrequency.PNG" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;Dk4AR3g-fSp7ImA9WxdbFkU.&quot;"><id>tag:blogger.com,1999:blog-710383708872670121.post-2778740621244928743</id><published>2008-08-13T08:56:00.000-07:00</published><updated>2008-08-13T20:35:46.655-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-08-13T20:35:46.655-07:00</app:edited><title>New mapping algorithm t-SNE</title><content type="html">Recently I spent some time to experiment with the  t-SNE mapping algorithm proposed by L. Maaten and G. Hinton in &lt;a href="http://www.cs.unimaas.nl/l.vandermaaten/Laurens_van_der_Maaten/t-SNE_files/utml.pdf"&gt;Visualizing Data Using t-SNE.&lt;/a&gt; The algorithm is a variation of the &lt;a href="http://www.cs.toronto.edu/%7Eroweis/papers/sne_final.pdf"&gt;SNE&lt;/a&gt; algorithm developed few years ago.&lt;br /&gt;&lt;br /&gt;The t-SNE &lt;a href="http://www.cs.unimaas.nl/l.vandermaaten/Laurens_van_der_Maaten/t-SNE.html"&gt;home site&lt;/a&gt; provides source code, technical papers and many samples, so that it is pretty easy to test the algorithm. The matlab source code is provided as a component in a matlab toolbox that also implements many other popular methods for dimensionality reduction. The web site also provides a binary executable that is optimized for the Intel/Windows platform. It took me a while to figure out its binary data format, but the time spent is worthwhile, as it runs much faster than the matlab version.&lt;br /&gt;&lt;br /&gt;As demonstrated in its technical paper, t-SNE has the nice capability to show the cluster structure of high dimensional datasets in low dimensional maps. More interestingly, I noticed that t-SNE is able to segment dataset with complex structure in a natural way to preserve relevant information from certain perspective. For instance, the left map in the following picture shows a dataset that forms two crossing squares in 3D space, the map on the right side is a 2D map created with t-SNE. We notice that t-SNE segmented the connected shape into 5 clusters. The segmentation occurred roughly along the center line where the two squares crossing each other.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_Wca44kNaYGE/SKNPBu_rpiI/AAAAAAAADUc/J1H1ZpCu3rE/s1600-h/TwoSquaresTSNE.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_Wca44kNaYGE/SKNPBu_rpiI/AAAAAAAADUc/J1H1ZpCu3rE/s400/TwoSquaresTSNE.png" alt="" id="BLOGGER_PHOTO_ID_5234114083148244514" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;In general, I think there are two major challenges for mapping algorithms intended to visualize high dimensional complex data. The first one, I call it the &lt;span style="font-style: italic;"&gt;singularity problem&lt;/span&gt;, is present in the above sample. The dataset is mostly a 2D surface except the center region where the two square cross each other. In order to map this dataset into 2D map in non-overlapping manner, an algorithm has to find a non-trivial way to segment the dataset along the singularity region. t-SNE did a good job here compared to other algorithms like CCA (curvilinear component analysis) which segments the dataset rather randomly.&lt;br /&gt;&lt;br /&gt;The second challenge, I call it the &lt;span style="font-style: italic;"&gt;closedness problem&lt;/span&gt;, occurs when the dataset forms certain kind of closed manifold in high dimensional space, like sphere, torus (e.g. surfaces of 3D objects). In this case, a mapping algorithm needs to have a mechanism to cut-off the closed structure underlying the dataset. The t-SNE algorithm behaviors also pretty well in addressing  this kind of problems. The following video clip (produced with &lt;a href="http://www.visumap.net/index.aspx?p=Products"&gt;VisuMap 2.6.800&lt;/a&gt;) shows how t-SNE gradually deforms two overlapping spheres to a 2D map.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;object width="320" height="266" class="BLOG_video_class" id="BLOG_video-34311eaa011321ab" classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"&gt;&lt;param name="movie" value="http://www.blogger.com/img/videoplayer.swf?videoUrl=http%3A%2F%2Fvp.video.google.com%2Fvideodownload%3Fversion%3D0%26secureurl%3DqAAAAKXn9zyzXTyW6NoE_4ojujoMAywBlu-_yp7AFFOVjSxrsIgE0oc_jXBBV722Y-FO70L3B2ZLXLotbEc_61c4UmdZ-PZOimRKpshXIDLZIy2oaRyC9AT0YHrn3ARs-5pLaLLV_Og0TnckGNvs7fR99Cotgv_UBQUNw_CfbGSwZFMGYPI1iFttK5V1J28ykdzBsWRmObMILHgxzo5Zdz6tH9AOfW4owksSkDuSZ8Fv_0Hm%26sigh%3DZYCJgxkc6HlVfXULkM8i9xWO8WQ%26begin%3D0%26len%3D86400000%26docid%3D0&amp;amp;nogvlm=1&amp;amp;thumbnailUrl=http%3A%2F%2Fvideo.google.com%2FThumbnailServer2%3Fapp%3Dblogger%26contentid%3D34311eaa011321ab%26offsetms%3D5000%26itag%3Dw320%26sigh%3Dyo_0t7M-Owfz1eKrDJm__zUSzDE&amp;amp;messagesUrl=video.google.com%2FFlashUiStrings.xlb%3Fframe%3Dflashstrings%26hl%3Den"&gt;
&lt;param name="bgcolor" value="#FFFFFF"&gt;
&lt;embed width="320" height="266" src="http://www.blogger.com/img/videoplayer.swf?videoUrl=http%3A%2F%2Fvp.video.google.com%2Fvideodownload%3Fversion%3D0%26secureurl%3DqAAAAKXn9zyzXTyW6NoE_4ojujoMAywBlu-_yp7AFFOVjSxrsIgE0oc_jXBBV722Y-FO70L3B2ZLXLotbEc_61c4UmdZ-PZOimRKpshXIDLZIy2oaRyC9AT0YHrn3ARs-5pLaLLV_Og0TnckGNvs7fR99Cotgv_UBQUNw_CfbGSwZFMGYPI1iFttK5V1J28ykdzBsWRmObMILHgxzo5Zdz6tH9AOfW4owksSkDuSZ8Fv_0Hm%26sigh%3DZYCJgxkc6HlVfXULkM8i9xWO8WQ%26begin%3D0%26len%3D86400000%26docid%3D0&amp;amp;nogvlm=1&amp;amp;thumbnailUrl=http%3A%2F%2Fvideo.google.com%2FThumbnailServer2%3Fapp%3Dblogger%26contentid%3D34311eaa011321ab%26offsetms%3D5000%26itag%3Dw320%26sigh%3Dyo_0t7M-Owfz1eKrDJm__zUSzDE&amp;amp;messagesUrl=video.google.com%2FFlashUiStrings.xlb%3Fframe%3Dflashstrings%26hl%3Den" type="application/x-shockwave-flash"&gt;&lt;/embed&gt;&lt;/object&gt;
&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: left;"&gt;We notice that t-SNE first captures the global structure of dataset (like the Sammon algorithm  or PCA), then deforms the map locally to achieve non-overlapping mappings. This kind of dynamical behavior is quite similar as the CCA algorithm.&lt;br /&gt;&lt;br /&gt;The implementation of t-SNE is pretty straightforward, and it can be easily tuned for a wide range of data. Even with its current simple implementation, some of its features already surpassed many other popular mapping algorithms. So, I believe it will become a valuable tool for analyzing high dimensional data.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/710383708872670121-2778740621244928743?l=jamesxli.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="enclosure" type="video/mp4" href="http://www.blogger.com/video-play.mp4?contentId=34311eaa011321ab&amp;type=video%2Fmp4" length="0" /><link rel="replies" type="application/atom+xml" href="http://jamesxli.blogspot.com/feeds/2778740621244928743/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=710383708872670121&amp;postID=2778740621244928743" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/2778740621244928743?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/2778740621244928743?v=2" /><link rel="alternate" type="text/html" href="http://jamesxli.blogspot.com/2008/08/new-mapping-algorithm-t-sne.html" title="New mapping algorithm t-SNE" /><author><name>James X. Li</name><uri>http://www.blogger.com/profile/17491271097854752129</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="08887157860508000723" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/_Wca44kNaYGE/SKNPBu_rpiI/AAAAAAAADUc/J1H1ZpCu3rE/s72-c/TwoSquaresTSNE.png" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;C0YHQ38zfSp7ImA9WxRVFUs.&quot;"><id>tag:blogger.com,1999:blog-710383708872670121.post-1650690012226936125</id><published>2008-04-12T20:24:00.000-07:00</published><updated>2008-11-12T23:25:32.185-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-11-12T23:25:32.185-08:00</app:edited><title>Villarceau Circles and Diagonal Rotation of RPM Map</title><content type="html">It has been known for long time that there are 4 perfect circles passing every points on the a torus surface. We can get two of these circles buy cutting the torus vertically and horizontally through that points. These two circles are the normal circles most people can easily visualize. The other two, not so apparent, circles are the so-called Villarceau circles. They are made by cutting the torus diagonally as depicted in the following animation (IE7 users need to enable the flag Tools&gt;Internet Options&gt;Advanced&gt;"Play animations in web pages" to see the animation):&lt;br /&gt;&lt;div style="TEXT-ALIGN: center"&gt;&lt;img alt="Villarceau Circles Animation" src="http://upload.wikimedia.org/wikipedia/commons/f/f1/Villarceau_circles.gif" /&gt;&lt;br /&gt;&lt;/div&gt;&lt;center&gt;Figure 1: Creating the Villarceau Circles on a torus.&lt;/center&gt;&lt;br /&gt;&lt;br /&gt;Above animation is very helpful to visualize the Villarceau Circles. Another simple way to show the Villarceau circles is using the &lt;span style="FONT-STYLE: italic"&gt;fundamental polygon &lt;/span&gt;representation where a torus is represented as a rectangle whose opposite edges are glued together. In the polygon represetnation, the horizontal and vertical edges are the two normal circles (horizontal and vertical circles). The two diagonals are the Villarceau circles as depicted in the following diagram:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_Wca44kNaYGE/SAJdcZhbmsI/AAAAAAAADGI/zutNJHUFtqE/s1600-h/FourCirclesOnTorus.PNG"&gt;&lt;img id="BLOGGER_PHOTO_ID_5188812463153060546" style="DISPLAY: block; MARGIN: 0px auto 10px; CURSOR: pointer; TEXT-ALIGN: center" alt="" src="http://1.bp.blogspot.com/_Wca44kNaYGE/SAJdcZhbmsI/AAAAAAAADGI/zutNJHUFtqE/s400/FourCirclesOnTorus.PNG" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;center&gt;Figure 2: Four circles on a torus passing through a point.&lt;/center&gt;&lt;br /&gt;&lt;br /&gt;We know that RPM algorithm simulates a multi-particle dynamic system confined on the flat torus surface. For some unknown reasons, RPM maps often form symmetry patterns alone the diagonal lines, e.g. along the Villarcea circles. For instance, when we map the sphere to the tours, RPM algorithm cuts the sphere into two equal sized half and map them as two discs to the torus surface. The two discs are positioned alone a a diagonal line as depicted in the following picture:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_Wca44kNaYGE/SAGNLJhbmgI/AAAAAAAADEs/IiQYwBQgq4Q/s1600-h/SphereOnTorus.png"&gt;&lt;img id="BLOGGER_PHOTO_ID_5188583468381739522" style="DISPLAY: block; MARGIN: 0px auto 10px; CURSOR: pointer; TEXT-ALIGN: center" alt="" src="http://3.bp.blogspot.com/_Wca44kNaYGE/SAGNLJhbmgI/AAAAAAAADEs/IiQYwBQgq4Q/s400/SphereOnTorus.png" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;center&gt;Figure 3: RPM Map of a sphere&lt;/center&gt;&lt;br /&gt;&lt;br /&gt;We see in above picture that only one disc is completely displayed as a whole, the other one has been cut by the edges into 3 or 4 pices (which will form a complete disc when we glue the opposite edge together). The whole picture is more or less symmetric with respect to the left-up-to-right-bottom diagonal line. We can get a better visualization of above image when we tile multiple copies of above map together as be show in the following picture:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_Wca44kNaYGE/SAJapJhbmrI/AAAAAAAADGA/y69zzO17Mos/s1600-h/TilingSpheremap2.png"&gt;&lt;img id="BLOGGER_PHOTO_ID_5188809383661509298" style="DISPLAY: block; MARGIN: 0px auto 10px; CURSOR: pointer; TEXT-ALIGN: center" alt="" src="http://4.bp.blogspot.com/_Wca44kNaYGE/SAJapJhbmrI/AAAAAAAADGA/y69zzO17Mos/s400/TilingSpheremap2.png" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;center&gt;Figure 4: Tiling duplicates of Figure 3.&lt;/center&gt;&lt;br /&gt;&lt;br /&gt;Now, it would be helpful for the purpose of visualization if we can show the two discs as whole discs in a single map without using the tiling techniques. We can achieve this by transforming the coordinator from the two vertical and horizontal edges to the two diagonals (i.e. the two Villarceau circles). The transformation is schematically depicted in the following diagram:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_Wca44kNaYGE/SAJe9ZhbmtI/AAAAAAAADGQ/5nknfbzIdCQ/s1600-h/DiagonalRotation.png"&gt;&lt;img id="BLOGGER_PHOTO_ID_5188814129600371410" style="DISPLAY: block; MARGIN: 0px auto 10px; CURSOR: pointer; TEXT-ALIGN: center" alt="" src="http://1.bp.blogspot.com/_Wca44kNaYGE/SAJe9ZhbmtI/AAAAAAAADGQ/5nknfbzIdCQ/s400/DiagonalRotation.png" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;center&gt;Figure 5: Rotation to the diagonal circles.&lt;/center&gt;&lt;br /&gt;&lt;br /&gt;We call the above transformation the &lt;span style="FONT-STYLE: italic"&gt;diagonal rotation&lt;/span&gt; as it basically rotate the coordinator to the diagonals. Formally, the diagonal rotation is done by the following forms:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_Wca44kNaYGE/SAJGOJhbmlI/AAAAAAAADFU/8bvzdHiQMvg/s1600-h/DiagonalRotation2.png"&gt;&lt;img id="BLOGGER_PHOTO_ID_5188786929572485714" style="DISPLAY: block; MARGIN: 0px auto 10px; CURSOR: pointer; TEXT-ALIGN: center" alt="" src="http://4.bp.blogspot.com/_Wca44kNaYGE/SAJGOJhbmlI/AAAAAAAADFU/8bvzdHiQMvg/s400/DiagonalRotation2.png" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;In above forms &lt;span style="FONT-STYLE: italic"&gt;w &lt;/span&gt;and &lt;span style="FONT-STYLE: italic"&gt;h &lt;/span&gt;are the width and height of the fundamental rectangle respectively. &lt;span style="FONT-STYLE: italic"&gt;d &lt;/span&gt;is the length of its diagonal. The above forms transforms points from the region D to D'. For points from other regions, we have to apply some simple shifting after the transformation as indicated in Figure 5.&lt;br /&gt;&lt;br /&gt;As indicated in Figure 5, the two diagonals will become the edges of the transformed map. Thus, in order to avoid the resulting map be cut off by the edges we should shift a RPM map away from the diagonals through torus cyclical shifting. For our sphere example we can shift the map, for instance, to the following position:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_Wca44kNaYGE/SAJTyJhbmmI/AAAAAAAADFc/MxhXy4Ym94A/s1600-h/SphereOnTorusShifted.png"&gt;&lt;img id="BLOGGER_PHOTO_ID_5188801841698937442" style="DISPLAY: block; MARGIN: 0px auto 10px; CURSOR: pointer; TEXT-ALIGN: center" alt="" src="http://4.bp.blogspot.com/_Wca44kNaYGE/SAJTyJhbmmI/AAAAAAAADFc/MxhXy4Ym94A/s400/SphereOnTorusShifted.png" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;center&gt;Figure 6: Sphere map shifted off from the diagonal lines.&lt;/center&gt;&lt;br /&gt;&lt;br /&gt;When we now apply the diagonal rotation on the map in Figure 6 we will get the following rotated map:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_Wca44kNaYGE/SAJVxZhbmoI/AAAAAAAADFs/lWpJOmVoOig/s1600-h/SphereRotated.png"&gt;&lt;img id="BLOGGER_PHOTO_ID_5188804027837291138" style="DISPLAY: block; MARGIN: 0px auto 10px; CURSOR: pointer; TEXT-ALIGN: center" alt="" src="http://1.bp.blogspot.com/_Wca44kNaYGE/SAJVxZhbmoI/AAAAAAAADFs/lWpJOmVoOig/s400/SphereRotated.png" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;center&gt;Figure 7: The sphere map after diagonal rotation.&lt;/center&gt;&lt;br /&gt;Comparing with the original RPM map in Figure 3, the rotated map in figure 7 is obviously easier to investigate for human eyes.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;It should be pointed out that a RPM map after diagonal rotation is no long a torus map in the sense that you cannot tile the map to fill 2D space the same way as be shown in Figure 4. Since this will break the edge-connectivity as indicated in Figure 5. But you can fill the 2D space with the rotated RPM map like building a bridge wall: you tile the map site by site to build a continuous lay, then build next lay the same while shift the lay to the left (or right) for half block. The following picture depicts this characteristics:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_Wca44kNaYGE/SAJZN5hbmqI/AAAAAAAADF4/IsQeKtD-sIM/s1600-h/TilingRotatedMap.png"&gt;&lt;img id="BLOGGER_PHOTO_ID_5188807815998446242" style="DISPLAY: block; MARGIN: 0px auto 10px; CURSOR: pointer; TEXT-ALIGN: center" alt="" src="http://3.bp.blogspot.com/_Wca44kNaYGE/SAJZN5hbmqI/AAAAAAAADF4/IsQeKtD-sIM/s400/TilingRotatedMap.png" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;center&gt;Figure 8: Tiling with rotated map.&lt;/center&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/710383708872670121-1650690012226936125?l=jamesxli.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://jamesxli.blogspot.com/feeds/1650690012226936125/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=710383708872670121&amp;postID=1650690012226936125" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/1650690012226936125?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/1650690012226936125?v=2" /><link rel="alternate" type="text/html" href="http://jamesxli.blogspot.com/2008/04/villarceau-circles-and-diagonal.html" title="Villarceau Circles and Diagonal Rotation of RPM Map" /><author><name>James X. Li</name><uri>http://www.blogger.com/profile/17491271097854752129</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="08887157860508000723" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/_Wca44kNaYGE/SAJdcZhbmsI/AAAAAAAADGI/zutNJHUFtqE/s72-c/FourCirclesOnTorus.PNG" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;D08NQX49cSp7ImA9WxZUFE0.&quot;"><id>tag:blogger.com,1999:blog-710383708872670121.post-3864716895589242483</id><published>2008-04-04T17:00:00.001-07:00</published><updated>2008-04-05T07:31:30.069-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-04-05T07:31:30.069-07:00</app:edited><title>The Equivalence of Cooling and Expanding Dynamics</title><content type="html">The RPM (relational perspective map) algorithm has a seemly add-&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_0"&gt;hoc&lt;/span&gt; measure that gradually decreases the learning rate to force the convergence of the algorithm. I have tried to avoid such measure with various methods (e.g. conjugated algorithm exploring second order gradient), but was not able to find anything working.&lt;br /&gt;&lt;br /&gt;I have notice a significant problem here: In the original Newton-&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_1"&gt;Raphson&lt;/span&gt; algorithm, the algorithm should converge to a local minimum if the learning rate is sufficiently small. But, this not so for the optimization problem of RPM. The algorithm never converges, no matter how small the learning rate is (in a way that is doable with double precision arithmetic). If I keep the learning rate constant at certain small value, the simulated dynamic system goes in to an equilibrium state with constant variance.&lt;br /&gt;&lt;br /&gt;Thus, it looks like NR method, as it is, does not work for massively complex dynamic system, it is not stable enough. Letting the learning rate gradually vanish seems to be the simplest method to enforce the convergence.&lt;br /&gt;&lt;br /&gt;Mathematically, a very fine granulated gradient descent algorithm should be able to find a local minimum. But, I think this is practically not possible. To understand this, we notice that the learning rate practically controls the average variance of the positions of the simulated particles. That means, the learning rate is a kind of "temperature" of the system. In physics, we know that if we keep the temperature constant, the system will never converge to a frozen state. We can only reach that state when we gradually reduce the temperature to zero.&lt;br /&gt;&lt;br /&gt;It is interesting to notice that the cooling dynamic system simulated by RPM is theoretically equivalent to a n-body dynamic system that is ever expanding. To see this, we notice that the learning rate is directly linked to the average variance of the particle's positions. Thus, instead of gradually lowering learning rate, we can just reduce the scale of unit length of the space compared to total size of the space (e.g. the size of the torus). The latter is again equivalent to a system in which we keep the scale of unit length constant, but gradually expand the size of the whole space.&lt;br /&gt;&lt;br /&gt;We see here certain similarity of RPM model with the inflationary big-bang model from cosmology. Whereas RPM keeps the total size of the system stationary and gradually reduces the energy, the big-bang model keeps the total energy constant but let the universe expands. It is probably not a too crazy idea to consider a dual big-bang model that keeps the apparent universe stationary, but magically reduces the total energy/mass/temperature.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/710383708872670121-3864716895589242483?l=jamesxli.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://jamesxli.blogspot.com/feeds/3864716895589242483/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=710383708872670121&amp;postID=3864716895589242483" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/3864716895589242483?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/3864716895589242483?v=2" /><link rel="alternate" type="text/html" href="http://jamesxli.blogspot.com/2008/04/equivalence-of-cooling-and-expanding.html" title="The Equivalence of Cooling and Expanding Dynamics" /><author><name>James X. Li</name><uri>http://www.blogger.com/profile/17491271097854752129</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="08887157860508000723" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;CUAAQH48fip7ImA9WxRaEkQ.&quot;"><id>tag:blogger.com,1999:blog-710383708872670121.post-1447266590970600349</id><published>2008-03-10T08:28:00.000-07:00</published><updated>2008-12-14T14:29:01.076-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-12-14T14:29:01.076-08:00</app:edited><title>The "Big Three"  Data Categories</title><content type="html">Data analysis is an essential part of science and engineering. As a guide to deal with various data analysis tasks, I find it helpful to group data into three basic categories:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Multidimensional tabular data (e.g . tables of numbers and text).&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Time series data (e.g. image, audio/video  data).&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Graphics oriented data (e.g. tree and network structured data).&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;All data analysis methods can be grouped in to three categories depending on which data category they target. So, for instance, wavelet transformation is a method for time series; whereas MDS are methods for multidimensional data. Context free grammar and text parsers are mainly tools for graphics oriented data.&lt;br /&gt;&lt;br /&gt;Furthermore, all software applications can also be grouped in to three corresponding categories. A software package that supports multiple data categories (like java system, Matlab) is either a generic tool that requires further software for end-users; or it is hopelessly difficult to design, implement, understand and to use.&lt;br /&gt;&lt;br /&gt;When I encounter a new data analysis problem I first identify its category, then I start to search for methods and software packages targeting that category. I would not try, for instance, to use MDS or SQL to analyze images or video/audio data.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/710383708872670121-1447266590970600349?l=jamesxli.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://jamesxli.blogspot.com/feeds/1447266590970600349/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=710383708872670121&amp;postID=1447266590970600349" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/1447266590970600349?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/1447266590970600349?v=2" /><link rel="alternate" type="text/html" href="http://jamesxli.blogspot.com/2008/03/big-three-data-categories.html" title="The &quot;Big Three&quot;  Data Categories" /><author><name>James X. Li</name><uri>http://www.blogger.com/profile/17491271097854752129</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="08887157860508000723" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;C0YHQ309eCp7ImA9WxRVFUs.&quot;"><id>tag:blogger.com,1999:blog-710383708872670121.post-1456336903071750132</id><published>2008-02-21T20:26:00.000-08:00</published><updated>2008-11-12T23:25:32.360-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-11-12T23:25:32.360-08:00</app:edited><title>VisuMap 2.6 Released</title><content type="html">We have released &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_0"&gt;VisuMap&lt;/span&gt; 2.6 today. This release includes some significant internal architecture change that may simplify future development.&lt;br /&gt;&lt;br /&gt;We have considered implementing &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_1"&gt;DirectX&lt;/span&gt; based code with &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_2"&gt;WPF&lt;/span&gt; or &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_3"&gt;XNA&lt;/span&gt;, but both libraries have turned out to be &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_4"&gt;inadequate&lt;/span&gt; for our purpose. &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_5"&gt;WPF&lt;/span&gt; does not support sprite; whereas &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_6"&gt;XNA&lt;/span&gt; requires &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_7"&gt;shader&lt;/span&gt; 1.1 as a minimum that is too much for our customers.&lt;br /&gt;&lt;br /&gt;Apart from many accumulative improvements we have implemented a new data view, called mountain view, for high dimensional data. This feature can be considered as 3D extension of the existing value diagram. Mountain view also uses &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_8"&gt;DirectX&lt;/span&gt; for fast navigation. The following is snapshot of the mountain view window:&lt;br /&gt;&lt;a href="http://4.bp.blogspot.com/_Wca44kNaYGE/R778d2h89eI/AAAAAAAADAA/BvEYUBZqhwc/s1600-h/MountainView.png"&gt;&lt;img id="BLOGGER_PHOTO_ID_5169847012052235746" style="DISPLAY: block; MARGIN: 0px auto 10px; CURSOR: hand; TEXT-ALIGN: center" alt="" src="http://4.bp.blogspot.com/_Wca44kNaYGE/R778d2h89eI/AAAAAAAADAA/BvEYUBZqhwc/s400/MountainView.png" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/710383708872670121-1456336903071750132?l=jamesxli.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://jamesxli.blogspot.com/feeds/1456336903071750132/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=710383708872670121&amp;postID=1456336903071750132" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/1456336903071750132?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/1456336903071750132?v=2" /><link rel="alternate" type="text/html" href="http://jamesxli.blogspot.com/2008/02/visumap-26-released.html" title="VisuMap 2.6 Released" /><author><name>James X. Li</name><uri>http://www.blogger.com/profile/17491271097854752129</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="08887157860508000723" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/_Wca44kNaYGE/R778d2h89eI/AAAAAAAADAA/BvEYUBZqhwc/s72-c/MountainView.png" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;CEUESHc5fCp7ImA9WB9aGE8.&quot;"><id>tag:blogger.com,1999:blog-710383708872670121.post-1919254667779517703</id><published>2008-01-08T09:27:00.000-08:00</published><updated>2008-01-08T11:56:49.924-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-01-08T11:56:49.924-08:00</app:edited><title>P. Dirac about geometric and algebraic way.</title><content type="html">Recently, during my search for information about projective space I bumped into an interesting script of P. Dirac with title "&lt;a href="http://www.atomicprecision.com/Other/Paul%20Dirac%20Talk%20-%20Projective%20Geometry%20(2).pdf"&gt;Projective Geometry, Origin of Quantum Equations&lt;/a&gt;".&lt;br /&gt;The script is made from a talk Dirac gave in 1972.  He seemed to talk to a general public, so the talk was rather inform.&lt;br /&gt;&lt;br /&gt;In his talk Dirac briefly described projective geometry and argued that projective geometry is more appropriate than Euclidean geometry as a mathematical structure for quantum theory. I was not able to really understand the link between projective geometry and quantum theory, but I believe his view was fundamentally correct in theoretical physics.&lt;br /&gt;&lt;br /&gt;What has interested me the most was his philosophical comments about geometry and algebra. In mathematical works you will, according to Dirac, either prefer the algebraic way or the geometrical way. The algebraical way is more &lt;em&gt;deductive&lt;/em&gt; where you start with equations or axioms and follow rules to get to interesting results. The geometrical way is more &lt;em&gt;inductive &lt;/em&gt;where you start with some concrete pictures and try to find relationship among the pictures.&lt;br /&gt;&lt;br /&gt;Dirac put himself into the geometrical camp. But he lamented that his works rather appear more algebraic (e.g. lots of equations) although he used a lot of geometrical methods. The reason for this is that it was pretty awkward to produce and print pictures in published papers.  Writing equations was just much easier for him (and probably most other theorists in his time) than drawing pictures. &lt;em&gt;So, it was rather a technique limit that led to heavy algebraic appearance of his works. &lt;/em&gt;&lt;br /&gt;&lt;br /&gt;That technique limit seems to disappear with the availability of modern computers. I am wandering what impact could it have had if the modern computers were available to those great scientists. On the other hand, in the current academic world the algebraic way still seems to dominate the stage: a paper with a lot of differential equations would be considered more scientific than a paper with a lot pictures.&lt;br /&gt;&lt;br /&gt;The mission statement of VisuMap Technologies is &lt;em&gt;unleashing&lt;/em&gt; &lt;em&gt;human visualization power for complex high dimensional data&lt;/em&gt;. In order to achieve this we not only need new software, new methodologies and theories, but also people who embrace geometrical way of research. So, there is still a long way to go for us.&lt;br /&gt;&lt;br /&gt;I feel pleased that Euclidean space has been considered inadequate by Dirac for quantum theory. Since I have felt for long time that the open Euclidean space is not appropriate for generic study of high dimensional data. For instance, some concepts like left/right, centers/peripherals (which make sense in Euclidean space) can not be applied intuitively to high dimensional data. For this reason, I have called RPM (relational perspective map) as &lt;em&gt;MDS on closed manifolds.&lt;/em&gt; I have looked in to several low dimensional manifolds as our image space (like sphere, torus and real projective space). Our long term plan is to evolve the image space to more expressive structures to visualize high dimensional data.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/710383708872670121-1919254667779517703?l=jamesxli.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://jamesxli.blogspot.com/feeds/1919254667779517703/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=710383708872670121&amp;postID=1919254667779517703" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/1919254667779517703?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/1919254667779517703?v=2" /><link rel="alternate" type="text/html" href="http://jamesxli.blogspot.com/2008/01/p-dirac-about-geometric-and-algebraic.html" title="P. Dirac about geometric and algebraic way." /><author><name>James X. Li</name><uri>http://www.blogger.com/profile/17491271097854752129</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="08887157860508000723" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;C0UCQ3Y4fCp7ImA9WB9QGEk.&quot;"><id>tag:blogger.com,1999:blog-710383708872670121.post-3108736736472359670</id><published>2007-10-30T09:14:00.000-07:00</published><updated>2007-10-31T07:34:22.834-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2007-10-31T07:34:22.834-07:00</app:edited><title>Transformation between rectangle and torus</title><content type="html">I have found today a nice video clip from &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_0"&gt;YouTube&lt;/span&gt; that visualizes the transformation between rectangle and torus surface. Torus surface is the base information space used by relational perspective map (RPM). This is main characteristics of RPM that distinguish RPM from other conventional mapping methods which use open Euclidean space as image space.&lt;br /&gt;&lt;br /&gt;There is also good clips showing the construction of &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_1"&gt;Klein&lt;/span&gt;-Bottle. But I could not find a good clip for the construction of real projective space. In order to visualize projective space and other more complex 2/3-D manifolds we probably need to partition the objects in some intuitive way.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;object height="355" width="425"&gt;&lt;param name="movie" value="http://www.youtube.com/v/0H5_h-RB0T8"&gt;&lt;param name="wmode" value="transparent"&gt;&lt;embed src="http://www.youtube.com/v/0H5_h-RB0T8" type="application/x-shockwave-flash" wmode="transparent" width="425" height="355"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/710383708872670121-3108736736472359670?l=jamesxli.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://jamesxli.blogspot.com/feeds/3108736736472359670/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=710383708872670121&amp;postID=3108736736472359670" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/3108736736472359670?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/3108736736472359670?v=2" /><link rel="alternate" type="text/html" href="http://jamesxli.blogspot.com/2007/10/transformation-between-rectangle-and.html" title="Transformation between rectangle and torus" /><author><name>James X. Li</name><uri>http://www.blogger.com/profile/17491271097854752129</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="08887157860508000723" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">2</thr:total></entry><entry gd:etag="W/&quot;CkQHRH0-cSp7ImA9WB9QFkU.&quot;"><id>tag:blogger.com,1999:blog-710383708872670121.post-2325870628772144255</id><published>2007-10-29T08:44:00.000-07:00</published><updated>2007-10-29T10:52:15.359-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2007-10-29T10:52:15.359-07:00</app:edited><title>New paper comparing CCA and Manifold learning arlgorithms</title><content type="html">&lt;div&gt;The journal &lt;em&gt;Information Visualization &lt;/em&gt;has recently published a very interesting paper of J. &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_0"&gt;Venna&lt;/span&gt; and S. &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_1"&gt;Kaski&lt;/span&gt; with the title "&lt;a href="http://www.palgrave-journals.com/ivs/journal/v6/n2/abs/9500153a.html"&gt;Comparison of Visualization methods for an atlas of gene expression data sets&lt;/a&gt;".&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;/div&gt;This paper has compared the &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_2"&gt;performance&lt;/span&gt; of many algorithms for mapping &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_3"&gt;various&lt;/span&gt; kind of data sets. Algorithms considered include: &lt;strong&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_4"&gt;PCA&lt;/span&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_5"&gt;LLE&lt;/span&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_6"&gt;Laplacian&lt;/span&gt;&lt;/strong&gt; &lt;strong&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_7"&gt;Eigenmap&lt;/span&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_8"&gt;Isomap&lt;/span&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_9"&gt;CCA&lt;/span&gt;&lt;/strong&gt;. The performance comparison are done with diagram of trustworthiness and continuity which visualize, like Shepard diagram, the discrepancies between input- and output-distance matrices.&lt;br /&gt;&lt;br /&gt;&lt;div&gt;The advantage of the trustworthiness and continuity diagrams over the Shepard diagram is that they aggregate the discrepancy information uniformly over all data points, so that you get a single curve to show the quality of a map. With Shepard diagram you get a curve for each data point. Thus, trustworthiness/continuity diagrams are much easier to apply in practice. On the other side, the Shepard diagram provides more detailed information that allows, for instance, users to investigate mapping quality with respect to individual data points (not just the whole map).&lt;/div&gt;&lt;br /&gt;Whereas those &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_10"&gt;diagrams&lt;/span&gt; provide objective measures for mapping quality, I think they should be used with care. They may not always reflect the &lt;em&gt;subjective&lt;/em&gt; mapping quality perceived by human, and the &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_11"&gt;ultimately&lt;/span&gt; goal should be helping people not machines. Blindly trusting these numbers might discourage development of new useful algorithms. One main problem with these diagrams is, for instance, they don't have the concept of partition. Algorithms (like RPM) which simplify data by partitioning (apart from dimensionality reduction) are &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_12"&gt;greatly&lt;/span&gt; &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_13"&gt;penalized&lt;/span&gt;.  Partition, as a perception method, is probably as &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_14"&gt;fundamentally&lt;/span&gt; as focusing-by-proximity.&lt;p&gt; &lt;/p&gt;A main message of this paper is that &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_15"&gt;CCA&lt;/span&gt; &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_16"&gt;algorithm&lt;/span&gt; clearly and significantly out-performed other algorithms based on explicit unfolding.  Our experience supports this assessment. We have not encountered a single data set that &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_17"&gt;CCA&lt;/span&gt; performed &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_18"&gt;noticeably&lt;/span&gt; worse than algorithms like &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_19"&gt;LLE&lt;/span&gt; and &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_20"&gt;Isomap&lt;/span&gt;. &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_21"&gt;Sammon&lt;/span&gt; map and &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_22"&gt;PCA&lt;/span&gt; cannot be compared directly with &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_23"&gt;CCA&lt;/span&gt; as they &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_24"&gt;preserve&lt;/span&gt; long distance information and visualize the over-all &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_25"&gt;structure&lt;/span&gt; of the &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_26"&gt;data set&lt;/span&gt; (instead of unfolding no-linear structure).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/710383708872670121-2325870628772144255?l=jamesxli.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://jamesxli.blogspot.com/feeds/2325870628772144255/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=710383708872670121&amp;postID=2325870628772144255" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/2325870628772144255?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/2325870628772144255?v=2" /><link rel="alternate" type="text/html" href="http://jamesxli.blogspot.com/2007/10/new-paper-comparing-cca-and-manifold.html" title="New paper comparing CCA and Manifold learning arlgorithms" /><author><name>James X. Li</name><uri>http://www.blogger.com/profile/17491271097854752129</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="08887157860508000723" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;A0IMQn0_fSp7ImA9WB9RFUs.&quot;"><id>tag:blogger.com,1999:blog-710383708872670121.post-21080784104281698</id><published>2007-10-16T13:21:00.000-07:00</published><updated>2007-10-16T14:19:43.345-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2007-10-16T14:19:43.345-07:00</app:edited><title>Symmetry between dimensionality reduction and data clustering</title><content type="html">Dimensionality reduction and data clustering are two main types of services offered by &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_0"&gt;VisuMap&lt;/span&gt; to support visual exploration of high dimensional data. Although there was no explicit plan in the initial design of &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_1"&gt;VisuMap&lt;/span&gt; to develop these two types of services in parallel, they have evolved nicely together in the software architecture.&lt;br /&gt;&lt;br /&gt;We might argue that the reason for the nice coexistence are because they are both &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_3"&gt;indispensable&lt;/span&gt; to explore high dimensional no-linear data. But there is another more profound reason for this: namely, they are conceptually symmetry to each other.&lt;br /&gt;&lt;br /&gt;In order to see this "symmetry" let us consider a high dimensional &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_4"&gt;dataset&lt;/span&gt; as a data table with rows and columns. The number of columns is normally also considered as the dimension of the data. A dimensionality reduction method basically tries to reduce the number of columns without losing much relevant information. On the other side, a clustering algorithm tries to group similar rows together so that the clusters preserve as much as possible information. In other words, the purpose of clustering algorithm is to reduce the number rows by replacing them with clusters.&lt;br /&gt;&lt;br /&gt;In terms of "symmetry" we can say, dimensionality reduction algorithms reduce the number of columns whereas clustering algorithms reduce the number rows.&lt;br /&gt;&lt;br /&gt;So, what does this symmetry brings us? One application of this symmetry is that we can transfer any clustering &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_6"&gt;algorithm&lt;/span&gt; to a dimensionality reduction method (and vice &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_7"&gt;versa&lt;/span&gt;) by transposing the data table as a matrix. For instance, we can apply a clustering method on the columns of &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_8"&gt;dataset&lt;/span&gt; to get three clusters and use the centroids of the clusters as 3D coordinates for the rows. By doing so we reduce the &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_9"&gt;dataset's&lt;/span&gt; dimension to 3.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/710383708872670121-21080784104281698?l=jamesxli.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://jamesxli.blogspot.com/feeds/21080784104281698/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=710383708872670121&amp;postID=21080784104281698" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/21080784104281698?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/21080784104281698?v=2" /><link rel="alternate" type="text/html" href="http://jamesxli.blogspot.com/2007/10/symmetry-between-dimensionality.html" title="Symmetry between dimensionality reduction and data clustering" /><author><name>James X. Li</name><uri>http://www.blogger.com/profile/17491271097854752129</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="08887157860508000723" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;C0YHQ3c5cCp7ImA9WxRVFUs.&quot;"><id>tag:blogger.com,1999:blog-710383708872670121.post-8484262497909834449</id><published>2007-08-31T18:57:00.000-07:00</published><updated>2008-11-12T23:25:32.928-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-11-12T23:25:32.928-08:00</app:edited><title>The null-dataset</title><content type="html">&lt;div&gt;When we analyze a high dimensional dataset the first question is often: &lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;em&gt;Does the dataset contain useful information at all? &lt;/em&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;A simple way to answer this questions is as follows: We first construct a so called&lt;em&gt; null-dataset&lt;/em&gt; that contains 100 data points; each data point is a 100 dimensional row vector of the following table:&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;1, 0, 0, ..., 0&lt;/div&gt;&lt;div&gt;0, 1, 0, ..., 0&lt;br /&gt;0, 0, 1,...., 0&lt;/div&gt;&lt;div&gt;...&lt;/div&gt;&lt;div&gt;0, 0, 0,..., 1&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;Since the Euclidean distance between any two points from the null-dataset is the constant sqrt(2), the distance matrix contains no information about the data points in the sense that the distance matrix does not single out any group of points with special meaning.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;Now for a new high dimensional dataset, we can simply apply a collection of MDS mapping methods to the dataset and to the null-dataset with the same configuration settings. If the MDS maps of the new dataset resemble those of the null-dataset, we can expect that the new dataset contains no meaningful information.&lt;/div&gt;&lt;br /&gt;&lt;div&gt;The following pictures show the 2D MDS maps of the null-dataset with 100 data points created with 4 MDS algorithms (i.e. RPM, Sammon, CCA and SMACOF) available in VisuMap:&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_Wca44kNaYGE/RtmgOupK_9I/AAAAAAAAB70/BptjDLJBMzc/s1600-h/NullDataset.PNG"&gt;&lt;img style="cursor: pointer;" src="http://4.bp.blogspot.com/_Wca44kNaYGE/RtmgOupK_9I/AAAAAAAAB70/BptjDLJBMzc/s400/NullDataset.PNG" alt="" id="BLOGGER_PHOTO_ID_5105287827500498898" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;We notice that three of these MDS algorithms resulted in similar disk alike maps, whereas the RPM algorithm produced a homogeneously distributed map (the first map).&lt;br /&gt;&lt;br /&gt;In order to understand the behavior of MDS algorithms it is helpful to see how they react to injection of information by means of &lt;span style="font-style: italic;"&gt;broken symmetry&lt;/span&gt; between the data points. For this purpose I have altered the null-dataset slightly be multiplying the vector of the first data point by the factor 1.05.  This change singles out the first data point as it has slightly larger distance to other points. The following pictures are the corresponding MDS maps of dataset:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_Wca44kNaYGE/Rtmg0upK_-I/AAAAAAAAB78/3EL8XGud2Lg/s1600-h/NullDatasetWithOutlier.PNG"&gt;&lt;img style="cursor: pointer;" src="http://4.bp.blogspot.com/_Wca44kNaYGE/Rtmg0upK_-I/AAAAAAAAB78/3EL8XGud2Lg/s400/NullDatasetWithOutlier.PNG" alt="" id="BLOGGER_PHOTO_ID_5105288480335527906" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;The first data point is shown as a red spot in above pictures. We can see that 3 of our MDS algorithms singled out the first point as an out-lier, whereas the RPM method  ignored such  alteration in the dataset.  Similarly, we can alter the null-dataset by multiplying a factor 0.95 to the first data point. The following pictures show the corresponding MDS maps:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_Wca44kNaYGE/RtmhM-pK__I/AAAAAAAAB8E/UiARB6hqQkg/s1600-h/NullDatasetWithCenter.png"&gt;&lt;img style="cursor: pointer;" src="http://1.bp.blogspot.com/_Wca44kNaYGE/RtmhM-pK__I/AAAAAAAAB8E/UiARB6hqQkg/s400/NullDatasetWithCenter.png" alt="" id="BLOGGER_PHOTO_ID_5105288896947355634" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;We can see that all 4 MDS mapping algorithms singled the first data point by moving it to the center of these maps.&lt;br /&gt;&lt;br /&gt;In order to test our method for a real dataset I have created a dataset with ca. 500 data points. Each data point contains 100 daily price changes in percentage. The following pictures show its MDS maps created with the same set of configuration settings used for the null-dataset:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_Wca44kNaYGE/Rtmhd-pLAAI/AAAAAAAAB8M/weSSnwXo-00/s1600-h/NullDatasetSPPerf.PNG"&gt;&lt;img style="cursor: pointer;" src="http://2.bp.blogspot.com/_Wca44kNaYGE/Rtmhd-pLAAI/AAAAAAAAB8M/weSSnwXo-00/s400/NullDatasetSPPerf.PNG" alt="" id="BLOGGER_PHOTO_ID_5105289189005131778" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;We can notice, more or less, some resemblance between above pictures and those of the null-dataset. This means, that the distance matrix probably doesn't carry much information about the stocks; and we probably won't be able to extract much information from the dataset using methods based on Euclidean distance. For example, it won't make sense to apply k-mean clustering or neural network algorithms directly on this dataset.&lt;br /&gt;&lt;br /&gt;It should be pointed out here that above statments only apply to the Euclidean distance.  If we use other distance metric to measure to distance between the data points (for instance use the correlation as a kind of distance) we might be able to extract useful information.&lt;br /&gt;&lt;br /&gt;In general, the null-dataset method described above only applies to Euclidean distance. If we explore the dataset as a non-euclidean space, we need to adapt the null-dataset to that non-euclidean space.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/710383708872670121-8484262497909834449?l=jamesxli.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://jamesxli.blogspot.com/feeds/8484262497909834449/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=710383708872670121&amp;postID=8484262497909834449" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/8484262497909834449?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/8484262497909834449?v=2" /><link rel="alternate" type="text/html" href="http://jamesxli.blogspot.com/2007/08/null-dataset.html" title="The null-dataset" /><author><name>James X. Li</name><uri>http://www.blogger.com/profile/17491271097854752129</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="08887157860508000723" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/_Wca44kNaYGE/RtmgOupK_9I/AAAAAAAAB70/BptjDLJBMzc/s72-c/NullDataset.PNG" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;C08DRXc7fSp7ImA9WB5bFU4.&quot;"><id>tag:blogger.com,1999:blog-710383708872670121.post-1657471668183313131</id><published>2007-08-30T20:40:00.000-07:00</published><updated>2007-08-30T20:44:34.905-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2007-08-30T20:44:34.905-07:00</app:edited><title>VisFinance 1.2 Released</title><content type="html">We have just &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_0"&gt;released&lt;/span&gt; &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_1"&gt;VisFinance&lt;/span&gt; 1.2.&lt;br /&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_2"&gt;VisFinance&lt;/span&gt; is a &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_3"&gt;plugin&lt;/span&gt; module for &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_4"&gt;VisuMap&lt;/span&gt; to provide&lt;br /&gt;specialized services for finance data analysis.&lt;br /&gt;&lt;br /&gt;Free trial  version of &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_5"&gt;VisFinance&lt;/span&gt; 1.2 can be downloaded at &lt;a href="http://www.visumap.net/registered/RegMain.aspx"&gt;here&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/710383708872670121-1657471668183313131?l=jamesxli.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://jamesxli.blogspot.com/feeds/1657471668183313131/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=710383708872670121&amp;postID=1657471668183313131" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/1657471668183313131?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/1657471668183313131?v=2" /><link rel="alternate" type="text/html" href="http://jamesxli.blogspot.com/2007/08/visfinance-12-released.html" title="VisFinance 1.2 Released" /><author><name>James X. Li</name><uri>http://www.blogger.com/profile/17491271097854752129</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="08887157860508000723" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;CEQCRHk8eip7ImA9WB5UGEk.&quot;"><id>tag:blogger.com,1999:blog-710383708872670121.post-1985240569842359470</id><published>2007-08-22T07:57:00.000-07:00</published><updated>2007-08-22T21:12:45.772-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2007-08-22T21:12:45.772-07:00</app:edited><title>VisuMap 2.5 Released.</title><content type="html">Today we have released VisuMap version 2.5. This version has undergone major extensions and changes in scripting &amp;amp; pluging interfaces in order to accommodate more advanced plugin applications (e. g. the upcoming VisFinance plugin for finance service).&lt;br /&gt;&lt;br /&gt;In this release we have not changed the formats of data files nor the configuration file, but plugins and scripts developed for version 2.4 or older might need to be upgraded to the new interface.&lt;br /&gt;&lt;br /&gt;This release also includes many bug fixes, enhancements in user interfaces and performance. For instance, double-mouse-clicking has now been consistently linked to the task to switch the mode of a window. In the main window, double-clicking the main map now switches the main map between editing and shifting mode, instead of starting the RPM mapping algorithm as in the previous version.&lt;br /&gt;&lt;br /&gt;In general, data drilling features (i.e. PCA, Data Details and Diagram view) now work in a more consistent and integrated way. For instance, you can now create a new dataset for selected data in almost all views with few mouse clicks.&lt;br /&gt;&lt;br /&gt;Also, we have a new price list for our products and services.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/710383708872670121-1985240569842359470?l=jamesxli.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://jamesxli.blogspot.com/feeds/1985240569842359470/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=710383708872670121&amp;postID=1985240569842359470" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/1985240569842359470?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/1985240569842359470?v=2" /><link rel="alternate" type="text/html" href="http://jamesxli.blogspot.com/2007/08/visumap-25-released.html" title="VisuMap 2.5 Released." /><author><name>James X. Li</name><uri>http://www.blogger.com/profile/17491271097854752129</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="08887157860508000723" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">2</thr:total></entry><entry gd:etag="W/&quot;C0YHQn86eCp7ImA9WxRVFUs.&quot;"><id>tag:blogger.com,1999:blog-710383708872670121.post-2693741194654952135</id><published>2007-07-24T19:31:00.001-07:00</published><updated>2008-11-12T23:25:33.110-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-11-12T23:25:33.110-08:00</app:edited><title>The symmetry between repulsive and attractive force</title><content type="html">&lt;div&gt;The relational perspective map (RPM) algorithm in its core simulates a multi-particle system on a torus surface in which the particles exerts repulsive forces to each other. Some people has asked the question why just repulsive force? Why not use attractive forces like other force directed mapping methods? &lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;A simple answer to this question is that we prefer to use a simpler dynamic system if it can solve the problem. Just like physicists who try to reduce the number of fundamental interactions, I would prefer to avoid using attractive force if it is not absolutely needed. &lt;/div&gt;&lt;br /&gt;Classical multidimensional scaling (MDS) methods have to use both types of forces (as can be derived from their stress function) because their base information space is the infinite open Euclidean space: without attractive force their configuration will quickly degrade to infinite size; and without repulsive forces their configuration will shrink to a single point. With RPM method, the closed manifold (i.e. the torus surface) confines the configuration into a limited size.&lt;br /&gt;&lt;br /&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;Another not-so-obvious answer to above question is that on a closed manifold the repulsive and attractive forces are the manifestation of the same thing, from certain point of view at least. To see this, let use consider the simplest case of 1-dimensional &lt;a href="http://jamesxli.blogspot.com/2007/07/rpm-curled-space-and-dimensionality.html"&gt;curled space&lt;/a&gt; (i.e. the circle). As be shown in the following picture, imaging that we have two ants living on the circle and there are two positively charged particles on the circle which can move freely on the circle but are confined on the circle.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_Wca44kNaYGE/Rqd8dGGE58I/AAAAAAAAB7I/qpo8PfJbAdU/s1600-h/TwoAntsInCircle.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_Wca44kNaYGE/Rqd8dGGE58I/AAAAAAAAB7I/qpo8PfJbAdU/s320/TwoAntsInCircle.png" alt="" id="BLOGGER_PHOTO_ID_5091174743059785666" border="0" /&gt;&lt;/a&gt;From the point of view of the ant on the left side, the two charges exert repulsive force to each other according to coulomb's law.&lt;br /&gt;&lt;br /&gt;However, from the point of view of the ant on the right side, the two charges attract each other. This is a little contra-intuitive for us as observers from the 3-dimensional space.  But, we need to remind us that these ants are living in the 1-dimensional space. The ant on the right side has no way to see the two charges moving apart from each other. This ant can only move on the circle, particularly the right arc of the circle, to measure the distance between the two charges. What this ant would find out is that the same force as indicated in the picture will seemly press the two charge closer to each other; and larger the charge, the stronger the force. Thus, the ant on the right side would claim that two charges attract each other according a law similar to the coulomb's law (the anti-coulomb's law?).&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/710383708872670121-2693741194654952135?l=jamesxli.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://jamesxli.blogspot.com/feeds/2693741194654952135/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=710383708872670121&amp;postID=2693741194654952135" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/2693741194654952135?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/710383708872670121/posts/default/2693741194654952135?v=2" /><link rel="alternate" type="text/html" href="http://jamesxli.blogspot.com/2007/07/symmetry-between-repulsive-and.html" title="The symmetry between repulsive and attractive force" /><author><name>James X. Li</name><uri>http://www.blogger.com/profile/17491271097854752129</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="08887157860508000723" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/_Wca44kNaYGE/Rqd8dGGE58I/AAAAAAAAB7I/qpo8PfJbAdU/s72-c/TwoAntsInCircle.png" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry></feed>
