<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">
  <title>Daniel Lemire's articles on arXiv</title>
  <updated>2012-05-24T00:00:00-04:00</updated>
  <id>http://arxiv.org/a/lemire_d_1</id>
  
  <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/DanielLemiresArticlesOnArxiv" /><feedburner:info uri="daniellemiresarticlesonarxiv" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><link rel="license" type="text/html" href="http://creativecommons.org/licenses/by/2.0/" /><feedburner:emailServiceId>DanielLemiresArticlesOnArxiv</feedburner:emailServiceId><feedburner:feedburnerHostname>http://feedburner.google.com</feedburner:feedburnerHostname><entry>
    <id>http://arxiv.org/abs/1010.1526v5</id>
    <updated>2012-05-22T06:02:44-04:00</updated>
    <published>2010-10-07T15:48:23-04:00</published>
    <title>Time Series Classification by Class-Specific Mahalanobis Distance Measures</title>
    
    <author>
      <name>Zoltán Prekopcsák</name>
    </author>
    <author>
      <name>Daniel Lemire</name>
    </author>
    <link href="http://feedproxy.google.com/~r/DanielLemiresArticlesOnArxiv/~3/kPdv7CyM154/1010.1526v5" rel="alternate" type="text/html" />
    <link title="pdf" href="http://arxiv.org/pdf/1010.1526v5" rel="related" type="application/pdf" />
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.LG" scheme="http://arxiv.org/schemas/atom" label="Learning (cs.LG)" />
    <category term="cs.LG" scheme="http://arxiv.org/schemas/atom" label="Learning (cs.LG)" />
  <content type="html">To classify time series by nearest neighbors, we need to specify or learn one or several distance measures. We consider variations of the Mahalanobis distance measures which rely on the inverse covariance matrix of the data. Unfortunately --- for time series data --- the covariance matrix has often low rank. To alleviate this problem we can either use a pseudoinverse, covariance shrinking or limit the matrix to its diagonal. We review these alternatives and benchmark them against competitive methods such as the related Large Margin Nearest Neighbor Classification (LMNN) and the Dynamic Time Warping (DTW) distance. As we expected, we find that the DTW is superior, but the Mahalanobis distance measures are one to two orders of magnitude faster. To get best results with Mahalanobis distance measures, we recommend learning one distance measure per class using either covariance shrinking or the diagonal approach.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=kPdv7CyM154:VgvaoiZJVwE:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=kPdv7CyM154:VgvaoiZJVwE:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=kPdv7CyM154:VgvaoiZJVwE:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=kPdv7CyM154:VgvaoiZJVwE:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=kPdv7CyM154:VgvaoiZJVwE:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=kPdv7CyM154:VgvaoiZJVwE:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=kPdv7CyM154:VgvaoiZJVwE:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=kPdv7CyM154:VgvaoiZJVwE:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=kPdv7CyM154:VgvaoiZJVwE:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/DanielLemiresArticlesOnArxiv/~4/kPdv7CyM154" height="1" width="1"/&gt;</content><feedburner:origLink>http://arxiv.org/abs/1010.1526v5</feedburner:origLink></entry>
  <entry>
    <id>http://arxiv.org/abs/1202.4961v1</id>
    <updated>2012-02-22T11:34:24-05:00</updated>
    <published>2012-02-22T11:34:24-05:00</published>
    <title>Strongly universal string hashing is fast</title>
    
    <author>
      <name>Owen Kaser</name>
    </author>
    <author>
      <name>Daniel Lemire</name>
    </author>
    <arxiv:comment xmlns:arxiv="http://arxiv.org/schemas/atom">Software is available at http://code.google.com/p/variablelengthstringhashing/</arxiv:comment>
    <link href="http://feedproxy.google.com/~r/DanielLemiresArticlesOnArxiv/~3/uZW5g_WEGbg/1202.4961v1" rel="alternate" type="text/html" />
    <link title="pdf" href="http://arxiv.org/pdf/1202.4961v1" rel="related" type="application/pdf" />
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.DS" scheme="http://arxiv.org/schemas/atom" label="Data Structures and Algorithms (cs.DS)" />
  <content type="html">We present fast strongly universal string hashing families: they can process data at a rate of 0.2 CPU cycle per byte. Maybe surprisingly, we find that these families---though they requires a large buffer of random numbers---are often faster than popular hash functions with weaker theoretical guarantees. Moreover, conventional wisdom is that hash functions with fewer multiplications are faster. Yet we find that they may fail to be faster due to operation pipelining. We present experimental results on several processors including low-powered processors. Our tests include hash functions designed for processors with the Carry-Less Multiplication (CLMUL) instruction set. We also prove, using accessible proofs, the strong universality of our families.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=uZW5g_WEGbg:RX7FC2_byYA:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=uZW5g_WEGbg:RX7FC2_byYA:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=uZW5g_WEGbg:RX7FC2_byYA:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=uZW5g_WEGbg:RX7FC2_byYA:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=uZW5g_WEGbg:RX7FC2_byYA:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=uZW5g_WEGbg:RX7FC2_byYA:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=uZW5g_WEGbg:RX7FC2_byYA:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=uZW5g_WEGbg:RX7FC2_byYA:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=uZW5g_WEGbg:RX7FC2_byYA:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/DanielLemiresArticlesOnArxiv/~4/uZW5g_WEGbg" height="1" width="1"/&gt;</content><feedburner:origLink>http://arxiv.org/abs/1202.4961v1</feedburner:origLink></entry>
  <entry>
    <id>http://arxiv.org/abs/0705.4676v7</id>
    <updated>2012-01-04T15:37:05-05:00</updated>
    <published>2007-05-31T14:41:28-04:00</published>
    <title>Recursive n-gram hashing is pairwise independent, at best</title>
    
    <author>
      <name>Daniel Lemire</name>
    </author>
    <author>
      <name>Owen Kaser</name>
    </author>
    <arxiv:doi xmlns:arxiv="http://arxiv.org/schemas/atom">10.1016/j.csl.2009.12.001</arxiv:doi>
    <arxiv:comment xmlns:arxiv="http://arxiv.org/schemas/atom">See software at http://code.google.com/p/ngramhashing/. arXiv admin note: substantial text overlap with arXiv:cs/0610010v3</arxiv:comment>
    <arxiv:journal_ref xmlns:arxiv="http://arxiv.org/schemas/atom">Daniel Lemire and Owen Kaser, Recursive n-gram hashing is pairwise independent, at best, Computer Speech &amp; Language 24(4): 698-710 (2010)</arxiv:journal_ref>
    <link href="http://feedproxy.google.com/~r/DanielLemiresArticlesOnArxiv/~3/1-M1cr6gDI4/0705.4676v7" rel="alternate" type="text/html" />
    <link title="pdf" href="http://arxiv.org/pdf/0705.4676v7" rel="related" type="application/pdf" />
    <link title="doi" href="http://dx.doi.org/10.1016/j.csl.2009.12.001" rel="related" />
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.CL" scheme="http://arxiv.org/schemas/atom" label="Computation and Language (cs.CL)" />
    <category term="H.3.3; H.2.7" scheme="http://arxiv.org/schemas/atom" />
  <content type="html">Many applications use sequences of n consecutive symbols (n-grams). Hashing these n-grams can be a performance bottleneck. For more speed, recursive hash families compute hash values by updating previous values. We prove that recursive hash families cannot be more than pairwise independent. While hashing by irreducible polynomials is pairwise independent, our implementations either run in time O(n) or use an exponential amount of memory. As a more scalable alternative, we make hashing by cyclic polynomials pairwise independent by ignoring n-1 bits. Experimentally, we show that hashing by cyclic polynomials is is twice as fast as hashing by irreducible polynomials. We also show that randomized Karp-Rabin hash families are not pairwise independent.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=1-M1cr6gDI4:xpD9DYibHRY:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=1-M1cr6gDI4:xpD9DYibHRY:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=1-M1cr6gDI4:xpD9DYibHRY:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=1-M1cr6gDI4:xpD9DYibHRY:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=1-M1cr6gDI4:xpD9DYibHRY:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=1-M1cr6gDI4:xpD9DYibHRY:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=1-M1cr6gDI4:xpD9DYibHRY:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=1-M1cr6gDI4:xpD9DYibHRY:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=1-M1cr6gDI4:xpD9DYibHRY:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/DanielLemiresArticlesOnArxiv/~4/1-M1cr6gDI4" height="1" width="1"/&gt;</content><feedburner:origLink>http://arxiv.org/abs/0705.4676v7</feedburner:origLink></entry>
  <entry>
    <id>http://arxiv.org/abs/1008.1715v5</id>
    <updated>2011-11-24T10:10:52-05:00</updated>
    <published>2010-08-10T10:05:28-04:00</published>
    <title>The universality of iterated hashing over variable-length strings</title>
    
    <author>
      <name>Daniel Lemire</name>
    </author>
    <arxiv:doi xmlns:arxiv="http://arxiv.org/schemas/atom">10.1016/j.dam.2011.11.009</arxiv:doi>
    <arxiv:journal_ref xmlns:arxiv="http://arxiv.org/schemas/atom">Discrete Applied Mathematics 160 (4-5), 604--617 (2012)</arxiv:journal_ref>
    <link href="http://feedproxy.google.com/~r/DanielLemiresArticlesOnArxiv/~3/1M_sWZ6QGpg/1008.1715v5" rel="alternate" type="text/html" />
    <link title="pdf" href="http://arxiv.org/pdf/1008.1715v5" rel="related" type="application/pdf" />
    <link title="doi" href="http://dx.doi.org/10.1016/j.dam.2011.11.009" rel="related" />
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.DS" scheme="http://arxiv.org/schemas/atom" label="Data Structures and Algorithms (cs.DS)" />
  <content type="html">Iterated hash functions process strings recursively, one character at a time. At each iteration, they compute a new hash value from the preceding hash value and the next character. We prove that iterated hashing can be pairwise independent, but never 3-wise independent. We show that it can be almost universal over strings much longer than the number of hash values; we bound the maximal string length given the collision probability.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=1M_sWZ6QGpg:T1d9pZ3ThuA:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=1M_sWZ6QGpg:T1d9pZ3ThuA:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=1M_sWZ6QGpg:T1d9pZ3ThuA:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=1M_sWZ6QGpg:T1d9pZ3ThuA:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=1M_sWZ6QGpg:T1d9pZ3ThuA:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=1M_sWZ6QGpg:T1d9pZ3ThuA:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=1M_sWZ6QGpg:T1d9pZ3ThuA:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=1M_sWZ6QGpg:T1d9pZ3ThuA:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=1M_sWZ6QGpg:T1d9pZ3ThuA:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/DanielLemiresArticlesOnArxiv/~4/1M_sWZ6QGpg" height="1" width="1"/&gt;</content><feedburner:origLink>http://arxiv.org/abs/1008.1715v5</feedburner:origLink></entry>
  <entry>
    <id>http://arxiv.org/abs/1108.4041v2</id>
    <updated>2011-08-22T22:21:59-04:00</updated>
    <published>2011-08-19T16:16:02-04:00</published>
    <title>Extracting, Transforming and Archiving Scientific Data</title>
    
    <author>
      <name>Daniel Lemire</name>
    </author>
    <author>
      <name>Andre Vellino</name>
    </author>
    <arxiv:comment xmlns:arxiv="http://arxiv.org/schemas/atom">8 pages, Fourth Workshop on Very Large Digital Libraries, 2011</arxiv:comment>
    <link href="http://feedproxy.google.com/~r/DanielLemiresArticlesOnArxiv/~3/wVkdBzbcy04/1108.4041v2" rel="alternate" type="text/html" />
    <link title="pdf" href="http://arxiv.org/pdf/1108.4041v2" rel="related" type="application/pdf" />
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.DL" scheme="http://arxiv.org/schemas/atom" label="Digital Libraries (cs.DL)" />
    <category term="cs.DL" scheme="http://arxiv.org/schemas/atom" label="Digital Libraries (cs.DL)" />
  <content type="html">It is becoming common to archive research datasets that are not only large but also numerous. In addition, their corresponding metadata and the software required to analyse or display them need to be archived. Yet the manual curation of research data can be difficult and expensive, particularly in very large digital repositories, hence the importance of models and tools for automating digital curation tasks. The automation of these tasks faces three major challenges: (1) research data and data sources are highly heterogeneous, (2) future research needs are difficult to anticipate, (3) data is hard to index. To address these problems, we propose the Extract, Transform and Archive (ETA) model for managing and mechanizing the curation of research data. Specifically, we propose a scalable strategy for addressing the research-data problem, ranging from the extraction of legacy data to its long-term storage. We review some existing solutions and propose novel avenues of research.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=wVkdBzbcy04:l8vTiF-klXE:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=wVkdBzbcy04:l8vTiF-klXE:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=wVkdBzbcy04:l8vTiF-klXE:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=wVkdBzbcy04:l8vTiF-klXE:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=wVkdBzbcy04:l8vTiF-klXE:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=wVkdBzbcy04:l8vTiF-klXE:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=wVkdBzbcy04:l8vTiF-klXE:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=wVkdBzbcy04:l8vTiF-klXE:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=wVkdBzbcy04:l8vTiF-klXE:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/DanielLemiresArticlesOnArxiv/~4/wVkdBzbcy04" height="1" width="1"/&gt;</content><feedburner:origLink>http://arxiv.org/abs/1108.4041v2</feedburner:origLink></entry>
  <entry>
    <id>http://arxiv.org/abs/1105.6001v3</id>
    <updated>2011-07-14T13:48:38-04:00</updated>
    <published>2011-05-30T10:07:10-04:00</published>
    <title>A Call to Arms: Revisiting Database Design</title>
    
    <author>
      <name>Antonio Badia</name>
    </author>
    <author>
      <name>Daniel Lemire</name>
    </author>
    <arxiv:doi xmlns:arxiv="http://arxiv.org/schemas/atom">10.1145/2070736.2070750</arxiv:doi>
    <arxiv:journal_ref xmlns:arxiv="http://arxiv.org/schemas/atom">Antonio Badia and Daniel Lemire. A call to arms: revisiting database design. SIGMOD Record 40 (3), pages 61-69, 2011</arxiv:journal_ref>
    <link href="http://feedproxy.google.com/~r/DanielLemiresArticlesOnArxiv/~3/tq-o-bEijp8/1105.6001v3" rel="alternate" type="text/html" />
    <link title="pdf" href="http://arxiv.org/pdf/1105.6001v3" rel="related" type="application/pdf" />
    <link title="doi" href="http://dx.doi.org/10.1145/2070736.2070750" rel="related" />
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
  <content type="html">Good database design is crucial to obtain a sound, consistent database, and - in turn - good database design methodologies are the best way to achieve the right design. These methodologies are taught to most Computer Science undergraduates, as part of any Introduction to Database class. They can be considered part of the "canon", and indeed, the overall approach to database design has been unchanged for years. Moreover, none of the major database research assessments identify database design as a strategic research direction.
  Should we conclude that database design is a solved problem?
  Our thesis is that database design remains a critical unsolved problem. Hence, it should be the subject of more research. Our starting point is the observation that traditional database design is not used in practice - and if it were used it would result in designs that are not well adapted to current environments. In short, database design has failed to keep up with the times. In this paper, we put forth arguments to support our viewpoint, analyze the root causes of this situation and suggest some avenues of research.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=tq-o-bEijp8:di633RwUbTY:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=tq-o-bEijp8:di633RwUbTY:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=tq-o-bEijp8:di633RwUbTY:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=tq-o-bEijp8:di633RwUbTY:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=tq-o-bEijp8:di633RwUbTY:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=tq-o-bEijp8:di633RwUbTY:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=tq-o-bEijp8:di633RwUbTY:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=tq-o-bEijp8:di633RwUbTY:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=tq-o-bEijp8:di633RwUbTY:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/DanielLemiresArticlesOnArxiv/~4/tq-o-bEijp8" height="1" width="1"/&gt;</content><feedburner:origLink>http://arxiv.org/abs/1105.6001v3</feedburner:origLink></entry>
  <entry>
    <id>http://arxiv.org/abs/1006.3726v2</id>
    <updated>2011-05-30T12:48:25-04:00</updated>
    <published>2010-06-18T11:38:04-04:00</published>
    <title>Diamond Dicing</title>
    
    <author>
      <name>Hazel Webb</name>
    </author>
    <author>
      <name>Daniel Lemire</name>
    </author>
    <author>
      <name>Owen Kaser</name>
    </author>
    <arxiv:comment xmlns:arxiv="http://arxiv.org/schemas/atom">31 pages</arxiv:comment>
    <link href="http://feedproxy.google.com/~r/DanielLemiresArticlesOnArxiv/~3/XB4WzRflZM0/1006.3726v2" rel="alternate" type="text/html" />
    <link title="pdf" href="http://arxiv.org/pdf/1006.3726v2" rel="related" type="application/pdf" />
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
  <content type="html">Many queries that constrain multiple dimensions simultaneously are difficult to express and compute efficiently in both Structured Query Language (SQL) and multidimensional languages. We introduce the diamond cube operator, filling a gap among existing data warehouse operations. It aids decision support and analysis by facilitating the execution of simultaneous multidimensional queries on data cubes. An example query from the business analysis domain follows.
  A company wants to close shops and terminate product lines simultaneously. The CEO wants the maximal set of products and shops such that each shop would have sales greater than $400,000, and each product would bring revenues of at least $100,000 - assuming we terminate all other shops and product lines.
  Diamonds are intrinsically multidimensional and because of the interaction between dimensions the computation of diamond cubes is challenging. We present diamond dicing experiments on large data sets of more than 100 million facts. Our custom Java implementation is more than one hundred times faster, on a large data set, than a popular database engine.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=XB4WzRflZM0:L4QIWsDn2Ak:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=XB4WzRflZM0:L4QIWsDn2Ak:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=XB4WzRflZM0:L4QIWsDn2Ak:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=XB4WzRflZM0:L4QIWsDn2Ak:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=XB4WzRflZM0:L4QIWsDn2Ak:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=XB4WzRflZM0:L4QIWsDn2Ak:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=XB4WzRflZM0:L4QIWsDn2Ak:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=XB4WzRflZM0:L4QIWsDn2Ak:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=XB4WzRflZM0:L4QIWsDn2Ak:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/DanielLemiresArticlesOnArxiv/~4/XB4WzRflZM0" height="1" width="1"/&gt;</content><feedburner:origLink>http://arxiv.org/abs/1006.3726v2</feedburner:origLink></entry>
  <entry>
    <id>http://arxiv.org/abs/0909.1346v8</id>
    <updated>2011-02-22T11:00:56-05:00</updated>
    <published>2009-09-07T17:34:52-04:00</published>
    <title>Reordering Columns for Smaller Indexes</title>
    
    <author>
      <name>Daniel Lemire</name>
    </author>
    <author>
      <name>Owen Kaser</name>
    </author>
    <arxiv:doi xmlns:arxiv="http://arxiv.org/schemas/atom">10.1016/j.ins.2011.02.002</arxiv:doi>
    <arxiv:comment xmlns:arxiv="http://arxiv.org/schemas/atom">to appear in Information Sciences</arxiv:comment>
    <arxiv:journal_ref xmlns:arxiv="http://arxiv.org/schemas/atom">Daniel Lemire and Owen Kaser, Reordering Columns for Smaller Indexes, Information Sciences 181 (12), 2011</arxiv:journal_ref>
    <link href="http://feedproxy.google.com/~r/DanielLemiresArticlesOnArxiv/~3/Jp96IqvKLFc/0909.1346v8" rel="alternate" type="text/html" />
    <link title="pdf" href="http://arxiv.org/pdf/0909.1346v8" rel="related" type="application/pdf" />
    <link title="doi" href="http://dx.doi.org/10.1016/j.ins.2011.02.002" rel="related" />
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
  <content type="html">Column-oriented indexes-such as projection or bitmap indexes-are compressed by run-length encoding to reduce storage and increase speed. Sorting the tables improves compression. On realistic data sets, permuting the columns in the right order before sorting can reduce the number of runs by a factor of two or more. Unfortunately, determining the best column order is NP-hard. For many cases, we prove that the number of runs in table columns is minimized if we sort columns by increasing cardinality. Experimentally, sorting based on Hilbert space-filling curves is poor at minimizing the number of runs.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=Jp96IqvKLFc:er63K_krNzM:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=Jp96IqvKLFc:er63K_krNzM:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=Jp96IqvKLFc:er63K_krNzM:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=Jp96IqvKLFc:er63K_krNzM:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=Jp96IqvKLFc:er63K_krNzM:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=Jp96IqvKLFc:er63K_krNzM:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=Jp96IqvKLFc:er63K_krNzM:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=Jp96IqvKLFc:er63K_krNzM:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=Jp96IqvKLFc:er63K_krNzM:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/DanielLemiresArticlesOnArxiv/~4/Jp96IqvKLFc" height="1" width="1"/&gt;</content><feedburner:origLink>http://arxiv.org/abs/0909.1346v8</feedburner:origLink></entry>
  <entry>
    <id>http://arxiv.org/abs/0707.1913v2</id>
    <updated>2009-08-21T15:57:15-04:00</updated>
    <published>2007-07-12T22:30:10-04:00</published>
    <title>Removing Manually-Generated Boilerplate from Electronic Texts: Experiments with Project Gutenberg e-Books</title>
    
    <author>
      <name>Owen Kaser</name>
    </author>
    <author>
      <name>Daniel Lemire</name>
    </author>
    <arxiv:comment xmlns:arxiv="http://arxiv.org/schemas/atom">short version appeared in CASCON 2007 proceedings, available from http://portal.acm.org/citation.cfm?id=1321246</arxiv:comment>
    <link href="http://feedproxy.google.com/~r/DanielLemiresArticlesOnArxiv/~3/fCTumx1G5E4/0707.1913v2" rel="alternate" type="text/html" />
    <link title="pdf" href="http://arxiv.org/pdf/0707.1913v2" rel="related" type="application/pdf" />
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.DL" scheme="http://arxiv.org/schemas/atom" label="Digital Libraries (cs.DL)" />
    <category term="cs.DL" scheme="http://arxiv.org/schemas/atom" label="Digital Libraries (cs.DL)" />
    <category term="cs.CL" scheme="http://arxiv.org/schemas/atom" label="Computation and Language (cs.CL)" />
  <content type="html">Collaborative work on unstructured or semi-structured documents, such as in literature corpora or source code, often involves agreed upon templates containing metadata. These templates are not consistent across users and over time. Rule-based parsing of these templates is expensive to maintain and tends to fail as new documents are added. Statistical techniques based on frequent occurrences have the potential to identify automatically a large fraction of the templates, thus reducing the burden on the programmers. We investigate the case of the Project Gutenberg corpus, where most documents are in ASCII format with preambles and epilogues that are often copied and pasted or manually typed. We show that a statistical approach can solve most cases though some documents require knowledge of English. We also survey various technical solutions that make our approach applicable to large data sets.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=fCTumx1G5E4:qtNYZnY1LJo:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=fCTumx1G5E4:qtNYZnY1LJo:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=fCTumx1G5E4:qtNYZnY1LJo:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=fCTumx1G5E4:qtNYZnY1LJo:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=fCTumx1G5E4:qtNYZnY1LJo:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=fCTumx1G5E4:qtNYZnY1LJo:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=fCTumx1G5E4:qtNYZnY1LJo:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=fCTumx1G5E4:qtNYZnY1LJo:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=fCTumx1G5E4:qtNYZnY1LJo:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/DanielLemiresArticlesOnArxiv/~4/fCTumx1G5E4" height="1" width="1"/&gt;</content><feedburner:origLink>http://arxiv.org/abs/0707.1913v2</feedburner:origLink></entry>
  <entry>
    <id>http://arxiv.org/abs/0811.3301v2</id>
    <updated>2009-06-10T13:37:32-04:00</updated>
    <published>2008-11-20T11:22:05-05:00</published>
    <title>Faster Retrieval with a Two-Pass Dynamic-Time-Warping Lower Bound</title>
    
    <author>
      <name>Daniel Lemire</name>
    </author>
    <arxiv:doi xmlns:arxiv="http://arxiv.org/schemas/atom">10.1016/j.patcog.2008.11.030</arxiv:doi>
    <arxiv:comment xmlns:arxiv="http://arxiv.org/schemas/atom">Accepted in Pattern Recognition on November 20th, 2008</arxiv:comment>
    <arxiv:journal_ref xmlns:arxiv="http://arxiv.org/schemas/atom">Daniel Lemire, Faster Retrieval with a Two-Pass Dynamic-Time-Warping Lower Bound, Pattern Recognition 42(9): 2169-2180 (2009)</arxiv:journal_ref>
    <link href="http://feedproxy.google.com/~r/DanielLemiresArticlesOnArxiv/~3/HZ_3h4Dvb48/0811.3301v2" rel="alternate" type="text/html" />
    <link title="pdf" href="http://arxiv.org/pdf/0811.3301v2" rel="related" type="application/pdf" />
    <link title="doi" href="http://dx.doi.org/10.1016/j.patcog.2008.11.030" rel="related" />
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.CV" scheme="http://arxiv.org/schemas/atom" label="Computer Vision and Pattern Recognition (cs.CV)" />
  <content type="html">The Dynamic Time Warping (DTW) is a popular similarity measure between time series. The DTW fails to satisfy the triangle inequality and its computation requires quadratic time. Hence, to find closest neighbors quickly, we use bounding techniques. We can avoid most DTW computations with an inexpensive lower bound (LB Keogh). We compare LB Keogh with a tighter lower bound (LB Improved). We find that LB Improved-based search is faster. As an example, our approach is 2-3 times faster over random-walk and shape time series.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=HZ_3h4Dvb48:SH5Y9oogfQI:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=HZ_3h4Dvb48:SH5Y9oogfQI:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=HZ_3h4Dvb48:SH5Y9oogfQI:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=HZ_3h4Dvb48:SH5Y9oogfQI:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=HZ_3h4Dvb48:SH5Y9oogfQI:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=HZ_3h4Dvb48:SH5Y9oogfQI:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=HZ_3h4Dvb48:SH5Y9oogfQI:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=HZ_3h4Dvb48:SH5Y9oogfQI:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=HZ_3h4Dvb48:SH5Y9oogfQI:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/DanielLemiresArticlesOnArxiv/~4/HZ_3h4Dvb48" height="1" width="1"/&gt;</content><feedburner:origLink>http://arxiv.org/abs/0811.3301v2</feedburner:origLink></entry>
  <entry>
    <id>http://arxiv.org/abs/0901.3751v4</id>
    <updated>2009-06-08T16:04:04-04:00</updated>
    <published>2009-01-23T14:01:06-05:00</published>
    <title>Sorting improves word-aligned bitmap indexes</title>
    
    <author>
      <name>Daniel Lemire</name>
    </author>
    <author>
      <name>Owen Kaser</name>
    </author>
    <author>
      <name>Kamel Aouiche</name>
    </author>
    <arxiv:doi xmlns:arxiv="http://arxiv.org/schemas/atom">10.1016/j.datak.2009.08.006</arxiv:doi>
    <arxiv:journal_ref xmlns:arxiv="http://arxiv.org/schemas/atom">Daniel Lemire, Owen Kaser, Kamel Aouiche, Sorting improves word-aligned bitmap indexes, Data &amp; Knowledge Engineering, Volume 69, Issue 1, 2010, Pages 3-28</arxiv:journal_ref>
    <link href="http://feedproxy.google.com/~r/DanielLemiresArticlesOnArxiv/~3/aSjl17WPbzI/0901.3751v4" rel="alternate" type="text/html" />
    <link title="pdf" href="http://arxiv.org/pdf/0901.3751v4" rel="related" type="application/pdf" />
    <link title="doi" href="http://dx.doi.org/10.1016/j.datak.2009.08.006" rel="related" />
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
  <content type="html">Bitmap indexes must be compressed to reduce input/output costs and minimize CPU usage. To accelerate logical operations (AND, OR, XOR) over bitmaps, we use techniques based on run-length encoding (RLE), such as Word-Aligned Hybrid (WAH) compression. These techniques are sensitive to the order of the rows: a simple lexicographical sort can divide the index size by 9 and make indexes several times faster. We investigate row-reordering heuristics. Simply permuting the columns of the table can increase the sorting efficiency by 40%. Secondary contributions include efficient algorithms to construct and aggregate bitmaps. The effect of word length is also reviewed by constructing 16-bit, 32-bit and 64-bit indexes. Using 64-bit CPUs, we find that 64-bit indexes are slightly faster than 32-bit indexes despite being nearly twice as large.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=aSjl17WPbzI:UACxj8rpX84:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=aSjl17WPbzI:UACxj8rpX84:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=aSjl17WPbzI:UACxj8rpX84:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=aSjl17WPbzI:UACxj8rpX84:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=aSjl17WPbzI:UACxj8rpX84:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=aSjl17WPbzI:UACxj8rpX84:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=aSjl17WPbzI:UACxj8rpX84:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=aSjl17WPbzI:UACxj8rpX84:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=aSjl17WPbzI:UACxj8rpX84:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/DanielLemiresArticlesOnArxiv/~4/aSjl17WPbzI" height="1" width="1"/&gt;</content><feedburner:origLink>http://arxiv.org/abs/0901.3751v4</feedburner:origLink></entry>
  <entry>
    <id>http://arxiv.org/abs/0906.0910v1</id>
    <updated>2009-06-04T09:28:51-04:00</updated>
    <published>2009-06-04T09:28:51-04:00</published>
    <title>On the Challenges of Collaborative Data Processing</title>
    
    <author>
      <name>Sylvie Noel</name>
    </author>
    <author>
      <name>Daniel Lemire</name>
    </author>
    <arxiv:comment xmlns:arxiv="http://arxiv.org/schemas/atom">to appear as a chapter in an upcoming book (Collaborative Information Behavior)</arxiv:comment>
    <link href="http://feedproxy.google.com/~r/DanielLemiresArticlesOnArxiv/~3/0LVda_yFjOY/0906.0910v1" rel="alternate" type="text/html" />
    <link title="pdf" href="http://arxiv.org/pdf/0906.0910v1" rel="related" type="application/pdf" />
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.HC" scheme="http://arxiv.org/schemas/atom" label="Human-Computer Interaction (cs.HC)" />
  <content type="html">The last 30 years have seen the creation of a variety of electronic collaboration tools for science and business. Some of the best-known collaboration tools support text editing (e.g., wikis). Wikipedia's success shows that large-scale collaboration can produce highly valuable content. Meanwhile much structured data is being collected and made publicly available. We have never had access to more powerful databases and statistical packages. Is large-scale collaborative data analysis now possible? Using a quantitative analysis of Web 2.0 data visualization sites, we find evidence that at least moderate open collaboration occurs. We then explore some of the limiting factors of collaboration over data.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=0LVda_yFjOY:x2zA5orw86A:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=0LVda_yFjOY:x2zA5orw86A:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=0LVda_yFjOY:x2zA5orw86A:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=0LVda_yFjOY:x2zA5orw86A:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=0LVda_yFjOY:x2zA5orw86A:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=0LVda_yFjOY:x2zA5orw86A:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=0LVda_yFjOY:x2zA5orw86A:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=0LVda_yFjOY:x2zA5orw86A:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=0LVda_yFjOY:x2zA5orw86A:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/DanielLemiresArticlesOnArxiv/~4/0LVda_yFjOY" height="1" width="1"/&gt;</content><feedburner:origLink>http://arxiv.org/abs/0906.0910v1</feedburner:origLink></entry>
  <entry>
    <id>http://arxiv.org/abs/0905.2657v1</id>
    <updated>2009-05-16T01:57:06-04:00</updated>
    <published>2009-05-16T01:57:06-04:00</published>
    <title>Web 2.0 OLAP: From Data Cubes to Tag Clouds</title>
    
    <author>
      <name>Kamel Aouiche</name>
    </author>
    <author>
      <name>Daniel Lemire</name>
    </author>
    <author>
      <name>Robert Godin</name>
    </author>
    <arxiv:doi xmlns:arxiv="http://arxiv.org/schemas/atom">10.1007/978-3-642-01344-7_5</arxiv:doi>
    <arxiv:journal_ref xmlns:arxiv="http://arxiv.org/schemas/atom">Kamel Aouiche, Daniel Lemire and Robert Godin, Web 2.0 OLAP: From Data Cubes to Tag Clouds, Lecture Notes in Business Information Processing Vol. 18, pages 51-64, 2009</arxiv:journal_ref>
    <link href="http://feedproxy.google.com/~r/DanielLemiresArticlesOnArxiv/~3/wOLl4h07Eho/0905.2657v1" rel="alternate" type="text/html" />
    <link title="pdf" href="http://arxiv.org/pdf/0905.2657v1" rel="related" type="application/pdf" />
    <link title="doi" href="http://dx.doi.org/10.1007/978-3-642-01344-7_5" rel="related" />
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
  <content type="html">Increasingly, business projects are ephemeral. New Business Intelligence tools must support ad-lib data sources and quick perusal. Meanwhile, tag clouds are a popular community-driven visualization technique. Hence, we investigate tag-cloud views with support for OLAP operations such as roll-ups, slices, dices, clustering, and drill-downs. As a case study, we implemented an application where users can upload data and immediately navigate through its ad hoc dimensions. To support social networking, views can be easily shared and embedded in other Web sites. Algorithmically, our tag-cloud views are approximate range top-k queries over spontaneous data cubes. We present experimental evidence that iceberg cuboids provide adequate online approximations. We benchmark several browser-oblivious tag-cloud layout optimizations.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=wOLl4h07Eho:09fhsrBNk_8:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=wOLl4h07Eho:09fhsrBNk_8:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=wOLl4h07Eho:09fhsrBNk_8:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=wOLl4h07Eho:09fhsrBNk_8:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=wOLl4h07Eho:09fhsrBNk_8:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=wOLl4h07Eho:09fhsrBNk_8:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=wOLl4h07Eho:09fhsrBNk_8:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=wOLl4h07Eho:09fhsrBNk_8:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=wOLl4h07Eho:09fhsrBNk_8:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/DanielLemiresArticlesOnArxiv/~4/wOLl4h07Eho" height="1" width="1"/&gt;</content><feedburner:origLink>http://arxiv.org/abs/0905.2657v1</feedburner:origLink></entry>
  <entry>
    <id>http://arxiv.org/abs/0808.2083v3</id>
    <updated>2009-01-19T10:22:33-05:00</updated>
    <published>2008-08-14T23:14:55-04:00</published>
    <title>Histogram-Aware Sorting for Enhanced Word-Aligned Compression in Bitmap Indexes</title>
    
    <author>
      <name>Owen Kaser</name>
    </author>
    <author>
      <name>Daniel Lemire</name>
    </author>
    <author>
      <name>Kamel Aouiche</name>
    </author>
    <arxiv:comment xmlns:arxiv="http://arxiv.org/schemas/atom">To appear in proceedings of DOLAP 2008</arxiv:comment>
    <link href="http://feedproxy.google.com/~r/DanielLemiresArticlesOnArxiv/~3/cqctWDoL6QE/0808.2083v3" rel="alternate" type="text/html" />
    <link title="pdf" href="http://arxiv.org/pdf/0808.2083v3" rel="related" type="application/pdf" />
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="H.3.2; E.1" scheme="http://arxiv.org/schemas/atom" />
  <content type="html">Bitmap indexes must be compressed to reduce input/output costs and minimize CPU usage. To accelerate logical operations (AND, OR, XOR) over bitmaps, we use techniques based on run-length encoding (RLE), such as Word-Aligned Hybrid (WAH) compression. These techniques are sensitive to the order of the rows: a simple lexicographical sort can divide the index size by 9 and make indexes several times faster. We investigate reordering heuristics based on computed attribute-value histograms. Simply permuting the columns of the table based on these histograms can increase the sorting efficiency by 40%.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=cqctWDoL6QE:8lTsU-DGxL0:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=cqctWDoL6QE:8lTsU-DGxL0:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=cqctWDoL6QE:8lTsU-DGxL0:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=cqctWDoL6QE:8lTsU-DGxL0:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=cqctWDoL6QE:8lTsU-DGxL0:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=cqctWDoL6QE:8lTsU-DGxL0:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=cqctWDoL6QE:8lTsU-DGxL0:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=cqctWDoL6QE:8lTsU-DGxL0:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=cqctWDoL6QE:8lTsU-DGxL0:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/DanielLemiresArticlesOnArxiv/~4/cqctWDoL6QE" height="1" width="1"/&gt;</content><feedburner:origLink>http://arxiv.org/abs/0808.2083v3</feedburner:origLink></entry>
  <entry>
    <id>http://arxiv.org/abs/cs/0703058v3</id>
    <updated>2008-12-08T14:26:17-05:00</updated>
    <published>2007-03-13T13:46:11-04:00</published>
    <title>A Comparison of Five Probabilistic View-Size Estimation Techniques in OLAP</title>
    
    <author>
      <name>Kamel Aouiche</name>
    </author>
    <author>
      <name>Daniel Lemire</name>
    </author>
    <link href="http://feedproxy.google.com/~r/DanielLemiresArticlesOnArxiv/~3/WY3oT-hXiTE/0703058v3" rel="alternate" type="text/html" />
    <link title="pdf" href="http://arxiv.org/pdf/cs/0703058v3" rel="related" type="application/pdf" />
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.PF" scheme="http://arxiv.org/schemas/atom" label="Performance (cs.PF)" />
  <content type="html">A data warehouse cannot materialize all possible views, hence we must estimate quickly, accurately, and reliably the size of views to determine the best candidates for materialization. Many available techniques for view-size estimation make particular statistical assumptions and their error can be large. Comparatively, unassuming probabilistic techniques are slower, but they estimate accurately and reliability very large view sizes using little memory. We compare five unassuming hashing-based view-size estimation techniques including Stochastic Probabilistic Counting and LogLog Probabilistic Counting. Our experiments show that only Generalized Counting, Gibbons-Tirthapura, and Adaptive Counting provide universally tight estimates irrespective of the size of the view; of those, only Adaptive Counting remains constantly fast as we increase the memory budget.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=WY3oT-hXiTE:MBsM18mGc90:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=WY3oT-hXiTE:MBsM18mGc90:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=WY3oT-hXiTE:MBsM18mGc90:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=WY3oT-hXiTE:MBsM18mGc90:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=WY3oT-hXiTE:MBsM18mGc90:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=WY3oT-hXiTE:MBsM18mGc90:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=WY3oT-hXiTE:MBsM18mGc90:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=WY3oT-hXiTE:MBsM18mGc90:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=WY3oT-hXiTE:MBsM18mGc90:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/DanielLemiresArticlesOnArxiv/~4/WY3oT-hXiTE" height="1" width="1"/&gt;</content><feedburner:origLink>http://arxiv.org/abs/cs/0703058v3</feedburner:origLink></entry>
  <entry>
    <id>http://arxiv.org/abs/cs/0703056v3</id>
    <updated>2008-12-08T14:23:44-05:00</updated>
    <published>2007-03-12T17:42:59-04:00</published>
    <title>Unasssuming View-Size Estimation Techniques in OLAP</title>
    
    <author>
      <name>Kamel Aouiche</name>
    </author>
    <author>
      <name>Daniel Lemire</name>
    </author>
    <arxiv:comment xmlns:arxiv="http://arxiv.org/schemas/atom">Published in ICEIS 2007</arxiv:comment>
    <link href="http://feedproxy.google.com/~r/DanielLemiresArticlesOnArxiv/~3/7x2C_04Yj6w/0703056v3" rel="alternate" type="text/html" />
    <link title="pdf" href="http://arxiv.org/pdf/cs/0703056v3" rel="related" type="application/pdf" />
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.PF" scheme="http://arxiv.org/schemas/atom" label="Performance (cs.PF)" />
  <content type="html">Even if storage was infinite, a data warehouse could not materialize all possible views due to the running time and update requirements. Therefore, it is necessary to estimate quickly, accurately, and reliably the size of views. Many available techniques make particular statistical assumptions and their error can be quite large. Unassuming techniques exist, but typically assume we have independent hashing for which there is no known practical implementation. We adapt an unassuming estimator due to Gibbons and Tirthapura: its theoretical bounds do not make unpractical assumptions. We compare this technique experimentally with stochastic probabilistic counting, LogLog probabilistic counting, and multifractal statistical models. Our experiments show that we can reliably and accurately (within 10%, 19 times out 20) estimate view sizes over large data sets (1.5 GB) within minutes, using almost no memory. However, only Gibbons-Tirthapura provides universally tight estimates irrespective of the size of the view. For large views, probabilistic counting has a small edge in accuracy, whereas the competitive sampling-based method (multifractal) we tested is an order of magnitude faster but can sometimes provide poor estimates (relative error of 100%). In our tests, LogLog probabilistic counting is not competitive. Experimental validation on the US Census 1990 data set and on the Transaction Processing Performance (TPC H) data set is provided.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=7x2C_04Yj6w:9nbcGLt1Exo:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=7x2C_04Yj6w:9nbcGLt1Exo:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=7x2C_04Yj6w:9nbcGLt1Exo:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=7x2C_04Yj6w:9nbcGLt1Exo:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=7x2C_04Yj6w:9nbcGLt1Exo:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=7x2C_04Yj6w:9nbcGLt1Exo:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=7x2C_04Yj6w:9nbcGLt1Exo:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=7x2C_04Yj6w:9nbcGLt1Exo:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=7x2C_04Yj6w:9nbcGLt1Exo:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/DanielLemiresArticlesOnArxiv/~4/7x2C_04Yj6w" height="1" width="1"/&gt;</content><feedburner:origLink>http://arxiv.org/abs/cs/0703056v3</feedburner:origLink></entry>
  <entry>
    <id>http://arxiv.org/abs/0807.1734v5</id>
    <updated>2008-10-06T20:45:02-04:00</updated>
    <published>2008-07-10T16:26:31-04:00</published>
    <title>Faster Sequential Search with a Two-Pass Dynamic-Time-Warping Lower Bound</title>
    
    <author>
      <name>Daniel Lemire</name>
    </author>
    <link href="http://feedproxy.google.com/~r/DanielLemiresArticlesOnArxiv/~3/3LIhuhwFzgY/0807.1734v5" rel="alternate" type="text/html" />
    <link title="pdf" href="http://arxiv.org/pdf/0807.1734v5" rel="related" type="application/pdf" />
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
  <content type="html">The Dynamic Time Warping (DTW) is a popular similarity measure between time series. The DTW fails to satisfy the triangle inequality and its computation requires quadratic time. Hence, to find closest neighbors quickly, we use bounding techniques. We can avoid most DTW computations with an inexpensive lower bound (LB_Keogh). We compare LB_Keogh with a tighter lower bound (LB_Improved). We find that LB_Improved-based search is faster for sequential search. As an example, our approach is 3 times faster over random-walk and shape time series. We also review some of the mathematical properties of the DTW. We derive a tight triangle inequality for the DTW. We show that the DTW becomes the l_1 distance when time series are separated by a constant.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=3LIhuhwFzgY:1xZG7JQ2SaE:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=3LIhuhwFzgY:1xZG7JQ2SaE:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=3LIhuhwFzgY:1xZG7JQ2SaE:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=3LIhuhwFzgY:1xZG7JQ2SaE:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=3LIhuhwFzgY:1xZG7JQ2SaE:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=3LIhuhwFzgY:1xZG7JQ2SaE:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=3LIhuhwFzgY:1xZG7JQ2SaE:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=3LIhuhwFzgY:1xZG7JQ2SaE:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=3LIhuhwFzgY:1xZG7JQ2SaE:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/DanielLemiresArticlesOnArxiv/~4/3LIhuhwFzgY" height="1" width="1"/&gt;</content><feedburner:origLink>http://arxiv.org/abs/0807.1734v5</feedburner:origLink></entry>
  <entry>
    <id>http://arxiv.org/abs/cs/0702144v2</id>
    <updated>2008-09-15T15:57:44-04:00</updated>
    <published>2007-02-23T22:16:27-05:00</published>
    <title>Slope One Predictors for Online Rating-Based Collaborative Filtering</title>
    
    <author>
      <name>Daniel Lemire</name>
    </author>
    <author>
      <name>Anna Maclachlan</name>
    </author>
    <arxiv:comment xmlns:arxiv="http://arxiv.org/schemas/atom">In SIAM Data Mining (SDM'05), Newport Beach, California, April 21-23, 2005</arxiv:comment>
    <link href="http://feedproxy.google.com/~r/DanielLemiresArticlesOnArxiv/~3/xuFB7HggH2k/0702144v2" rel="alternate" type="text/html" />
    <link title="pdf" href="http://arxiv.org/pdf/cs/0702144v2" rel="related" type="application/pdf" />
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
  <content type="html">Rating-based collaborative filtering is the process of predicting how a user would rate a given item from other user ratings. We propose three related slope one schemes with predictors of the form f(x) = x + b, which precompute the average difference between the ratings of one item and another for users who rated both. Slope one algorithms are easy to implement, efficient to query, reasonably accurate, and they support both online queries and dynamic updates, which makes them good candidates for real-world systems. The basic slope one scheme is suggested as a new reference scheme for collaborative filtering. By factoring in items that a user liked separately from items that a user disliked, we achieve results competitive with slower memory-based schemes over the standard benchmark EachMovie and Movielens data sets while better fulfilling the desiderata of CF applications.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=xuFB7HggH2k:LzVWvbLUswg:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=xuFB7HggH2k:LzVWvbLUswg:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=xuFB7HggH2k:LzVWvbLUswg:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=xuFB7HggH2k:LzVWvbLUswg:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=xuFB7HggH2k:LzVWvbLUswg:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=xuFB7HggH2k:LzVWvbLUswg:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=xuFB7HggH2k:LzVWvbLUswg:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=xuFB7HggH2k:LzVWvbLUswg:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=xuFB7HggH2k:LzVWvbLUswg:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/DanielLemiresArticlesOnArxiv/~4/xuFB7HggH2k" height="1" width="1"/&gt;</content><feedburner:origLink>http://arxiv.org/abs/cs/0702144v2</feedburner:origLink></entry>
  <entry>
    <id>http://arxiv.org/abs/0805.3339v3</id>
    <updated>2008-08-14T20:08:42-04:00</updated>
    <published>2008-05-21T15:50:46-04:00</published>
    <title>Tri de la table de faits et compression des index bitmaps avec alignement sur les mots</title>
    
    <author>
      <name>Kamel Aouiche</name>
    </author>
    <author>
      <name>Daniel Lemire</name>
    </author>
    <author>
      <name>Owen Kaser</name>
    </author>
    <arxiv:comment xmlns:arxiv="http://arxiv.org/schemas/atom">to appear at BDA'08</arxiv:comment>
    <link href="http://feedproxy.google.com/~r/DanielLemiresArticlesOnArxiv/~3/fCxAslpKBIQ/0805.3339v3" rel="alternate" type="text/html" />
    <link title="pdf" href="http://arxiv.org/pdf/0805.3339v3" rel="related" type="application/pdf" />
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
  <content type="html">Bitmap indexes are frequently used to index multidimensional data. They rely mostly on sequential input/output. Bitmaps can be compressed to reduce input/output costs and minimize CPU usage. The most efficient compression techniques are based on run-length encoding (RLE), such as Word-Aligned Hybrid (WAH) compression. This type of compression accelerates logical operations (AND, OR) over the bitmaps. However, run-length encoding is sensitive to the order of the facts. Thus, we propose to sort the fact tables. We review lexicographic, Gray-code, and block-wise sorting. We found that a lexicographic sort improves compression--sometimes generating indexes twice as small--and make indexes several times faster. While sorting takes time, this is partially offset by the fact that it is faster to index a sorted table. Column order is significant: it is generally preferable to put the columns having more distinct values at the beginning. A block-wise sort is much less efficient than a full sort. Moreover, we found that Gray-code sorting is not better than lexicographic sorting when using word-aligned compression.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=fCxAslpKBIQ:uZ7hGyqxdwU:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=fCxAslpKBIQ:uZ7hGyqxdwU:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=fCxAslpKBIQ:uZ7hGyqxdwU:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=fCxAslpKBIQ:uZ7hGyqxdwU:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=fCxAslpKBIQ:uZ7hGyqxdwU:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=fCxAslpKBIQ:uZ7hGyqxdwU:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=fCxAslpKBIQ:uZ7hGyqxdwU:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=fCxAslpKBIQ:uZ7hGyqxdwU:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=fCxAslpKBIQ:uZ7hGyqxdwU:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/DanielLemiresArticlesOnArxiv/~4/fCxAslpKBIQ" height="1" width="1"/&gt;</content><feedburner:origLink>http://arxiv.org/abs/0805.3339v3</feedburner:origLink></entry>
  <entry>
    <id>http://arxiv.org/abs/0805.0747v1</id>
    <updated>2008-05-06T11:45:15-04:00</updated>
    <published>2008-05-06T11:45:15-04:00</published>
    <title>Pruning Attribute Values From Data Cubes with Diamond Dicing</title>
    
    <author>
      <name>Hazel Webb</name>
    </author>
    <author>
      <name>Owen Kaser</name>
    </author>
    <author>
      <name>Daniel Lemire</name>
    </author>
    <link href="http://feedproxy.google.com/~r/DanielLemiresArticlesOnArxiv/~3/ofGNkVIun6w/0805.0747v1" rel="alternate" type="text/html" />
    <link title="pdf" href="http://arxiv.org/pdf/0805.0747v1" rel="related" type="application/pdf" />
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.DS" scheme="http://arxiv.org/schemas/atom" label="Data Structures and Algorithms (cs.DS)" />
  <content type="html">Data stored in a data warehouse are inherently multidimensional, but most data-pruning techniques (such as iceberg and top-k queries) are unidimensional. However, analysts need to issue multidimensional queries. For example, an analyst may need to select not just the most profitable stores or--separately--the most profitable products, but simultaneous sets of stores and products fulfilling some profitability constraints. To fill this need, we propose a new operator, the diamond dice. Because of the interaction between dimensions, the computation of diamonds is challenging. We present the first diamond-dicing experiments on large data sets. Experiments show that we can compute diamond cubes over fact tables containing 100 million facts in less than 35 minutes using a standard PC.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=ofGNkVIun6w:mVb2akZP7wc:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=ofGNkVIun6w:mVb2akZP7wc:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=ofGNkVIun6w:mVb2akZP7wc:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=ofGNkVIun6w:mVb2akZP7wc:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=ofGNkVIun6w:mVb2akZP7wc:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=ofGNkVIun6w:mVb2akZP7wc:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=ofGNkVIun6w:mVb2akZP7wc:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=ofGNkVIun6w:mVb2akZP7wc:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=ofGNkVIun6w:mVb2akZP7wc:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/DanielLemiresArticlesOnArxiv/~4/ofGNkVIun6w" height="1" width="1"/&gt;</content><feedburner:origLink>http://arxiv.org/abs/0805.0747v1</feedburner:origLink></entry>
  <entry>
    <id>http://arxiv.org/abs/0710.2156v2</id>
    <updated>2008-01-14T16:23:11-05:00</updated>
    <published>2007-10-11T15:48:10-04:00</published>
    <title>Collaborative OLAP with Tag Clouds: Web 2.0 OLAP Formalism and Experimental Evaluation</title>
    
    <author>
      <name>Kamel Aouiche</name>
    </author>
    <author>
      <name>Daniel Lemire</name>
    </author>
    <author>
      <name>Robert Godin</name>
    </author>
    <link href="http://feedproxy.google.com/~r/DanielLemiresArticlesOnArxiv/~3/Juri0bZLcXM/0710.2156v2" rel="alternate" type="text/html" />
    <link title="pdf" href="http://arxiv.org/pdf/0710.2156v2" rel="related" type="application/pdf" />
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
  <content type="html">Increasingly, business projects are ephemeral. New Business Intelligence tools must support ad-lib data sources and quick perusal. Meanwhile, tag clouds are a popular community-driven visualization technique. Hence, we investigate tag-cloud views with support for OLAP operations such as roll-ups, slices, dices, clustering, and drill-downs. As a case study, we implemented an application where users can upload data and immediately navigate through its ad hoc dimensions. To support social networking, views can be easily shared and embedded in other Web sites. Algorithmically, our tag-cloud views are approximate range top-k queries over spontaneous data cubes. We present experimental evidence that iceberg cuboids provide adequate online approximations. We benchmark several browser-oblivious tag-cloud layout optimizations.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=Juri0bZLcXM:5L9P2s1qim0:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=Juri0bZLcXM:5L9P2s1qim0:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=Juri0bZLcXM:5L9P2s1qim0:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=Juri0bZLcXM:5L9P2s1qim0:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=Juri0bZLcXM:5L9P2s1qim0:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=Juri0bZLcXM:5L9P2s1qim0:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=Juri0bZLcXM:5L9P2s1qim0:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=Juri0bZLcXM:5L9P2s1qim0:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=Juri0bZLcXM:5L9P2s1qim0:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/DanielLemiresArticlesOnArxiv/~4/Juri0bZLcXM" height="1" width="1"/&gt;</content><feedburner:origLink>http://arxiv.org/abs/0710.2156v2</feedburner:origLink></entry>
  <entry>
    <id>http://arxiv.org/abs/0709.1166v1</id>
    <updated>2007-09-07T17:18:57-04:00</updated>
    <published>2007-09-07T17:18:57-04:00</published>
    <title>An Optimal Linear Time Algorithm for Quasi-Monotonic Segmentation</title>
    
    <author>
      <name>Daniel Lemire</name>
    </author>
    <author>
      <name>Martin Brooks</name>
    </author>
    <author>
      <name>Yuhong Yan</name>
    </author>
    <arxiv:doi xmlns:arxiv="http://arxiv.org/schemas/atom">10.1080/00207160701694153</arxiv:doi>
    <arxiv:comment xmlns:arxiv="http://arxiv.org/schemas/atom">This is the extended version of our ICDM'05 paper (arXiv:cs/0702142)</arxiv:comment>
    <arxiv:journal_ref xmlns:arxiv="http://arxiv.org/schemas/atom">Daniel Lemire, Martin Brooks and Yuhong Yan, An Optimal Linear Time Algorithm for Quasi-Monotonic Segmentation. International Journal of Computer Mathematics 86 (7), 2009.</arxiv:journal_ref>
    <link href="http://feedproxy.google.com/~r/DanielLemiresArticlesOnArxiv/~3/5eweg_Ze45Y/0709.1166v1" rel="alternate" type="text/html" />
    <link title="pdf" href="http://arxiv.org/pdf/0709.1166v1" rel="related" type="application/pdf" />
    <link title="doi" href="http://dx.doi.org/10.1080/00207160701694153" rel="related" />
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
  <content type="html">Monotonicity is a simple yet significant qualitative characteristic. We consider the problem of segmenting a sequence in up to K segments. We want segments to be as monotonic as possible and to alternate signs. We propose a quality metric for this problem using the l_inf norm, and we present an optimal linear time algorithm based on novel formalism. Moreover, given a precomputation in time O(n log n) consisting of a labeling of all extrema, we compute any optimal segmentation in constant time. We compare experimentally its performance to two piecewise linear segmentation heuristics (top-down and bottom-up). We show that our algorithm is faster and more accurate. Applications include pattern recognition and qualitative modeling.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=5eweg_Ze45Y:Ri3D_Tg4rFw:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=5eweg_Ze45Y:Ri3D_Tg4rFw:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=5eweg_Ze45Y:Ri3D_Tg4rFw:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=5eweg_Ze45Y:Ri3D_Tg4rFw:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=5eweg_Ze45Y:Ri3D_Tg4rFw:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=5eweg_Ze45Y:Ri3D_Tg4rFw:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=5eweg_Ze45Y:Ri3D_Tg4rFw:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=5eweg_Ze45Y:Ri3D_Tg4rFw:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=5eweg_Ze45Y:Ri3D_Tg4rFw:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/DanielLemiresArticlesOnArxiv/~4/5eweg_Ze45Y" height="1" width="1"/&gt;</content><feedburner:origLink>http://arxiv.org/abs/0709.1166v1</feedburner:origLink></entry>
  <entry>
    <id>http://arxiv.org/abs/cs/0610128v2</id>
    <updated>2007-08-24T11:42:52-04:00</updated>
    <published>2006-10-20T20:30:57-04:00</published>
    <title>Hierarchical Bin Buffering: Online Local Moments for Dynamic External Memory Arrays</title>
    
    <author>
      <name>Daniel Lemire</name>
    </author>
    <author>
      <name>Owen Kaser</name>
    </author>
    <arxiv:doi xmlns:arxiv="http://arxiv.org/schemas/atom">10.1145/1328911.1328925</arxiv:doi>
    <arxiv:journal_ref xmlns:arxiv="http://arxiv.org/schemas/atom">Daniel Lemire and Owen Kaser, Hierarchical Bin Buffering: Online Local Moments for Dynamic External Memory Arrays, ACM Transactions on Algorithms 4(1): 14 (2008)</arxiv:journal_ref>
    <link href="http://feedproxy.google.com/~r/DanielLemiresArticlesOnArxiv/~3/hdzNiNMuQPo/0610128v2" rel="alternate" type="text/html" />
    <link title="pdf" href="http://arxiv.org/pdf/cs/0610128v2" rel="related" type="application/pdf" />
    <link title="doi" href="http://dx.doi.org/10.1145/1328911.1328925" rel="related" />
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.DS" scheme="http://arxiv.org/schemas/atom" label="Data Structures and Algorithms (cs.DS)" />
    <category term="cs.DS" scheme="http://arxiv.org/schemas/atom" label="Data Structures and Algorithms (cs.DS)" />
    <category term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="H.3.5; G.1.1" scheme="http://arxiv.org/schemas/atom" />
  <content type="html">Local moments are used for local regression, to compute statistical measures such as sums, averages, and standard deviations, and to approximate probability distributions. We consider the case where the data source is a very large I/O array of size n and we want to compute the first N local moments, for some constant N. Without precomputation, this requires O(n) time. We develop a sequence of algorithms of increasing sophistication that use precomputation and additional buffer space to speed up queries. The simpler algorithms partition the I/O array into consecutive ranges called bins, and they are applicable not only to local-moment queries, but also to algebraic queries (MAX, AVERAGE, SUM, etc.). With N buffers of size sqrt{n}, time complexity drops to O(sqrt n). A more sophisticated approach uses hierarchical buffering and has a logarithmic time complexity (O(b log_b n)), when using N hierarchical buffers of size n/b. Using Overlapped Bin Buffering, we show that only a single buffer is needed, as with wavelet-based algorithms, but using much less storage. Applications exist in multidimensional and statistical databases over massive data sets, interactive image processing, and visualization.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=hdzNiNMuQPo:BbcDfbu8PHY:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=hdzNiNMuQPo:BbcDfbu8PHY:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=hdzNiNMuQPo:BbcDfbu8PHY:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=hdzNiNMuQPo:BbcDfbu8PHY:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=hdzNiNMuQPo:BbcDfbu8PHY:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=hdzNiNMuQPo:BbcDfbu8PHY:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=hdzNiNMuQPo:BbcDfbu8PHY:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=hdzNiNMuQPo:BbcDfbu8PHY:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=hdzNiNMuQPo:BbcDfbu8PHY:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/DanielLemiresArticlesOnArxiv/~4/hdzNiNMuQPo" height="1" width="1"/&gt;</content><feedburner:origLink>http://arxiv.org/abs/cs/0610128v2</feedburner:origLink></entry>
  <entry>
    <id>http://arxiv.org/abs/cs/0703109v2</id>
    <updated>2007-05-07T13:59:20-04:00</updated>
    <published>2007-03-22T10:54:48-04:00</published>
    <title>Tag-Cloud Drawing: Algorithms for Cloud Visualization</title>
    
    <author>
      <name>Owen Kaser</name>
    </author>
    <author>
      <name>Daniel Lemire</name>
    </author>
    <arxiv:comment xmlns:arxiv="http://arxiv.org/schemas/atom">To appear in proceedings of Tagging and Metadata for Social Information Organization (WWW 2007)</arxiv:comment>
    <link href="http://feedproxy.google.com/~r/DanielLemiresArticlesOnArxiv/~3/t0FSw269SrE/0703109v2" rel="alternate" type="text/html" />
    <link title="pdf" href="http://arxiv.org/pdf/cs/0703109v2" rel="related" type="application/pdf" />
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.DS" scheme="http://arxiv.org/schemas/atom" label="Data Structures and Algorithms (cs.DS)" />
    <category term="cs.DS" scheme="http://arxiv.org/schemas/atom" label="Data Structures and Algorithms (cs.DS)" />
  <content type="html">Tag clouds provide an aggregate of tag-usage statistics. They are typically sent as in-line HTML to browsers. However, display mechanisms suited for ordinary text are not ideal for tags, because font sizes may vary widely on a line. As well, the typical layout does not account for relationships that may be known between tags. This paper presents models and algorithms to improve the display of tag clouds that consist of in-line HTML, as well as algorithms that use nested tables to achieve a more general 2-dimensional layout in which tag relationships are considered. The first algorithms leverage prior work in typesetting and rectangle packing, whereas the second group of algorithms leverage prior work in Electronic Design Automation. Experiments show our algorithms can be efficiently implemented and perform well.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=t0FSw269SrE:wWjgkE986JY:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=t0FSw269SrE:wWjgkE986JY:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=t0FSw269SrE:wWjgkE986JY:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=t0FSw269SrE:wWjgkE986JY:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=t0FSw269SrE:wWjgkE986JY:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=t0FSw269SrE:wWjgkE986JY:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=t0FSw269SrE:wWjgkE986JY:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=t0FSw269SrE:wWjgkE986JY:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=t0FSw269SrE:wWjgkE986JY:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/DanielLemiresArticlesOnArxiv/~4/t0FSw269SrE" height="1" width="1"/&gt;</content><feedburner:origLink>http://arxiv.org/abs/cs/0703109v2</feedburner:origLink></entry>
  <entry>
    <id>http://arxiv.org/abs/cs/0605103v8</id>
    <updated>2007-04-28T19:13:34-04:00</updated>
    <published>2006-05-24T00:42:53-04:00</published>
    <title>A Better Alternative to Piecewise Linear Time Series Segmentation</title>
    
    <author>
      <name>Daniel Lemire</name>
    </author>
    <arxiv:comment xmlns:arxiv="http://arxiv.org/schemas/atom">to appear in SIAM Data Mining 2007</arxiv:comment>
    <link href="http://feedproxy.google.com/~r/DanielLemiresArticlesOnArxiv/~3/_OBrhNTXULI/0605103v8" rel="alternate" type="text/html" />
    <link title="pdf" href="http://arxiv.org/pdf/cs/0605103v8" rel="related" type="application/pdf" />
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.CV" scheme="http://arxiv.org/schemas/atom" label="Computer Vision and Pattern Recognition (cs.CV)" />
    <category term="H.2.8" scheme="http://arxiv.org/schemas/atom" />
  <content type="html">Time series are difficult to monitor, summarize and predict. Segmentation organizes time series into few intervals having uniform characteristics (flatness, linearity, modality, monotonicity and so on). For scalability, we require fast linear time algorithms. The popular piecewise linear model can determine where the data goes up or down and at what rate. Unfortunately, when the data does not follow a linear model, the computation of the local slope creates overfitting. We propose an adaptive time series model where the polynomial degree of each interval vary (constant, linear and so on). Given a number of regressors, the cost of each interval is its polynomial degree: constant intervals cost 1 regressor, linear intervals cost 2 regressors, and so on. Our goal is to minimize the Euclidean (l_2) error for a given model complexity. Experimentally, we investigate the model where intervals can be either constant or linear. Over synthetic random walks, historical stock market prices, and electrocardiograms, the adaptive model provides a more accurate segmentation than the piecewise linear model without increasing the cross-validation error or the running time, while providing a richer vocabulary to applications. Implementation issues, such as numerical stability and real-world performance, are discussed.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=_OBrhNTXULI:OJFSR1mlhJc:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=_OBrhNTXULI:OJFSR1mlhJc:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=_OBrhNTXULI:OJFSR1mlhJc:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=_OBrhNTXULI:OJFSR1mlhJc:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=_OBrhNTXULI:OJFSR1mlhJc:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=_OBrhNTXULI:OJFSR1mlhJc:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=_OBrhNTXULI:OJFSR1mlhJc:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=_OBrhNTXULI:OJFSR1mlhJc:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=_OBrhNTXULI:OJFSR1mlhJc:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/DanielLemiresArticlesOnArxiv/~4/_OBrhNTXULI" height="1" width="1"/&gt;</content><feedburner:origLink>http://arxiv.org/abs/cs/0605103v8</feedburner:origLink></entry>
  <entry>
    <id>http://arxiv.org/abs/cs/0610046v5</id>
    <updated>2007-03-21T21:19:11-04:00</updated>
    <published>2006-10-09T18:09:42-04:00</published>
    <title>Streaming Maximum-Minimum Filter Using No More than Three Comparisons per Element</title>
    
    <author>
      <name>Daniel Lemire</name>
    </author>
    <arxiv:comment xmlns:arxiv="http://arxiv.org/schemas/atom">to appear in Nordic Journal of Computing</arxiv:comment>
    <arxiv:journal_ref xmlns:arxiv="http://arxiv.org/schemas/atom">Daniel Lemire, Streaming Maximum-Minimum Filter Using No More than Three Comparisons per Element, Nordic Journal of Computing, Volume 13, Number 4, pages 328-339, 2006</arxiv:journal_ref>
    <link href="http://feedproxy.google.com/~r/DanielLemiresArticlesOnArxiv/~3/X58ymlDd5gk/0610046v5" rel="alternate" type="text/html" />
    <link title="pdf" href="http://arxiv.org/pdf/cs/0610046v5" rel="related" type="application/pdf" />
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.DS" scheme="http://arxiv.org/schemas/atom" label="Data Structures and Algorithms (cs.DS)" />
    <category term="cs.DS" scheme="http://arxiv.org/schemas/atom" label="Data Structures and Algorithms (cs.DS)" />
    <category term="F.2.1" scheme="http://arxiv.org/schemas/atom" />
  <content type="html">The running maximum-minimum (max-min) filter computes the maxima and minima over running windows of size w. This filter has numerous applications in signal processing and time series analysis. We present an easy-to-implement online algorithm requiring no more than 3 comparisons per element, in the worst case. Comparatively, no algorithm is known to compute the running maximum (or minimum) filter in 1.5 comparisons per element, in the worst case. Our algorithm has reduced latency and memory usage.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=X58ymlDd5gk:d3DFMZJuV-8:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=X58ymlDd5gk:d3DFMZJuV-8:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=X58ymlDd5gk:d3DFMZJuV-8:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=X58ymlDd5gk:d3DFMZJuV-8:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=X58ymlDd5gk:d3DFMZJuV-8:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=X58ymlDd5gk:d3DFMZJuV-8:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=X58ymlDd5gk:d3DFMZJuV-8:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=X58ymlDd5gk:d3DFMZJuV-8:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=X58ymlDd5gk:d3DFMZJuV-8:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/DanielLemiresArticlesOnArxiv/~4/X58ymlDd5gk" height="1" width="1"/&gt;</content><feedburner:origLink>http://arxiv.org/abs/cs/0610046v5</feedburner:origLink></entry>
  <entry>
    <id>http://arxiv.org/abs/cs/0702143v1</id>
    <updated>2007-02-23T22:11:09-05:00</updated>
    <published>2007-02-23T22:11:09-05:00</published>
    <title>Attribute Value Reordering For Efficient Hybrid OLAP</title>
    
    <author>
      <name>Owen Kaser</name>
    </author>
    <author>
      <name>Daniel Lemire</name>
    </author>
    <arxiv:doi xmlns:arxiv="http://arxiv.org/schemas/atom">10.1016/j.ins.2005.09.005</arxiv:doi>
    <arxiv:journal_ref xmlns:arxiv="http://arxiv.org/schemas/atom">Owen Kaser, Daniel Lemire, Attribute Value Reordering For Efficient Hybrid OLAP, Information Sciences, Volume 176, Issue 16, 2006, Pages 2304-2336</arxiv:journal_ref>
    <link href="http://feedproxy.google.com/~r/DanielLemiresArticlesOnArxiv/~3/btmRzzQGA1g/0702143v1" rel="alternate" type="text/html" />
    <link title="pdf" href="http://arxiv.org/pdf/cs/0702143v1" rel="related" type="application/pdf" />
    <link title="doi" href="http://dx.doi.org/10.1016/j.ins.2005.09.005" rel="related" />
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
  <content type="html">The normalization of a data cube is the ordering of the attribute values. For large multidimensional arrays where dense and sparse chunks are stored differently, proper normalization can lead to improved storage efficiency. We show that it is NP-hard to compute an optimal normalization even for 1x3 chunks, although we find an exact algorithm for 1x2 chunks. When dimensions are nearly statistically independent, we show that dimension-wise attribute frequency sorting is an optimal normalization and takes time O(d n log(n)) for data cubes of size n^d. When dimensions are not independent, we propose and evaluate several heuristics. The hybrid OLAP (HOLAP) storage mechanism is already 19%-30% more efficient than ROLAP, but normalization can improve it further by 9%-13% for a total gain of 29%-44% over ROLAP.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=btmRzzQGA1g:56S--pIu5s4:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=btmRzzQGA1g:56S--pIu5s4:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=btmRzzQGA1g:56S--pIu5s4:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=btmRzzQGA1g:56S--pIu5s4:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=btmRzzQGA1g:56S--pIu5s4:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=btmRzzQGA1g:56S--pIu5s4:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=btmRzzQGA1g:56S--pIu5s4:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=btmRzzQGA1g:56S--pIu5s4:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=btmRzzQGA1g:56S--pIu5s4:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/DanielLemiresArticlesOnArxiv/~4/btmRzzQGA1g" height="1" width="1"/&gt;</content><feedburner:origLink>http://arxiv.org/abs/cs/0702143v1</feedburner:origLink></entry>
  <entry>
    <id>http://arxiv.org/abs/cs/0702142v1</id>
    <updated>2007-02-23T21:29:36-05:00</updated>
    <published>2007-02-23T21:29:36-05:00</published>
    <title>An Optimal Linear Time Algorithm for Quasi-Monotonic Segmentation</title>
    
    <author>
      <name>Daniel Lemire</name>
    </author>
    <author>
      <name>Martin Brooks</name>
    </author>
    <author>
      <name>Yuhong Yan</name>
    </author>
    <arxiv:comment xmlns:arxiv="http://arxiv.org/schemas/atom">Appeared in ICDM 2005</arxiv:comment>
    <link href="http://feedproxy.google.com/~r/DanielLemiresArticlesOnArxiv/~3/UBhZUFJiaAc/0702142v1" rel="alternate" type="text/html" />
    <link title="pdf" href="http://arxiv.org/pdf/cs/0702142v1" rel="related" type="application/pdf" />
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.DS" scheme="http://arxiv.org/schemas/atom" label="Data Structures and Algorithms (cs.DS)" />
    <category term="cs.DS" scheme="http://arxiv.org/schemas/atom" label="Data Structures and Algorithms (cs.DS)" />
    <category term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
  <content type="html">Monotonicity is a simple yet significant qualitative characteristic. We consider the problem of segmenting an array in up to K segments. We want segments to be as monotonic as possible and to alternate signs. We propose a quality metric for this problem, present an optimal linear time algorithm based on novel formalism, and compare experimentally its performance to a linear time top-down regression algorithm. We show that our algorithm is faster and more accurate. Applications include pattern recognition and qualitative modeling.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=UBhZUFJiaAc:rAJEOGk0zJQ:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=UBhZUFJiaAc:rAJEOGk0zJQ:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=UBhZUFJiaAc:rAJEOGk0zJQ:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=UBhZUFJiaAc:rAJEOGk0zJQ:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=UBhZUFJiaAc:rAJEOGk0zJQ:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=UBhZUFJiaAc:rAJEOGk0zJQ:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=UBhZUFJiaAc:rAJEOGk0zJQ:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=UBhZUFJiaAc:rAJEOGk0zJQ:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=UBhZUFJiaAc:rAJEOGk0zJQ:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/DanielLemiresArticlesOnArxiv/~4/UBhZUFJiaAc" height="1" width="1"/&gt;</content><feedburner:origLink>http://arxiv.org/abs/cs/0702142v1</feedburner:origLink></entry>
  <entry>
    <id>http://arxiv.org/abs/math/0701481v2</id>
    <updated>2007-01-24T16:01:45-05:00</updated>
    <published>2007-01-17T10:01:09-05:00</published>
    <title>Monotonicity Analysis over Chains and Curves</title>
    
    <author>
      <name>Dan Kucerovsky</name>
    </author>
    <author>
      <name>Daniel Lemire</name>
    </author>
    <arxiv:comment xmlns:arxiv="http://arxiv.org/schemas/atom">to appear in Proceedings of Curves and Surfaces 2006</arxiv:comment>
    <link href="http://feedproxy.google.com/~r/DanielLemiresArticlesOnArxiv/~3/Wg51VV95UXA/0701481v2" rel="alternate" type="text/html" />
    <link title="pdf" href="http://arxiv.org/pdf/math/0701481v2" rel="related" type="application/pdf" />
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="math.GM" scheme="http://arxiv.org/schemas/atom" label="General Mathematics (math.GM)" />
    <category term="math.GM" scheme="http://arxiv.org/schemas/atom" label="General Mathematics (math.GM)" />
  <content type="html">Chains are vector-valued signals sampling a curve. They are important to motion signal processing and to many scientific applications including location sensors. We propose a novel measure of smoothness for chains curves by generalizing the scalar-valued concept of monotonicity. Monotonicity can be defined by the connectedness of the inverse image of balls. This definition is coordinate-invariant and can be computed efficiently over chains. Monotone curves can be discontinuous, but continuous monotone curves are differentiable a.e. Over chains, a simple sphere-preserving filter shown to never decrease the degree of monotonicity. It outperforms moving average filters over a synthetic data set. Applications include Time Series Segmentation, chain reconstruction from unordered data points, Optical Character Recognition, and Pattern Matching.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=Wg51VV95UXA:FL3pJ-pnq-A:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=Wg51VV95UXA:FL3pJ-pnq-A:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=Wg51VV95UXA:FL3pJ-pnq-A:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=Wg51VV95UXA:FL3pJ-pnq-A:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=Wg51VV95UXA:FL3pJ-pnq-A:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=Wg51VV95UXA:FL3pJ-pnq-A:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=Wg51VV95UXA:FL3pJ-pnq-A:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=Wg51VV95UXA:FL3pJ-pnq-A:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=Wg51VV95UXA:FL3pJ-pnq-A:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/DanielLemiresArticlesOnArxiv/~4/Wg51VV95UXA" height="1" width="1"/&gt;</content><feedburner:origLink>http://arxiv.org/abs/math/0701481v2</feedburner:origLink></entry>
  <entry>
    <id>http://arxiv.org/abs/cs/0610010v3</id>
    <updated>2006-10-17T14:05:22-04:00</updated>
    <published>2006-10-03T14:04:22-04:00</published>
    <title>One-Pass, One-Hash n-Gram Statistics Estimation</title>
    
    <author>
      <name>Daniel Lemire</name>
    </author>
    <author>
      <name>Owen Kaser</name>
    </author>
    <arxiv:comment xmlns:arxiv="http://arxiv.org/schemas/atom">The bibliography was not properly updated in the previous submission</arxiv:comment>
    <link href="http://feedproxy.google.com/~r/DanielLemiresArticlesOnArxiv/~3/JHZ5L6N-VR8/0610010v3" rel="alternate" type="text/html" />
    <link title="pdf" href="http://arxiv.org/pdf/cs/0610010v3" rel="related" type="application/pdf" />
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.CL" scheme="http://arxiv.org/schemas/atom" label="Computation and Language (cs.CL)" />
  <content type="html">In multimedia, text or bioinformatics databases, applications query sequences of n consecutive symbols called n-grams. Estimating the number of distinct n-grams is a view-size estimation problem. While view sizes can be estimated by sampling under statistical assumptions, we desire an unassuming algorithm with universally valid accuracy bounds. Most related work has focused on repeatedly hashing the data, which is prohibitive for large data sources. We prove that a one-pass one-hash algorithm is sufficient for accurate estimates if the hashing is sufficiently independent. To reduce costs further, we investigate recursive random hashing algorithms and show that they are sufficiently independent in practice. We compare our running times with exact counts using suffix arrays and show that, while we use hardly any storage, we are an order of magnitude faster. The approach further is extended to a one-pass/one-hash computation of n-gram entropy and iceberg counts. The experiments use a large collection of English text from the Gutenberg Project as well as synthetic data.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=JHZ5L6N-VR8:G-rshc2wCZU:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=JHZ5L6N-VR8:G-rshc2wCZU:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=JHZ5L6N-VR8:G-rshc2wCZU:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=JHZ5L6N-VR8:G-rshc2wCZU:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=JHZ5L6N-VR8:G-rshc2wCZU:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=JHZ5L6N-VR8:G-rshc2wCZU:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=JHZ5L6N-VR8:G-rshc2wCZU:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=JHZ5L6N-VR8:G-rshc2wCZU:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=JHZ5L6N-VR8:G-rshc2wCZU:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/DanielLemiresArticlesOnArxiv/~4/JHZ5L6N-VR8" height="1" width="1"/&gt;</content><feedburner:origLink>http://arxiv.org/abs/cs/0610010v3</feedburner:origLink></entry>
  <entry>
    <id>http://arxiv.org/abs/cs/0605127v1</id>
    <updated>2006-05-26T20:51:46-04:00</updated>
    <published>2006-05-26T20:51:46-04:00</published>
    <title>Analyzing Large Collections of Electronic Text Using OLAP</title>
    
    <author>
      <name>Steven Keith</name>
    </author>
    <author>
      <name>Owen Kaser</name>
    </author>
    <author>
      <name>Daniel Lemire</name>
    </author>
    <link href="http://feedproxy.google.com/~r/DanielLemiresArticlesOnArxiv/~3/CQcv9oK0NHk/0605127v1" rel="alternate" type="text/html" />
    <link title="pdf" href="http://arxiv.org/pdf/cs/0605127v1" rel="related" type="application/pdf" />
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.DB" scheme="http://arxiv.org/schemas/atom" label="Databases (cs.DB)" />
    <category term="cs.DL" scheme="http://arxiv.org/schemas/atom" label="Digital Libraries (cs.DL)" />
  <content type="html">Computer-assisted reading and analysis of text has various applications in the humanities and social sciences. The increasing size of many electronic text archives has the advantage of a more complete analysis but the disadvantage of taking longer to obtain results. On-Line Analytical Processing is a method used to store and quickly analyze multidimensional data. By storing text analysis information in an OLAP system, a user can obtain solutions to inquiries in a matter of seconds as opposed to minutes, hours, or even days. This analysis is user-driven allowing various users the freedom to pursue their own direction of research.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=CQcv9oK0NHk:QviTP4DuJCA:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=CQcv9oK0NHk:QviTP4DuJCA:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=CQcv9oK0NHk:QviTP4DuJCA:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=CQcv9oK0NHk:QviTP4DuJCA:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=CQcv9oK0NHk:QviTP4DuJCA:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=CQcv9oK0NHk:QviTP4DuJCA:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=CQcv9oK0NHk:QviTP4DuJCA:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?a=CQcv9oK0NHk:QviTP4DuJCA:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/DanielLemiresArticlesOnArxiv?i=CQcv9oK0NHk:QviTP4DuJCA:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/DanielLemiresArticlesOnArxiv/~4/CQcv9oK0NHk" height="1" width="1"/&gt;</content><feedburner:origLink>http://arxiv.org/abs/cs/0605127v1</feedburner:origLink></entry>
</feed>

