<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr="http://purl.org/syndication/thread/1.0" gd:etag="W/&quot;Ck8ERn07eCp7ImA9WhRaE0Q.&quot;"><id>tag:blogger.com,1999:blog-2492664706785922439</id><updated>2012-02-16T12:13:27.300+01:00</updated><title type="text">Complete Rewrite</title><subtitle type="html">Viewpoints on database systems and computer programming.</subtitle><link rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" href="http://completerewrite.blogspot.com/feeds/posts/default" /><link rel="alternate" type="text/html" href="http://completerewrite.blogspot.com/" /><author><name>Jesper Larsson</name><uri>http://www.blogger.com/profile/05756120755805691277</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><generator version="7.00" uri="http://www.blogger.com">Blogger</generator><openSearch:totalResults>7</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/completerewrite" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="completerewrite" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><entry 
gd:etag="W/&quot;DkEARHk6eyp7ImA9Wx5QFEw.&quot;"><id>tag:blogger.com,1999:blog-2492664706785922439.post-5829253761483504820</id><published>2010-09-01T15:30:00.014+02:00</published><updated>2010-09-02T10:04:05.713+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-09-02T10:04:05.713+02:00</app:edited><title>Moving On</title><content type="html">As you can see, the flow of blog posts here stopped short quite a while ago. Not too early for some interesting content to appear, I hope, but the intended story arc of insights relating to database systems was never fulfilled.&lt;br /&gt;&lt;br /&gt;

I am going to round off this blog now (this will probably be the last post) without fulfilling any of my old “to be covered in future posts” promises. Sorry. Things have moved on, and I have moved on. Instead, I am going to finish off by taking a step back and reflecting on the role of database systems in people’s minds and in the industry.&lt;br /&gt;&lt;br /&gt;

But first, I just want to explain a few things about this blog; about &lt;a href="http://www.apptus.com/"&gt;Apptus Technologies&lt;/a&gt;, the company that I was working for while writing the previous posts and which I no longer work for; and about what I am doing now and in the future.&lt;br /&gt;&lt;br /&gt;

&lt;strong&gt;Why This Is the End&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;

The reason for the decline of this blog was changes at Apptus Technologies that shifted the focus for me and my colleagues a little bit. The company took on a new direction after the international financial crisis of late 2008. We decided to narrow down our scope and build something that would lie as close as possible to the final use our customers typically found for our software, rather than producing a generic platform. Hence, spreading the word of our insights in generic data management, which was the primary mission of this blog, became less of an issue.&lt;br /&gt;&lt;br /&gt;

The new direction, with the &lt;em&gt;Apptus Esales&lt;/em&gt; platform, became quite a success, both in terms of creating a technologically strong product and selling it, and Apptus is again thriving, and hiring new software developers.&lt;br /&gt;&lt;br /&gt;

From now on, however, future development of Apptus will happen mostly without my involvement. After about ten years in the software development business, most of which as Apptus’ head of research, I have decided to give academic research and teaching another shot. In early August, I started as assistant professor at the &lt;a href="http://itu.dk/"&gt;IT University of Copenhagen&lt;/a&gt;. I will certainly make use of what I have learned at Apptus in my new position, and most likely my research will involve developing ideas that have come up at Apptus. Plausibly, my former Apptus colleagues will also be involved in some manner.&lt;br /&gt;&lt;br /&gt;

If you are interested in following what I do, or would like to contact me about research ideas for instance, there are &lt;a href="http://larsson.dogma.net/"&gt;several&lt;/a&gt; entry &lt;a href="http://itu.dk/people/jesl/"&gt;points&lt;/a&gt; for &lt;a href="http://twitter.com/njlarsson"&gt;finding&lt;/a&gt; and contacting me on the Internet.&lt;br /&gt;&lt;br /&gt;

If you are interested in what happens at Apptus, keep track of the fairly recently reworked &lt;a href="http://www.apptus.com/"&gt;Apptus website&lt;/a&gt;. There is some blogging going on there now, and I left some texts behind that you might find there (uncredited) sometime.&lt;br /&gt;&lt;br /&gt;

&lt;strong&gt;Reconsidering Databases&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;

The transition I have been going through in the last few months inspired me to make some general reflections on database systems in a wider perspective, which I thought I would share with you. I have asked myself: why do we have database systems, what do people really expect of them, why are people often so dissatisfied with them, and what directions are available for improvement?&lt;br /&gt;&lt;br /&gt;

I figure the things most database systems are intended for are the following:

&lt;ul&gt;
        &lt;li&gt;Persistent storage&lt;/li&gt;
        &lt;li&gt;Secure storage&lt;/li&gt;
        &lt;li&gt;Organizing and structuring data&lt;/li&gt;
        &lt;li&gt;Complex query answering&lt;/li&gt;
        &lt;li&gt;Efficient (quick) query answering&lt;/li&gt;
        &lt;li&gt;Concurrent access&lt;/li&gt;
        &lt;li&gt;Concurrent update&lt;/li&gt;
&lt;/ul&gt;
        
Now, who needs all of that? Hardly anybody. That raises a few questions which I will formulate and give my answers to.&lt;br /&gt;&lt;br /&gt;

&lt;em&gt;1 What are database systems mostly used for?&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;

I haven’t researched what people use database management systems for, but I think I have enough knowledge and experience to base some thoughts on my personal impression of what most people consider to be the main task of a DBMS.&lt;br /&gt;&lt;br /&gt;

I think it’s persistent storage. Period. People who don’t want data to disappear when they shut down the program (or the computer) set up a database system to hold the data. Anyhow, I think that’s what most &lt;em&gt;programmers&lt;/em&gt; (which is what I identify as) consider the DBMS to be mostly about.&lt;br /&gt;&lt;br /&gt;

Not everyone uses something that deserves to be called a DBMS for this, but many do. Why, if they need only a single aspect of what the DBMS was designed for? Probably because it’s easy and safe. A DBMS is something they already know how to interact with, and they trust that it won’t lose any data.&lt;br /&gt;&lt;br /&gt;

This is the reason for the &lt;em&gt;object model&lt;/em&gt; for databases that is still out there, puzzling or even annoying advocates of the relational model. People who write object-oriented software want the objects to still be there when a program comes back up after being shut down. I might not want to call that a DBMS, but I understand their position. I am not even going to say that it is always a bad idea to base an object persistence library on a general DBMS, but it does seem a bit excessive. It should be possible to create a slimmer and more efficient system designed specifically for object persistence, and I am sure there are such systems out there too.&lt;br /&gt;&lt;br /&gt;

&lt;em&gt;2  What is the main point where database systems don’t deliver?&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;

Maybe I am skewed towards things that have mattered in my own work, but I would say that at least &lt;em&gt;one&lt;/em&gt; major point of dissatisfaction with general database systems is their lack of computational efficiency in delivering query results. We at Apptus found, like many others have, that a standard DBMS is about a factor of 100 away from the query performance of a well-designed search engine, and that there is no reason in principle why it has to be.&lt;br /&gt;&lt;br /&gt;

This is the reason why regular DBMS were thrown out of all large-scale search engines on the Internet in the early 2000s. (Well, almost all. Amazon are still using Oracle, I think, but I would presume that the Amazon search engine has evolved into something that is practically hand-written by Oracle technicians by now.)&lt;br /&gt;&lt;br /&gt;

&lt;em&gt;3 Do we really need a single system to do all that a DBMS does?&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;

Since most people are apparently primarily concerned about one, or at least just a few, of the capabilities of a DBMS, can’t we abandon the idea altogether and split it into a number of systems, or at least components, that are specialized for those few things?&lt;br /&gt;&lt;br /&gt;

My answer to that is that I think we still need the single-system DBMS. It is true that in many cases where it used to be routine to bring in the DBMS, people have started using other things. And rightly so. That development will probably continue. But when you get into concurrency and security, everything gets entangled. I don’t see how that can be lifted out to work by itself. A secure system with concurrency capability just has to have control of everything that is done with the data.&lt;br /&gt;&lt;br /&gt;

So I think there are still businesses where you need a DBMS approximately as we know it. What one could wish for, however, is that DBMS in the future may gain a greater elasticity to satisfy different needs. For instance, it should be easier and more transparent to sacrifice a bit of update concurrency for query speed, or to improve update capabilities by reducing the capability for complex data organization. Today’s SQL systems, at the core still optimized for the needs and hardware resources of the 1980s, are not good at giving us that flexibility.&lt;br /&gt;&lt;br /&gt;

&lt;em&gt;4 For what aspects does the relational model help?&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;

I myself have frequently tried to sell the importance of a strict data model with a firm basis in logic, and been frustrated by the lack of understanding of the connection between logic and databases that dominates the industry, and actually much of academia too. (Although I am less convinced now than I was a few years ago that the relational model is the perfect solution.)&lt;br /&gt;&lt;br /&gt;

But looking at the points above, which of them does a good and logical data model really solve? Actually not many. It is essential to only two of them: organizing and querying complex data. Those who can do without these can do without thinking very much about data models. Some of the other things certainly get a little simpler if you have a good grasp of logic and data modelling, but that grasp is not strictly necessary, and not top priority.&lt;br /&gt;&lt;br /&gt;

This, I think, is the reason why most programmers are so indifferent to logical data modelling, even though most of them should have the ability to both understand and appreciate the elegance of it.&lt;br /&gt;&lt;br /&gt;

That was all I had to say at this point. Not a finished theory to get oriented in the database world, but maybe a few insights that at least I personally am going to keep in mind in choosing the direction of my future research.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2492664706785922439-5829253761483504820?l=completerewrite.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/completerewrite/~4/F2qZhUzPoho" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://completerewrite.blogspot.com/feeds/5829253761483504820/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=2492664706785922439&amp;postID=5829253761483504820" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2492664706785922439/posts/default/5829253761483504820?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2492664706785922439/posts/default/5829253761483504820?v=2" /><link rel="alternate" type="text/html" href="http://completerewrite.blogspot.com/2010/09/moving-on.html" title="Moving On" /><author><name>Jesper Larsson</name><uri>http://www.blogger.com/profile/05756120755805691277</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total></entry><entry gd:etag="W/&quot;DU8ARHs4fSp7ImA9WxVUE0s.&quot;"><id>tag:blogger.com,1999:blog-2492664706785922439.post-7528716645477083571</id><published>2009-03-17T22:53:00.016+01:00</published><updated>2009-03-18T10:17:25.535+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-03-18T10:17:25.535+01:00</app:edited><title>The Relational Model Is Not Logical</title><content type="html">by &lt;a 
href="http://www.blogger.com/profile/05756120755805691277"&gt;Jesper Larsson&lt;/a&gt;
&lt;br /&gt;&lt;br /&gt;

Look up the &lt;a href="http://en.wikipedia.org/wiki/Relational_model"&gt;relational model on Wikipedia&lt;/a&gt;, and the first sentence that hits you (in the current version at least) is “The relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar Codd.” It would be nice if that were true. If the dominant model for databases had really taken its starting point in logic, the database field might not be such a mess today.&lt;br /&gt;&lt;br /&gt;

Although there is a clear connection between logic and the relational model, the assertion that the model is &lt;em&gt;based&lt;/em&gt; on logic is at the very least debatable.&lt;br /&gt;&lt;br /&gt;


&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_lqFQ_plhsMs/ScAcL5lY5gI/AAAAAAAAABE/gbbrGdr7VB4/s1600-h/codd_scroll.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 185px;" src="http://2.bp.blogspot.com/_lqFQ_plhsMs/ScAcL5lY5gI/AAAAAAAAABE/gbbrGdr7VB4/s400/codd_scroll.png" border="0" alt="Relational model opening scroll"id="BLOGGER_PHOTO_ID_5314278551058966018" /&gt;&lt;/a&gt;&lt;small&gt;Art by &lt;a href="http://www.flickr.com/photos/30843414@N03/3363784058/"&gt;ctail&lt;/a&gt; (&lt;a href="http://creativecommons.org/licenses/by/2.0/deed.en"&gt;some rights reserved&lt;/a&gt;).&lt;/small&gt;&lt;br /&gt;&lt;br /&gt;

&lt;strong&gt;Logic and Relations&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;

The brilliant idea at the core of the relational model is to use the logical/mathematical concept of relations as the single concept to describe data. In other models, there are typically at
least two concepts: &lt;em&gt;entities&lt;/em&gt; (or objects or whatever) and &lt;em&gt;relationships&lt;/em&gt; (or links, connections, etc.). But in the relational model, one concept is enough.&lt;br /&gt;&lt;br /&gt;

Let us think about a typical real-life example of something that can
be modeled as a relation:
&lt;em&gt;fatherhood&lt;/em&gt;. Fatherhood is a relation between two domains of objects: the set of all possible fathers and the set of all people. If we name the set of fathers F, and use X to denote an element of F (i.e., some father); and similarly call the set of people P and an element of it Y, then we can express the relation something like this:&lt;br /&gt;&lt;br /&gt;

X father_of Y&lt;br /&gt;&lt;br /&gt;

Mathematicians and logicians usually prefer single-letter symbols, and mostly write in prefix notation, so let us shorten father_of to f and write:&lt;br /&gt;&lt;br /&gt;

f(X, Y).&lt;br /&gt;&lt;br /&gt;

We call f a &lt;em&gt;relation symbol&lt;/em&gt;, following Hodges&lt;sup&gt;&lt;a href="#hodges"&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;br /&gt;&lt;br /&gt;

There is a point to using single-letter symbols instead of descriptive names, apart from saving space. It stresses that the symbols, as far as their use in logic is concerned, are really just that: abstract symbols that follow formal logical rules that we can define in whatever way we see fit. The &lt;em&gt;meaning&lt;/em&gt; of the symbols is really beyond the scope of the logic. Connections between logical statements and the outside world can only be made through &lt;em&gt;interpretations&lt;/em&gt; – attaching real-world concepts to the symbols.&lt;br /&gt;&lt;br /&gt;

For instance, we can take the statement “X is the father of Y” as the meaning of relation symbol f with arguments X and Y. Such a statement, used to attach a meaning to a relation symbol, is called a
&lt;em&gt;predicate&lt;/em&gt;. (At least according to some authors. There are other uses of the word predicate, a common one being to use predicate more or less in place of relation symbol.)&lt;br /&gt;&lt;br /&gt;

A more direct way to correlate the relation symbol f with the real world, without relying on any &lt;em&gt;meaning&lt;/em&gt; to be evaluated by human judgment, is to provide the exact values for which f evaluates to true. This set of values is a subset of the &lt;a href="http://en.wikipedia.org/wiki/Cartesian_product"&gt;Cartesian product&lt;/a&gt; F&amp;nbsp;×&amp;nbsp;P of the two sets F and P, and an interpretation of f can be given in the form of such a subset of F&amp;nbsp;×&amp;nbsp;P – a set of ordered pairs of values for which f evaluates to true.&lt;br /&gt;&lt;br /&gt;

For instance, one interpretation could be:&lt;br /&gt;&lt;br /&gt;

{ (anakin, leia), (george, george_w), (frank, dweezil) }.&lt;br /&gt;&lt;br /&gt;

There you have an explanation for the common way to define the term relation in mathematics simply as a subset of a Cartesian product.&lt;br /&gt;&lt;br /&gt;
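
As a minimal sketch of this definition (in Python, chosen here only for illustration; the contents of the domains F and P and the name father_of are assumptions built from the example values above), an interpretation of f can be held as a plain set of pairs and checked against the Cartesian product F × P:&lt;br /&gt;&lt;br /&gt;

&lt;pre&gt;
```python
from itertools import product

# Hypothetical domains: F (possible fathers) and P (people).
F = {"anakin", "george", "frank"}
P = {"anakin", "leia", "george", "george_w", "frank", "dweezil"}

# One interpretation of the relation symbol f:
# the set of ordered pairs for which f evaluates to true.
father_of = {("anakin", "leia"), ("george", "george_w"), ("frank", "dweezil")}

# Every pair must come from the Cartesian product F x P.
cartesian = set(product(F, P))
is_subset = father_of.issubset(cartesian)


def f(x, y):
    """Evaluate f(X, Y) under the interpretation above."""
    return (x, y) in father_of
```
&lt;/pre&gt;

Under these assumptions, f("anakin", "leia") is true, while f("leia", "anakin") is false, because that pair is not in the set.&lt;br /&gt;&lt;br /&gt;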

A basic idea of the relational model is that sets such as these           
– relations, or &lt;em&gt;extensions&lt;/em&gt; of the corresponding predicate – should be the only concept used for representing all data in the database,          
although as we shall see, it commonly is not.&lt;br /&gt;&lt;br /&gt;

The values &lt;em&gt;anakin&lt;/em&gt;, &lt;em&gt;leia&lt;/em&gt;, &lt;em&gt;george&lt;/em&gt;, etc. are supposed to be objects from the “real world”, but when displaying the value like this we obviously have to use some sort of representative (in the form of character strings) rather than the actual objects (people). It is to break out of traps like this one that texts on logic sometimes contain peculiar assertions like “constants are interpreted as themselves.”&lt;br /&gt;&lt;br /&gt;

The fatherhood relation is a binary, or &lt;em&gt;2-ary&lt;/em&gt;, relation; and its elements are pairs, or &lt;em&gt;2-tuples&lt;/em&gt;. In general, a relation symbol can be n-ary for any natural number n. For instance, we can write&lt;br /&gt;&lt;br /&gt;

p(X, Y, Z)&lt;br /&gt;&lt;br /&gt;

and let p be interpreted by, for instance, the predicate “X is a person who was born in city Y in the year Z”, or by the relation&lt;br /&gt;&lt;br /&gt;

&lt;table  border="0" cellpadding="2" cellspacing="0"&gt;
&lt;tr&gt;&lt;td&gt;{&lt;/td&gt; &lt;td&gt;(george, milton, 1924),&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;  &lt;td&gt;&lt;/td&gt; &lt;td&gt;(george_w, new_haven, 1946),&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;  &lt;td&gt;&lt;/td&gt; &lt;td&gt;(frank, baltimore, 1940),&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;  &lt;td&gt;&lt;/td&gt; &lt;td&gt;(dweezil, los_angeles, 1969) }.&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;&lt;br /&gt;

This demonstrates the elegant uniform-concept idea of representing data as relations. In a model that has both entities and relationships, the &lt;em&gt;person with birth city and year&lt;/em&gt; data would typically be perceived as an entity, while &lt;em&gt;fatherhood&lt;/em&gt; would be a relationship between records of the person entity. In the relational model, both sets of data have the same basic form, the relation: a fatherhood relation between two person object domains, and a person relation among the three domains
&lt;em&gt;person name&lt;/em&gt;, &lt;em&gt;city&lt;/em&gt;, and &lt;em&gt;year&lt;/em&gt;.&lt;br /&gt;&lt;br /&gt;
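
To make the uniformity concrete, here is a small sketch (again in Python; the function name fathers_of_people_born_in is hypothetical, and the data are the example tuples from this post). Because both f and p are just sets of tuples, one logical question, “which X satisfy f(X, Y) and p(Y, C, Z) for some Y and C and a given year Z”, can be asked across them without treating either as a special kind of thing:&lt;br /&gt;&lt;br /&gt;

&lt;pre&gt;
```python
# The binary relation f (fatherhood) and the ternary relation p (person).
father_of = {("anakin", "leia"), ("george", "george_w"), ("frank", "dweezil")}
person = {
    ("george", "milton", 1924),
    ("george_w", "new_haven", 1946),
    ("frank", "baltimore", 1940),
    ("dweezil", "los_angeles", 1969),
}


def fathers_of_people_born_in(target_year):
    """All X such that f(X, Y) and p(Y, C, target_year) hold for some Y, C."""
    return {
        x
        for (x, y) in father_of
        for (name, _city, year) in person
        if name == y and year == target_year
    }
```
&lt;/pre&gt;

With the example data, asking for the year 1969 gives the one-element set containing frank. Note that leia does not appear in the person relation here, so anakin can never show up in an answer; that is a property of the sample data, not of the approach.&lt;br /&gt;&lt;br /&gt;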

&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_lqFQ_plhsMs/ScAeP937j8I/AAAAAAAAABc/dKaqtRyyz-A/s1600-h/where_is_the_logic.jpg"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 288px; height: 352px;" src="http://1.bp.blogspot.com/_lqFQ_plhsMs/ScAeP937j8I/AAAAAAAAABc/dKaqtRyyz-A/s400/where_is_the_logic.jpg" border="0" alt="Where is the logic?"id="BLOGGER_PHOTO_ID_5314280819953209282" /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;small&gt;Photo by &lt;a href="http://www.flickr.com/photos/plindberg/6113436/"&gt;Peter Lindberg&lt;/a&gt; (&lt;a href="http://creativecommons.org/licenses/by/2.0/deed.en"&gt;some rights reserved&lt;/a&gt;).&lt;/small&gt;&lt;br /&gt;&lt;br /&gt;

&lt;strong&gt;Non-Logical Data&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;

A convenient way of visualizing a relation, common in association with the relational database model, is to use a table:&lt;br /&gt;&lt;br /&gt;

&lt;center&gt;
&lt;table  border="1" cellpadding="2" cellspacing="0"&gt;
&lt;caption&gt;father_of&lt;/caption&gt;
&lt;tr&gt;&lt;th&gt;father (possible_father)&lt;/th&gt;&lt;th&gt;child (person)&lt;/th&gt;&lt;/tr&gt;

&lt;tr&gt;&lt;td&gt;anakin&lt;/td&gt;&lt;td&gt;leia&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;george&lt;/td&gt;&lt;td&gt;george_w&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;frank&lt;/td&gt;&lt;td&gt;dweezil&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;/center&gt;&lt;br /&gt;

In the relational database model, a table also has a heading (schema), which comprises a &lt;em&gt;name&lt;/em&gt; and &lt;em&gt;data type&lt;/em&gt; for each column in the table (or in an alternative terminology, for each attribute in the relation). These &lt;em&gt;metadata&lt;/em&gt; are not perceived as actual data in the relational database model, even though they are something that the user of the database system is expected to provide.&lt;br /&gt;&lt;br /&gt;

In addition, relational database systems typically comprise &lt;em&gt;constraints&lt;/em&gt; of one kind or another. These are also supplied in ad hoc, non-relational, ways.&lt;br /&gt;&lt;br /&gt;

So, the relational database model breaks its own principle, that all data is represented in the form of relations, by introducing several additional forms of data. Classical logic finds no need for these special forms.&lt;br /&gt;&lt;br /&gt;

&lt;strong&gt;Algebra vs Logic&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;

In his original account of the relational model,&lt;sup&gt;&lt;a href="#coddsixtynine"&gt;2&lt;/a&gt;, &lt;a href="#coddseventy"&gt;3&lt;/a&gt;&lt;/sup&gt; Codd did
briefly suggest that a data language based on first-order predicate
calculus should be developed, but, unfortunately, that did not make
much of an impression on the database community. The paper also
contained the embryo of another data language: &lt;em&gt;relational
algebra&lt;/em&gt;, a set of “operations on relations” which
was later modified and expanded, and inspired various database
languages. The languages grew more and more ad hoc and complex, and
distanced from the ideas of logic. During the 1970s, the database language SQL was developed, with at least some involvement from Codd, and today SQL in various forms is
the king of illogical relational languages.&lt;br /&gt;&lt;br /&gt;

Obviously, many people have noted the complexity and overhead problems of the relational systems in use today. Unfortunately, many draw the incorrect conclusion that the problems are in the relational model as such, rather than in the implementations being stuck in old ways, partly caused by inadequacies buried in the SQL language. Hence, they choose to revert to pre-relational tree-, network-, or record-oriented schemes.&lt;br /&gt;&lt;br /&gt;

Some however – particularly &lt;a href="http://en.wikipedia.org/wiki/Christopher_J._Date"&gt;C.&amp;nbsp;J. Date&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Hugh_Darwen"&gt;Hugh Darwen&lt;/a&gt;, sanctioned by the whining but brilliant grump &lt;a href="http://en.wikipedia.org/wiki/Fabian_Pascal"&gt;Fabian Pascal&lt;/a&gt; – have
made heroic attempts to bring the relational model back to its roots.
They have defined their own relational language dubbed “&lt;a href="http://en.wikipedia.org/wiki/D_(data_language_specification)"&gt;D&lt;/a&gt;”, which throws out the
worst atrocities of SQL, and strengthens the declarative aspect.&lt;br /&gt;&lt;br /&gt;

They do not, however, go all the way to the elegance and simplicity of
classical logic. (Although they have, as an aside in some of the &lt;a href="http://en.wikipedia.org/wiki/The_Third_Manifesto"&gt;Third Manifesto&lt;/a&gt; writings, taken the first step towards doing so with the alternative algebra “A”.) That is a pity, because expressing data in pure
logical form makes for an excellent &lt;a href="http://completerewrite.blogspot.com/2008/10/practical-stuff-models-theories.html"&gt;model&lt;/a&gt;. In logic, there is nothing special about concepts such as data types, constraints, comparisons, or arithmetic operations. They are just special cases of the basic simple constructs. Arguably, logic can provide the
simplest model possible that allows defining and accessing databases, avoiding a lot of the awkwardness that
comes from the directional &lt;em&gt;operations&lt;/em&gt; of relational algebra.&lt;br /&gt;&lt;br /&gt;
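
A rough illustration of that last point, under assumptions of my own choosing (Python as the notation; the names possible_father, person, well_typed, and single_father are hypothetical): a data type can be read as a unary relation, and a constraint as an ordinary formula that either holds or does not hold over the relations. Neither needs a separate kind of construct.&lt;br /&gt;&lt;br /&gt;

&lt;pre&gt;
```python
# "Types" as unary relations: plain sets of values.
possible_father = {"anakin", "george", "frank"}
person = {"anakin", "leia", "george", "george_w", "frank", "dweezil"}

father_of = {("anakin", "leia"), ("george", "george_w"), ("frank", "dweezil")}


def well_typed(relation):
    """For all (X, Y) in the relation: possible_father(X) and person(Y)."""
    return all(x in possible_father and y in person for (x, y) in relation)


def single_father(relation):
    """Constraint as a formula: f(X1, Y) and f(X2, Y) imply X1 = X2."""
    return all(
        x1 == x2
        for (x1, y1) in relation
        for (x2, y2) in relation
        if y1 == y2
    )
```
&lt;/pre&gt;

Both checks hold for the example relation. Adding a pair such as (frank, leia) would violate the second formula, since leia would then have two fathers. This is only a sketch of the idea, not a description of how any particular system implements types or constraints.&lt;br /&gt;&lt;br /&gt;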

Others have realized this, of course. As early as in the 1970s
there was substantial research on the connection between logic and
databases. Look back, and you will find pioneer work by, among others,
&lt;a href="http://en.wikipedia.org/wiki/Jack_Minker"&gt;Jack Minker&lt;/a&gt; and
&lt;a href="http://en.wikipedia.org/wiki/Raymond_Reiter"&gt;Raymond
Reiter&lt;/a&gt;, and a lively research community. But for some reason, this
was mostly not considered database research, but classified as
&lt;em&gt;artificial intelligence&lt;/em&gt; or &lt;em&gt;logic programming&lt;/em&gt;, and
database people appear to have taken little notice. The field more or
less died out in the late 1990s, without having come through to the
industry or database community.&lt;br /&gt;&lt;br /&gt;

&lt;strong&gt;Simplicity Enables Efficiency&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;

Taking a logical approach to databases is not just an exercise in
mathematical aesthetics. One of the most important points is simplifying the interface to the database management core,
keeping the interface to higher levels of the system small. This takes a substantial workload off the shoulders of the system implementer.&lt;br /&gt;&lt;br /&gt;

In particular, a logical description of operations simplifies formulation of &lt;em&gt;optimization&lt;/em&gt; tasks of the database system. Query
optimization for SQL, with all its complexity, quirks, and
inconsistencies, is a nightmare. Furthermore, for various reasons
(which I hope to get back to in a future post), the actual
operation of database applications using SQL systems is mostly not
expressed in the language of the database at all. This takes the job of
capable overall optimization from the level of practically impossible
to &lt;em&gt;theoretically&lt;/em&gt; impossible.&lt;br /&gt;&lt;br /&gt;

Optimization of database operations on the basis of pure logic, on the other hand, allows the system to concentrate on the actual computational complexity issues of translating logical formulae into execution plans. Not that this makes it easy, but arguably it makes it as simple as theoretically possible.&lt;br /&gt;&lt;br /&gt;

&lt;strong&gt;Revival?&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;

&lt;a href="http://db.cs.berkeley.edu/claremont/"&gt;The Claremont Report&lt;/a&gt; on Database Research, a document that has come out of a meeting of a number of distinguished database specialists in May 2008, mentions that “new declarative languages, often grounded in &lt;a href="http://en.wikipedia.org/wiki/Datalog"&gt;Datalog&lt;/a&gt;, have recently been developed
for a variety of domain-specific systems.” Datalog is a Prolog-like language for database access, which dominated the database logic research during the 1980s, so this statement could be taken as an indication that a revival for logic in database access is on the horizon. I suppose our platform, Apptus Theca 2.0, can be said to be one such system, although our current interface is based on a graphical user interface rather than a language in text form.&lt;br /&gt;&lt;br /&gt;

I have not, however, come into direct contact with any such projects apart from our own, whether academic or commercial. If you are involved in, or aware of, anything like this, I would love to hear from you.&lt;br /&gt;&lt;br /&gt;

&lt;strong&gt;&lt;small&gt;References&lt;/small&gt;&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;

&lt;a name=hodges&gt;&lt;strong&gt;&lt;small&gt;1&lt;/small&gt;&lt;/strong&gt;&lt;/a&gt;
Wilfrid Hodges,
&lt;em&gt;Classical Logic I: First-Order Logic&lt;/em&gt;, The Blackwell Guide to Philosophical Logic (Lou Goble, ed.),  Blackwell Publishers, 2001.
&lt;br/&gt;
&lt;a name=coddsixtynine&gt;&lt;strong&gt;&lt;small&gt;2&lt;/small&gt;&lt;/strong&gt;&lt;/a&gt;
Edgar F. Codd,
&lt;em&gt;Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks&lt;/em&gt;,
IBM Research Report, 1969.
&lt;br/&gt;
&lt;a name=coddseventy&gt;&lt;strong&gt;&lt;small&gt;3&lt;/small&gt;&lt;/strong&gt;&lt;/a&gt;
Edgar F. Codd,
&lt;em&gt;A Relational Model of Data for Large Shared Data Banks&lt;/em&gt;,
Communications of the ACM 13 (1970), no&amp;nbsp;6, 377–387.
&lt;br/&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2492664706785922439-7528716645477083571?l=completerewrite.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/completerewrite/~4/OsE6xfgk4-k" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://completerewrite.blogspot.com/feeds/7528716645477083571/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=2492664706785922439&amp;postID=7528716645477083571" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2492664706785922439/posts/default/7528716645477083571?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2492664706785922439/posts/default/7528716645477083571?v=2" /><link rel="alternate" type="text/html" href="http://completerewrite.blogspot.com/2009/03/relational-model-is-not-logical.html" title="The Relational Model Is Not Logical" /><author><name>Jesper Larsson</name><uri>http://www.blogger.com/profile/05756120755805691277</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/_lqFQ_plhsMs/ScAcL5lY5gI/AAAAAAAAABE/gbbrGdr7VB4/s72-c/codd_scroll.png" height="72" width="72" /><thr:total>2</thr:total></entry><entry gd:etag="W/&quot;AkENSXszeCp7ImA9WxRVEEs.&quot;"><id>tag:blogger.com,1999:blog-2492664706785922439.post-3508710078491255189</id><published>2008-11-07T13:53:00.010+01:00</published><updated>2008-11-07T15:38:18.580+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-11-07T15:38:18.580+01:00</app:edited><title>Fight Features!</title><content type="html">by &lt;a 
href="http://www.blogger.com/profile/05756120755805691277"&gt;Jesper Larsson&lt;/a&gt;
&lt;br /&gt;&lt;br /&gt;

&lt;blockquote cite="http://www.cs.vu.nl/~ast/brown/"&gt;

I still fervently believe that the only way to make software secure,
reliable, and fast is to make it small. Fight Features.&lt;br /&gt;

&amp;#x2013; &lt;a href="http://www.cs.vu.nl/~ast/brown/"&gt;Andrew S. Tanenbaum&lt;/a&gt;
&lt;/blockquote&gt;
&lt;br /&gt;

&amp;#x201c;Features&amp;#x201d;, &amp;#x201c;power&amp;#x201d;, and &amp;#x201c;richness&amp;#x201d; are generally perceived as good things in computer systems, not as problems. Still, most people who use a computer have some experience of their problematic side. Quoting &lt;a href="http://en.wikipedia.org/wiki/Feature_creep"&gt;Wikipedia&lt;/a&gt;: &amp;#x201c;Extra features go beyond the basic function of the product and so can result in baroque over-complication rather than simple, elegant design.&amp;#x201d;
&lt;br /&gt;&lt;br /&gt;
&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_lqFQ_plhsMs/SRQ61dRab2I/AAAAAAAAAA0/5DFuz93ieJM/s1600-h/wiring.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 282px;" src="http://2.bp.blogspot.com/_lqFQ_plhsMs/SRQ61dRab2I/AAAAAAAAAA0/5DFuz93ieJM/s400/wiring.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5265898554368094050" /&gt;&lt;/a&gt;
&lt;small&gt;Photo by &lt;a href="http://flickr.com/photos/28481088@N00/292898253/"&gt;tanakawho&lt;/a&gt; (&lt;a href="http://creativecommons.org/licenses/by/2.0/deed.en"&gt;some rights reserved&lt;/a&gt;).&lt;/small&gt;&lt;br /&gt;&lt;br /&gt;

I don't expect to ever be able to convince a marketing person to use &lt;em&gt;not adding any power&lt;/em&gt; or &lt;em&gt;with the most rudimentary features only&lt;/em&gt; as a sales point. But I do sometimes get disappointed when I see the naive attitude that even experienced computing technology people take to new or exotic functionality. I am going to give you some examples from the field that all programmers have opinions about &amp;#x2013; programming languages &amp;#x2013; before finishing off with a little bit about the issue in other areas, such as database systems. Since most of the text is directed at programmers, it may be a little more technical than the previous posts on this blog. &lt;br /&gt;&lt;br /&gt;

&lt;strong&gt;Cool?&lt;/strong&gt;
&lt;br /&gt;&lt;br /&gt;

It was at a developers' conference a few years ago that I had my major wake-up call as to how uncommon my views seem to be among programmers. There were a number of presentations about new features in various languages and systems, and I was struck by how uncritically people received them. I particularly remember a reaction from the audience to a presentation of a new object language or framework of some kind, where apparently a core feature was that you were free to modify objects dynamically in any way, for instance by adding methods.
&lt;br /&gt;&lt;br /&gt;

&lt;em&gt;Audience member:&lt;/em&gt; Can you &lt;em&gt;remove&lt;/em&gt; a method?
&lt;br /&gt;

&lt;em&gt;Presenter:&lt;/em&gt; Yes.
&lt;br /&gt;

&lt;em&gt;Audience member:&lt;/em&gt; Cool!
&lt;br /&gt;&lt;br /&gt;

I was flabbergasted. What on earth would be the need to remove a method? To me it seemed like an extremely &lt;em&gt;uncool&lt;/em&gt; feature!
&lt;br /&gt;&lt;br /&gt;

The standard reaction to this is &amp;#x201c;So? If you don't like it you don't have to use it.&amp;#x201d; This always makes me want to jump up and down screaming in furious frustration. I am not sure if it is just because it is an incredibly stupid comment, or because it is something I would have been likely to say myself when I was a young and inexperienced programmer. I wish I could go back twenty years and tell myself off about it, but I wonder if I could have convinced myself &amp;#x2013; probably not, I was a pretty obstinate young man on these issues.
&lt;br /&gt;&lt;br /&gt;

Anyway, if I did get a chance to persuade myself, there are four main arguments I would have used. All based on experience I did not have back then.
&lt;br /&gt;&lt;br /&gt;

&lt;strong&gt;1. Blurring the focus of the system and its creator&lt;/strong&gt;
&lt;br /&gt;&lt;br /&gt;

This is rather obvious really, but it is easy to underestimate the extent to which limited time and resources keep you from doing everything you can think of. If the developers of a system have to write and test code for fifty features that someone might find useful, there is going to be considerably less time to spend on things that are crucial for almost everyone.
&lt;br /&gt;&lt;br /&gt;

In the programming language context, the time spent to make sure that the &lt;em&gt;extra feature&lt;/em&gt; code worked properly could have been spent thinking about code generation for some common language construct. For instance, optimizing to get &lt;a href="http://completerewrite.blogspot.com/2008/10/practical-stuff-models-theories.html"&gt;tight loops&lt;/a&gt; to run really fast.
&lt;br /&gt;&lt;br /&gt;

You may say that this is a problem for the developer that creates the system, not for you who are just using it. But that is not true: it &lt;em&gt;is&lt;/em&gt; a problem for you if you cannot get tight loops to run fast. It becomes a problem for the system vendor only if it is such a big problem for you that you stop using the system.
&lt;br /&gt;&lt;br /&gt;

&lt;strong&gt;2. Blurring the focus of the user&lt;/strong&gt;
&lt;br /&gt;&lt;br /&gt;

I think my first realization that fewer possibilities increased my efficiency came when I wrote a rather large automatic proof program in &lt;a href="http://en.wikipedia.org/wiki/Lisp_(programming_language)"&gt;Lisp&lt;/a&gt;, without using any looping constructs &amp;#x2013; only recursion for repetition. This severely limited my ways of expressing what the program should do. The result: &lt;em&gt;I no longer had to think about how to express it!&lt;/em&gt; I could concentrate fully on figuring out how to solve the &lt;em&gt;real&lt;/em&gt; problems!
&lt;br /&gt;&lt;br /&gt;

I had a similar feeling moving from C++ to Java. C++ is a mess of language constructs taken from at least three different programming paradigms. You can write C++ code in pretty much whichever way you like, and specify everything in detail. This makes me change the code back and forth, unable to decide. &amp;#x201c;&lt;em&gt;This method should probably be declared virtual. No, wait, I can probably make it static &amp;#x2013; that will save a few machine instructions. No, sorry, it has to be virtual. But if I change this call I can make it static.&lt;/em&gt;&amp;#x201d;
&lt;br /&gt;&lt;br /&gt;

Java is far from a perfect object-oriented language, but at least it is much stricter than C++. If you write Java without thinking in objects, your code tends to kick and squeal until you get it into a reasonably attractive structure.
&lt;br /&gt;&lt;br /&gt;

Working on Ericsson's &lt;a href="http://en.wikipedia.org/wiki/WAP_gateway"&gt;WAP gateway&lt;/a&gt; I was forced back into C++, but then a strict &lt;a href="http://www.doc.ic.ac.uk/lab/cplus/c++.rules/"&gt;set of coding rules&lt;/a&gt; was imposed on me. The rules said things like &amp;#x201c;All classes which are used as base classes and which have virtual functions, must define a virtual destructor.&amp;#x201d; Again, this narrowed down the choices, and made it much easier to create reasonably well-structured code. Of course, it would have been nice if the compiler had assisted in enforcing those rules.
&lt;br /&gt;&lt;br /&gt;

&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_lqFQ_plhsMs/SRQ61RURcyI/AAAAAAAAAA8/dC4EsU6LTsc/s1600-h/sweets.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 240px;" src="http://1.bp.blogspot.com/_lqFQ_plhsMs/SRQ61RURcyI/AAAAAAAAAA8/dC4EsU6LTsc/s400/sweets.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5265898551158862626" /&gt;&lt;/a&gt;

&lt;small&gt;Photo by &lt;a href="http://flickr.com/photos/me_ram/2983616264/"&gt;Ramchandran Maharajapuram&lt;/a&gt; (&lt;a href="http://creativecommons.org/licenses/by/2.0/deed.en"&gt;some rights reserved&lt;/a&gt;).&lt;/small&gt;&lt;br /&gt;&lt;br /&gt;

&amp;#x201c;Syntactic sugar&amp;#x201d; is what people call constructs that let them say the same thing slightly more concisely. Saving you from typing the same thing over and over is obviously nice, but what you gain in typing you may lose to the distraction of having to think about which syntax to use. Syntactic sugar may have a smooth taste, but too much of it erodes your programming teeth.
&lt;br /&gt;&lt;br /&gt;

&lt;strong&gt;3. Others will use it&lt;/strong&gt;
&lt;br /&gt;&lt;br /&gt;

If you are a young student or hobbyist creating one little program after another for fun or examination, you may think that software development is just about &lt;em&gt;writing&lt;/em&gt; code. But once you work in a reasonably large project, or come in contact with running code that needs to be maintained, you realize, first, that you are not in complete control of how the code is written and, second, that much of the work is &lt;em&gt;reading&lt;/em&gt; code.
&lt;br /&gt;&lt;br /&gt;

Hence, not using a construct when you write your own code does not save you from being bothered by it. In order to read what others have written, you need to understand what everything means. The more features a language contains, the more difficult it is to learn to read. The more different ways of doing the same thing it allows, the more different other people's code looks from yours &amp;#x2013; again making it difficult for you to decipher.
&lt;br /&gt;&lt;br /&gt;

One way of warding off criticism against potentially harmful features is that they are no problem when used correctly, with discipline. True as that may be, not everyone who writes the code that you come in contact with has exactly the same discipline, values, and experience as you do. Losing the possibility of carefully using a feature yourself is often a small price to pay to prevent others from using it hazardously.
&lt;br /&gt;&lt;br /&gt;

&lt;strong&gt;4. Optimization&lt;/strong&gt;
&lt;br /&gt;&lt;br /&gt;

I already mentioned that adding features may cause optimization to suffer because developers have less time, but there is one more issue concerning efficiency: the possibility of some features being used may keep the compiler from making assumptions that would allow it to generate considerably more efficient code.
&lt;br /&gt;&lt;br /&gt;

An old example from the C language is the option to modify all sorts of variables through pointers. A neat trick in some situations perhaps, but it makes it difficult for the compiler to know if it can keep values in CPU registers or not. Even if the program does not contain any such side effects, that may not be known at compile time. Hence, to make sure that the program runs correctly, values have to be repeatedly written down to main memory and read back, just because of the existence of a rarely used feature of the language.
&lt;br /&gt;&lt;br /&gt;

Another example is when the assembling of program components is highly dynamic, such as with dynamic class loading in Java. As I mentioned in a &lt;a href="http://completerewrite.blogspot.com/2008/10/practical-stuff-models-theories.html"&gt;previous post&lt;/a&gt;, my colleagues and I have been quite impressed with the optimization capabilities of Sun's &lt;a href="http://en.wikipedia.org/wiki/HotSpot"&gt;HotSpot&lt;/a&gt; virtual machine. It has allowed us to produce high-performance systems completely written in Java. But there are some situations where optimization is less effective. Particularly, inlining does not happen as often as we would like it to. In part, this is due to the general difficulty of figuring out which method really gets invoked in languages that support polymorphism. But it is made worse by dynamic class loading, because the system can never be sure if execution is going to involve code that has not been seen yet.
&lt;br /&gt;&lt;br /&gt;

Polymorphic method invocation and dynamic class loading are wonderful features. I would not want to lose either of them. But they do have their drawbacks for efficiency. They make the optimizer's task more difficult, even in situations where they are not used.
&lt;br /&gt;&lt;br /&gt;

&lt;strong&gt;Not Just About Programming&lt;/strong&gt;
&lt;br /&gt;&lt;br /&gt;

I am obviously not the first to have noticed the &lt;a href="http://en.wikipedia.org/wiki/Feature_creep"&gt;feature creep&lt;/a&gt; problem, and it is certainly not limited to programming languages. It is everywhere in computing. The opening quote from Tanenbaum is about operating systems, and the problem is huge in &lt;a href="http://completerewrite.blogspot.com/2008/10/why-database-masters-fail-us.html"&gt;database systems&lt;/a&gt;.
&lt;br /&gt;&lt;br /&gt;

For example, the more recent versions of the &lt;a href="http://en.wikipedia.org/wiki/Sql"&gt;SQL&lt;/a&gt; standard are virtually impossible to implement in full. It is well known that this &amp;#x201c;richness of SQL&amp;#x201d;, as Surajit Chaudhuri diplomatically (or perhaps sarcastically) put it in his recent &lt;a href="http://db.cs.berkeley.edu/claremont/"&gt;Claremont&lt;/a&gt; presentation, is a major obstacle for efficiency in database systems.
&lt;br /&gt;&lt;br /&gt;
 
It is even an issue for &lt;a href=""&gt;data models&lt;/a&gt; themselves. But in that area, there is hope of some progress in the right direction &amp;#x2013; towards simplicity. More on that in future posts.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2492664706785922439-3508710078491255189?l=completerewrite.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/completerewrite/~4/jytEsv35rPE" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://completerewrite.blogspot.com/feeds/3508710078491255189/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=2492664706785922439&amp;postID=3508710078491255189" title="6 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2492664706785922439/posts/default/3508710078491255189?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2492664706785922439/posts/default/3508710078491255189?v=2" /><link rel="alternate" type="text/html" href="http://completerewrite.blogspot.com/2008/11/fight-features.html" title="Fight Features!" 
/><author><name>Jesper Larsson</name><uri>http://www.blogger.com/profile/05756120755805691277</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/_lqFQ_plhsMs/SRQ61dRab2I/AAAAAAAAAA0/5DFuz93ieJM/s72-c/wiring.jpg" height="72" width="72" /><thr:total>6</thr:total></entry><entry gd:etag="W/&quot;DkEFQn4yfip7ImA9WxRUFkU.&quot;"><id>tag:blogger.com,1999:blog-2492664706785922439.post-9059954231481733099</id><published>2008-10-30T16:59:00.007+01:00</published><updated>2008-11-26T08:30:13.096+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-11-26T08:30:13.096+01:00</app:edited><title>Models and Efficiency, part 2: Databases</title><content type="html">by &lt;a href="http://www.blogger.com/profile/05756120755805691277"&gt;Jesper Larsson&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;

Databases are one of the few areas in computing where there is a noticeable awareness of &lt;a href="http://completerewrite.blogspot.com/2008/10/practical-stuff-models-theories.html"&gt;&lt;em&gt;models&lt;/em&gt;&lt;/a&gt;. At least, models are talked about. But actually quite few of the people who work with database systems have more than a very vague view of what data models are or what purpose they serve.
&lt;br /&gt;&lt;br /&gt;


&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_lqFQ_plhsMs/SQnapTy3fCI/AAAAAAAAAAc/imXe9XKpNOA/s1600-h/data_port.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 266px;" src="http://4.bp.blogspot.com/_lqFQ_plhsMs/SQnapTy3fCI/AAAAAAAAAAc/imXe9XKpNOA/s400/data_port.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5262978042782252066" /&gt;&lt;/a&gt;
&lt;small&gt;Photo by &lt;a href="http://flickr.com/photos/marcoarment/1961500621/"&gt;Marco Arment&lt;/a&gt; (&lt;a href="http://creativecommons.org/licenses/by/2.0/deed.en"&gt;some rights reserved&lt;/a&gt;).&lt;/small&gt;&lt;br /&gt;&lt;br /&gt;

I explained my view of what a model is in the &lt;a href="http://completerewrite.blogspot.com/2008/10/practical-stuff-models-theories.html"&gt;previous post&lt;/a&gt;. A &lt;em&gt;data model&lt;/em&gt; then, is a model for dealing with general-purpose data. Data as such is not that common in most computer users' experience these days. People usually have no need to think of the data that carries the information they work with, and systems are kind enough not to bother the users with the low-level concept of data. Only systems that deal with &lt;em&gt;any&lt;/em&gt; kind of data, regardless of what it represents, need to include the notion of &lt;em&gt;data in itself&lt;/em&gt; in the model they expose to the user. Particularly, data is a concern for database systems &amp;#x2013; which support intricate operations on general-purpose data. Hence the importance of data models in this context. &lt;br /&gt;&lt;br /&gt;

&lt;strong&gt;The Dominating Data Model&lt;/strong&gt;
&lt;br /&gt;&lt;br /&gt;

Out of the plethora of data models in use by database systems (some more vaguely defined than others), the most popular one, to the extent that we can skip all the others for the time being, is the &lt;a href="http://en.wikipedia.org/wiki/Relational_model"&gt;relational model&lt;/a&gt;. Or perhaps I should say &lt;em&gt;subset&lt;/em&gt; of the &lt;em&gt;family&lt;/em&gt; of relational models &amp;#x2013; for there are several. Even the instigator of the term, E.F. Codd, did not present a single coherent model, but seems to have invented a new slightly different variant practically every time he presented it. For this post, however, it will do to talk about the relational model as if it were one.
&lt;br /&gt;&lt;br /&gt;

An average database professional trying to explain the relational model typically starts by stating that in the relational model, data are stored in &lt;em&gt;tables&lt;/em&gt;. Not a good start. First of all, tables do not really belong in the model, they are visualizations of &lt;em&gt;relations&lt;/em&gt; &amp;#x2013; but that is a minor point.
&lt;br /&gt;&lt;br /&gt;

More importantly, &lt;em&gt;the model says nothing of storage!&lt;/em&gt; It acts as an &lt;em&gt;interface&lt;/em&gt;, for direct users and for surrounding systems that interact with the database. Behind the scenes of the relational model, the system can deal with storage in whichever way it pleases &amp;#x2013; or whichever way the database administrator configures it. There are endless possibilities for data structures and &lt;a href="http://www.dbms2.com/2008/01/31/5-kinds-of-data-structure-and-16-kinds-of-data-access-method/"&gt;access methods&lt;/a&gt;.
&lt;br /&gt;&lt;br /&gt;


&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_lqFQ_plhsMs/SQnaprJtBfI/AAAAAAAAAAk/IQtIqpEjN-c/s1600-h/pdp11.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 266px;" src="http://3.bp.blogspot.com/_lqFQ_plhsMs/SQnaprJtBfI/AAAAAAAAAAk/IQtIqpEjN-c/s400/pdp11.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5262978049052050930" /&gt;&lt;/a&gt;
&lt;small&gt;Photo by &lt;a href="http://flickr.com/photos/philaaronson/2485460774/"&gt;Phil Aaronson&lt;/a&gt; (&lt;a href="http://creativecommons.org/licenses/by/2.0/deed.en"&gt;some rights reserved&lt;/a&gt;).&lt;/small&gt;&lt;br /&gt;&lt;br /&gt;

The implementers of early database systems, such as &lt;a href="http://en.wikipedia.org/wiki/System_R"&gt;System R&lt;/a&gt;, chose to use a simple one-to-one mapping between the model and the physical storage. It was a natural choice as a first attempt. What they were doing was trying things out, not creating the ultimate system.
&lt;br /&gt;&lt;br /&gt;

But somehow, the idea of a simple connection between model and storage got stuck in people's minds, and in the major database systems on the market.
&lt;br /&gt;&lt;br /&gt;

&lt;strong&gt;Logical&amp;#x2013;Physical Separation&lt;/strong&gt;
&lt;br /&gt;&lt;br /&gt;

Recall the inverted text example from &lt;a href="http://completerewrite.blogspot.com/2008/10/text-search-and-relational-model.html"&gt;a previous post&lt;/a&gt;. Here is a variant with the less interesting words removed, and with only the document numbers where the words appear:
&lt;br /&gt;&lt;br /&gt;

&lt;center&gt;
&lt;table border="1" cellpadding="2" cellspacing="0"&gt;
&lt;tr&gt;&lt;th&gt;Word&lt;/th&gt; &lt;th&gt;Doc. no&lt;/th&gt;&lt;/tr&gt;

&lt;tr&gt;&lt;td&gt;art&lt;/td&gt; &lt;td&gt;1&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;art&lt;/td&gt; &lt;td&gt;3&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;war&lt;/td&gt; &lt;td&gt;1&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;war&lt;/td&gt; &lt;td&gt;2&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;modern&lt;/td&gt; &lt;td&gt;3&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;peace&lt;/td&gt; &lt;td&gt;2&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;/center&gt;
&lt;br /&gt;

Many people would say that this is an inefficient representation, because it contains multiple copies of the words that appear in more than one document. To make the representation more compact, they might try using a &lt;em&gt;list&lt;/em&gt; of document numbers as the second column, perhaps specified as a &amp;#x201c;LOB&amp;#x201d; &amp;#x2013; a &lt;em&gt;large object&lt;/em&gt;.
&lt;br /&gt;&lt;br /&gt;

This is exactly what major database vendors do for their text indexes, which is one of the many choices they make to go against the basic idea of the relational model. Unnecessarily so.
&lt;br /&gt;&lt;br /&gt;

Because, again, the model is not the storage. It is the user interface. The relational structure is not necessarily the same as that in the physical representation. The following figure illustrates how the logical model representation (at the top) can be tied to a compact physical disk storage (bottom).
&lt;br /&gt;&lt;br /&gt;


&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_lqFQ_plhsMs/SQnaqGo07jI/AAAAAAAAAAs/k_HTQs8qUgQ/s1600-h/logical_separation.gif"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 357px;" src="http://4.bp.blogspot.com/_lqFQ_plhsMs/SQnaqGo07jI/AAAAAAAAAAs/k_HTQs8qUgQ/s400/logical_separation.gif" border="0" alt=""id="BLOGGER_PHOTO_ID_5262978056430349874" /&gt;&lt;/a&gt;

This is just one possible representation, of course. Ideally, the physical representation should be chosen so that it makes critical operations efficient in the particular application where the database is to be used. The relational model still lets users access data in any way they please, but queries that the chosen implementation is not directly suited for may be executed less efficiently.
&lt;br /&gt;&lt;br /&gt;
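
As a rough illustration of this separation, here is a minimal Python sketch. The names &lt;code&gt;postings&lt;/code&gt;, &lt;code&gt;as_relation&lt;/code&gt;, &lt;code&gt;docs_for&lt;/code&gt;, and &lt;code&gt;words_in&lt;/code&gt; are made up for this example, and the dictionary of lists only stands in for whatever compact structure a real system might use on disk. The relation the user sees has one row per pair of word and document, while each word is physically stored once.
&lt;br /&gt;&lt;br /&gt;
&lt;pre&gt;
```python
# Hypothetical compact "physical" storage: each word is stored once,
# followed by the list of documents it appears in.
postings = {
    "art": [1, 3],
    "war": [1, 2],
    "modern": [3],
    "peace": [2],
}


def as_relation(storage):
    """Expose the storage as the logical relation: a set of (word, doc) rows."""
    return {(word, doc) for word, docs in storage.items() for doc in docs}


def docs_for(storage, word):
    """A query the chosen storage suits well: a single lookup by word."""
    return list(storage.get(word, []))


def words_in(storage, doc):
    """A query the storage suits less well: it must scan every posting list."""
    return sorted(word for word, docs in storage.items() if doc in docs)


relation = as_relation(postings)
print(len(relation))
print(docs_for(postings, "war"))
print(words_in(postings, 3))
```
&lt;/pre&gt;
Both queries are answered through the same logical view; only their cost differs, which is the trade-off described above.
&lt;br /&gt;&lt;br /&gt;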

A problem with the basic design of major relational database systems is that they do not have enough information to make a good choice of representation. It is impossible for them to obtain the information about the application that they would need.
&lt;br /&gt;&lt;br /&gt;

&lt;strong&gt;The Basis of Misunderstanding&lt;/strong&gt; &lt;br /&gt;&lt;br /&gt;

People get the wrong idea about the role of the data model partly because they start using systems without learning the basic theory, but another reason, which I consider more critical, is how the user is forced to interact with the system in order to get reasonable performance. (By &lt;em&gt;user&lt;/em&gt; in this case, I mean the person who uses the database system for implementing an application.)
&lt;br /&gt;&lt;br /&gt;

In a perfect world, the way it is supposed to work is that the user provides logical definitions of the data, and the system somehow magically chooses the best implementation. This is a tough task, not least because the system must choose the implementation &lt;em&gt;before&lt;/em&gt; it has had a chance of getting any feedback as to its use. In practice, this simply does not work.
&lt;br /&gt;&lt;br /&gt;

Hence, the user, who does have an idea about which operations need to be efficient, needs to convey more information to the system than just the logical definitions. (Or move to a &lt;a href="http://marklogic.blogspot.com/2008/08/specialized-database-argument.html"&gt;specialized database system&lt;/a&gt; for the application, if there is one.) And the only way that the user can interact with the system is through the relational interface!
&lt;br /&gt;&lt;br /&gt;

This has two effects. First, database system vendors add physical aspects to the relational interface that are really not relational or logical at all. &lt;a href="http://en.wikipedia.org/wiki/SQL"&gt;SQL&lt;/a&gt; is full of this &amp;#x2013; &lt;em&gt;indexes&lt;/em&gt; for example. There are no indexes in the relational model; they belong in the implementation.
&lt;br /&gt;&lt;br /&gt;
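
That an index is a physical matter rather than a logical one can be checked with a small Python sketch using the standard &lt;code&gt;sqlite3&lt;/code&gt; module. The table &lt;code&gt;word_doc&lt;/code&gt; and the index &lt;code&gt;word_idx&lt;/code&gt; are hypothetical names chosen for this example, and SQLite is used only because it is convenient, not because it is one of the systems discussed here.
&lt;br /&gt;&lt;br /&gt;
&lt;pre&gt;
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE word_doc (word TEXT, doc INTEGER)")
conn.executemany(
    "INSERT INTO word_doc VALUES (?, ?)",
    [("art", 1), ("art", 3), ("war", 1), ("war", 2), ("modern", 3), ("peace", 2)],
)

query = "SELECT doc FROM word_doc WHERE word = ? ORDER BY doc"

before = conn.execute(query, ("war",)).fetchall()

# A purely physical instruction: it changes how the rows can be found,
# not which rows the query returns.
conn.execute("CREATE INDEX word_idx ON word_doc (word)")

after = conn.execute(query, ("war",)).fetchall()

print(before == after)
```
&lt;/pre&gt;
The query and its answer are the same before and after the index is created; what may change is how the system finds the rows.
&lt;br /&gt;&lt;br /&gt;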

Second, the user is led into using implicit couplings between the relational definitions and the physical storage for improving performance. &lt;em&gt;Denormalization&lt;/em&gt; is a blatant example. Relational theory tells us that the relational structure of the database should be &lt;a href="http://en.wikipedia.org/wiki/Database_normalization"&gt;&lt;em&gt;normalized&lt;/em&gt;&lt;/a&gt; to avoid integrity problems. Yet, practitioners choose to do the &lt;em&gt;opposite&lt;/em&gt; because it can give them a performance advantage.
&lt;br /&gt;&lt;br /&gt;
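
The kind of integrity problem that normalization guards against can be sketched in a few lines of Python with the standard &lt;code&gt;sqlite3&lt;/code&gt; module. The &lt;code&gt;orders&lt;/code&gt; table and its contents are hypothetical, invented only for this example.
&lt;br /&gt;&lt;br /&gt;
&lt;pre&gt;
```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Denormalized: the customer's city is repeated on every order row.
conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Ann", "Lund"), (2, "Ann", "Lund"), (3, "Bob", "Malmo")],
)

# An update that reaches only one of Ann's rows leaves the data
# contradicting itself.
conn.execute("UPDATE orders SET city = 'Ystad' WHERE order_id = 1")

cities = conn.execute(
    "SELECT DISTINCT city FROM orders WHERE customer = 'Ann' ORDER BY city"
).fetchall()
print(cities)
```
&lt;/pre&gt;
After the update the database claims two different cities for the same customer. In a normalized design the city would be stored once, in a separate relation of customers, so this contradiction could not arise.
&lt;br /&gt;&lt;br /&gt;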

The result of all this is that it is difficult to spot the actual relational model in the major relational database systems.
&lt;br /&gt;&lt;br /&gt;

&lt;strong&gt;Throwing Out the Model&lt;/strong&gt;
&lt;br /&gt;&lt;br /&gt;

People have started to move away from relational database systems, mainly because of performance issues. A range of &lt;a href="http://www.databasecolumn.com/2008/02/responding-to-monash-1.html"&gt;more specialized systems&lt;/a&gt; for narrower applications is emerging. I have no objection in principle to that; using &lt;a href="http://www.databasecolumn.com/2007/09/one-size-fits-all.html"&gt;the same database system for everything&lt;/a&gt; is no end in itself.
&lt;br /&gt;&lt;br /&gt;

However, I am convinced that there will still be a place for systems that are general-purpose enough to explicitly expose &lt;em&gt;data&lt;/em&gt; to the user, without narrowing down what the data may represent, and accept complex specifications on relationships and retrieval operations. For this, a simple logical model that represents data as relations is extremely powerful.
&lt;br /&gt;&lt;br /&gt;

Unfortunately, the relational model has become so closely associated with SQL that people tend to discard the model because they are not satisfied with SQL. &lt;a href="http://cloudn.com/?p=44"&gt;The end of the relational era&lt;/a&gt; is proclaimed, when it should really be just the end of the &lt;em&gt;SQL&lt;/em&gt; era.
&lt;br /&gt;&lt;br /&gt;

I agree with the &lt;a href="http://completerewrite.blogspot.com/2008/10/why-database-masters-fail-us.html"&gt;relational evangelists&lt;/a&gt; that the reason why SQL databases are so displeasing is not that they are relational. Rather, one of their shortcomings is that they are &lt;em&gt;not&lt;/em&gt; really relational. However, I am not convinced that the &lt;a href="http://www.thethirdmanifesto.com/"&gt;&lt;em&gt;third manifesto&lt;/em&gt;&lt;/a&gt; systems that the &lt;em&gt;true relational&lt;/em&gt; lobby brings forward can take their place. The relational model itself does have some drawbacks, which I leave as a subject for future posts.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2492664706785922439-9059954231481733099?l=completerewrite.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/completerewrite/~4/AQ1arThu7qc" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://completerewrite.blogspot.com/feeds/9059954231481733099/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=2492664706785922439&amp;postID=9059954231481733099" title="3 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2492664706785922439/posts/default/9059954231481733099?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2492664706785922439/posts/default/9059954231481733099?v=2" /><link rel="alternate" type="text/html" href="http://completerewrite.blogspot.com/2008/10/models-and-efficiency-part-2-databases.html" title="Models and Efficiency, part 2: Databases" /><author><name>Jesper Larsson</name><uri>http://www.blogger.com/profile/05756120755805691277</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" 
src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/_lqFQ_plhsMs/SQnapTy3fCI/AAAAAAAAAAc/imXe9XKpNOA/s72-c/data_port.jpg" height="72" width="72" /><thr:total>3</thr:total></entry><entry gd:etag="W/&quot;D0IFSHc5eCp7ImA9WxRVGUQ.&quot;"><id>tag:blogger.com,1999:blog-2492664706785922439.post-5643093427949941838</id><published>2008-10-24T15:46:00.011+02:00</published><updated>2008-11-18T09:05:19.920+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-11-18T09:05:19.920+01:00</app:edited><title>Practical Stuff: Models, Theories, Abstractions</title><content type="html">by &lt;a href="http://www.blogger.com/profile/05756120755805691277"&gt;Jesper Larsson&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;

They are usually mentioned only in a few fields, but
&lt;em&gt;models&lt;/em&gt; are actually everywhere in computing. In fact, they are
the key concept that lets computers do things that people find
interesting. &lt;br /&gt;&lt;br /&gt;

The word &lt;em&gt;model&lt;/em&gt; can mean many things, some of which are shown
in the collage below. I am not talking about any of those. Nor
am I talking about the technical use in logic (an assignment of
constants to variables that makes a sentence true), despite the close
tie between logic and computing &amp;#x2013; particularly databases.
&lt;br /&gt;&lt;br /&gt;


&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_lqFQ_plhsMs/SQHSrcfIXBI/AAAAAAAAAAM/eS0ckIeFQeM/s1600-h/collage.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 200px;" src="http://1.bp.blogspot.com/_lqFQ_plhsMs/SQHSrcfIXBI/AAAAAAAAAAM/eS0ckIeFQeM/s400/collage.jpg" border="0" alt="Models"id="BLOGGER_PHOTO_ID_5260717483568421906" /&gt;&lt;/a&gt;
&lt;small&gt;Photos by 

&lt;a href="http://flickr.com/photos/ntlam/2323563730/"&gt;Nguyễn Thành Lam&lt;/a&gt;,
&lt;a href="http://flickr.com/photos/whiteafrican/2868174022/"&gt;Erik Hersman&lt;/a&gt;,
&lt;a href="http://flickr.com/photos/smith/81141/"&gt;Richard Smith&lt;/a&gt;,
&lt;a href="http://flickr.com/photos/mazzuk/1481457389/"&gt;MaZzuk&lt;/a&gt;

(&lt;a href="http://creativecommons.org/licenses/by/2.0/deed.en"&gt;some
rights reserved&lt;/a&gt;).&lt;/small&gt;&lt;br /&gt; &lt;br /&gt;

No, I am talking about a model in the sense of an
&lt;em&gt;abstraction&lt;/em&gt;. A simplified representation of something. A
framework for naming and envisioning non-physical objects to allow
reasoning about them and shaping &lt;em&gt;theories&lt;/em&gt; about them. Often
by use of &lt;em&gt;metaphors&lt;/em&gt; &amp;#x2013; analogous concepts picked from a
well known domain.&lt;br /&gt;&lt;br /&gt;

The picture below is an example of a sort of abstract model that most
computer users encounter. It shows a set of &lt;em&gt;folders&lt;/em&gt;, or
&lt;em&gt;files&lt;/em&gt;, which in turn are the contents of &lt;em&gt;another&lt;/em&gt;
folder. But of course, they are not really folders. Folders are made
of paper and kept in file cabinets. These are just images on the
screen that &lt;em&gt;symbolize&lt;/em&gt; something that we &lt;em&gt;think&lt;/em&gt; about
as &lt;em&gt;analogous&lt;/em&gt; to real folders. &lt;br /&gt;&lt;br /&gt;

&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_lqFQ_plhsMs/SQHSr73_6yI/AAAAAAAAAAU/FOzeeCH5lZ4/s1600-h/folders.gif"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 285px;" src="http://2.bp.blogspot.com/_lqFQ_plhsMs/SQHSr73_6yI/AAAAAAAAAAU/FOzeeCH5lZ4/s400/folders.gif" border="0" alt="Folders" id="BLOGGER_PHOTO_ID_5260717491994225442" /&gt;&lt;/a&gt;



The physical constructs behind the abstraction are not something the
average user needs to be concerned with. In fact, the folders can, and
do in this case, represent quite different things. Some of the folders
in the picture are things that are kept on my local hard drive, but
some are network connections. Since they, from a certain aspect, all
support the same set of operations, we can interact with them using
the simplified view that they are a homogeneous set of objects. That
is part of the model of this file browser interface. &lt;br /&gt;&lt;br /&gt;

Computers operate through level upon level of models, starting (unless
you want to get metaphysical) with voltages in electrical circuits,
progressing to boolean logic, machine language, and so on, and perhaps
via one or several &lt;a
href="http://en.wikipedia.org/wiki/Virtual_machine"&gt;&lt;em&gt;virtual
machines&lt;/em&gt;&lt;/a&gt;, up to, finally, the model that the user of the
program understands. &lt;br /&gt;&lt;br /&gt;

&lt;strong&gt;Intuitive Models&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;

Models were not invented for computing. Everybody thinks intuitively
in levels of abstraction. Generally, people have no problem
accepting that if you look under the surface of the simplified
notions we treat as objects or concepts &amp;#x2013; &lt;em&gt;things&lt;/em&gt; of
various kinds &amp;#x2013; you find a more fine-grained physical level of
existence, which is quite different from what we usually perceive.
&lt;br /&gt;&lt;br /&gt;

But strangely enough, some people, computing professionals in fact,
sometimes fail to understand that the model can be independent
of the lower physical level. I have suffered from this myself, and
perhaps I still do in some areas, but I have gotten a lot better over
the last few years. &lt;br /&gt;&lt;br /&gt;

&lt;strong&gt;Programming Languages&lt;/strong&gt;
&lt;br /&gt;&lt;br /&gt;

When I have told people that I develop high-performance systems in
Java, I have often (less now than a few years ago) had the surprised
reaction &amp;#x201c;but Java is so slow, how can you get it
fast enough?&amp;#x201d; &lt;br /&gt;&lt;br /&gt;

Java is a language which, like other programming languages, expresses
sequences of logical (in a wider sense) operations. How could a
language possibly be slow, or fast for that matter?
Consider the following Java code that calculates a number in the
Fibonacci sequence: &lt;br /&gt;

&lt;pre&gt;    long s = 0, t = 1;
    for (int i = 0; i &lt; n; ++i) {
        long u = s + t;
        s = t;
        t = u;
    }&lt;/pre&gt;
&lt;br /&gt;

Now, take a look at the equivalent code in C++: &lt;br /&gt;

&lt;pre&gt;    long s = 0, t = 1;
    for (int i = 0; i &lt; n; ++i) {
        long u = s + t;
        s = t;
        t = u;
    }&lt;/pre&gt;
&lt;br /&gt;

Which one is faster? How can &lt;em&gt;either&lt;/em&gt; of them be faster? They are
exactly the same! (Java and C++ are closely related languages with the
same basic syntactic elements.)
&lt;br /&gt;&lt;br /&gt;

Now, perhaps you think I am just being difficult. Obviously, what
people mean when they say that Java is slower than C++ is not that the
language per se is slow. It is that code compiled with a Java compiler
and executed in a Java runtime environment is slower than the same
code compiled by a C++ compiler and run by the operating system
directly. This is because the Java runtime environment is a &lt;a
href="http://en.wikipedia.org/wiki/Virtual_machine"&gt;&lt;em&gt;virtual
machine&lt;/em&gt;&lt;/a&gt; &amp;#x2013; an &lt;em&gt;interpreted&lt;/em&gt; system. It is a
&lt;em&gt;program&lt;/em&gt; that looks at one instruction at a time and changes
its state accordingly, just like the physical machine that the
operating system is running on. This extra level adds overhead. &lt;br
/&gt;&lt;br /&gt;

Is this true, then?
&lt;br /&gt;&lt;br /&gt;

It certainly &lt;em&gt;was&lt;/em&gt; true &amp;#x2013; about ten years ago. But then
&lt;em&gt;just-in-time compiling&lt;/em&gt; came into use. Virtual machines with
this feature did not execute instructions directly, but compiled them
to machine code the first time they saw them, and fed the machine code
into the physical machine. This could get almost as fast as compiling
directly, but it still had the overhead of compiling at runtime.
&lt;br /&gt;&lt;br /&gt;

Finally, the &lt;a
href="http://en.wikipedia.org/wiki/HotSpot"&gt;HotSpot&lt;/a&gt; virtual
machine arrived. It did not just compile everything it saw. It
gathered statistics about which parts of the code were run more often,
and made a greater effort to produce efficient code for the more
frequent parts.
&lt;br /&gt;&lt;br /&gt;

About five years ago, when I got into a discussion with someone who
claimed that Java was inevitably slower than C++ for tight loops, I
did a comparison between Java and C++ execution, on tight-loop code
similar to the Fibonacci calculation code above. The Java code
executed in the HotSpot virtual machine was significantly faster than
the C++ code compiled by &lt;a
href="http://en.wikipedia.org/wiki/GNU_Compiler_Collection"&gt;GCC&lt;/a&gt;
with full optimization and run directly on the physical machine.
&lt;br /&gt;&lt;br /&gt;

The explanation is that the Java environment, doing the compiling at
runtime, has more information at its disposal than the C++ compiler.
Hence, it is able to generate more efficient machine code.
&lt;br /&gt;&lt;br /&gt;

This was (and still is, I imagine) astonishing to a lot of people,
stuck with the interpretation overhead stigma.
&lt;br /&gt;&lt;br /&gt;

The point of all this is that the language &amp;#x2013; the model, so to
speak &amp;#x2013; does not determine whether executing code is efficient
or not. The language is just a way of expressing transitions from one
machine state to the next. What ultimately should matter for
efficiency is the computational complexity of computing the next
state. There is, in theory, absolutely no reason why code expressed in
one model should be less efficient than another. If the system is
smart enough, it can execute it in the most efficient way anyway. This
is the true meaning of the word &lt;em&gt;optimization&lt;/em&gt;. &lt;br /&gt;&lt;br /&gt;

What is called optimization in practice is not really optimization,
because it rarely finds the optimal code. It just finds code that is,
hopefully, better than a naive translation from high level language to
machine code. I suppose true optimization in general is impossible. (I
have not found an actual proof, but it feels like it would be related
to computing &lt;a
href="http://en.wikipedia.org/wiki/Kolmogorov_complexity"&gt;Kolmogorov
complexity&lt;/a&gt;.) However, in every case where the system can recognize
a piece of code that expresses something it has an efficient execution
method for, it can use that method. &lt;br /&gt;&lt;br /&gt;

The evolution of Java systems is an excellent example of how an
improved implementation can increase performance without changing the
model.
&lt;br /&gt;&lt;br /&gt;

&lt;strong&gt;Next Time: Data Models&lt;/strong&gt;
&lt;br /&gt;&lt;br /&gt;

The main point that I want to make about misunderstood models has, as
you might expect (and as promised at the end of the &lt;a
href="http://completerewrite.blogspot.com/2008/10/text-search-and-relational-model.html"&gt;previous
post&lt;/a&gt;), to do with database systems. But this post is already quite
long, and I do not want to cram in the most important subject at the
end. You may already have guessed the essence of it, but for the
details I again have to refer you to &lt;a href="http://completerewrite.blogspot.com/2008/10/models-and-efficiency-part-2-databases.html"&gt;a future post&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2492664706785922439-5643093427949941838?l=completerewrite.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/completerewrite/~4/ry-3HJfj6nU" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://completerewrite.blogspot.com/feeds/5643093427949941838/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=2492664706785922439&amp;postID=5643093427949941838" title="6 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2492664706785922439/posts/default/5643093427949941838?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2492664706785922439/posts/default/5643093427949941838?v=2" /><link rel="alternate" type="text/html" href="http://completerewrite.blogspot.com/2008/10/practical-stuff-models-theories.html" title="Practical Stuff: Models, Theories, Abstractions" /><author><name>Jesper Larsson</name><uri>http://www.blogger.com/profile/05756120755805691277</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/_lqFQ_plhsMs/SQHSrcfIXBI/AAAAAAAAAAM/eS0ckIeFQeM/s72-c/collage.jpg" height="72" width="72" /><thr:total>6</thr:total></entry><entry 
gd:etag="W/&quot;D0QHRn47eSp7ImA9WxRVGUQ.&quot;"><id>tag:blogger.com,1999:blog-2492664706785922439.post-4021104874982905769</id><published>2008-10-20T11:20:00.007+02:00</published><updated>2008-11-18T09:02:17.001+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-11-18T09:02:17.001+01:00</app:edited><title>Text Search and the Relational Model</title><content type="html">by &lt;a href="http://www.blogger.com/profile/05756120755805691277"&gt;Jesper Larsson&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;

About seven years ago, I gave a presentation to a large group of
potential customers about the text search product that was our big
thing at the time. I explained that our storage was more compact than
in the &lt;a
href="http://en.wikipedia.org/wiki/Relational_model"&gt;relational
model&lt;/a&gt;. I was &lt;em&gt;so&lt;/em&gt; wrong.&lt;br /&gt;&lt;br /&gt;

Now I know better. Unfortunately, most of the industry and much of
academia does not. People get away with saying the things that I did
without being laughed at, just as easily as ever.&lt;br /&gt;&lt;br /&gt;

Not understanding what the relational model really is, people take it
to be what the major relational database systems do. Hence, when
people cannot get what they need from those systems, they conclude
that the relational model is not suitable for their needs. So, they
throw away one of the most powerful abstractions for data processing
without even looking at it.&lt;br /&gt;&lt;br /&gt;

It is particularly common to write off the relational model in the
&lt;em&gt;text search&lt;/em&gt; field. You can frequently hear people say
things like &lt;em&gt;the relational model is not suitable for capturing the
structure of text&lt;/em&gt;, or refer to text by the peculiar term
&amp;#x201c;&lt;a
href="http://en.wikipedia.org/wiki/Unstructured_data"&gt;unstructured
data&lt;/a&gt;&amp;#x201d;. Some writers talk about &lt;em&gt;combining&lt;/em&gt; text and
relational data, as if there were a contradiction. As if
&lt;em&gt;relational data&lt;/em&gt; were a special kind of data. A more correct
account, like that of &lt;a
href="http://www.dbms2.com/2005/12/09/relational-dbms-versus-text-data/"&gt;Curt
Monash&lt;/a&gt;, is to simply note that text does not work very well with
the architecture of mainstream relational products &amp;#x2013; which is
true. Text search applications based on major database systems usually
turn out painfully inefficient. This is one of many points where the
architecture of those systems does not match the demands of the Internet
world. Database system vendors are unable to adapt, which has created
a separate market for search engines. &lt;br /&gt;&lt;br /&gt;

But that avoids the core issue. Since text search is one of my top
areas of expertise, I hope I can explain to you why the relational
model is perfectly capable of capturing the structure of text. I'll
start at the very bottom, explaining what text search really is.
&lt;br /&gt;&lt;br /&gt;

&lt;strong&gt;Searching for Words&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;

Boiled down to the most fundamental formulation, the text search
problem is to answer the query &amp;#x201c;which documents match the text
I give you?&amp;#x201d; The result is a set of document identifiers of
some kind &amp;#x2013; titles, file names, internet addresses or
something, depending on the application. Let us assume, for
simplicity, that each document has a unique number that you use for
identifying it.&lt;br /&gt;&lt;br /&gt;

Then you might have, for example, a database such as this one:
&lt;br /&gt;&lt;br /&gt;

&lt;center&gt;
&lt;table border="1" cellpadding="2" cellspacing="0"&gt;
&lt;tr&gt;&lt;th&gt;Text&lt;/th&gt;  &lt;th&gt;Doc. no&lt;/th&gt;&lt;/tr&gt;

&lt;tr&gt;&lt;td&gt;war and peace&lt;/td&gt; &lt;td&gt;1&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;art of war&lt;/td&gt;  &lt;td&gt;2&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;modern art&lt;/td&gt;  &lt;td&gt;3&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;/center&gt;
&lt;br /&gt;

The &lt;em&gt;text&lt;/em&gt; data here may not be something that you would
normally consider a &lt;em&gt;document&lt;/em&gt;, but that is just because I want
to keep the example small. The same principles apply if the texts are
hundreds, thousands, or millions of words long, and if there are
millions of documents.&lt;br /&gt;&lt;br /&gt;

Say that you want to search for a single word, &lt;em&gt;war&lt;/em&gt; for
instance. Scanning through all the text of all the documents would
probably not be efficient enough for a real-life database, so you need
some sort of &lt;em&gt;data structure&lt;/em&gt; to make searching a quicker task.
&lt;br /&gt;&lt;br /&gt;

A method that you are sure to be familiar with, even if you know
nothing about computing, is to line up items alphabetically to allow
searching for something without looking through all the data. A
computer program can use &lt;a
href="http://en.wikipedia.org/wiki/Binary_search"&gt;&lt;em&gt;binary
search&lt;/em&gt;&lt;/a&gt; to locate something quickly in an ordered list, or it
can build a &lt;em&gt;search tree&lt;/em&gt; that allows the same sequences of
comparisons without having to physically align the sorted items. (One
out of a bunch of standard search methods at the programmer's
disposal.)&lt;br /&gt;&lt;br /&gt;

When you create an &lt;em&gt;index&lt;/em&gt; for a data field (a table column) in
a common &lt;a
href="http://en.wikipedia.org/wiki/Relational_database"&gt;relational
database&lt;/a&gt; system, this is essentially what happens. Most commonly,
some sort of &lt;a href="http://en.wikipedia.org/wiki/B-tree"&gt;B-tree&lt;/a&gt;
is created, and subsequent requests for records based on that field
can use the tree to speed up finding the right records.&lt;br /&gt;&lt;br /&gt;

So, an index for a text field can be visualized as another table, where the rows are ordered alphabetically:
&lt;br /&gt;&lt;br /&gt;

&lt;center&gt;
&lt;table border="1" cellpadding="2" cellspacing="0"&gt;
&lt;tr&gt;&lt;th&gt;Text&lt;/th&gt;  &lt;th&gt;Doc. no&lt;/th&gt;&lt;/tr&gt;

&lt;tr&gt;&lt;td&gt;art of war&lt;/td&gt;  &lt;td&gt;2&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;modern art&lt;/td&gt;  &lt;td&gt;3&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;war and peace&lt;/td&gt; &lt;td&gt;1&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;/center&gt;
&lt;br /&gt;

This lets us find document no&amp;nbsp;1 quickly by looking for
&lt;em&gt;war&lt;/em&gt; in the ordered text field. But what about
document&amp;nbsp;2? It contains &lt;em&gt;war&lt;/em&gt; as well, but the index is no
help since the word is not at the beginning.&lt;br /&gt;&lt;br /&gt;
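
To make the limitation concrete, here is a minimal sketch in Java of a
lookup in the ordered index above. It is only an illustration, not how
any particular database product implements its indexes; the class
&lt;em&gt;OrderedTextIndex&lt;/em&gt; and the method &lt;em&gt;findByPrefix&lt;/em&gt; are names I
made up for the example. Binary search locates the row where the search
word would be inserted, so it can only find texts that &lt;em&gt;begin&lt;/em&gt; with
the word. &lt;br /&gt;

```java
import java.util.Arrays;

// Hypothetical illustration of the ordered index table shown above.
final class OrderedTextIndex {
    // The index rows, already sorted alphabetically by text.
    static final String[] TEXTS = {"art of war", "modern art", "war and peace"};
    static final int[] DOC_NOS = {2, 3, 1};

    // Returns the document number of the first row whose text starts with
    // the given prefix, or -1 if no text starts with it.
    static int findByPrefix(String prefix) {
        int pos = Arrays.binarySearch(TEXTS, prefix);
        // A negative result encodes the insertion point as -(point) - 1.
        int row = pos >= 0 ? pos : -pos - 1;
        if (row == TEXTS.length) {
            return -1;
        }
        return TEXTS[row].startsWith(prefix) ? DOC_NOS[row] : -1;
    }

    public static void main(String[] args) {
        System.out.println(findByPrefix("war"));    // 1: document 1 begins with the word
        System.out.println(findByPrefix("modern")); // 3
        System.out.println(findByPrefix("of"));     // -1: no text begins with "of"
    }
}
```

Note that the search for &lt;em&gt;war&lt;/em&gt; reports document&amp;nbsp;1 only. Nothing in
the ordering leads the search to document&amp;nbsp;2, where the word appears
last. &lt;br /&gt;&lt;br /&gt;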

This is where common database systems give up and tell you to stop
using the relational model. Instead, they supply you with a special
kind of &lt;em&gt;text index&lt;/em&gt;, that can be invoked either with a special
operator to specify that a record &lt;em&gt;contains&lt;/em&gt; a word, or using a
hierarchical programming interface.&lt;br /&gt;&lt;br /&gt;

You may find it a bit surprising that they give up so easily, because
it should be obvious to almost anyone, including major database
vendors, that the standard text search practice of &lt;em&gt;inverting&lt;/em&gt;
the data can be applied like this:&lt;br /&gt;&lt;br /&gt;

&lt;center&gt;
&lt;table border="1" cellpadding="2" cellspacing="0"&gt;
&lt;tr&gt;&lt;th&gt;Word&lt;/th&gt; &lt;th&gt;Doc. no&lt;/th&gt; &lt;th&gt;Position&lt;/th&gt;&lt;/tr&gt;

&lt;tr&gt;&lt;td&gt;and&lt;/td&gt; &lt;td&gt;1&lt;/td&gt; &lt;td&gt;2&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;art&lt;/td&gt; &lt;td&gt;2&lt;/td&gt; &lt;td&gt;1&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;art&lt;/td&gt; &lt;td&gt;3&lt;/td&gt; &lt;td&gt;2&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;modern&lt;/td&gt; &lt;td&gt;3&lt;/td&gt; &lt;td&gt;1&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;of&lt;/td&gt;  &lt;td&gt;2&lt;/td&gt; &lt;td&gt;2&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;peace&lt;/td&gt; &lt;td&gt;1&lt;/td&gt; &lt;td&gt;3&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;war&lt;/td&gt; &lt;td&gt;1&lt;/td&gt; &lt;td&gt;1&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;war&lt;/td&gt; &lt;td&gt;2&lt;/td&gt; &lt;td&gt;3&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;/center&gt;
&lt;br /&gt;

This table tells us, for each word, in what position in what document
the word can be found. In fact, it conveys &lt;em&gt;exactly the same
information&lt;/em&gt; as the original table. The difference is that with
this organization, it is simple to express queries such as
&amp;#x201c;which documents contain the word &lt;em&gt;war&lt;/em&gt;&amp;#x201d; using
just a normal relational language. No special operators, no
non-relational interface.&lt;br /&gt;&lt;br /&gt;
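
As a rough sketch of what this means, the Java code below holds the
word table as plain rows and expresses the query as a restriction on
the word followed by a projection on the document number. It also
rebuilds the text of a document from the rows, to show that no
information was lost. The names &lt;em&gt;WordTable&lt;/em&gt;, &lt;em&gt;Row&lt;/em&gt;,
&lt;em&gt;documentsContaining&lt;/em&gt; and &lt;em&gt;textOf&lt;/em&gt; are my own inventions for
this example, the streams merely stand in for a relational language,
and records require Java&amp;nbsp;16 or later. &lt;br /&gt;

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.Collectors;

// Hypothetical illustration of the word table shown above.
final class WordTable {
    // One row of the table: (word, document number, position).
    record Row(String word, int docNo, int position) {}

    static final Row[] ROWS = {
        new Row("and", 1, 2), new Row("art", 2, 1), new Row("art", 3, 2),
        new Row("modern", 3, 1), new Row("of", 2, 2), new Row("peace", 1, 3),
        new Row("war", 1, 1), new Row("war", 2, 3),
    };

    // Restriction on the word, then projection on the document number.
    static int[] documentsContaining(String word) {
        return Arrays.stream(ROWS)
                .filter(r -> r.word().equals(word))
                .mapToInt(Row::docNo)
                .distinct()
                .sorted()
                .toArray();
    }

    // Rebuilds the original text of one document by ordering its rows
    // by position and joining the words with spaces.
    static String textOf(int docNo) {
        return Arrays.stream(ROWS)
                .filter(r -> r.docNo() == docNo)
                .sorted(Comparator.comparingInt(Row::position))
                .map(Row::word)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(documentsContaining("war"))); // [1, 2]
        System.out.println(textOf(2));                                   // art of war
    }
}
```

This time the query for &lt;em&gt;war&lt;/em&gt; finds both document&amp;nbsp;1 and
document&amp;nbsp;2. &lt;br /&gt;&lt;br /&gt;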

In addition, unlike a specialized text index, it allows you to express
a bunch of other queries about the words and the documents. This is
one of the strengths of the relational model: it uses a small, fixed,
set of operators, but still lets you formulate practically any query
that you can think of.&lt;br /&gt;&lt;br /&gt;

Say that you want to find which documents contain both &lt;em&gt;war&lt;/em&gt;
and &lt;em&gt;art&lt;/em&gt;. In a relational language, this is easily expressed
as the intersection between the documents containing &lt;em&gt;war&lt;/em&gt; and
those containing &lt;em&gt;art&lt;/em&gt;. (There is actually also a more
intriguing way to express it, which I will get to in a future
post.) By contrast, if your word query construct is a special
&lt;em&gt;contains&lt;/em&gt; operator, you need to add support for a more complex
argument than just a word. For instance, you could say something like
&amp;#x201c;contains(war and art).&amp;#x201d; The argument string,
&amp;#x201c;war and art&amp;#x201d;, is something that the implementation of
the &lt;em&gt;contains&lt;/em&gt; operator has to parse and make its own kind of
sense of. In effect, this means that you have to have a whole separate
query language for the &lt;em&gt;contains&lt;/em&gt; argument alone!&lt;br /&gt;&lt;br /&gt;
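
For comparison, here is a minimal Java sketch of the intersection
approach, again with made-up names (&lt;em&gt;ConjunctiveQuery&lt;/em&gt;,
&lt;em&gt;documentsContaining&lt;/em&gt;, &lt;em&gt;intersect&lt;/em&gt;) and with streams standing
in for a relational language. The point is that the two-word query is
composed from the same single-word query and a general-purpose
operator, without any string that needs separate parsing. &lt;br /&gt;

```java
import java.util.Arrays;

// Hypothetical illustration: a two-word query as an intersection.
final class ConjunctiveQuery {
    // One row of the word table: (word, document number, position).
    record Row(String word, int docNo, int position) {}

    static final Row[] ROWS = {
        new Row("and", 1, 2), new Row("art", 2, 1), new Row("art", 3, 2),
        new Row("modern", 3, 1), new Row("of", 2, 2), new Row("peace", 1, 3),
        new Row("war", 1, 1), new Row("war", 2, 3),
    };

    // The single-word query: document numbers of rows with the given word.
    static int[] documentsContaining(String word) {
        return Arrays.stream(ROWS)
                .filter(r -> r.word().equals(word))
                .mapToInt(Row::docNo)
                .distinct()
                .sorted()
                .toArray();
    }

    // Set intersection of two arrays of document numbers.
    static int[] intersect(int[] left, int[] right) {
        return Arrays.stream(left)
                .filter(d -> Arrays.stream(right).anyMatch(r -> r == d))
                .toArray();
    }

    public static void main(String[] args) {
        int[] both = intersect(documentsContaining("war"), documentsContaining("art"));
        System.out.println(Arrays.toString(both)); // [2]
    }
}
```

Adding a third word, or replacing the intersection with a union or a
difference, needs no new query syntax, only another application of an
operator that is already there. &lt;br /&gt;&lt;br /&gt;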

This should really be enough to convince anyone that the relational
model &lt;em&gt;is&lt;/em&gt; suitable for capturing the structure of text.
&lt;br /&gt;&lt;br /&gt;

To be fair, we should note that there is no way to obtain the
word-oriented organization from the document-oriented one using
standard relational syntax. This is because it depends on a particular
interpretation of the &lt;em&gt;text&lt;/em&gt; field in the original table
&amp;#x2013; that it consists of space-delimited words. So you either have
to present the system with the word-oriented data as input, or include
some way of specifying how the &lt;em&gt;text&lt;/em&gt; field is to be broken
down to multiple word values.&lt;br /&gt;&lt;br /&gt;
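
Such a specification does not have to be complicated. The following
Java sketch breaks each text down at whitespace and produces the rows
of the word table, in the same order as shown earlier. The names
(&lt;em&gt;TextInverter&lt;/em&gt;, &lt;em&gt;wordsOf&lt;/em&gt;, &lt;em&gt;invert&lt;/em&gt;) are hypothetical,
and the splitting rule is an assumption: it presumes non-empty texts
whose words are separated by whitespace, whereas a real system would
also have to deal with punctuation, letter case, and so on. &lt;br /&gt;

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration: deriving the word table from the documents.
final class TextInverter {
    record Document(int docNo, String text) {}
    record Row(String word, int docNo, int position) {}

    // Assumption: words are delimited by whitespace and the text is non-empty.
    static String[] wordsOf(String text) {
        return text.trim().split("\\s+");
    }

    // Produces one row per word occurrence, ordered by word, then by
    // document number, then by position (positions start at 1).
    static Row[] invert(Document[] documents) {
        int total = 0;
        for (Document d : documents) {
            total += wordsOf(d.text()).length;
        }
        Row[] rows = new Row[total];
        int next = 0;
        for (Document d : documents) {
            int position = 1;
            for (String word : wordsOf(d.text())) {
                rows[next++] = new Row(word, d.docNo(), position++);
            }
        }
        Arrays.sort(rows, Comparator.comparing(Row::word)
                .thenComparingInt(Row::docNo)
                .thenComparingInt(Row::position));
        return rows;
    }

    public static void main(String[] args) {
        Document[] documents = {
            new Document(1, "war and peace"),
            new Document(2, "art of war"),
            new Document(3, "modern art"),
        };
        for (Row row : invert(documents)) {
            System.out.println(row.word() + " " + row.docNo() + " " + row.position());
        }
    }
}
```

Run on the three example documents, this prints the eight rows of the
word table, from &lt;em&gt;and 1 2&lt;/em&gt; to &lt;em&gt;war 2 3&lt;/em&gt;. &lt;br /&gt;&lt;br /&gt;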

&lt;strong&gt;Model vs Implementation&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;

Indexing by inverting text data is the basis of pretty much all
information retrieval. It is how the text indexes of the database
products work below the surface, and it is the underlying method for
most search engines you find on the Internet &amp;#x2013; large and small.
(Although there is a rumor, supported by a peculiar change of
behavior, that Google recently changed their basic search method to
something completely different.)&lt;br /&gt;&lt;br /&gt;

So why don't the major relational database systems expose their text
indexes in a relational way? The answer lies in the core problem
shared by all those systems, as far back as &lt;a
href="http://en.wikipedia.org/wiki/System_R"&gt;System R&lt;/a&gt;: the lack of
separation between that which is logical and that which is physical,
between model and implementation.&lt;br /&gt;&lt;br /&gt;

That is a subject that deserves to be treated as something more than a
side note. It will be the subject of &lt;a href="http://completerewrite.blogspot.com/2008/10/models-and-efficiency-part-2-databases.html"&gt;an upcoming post&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2492664706785922439-4021104874982905769?l=completerewrite.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/completerewrite/~4/usEg638TKZI" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://completerewrite.blogspot.com/feeds/4021104874982905769/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=2492664706785922439&amp;postID=4021104874982905769" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2492664706785922439/posts/default/4021104874982905769?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2492664706785922439/posts/default/4021104874982905769?v=2" /><link rel="alternate" type="text/html" href="http://completerewrite.blogspot.com/2008/10/text-search-and-relational-model.html" title="Text Search and the Relational Model" /><author><name>Jesper Larsson</name><uri>http://www.blogger.com/profile/05756120755805691277</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>2</thr:total></entry><entry gd:etag="W/&quot;DU8AQHs5cSp7ImA9WxRXGEg.&quot;"><id>tag:blogger.com,1999:blog-2492664706785922439.post-5711035817389523489</id><published>2008-10-03T15:54:00.013+02:00</published><updated>2008-10-24T16:17:21.529+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-10-24T16:17:21.529+02:00</app:edited><title>Why the Database Masters Fail Us</title><content type="html">by &lt;a 
href="http://www.blogger.com/profile/05756120755805691277"&gt;Jesper Larsson&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;

There is hardly any field in computing more plagued by religious wars
than database systems. For nearly 40 years, the battle has raged
between various architectures, occasionally with some combatants
replaced &amp;#x2013; or at least renamed. Still, it seems that we are
further away than ever from a sort of database that we can be
satisfied with. (I am not going into the details of the problems right
now; that will be a subject of future posts. But if you are in the
business I am sure you are familiar with some of them.)&lt;br /&gt;&lt;br /&gt;

Let us take a couple of steps back and take a look at the
situation. Let us forget, for the moment, our personal stance in the
fight on data models, platforms etc., and ask ourselves: who do we
depend on to design database systems? What is driving them? Could they
do better? Could &lt;em&gt;we&lt;/em&gt; do better, with a different kind of
effort?&lt;br /&gt;&lt;br /&gt;

I have come up with three groups of operators that influence the
design of database platforms. I call them the &lt;em&gt;paper writers&lt;/em&gt;,
the &lt;em&gt;evangelists&lt;/em&gt;, and the &lt;em&gt;merchants&lt;/em&gt;. Anyone who
creates the actual code that makes up the database systems is a
servant of one or more of these masters. Let us go over them one by
one.&lt;br /&gt;&lt;br /&gt;

&lt;strong&gt;The Paper Writers&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;

What would be a better source of knowledge to base our system design
on than science? According to popular view, scientists, or more
specifically academic researchers, have the task of advancing human
knowledge. Using objective observation and rational reasoning, they
find the ultimate truth, unaffected by fads or short-sighted
economics.&lt;br /&gt;&lt;br /&gt;

Unfortunately, it does not quite work that way.&lt;br /&gt;&lt;br /&gt;

The primary concern of most academic researchers is to produce
&lt;em&gt;papers&lt;/em&gt; &amp;#x2013; articles published in conference proceedings
or scientific journals. Published papers are what they are judged by,
the most important merit for their Ph.D. degrees and research grants.
&lt;br /&gt;&lt;br /&gt;

Consequently, academic researchers learn to become experts on getting
papers published. They have an ambition to advance human knowledge
too, but that is a far-fetched goal that only established stars can
afford to have as their first priority. In daily work, the immediate
focus of most researchers is to impress &lt;em&gt;peer reviewers&lt;/em&gt;
&amp;#x2013; people in their own field who decide whether papers get
accepted or not.&lt;br /&gt;&lt;br /&gt;

For several reasons, this makes scientific work less useful for
practitioners than you might expect.&lt;br /&gt;&lt;br /&gt;

First, it influences which subjects get explored. Researchers tend to
pick subjects that are currently in fashion, for which papers are in
demand. The result is that current trends have a large impact on what
subjects people choose to work with.&lt;br /&gt;&lt;br /&gt;

Second, it has an effect on the language of published research. Since
writers and reviewers are actually the same people &amp;#x2013; the
researchers involved with the field &amp;#x2013; writers use language
meant to be understood and judged as appropriate by other people like
themselves. This has a self-amplifying effect, with the result that
publications often seem impenetrable or irrelevant to people outside
the field.&lt;br /&gt;&lt;br /&gt;

Third, once the papers are published, the work is finished as far as
the researcher is concerned. Few researchers bother to take their
findings any further.&lt;br /&gt;&lt;br /&gt;

The consequence is that academic research rarely gives us
comprehensible knowledge of how to design a system. What we mostly get
are thousands of fragments of potentially useful knowledge, clumped
together in bursts around subjects that are popular over a few years,
and usually presented in a language that is difficult to penetrate.
&lt;br /&gt;&lt;br /&gt;

Many research projects include implementation of actual software
systems to test or demonstrate research findings. Sometimes they
develop into industrially useful ones, but only rarely. Research
systems are typically not fully functional or efficient enough for
general practical use, and it would hardly be fair to expect them to
be, especially for software as large and complex as modern database
systems are. After all, the point of academic research is not to
produce ready-made systems. &lt;br /&gt;&lt;br /&gt;

&lt;strong&gt;The Evangelists&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;

There are a number of people out there who claim that they have the
correct view of how a database should be constructed, and that
essentially everyone else is wrong. If people would just listen to
them, everything would turn out fine. Of course, oddballs, fanatics,
and charlatans exist in every field, but none of these labels quite
captures the database evangelists &amp;#x2013; at least not all of
them.&lt;br /&gt;&lt;br /&gt;

In particular there is a group of people who persistently promote the
relational model. But the relational model is already dominant in the
database world, isn't it? Is not everything fine, then? No, this
&lt;em&gt;true&lt;/em&gt; relational model lobby, with esteemed relational
database pioneer &lt;a
href="http://en.wikipedia.org/wiki/Christopher_J._Date"&gt;C.J. Date&lt;/a&gt;
as their figurehead, claims that the version of relational databases
that dominates the industry is distorted; that the
relational model is misunderstood or mistreated by practically
everyone, even to some extent its inventor &lt;a
href="http://en.wikipedia.org/wiki/Edgar_F._Codd"&gt;E.F. Codd&lt;/a&gt;!
&lt;br /&gt;&lt;br /&gt;

I may have made this sound a little more eccentric than it deserves.
The fact is, I agree in principle with most of what Date and his allies
say. However, even if they are correct about the data model, they do
not have all the solutions needed to create a full database system. In
their fervor to promote the &lt;em&gt;true relational&lt;/em&gt; model, there are
a number of problems that they de-emphasize or do not address at all.
Hardware utilization and efficiency, for instance, they write off as
&lt;a
href="http://en.wikipedia.org/wiki/Somebody_Else%27s_Problem"&gt;somebody
else's problem&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;

On the other hand, an extreme pragmatic lobby has recently emerged,
centered around &lt;a
href="http://en.wikipedia.org/wiki/Michael_Stonebraker"&gt;Michael
Stonebraker&lt;/a&gt;, another veteran of the database field. Although
Stonebraker's thesis that &lt;a
href="http://dblp.uni-trier.de/rec/bibtex/conf/vldb/StonebrakerMAHHH07"&gt;it's
time for a complete rewrite&lt;/a&gt; of database products could plausibly
be supported by C.J. Date, Stonebraker could not care less about
purifying the relational model. He currently endorses abandoning the
very idea of general-purpose database systems for specialized,
application-specific solutions.&lt;br /&gt;&lt;br /&gt;

The database evangelists have an influence through their writings as
well as through their contacts with implementation projects in
academia or in the industry. Most of the time, their direct impact is
minor, but they may have important roles as architects of future
systems.&lt;br /&gt;&lt;br /&gt;

&lt;strong&gt;The Merchants&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;

The vendors that make their living producing database systems
obviously want to sell their products or services to as many customers
as possible. This makes them keenly monitor what the market seems to
want, and declare that this is just what they have &amp;#x2013; sometimes
adjusting their products accordingly.&lt;br /&gt;&lt;br /&gt;

On the one hand, merchants tend to be conservative, at least on the
main issues. They are frightened by radical new ideas that threaten to
be costly both to them and to their customers. Database management
platforms are heavy components in most IT infrastructures, coupled
with large investments and legacy issues. Rather than improving their
systems at the core, merchants prefer to add peripheral components,
covering more and more of customers' software needs.&lt;br /&gt;&lt;br /&gt;

On the other hand, the merchants must be extremely sensitive to
trends. They have to keep up with the latest buzzwords, so as not to appear
to be falling behind the competition. Also, they are always on the
lookout for new &lt;em&gt;features&lt;/em&gt; &amp;#x2013; minor extensions that they
can use as selling points.&lt;br /&gt;&lt;br /&gt;

The result, when merchants get their way, is that once their product
has established a decent market share, it more or less stops
developing, and starts growing instead.&lt;br /&gt;&lt;br /&gt;

&lt;strong&gt;Who Will Create the Perfect Database System?&lt;/strong&gt;
&lt;br /&gt;&lt;br /&gt;

As you can see, if you roughly agree with my outline of the operators,
nobody really has both the will and the capability to create a good
database management system. How is this different from other areas of
the software industry? Simply because a database platform is such a
monumental piece of software, not only to produce, but also to use.
Changing your database system can be more demanding than replacing
your operating system. To both create and sell a major product is a
gargantuan task.&lt;br /&gt;&lt;br /&gt;

It is unlikely that any company or academic institute will successfully
take on the task of designing and producing from scratch, and then
selling to the market, a database system that is different enough to
take a major leap forward. It can happen &amp;#x2013; it has
happened before &amp;#x2013; but the way the industry has developed, it
has become a lot more difficult since the last major shift with the
relational database breakthrough in the early 1980s.&lt;br /&gt;&lt;br /&gt;

More likely, new systems will evolve slowly. People will abandon the
monolithic one-size-fits-all database systems, as Stonebraker
predicts, and create specialized lightweight solutions for various
applications. This has already happened in some areas.&lt;br /&gt;&lt;br /&gt;

However, I am convinced that there is a place for general data
management platforms in the future. It is simply too much work for
everyone to roll their own.&lt;br /&gt;&lt;br /&gt;

&lt;strong&gt;So What Am I Selling?&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;

Who am I to say all this then, and what is my interest in the
business? I am head of research at a medium-sized Scandinavian company
named Apptus Technologies, which started out offering services around
existing major database systems. We suffered from the sluggishness of
the platforms, adapted to it, and ultimately made our business from
it. &lt;br /&gt;&lt;br /&gt;

For a number of years we have produced search systems for large and
complex data sets, to be accessed over the internet by millions of
users. We achieved efficiency and flexibility by gradually moving away
from major database platforms. Ultimately, we created our own
full-fledged database management system. We have never sold it as a
standalone system (nor do we intend to in the foreseeable future),
only used it as a base for more specialized systems. Therefore, we
have enjoyed a freedom in choosing how to develop our platform that
database system vendors do not have &amp;#x2013; including the freedom to
change our minds.&lt;br /&gt;&lt;br /&gt;

We have learned a lot during the last eight years, gained a lot of
insights, and developed some strong opinions in the process. Now, we
have decided to stick our heads out of the laboratory a bit, and try
to start a conversation with the world outside. Not just about the
advantages that our technology can produce for our customers (we have
a sales department for that), but about the technology and ideas
behind it, as well as our visions for the future of information
systems. It is going to be a lot about databases, but also about
programming and computer science in general.&lt;br /&gt;&lt;br /&gt;

I hope you will join us. Stay tuned to this blog. Subjects coming up:
why you can benefit from rewriting your system rather than patching
it; how we started out misunderstanding relational databases, and why
most people still do.</content><link rel="replies" type="application/atom+xml" href="http://completerewrite.blogspot.com/feeds/5711035817389523489/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=2492664706785922439&amp;postID=5711035817389523489" title="4 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2492664706785922439/posts/default/5711035817389523489?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2492664706785922439/posts/default/5711035817389523489?v=2" /><link rel="alternate" type="text/html" href="http://completerewrite.blogspot.com/2008/10/why-database-masters-fail-us.html" title="Why the Database Masters Fail Us" /><author><name>Jesper Larsson</name><uri>http://www.blogger.com/profile/05756120755805691277</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>4</thr:total></entry></feed>

