<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">
  <title>Cambridge Semantics</title>
  <link rel="alternate" href="http://www.cambridgesemantics.com/blog/-/blogs/rss" />
  <subtitle>Cambridge Semantics</subtitle>
  <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/EnterpriseSemantics" /><feedburner:info uri="enterprisesemantics" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><feedburner:emailServiceId>EnterpriseSemantics</feedburner:emailServiceId><feedburner:feedburnerHostname>http://feedburner.google.com</feedburner:feedburnerHostname><entry>
    <title>Explaining the Semantic Web: 3 Recent Examples</title>
    <link rel="alternate" href="http://feedproxy.google.com/~r/EnterpriseSemantics/~3/1PAMP2nr0eM/explaining-the-semantic-web-3-recent-examples" />
    <author>
      <name>Lee Feigenbaum</name>
    </author>
    <id>http://www.cambridgesemantics.com/blog/-/blogs/explaining-the-semantic-web-3-recent-examples</id>
    <updated>2012-07-20T15:53:44Z</updated>
    <published>2012-07-19T13:34:47Z</published>
    <summary type="html">&lt;p&gt;
	Part of building a software business based on Semantic Web technologies is working continuously to understand and elucidate the benefits of Semantic Web technologies both in general and specifically for enterprise information management.&lt;/p&gt;
&lt;p&gt;
	My colleagues and I have recently had a few different opportunities to explain the Semantic Web and its value for different audiences:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;
		&lt;a href="http://www.cmswire.com/cms/information-management/the-semantic-web-and-the-modern-enterprise-016571.php"&gt;The Semantic Web and the Modern Enterprise&lt;/a&gt; - This is an introductory article written for &lt;a href="http://www.cmswire.com"&gt;CMSWire.com&lt;/a&gt;, a blog / news site / magazine with an audience interested in all aspects of content and information management, both on the Web and inside enterprises.&lt;/li&gt;
	&lt;li&gt;
		&lt;a href="http://www.dbta.com/Articles/Editorial/Trends-and-Applications/Big-Data-or-Right-Data-What-Really-Matters-83694.aspx"&gt;Big Data or Right Data: What Really Matters?&lt;/a&gt; - Jeff Stamen, our Executive Vice Chairman at Cambridge Semantics and a co-founder of Progress Software, expanded on his &lt;a href="http://www.cambridgesemantics.com/blog/-/blogs/big-data-or-right-data-"&gt;earlier blog post&lt;/a&gt; with this article in &lt;a href="http://www.dbta.com/"&gt;DBTA (Database Trends and Applications)&lt;/a&gt;, an online and print magazine for data professionals.&lt;/li&gt;
	&lt;li&gt;
		&lt;a href="http://tv.slashdot.org/video/?embed=B2azE3NTqsriugDdCihGrJMDfbeyEYt9"&gt;Lee Feigenbaum Explains the Semantic Web&lt;/a&gt; - I had the chance to give this interview for Slashdot a couple of months ago as a video introduction to what the Semantic Web is (and isn't).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
	Enjoy!&lt;/p&gt;</summary>
    <dc:creator>Lee Feigenbaum</dc:creator>
    <dc:date>2012-07-19T13:34:47Z</dc:date>
  <feedburner:origLink>http://www.cambridgesemantics.com/blog/-/blogs/explaining-the-semantic-web-3-recent-examples</feedburner:origLink></entry>
  <entry>
    <title>Ontologies as Conceptual Models</title>
    <link rel="alternate" href="http://feedproxy.google.com/~r/EnterpriseSemantics/~3/q1iaukSHajo/ontologies-as-conceptual-models" />
    <author>
      <name>Sean Martin</name>
    </author>
    <id>http://www.cambridgesemantics.com/blog/-/blogs/ontologies-as-conceptual-models</id>
    <updated>2012-07-09T15:24:26Z</updated>
    <published>2012-07-09T15:17:02Z</published>
    <summary type="html">&lt;p&gt;
	At Cambridge Semantics we use the W3C Semantic Web standards to create conceptual canonical data models, in particular the Web Ontology Language (OWL). The conceptual models are declarative and express information the way the domain expert or business user understands it – usually as a series of interlinked concepts and properties. Unlike most traditional technologies, these conceptual models are independent of how data is stored and provide an abstraction, sometimes called “a semantic layer”, for our Anzo software.&lt;/p&gt;
&lt;p&gt;
	&lt;b&gt;The result is unprecedented flexibility.&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;
	Because they are independent of storage system constraints, the conceptual models can reflect different versions of the truth if necessary: versions that have evolved over time, or versions that capture how different groups of users understand a concept while sharing only what is common between them. OWL names concepts uniquely across human languages and even allows multiple human-readable names or labels for the properties that describe the same concepts. Using W3C open data standards means that the conceptual models can encode a common vocabulary, so that different parts of an enterprise or members of an information supply chain can talk jointly about and share their data far more easily.&lt;/p&gt;
&lt;p&gt;
	The conceptual models can encode property restrictions and controlled vocabularies, and can be annotated with information useful to users, ETL processes, query &amp;amp; form building tools, downstream data consumer applications, validation logic, etc. Again, this is all expressed in declarative fashion, independent of all the storage systems and information consuming/producing applications but reusable by any of them.&lt;/p&gt;
&lt;p&gt;
	The conceptual models themselves are often based on, or can import elements of, a growing set of existing industry or domain models expressed in OWL or other representations. Since they are standards based, the models can easily be shared for reuse by partners, customers, vendors or anyone else who would like to align their information conceptually.&lt;/p&gt;
&lt;p&gt;
	The OWL ontology language facilitates the tagging of data instances from multiple data sources with their meanings, to form a single integrated view of that data built on a multitude of simple factual statements called RDF triples. RDF is an open, standards-based, graph-oriented data representation in which objects, or graph nodes, have properties; some properties have data values and others are pointers to further nodes in the graph. The graph model is an intuitive one for humans, who tend to think by associations of objects and their properties. It is far easier for most of us to traverse a series of interlinked concepts to figure out what data we have or need and how it is related than to think about, say, the interlinked table structures of a relational database schema.&lt;/p&gt;
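&lt;p&gt;
	To make the triple idea concrete, here is a minimal, library-free Python sketch. It is only an illustration: the &lt;code&gt;ex:&lt;/code&gt; identifiers, the sample facts and the two helper functions are invented for this example and are not part of Anzo or of any RDF toolkit.&lt;/p&gt;

```python
# Hypothetical RDF-style triples: (subject, predicate, object).
# Every identifier and fact below is made up for illustration.
triples = [
    ("ex:alice", "rdf:type", "ex:Employee"),
    ("ex:alice", "ex:name", "Alice"),          # property with a data value
    ("ex:alice", "ex:worksFor", "ex:acme"),    # property pointing at another node
    ("ex:acme", "rdf:type", "ex:Company"),
    ("ex:acme", "ex:name", "Acme Corp"),
]


def objects(subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def employer_names(person):
    """Traverse person, then worksFor, then name by following node links."""
    names = []
    for company in objects(person, "ex:worksFor"):
        names.extend(objects(company, "ex:name"))
    return names


print(employer_names("ex:alice"))
```

&lt;p&gt;
	The traversal in &lt;code&gt;employer_names&lt;/code&gt; follows a link from one node to another and then reads a data value, which is the kind of concept-to-concept navigation described above.&lt;/p&gt;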
&lt;h2&gt;
	Operationalizing the Conceptual Model&lt;/h2&gt;
&lt;p&gt;
	Once ontologies are established, these conceptual models can easily be operationalized. Together with middleware and tooling software that support the family of W3C standards designed to “play nice together”, the models can drive and underpin nearly every aspect of a system. Here are some examples supported in our own Anzo software:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;
		To import and integrate/conceptually align multiple data sets in the Anzo system by mapping them from their private proprietary models to the standards-based conceptual one;&lt;/li&gt;
	&lt;li&gt;
		To allow end users to find on-the-fly the data they need through searching by both concepts and content in a manner that abstracts away from individual source and format issues;&lt;/li&gt;
	&lt;li&gt;
		To guide the creation of queries and forms for access and manipulation of data – query and form builders work by allowing users to traverse familiar concepts to find what they want without being impeded by artifacts of storage (e.g. sources of data, formats of data, SQL joins etc.);&lt;/li&gt;
	&lt;li&gt;
		To simplify data integration at the conceptual level for structured, semi-structured &amp;amp; unstructured data: data from any source is mapped to the same concepts drawn from the model;&lt;/li&gt;
	&lt;li&gt;
		To automatically create expressions of how the data may be accessed or manipulated, e.g. code generation for Web Services access, the generation of programmable business objects, or the generation of relational database schemas that reflect the ontologies;&lt;/li&gt;
	&lt;li&gt;
		Validation &amp;amp; data quality;&lt;/li&gt;
	&lt;li&gt;
		Concept based access control;&lt;/li&gt;
	&lt;li&gt;
		Inference and reasoning;&lt;/li&gt;
	&lt;li&gt;
		To serve increasingly as a basis for expressing human readable/editable logic rules for making data-driven decisions and manipulating data – contrast this with the traditional approach, which requires that business requirements be passed to developers and DBAs who scatter that logic into source code across multiple tiers of a system as well as its database schema, making it impossible for the business user to truly understand the actual logic of the system and creating an ongoing maintenance resource sink;&lt;/li&gt;
	&lt;li&gt;
		If data is transferred out of the Anzo system, the conceptual model that describes that data can travel with it included in the data stream using open standards like OWL and RDF. This greatly facilitates reuse because information is automatically de-silo’ed and downstream applications can also read the conceptual models and adjust themselves accordingly.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
	Conceptual Models underpin the Anzo Software&lt;/h2&gt;
&lt;p&gt;
	The Anzo software is entirely driven by standards based conceptual models. It includes many software components that are all designed to work together driven by the same conceptual models:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;
		&lt;em&gt;&lt;a href="http://www.cambridgesemantics.com/products/easy-data-integration"&gt;Anzo Connect&lt;/a&gt;&lt;/em&gt; and &lt;em&gt;Anzo for Microsoft Excel&lt;/em&gt; provide the ability to import structured and semi-structured data into the Anzo system (transforming it as necessary) and describe it using the conceptual models.&lt;/li&gt;
	&lt;li&gt;
		&lt;em&gt;&lt;a href="http://www.cambridgesemantics.com/products/anzo-unstructured"&gt;Anzo Unstructured&lt;/a&gt;&lt;/em&gt; can be used by end users to create their own multi-vendor based Natural Language Processing (NLP) pipelines that map data extracted from documents (emails, web pages, PDFs, etc.) onto the same conceptual models, thereby integrating both structured and unstructured data for blended uses.&lt;/li&gt;
	&lt;li&gt;
		&lt;em&gt;&lt;a href="http://www.cambridgesemantics.com/products/excel-data-analysis"&gt;Anzo on the Web&lt;/a&gt;&lt;/em&gt; is an end user focused BI dashboard and Form builder tool. It can be used to locate the data users want and visually mash it up, by using the models to understand what data is available to them and abstracting the complex query building to the relatively simple task of traversing linked concepts. Users can create simple “info apps” for themselves and other less skilled users that include forms for changing data, charts, tables and advanced filters.&lt;/li&gt;
	&lt;li&gt;
		&lt;em&gt;&lt;a href="http://www.cambridgesemantics.com/products/excel-collaboration"&gt;Anzo for Microsoft Excel&lt;/a&gt;&lt;/em&gt; provides two-way interaction with data made available through access to the conceptual model. Spreadsheet data can be mapped to the model to make it easy to collect and automatically integrate data using worksheets or to build forms for entering and reporting data as worksheets.&lt;/li&gt;
	&lt;li&gt;
		&lt;em&gt;&lt;a href="http://www.cambridgesemantics.com/products/workflow-management-system"&gt;Anzo Workflow&lt;/a&gt;&lt;/em&gt; and &lt;em&gt;&lt;a href="http://www.cambridgesemantics.com/products/rules-engine"&gt;Anzo Rules&lt;/a&gt;&lt;/em&gt; are also driven by the conceptual data model and are used to codify data flows and automated data-driven decision making.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
	Flexibility Achieved&lt;/h2&gt;
&lt;p&gt;
	The reason the Anzo approach is so flexible and dynamic is that it treats all of these software components, each providing a different function, holistically: they coordinate as a cohesive system using the common understanding provided by the shared conceptual model. In the traditional world, each of these components would be a separate part, often provided by a different vendor, requiring a system integrator to configure or program whatever is necessary to tie them into a single system.&lt;/p&gt;
&lt;p&gt;
	In Anzo, a change to the conceptual model or the creation of a new model, whether because the business changes, is better understood, or develops a new need, is reflected everywhere immediately, and repairs to dashboards and ETL maps can quickly be effected to reflect the new reality. Indeed, the old reality can often be left to co-exist in the same system if there are downstream applications that still rely on it. Often it will be the end users themselves who make these changes, since access to data has been so simplified through the use of the conceptual models.&lt;/p&gt;
&lt;p&gt;
	Contrast this conceptual semantic layer approach with traditionally built systems, where every alteration requires people skilled in understanding the interactions of all the different components of a solution, which do not share an abstracted model, to modify the logic used to glue those parts together – generally a long and costly business that soaks up the greatest proportion of the overall IT spend.&lt;/p&gt;</summary>
    <dc:creator>Sean Martin</dc:creator>
    <dc:date>2012-07-09T15:17:02Z</dc:date>
  <feedburner:origLink>http://www.cambridgesemantics.com/blog/-/blogs/ontologies-as-conceptual-models</feedburner:origLink></entry>
  <entry>
    <title>Why is the Semantic Web valuable?</title>
    <link rel="alternate" href="http://feedproxy.google.com/~r/EnterpriseSemantics/~3/BhCs6j0yTgU/why-is-the-semantic-web-valuable-" />
    <author>
      <name>Rob Gonzalez</name>
    </author>
    <id>http://www.cambridgesemantics.com/blog/-/blogs/why-is-the-semantic-web-valuable-</id>
    <updated>2012-07-26T23:26:02Z</updated>
    <published>2012-06-22T14:49:45Z</published>
    <summary type="html">&lt;p&gt;
	Why is the Semantic Web valuable? What makes it different or special? In the past month, I've been asked this question more than at any time in the past. The questions stem from the surprising success of &lt;a href="http://www.cambridgesemantics.com/semantic-university"&gt;Semantic University&lt;/a&gt;. I was humbled at SemTech to be congratulated not by old hands in the community, but by fresh members. One person told me she had opened up 7 lessons in her browser to read on the flight over.&lt;/p&gt;
&lt;p&gt;
	We've received requests for on-site corporate introductions to Semantic Web technologies. There are evidently large numbers of groups across industries that are curious about it and want to explore the tech more deeply to get a handle on its core value: hidden potential adopters that have been silent because they're having trouble figuring out how, when, and where to apply it. And they don't want to just focus on building ontologies, which is what many other courses focus on; they need to have a deep understanding of &lt;i&gt;why&lt;/i&gt; before investing that kind of time. There is a missing link of conversation that starts &lt;i&gt;before&lt;/i&gt; ontologies and all that. Material that helps answer: &lt;b&gt;why does this matter?&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;
	Which brings me to the point of this post.&lt;/p&gt;
&lt;h2&gt;
	As a community, we Semantic Webbers have done a poor job communicating our value clearly and concisely.&lt;/h2&gt;
&lt;p&gt;
	Lately I've been trying to put out some &lt;a href="http://www.cambridgesemantics.com/blog/-/blogs/semantics-in-the-real-world-where-to-begin-"&gt;more accessible material that tries to explain the value to an audience that isn't intimately familiar with the ins and outs of Semantic Web tech&lt;/a&gt;, and I wish there were more people openly trying to do that. We don't lack in technical reading or capability, and we even have some cool public case studies. But we do lack sorely in quick tutorials, quick wins, readable success stories, and cheat sheets that help well-meaning folks figure out where the rubber can meet the road.&lt;/p&gt;
&lt;p&gt;
	I've &lt;a href="http://www.cambridgesemantics.com/blog/-/blogs/what-happened-to-nosql-for-the-enterprise-"&gt;previously compared&lt;/a&gt; our community to the NoSQL community. We have a great many advantages over NoSQL technologies for solving big data challenges, but we do a poor job of showing it.&lt;/p&gt;
&lt;p&gt;
	Want to dip your toes in NoSQL? Go to MongoDB's site and you can have a running app in 15 minutes. I did. It was awesome. I really admire what 10gen has done to make their simple database accessible to newcomers.&lt;/p&gt;
&lt;p&gt;
	Want to dip your toes in Semantic Web? Well then. Where to begin? Most tutorials get you to the point of writing triples and reading triples via some API, or building an ontology using Protege—great tool, intimidating as heck to newcomers. So what? To what end? For what purpose? 10gen's value is obvious: simple scaling via shards if your site grows, and a flexible data model that is just fantastic for rapid prototyping. Those values are technical, but so is the audience to which they are marketing.&lt;/p&gt;
&lt;p&gt;
	So question to the community: in one line, what is it that makes Semantic Web special? What is the one sentence that describes why you would bother to go through the hurdles of learning it vs. some other new technology out there?&lt;/p&gt;
&lt;p&gt;
	As a community, this is our greatest challenge. Even as a business, at Cambridge Semantics we have to make sure that we're clear about our point of view when we talk to prospects. My sentence: you use Semantic Web technologies for any application that will face frequently changing requirements over time. The requirements could be types of data, data sources, views of data, or people consuming the data. If those things aren't held constant, Semantic Web technologies will be quicker to get started with, and cheaper and easier to maintain in the long run.&lt;/p&gt;
&lt;p style="align:center; font-style:italic;"&gt;
	Shameless plug: if you're interested in hosting a half-day, in-depth introduction to the Semantic Web at your company, please contact &lt;a href="mailto:SemanticUniversity@cambridgesemantics.com"&gt;Semantic University&lt;/a&gt;.&lt;/p&gt;</summary>
    <dc:creator>Rob Gonzalez</dc:creator>
    <dc:date>2012-06-22T14:49:45Z</dc:date>
  <feedburner:origLink>http://www.cambridgesemantics.com/blog/-/blogs/why-is-the-semantic-web-valuable-</feedburner:origLink></entry>
  <entry>
    <title>RDF is the Universal Data Solvent</title>
    <link rel="alternate" href="http://feedproxy.google.com/~r/EnterpriseSemantics/~3/cjYvrj4--xw/rdf-is-the-universal-data-solvent" />
    <author>
      <name>Rob Gonzalez</name>
    </author>
    <id>http://www.cambridgesemantics.com/blog/-/blogs/rdf-is-the-universal-data-solvent</id>
    <updated>2012-06-14T17:52:44Z</updated>
    <published>2012-06-14T17:52:44Z</published>
    <summary type="html">&lt;p&gt;
	Mike Bergman had &lt;a href="http://www.mkbergman.com/1006/pragmatic-approaches-to-the-semantic-web/"&gt;a recent post about Linked Data and RDF serialization&lt;/a&gt;. It's a long read, so I'll just excerpt what I think is the key idea, and one that I agree with (emphasis is mine).&lt;/p&gt;
&lt;blockquote&gt;
	&lt;p&gt;
		…&lt;b&gt;effective data exchange does not require RDF&lt;/b&gt;. Most instance records are already expressed as simple entity-value pairs, and any data transfer serialization — from key-value pairs to JSON to CSV spreadsheets — can be readily transformed to RDF.&lt;/p&gt;
	&lt;p&gt;
		This understanding is important because &lt;b&gt;the fundamental contribution of RDF is not as a data exchange format, but as a foundational data model&lt;/b&gt;. The simple triple model of RDF can easily express the information assertions in any form of content, from completely unstructured text (after information extraction or metadata characterization) to the most structured data sources. Triples can themselves be built up into complete languages (such as OWL) that also capture the expressiveness necessary to represent any extant data or information schema.&lt;/p&gt;
	&lt;p&gt;
		The ability of RDF to capture any form of data or any existing schema makes it a “universal solvent” for information. This means that the real role of RDF is as a canonical data model at the core of the entire information architecture. Linked data, with its emphasis on data publishing and exchange, gets this focus exactly wrong. Linked data emphasizes RDF at the wrong end of the telescope.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;
	This is a fundamental difference between the Semantic Web and other technology stacks. XML, for example, focuses first and foremost on data interchange. That is, it's a wire format first, with only some very belated attempts (XML Databases &amp;amp; XQuery) to do more than that.&lt;/p&gt;
&lt;p&gt;
	However, the comparison to me is much more interesting when you're thinking about relational databases (and, for you NoSQL lovers out there, this logic also applies to most popular NoSQL databases). With relational databases there is a distinct and strong difference between the structure of the data as it's stored in the database and the canonical model. To see what I mean, think of modeling a many-to-many relationship in a SQL database. Got it?&lt;/p&gt;
&lt;p&gt;
	With RDF there is a much greater focus on enabling the creation of a universal canonical model. That is, if the data itself is in Excel, or CSVs, or a relational database, or in a NoSQL database, a simple mapping to RDF can be attained, making that data much more digestible and repurposable than it otherwise would be.&lt;/p&gt;
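&lt;p&gt;
	As a rough illustration of how simple such a mapping can be, the Python sketch below turns a small CSV export into triples. The column names, the &lt;code&gt;ex:&lt;/code&gt; prefix and the &lt;code&gt;ex:enrolledIn&lt;/code&gt; property are assumptions made up for the example, and it uses only the standard library rather than a real RDF toolkit. The many-to-many relationship needs no junction table here: each row simply becomes one more statement.&lt;/p&gt;

```python
import csv
import io

# Hypothetical CSV export of a many-to-many relationship (students and courses).
CSV_TEXT = """student,course
alice,math
alice,physics
bob,math
"""


def rows_to_triples(csv_text):
    """Map each CSV row to one (subject, predicate, object) statement."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        ("ex:" + row["student"], "ex:enrolledIn", "ex:" + row["course"])
        for row in reader
    ]


def courses_of(triples, student):
    """List the courses linked to a student by the ex:enrolledIn property."""
    return [o for s, p, o in triples if s == student and p == "ex:enrolledIn"]


enrollment_triples = rows_to_triples(CSV_TEXT)
print(courses_of(enrollment_triples, "ex:alice"))
```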
&lt;p&gt;
	So RDF is indeed the universal data solvent.&lt;/p&gt;</summary>
    <dc:creator>Rob Gonzalez</dc:creator>
    <dc:date>2012-06-14T17:52:44Z</dc:date>
  <feedburner:origLink>http://www.cambridgesemantics.com/blog/-/blogs/rdf-is-the-universal-data-solvent</feedburner:origLink></entry>
  <entry>
    <title>Unstructured Data and Knowledge Representation</title>
    <link rel="alternate" href="http://feedproxy.google.com/~r/EnterpriseSemantics/~3/0XtbGXN4mA0/unstructured-data-and-knowledge-representation" />
    <author>
      <name>Richard Mallah</name>
    </author>
    <id>http://www.cambridgesemantics.com/blog/-/blogs/unstructured-data-and-knowledge-representation</id>
    <updated>2012-06-02T14:37:41Z</updated>
    <published>2012-06-01T15:15:25Z</published>
    <summary type="html">&lt;p&gt;
	&lt;img alt="Unstructured Data and Knowledge Representation" src="http://www.cambridgesemantics.com/documents/16985/17287/unstructured-data-knowledge-representation.png" style="float:right; padding:0 0 20px 20px;" /&gt; There are many natural language technologies, some of which I touched on last time in &lt;a href="http://www.cambridgesemantics.com/blog/-/blogs/introduction-to-unstructured-data" target="_blank"&gt;Introduction to Unstructured Data&lt;/a&gt;. This time I’ll talk about the tradeoffs to consider when turning unstructured data into &lt;em&gt;knowledge&lt;/em&gt; that can be understood by computers. The key idea I’ll delve into is &lt;em&gt;how knowledge that comes from unstructured data should be represented in a computer in the first place&lt;/em&gt;.&lt;/p&gt;
&lt;h2&gt;
	&lt;span style="font-size:14px;"&gt;&lt;span style="color: #0055bb;"&gt;KNOWLEDGE REPRESENTATION&lt;/span&gt;&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;
	&lt;em&gt;Knowledge representation&lt;/em&gt; is the area of computer science concerned with assigning meaning (i.e., a collection of meaningful and potentially actionable facts or beliefs) to symbols, or a language, often to facilitate inferencing from those knowledge elements. Knowledge and representation are both very broad terms, so one might think that there are many options in approaching the task, and they'd be right. Over the decades, differing approaches have given rise to dichotomies of sorts.&lt;/p&gt;
&lt;p&gt;
	The goal of this article is to take a step back and, with &lt;a href="http://www.cambridgesemantics.com/blog/-/blogs/introduction-to-unstructured-data" target="_blank"&gt;our general business use cases from last time&lt;/a&gt; in mind, summarize the primary dichotomies in order to suggest a way forward to best use knowledge from unstructured data.&lt;/p&gt;
&lt;p&gt;
	Familiarity with knowledge representation methods and tradeoffs is key to using natural language processing technologies correctly. The way that knowledge is structured can determine its downstream levels of utility, accessibility, and reuse. It's an important consideration for integrating information from many sources, from many technologies, across different times, and being able to automatically infer new knowledge from the pieces you get.&lt;/p&gt;
&lt;p style="margin-left:.5in;"&gt;
	Note: This is not a comprehensive survey of knowledge representation techniques. As an academic field, knowledge representation is both broad and deep. For that kind of survey, see &lt;a href="http://www.amazon.com/Knowledge-Representation-Reasoning-Artificial-Intelligence/dp/1558609326/" target="_blank"&gt;&lt;em&gt;Knowledge Representation and Reasoning&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
	&lt;span style="font-size:14px;"&gt;&lt;span style="color: #0055bb;"&gt;APPROACHES, DECISIONS, TRADEOFFS&lt;/span&gt;&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;
	&lt;img alt="Unstructured Data and Knowledge Representation: Approaches, Decisions, and Tradeoffs" src="http://www.cambridgesemantics.com/documents/16985/17287/unstructured-data-knowledge-representation-2.png" style="float:left; padding:0 20px 20px 0;" /&gt; Over the years there have been a variety of methods for representing knowledge originally expressed in natural language. Decisions have been made in defining these approaches, each with a wide variety of tradeoffs and downstream implications. Three of the more fundamental of these decisions are:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;
		relative vs. explicit&lt;/li&gt;
	&lt;li&gt;
		word vs. concept&lt;/li&gt;
	&lt;li&gt;
		procedural vs. declarative&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;
	We’ll cover each of these in turn.&lt;/p&gt;
&lt;h2&gt;
	&lt;span style="font-size:14px;"&gt;&lt;span style="color: #0055bb;"&gt;RELATIVE MEANING VS. EXPLICIT MEANING&lt;/span&gt;&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;
	What, in the world of information science, is unstructured data (text) most similar to, most blendable with, and therefore most understandable with? Other unstructured data.&lt;/p&gt;
&lt;p&gt;
	One decision is whether different parts of unstructured data should just be understood relative to other parts, i.e. finding statistical relationships among fragments of the document set, or corpus, or should require some other, external grounding context, such as a concept structure for explicit meaning.&lt;/p&gt;
&lt;p&gt;
	Some methods assume the former. In many of the common ways that unstructured data are handled these days, such as in search-, clustering-, and classification-based systems, a statistical relationship is calculated between and among documents. That is their primary means of representing a kind of knowledge, rough signatures for rough concepts. Exploration of the corpus is either done through searches, where each query is treated as a little document of its own and the statistical space is queried for similar data, or through faceted navigation where clusters of documents are represented by differentiating phrases.&lt;/p&gt;
&lt;p&gt;
	A drawback of this approach is that it's usually only good for a single hop from one query phrase or document to another phrase or document: it's lossy in that there are many hits of varying, middling strength to the relevant chunks of knowledge, each of which has a very different meaning. Only the rough topic is preserved, and a human then needs to select the right documents from among the choices.&lt;/p&gt;
&lt;p&gt;
	&lt;a href="http://www.correlationconcepts.com" target="_blank"&gt;Some folks&lt;/a&gt; take this approach some steps further, using a finer 'knowledge' granularity, with the aim of building a web of relationships between small document fragments. They treat each fragment as a context, or a knowledge nugget, again related to others statistically, and they try to find paths of multiple hops to connect one concept to another. The drawback to this is that the number of paths grows exponentially, the quality of links becomes tenuous, and humans or some very specific human-defined procedures need to determine relevance, limiting automation.&lt;/p&gt;
&lt;p&gt;
	&lt;a href="http://www.autonomy.com/content/Technology/autonomys-technology-a-different-approach/index.en.html" target="_blank"&gt;Many companies&lt;/a&gt; have taken this approach of intracorpus relationships because they started out as search companies. They try to improve their concept of a concept by augmenting keyword-type relationships with statistical expansions of words to those that often appear together. A key hypothesis in this is that concepts that appear together often enough are closely related in meaning, or even that set or class membership can be learned from terms appearing near each other. It makes sense that broad, diffuse &lt;em&gt;topics&lt;/em&gt; and their relationships can be derived that way, because the granularity of the context in the content is on par with that; but ‘red’ and ‘orange’ appear near each other very infrequently. Anything closer to knowledge and meaning will require another approach.&lt;/p&gt;
&lt;p&gt;
	Viewed this way, relative-meaning technologies are fundamentally search technologies, not knowledge representation technologies. Fixed points of reference of some sort are therefore required for grounding in more definite, concrete, and reusable knowledge.&lt;/p&gt;
&lt;p&gt;
	So what form can these reference points—which are required for knowledge representation—take? Are something like concepts necessary, and if so, what does 'concept' mean in this context? Perhaps words (e.g. in a large hyperlinked reference dictionary of some kind) are all that are needed?&lt;/p&gt;
&lt;h2&gt;
	&lt;span style="font-size:14px;"&gt;&lt;span style="color: #0055bb;"&gt;WORD VS. CONCEPT&lt;/span&gt;&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;
	Unstructured data consists of words. It would seem elegant and intuitive to have words stand for their own meanings.&lt;/p&gt;
&lt;p&gt;
	Many proponents of the decision to go with words for meaning use something like &lt;a href="http://wordnet.princeton.edu/" target="_blank"&gt;WordNet&lt;/a&gt;—essentially a dictionary and thesaurus—as a general knowledgebase of what meanings and roles each word can have and how they are related.&lt;/p&gt;
&lt;p&gt;
	For example, by referencing something like WordNet the phrase 'chocolate chips' can be recognized as a phrase that's also a proto-concept, as 'chips', little pieces, that are 'chocolate'.&lt;/p&gt;
&lt;p&gt;
	Unfortunately, few of the higher-level concepts people care about in real-world use cases are actually in the dictionary. The problem is easily seen in fields where jargon is prevalent. For example, in a document containing clinical trial information, looking up 'drug', 'pipeline', and 'phase' individually in the dictionary will not help your word-based system realize that 'pre-clinical' is relevant to the concept at hand.&lt;/p&gt;
&lt;p&gt;
	Trying to make a more comprehensive dictionary with more linkages between words and phrases and attempting to tie together anything potentially related in meaningful ways would lead to an intractable combinatorial explosion.&lt;/p&gt;
&lt;p&gt;
	Context and some core &lt;em&gt;knowledge&lt;/em&gt; to bootstrap with are the crucial components of meaning that are missing when words are treated just as words. When these are present, the alternative is a means to classify and represent &lt;em&gt;concepts&lt;/em&gt; themselves. These can be as broad or specific as is appropriate, in any conceptual dimension, and, critically, independent of English or Chinese or the way something looks.&lt;/p&gt;
&lt;p&gt;
	This is why even Google, formerly the biggest proponent of both word-based knowledge and relative-meaning technologies, &lt;a href="http://www.cambridgesemantics.com/blog/-/blogs/google-knowledge-graph-and-the-semantic-web" target="_blank"&gt;just this past month started moving to and championing the approach of the conceptual model of knowledge&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
	But how do we get computers to work with &lt;em&gt;concepts&lt;/em&gt;? How do computers normally reason? Programs. Computers seem to &lt;em&gt;understand&lt;/em&gt; programs much better than they do data.&lt;/p&gt;
&lt;h2&gt;
	&lt;span style="font-size:14px;"&gt;&lt;span style="color: #0055bb;"&gt;PROCEDURAL KNOWLEDGE VS. DECLARATIVE CONCEPTS&lt;/span&gt;&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;
	&lt;img alt="Unstructured Data and Knowledge Representation: Procedural Knowledge vs. Declarative Concepts" src="http://www.cambridgesemantics.com/documents/16985/17287/unstructured-data-knowledge-representation-3.png" style="float:right; padding: 0 0 20px 20px;" /&gt; People understand meaning through the process of interacting with their world. Stories, processes, and sensation are inextricably embedded in the substrate of that understanding.&lt;/p&gt;
&lt;p&gt;
	The closest analog in the computing world might be the neural network of a robot that learns to improve its effectiveness at interacting with its environment over time. An intelligent agent like this robot can form its own internal representations relevant to its specific goals, relative to both the input or sensory channels it receives and the ways it can respond to its environment.&lt;/p&gt;
&lt;p&gt;
	An analogue in the world of unstructured data would be an &lt;a href="http://en.wikipedia.org/wiki/Procedural_Reasoning_System" target="_blank"&gt;interpreter&lt;/a&gt; that reads and establishes beliefs, changes the way it reads depending on those beliefs, and learns where to go to find what it wants.&lt;/p&gt;
&lt;p&gt;
	Knowledge of this kind is called &lt;a href="http://en.wikipedia.org/wiki/Procedural_knowledge" target="_blank"&gt;&lt;em&gt;procedural knowledge&lt;/em&gt;&lt;/a&gt;: conceptual knowledge directly applicable to the real world.&lt;/p&gt;
&lt;p&gt;
	Unfortunately for our uses in natural language processing, text analytics, and workflow optimization, it is situated and tacit knowledge, rather than explicit knowledge. It is job-dependent and agent-specific, and is much less general and much less reusable than declarative knowledge. When knowledge is developed as &lt;em&gt;procedures&lt;/em&gt; in this way, rather than as explicit knowledge structure, it is siloed, and thus as a model cannot scale well. There are too many non-portable assumptions and structural dependencies in place.&lt;/p&gt;
&lt;p&gt;
	In contrast, with &lt;em&gt;declarative knowledge&lt;/em&gt;, concepts are expressed as interpretable facts, sentences, or propositions, often associated with a context. The lack of procedural logic in them is a strength, allowing multiple types of algorithms to operate over the same explicit, declarative knowledge, potentially cooperating, even without coordination.&lt;/p&gt;
&lt;p&gt;
	A metaphor in the relational database world comparing procedural to declarative knowledge would be an app that's heavily dependent on black-box stored procedures, versus a purely data-driven application.&lt;/p&gt;
&lt;p&gt;
	Declarative knowledge is able to blend principles, conceptual models, causal networks, reasoning, and configurable hierarchical planning, and incrementally add knowledge from different but relevant contexts.&lt;/p&gt;
&lt;h2&gt;
	&lt;span style="font-size:14px;"&gt;&lt;span style="color: #0055bb;"&gt;THE DECLARATIVE LANDSCAPE&lt;/span&gt;&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;
	Declarative knowledge includes the structure of information—its schema or domain model—in the way that people, subject matter experts, think of the landscape of what they know.&lt;/p&gt;
&lt;p&gt;
	However, even declarative knowledge representation has its own schisms of sorts:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;
		more rule-oriented vs. more object-oriented approaches&lt;/li&gt;
	&lt;li&gt;
		pure frame representations vs. richer-ontology semantic networks&lt;/li&gt;
	&lt;li&gt;
		consistency-guaranteed vs. inconsistency-tolerant representations&lt;/li&gt;
	&lt;li&gt;
		purely symbolic vs. probability-enhanced semantics.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
	Regardless of the schisms, the important thing is that these variations can still work together.&lt;/p&gt;
&lt;p&gt;
	All of these representations are explicit, are openly accessible to logic, and are able to share concepts with each other. The better modern semantic knowledge integration platforms support all of these variations in tandem. The lingua franca of these different approaches and capabilities is the set of explicit, &lt;a href="https://www.google.com/search?q=define%3Aintension&amp;amp;ie=utf-8&amp;amp;oe=utf-8&amp;amp;aq=t" target="_blank"&gt;intension&lt;/a&gt;al, and irreducible semantic concepts.&lt;/p&gt;
&lt;p&gt;
	There are many types of logic, many types of rules, many types of reasoners, but when they utilize the same concept space they become compatible, providing some great leverage. With portable rules, new facts and even new rules can be deduced, or reasoned, by potentially multiple different reasoners with their own specializations.&lt;/p&gt;
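&lt;p&gt;
	As a minimal sketch of that idea, the Python below runs two independent toy reasoners over one shared set of facts until nothing new can be derived. Every name in it (the triples, the predicates such as isA and subClassOf, and both reasoner functions) is a hypothetical illustration, not the API of any particular platform.&lt;/p&gt;
&lt;pre&gt;
```python
# Shared concept space: plain (subject, predicate, object) facts.
# All identifiers here are hypothetical and chosen only for illustration.
FACTS = {
    ("aspirin", "isA", "nsaid"),
    ("nsaid", "subClassOf", "drug"),
}


def subclass_reasoner(facts):
    """Derive (x isA C2) from (x isA C1) and (C1 subClassOf C2)."""
    return {
        (x, "isA", c2)
        for x, p1, c1 in facts if p1 == "isA"
        for c, p2, c2 in facts if p2 == "subClassOf" and c == c1
    }


def trial_reasoner(facts):
    """Derive (x canEnter clinicalTrial) for anything known to be a drug."""
    return {
        (x, "canEnter", "clinicalTrial")
        for x, p, o in facts if p == "isA" and o == "drug"
    }


def run_to_fixpoint(facts, reasoners):
    """Apply every reasoner repeatedly until no new facts appear."""
    known = set(facts)
    while True:
        derived = set()
        for reasoner in reasoners:
            derived |= reasoner(known)
        if derived.issubset(known):
            return known
        known |= derived


result = run_to_fixpoint(FACTS, [subclass_reasoner, trial_reasoner])
```
&lt;/pre&gt;
&lt;p&gt;
	Neither reasoner knows about the other, yet the second one can only fire on a fact the first one deduced. The only thing they coordinate on is the shared vocabulary.&lt;/p&gt;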
&lt;p&gt;
	Because all of these varieties can live together, because the knowledge is structured in the way a subject matter expert thinks of things, and because the knowledge is so reusable, we can actually think of declarative knowledge as a whole as a single approach in its own right. In fact, declarative knowledge is the sweet spot between structure and flexibility in knowledge representation.&lt;/p&gt;
&lt;p&gt;
	Semantic network technology, the framework for handling all of the above declarative variations, is thereby the lowest level of groundwork needed to imbue information with enough &lt;em&gt;meaning&lt;/em&gt; to break what we might call the &lt;em&gt;meaning barrier&lt;/em&gt;: the point past which knowledge can easily be leveraged in ever better ways, and is neither trapped nor overly lossy.&lt;/p&gt;
&lt;p&gt;
	A declarative framework can also easily connect related contexts and provenance, like connecting an equity's calculated buy/sell recommendation, to the target price that went into that calculation, to the research report it was stated in, to the author of that report, to that author’s previous work history. The line between data and metadata drops away.&lt;/p&gt;
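&lt;p&gt;
	A rough sketch of such a provenance chain follows, with each link stored as a plain subject-predicate-object fact. The identifiers and predicate names (rec:1, basedOn, statedIn, and so on) are made up for this example and do not come from any real vocabulary.&lt;/p&gt;
&lt;pre&gt;
```python
# Hypothetical facts linking a recommendation back to its sources.
TRIPLES = [
    ("rec:1", "basedOn", "target:1"),                # recommendation to target price
    ("target:1", "statedIn", "report:1"),            # target price to research report
    ("report:1", "writtenBy", "analyst:1"),          # report to its author
    ("analyst:1", "previouslyWorkedAt", "firm:1"),   # author to work history
]


def follow(start, path, triples=TRIPLES):
    """Walk a chain of predicates from start and return every node visited."""
    visited = [start]
    node = start
    for predicate in path:
        matches = [o for s, p, o in triples if s == node and p == predicate]
        if not matches:
            break  # the chain ends where no matching fact exists
        node = matches[0]
        visited.append(node)
    return visited


chain = follow(
    "rec:1",
    ["basedOn", "statedIn", "writtenBy", "previouslyWorkedAt"],
)
```
&lt;/pre&gt;
&lt;p&gt;
	The walk treats the recommendation, the report, and the author identically. Nothing in the structure marks some of these facts as data and others as metadata.&lt;/p&gt;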
&lt;p&gt;
	With this base, we can move on to more value-added considerations.&lt;/p&gt;
&lt;h2&gt;
	&lt;span style="font-size:14px;"&gt;&lt;span style="color: #0055bb;"&gt;IN SUMMARY&lt;/span&gt;&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;
	We've found that explicit meaning is needed for anything we'd want to call knowledge, concepts are much better than words for making things workable, and declarative knowledge opens up new dimensions of flexibility and reusability.&lt;/p&gt;
&lt;p&gt;
	Next time in our third installment on unstructured data, we will directly address the convergence of unstructured and structured data, the union of text analytics and semantic technology, where core knowledge can come from, and how cooperation with little or no coordination empowers the next generation of &lt;a href="http://semanticweb.com/two-kinds-of-big-dat_b21925" target="_blank"&gt;horizontal big data&lt;/a&gt;.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/EnterpriseSemantics/~4/0XtbGXN4mA0" height="1" width="1"/&gt;</summary>
    <dc:creator>Richard Mallah</dc:creator>
    <dc:date>2012-06-01T15:15:25Z</dc:date>
  <feedburner:origLink>http://www.cambridgesemantics.com/blog/-/blogs/unstructured-data-and-knowledge-representation</feedburner:origLink></entry>
  <entry>
    <title>Relational Database Advantages</title>
    <link rel="alternate" href="http://feedproxy.google.com/~r/EnterpriseSemantics/~3/nVeSxpky5Vg/relational-database-advantages" />
    <author>
      <name>Rob Gonzalez</name>
    </author>
    <id>http://www.cambridgesemantics.com/blog/-/blogs/relational-database-advantages</id>
    <updated>2012-05-31T15:00:49Z</updated>
    <published>2012-05-31T14:26:36Z</published>
    <summary type="html">&lt;p&gt;
	With all the NoSQL buzz out there, I think many of us are forgetting how important the advantages of relational databases have been.&lt;/p&gt;
&lt;p&gt;
	I had a conversation over email with Greg Bean in which he summarized many of the key advantages that drove people &lt;em&gt;to&lt;/em&gt; relational databases &lt;em&gt;from&lt;/em&gt; key-value stores, hierarchical stores, and other kinds of weird storage in general back in the day.&amp;nbsp; I find the reasons very relevant today in the context of the Semantic Web vs. Relational conversation, so I'm including them here, with Greg's permission (though the emphasis is mine):&lt;/p&gt;
&lt;blockquote&gt;
	&lt;p&gt;
		For some of us, the resistance to relational technology is still clearly remembered.&lt;/p&gt;
	&lt;p&gt;
		The arguments against it were all the classic resistance to change arguments with the bogeyman being that relational technology would not be able to perform; DB2 would fail, Oracle was marketing hype, DEC’s Rdb was unproven, etc etc.&lt;/p&gt;
	&lt;p&gt;
		&lt;strong&gt;That was 1987. By 1990 the world was embracing relational with a passion. Why? Because it was based on sound theory and had a set of standards that everyone understood and they made damn good sense to anyone who had struggled with the many file systems and utilities that were in existence at the time.&lt;/strong&gt;&lt;/p&gt;
	&lt;p&gt;
		The Bank I was working with probably had over a dozen common file systems in use and many more peripheral ones. There was no data interchange standard and even worse, there was not even a definitive data dictionary for many of the file systems. I don’t know if you’ve ever worked with file systems like KSAM or ISAM or non-relational databases like IDMS or Total/Image but they were each a world unto themselves and damn hard to move data between. Each had at least one utility and in some cases many utilities simply to manage the file system. And every utility needed a skill set that put demands on the IT department.&lt;/p&gt;
	&lt;p&gt;
		Relational changed all that.&lt;/p&gt;
	&lt;p&gt;
		Don’t misunderstand, the resistance was from non-relational practitioners who had not taken the time to understand the theory and therefore only saw it as another file system, not The File System that would replace all file systems. In short, they were not like yourself, making astute assessments and setting expectations, they were the nay-sayers.&lt;/p&gt;
	&lt;p&gt;
		However, I learned a valuable lesson, basically &lt;strong&gt;when the theory behind a technology makes sense you can ignore the peripheral (performance, etc) issues (somewhat) as the benefits far outweigh the cost of a bit more processing power&lt;/strong&gt;. There is a timing consideration, relational was good theory long before it became good technology, but I get the feeling that Sem Web in the last few years has moved from good theory to good technology; still quite leading edge but with many commercially viable offerings and being taken up by the market.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;
	There are some gems in here, but I'll just pick on one: timing and performance.&lt;/p&gt;
&lt;p&gt;
	I remember thinking in 2006 that Semantic Web technology would never take off because you really couldn't store anything in an RDF database. Query performance on even small databases was just unusable for anything that wasn't a toy, and forget federated performance.&lt;/p&gt;
&lt;p&gt;
	That's no longer the case.&lt;/p&gt;
&lt;p&gt;
	As Greg said:&lt;/p&gt;
&lt;blockquote&gt;
	That was 1987. By 1990 the world was embracing relational with a passion. Why? Because it was based on sound theory and had a set of standards that everyone understood and they made damn good sense to anyone who had struggled with the many file systems and utilities that were in existence at the time.&lt;/blockquote&gt;
&lt;p&gt;
	In today's enterprise we're in that same situation...except, ironically, instead of relational databases being the unifying factor, they create silos. ETL projects are required for any integration, and once a mesh of ETL pipelines exists connecting a bunch of databases and applications, any change anywhere is very, very expensive.&lt;/p&gt;
&lt;p&gt;
	With Semantic Web technologies, databases can be more loosely coupled, enabling lightweight integrations without requiring ETL pipelines. New information sources can be connected to the semantic fabric easily, and local changes are less holistically disastrous.&lt;/p&gt;
&lt;p&gt;
	The principles are sound. The technology is standardized. The maturity, performance, and scalability of the technologies today are more than adequate. That's why we're starting to see great successes in the space.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/EnterpriseSemantics/~4/nVeSxpky5Vg" height="1" width="1"/&gt;</summary>
    <dc:creator>Rob Gonzalez</dc:creator>
    <dc:date>2012-05-31T14:26:36Z</dc:date>
  <feedburner:origLink>http://www.cambridgesemantics.com/blog/-/blogs/relational-database-advantages</feedburner:origLink></entry>
  <entry>
    <title>Google Knowledge Graph and the Semantic Web</title>
    <link rel="alternate" href="http://feedproxy.google.com/~r/EnterpriseSemantics/~3/7DCdubiO0eY/google-knowledge-graph-and-the-semantic-web" />
    <author>
      <name>Rob Gonzalez</name>
    </author>
    <id>http://www.cambridgesemantics.com/blog/-/blogs/google-knowledge-graph-and-the-semantic-web</id>
    <updated>2012-05-17T17:49:39Z</updated>
    <published>2012-05-17T17:01:39Z</published>
    <summary type="html">&lt;p&gt;
	&lt;img alt="Knowledge Graph and the Semantic Web" src="http://www.google.com/insidesearch/images/knowledge/knowledge-van-gogh.png" style="float:right; padding: 0 0 20px 20px;" /&gt; This week Google announced the &lt;a href="http://googleblog.blogspot.com/2012/05/introducing-knowledge-graph-things-not.html"&gt;Knowledge Graph&lt;/a&gt;. First of all, I &lt;i&gt;love&lt;/i&gt; the slogan "Things Not Strings." That about sums it up. When you can take advantage of the &lt;i&gt;meaning&lt;/i&gt; trapped in text, you can &lt;a href="http://www.cambridgesemantics.com/blog/-/blogs/introduction-to-unstructured-data"&gt;do quite a bit&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
	Even though Google is late to the party on this particular trend, it's great to see them making real progress. I had previously used Google rich search results as an &lt;a href="http://www.cambridgesemantics.com/semantic-university/semantic-search-and-the-semantic-web"&gt;example of semantic search&lt;/a&gt; on the Web, and the Knowledge Graph takes it to the next level.&lt;/p&gt;
&lt;p&gt;
	From the description:&lt;/p&gt;
&lt;blockquote&gt;
	Search is a lot about discovery—the basic human need to learn and broaden your horizons. But searching still requires a lot of hard work by you, the user. So today I’m really excited to launch the Knowledge Graph, which will help you discover new information quickly and easily.&lt;/blockquote&gt;
&lt;blockquote&gt;
	Take a query like [taj mahal]. For more than four decades, search has essentially been about matching keywords to queries. To a search engine the words [taj mahal] have been just that—two words.&lt;/blockquote&gt;
&lt;blockquote&gt;
	But we all know that [taj mahal] has a much richer meaning. You might think of one of the world’s most beautiful monuments, or a Grammy Award-winning musician, or possibly even a casino in Atlantic City, NJ. Or, depending on when you last ate, the nearest Indian restaurant. It’s why we’ve been working on an intelligent model—in geek-speak, a “graph”—that understands real-world entities and their relationships to one another: things, not strings.&lt;/blockquote&gt;
&lt;h2&gt;
	The Knowledge Graph and the Semantic Web&lt;/h2&gt;
&lt;p&gt;
	So what does this mean for the Semantic Web? It depends.&lt;/p&gt;
&lt;p&gt;
	It sounds like Google has a curated Knowledge Graph that it controls, and that it uses to map documents on the Web to concepts in the Knowledge Graph. It can then use the Knowledge Graph to expand the search results. For example, given a search for something like "Tim Berners-Lee student", Google would &lt;em&gt;know&lt;/em&gt; that TBL is a person, would also know who his students have been over the years, and could return information on them &lt;em&gt;instead of on TBL&lt;/em&gt;. That kind of thing is basically impossible without something like a Knowledge Graph, which enables much richer querying behaviors.&lt;/p&gt;
&lt;p&gt;
	If this is the case, then &lt;i&gt;hopefully&lt;/i&gt; metadata in web pages exposed via Schema.org (also backed by Google) vocabularies or popular Semantic Web ontologies such as FOAF will make it easier for Google to index and consume rich data on Web pages. Pages that add the extra metadata would be treated better in search results, and everyone would be encouraged to add more metadata to their pages. That would be a big step in the right direction. Explicit metadata on pages takes much of the guesswork out.&lt;/p&gt;
&lt;p&gt;
	Another way it could benefit the Semantic Web is if Google were to publish the Knowledge Graph (via a SPARQL endpoint, naturally!), making it part of the Linked Open Data Cloud. I can see it very quickly surpassing DBpedia in popularity. More importantly, I can imagine plugins that let you mouse over words on web pages and have a Knowledge Graph-powered panel pop up with more detail, along with other such low-hanging usability improvements to information pages.&lt;/p&gt;
&lt;h2&gt;
	Conclusions&lt;/h2&gt;
&lt;p&gt;
	What do you all think?&lt;/p&gt;
&lt;p&gt;
	Regardless of what happens next, this underscores the continuing trend beyond keyword search to actual meaning, and that is another step towards the vision of the Semantic Web.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/EnterpriseSemantics/~4/7DCdubiO0eY" height="1" width="1"/&gt;</summary>
    <dc:creator>Rob Gonzalez</dc:creator>
    <dc:date>2012-05-17T17:01:39Z</dc:date>
  <feedburner:origLink>http://www.cambridgesemantics.com/blog/-/blogs/google-knowledge-graph-and-the-semantic-web</feedburner:origLink></entry>
  <entry>
    <title>How is semantic technology more flexible than relational technology?</title>
    <link rel="alternate" href="http://feedproxy.google.com/~r/EnterpriseSemantics/~3/4WAPDwuyklw/how-is-semantic-technology-more-flexible-than-relational-technology-" />
    <author>
      <name>Rob Gonzalez</name>
    </author>
    <id>http://www.cambridgesemantics.com/blog/-/blogs/how-is-semantic-technology-more-flexible-than-relational-technology-</id>
    <updated>2012-05-09T21:54:13Z</updated>
    <published>2012-05-09T21:46:46Z</published>
    <summary type="html">&lt;p&gt;
	Michael from &lt;a href="http://www.semanticarts.com/"&gt;SemanticArts&lt;/a&gt; started &lt;a href="http://www.linkedin.com/groupAnswers?viewQuestionAndAnswers=&amp;amp;discussionID=110588815&amp;amp;gid=49970&amp;amp;commentID=79977924&amp;amp;trk=view_disc&amp;amp;ut=0dBOqoRZkXWBc1"&gt;this great thread&lt;/a&gt; on LinkedIn a couple of weeks back.&amp;nbsp; The question: How is semantic technology more flexible than relational technology?&lt;/p&gt;
&lt;p&gt;
	There is a lot of researchy fluff in there, but I wanted to highlight a &lt;a href="http://www.cambridgesemantics.com/semantic-university/semantic-web-misconceptions#ontology-agreement"&gt;misconception&lt;/a&gt; that is widely held in the community.&lt;/p&gt;
&lt;p&gt;
	He starts with an initial overview that I liked quite a bit (emphasis on the good stuff is mine):&lt;/p&gt;
&lt;p style="margin-left: 40px;"&gt;
	&lt;cite&gt;&lt;span style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); display: inline !important; float: none; "&gt;For me the flexibility of semantic technologies comes from when you classify or organize data.&lt;span class="Apple-converted-space"&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;br style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); " /&gt;
	&lt;br style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); " /&gt;
	&lt;span style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); display: inline !important; float: none; "&gt;In a traditional relational database you define a schema (a classification/organizational tool), and you put data into the database organized according to that schema. This has one advantage -- it allows for speedy data access. But it also has one disadvantage. Your database contains data organized according to your understanding of your business at a fixed point in time -- but since things change, your business will immediately diverge from that understanding.&lt;span class="Apple-converted-space"&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;br style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); " /&gt;
	&lt;br style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); " /&gt;
	&lt;span style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); display: inline !important; float: none; "&gt;(BTW the same argument applies to OO databases, and OO programming languages, indeed any situation where you chose the organization scheme first)&lt;span class="Apple-converted-space"&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;br style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); " /&gt;
	&lt;br style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); " /&gt;
	&lt;span style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); display: inline !important; float: none; "&gt;Contrast this to semantic technologies. You put data into a triple store in a somewhat unorganized manner (it's not completely unorganized as you have to chose names for properties, classes etc, and decide what you want to record). &lt;strong&gt;You do very little classification/organization on data insertion. Only when you pull data out of the triple store do you organize it according to some ontology&lt;/strong&gt; (here's where reasoning helps!). As a result you are always pulling data out organized according to your most recent need for and understanding of your business. Of course there is a downside -- speed.&lt;span class="Apple-converted-space"&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;br style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); " /&gt;
	&lt;br style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); " /&gt;
	&lt;span style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); display: inline !important; float: none; "&gt;A good analogy for the semantic approach is an archeologist at a new dig. He lays out a grid so he can say where objects are located, and he has some basic vocabulary like "made of wood", "made of ceramic", etc. He then simply writes down facts about the things he finds -- "Artifact 12 made of wood", "Artifact 12 found at location XYZ". He doesn't organize his data beyond this. Later he may add other facts like "Artifact 12 dated to 650AD", etc. Later when he wants to study some of the contents of the dig, he defines what he is interested in and then gets his army of graduate students (his reasoner!), to organize the relevant artifacts (i.e. to build his schema) according to his definition of what he wants. The archeologist has to work this way since a priori he doesn't know how to organize his findings -- he only knows the organization he wants as new facts come to light.&lt;span class="Apple-converted-space"&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;br style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); " /&gt;
	&lt;br style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); " /&gt;
	&lt;span style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); display: inline !important; float: none; "&gt;Another way to look at the schema vs no-schema debate is as points on a spectrum. At one end is flexible, at the other is speed (the constant tension in IT ...). If you really need speed go for a schema based approach. If you really need flexibility go for a no-schema based approach. If you need a mix of both, chose some point in the middle -- it's an engineering decision.&lt;/span&gt;&lt;/cite&gt;&lt;/p&gt;
&lt;p&gt;
	But the commenter then commits the fallacy (emphasis mine):&lt;/p&gt;
&lt;p style="margin-left: 40px;"&gt;
	&lt;cite&gt;&lt;span style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); display: inline !important; float: none; "&gt;I must admit that the analogy isn't wholly my idea. I gave a talk about semantic web for OO programmers at the San Diego JUG and an actual archeologist was in the audience. He came up to me afterwards to ask lots of questions. His questions sparked the idea for the analogy.&lt;span class="Apple-converted-space"&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;br style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); " /&gt;
	&lt;br style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); " /&gt;
	&lt;span style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); display: inline !important; float: none; "&gt;&lt;strong&gt;It raises a question though -- if you "mess up" your choice of initial vocabulary, then semantic systems have similar sorts of problems to relational systems, in that you may have to restructure the vocabulary at a later date (just like you have to restructure relational schemas), and hence restructure the data.&lt;/strong&gt;&lt;span class="Apple-converted-space"&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;br style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); " /&gt;
	&lt;br style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); " /&gt;
	&lt;span style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); display: inline !important; float: none; "&gt;&lt;strong&gt;My experience suggests that such restructuring happens (far?) less often with semantic systems than with relational systems, and that the restructuring is easier, since you can always treat your data as one big list of triples.&lt;/strong&gt;&lt;span class="Apple-converted-space"&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;br style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); " /&gt;
	&lt;br style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); " /&gt;
	&lt;span style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); display: inline !important; float: none; "&gt;But how do you avoid "messing up" the initial choice of vocabulary. Somehow you have to choose your vocabulary to capture only the "atomic" ideas of interest -- i.e. ideas that cannot be derived from other ideas. The archeologist knows how to do this from years of experience of course, not just his, but the collective years of experience of the field as a whole.&lt;span class="Apple-converted-space"&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;br style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); " /&gt;
	&lt;br style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); " /&gt;
	&lt;span style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); display: inline !important; float: none; "&gt;A professional ontologist somehow manages to choose a vocabulary in such a way as to largely avoid restructuring issues -- choosing it to avoid ontological commitment as much as possible. Are there any metrics/thoughts on how they do this? Also are there any metrics/thoughts on why restructuring semantic systems is easier?&lt;/span&gt;&lt;/cite&gt;&lt;/p&gt;
&lt;p&gt;
	I think this point is key, and this is how I responded:&lt;/p&gt;
&lt;p style="margin-left: 40px;"&gt;
	&lt;span style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); display: inline !important; float: none; "&gt;I actually take a different view. I believe that messing up ontologies is unavoidable. A simple example is two people publishing similar data sets at different times without reusing each other's ontologies. This happens all the time.&lt;span class="Apple-converted-space"&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;br style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); " /&gt;
	&lt;br style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); " /&gt;
	&lt;span style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); display: inline !important; float: none; "&gt;The power that RDF, OWL, and SPARQL give you in this circumstance is to quickly build maps between related constructs in a way that is not possible in the relational world (you'd be in ETL hell). In some cases, a simple owl:sameAs gets you there. In others, a basic SPARQL CONSTRUCT expression and you're golden. Either way, the translation layer is cheap.&lt;span class="Apple-converted-space"&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;br style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); " /&gt;
	&lt;br style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); " /&gt;
	&lt;span style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); display: inline !important; float: none; "&gt;To get back to your example, if somehow you mess up the model in some fundamental way, it's relatively (as compared to the relational world) cheap to make the modification, do the translation, and (*this* part is key) maintain an interface that is consistent for existing consumers.&lt;span class="Apple-converted-space"&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
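&lt;p&gt;
	As a rough illustration of how cheap such a translation layer can be, here is a minimal Python sketch that treats the data as one big list of triples and derives a view in a revised vocabulary. It is plain Python over tuples, not RDF, OWL, or SPARQL tooling, and the old: and new: names are hypothetical. The original triples are left untouched, so existing consumers keep the interface they already rely on.&lt;/p&gt;

```python
# Hypothetical vocabularies: "old:" is the original model, "new:" the revised one.
PREDICATE_MAP = {
    "old:madeOf": "new:material",
    "old:foundAt": "new:location",
}


def translate(triples, predicate_map):
    """Return a new list of triples with mapped predicates renamed.

    Triples whose predicate is not in the map pass through unchanged.
    The input list is never modified.
    """
    return [(s, predicate_map.get(p, p), o) for (s, p, o) in triples]


# Facts recorded with the original vocabulary (made-up sample data).
old_data = [
    ("artifact:12", "old:madeOf", "wood"),
    ("artifact:12", "old:foundAt", "grid:XYZ"),
    ("artifact:12", "old:datedTo", "650AD"),
]

# Derived view in the revised vocabulary; old_data is still available as-is.
new_view = translate(old_data, PREDICATE_MAP)
```

&lt;p&gt;
	In a real system the mapping would more likely be expressed in the store itself, but the shape of the work is the same: a small table of correspondences rather than a restructuring of the data.&lt;/p&gt;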
&lt;h2&gt;
	&lt;span style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, 'Nimbus Sans L', sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 15px; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); display: inline !important; float: none; "&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;: with Semantics, change is cheap, so mistakes are OK.&lt;/span&gt;&lt;/h2&gt;&lt;img src="http://feeds.feedburner.com/~r/EnterpriseSemantics/~4/4WAPDwuyklw" height="1" width="1"/&gt;</summary>
    <dc:creator>Rob Gonzalez</dc:creator>
    <dc:date>2012-05-09T21:46:46Z</dc:date>
  <feedburner:origLink>http://www.cambridgesemantics.com/blog/-/blogs/how-is-semantic-technology-more-flexible-than-relational-technology-</feedburner:origLink></entry>
  <entry>
    <title>Introduction to Unstructured Data</title>
    <link rel="alternate" href="http://feedproxy.google.com/~r/EnterpriseSemantics/~3/zR2S9BG-n74/introduction-to-unstructured-data" />
    <author>
      <name>Richard Mallah</name>
    </author>
    <id>http://www.cambridgesemantics.com/blog/-/blogs/introduction-to-unstructured-data</id>
    <updated>2012-04-30T18:50:13Z</updated>
    <published>2012-04-30T18:45:15Z</published>
    <summary type="html">&lt;p&gt;
	Expressing, communicating, and understanding meaning is natural to us but opaque to computers. Computers lack shared understandings, norms, languages, and common sense. They are, after all, merely tools. Today, people have to communicate in ways a computer understands before it can work with the meaning. There are small signs of this changing, such as Apple's Siri and IBM's Watson, but those examples are few and siloed.&lt;/p&gt;
&lt;p&gt;
	Files and streams that computers cannot generally interpret are called &lt;strong&gt;&lt;em&gt;unstructured data&lt;/em&gt;&lt;/strong&gt; in IT parlance.&lt;/p&gt;
&lt;p&gt;
	&lt;img alt="Introduction to Unstructured Data" src="http://www.cambridgesemantics.com/documents/16985/17287/unstructured-data.png" style="float:left; padding:0 20px 20px 0;" /&gt; Unstructured data represents the majority of enterprise data and it continues to grow faster than people can consume it. It is emails, snailmails, and voicemails. It is handwritten document scans, photographs, and video. It's Word documents, PDF files, and PowerPoint presentations. It's tweets, message board postings, and online reviews. It runs broad and deep.&lt;br /&gt;
	And it holds a lot of locked-away value to organizations.&lt;/p&gt;
&lt;p&gt;
	Most importantly, it contains the most adaptive, fluid information, just at the edge of being gleaned organizationally.&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;
		Market sentiment: whether for stocks or stockings.&lt;/li&gt;
	&lt;li&gt;
		Breaking news: everything affects you in one way or another.&lt;/li&gt;
	&lt;li&gt;
		The voice of your customer: will you listen?&lt;/li&gt;
	&lt;li&gt;
		Competitors' announcements and the respective buzz: will you compete?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
	More often than structured data does, it contains leading indicators.&lt;/p&gt;
&lt;p&gt;
	Manual processing of unstructured data happens implicitly, most often without anyone realizing that's what's being done. Examples include summarizing a customer's diatribe, forwarding a letter to the appropriate department, staying aware of relevant product recalls, and entering dealers' quotes into a spreadsheet.&lt;/p&gt;
&lt;p&gt;
	However, manual processes are seldom scalable, repeatable, or fast. But why hope that a machine can understand these sources at all?&lt;/p&gt;
&lt;p&gt;
	When they use unstructured data properly, computers can help bring order to messes, bring people together, and bring ideas and resources together. They can help people share and build on knowledge, find links between things, and care about the implications of things. Conceptually related structured or semi-structured metadata and content can be explicitly linked, enabling enhanced navigation as well as some level of automated inferencing. The endgame is not replacing knowledge workers, but letting them work smarter, with more knowledge made available to them faster: people can focus on value-added contributions rather than spend time on the mechanics.&lt;/p&gt;
&lt;p&gt;
	Artificial intelligence and related fields have made great strides in the past couple of decades toward making this possible. Computational linguistics, natural language processing, machine learning, knowledge representation, and big data analytics have all been breaking new ground in theory and practice, but there is still no magic bullet that lets software understand everything a person would. Software can, however, make sense of large quantities of unstructured data with specific goals in mind, such as the manual processes mentioned above.&lt;/p&gt;
&lt;h2&gt;
	Methods of Dealing with Unstructured Data&lt;/h2&gt;
&lt;aside class="feature-box box callout" style="width:175px; margin:20px 20px 20px 0; float:left; padding:20px; height: auto;"&gt;
	&lt;p style="margin:0; padding:0; text-align:justify; font-style:italic;"&gt;
		Obtaining meaning from unstructured data is often referred to as 'little-s semantics', as differentiated from the 'big-S Semantics' of the Semantic Web and knowledge management. See &lt;a href="http://www.cambridgesemantics.com/semantic-university/nlp-and-the-semantic-web"&gt;Semantic University's "NLP and the Semantic Web"&lt;/a&gt; for an intro into the distinction and relationship between these areas. In the third installment of this series, we will delve into much more detail on successfully melding structured and unstructured data with the convergence of text analytics and semantic knowledge management.&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;
	We'll survey contemporary functionality and then we'll peek at what's on the horizon. Common things done with unstructured data today include search, faceting, clustering, summarization, tagging, and information extraction. Collectively, these are often called &lt;em&gt;text analytics&lt;/em&gt;.&lt;br /&gt;
	&lt;br /&gt;
	Search is the most common way we interact with unstructured content today. Sometimes we search on a phrase, like finding relevant pages with traditional web search engines. Sometimes we search, or facet, on a topic, like when we go to the Health news section of a modern news aggregator. Sometimes we implicitly search for what people actually do with the subject at hand, like when we view a product on Amazon just so we can see what people who viewed that item actually bought. Different technologies underlie these examples, but they are all focused on organizing a corpus, or body, of documents, in a way that makes sense in a particular context.&lt;/p&gt;
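&lt;p&gt;
	To make the difference between searching on a phrase and faceting on a topic concrete, here is a minimal Python sketch of an inverted index with an optional topic filter. The documents, topics, and function names are made up for illustration; real search engines add ranking, stemming, and much more.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical mini-corpus: each document has text plus a topic used for faceting.
DOCS = {
    1: {"topic": "health", "text": "New study links sleep and heart health"},
    2: {"topic": "finance", "text": "Markets rally as heart device maker reports earnings"},
    3: {"topic": "health", "text": "Flu season arrives early this year"},
}


def build_index(docs):
    """Map each lowercased word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for word in doc["text"].lower().split():
            index[word].add(doc_id)
    return index


def search(index, docs, word, topic=None):
    """Return sorted ids of documents containing `word`.

    If `topic` is given, keep only documents in that topic (the facet).
    """
    hits = index.get(word.lower(), set())
    if topic is not None:
        hits = {d for d in hits if docs[d]["topic"] == topic}
    return sorted(hits)


INDEX = build_index(DOCS)
```

&lt;p&gt;
	Searching for "heart" alone returns both the health story and the finance story; adding the health facet narrows the result to the first one.&lt;/p&gt;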
&lt;p&gt;
	Documents can be automatically summarized, where the software either figures out the most representative sentences within the document, or alternatively, generates an entirely new paraphrasing from what it was able to understand using linguistics, context, topic mentions, and some level of semantic framing. A similar operation is automatically finding apropos topics to tag onto a document, which can then be fed back to faceting workflows.&lt;/p&gt;
&lt;p&gt;
	&lt;em&gt;Information extraction&lt;/em&gt; is the extraction of structured concepts or facts from unstructured data. Things like people, phone numbers, organizations, web addresses, and medications, are examples of classes of what are collectively considered &lt;em&gt;entities&lt;/em&gt;.&lt;img alt="Unstructured semantic processing" src="http://www.cambridgesemantics.com/documents/16985/17287/unstructured-semantic-processing.png" style="float:right; padding:1em 0 1em 1em;" /&gt; &lt;em&gt;Relationships&lt;/em&gt; between entities, such as hirings, product releases, protein-protein interactions, and more complicated events, are another important general class of information that can be extracted and put to direct use. &lt;a href="https://framenet.icsi.berkeley.edu/fndrupal/about"&gt;Semantic frames&lt;/a&gt;, not to be confused with other &lt;a href="http://www.cambridgesemantics.com/semantic-university/semantic-web-vs-semantic-technologies"&gt;semantic technologies&lt;/a&gt;, are advanced meaning-oriented and somewhat context-aware tags and structure on words and phrases (but this raw technique is nearly impossible for an end-user to use in isolation). For applications like brand monitoring on social media, sentiment analysis, whether broad-stroked, or nuanced and attributable to certain things, is another common type of extraction.&lt;/p&gt;
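&lt;p&gt;
	The simplest form of entity extraction can be sketched with pattern matching. The Python below pulls phone numbers and web addresses out of free text; the patterns are deliberately naive assumptions (US-style phone numbers, http or https addresses only). Entities such as people or organizations, and relationships between entities, generally need linguistic or statistical techniques rather than regular expressions.&lt;/p&gt;

```python
import re

# Deliberately simple, hypothetical patterns for two entity classes.
PATTERNS = {
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "url": re.compile(r"https?://[^\s,;]+"),
}


def extract_entities(text):
    """Return (entity_class, matched_text) pairs found in `text`.

    Results are grouped by entity class in the order the patterns are
    declared, and by position in the text within each class.
    """
    return [
        (label, match.group(0))
        for label, pattern in PATTERNS.items()
        for match in pattern.finditer(text)
    ]


SAMPLE = "Call 617-555-0100 or visit http://www.example.com for details."
ENTITIES = extract_entities(SAMPLE)
```

&lt;p&gt;
	The output is structured data (a class and a value for each mention) that can be stored, linked, and queried alongside data that was structured to begin with.&lt;/p&gt;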
&lt;p&gt;
	When information becomes structured, and knowledge is represented, in ways that can contextually and semantically make sense to a program, it can be reasoned over with automatic inference and user-defined rules, which is often the impetus for information extraction. Taking the concept of information extraction to its logical conclusion, one may wonder why software can't simply understand everything in any arbitrary text with some kind of general knowledge modeling. To really address this, we need to consider the field of knowledge representation. I have purposely kept the technical discussion of what underlies these approaches out of this article, so the second installment in this series will cover the various approaches to knowledge representation as it relates to unstructured data.&lt;/p&gt;
&lt;h2&gt;
	Greater Than The Sum Of Its Parts&lt;/h2&gt;
&lt;p&gt;
	Solutions that combine basic unstructured functionalities in novel ways provide more robust options than any technique in isolation. Some impressive text mining programs, &lt;a href="http://www.linguamatics.com/welcome/software/I2E.html"&gt;for instance&lt;/a&gt;, do upfront indexing of entities, and let users search for specific new classes of relationship on the fly, combining information extraction and search.&lt;/p&gt;
&lt;p&gt;
	Taking this power of combination to the n&lt;sup&gt;th&lt;/sup&gt; level, a powerful approach on the horizon is to leverage as many appropriate techniques and technologies as possible, in conjunction, from natural language processing, computational linguistics, information retrieval, information extraction, and machine learning, and to do this in a flexible and easy way that lets you adapt to the goal at hand.&lt;/p&gt;
&lt;p&gt;
	When you can plug in any combination of unstructured technologies and enable them to cooperate in novel ways to solve your problems, technologies can focus on their respective strengths, the system can relate the functionality and output of disparate technologies together, accuracy can improve, you can relate unstructured data back to your structured data, you can automate or structure workflows, and your users gain multiple poignant and empowering perspectives on newly unlocked information.&lt;/p&gt;
&lt;p&gt;
	When information is in the terms that make sense to you, it is knowledge. When it is salient, it becomes important. What you do with it becomes insight. Systems must be able to manage unstructured information throughout its lifecycle—from origination to action—to really make use of it, and to do that requires a combination and correlation of existing techniques to a level unavailable in a single product today.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/EnterpriseSemantics/~4/zR2S9BG-n74" height="1" width="1"/&gt;</summary>
    <dc:creator>Richard Mallah</dc:creator>
    <dc:date>2012-04-30T18:45:15Z</dc:date>
  <feedburner:origLink>http://www.cambridgesemantics.com/blog/-/blogs/introduction-to-unstructured-data</feedburner:origLink></entry>
  <entry>
    <title>NoSQL Equals NoSecurity: Sometimes</title>
    <link rel="alternate" href="http://feedproxy.google.com/~r/EnterpriseSemantics/~3/0leLnibcjao/nosql-equals-nosecurity-sometimes" />
    <author>
      <name>Rob Gonzalez</name>
    </author>
    <id>http://www.cambridgesemantics.com/blog/-/blogs/nosql-equals-nosecurity-sometimes</id>
    <updated>2012-04-09T18:35:25Z</updated>
    <published>2012-04-09T18:10:49Z</published>
    <summary type="html">&lt;p&gt;
	&lt;img alt="NoSQL Equals NoSecurity" height="324" src="http://www.cambridgesemantics.com/documents/16985/17287/nosql-equals-nosecurity.jpg?version=1.0&amp;amp;t=1333995372250" style="padding:0 0 20px 20px; float:right;" width="313" /&gt; Over at InformationWeek, Michael Davis &lt;a href="http://www.informationweek.com/news/storage/portable/232700412"&gt;wrote an impassioned post&lt;/a&gt; lambasting the lack of proper security in the NoSQL World.&lt;/p&gt;
&lt;blockquote&gt;
	Clearly, the developers driving the NoSQL bus just don't get it. The only thing we've gotten from years of pushing to secure Hadoop and other big data technologies is integration with authentication frameworks such as Kerberos. Excuse us if we don't swoon with gratitude.&lt;br /&gt;
	&lt;br /&gt;
	As technologies like Hadoop and NoSQL go mainstream, this situation must be addressed.&lt;/blockquote&gt;
&lt;p&gt;
	His big concern is that people use NoSQL to store and manage &lt;b&gt;financial data&lt;/b&gt;, such as transactions, which occur in volumes too large to be effectively managed by traditional database technologies.&lt;/p&gt;
&lt;p&gt;
	I've written about this kind of issue before in &lt;a href="http://www.cambridgesemantics.com/blog/-/blogs/what-happened-to-nosql-for-the-enterprise-"&gt;What Happened to NoSQL for the Enterprise&lt;/a&gt;. Developers are using NoSQL systems to solve specific problems, but leave lots of traditional database features on the table to make that bargain. To get to significant scale and easy cluster management, they give up on transactions or, in this case, security.&lt;/p&gt;
&lt;p&gt;
	Proper security is not only tricky to implement, but typically has a performance cost.&amp;nbsp; This cost goes against the main reason to use NoSQL databases: blazing scale and performance. What Michael identifies is that as soon as you start storing sensitive information—personal data on customers, financial information, medical data—you shouldn't make this bargain. And I agree.&lt;/p&gt;
&lt;p&gt;
	As I've said before, Semantic Web technologies represent a very interesting NoSQL solution for the enterprise exactly because they don't jettison database best practices in order to get NoSQL benefits. &lt;a href="http://www.cambridgesemantics.com/products/role-based-access-control"&gt;Anzo&lt;/a&gt;, for example, supports transactions, logging, data provenance, encryption, fact-level security (think cell-level), etc. And we're not alone in this space. Revelytix has done a great deal of work at the US DoD, for example, and the DoD certainly takes security seriously.&lt;/p&gt;
&lt;p&gt;
	There are certainly lots of applications where this is not important. But we can't just ignore security requirements to play with trendy technologies. For enterprises looking for serious alternatives to SQL systems&amp;mdash;especially where flexibility and the ability to use unstructured data are concerned&amp;mdash;Semantic Web systems represent the most mature databases around today.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/EnterpriseSemantics/~4/0leLnibcjao" height="1" width="1"/&gt;</summary>
    <dc:creator>Rob Gonzalez</dc:creator>
    <dc:date>2012-04-09T18:10:49Z</dc:date>
  <feedburner:origLink>http://www.cambridgesemantics.com/blog/-/blogs/nosql-equals-nosecurity-sometimes</feedburner:origLink></entry>
  <entry>
    <title>SWiPE: An Example of Easier Semantic Web Software</title>
    <link rel="alternate" href="http://feedproxy.google.com/~r/EnterpriseSemantics/~3/EXfWCW32BsE/swipe-an-example-of-easier-semantic-web-software" />
    <author>
      <name>Lee Feigenbaum</name>
    </author>
    <id>http://www.cambridgesemantics.com/blog/-/blogs/swipe-an-example-of-easier-semantic-web-software</id>
    <updated>2012-04-04T01:59:13Z</updated>
    <published>2012-04-04T01:36:15Z</published>
    <summary type="html">&lt;p&gt;
	I was thrilled to come across ZDNet's &lt;a href="http://www.zdnet.com/blog/feeds/swipe-allows-deep-search-semantic-queries-using-the-wikipedia-ui/4698"&gt;coverage of SWiPE, a query-by-example approach to structured searching of Wikipedia&lt;/a&gt;. SWiPE is being developed by &lt;a href="http://riemann.unica.it/~atzori/"&gt;Maurizio Atzori&lt;/a&gt; and &lt;a href="http://www.cs.ucla.edu/~zaniolo/"&gt;Carlo Zaniolo&lt;/a&gt; and will be presented later this month at the demo track of the &lt;a href="http://www2012.wwwconference.org/"&gt;WWW2012 conference&lt;/a&gt; in Lyon. While the SWiPE design and implementation are presented in excruciating detail in Maurizio and Carlo's &lt;a href="http://www2012.wwwconference.org/"&gt;conference paper&lt;/a&gt;, the formality and detail of the paper belie the simplicity and ease of use of the tool.&lt;/p&gt;
&lt;p&gt;
	Maurizio and Carlo are building on top of &lt;a href="http://dbpedia.org/About"&gt;DBpedia&lt;/a&gt;, the Semantic Web representation of information (mostly) from Wikipedia infoboxes, but you wouldn't know that from watching the tool in action. There are no URIs, no mention of RDF, no mention of SPARQL; finding specific answers to a question is straightforward. Go ahead and watch the demo of SWiPE in action. It's only 23 seconds, I'll wait:&lt;/p&gt;
&lt;div style="text-align:center"&gt;
	&lt;iframe allowfullscreen="" frameborder="0" height="315" src="http://www.youtube.com/embed/McEp7B7kxLY" width="560"&gt;&lt;/iframe&gt;&lt;/div&gt;
&lt;p&gt;
	Got it? SWiPE lets you fill in values in existing infoboxes to find any entries on Wikipedia that match the information you supply. It's simple and obvious to use, and it doesn't require any new context or user interface beyond the infoboxes that we all know very well already.&lt;/p&gt;
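&lt;p&gt;
	The query-by-example idea itself is easy to sketch. The Python below is not how SWiPE is implemented (it works over DBpedia); it is a toy illustration over a hypothetical list of infobox-like records, where the "query" is just a partially filled-in record.&lt;/p&gt;

```python
# Hypothetical infobox-like records, standing in for structured Wikipedia data.
INFOBOXES = [
    {"name": "Lyon", "country": "France", "type": "city"},
    {"name": "Cagliari", "country": "Italy", "type": "city"},
    {"name": "Rhone", "country": "France", "type": "river"},
]


def query_by_example(records, example):
    """Return records matching every non-empty field of `example`.

    Fields left blank in the example are treated as "don't care",
    just like infobox fields the user did not fill in.
    """
    filled = {key: value for key, value in example.items() if value}
    return [
        record
        for record in records
        if all(record.get(key) == value for key, value in filled.items())
    ]


# The user fills in two fields of the "form" and leaves the name blank.
MATCHES = query_by_example(INFOBOXES, {"name": "", "country": "France", "type": "city"})
```

&lt;p&gt;
	The user never writes a query language; the filled-in form is the query.&lt;/p&gt;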
&lt;p&gt;
	SWiPE is a fantastic example of what I've written about before: &lt;a href="http://www.cambridgesemantics.com/blog/-/blogs/why-semantic-web-software-must-be-easy-er-to-use"&gt;the need for Semantic Web software that is easy to use&lt;/a&gt;. As long as people need to learn SPARQL or learn how to use a linked data browser or learn how to use a circles-and-arrows-based query tool, the vast majority of people who could benefit from the power of DBpedia's structured representation of Wikipedia data will keep missing out. By making Wikipedia search simple, though, SWiPE has the potential to bring this benefit to a far greater audience. I, for one, can't wait to try it out myself.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/EnterpriseSemantics/~4/EXfWCW32BsE" height="1" width="1"/&gt;</summary>
    <dc:creator>Lee Feigenbaum</dc:creator>
    <dc:date>2012-04-04T01:36:15Z</dc:date>
  <feedburner:origLink>http://www.cambridgesemantics.com/blog/-/blogs/swipe-an-example-of-easier-semantic-web-software</feedburner:origLink></entry>
  <entry>
    <title>Semantics in the Real World: Where to Begin?</title>
    <link rel="alternate" href="http://feedproxy.google.com/~r/EnterpriseSemantics/~3/PmYqenY2Vr8/semantics-in-the-real-world-where-to-begin-" />
    <author>
      <name>Rob Gonzalez</name>
    </author>
    <id>http://www.cambridgesemantics.com/blog/-/blogs/semantics-in-the-real-world-where-to-begin-</id>
    <updated>2012-04-03T19:47:23Z</updated>
    <published>2012-04-03T15:25:27Z</published>
    <summary type="html">&lt;p&gt;
	A lot of the chatter in the semantics community is jargon-heavy, very technical, or both.&amp;nbsp; To a newcomer trying to evaluate what problem, if any, might benefit from semantic technologies (and even which to apply!), this can be very confusing.&lt;/p&gt;
&lt;p&gt;
	As part of &lt;a href="http://www.cambridgesemantics.com/semantic-university"&gt;Semantic University&lt;/a&gt;, we decided to include some introductory content to help people identify what kinds of applications could benefit from Semantic Web technologies, as free from jargon as possible.&amp;nbsp; This set of four articles starts with the characteristics that make a use case a good target for semantic technology, continues with surveys of semantic technology applications on the web and in the enterprise, and finishes with some short case studies.&lt;/p&gt;
&lt;p&gt;
	Feel free to send these along to anyone that says, "So what?" or "Why?"&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;
		&lt;a href="http://www.cambridgesemantics.com/semantic-university/what-makes-a-good-semantic-web-application"&gt;What Makes a &lt;em&gt;Good&lt;/em&gt; Semantic Web Application?&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;
		&lt;a href="http://www.cambridgesemantics.com/semantic-university/semantic-web-on-the-web"&gt;Semantic Web on the Web&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;
		&lt;a href="http://www.cambridgesemantics.com/semantic-university/semantic-web-in-the-enterprise"&gt;Semantic Web in the Enterprise&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;
		&lt;a href="http://www.cambridgesemantics.com/semantic-university/example-semantic-web-applications"&gt;Example Semantic Web Applications&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;img src="http://feeds.feedburner.com/~r/EnterpriseSemantics/~4/PmYqenY2Vr8" height="1" width="1"/&gt;</summary>
    <dc:creator>Rob Gonzalez</dc:creator>
    <dc:date>2012-04-03T15:25:27Z</dc:date>
  <feedburner:origLink>http://www.cambridgesemantics.com/blog/-/blogs/semantics-in-the-real-world-where-to-begin-</feedburner:origLink></entry>
  <entry>
    <title>Introducing Semantic University</title>
    <link rel="alternate" href="http://feedproxy.google.com/~r/EnterpriseSemantics/~3/cxRJk7_tVz8/introducing-semantic-university" />
    <author>
      <name>Rob Gonzalez</name>
    </author>
    <id>http://www.cambridgesemantics.com/blog/-/blogs/introducing-semantic-university</id>
    <updated>2012-04-02T14:12:53Z</updated>
    <published>2012-03-13T13:05:28Z</published>
    <summary type="html">&lt;p&gt;
	&lt;a href="http://www.cambridgesemantics.com/semantic-university" style="width:200px; height:69px; float:right; padding:0 0 20px 20px;"&gt;&lt;img height="69" src="http://www.cambridgesemantics.com/documents/16985/30245/semantic-university-logo-200x69.png?version=1.0&amp;amp;t=1331328190000" width="200" /&gt;&lt;/a&gt; Today I'm very proud to announce the launch of &lt;a href="http://www.cambridgesemantics.com/semantic-university"&gt;Semantic University&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
	We are creating Semantic University to be the most accessible and most complete place to learn about Semantic Web and other semantic technologies. Although there is only one lesson posted today, we have dozens in various stages of production, with many more ready to go. &lt;a href="http://eepurl.com/kzaaL"&gt;Subscribe to the mailing list&lt;/a&gt; to get updates when new lessons come out.&lt;/p&gt;
&lt;h2&gt;
	The Need&lt;/h2&gt;
&lt;p&gt;
	Let's face it: there needs to be better curated material about semantic technologies. This is especially true for people new to the space, but it also applies to those who have a basic understanding and are looking to &lt;em&gt;do something&lt;/em&gt; with them.&lt;/p&gt;
&lt;p&gt;
	Put yourself in the shoes of someone who just heard about semantics—a colleague told you, or you saw a presentation or demo at a conference, or someone forwarded &lt;a href="http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html"&gt;Tim Berners-Lee's TED talk&lt;/a&gt; to you. &amp;nbsp;You would go to Google and search for "What is the Semantic Web?" How many pages would you have to go through to get to something suitable for a real beginner?&lt;/p&gt;
&lt;p&gt;
	Or suppose you're a non-technical buyer who is being told that Microsoft's SQL Server 2012 has a "semantic modeling layer." How are you to know how that's different?&lt;/p&gt;
&lt;p&gt;
	We experience this as a vendor and as a business all the time. We recently hired a company to help us out with our website, and in the second meeting they said to us, "Listen, we spent a couple hours trying to figure out what the heck the Semantic Web is, and are still a little confused. We think we need to at least have some idea to work with you; can you take some time to give us a tutorial?" They're not wrong.&lt;/p&gt;
&lt;p&gt;
	The learning curve today is too steep. Although there are blogs and articles that tackle pieces of the problem here and there, you really have to be motivated to sift through it all yourself and piece together an understanding. &amp;nbsp;Getting to the point where you know when to apply the tools and when not to is a daunting task.&lt;/p&gt;
&lt;h2&gt;
	Compared to Other Communities&lt;/h2&gt;
&lt;p&gt;
	When I look at other communities that are taking off, such as Node.js, MongoDB, Ruby on Rails, etc., they all have really terrific places for people to &lt;em&gt;learn&lt;/em&gt;. They also do a great job integrating into existing ecosystems and patterns of behavior &lt;em&gt;in their examples&lt;/em&gt;. If you look at MongoDB, for example, they have tutorials on using Mongo with many different web CMS systems, and in almost any language, and the tutorials get you up and running on a simple app in under an hour.&lt;/p&gt;
&lt;p&gt;
	Where can you go for this experience in Semantic Web technologies? A book isn't sufficient, since a person is already committed to learning the technology by the time he makes the purchase.&lt;/p&gt;
&lt;h2&gt;
	Beyond the Basics&lt;/h2&gt;
&lt;p&gt;
	Even when you get beyond an initial understanding, the RDF tutorials out there tend to focus on RDF/XML syntax, which is a tough place to start. There are few SPARQL tutorials that walk you through the process from the very beginning: download a SPARQL client, try this query, and so on. There are more debates about RDFa vs. Microdata than there are helpful guides on getting started with RDFa.&lt;/p&gt;
&lt;p&gt;
	There is not enough focus on doing something &lt;em&gt;real&lt;/em&gt;. Existing tutorials don't do a great job of putting the technologies into a relevant-&lt;em&gt;enough&lt;/em&gt; context to let people take them out of the lab and into the real world, except through big vendor deals.&lt;/p&gt;
&lt;h2&gt;
	The Mission&lt;/h2&gt;
&lt;p&gt;
	This has to change. Our community has to grow. The time is right for the Semantic Web to take off. &amp;nbsp;Schema.org, Facebook Open Graph, the growth of the Linked Data Cloud: it feels like we are so close and just need something to tip.&lt;/p&gt;
&lt;p&gt;
	An accessible place to get started is key to that.&lt;/p&gt;
&lt;p&gt;
	Today we launch the beginning of Semantic University. If you would like to contribute, we are actively looking for more writers and people to create hands-on exercises. Just &lt;a href="mailto:SemanticUniversity@cambridgesemantics.com"&gt;drop us a line&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
	Existing Materials&lt;/h2&gt;
&lt;p&gt;
	This is not to say that there are no existing tutorials, presentations, videos, blog posts, and documentation that are high quality and relevant to the Semantic Web. &amp;nbsp;There absolutely are. &amp;nbsp;The major problems are that they tend to be tough for beginners to grok, and that they are scattered all over the place. &amp;nbsp;Having a central location from which&amp;nbsp;&lt;em&gt;to start&lt;/em&gt;&amp;nbsp;is, itself, valuable.&lt;/p&gt;
&lt;p&gt;
	Where relevant materials already exist, we will link to them from the "Further Reading" bar on the side of each lesson. &amp;nbsp;If you have a blog post or tutorial that is not mentioned and that you think is relevant to a topic, please &lt;a href="mailto:SemanticUniversity@cambridgesemantics.com"&gt;let us know&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
	One More Thing&lt;/h2&gt;
&lt;p&gt;
	Lastly, while we are a company that sells a software product based on Semantic Web technology, this space is not a place where we're going to be advertising Anzo. We strongly believe that a rising tide will lift all boats in this community. Learning material associated with our own product line will remain separate, in our own online documentation and forums.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/EnterpriseSemantics/~4/cxRJk7_tVz8" height="1" width="1"/&gt;</summary>
    <dc:creator>Rob Gonzalez</dc:creator>
    <dc:date>2012-03-13T13:05:28Z</dc:date>
  <feedburner:origLink>http://www.cambridgesemantics.com/blog/-/blogs/introducing-semantic-university</feedburner:origLink></entry>
  <entry>
    <title>Local vs. Global Semantics</title>
    <link rel="alternate" href="http://feedproxy.google.com/~r/EnterpriseSemantics/~3/Ps4Vh5qWkMM/local-vs-global-semantics" />
    <author>
      <name>Rob Gonzalez</name>
    </author>
    <id>http://www.cambridgesemantics.com/blog/-/blogs/local-vs-global-semantics</id>
    <updated>2012-03-12T14:17:22Z</updated>
    <published>2012-03-12T14:10:50Z</published>
    <summary type="html">&lt;p&gt;
	&lt;img height="194" src="http://www.cambridgesemantics.com/documents/16985/17287/local-vs-global.jpeg" style="width:259px; height:194px; margin:0 0 20px 20px; float:right;" width="259" /&gt; One of the big stories around the Microsoft’s SQL Server 2012 release is the inclusion of its new Business Intelligence Semantic Model (BISM).&amp;nbsp; This isn’t anything groundbreaking in and of itself, as traditional BI tools such as Business Objects have incorporated a “Semantic Model” of sorts for years.&amp;nbsp; For the record, these BI “semantic models” are not based on Semantic Web technologies, but that’s not what’s interesting to me.&amp;nbsp; What’s interesting is that it got me thinking about the &lt;em&gt;Continuum of Locality of Semantics&lt;/em&gt; in general.&lt;/p&gt;
&lt;p&gt;
	The Continuum ranges from Local to Global semantics.&amp;nbsp; Where you are on the continuum dramatically impacts your ability to communicate with other systems without requiring a translation layer.&lt;/p&gt;
&lt;p&gt;
	On one end of the Continuum, the semantic layer is extremely localized and really only exists to support a single application or set of data.&amp;nbsp; BISM is an example of this kind of local semantics: it’s meant to allow a multi-dimensional BI model to co-exist &lt;em&gt;in a single server&lt;/em&gt; with the standard tabular data one expects in a relational database.&lt;/p&gt;
&lt;p&gt;
	On the other end of the Continuum are semantic standards that are national or international, and often mandated.&amp;nbsp; VCard is one example of this.&amp;nbsp; Even if you don’t like some aspect of VCard, it’s not like you would choose any other contact format to accomplish something similar; that would be foolish.&amp;nbsp; VCard is everywhere and has won.&lt;/p&gt;
&lt;p&gt;
	In the middle are things like MDM systems, which are closer to the global semantics than local in spirit in that they attempt to define one set of semantics for an entire ecosystem.&amp;nbsp; MDM systems are, in a way, top-down and authoritarian in their demands, and they somewhat naively assume that such mandates can be maintained over time.&amp;nbsp; Businesses, partnerships, contracts, and competition move faster than IT can keep up with an MDM system that is used by multiple departments, and at some point it becomes dated, and so users produce data that doesn’t conform.&lt;/p&gt;
&lt;p&gt;
	My observation here is that most technologies are built specifically to perform well at one specific location on the continuum.&amp;nbsp; MDM doesn’t make much sense at the very local end or at a global level beyond an organization, just as BISM doesn’t make any sense outside of a single SQL Server instance.&lt;/p&gt;
&lt;p&gt;
	This all brings me to my realization: &lt;em&gt;Semantic Web technologies can play all along the continuum!&lt;/em&gt;&amp;nbsp; By their nature, RDF-based applications can be both top-down &lt;em&gt;and&lt;/em&gt; bottom-up at the same time.&lt;/p&gt;
&lt;p&gt;
	For example, &lt;a href="http://labs.mondeca.com/dataset/lov/"&gt;Linked Open Vocabularies (LOV)&lt;/a&gt;, Schema.org, or the Facebook Open Graph vocabularies are very much global in scope, and top-down in nature.&amp;nbsp; However—and this is the cool part—if I have a local CRM system built on Semantic Web technologies (which &lt;a href="http://www.cambridgesemantics.com/blog/-/blogs/our-own-dog-food-tastes-pretty-darn-good"&gt;we do&lt;/a&gt;) I can choose to reuse LOV concepts, or simply describe how my data relates to them.&amp;nbsp; I’m not mandated to follow a set of rules, but I am aiding in data interchange by following locally those that make sense.&lt;/p&gt;
&lt;p&gt;
	It gets even better.&amp;nbsp; A Semantic Web application that exists at one point in the continuum can very easily talk to applications that are either more global or more local with much cheaper translations than are required between, say, an MDM implementation and SQL Server.&amp;nbsp; For example, SPARQL is perfect for this sort of lightweight, fast translation that’s required between two systems built by two people in two contexts but that deal with &lt;em&gt;similar-enough&lt;/em&gt; data.&lt;/p&gt;
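&lt;p&gt;
	As a rough illustration of how cheap such a translation can be, the sketch below rewrites local predicates to shared ones on plain Python tuples. In SPARQL the same step would typically be a CONSTRUCT query; the crm: terms and the sample data are hypothetical, while foaf:name and foaf:mbox are borrowed from the FOAF vocabulary purely as an example of a shared target.&lt;/p&gt;
&lt;pre&gt;
```python
# Hypothetical mapping from a local CRM vocabulary to shared terms.
LOCAL_TO_SHARED = {
    "crm:fullName": "foaf:name",
    "crm:emailAddress": "foaf:mbox",
}


def translate(triples, mapping):
    """Rewrite predicates that have a shared equivalent; keep the rest."""
    return [(s, mapping.get(p, p), o) for (s, p, o) in triples]


# Made-up local data: two properties map to shared terms, one stays local.
local_data = [
    ("crm:contact42", "crm:fullName", "Ada Lovelace"),
    ("crm:contact42", "crm:emailAddress", "mailto:ada@example.com"),
    ("crm:contact42", "crm:accountTier", "gold"),
]

shared_view = translate(local_data, LOCAL_TO_SHARED)
for triple in shared_view:
    print(triple)
```
&lt;/pre&gt;
&lt;p&gt;
	Nothing forces the local system to give up crm:accountTier; it simply has no shared equivalent yet, so it passes through unchanged.&lt;/p&gt;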
&lt;p&gt;
	So the Continuum of Locality of Semantics is real, but is easily traversable by Semantic Web technologies and not by traditional “semantic” models.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/EnterpriseSemantics/~4/Ps4Vh5qWkMM" height="1" width="1"/&gt;</summary>
    <dc:creator>Rob Gonzalez</dc:creator>
    <dc:date>2012-03-12T14:10:50Z</dc:date>
  <feedburner:origLink>http://www.cambridgesemantics.com/blog/-/blogs/local-vs-global-semantics</feedburner:origLink></entry>
  <entry>
    <title>The New Era of Just-in-Time Compliance</title>
    <link rel="alternate" href="http://feedproxy.google.com/~r/EnterpriseSemantics/~3/Gpa2HUztiak/the-new-era-of-just-in-time-compliance" />
    <author>
      <name>Alok Prasad</name>
    </author>
    <id>http://www.cambridgesemantics.com/blog/-/blogs/the-new-era-of-just-in-time-compliance</id>
    <updated>2012-03-06T22:18:40Z</updated>
    <published>2012-03-06T20:47:03Z</published>
    <summary type="html">&lt;p style="margin-left:.5in;"&gt;
	&lt;strong&gt;Federal Reserve Chairman Ben Bernanke:&lt;/strong&gt;&lt;em&gt;&amp;nbsp; “We have asked the banks to essentially do stress tests and ask, looking at all their positions, all their hedges, what would be the effect on their capital be if Greece defaulted?”&lt;/em&gt;&lt;/p&gt;
&lt;p style="margin-left:.5in;"&gt;
	&lt;strong&gt;US FDA:&lt;/strong&gt;&lt;em&gt; All drug companies have to address “safety changes in labeling for some cholesterol-lowering drugs”. &lt;/em&gt;(One of many growing compliance requirements...)&lt;/p&gt;
&lt;p style="margin-left:.5in;"&gt;
	&lt;strong&gt;US Health &amp;amp; Human Services:&lt;/strong&gt; &lt;em&gt;"We have heard from many in the provider community who have concerns about the administrative burdens they face in the years ahead” &lt;/em&gt;(...as it relates to complying with the implementation of the new ICD-10 standard and other compliance rules...)&lt;/p&gt;
&lt;h2&gt;
	The Need for Just-in-Time Compliance&lt;/h2&gt;
&lt;p&gt;
	&lt;img alt="the compliance information lifecycle" src="http://www.cambridgesemantics.com/documents/16985/1b494167-b72c-4715-918c-99a9bfb4c51c" style="width: 350px; height: 257px; float:right; padding:0 0 2em 2em;" /&gt; I recently met with Richard Soley, CEO of OMG, one of the world’s leading standards setting groups. Richard mentioned, “This is the year for semantic technology.” I was pleased to hear that but asked, “Why?”&lt;/p&gt;
&lt;p&gt;
	Richard felt that companies in financial services, healthcare, and government are all looking at semantic technology to help them rapidly search, aggregate and manage data from varied sources. He felt that compliance is a key area where semantic technology is being used. One data point: banks are increasingly creating the &lt;strong&gt;Chief Data Officer&lt;/strong&gt; role, often reporting directly to the CEO. A key job requirement for these CDOs is to answer questions such as Chairman Bernanke’s Greece exposure question. All these officers (or at least the ones we have talked with) feel that semantic technology is the technology to address global risk and event impact questions.&lt;/p&gt;
&lt;p style="margin-left: 40px; "&gt;
	Note: You can hear directly from these Chief Data Officers and other senior business executives from several major financial institutions at OMG’s March 13&lt;sup&gt;th&lt;/sup&gt; &lt;a href="http://www.omg.org/news/meetings/FS-CONF/index.htm"&gt;Financial Services Semantics Conference&lt;/a&gt;. &amp;nbsp;At the conference, business leaders will discuss how semantic technology can help with governance, risk, compliance management and other issues. (See Lee Feigenbaum’s &lt;a href="http://www.cambridgesemantics.com/blog/-/blogs/semantics-in-financial-services-on-march-13-in-nyc"&gt;post from March 2&lt;sup&gt;nd&lt;/sup&gt;&lt;/a&gt; for more details, including a discount registration code).&lt;/p&gt;
&lt;p&gt;
	What is the issue? When regulatory agencies issue new or updated rules, they set in motion a complex chain of activities and custom development projects for affected organizations. These are almost entirely manual activities, and so they are costly, time-consuming, and error-prone. While there is no shortage of traditional off-the-shelf compliance software packages, none of them allow for easy integration and management of changing environments.&amp;nbsp; Currently available compliance software tends to be inflexible: it is not meant for responding to unanticipated regulator requests (at least that is what one of the major money center banks told us). The result is excessive regulatory management expense, incomplete data, inefficient processes and periodic fire-fighting exercises.&lt;/p&gt;
&lt;h2&gt;
	What’s the Solution?&lt;/h2&gt;
&lt;p&gt;
	As we’ve seen, just-in-time compliance requires a great deal of flexibility.&amp;nbsp; We’ve found that Semantic Web technologies provide the necessary flexibility for this kind of problem.&lt;/p&gt;
&lt;p&gt;
	Semantics allows end users to create their own data model to access, aggregate and analyze data from different databases and text sources. If the user needs to collaborate with someone else, then they can easily map their data models to a new jointly agreed data model and all people can share and use the data.&lt;/p&gt;
&lt;p&gt;
	In the compliance realm, this means that different business units can have different models for what constitutes a policy, a detective control, or a preventative control, yet &lt;em&gt;these different models can still all map back up to corporate compliance’s notion of a control&lt;/em&gt; (&amp;amp; so of the compliance results that come from that control). Similarly, the flexibility of the semantic model means that compliance teams can start to link together information on rules, policies, controls, tests, compliance, geographies, business units and more to facilitate exploratory analyses to more quickly detect potential compliance issues.&lt;/p&gt;
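&lt;p&gt;
	A minimal sketch of that "map back up" idea, using plain Python rather than any particular semantic toolkit: each business unit keeps its own control types, and a small subclass table relates them to the corporate notion of a control. All identifiers below (retail:, trading:, corp:, the record ids) are hypothetical.&lt;/p&gt;
&lt;pre&gt;
```python
# Hypothetical subclass links from unit-level models to the corporate model.
SUBCLASS_OF = {
    "retail:DetectiveControl": "corp:Control",
    "trading:PreventativeControl": "corp:Control",
    "trading:PreTradeLimitCheck": "trading:PreventativeControl",
}


def is_a(term, target, subclass_of):
    """Walk up the subclass links from term and report whether target is reached."""
    seen = set()
    while term is not None and term not in seen:
        if term == target:
            return True
        seen.add(term)  # guards against accidental cycles in the mapping
        term = subclass_of.get(term)
    return False


# Made-up records as (id, type) pairs, each typed with its unit's own model.
RECORDS = [
    ("ctl-1", "retail:DetectiveControl"),
    ("ctl-2", "trading:PreTradeLimitCheck"),
    ("pol-9", "retail:Policy"),
]

corporate_controls = [
    record_id
    for record_id, kind in RECORDS
    if is_a(kind, "corp:Control", SUBCLASS_OF)
]
print(corporate_controls)
```
&lt;/pre&gt;
&lt;p&gt;
	Corporate compliance can ask for "all controls" without either business unit changing how it describes its own data.&lt;/p&gt;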
&lt;p&gt;
	Net Result: Compliance management in the world of ever-changing regulation does not look so scary.&lt;/p&gt;
&lt;h2&gt;
	What Next?&lt;/h2&gt;
&lt;p&gt;
	To get a great, deep dive into how semantic technology is being used in financial services, check out&amp;nbsp;OMG’s March 13&lt;sup&gt;th&lt;/sup&gt;&amp;nbsp;&lt;a href="http://www.omg.org/news/meetings/FS-CONF/index.htm"&gt;Financial Services Semantics Conference&lt;/a&gt;. (Make sure to use discount code FSSC for a registration discount.) &amp;nbsp;I'll be around to continue this conversation in person.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/EnterpriseSemantics/~4/Gpa2HUztiak" height="1" width="1"/&gt;</summary>
    <dc:creator>Alok Prasad</dc:creator>
    <dc:date>2012-03-06T20:47:03Z</dc:date>
  <feedburner:origLink>http://www.cambridgesemantics.com/blog/-/blogs/the-new-era-of-just-in-time-compliance</feedburner:origLink></entry>
  <entry>
    <title>Semantics in Financial Services on March 13 in NYC</title>
    <link rel="alternate" href="http://feedproxy.google.com/~r/EnterpriseSemantics/~3/a6jVce1RGhw/semantics-in-financial-services-on-march-13-in-nyc" />
    <author>
      <name>Lee Feigenbaum</name>
    </author>
    <id>http://www.cambridgesemantics.com/blog/-/blogs/semantics-in-financial-services-on-march-13-in-nyc</id>
    <updated>2012-03-02T15:57:54Z</updated>
    <published>2012-03-02T15:33:47Z</published>
    <summary type="html">&lt;p&gt;
	&lt;a href="http://www.omg.org/"&gt;OMG &lt;/a&gt;is hosting &lt;a href="http://www.omg.org/news/meetings/FS-CONF/index.htm"&gt;a one-day conference on semantics in financial services&lt;/a&gt; on March 13. We'll be participating to share some of our thoughts and experiences about why semantic technologies (in general) and Semantic Web technologies (specifically) are well-suited to address many of the challenges facing the financial services industry.&lt;/p&gt;
&lt;p&gt;
	We've seen a marked uptick in interest in using semantics to tackle data challenges in financial companies in the past 12 months, and this conference is another data point in the area. Speakers at the event are from Citi, Bank of America, Wells Fargo, and HSBC, along with the US Treasury and the US Department of Defense. The topics we'll be discussing at the conference are fundamental to the industry and include:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;
		The increasing role of Chief Data Officers and their need for semantics&lt;/li&gt;
	&lt;li&gt;
		The interplay between semantics and Big Data (a la the thoughts expressed &lt;a href="http://www.cambridgesemantics.com/blog/-/blogs/big-data-or-right-data-"&gt;here&lt;/a&gt; and &lt;a href="http://semanticweb.com/two-kinds-of-big-dat_b21925"&gt;here&lt;/a&gt;)&lt;/li&gt;
	&lt;li&gt;
		The use of semantic models to enable business interoperability between multiple financial services organizations&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
	The conference will also focus on use cases and case studies for semantics in financial services. We'll be talking about using semantics for compliance information management and competitive/customer intelligence (more on those topics in future posts), and there will also be talks about trade lifecycle management, vocabulary management, and &lt;a href="http://xbrl.us/Pages/default.aspx"&gt;business reporting&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
	If the topic interests you and you'll be in the NYC area in a couple of weeks, consider &lt;a href="http://www.omg.org/news/meetings/FS-CONF/registration.htm"&gt;registering&lt;/a&gt;. You can use promotion code &lt;span class="abstract"&gt;FSSC for a 30% discount when registering.&lt;/span&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/EnterpriseSemantics/~4/a6jVce1RGhw" height="1" width="1"/&gt;</summary>
    <dc:creator>Lee Feigenbaum</dc:creator>
    <dc:date>2012-03-02T15:33:47Z</dc:date>
  <feedburner:origLink>http://www.cambridgesemantics.com/blog/-/blogs/semantics-in-financial-services-on-march-13-in-nyc</feedburner:origLink></entry>
  <entry>
    <title>WhySQL?  Evernote and Boring Old Reliable Architecture</title>
    <link rel="alternate" href="http://feedproxy.google.com/~r/EnterpriseSemantics/~3/h2DO8nn-h7g/whysql-evernote-and-boring-old-reliable-architecture" />
    <author>
      <name>Rob Gonzalez</name>
    </author>
    <id>http://www.cambridgesemantics.com/blog/-/blogs/whysql-evernote-and-boring-old-reliable-architecture</id>
    <updated>2012-07-26T23:26:49Z</updated>
    <published>2012-02-27T15:27:51Z</published>
    <summary type="html">&lt;p&gt;
	&lt;img alt="" src="http://www.cambridgesemantics.com/documents/16985/a42dbdee-084b-4913-b717-d0b9bc975493" style="width: 400px; height: 300px; float: right; padding:0 0 2em 2em;" /&gt;In my &lt;a href="http://www.cambridgesemantics.com/blog/-/blogs/what-happened-to-nosql-for-the-enterprise-"&gt;last blog post&lt;/a&gt; I argued that Semantic Web databases have the flexibility inherent in NoSQL systems plus the transactional semantics of a relational database systems, and I argued this was a major reason for their growing adoption by enterprises.&lt;/p&gt;
&lt;p&gt;
	Hours later, Evernote’s blog had &lt;a href="http://blog.evernote.com/tech/2012/02/23/whysql/"&gt;a post called “WhySQL?”&lt;/a&gt; outlining why they &lt;em&gt;didn’t&lt;/em&gt; go with a NoSQL system.&lt;/p&gt;
&lt;p&gt;
	For those of you who don’t know, Evernote is a hot web startup that’s been growing like crazy with 8-figure annual revenue in 2011.&amp;nbsp; However, unlike (seemingly) &lt;em&gt;every other web startup in the world&lt;/em&gt; they are using a boring old SQL relational database (in their particular case it was MySQL&amp;nbsp;&lt;strike&gt;PostgreSQL&lt;/strike&gt;).&amp;nbsp; Not only that, but they’re not even using cloud hosting!&amp;nbsp; What is this, 1999, you ask?&lt;/p&gt;
&lt;p&gt;
	I thought it was a fascinating case of a high traffic web company going with traditional technology for exactly the reason I called out in my last piece: relational databases are &lt;em&gt;reliable&lt;/em&gt; and &lt;em&gt;predictable&lt;/em&gt;.&amp;nbsp; You know what you get with their transactional guarantees.&amp;nbsp; In the article, Dave says (emphasis mine):&lt;/p&gt;
&lt;p style="margin-left:.5in;"&gt;
	Each of these coarse-grained API calls is implemented through single SQL transaction, which &lt;strong&gt;ensures that a client can completely trust any reply given by the server&lt;/strong&gt;. The ACID-compliant database ensures…[&lt;strong&gt;Atomicity&lt;/strong&gt;, &lt;strong&gt;Consistency&lt;/strong&gt;, &lt;strong&gt;Durability&lt;/strong&gt;]…&lt;/p&gt;
&lt;p&gt;
	It gets even more interesting.&amp;nbsp; Evernote holds tons of data, much of it multimedia.&amp;nbsp; The data is not very structured.&amp;nbsp; It serves what I would guess is a large amount of web traffic.&amp;nbsp; Wouldn’t this be a perfect place to employ a hot, whizbang NoSQL database that offers greater performance?&lt;/p&gt;
&lt;p&gt;
	In fact, Dave even outlines &lt;em&gt;why&lt;/em&gt; NoSQL databases have such a great appeal:&lt;/p&gt;
&lt;p style="margin-left:.5in;"&gt;
	The ACID benefits of a transactional database make it very hard to scale out a data set beyond the confines of a single server. Database clustering and multi-master replication are scary dark arts, and key-value data stores provide a much simpler approach to scale a single storage pool out across commodity boxes.&lt;/p&gt;
&lt;p&gt;
	Right before saying that they can &lt;em&gt;avoid it altogether&lt;/em&gt; by using a clever partitioning scheme (emphasis mine):&lt;/p&gt;
&lt;p style="margin-left:.5in;"&gt;
	Fortunately, this is a problem that Evernote doesn’t currently need to solve. Even though we have nearly a billion Notes and almost 2 billion Resource files within our servers, these aren’t actually a single big data set. &amp;nbsp;&lt;strong&gt;They’re cleanly partitioned into 20 million separate data sets, one per user&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;
	That is, Dave &amp;amp; Evernote would rather deal with managing &lt;em&gt;20 million separate data sets&lt;/em&gt; than go with what he admits is a “much simpler approach to scale” (the NoSQL way) because SQL systems have a stronger transactional guarantee.&lt;/p&gt;
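&lt;p&gt;
	To see why per-user partitioning keeps transactions simple, consider the sketch below. It is only an illustration under stated assumptions: the post does not say how Evernote assigns users to servers, so the modulo routing, the class name, and the table layout are all hypothetical, and in-memory SQLite stands in for the real database. The point is that every operation for one user touches exactly one database, so an ordinary single-server transaction is enough.&lt;/p&gt;
&lt;pre&gt;
```python
import sqlite3


class ShardedNotes:
    """Toy note store: each user's data lives entirely on one shard."""

    def __init__(self, shard_count):
        # In-memory SQLite databases stand in for separate servers.
        self.shards = [sqlite3.connect(":memory:") for _ in range(shard_count)]
        for db in self.shards:
            db.execute("CREATE TABLE notes (user_id INTEGER, title TEXT)")

    def shard_for(self, user_id):
        # Assumed routing rule; a real system might use a lookup table.
        return self.shards[user_id % len(self.shards)]

    def add_notes(self, user_id, titles):
        db = self.shard_for(user_id)
        # The connection context manager commits on success and rolls
        # back if an exception is raised, so the batch is all-or-nothing.
        with db:
            db.executemany(
                "INSERT INTO notes VALUES (?, ?)",
                [(user_id, title) for title in titles],
            )

    def titles_for(self, user_id):
        rows = self.shard_for(user_id).execute(
            "SELECT title FROM notes WHERE user_id = ? ORDER BY rowid",
            (user_id,),
        )
        return [row[0] for row in rows]


store = ShardedNotes(shard_count=4)
store.add_notes(7, ["Groceries", "Trip ideas"])
print(store.titles_for(7))
```
&lt;/pre&gt;
&lt;p&gt;
	No operation ever needs to coordinate across shards, which is exactly the "scary dark art" the partitioning avoids.&lt;/p&gt;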
&lt;p&gt;
	This is &lt;em&gt;exactly&lt;/em&gt; what I was getting at in the &lt;a href="http://www.cambridgesemantics.com/blog/-/blogs/what-happened-to-nosql-for-the-enterprise-"&gt;previous post&lt;/a&gt;.&amp;nbsp; The advantages of NoSQL systems to date are wonderful, but the lack of strong transactional guarantees makes them impossible for enterprises to use for storing mission critical information.&lt;/p&gt;
&lt;p&gt;
	&lt;em&gt;Full disclosure: I’m a huge fan of Evernote, and am an Evernote Premium subscriber.&lt;/em&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/EnterpriseSemantics/~4/h2DO8nn-h7g" height="1" width="1"/&gt;</summary>
    <dc:creator>Rob Gonzalez</dc:creator>
    <dc:date>2012-02-27T15:27:51Z</dc:date>
  <feedburner:origLink>http://www.cambridgesemantics.com/blog/-/blogs/whysql-evernote-and-boring-old-reliable-architecture</feedburner:origLink></entry>
  <entry>
    <title>What Happened to NoSQL for the Enterprise?</title>
    <link rel="alternate" href="http://feedproxy.google.com/~r/EnterpriseSemantics/~3/YCMnl0FJlmo/what-happened-to-nosql-for-the-enterprise-" />
    <author>
      <name>Rob Gonzalez</name>
    </author>
    <id>http://www.cambridgesemantics.com/blog/-/blogs/what-happened-to-nosql-for-the-enterprise-</id>
    <updated>2012-02-27T09:10:29Z</updated>
    <published>2012-02-24T15:28:51Z</published>
    <summary type="html">&lt;p&gt;
	&lt;img alt="" src="http://www.cambridgesemantics.com/documents/16985/29d588b6-47d6-43d9-8663-a3b0b3aeefc8" style="width: 400px; height: 208px; float: right;" /&gt;It’s no big secret that people have found better substitutes for the traditional relational (SQL) database for all kinds of use cases.&amp;nbsp; My absolute favorite public example of this—just based on number of technologies involved—is &lt;a href="http://highscalability.com/blog/2011/12/6/instagram-architecture-14-million-users-terabytes-of-photos.html"&gt;Instagram’s infrastructure&lt;/a&gt;, which uses PostgreSQL on the backend, but also employs Redis on the front-end.&lt;/p&gt;
&lt;p&gt;
	Anyone who’s trying to build a scalable website today relies heavily on various NoSQL databases, such as &lt;a href="http://www.mongodb.org/"&gt;MongoDB&lt;/a&gt;, &lt;a href="http://redis.com/"&gt;Redis&lt;/a&gt;, &lt;a href="http://basho.com/"&gt;Riak&lt;/a&gt;, &lt;a href="http://cassandra.apache.org/"&gt;Cassandra&lt;/a&gt;, and &lt;a href="http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html"&gt;Amazon’s Dynamo&lt;/a&gt;, to name just a few of the most popular ones.&lt;/p&gt;
&lt;p&gt;
	However, enterprise penetration has been limited.&amp;nbsp; I want to talk about that.&lt;/p&gt;
&lt;h2&gt;
	The Benefits of NoSQL: Scale &amp;amp; Performance&lt;/h2&gt;
&lt;p&gt;
	The reason that the usage of NoSQL databases has exploded on the web is that they execute some operations &lt;em&gt;blazingly fast&lt;/em&gt;, such as atomic document lookups without joins.&amp;nbsp; Furthermore, many NoSQL databases attack the Big Data problem head-on by coming with out-of-the-box support for distributing a database across a cluster of machines (which is very tricky to accomplish with traditional relational databases).&lt;/p&gt;
&lt;p&gt;
	For example, a website like Pinterest (yes, the obligatory Pinterest mention since it seems illegal not to mention it these days), serving 10,000,000 visits a day, with a catalog of data growing exponentially, simply cannot be successful on a traditional, relational back-end.&amp;nbsp; They need layers of caching and persistence to ensure a reliably interactive user experience.&lt;/p&gt;
&lt;p&gt;
	This kind of scale is very different from what you experience in the enterprise, where you have fewer users with different speed expectations.&lt;/p&gt;
&lt;h2&gt;
	The Benefits of NoSQL: Flexibility&lt;/h2&gt;
&lt;p&gt;
	Aside from performance &amp;amp; scalability, the other major advantage of NoSQL systems is data flexibility.&lt;/p&gt;
&lt;p&gt;
	SQL systems require that you create a schema before doing anything else.&amp;nbsp; Want to build an application?&amp;nbsp; First, build your model.&amp;nbsp; Then start coding.&amp;nbsp; Need to change your model?&amp;nbsp; Good luck, since you have to change every single thing that ever might have depended on your first model.&lt;/p&gt;
&lt;p&gt;
	NoSQL systems turn this on its head.&amp;nbsp; When working with MongoDB, for example, you can start coding your app, storing things in the database as you learn you need to.&lt;/p&gt;
&lt;p&gt;
	Lots of changes are &lt;em&gt;so much easier&lt;/em&gt;.&amp;nbsp; If you need a property to be multi-assigned, just do it!&amp;nbsp; You don’t have to worry about creating entire link tables and adding joins and redoing your business logic all over the place just to make this change work.&lt;/p&gt;
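&lt;p&gt;
	As a rough illustration of that last point, here is a minimal Python sketch that uses plain dictionaries as a stand-in for schemaless documents.&amp;nbsp; It is not MongoDB’s actual API; the sample records, the field names (&lt;code&gt;name&lt;/code&gt;, &lt;code&gt;tags&lt;/code&gt;) and the helpers &lt;code&gt;values_of&lt;/code&gt; and &lt;code&gt;find_by_tag&lt;/code&gt; are all hypothetical and chosen only for this example.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Plain dicts stand in for schemaless documents (not a real database API).
pins = [
    {"name": "Lemon tart", "tags": "dessert"},          # original shape: one tag
    {"name": "Pho", "tags": ["soup", "vietnamese"]},    # later shape: many tags
]


def values_of(document, field):
    """Return a field's values as a list, whether it holds one value or many."""
    value = document.get(field, [])
    return value if isinstance(value, list) else [value]


def find_by_tag(documents, tag):
    """Names of the documents carrying the given tag."""
    return [d["name"] for d in documents if tag in values_of(d, "tags")]


print(find_by_tag(pins, "soup"))
print(find_by_tag(pins, "dessert"))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;
	In this sketch the single-valued and multi-valued records sit side by side, and the only code that has to know about the difference is one small helper.&amp;nbsp; In a relational design, the same change would typically mean adding a link table and a join.&lt;/p&gt;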
&lt;h2&gt;
	No Enterprise for NoSQL&lt;/h2&gt;
&lt;p&gt;
	Despite these advantages, enterprise penetration of NoSQL databases has been pretty limited to date.&amp;nbsp; Some technical reasons include:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;
		Poor support for ACID transactions.&lt;/li&gt;
	&lt;li&gt;
		Loose guarantees of data consistency across a grid.&lt;/li&gt;
	&lt;li&gt;
		Limited support for aggregations.&lt;/li&gt;
	&lt;li&gt;
		Limited support for joins.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
	Basically, many of the guarantees needed to maintain the consistency of mission-critical data are not provided by NoSQL databases.&lt;/p&gt;
&lt;p&gt;
	Said another way, if you’re in IT and you’re trying to build a system in support of a mission-critical application, you have been trained to rely on these types of guarantees.&amp;nbsp; It’s mentally unsettling to think about different, softer guarantees such as the “eventual consistency” of some NoSQL databases.&lt;/p&gt;
&lt;p&gt;
	So you have to think about when you &lt;em&gt;don’t&lt;/em&gt; need the rock-solid ACID transactions of the relational world.&amp;nbsp; And without the pressure of web-scale user numbers, there is much less motivation to actually go through this exercise.&lt;/p&gt;
&lt;p&gt;
	This leaves enterprises going with newer SQL technologies like Vertica or Attivio that scale very well and are less confusing than NoSQL systems.&amp;nbsp; Or, if they’re really adventurous, using Hadoop for a specific Big Data problem.&lt;/p&gt;
&lt;h2&gt;
	Semantic Web Databases: Flexible NoSQL for the Enterprise&lt;/h2&gt;
&lt;p&gt;
	One kind of NoSQL system that has been seeing penetration in the enterprise is Semantic Web databases.&amp;nbsp; They don’t offer the same kind of performance and scale that the web-based NoSQL variety does, but instead provide much more flexibility than traditional relational systems while maintaining security &amp;amp; transactional integrity.&lt;/p&gt;
&lt;p&gt;
	Getting back to our IT guy analogy.&amp;nbsp; If you’re deciding between a relational database and a Semantic Web database, it no longer has to be about rock solid data integrity and transactional guarantees, because both systems provide them.&amp;nbsp; It becomes about flexibility and tooling compatibility, which are easier things to wrap your head around.&lt;/p&gt;
&lt;p&gt;
	For example, if you’re dealing with lots of unstructured data, then use a Semantic Web database.&amp;nbsp; If you’re dealing with a schema that you expect to change over time (to incorporate new information types or sources), then use a Semantic Web database.&lt;/p&gt;
&lt;p&gt;
	Thus you can start having a reasonable conversation when you limit the number of differences between the two styles of system, instead of being overwhelmed by a class of systems that is somewhat alien.&amp;nbsp; It’s easier to compare Semantic Web databases to relational databases than it is to compare something like Riak to a relational database.&amp;nbsp; I believe this is one reason why Semantic Web databases have made progress in the enterprise where other NoSQL technologies have not.&lt;/p&gt;
&lt;h2&gt;
	Polyglot Persistence&lt;/h2&gt;
&lt;p&gt;
	So what it comes down to is that for decades we’ve had one standard way to store and query important data, and today there are new choices.&amp;nbsp; As with any choice, there are tradeoffs, and for some applications NoSQL databases, including Semantic Web databases, can enable organizations to get more done in less time and with less hardware than relational databases.&amp;nbsp; The trick is to know when and how to deploy these new tools.&lt;/p&gt;
&lt;p&gt;
	Martin Fowler called this &lt;a href="http://martinfowler.com/bliki/PolyglotPersistence.html"&gt;Polyglot Persistence&lt;/a&gt;&amp;nbsp;(also the source for this post's image), which I think describes the future fantastically. Our job is made both easier and more difficult by the new world of database technology choices available to us.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/EnterpriseSemantics/~4/YCMnl0FJlmo" height="1" width="1"/&gt;</summary>
    <dc:creator>Rob Gonzalez</dc:creator>
    <dc:date>2012-02-24T15:28:51Z</dc:date>
  <feedburner:origLink>http://www.cambridgesemantics.com/blog/-/blogs/what-happened-to-nosql-for-the-enterprise-</feedburner:origLink></entry>
  <entry>
    <title>Best of Both Worlds: Enterprise Data Management &amp; Familiar Tools</title>
    <link rel="alternate" href="http://feedproxy.google.com/~r/EnterpriseSemantics/~3/n1NAvRP_Q7M/best-of-both-worlds-enterprise-data-management-familiar-tools" />
    <author>
      <name>Lee Feigenbaum</name>
    </author>
    <id>http://www.cambridgesemantics.com/blog/-/blogs/best-of-both-worlds-enterprise-data-management-familiar-tools</id>
    <updated>2012-02-20T15:16:25Z</updated>
    <published>2012-02-17T09:42:40Z</published>
    <summary type="html">&lt;p&gt;
	&lt;img alt="" src="http://www.cambridgesemantics.com/documents/16985/3255960a-e859-4f2d-9290-55190ac80378" style="width: 200px; height: 245px; float: right;" /&gt;In a recent post at &lt;a href="http://www.dataversity.net/"&gt;Dataversity&lt;/a&gt;, &lt;a href="http://www.dataversity.net/contributors/javed-zaidi"&gt;Jay Zaidi&lt;/a&gt; asked &lt;a href="http://www.dataversity.net/archives/8063"&gt;&lt;em&gt;Should Users Switch from Office Productivity Tools to a Commercial Data Quality Tool? &lt;/em&gt;&lt;/a&gt;. Jay writes in his intro:&lt;/p&gt;
&lt;blockquote&gt;
	Due to the proliferation of software tools within companies, end users tend to cringe when there is talk of a new tool that they should use for data analysis, data validation and data transformation. They are very comfortable using office productivity tools or other data processing tools that are at their disposal, so the initial reaction to change is negative.&amp;nbsp; Such tools are relatively easy to use and are familiar to them, since most users have had years of hands-on experience with them.
&lt;/blockquote&gt;
&lt;p&gt;
	To paraphrase, why should I learn a new, unfamiliar tool when I’ve gotten by OK with Excel all these years? After all, as my colleague Rob is fond of saying, solving a problem with Excel requires only two steps:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;
		Open Excel&lt;/li&gt;
	&lt;li&gt;
		Type stuff in&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;
	And &lt;em&gt;the stuff&lt;/em&gt; can be anything you want it to be. It doesn’t need to fit into an existing data model or conform to a schema or stay the same tomorrow as it was today. It can just be whatever you need to do your job. That’s why Excel ends up holding all sorts of business-critical data: it gets the job done without requiring any thinking about what tool, what web page, or what database you need to use.&lt;/p&gt;
&lt;p&gt;
	Jay then does a good job of outlining ten reasons to use a dedicated tool for data analysis, quality, and validation. Broadly speaking, Jay promotes:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;
		The ability to work with data from &lt;em&gt;heterogeneous formats&lt;/em&gt;, whether it’s tabular data, XML data, or relational data.&lt;/li&gt;
	&lt;li&gt;
		&lt;em&gt;Out-of-the-box capabilities&lt;/em&gt; of dedicated tools for analytics, transformation, reporting, and data reconciliation. These capabilities come “for free” with nearly any information management or BI tool. Excel offers them only to a limited extent for a single spreadsheet, and you have to work pretty hard to get them in Excel once your data starts coming from multiple sources.&lt;/li&gt;
	&lt;li&gt;
		&lt;em&gt;The benefits of an underlying repository&lt;/em&gt;, including the ability to perform historical analysis and to store data quality &amp;amp; data validation rules outside of code, as well as performance, scalability, governance, and automation.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
	These benefits are all very real and very valuable. But on their own they don’t tell the whole story; on their own, they leave us in an unappealing position of evaluating trade-offs. Do we continue to use Excel as we see fit while muddling through our data integration and validation needs and forgoing historical analysis, data governance, and more sophisticated analysis? Do we switch to a dedicated tool to realize all the benefits that Jay speaks of but forgo the flexibility of using Excel and other office productivity tools to simply and quickly collect and update data by just typing stuff in? Do we try to bridge the two worlds and copy data between Excel and a dedicated data management tool, or constrain ourselves to only certain spreadsheet layouts, or treat spreadsheets as simple data extracts rather than as dynamic, interactive, living documents?&lt;/p&gt;
&lt;p&gt;
	In the face of these tradeoffs, many people &lt;em&gt;still &lt;/em&gt;choose to use Excel. This gives rise to &lt;a href="http://en.wikipedia.org/wiki/Shadow_system"&gt;&lt;em&gt;shadow data&lt;/em&gt;&lt;/a&gt; environments, in which key business information exists outside of any governed system. Shadow data is extremely useful for its owner, but it can’t easily be shared with colleagues or reused for other purposes, and it’s also a huge risk to the business as it’s not backed up, secured, validated, or harmonized with other sources of data.&lt;/p&gt;
&lt;p&gt;
	At Cambridge Semantics, we think you can have your cake and eat it too. This shadow data problem was one of the original motivating factors when we built Anzo. It’s the reason that we leveraged the flexibility of Semantic Web technologies to build the &lt;a href="http://www.cambridgesemantics.com/products/anzo-express"&gt;Anzo for Microsoft Excel&lt;/a&gt; plug-in. This plug-in lets you continue to use Excel and all of its flexibility to collect and share data, while still leveraging the rest of the Anzo software suite to derive all of the data management, integration and analysis benefits that Jay points out.&lt;/p&gt;
&lt;p&gt;
	Without using Semantic Web technologies as the foundation of Anzo, I suspect we would have had a great deal of trouble flexibly accommodating people’s normal spreadsheet usage and behavior (which goes way beyond simple, static, tabular CSV-style spreadsheets). We wouldn’t have been able to handle changing and evolving data models while minimizing any data preparation needed to integrate additional kinds of data. In short, Semantic Web technologies are the secret sauce that has allowed us to build an information management suite of tools that brings users the benefits of dedicated data analysis tools without losing the ability to use familiar &amp;amp; efficient office productivity tools. (Of course, this secret sauce brings other capabilities to the picture, like the ability to operationalize insights and to handle unstructured text documents; but those are topics for another day…)&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/EnterpriseSemantics/~4/n1NAvRP_Q7M" height="1" width="1"/&gt;</summary>
    <dc:creator>Lee Feigenbaum</dc:creator>
    <dc:date>2012-02-17T09:42:40Z</dc:date>
  <feedburner:origLink>http://www.cambridgesemantics.com/blog/-/blogs/best-of-both-worlds-enterprise-data-management-familiar-tools</feedburner:origLink></entry>
  <entry>
    <title>It All Starts with the Data Model</title>
    <link rel="alternate" href="http://feedproxy.google.com/~r/EnterpriseSemantics/~3/t0b2DLKtD9Y/it-all-starts-with-the-data-model" />
    <author>
      <name>Jeff Stamen</name>
    </author>
    <id>http://www.cambridgesemantics.com/blog/-/blogs/it-all-starts-with-the-data-model</id>
    <updated>2012-07-26T23:28:05Z</updated>
    <published>2012-02-06T09:12:22Z</published>
    <summary type="html">&lt;p&gt;
	A data model is a way of representing data along with the logical operations that can be performed on that data. It is the foundation on which everything else is built in enterprise applications because at their core they store, access and manipulate data. The data model you choose can make things easy or very difficult. It can add complexity or not. It can require lots of expert setup and maintenance or very little.&lt;/p&gt;
&lt;p&gt;
	Let’s talk about two extremely common data models—that of spreadsheets and that of relational databases. I will later introduce a newer data model—that of RDF, which combines the best of both.&lt;/p&gt;
&lt;h2&gt;
	Spreadsheet and Relational Data Models&lt;/h2&gt;
&lt;p&gt;
	The spreadsheet has a very simple data model. A single spreadsheet is broken into discrete cells whose values are referenced by a Row and Column index. That’s it! Yes, most of the time the Column has a heading telling you what kind of data it contains, but that is just a suggestion and is not enforced. What can you put into each cell? Anything! In the worst case you end up with a pesky error, but you can always go back and fix it (that is, if you are aware of it). Who can enter data into each cell? Anyone! Sure, there are some security measures that you can use, but they are easily circumvented.&lt;/p&gt;
&lt;p&gt;
	So the spreadsheet data model is completely flexible (the good news), but not very manageable, because the association of what each cell represents and how it’s to be used is left entirely to the user…and is mostly in the user's head (the bad news). Said another way, you can't be sure what the data represents or whether it is accurate.&lt;/p&gt;
&lt;p&gt;
	The relational data model, in contrast, is very manageable. In fact, that’s its most highly regarded trait! A relational database is composed of a series of pre-defined grids which have very strict rules regarding what is allowed in each cell, under what circumstances it might be modified, and how it might be combined with data residing elsewhere in the database. The key feature of the set of grids in the relational database is that there is a formal logic (relational algebra) that can operate on the columns and rows of the grids. This means that the data can be managed with certainty about what is being dealt with at all times. This is the key to data integrity.&lt;/p&gt;
&lt;p&gt;
	The drawback of this enforced structure is that it is very difficult to make even simple structural changes to a relational database. Any change requires experienced database administrators and, worse, might require changes to applications that rely on the database because a lot of the meaning of the data (what the rows represent, for example) is built into each of the applications. Said another way: although the relational database strictly enforces what is stored and how it can be operated on, it has very limited knowledge of what it means or how it is used. That is left to the applications, and gets very complicated as multiple applications share the same data.&lt;/p&gt;
&lt;h2&gt;
	A New Data Model in Town&lt;/h2&gt;
&lt;p&gt;
	There is a new data model called RDF, the data model of the Semantic Web, which combines the best of both worlds: the flexibility of a spreadsheet and the manageability and data integrity of a relational database. Based on standards set by the World Wide Web Consortium (W3C) to enable data combination on the Web, RDF defines each data cell by the entity it applies to (row) and the attribute it represents (column). Each cell is self-describing and not locked into a grid; in other words, the data doesn't have to be "regular". Further, it has formal operations that can be performed on it, much like relational algebra, but at a more atomic level. Unlike a cell in a relational database, each RDF cell knows what entity it is describing at all times and through all data operations. It also knows about the relationships each entity has to other entities, which in the relational data model have to be reconstructed by join statements at run time.&lt;/p&gt;
&lt;p&gt;
	The RDF data model is also called the Semantic Data Model since the word semantic means "meaning" and the meaning of each cell is attached to it. Each cell can be associated without any application logic to any other data cell representing the same entity, or more dramatically, any data cell or cells representing entities to which that entity is related. For example, it would know automatically that information about a company applies to each of its employees and vice versa. Applications can take advantage of this, without building in the association logic themselves; rather RDF will navigate through the data for them. And since it doesn't rely on data being "regular", new types of data of any shape or form can be added on the fly and quite easily.&lt;/p&gt;
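&lt;p&gt;
	To make the idea of self-describing cells a little more concrete, here is a minimal Python sketch that stores each cell as an (entity, attribute, value) triple and follows a relationship from an employee to a company. It is a toy in-memory illustration, not a real RDF store: actual RDF identifies entities and attributes with IRIs and is usually queried with a language such as SPARQL. All of the names below (&lt;code&gt;alice&lt;/code&gt;, &lt;code&gt;acme&lt;/code&gt;, &lt;code&gt;worksFor&lt;/code&gt; and the helper functions) are made up for this example.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Each fact is an (entity, attribute, value) triple: one self-describing "cell".
# All identifiers are hypothetical and exist only for this illustration.
triples = {
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "acme"),
    ("acme", "locatedIn", "Boston"),
    ("acme", "industry", "software"),
    ("alice", "speaks", "French"),  # "irregular" data: only alice has this attribute
}


def objects(subject, predicate):
    """All values v such that (subject, predicate, v) is in the data."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}


def describe(subject):
    """All (attribute, value) pairs recorded for an entity."""
    return {(p, o) for (s, p, o) in triples if s == subject}


def employer_facts(person):
    """Follow worksFor from a person to the company and collect the company's facts."""
    facts = set()
    for company in objects(person, "worksFor"):
        facts |= describe(company)
    return facts


print(sorted(employer_facts("alice")))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;
	Adding a new kind of information in this sketch is just one more triple, with no table to alter. The link from an employee to the company's facts is found by walking the data itself rather than by logic written for one particular table layout.&lt;/p&gt;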
&lt;p&gt;
	So how important is the data model that is used? How important is the foundation for a building?&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/EnterpriseSemantics/~4/t0b2DLKtD9Y" height="1" width="1"/&gt;</summary>
    <dc:creator>Jeff Stamen</dc:creator>
    <dc:date>2012-02-06T09:12:22Z</dc:date>
  <feedburner:origLink>http://www.cambridgesemantics.com/blog/-/blogs/it-all-starts-with-the-data-model</feedburner:origLink></entry>
</feed>
