Semantic Focus, Semantic Web Blog and Community

On the Semantic Web, Semantic Web technology and computational semantics
http://www.semanticfocus.com/

What is the Semantic Web?
David Siegel, Tue, 14 Dec 10 13:27:30 +0000

There's a lot of confusion about what the semantic web is, exactly. There are so many definitions that I can't possibly unify everything in one article. Some people say it's all about linked data, RDF, and ontologies. Some people call it "Web 3.0." (I recently gave a keynote speech at a "Web 3.0" conference, where many of the people had confused "Web 3.0" with "Web in 3D." I'm sure many people in the audience wondered why I wasn't talking about the future 3D web and, instead, was talking about information.) Some people say it will lead us to the singularity. Rather than try to define these terms, I propose we abandon them. I propose we stop talking about complicated solutions and start talking about problems.
<ul>
<li>One of the most vexing aspects of today's web is that we have to do so much of the work ourselves. We type in keywords, then we see a list of web sites that have those keywords on them. In this way, it can take hours to get answers to our questions.</li>
<li>When searching for products and services, we can't make apples-to-apples comparisons. Think about it: how often can you compare products side by side? Can you do that with pocket cameras, large-screen TVs, mobile phones, tampons, mortgages, or all-in-one vacations? In general, you can't. Go to <a href="http://www.google.com/squared" title="Google Squared">Google Squared</a> or <a href="http://code.google.com/p/google-refine/" title="Google Refine">Google Refine</a>, and you can see how hard it is, because the data isn't easy to compare.
So any comparison engine either has to do a lot of work by hand or do a lot of guessing, and neither approach scales to meet today's challenges.</li>
<li>Another problem is that it's difficult to find things, especially something specific that isn't widely available. Have you ever searched for hours, thinking up new keyword combinations, trying to find a part or a piece of furniture or an item of clothing that you know is out there, somewhere, but you can't seem to dial it in? Studies show that about 1/3 of searches result in no clicks and another 1/3 result in clicks that are soon abandoned. That means roughly 2/3 of all searches fail to produce a satisfactory result.</li>
<li>Another problem is that we have grown used to false positives and false negatives. We sort and sift data by hand, because many different things share the same name, and there can be several different names for exactly the same thing.</li>
<li>Today, we seem to be in a rush to re-create all our old-fashioned paperwork in the cloud, and that keeps us working with our information by hand. Google Docs aren't connected to the rest of the web, they have no idea what you mean to communicate, and there's no helpful scaffolding for building structured documents. We use the same tool to write a contract as we use to write a poem. That keeps us from reaping the benefits of an always-on network that spans the entire human enterprise.</li>
<li>Somehow, we've all managed to join a number of social networks, each of which forces us to create a new profile and re-establish all our connections inside it. How many times do we need to do that, and how maintainable is this?</li>
</ul>
I could go on, but I think you get the idea. The web as we've built it isn't scaling very well, and even Google (which, it must be remembered, is in the advertising business, same as Facebook) can't easily make sense of it all.
We're trying to make sense of something that is fundamentally broken and scaling badly. The semantic web is not the answer. Semantic technology will only help us when we know exactly what problem we're trying to solve and what it will take to solve the entire problem, not just the data problem. That's why I propose talking about the switch from pushing information to pulling it. Much of that switch involves making our information unambiguous, and that, I think, points to a better term than the semantic web. Let's call it the unambiguous web. Let's watch for the business shift from pushing information to pulling it and apply the appropriate technology to get us there. You can learn more from my writings on this subject:
<ol>
<li>Start here: <a href="http://whatisthesemanticweb.com/" title="What is the Semantic Web?">What is the Semantic Web?</a></li>
<li>Read my blog: <a href="http://www.thepowerofpull.com/" title="The Power of Pull">The Power of Pull</a></li>
<li>Buy the book: <a href="http://www.amazon.com/Pull-Power-Semantic-Transform-Business/dp/1591842778/" title="Pull: The Power of the Semantic Web to Transform Your Business">Pull: The Power of the Semantic Web to Transform Your Business</a></li>
<li>Follow me on Twitter: <a href="http://twitter.com/PullNews" title="@PullNews">@PullNews</a> and <a href="http://twitter.com/_dsiegel" title="@_dsiegel">@_dsiegel</a></li>
</ol>
It doesn't matter what we call it. And if we focus too much on linked data, RDF, and ontologies, we'll miss the big opportunity. I hope to change the way you think about the web. There's no time to lose. Dive in and join me.
http://www.semanticfocus.com/blog/entry/title/what-is-the-semantic-web/

RDF Semantic Web Research Isn't Working
Zack Rosen, Fri, 30 Jul 10 18:40:11 +0000

It has been twelve years since <a href="http://en.wikipedia.org/wiki/Tim_Berners-Lee">Tim Berners-Lee</a> threw up his hands and said "it's all crap, let's do it over" and <a href="http://www.w3.org/DesignIssues/Semantic.html">set off</a> to create the <a href="http://en.wikipedia.org/wiki/Semantic_Web">Semantic Web</a>. We've got <a href="http://en.wikipedia.org/wiki/RSS">very little</a> to show for it so far. I firmly believe the work Semantic Web technologists are pursuing is important and that the concepts will inevitably be realized, and I very much want to see this research become viable. But things are not moving fast enough, and the tack semantic <a href="http://dig.csail.mit.edu/">researchers</a> are taking simply isn't working. Semantic Web technology is mired in a chicken-and-egg problem: the technologies are generally not useful unless they are adopted and implemented on a large scale, and people are not willing to invest in implementing them unless they are useful. This is made worse by the very high technology, business, and social barriers to implementing the Semantic Web.
<ol>
<li>Technology Barriers: Even today, implementing RDF parsers is complex and difficult, and the best tools are hopelessly slow. These are the most basic and fundamental tools the Semantic Web needs to operate, and we still can't get them to work.</li>
<li>Business Barriers: If the Semantic Web is implemented, the current web industry will be intensely disrupted.
eBay, Google, Amazon: virtually all the mainstays of web business will have to significantly adjust their business and technology models. Because of this, web businesses are wary of investing in, adopting, and promoting the Semantic Web.</li>
<li>Social Barriers: The way we use the web will change greatly when the Semantic Web is implemented. Just look at the current state of usability in feed aggregation for a hint of what will be required for users to adopt the newly realized functionality.</li>
</ol>
These barriers are far from insurmountable, but the tack the current researchers are taking simply won't cut it.
<ol>
<li>Researchers are not finding adequate use cases for compelling functionality; instead, they are creating widgets. There are a great many organizations out there with real-world needs that would be well served by Semantic Web technology, but researchers are for the most part turning a blind eye and working in a vacuum.</li>
<li>Researchers are not picking their battles. Instead, they are building generic tools with little real-world applicability.</li>
<li>Researchers are not keeping up with the web and web-publishing software. It seems that in an effort to remain neutral towards the current web-publishing industry, Semantic Web researchers choose to build their own tools in isolation. This means that anyone wanting to reuse these tools in a real-world application has to re-implement them within their own web-publishing environment, which, due to the high technology barriers, simply isn't happening. This is a shame, because it would actually save the researchers time, effort, and money if they simply implemented their tools within web-publishing environments such as <a href="http://drupal.org/">Drupal</a>, and it would allow adopters to implement the tools at zero cost.</li>
<li>Researchers are not moving at the pace the web is currently developing; instead, they are attempting to leap-frog it.
The <a href="http://microformats.org/">Microformats</a> initiative is a good example of the web's own, incremental pace. Why are Semantic Web researchers not collaborating with the teams pursuing these projects?</li>
</ol>
So what can we do about it?
<ol>
<li>Researchers need to stop thinking of themselves as researchers and start thinking of themselves as implementers.</li>
<li>Research institutes need to join forces with emerging businesses looking to adopt semantic technology. This breaks the current model of business/research-institute collaboration, since startups do not have money to contribute to fund research, but tough noogies.</li>
<li>Researchers need to build their tools in real-world development environments, i.e., as modules for LAMP web-publishing tools such as <a href="http://drupal.org/">Drupal</a> and <a href="http://wordpress.org/">WordPress</a>. They need to find more organizational partners to deploy their solutions. They need to do something other than build <a href="http://simile.mit.edu/">widgets</a>.</li>
</ol>
This article is a re-post of an article originally published in 2006. Some of the points are now outdated.
http://www.semanticfocus.com/blog/entry/title/rdf-semantic-web-research-isnt-working/

Bueda API Turns Tags into RDF URIs
Vasco Pedro, Fri, 26 Feb 10 12:54:58 +0000

<a href="http://www.bueda.com/" title="Bueda" rel="Bueda"><img class="right" src="http://www.semanticfocus.com/media/insets/buedalogo.png" width="121" height="37" alt="Bueda" title="Bueda" /></a>A large percentage of the content that users deal with on a daily basis is created by other users. Every minute, more than 90,000 videos and images are uploaded to YouTube, Flickr, and other social media websites, yet this content earns a relatively small share of revenue compared with traditional media. We believe that one reason for this is publishers' inability to understand high-density content that lacks adequate description. With mobile platforms giving users easy ways to upload rich media, this problem will grow rapidly. Tags are an attempt to mitigate this problem. They give users an easy way to label content with terms that make sense to them. Their strength lies in their simplicity for the user and in the freedom to use anything as a tag, which enables an accurate description of content from the user's perspective. Yet this strength is also a weakness when it comes to publishers' ability to understand that content. A tag is, realistically speaking, any sequence of characters.
It could be a well-formed word, a <a href="http://www.bueda.com/demo/?query=canon">company name</a>, a <a href="http://www.bueda.com/demo/?query=abe%20lincoln">person's name</a>, an ISBN, a <a href="http://www.bueda.com/demo/?query=winter2010">concatenation of dates and words</a>, etc. Coverage and disambiguation make this a hard problem to solve. Bueda addresses it with an API that developers can use to get clean information from noisy tags. It provides a low-friction way to tap into the latest in semantic analysis for tags, on a scalable platform. <img class="center" src="http://www.semanticfocus.com/media/insets/bueda-example.png" width="570" height="357" alt="" title="" /> Bueda provides actionable information that enables targeted advertising, content recommendation, search engine optimization, and semantic search, among other things. Even though the biggest impact might be on high-density content, such as rich media and pictures, the platform is open to any application and use case. Bueda is a CMU spin-off and uses proprietary technology for Semantic Resource integration, which combines heterogeneous data sources to provide open-domain coverage in a distributed, scalable framework. Bueda is also an Alphalab alum and is currently funded by Innovation Works. Bueda is currently in private beta; however, Semantic Focus readers have access to some <a href="http://www.bueda.com/accounts/registration/register/vrxabax4IHkFuCciuGUAevnSdZQPKi0B8eKHtw/">exclusive API keys</a>.
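To make the coverage and disambiguation problems concrete, here is a minimal Python sketch of a deliberately naive approach. It splits a raw tag into letter runs and digit runs, then looks the pieces up in a small candidate table. This is not Bueda's API or technique, which the post describes as proprietary. The `split_tag` and `candidate_uris` helpers and the hard-coded `CANDIDATES` table are hypothetical, and the DBpedia-style URIs are illustrative only.

```python
import re

# Hypothetical lookup table (illustration only): normalized surface form ->
# candidate entity URIs. A real system would need vastly broader coverage.
CANDIDATES = {
    "canon": [
        "http://dbpedia.org/resource/Canon_Inc.",
        "http://dbpedia.org/resource/Canon_(music)",
    ],
    "abe lincoln": ["http://dbpedia.org/resource/Abraham_Lincoln"],
    "winter": ["http://dbpedia.org/resource/Winter"],
}


def split_tag(tag):
    """Lowercase a raw tag and split it into runs of letters and runs of digits."""
    return re.findall(r"[a-z]+|[0-9]+", tag.lower())


def candidate_uris(tag):
    """Map a tag to {surface form: [candidate URIs]} (hypothetical helper).

    The whole normalized phrase is tried first; otherwise each piece is looked
    up on its own. An empty list means no candidate (a coverage gap); more than
    one candidate means the tag is ambiguous and needs context to resolve.
    """
    pieces = split_tag(tag)
    phrase = " ".join(pieces)
    if phrase in CANDIDATES:
        return {phrase: CANDIDATES[phrase]}
    return {piece: CANDIDATES.get(piece, []) for piece in pieces}


if __name__ == "__main__":
    for raw in ["Abe Lincoln", "winter2010", "canon"]:
        print(raw, "->", candidate_uris(raw))
```

Even this toy version shows both failure modes. In "winter2010", the piece "2010" gets no candidate at all, which is a coverage gap. "canon" gets two candidates and cannot be resolved without context, which is the disambiguation problem.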
http://www.semanticfocus.com/blog/entry/title/bueda-api-turns-tags-into-rdf-uris/

Semantic Data Storage in Oracle
Aditya Thatte, Thu, 15 Jan 09 19:26:28 +0000

<a href="http://www.oracle.com/database/index.html" title="Oracle 11g" rel="external"><img class="right" src="http://www.semanticfocus.com/media/insets/oracle_db11g.png" width="230" height="62" alt="Oracle 11g" title="Oracle 11g" /></a><a href="http://www.oracle.com/database/index.html" title="Oracle 11g" rel="external">Oracle 10g Release 2 / Oracle 11g</a> offers a robust, scalable, secure platform for storing RDF and OWL data. It allows efficient storage, loading, and querying of semantic data. Queries are enhanced by adding relationships (ontologies) to the data and are evaluated on the basis of semantics. Data is stored in the form of RDF triples (subject, predicate, object) and can scale up to millions of triples. The triples in the semantic data store are modeled as a graph. All the data is stored in a single central schema, through which users load and query data. Subjects and objects are modeled as nodes, and predicates as the links between them. Nodes are stored once and efficiently reused when required: each triple, consisting of a subject (start node), a predicate (relationship), and an object (end node), forms one link. Inserting a new triple creates a new link, while existing nodes are reused if identical nodes already exist. Two object types are defined to manage semantic data: SDO_RDF_TRIPLE, which represents a triple in text form, and SDO_RDF_TRIPLE_S (the _S stands for storage), which holds references to the data, since the actual values are stored only in the central schema.
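The node-reuse behavior described above can be illustrated with a small in-memory Python model. This is a sketch of the idea only, not Oracle's implementation or API. The `ToyTripleStore` class is hypothetical. It interns every value, including predicates, into an integer ID, loosely standing in for the central node and value tables, and records each triple as one link of (start node ID, predicate ID, end node ID).

```python
from dataclasses import dataclass, field


@dataclass
class ToyTripleStore:
    """Hypothetical in-memory model: values become reusable nodes, triples become links."""

    node_ids: dict = field(default_factory=dict)  # value text -> integer ID
    links: set = field(default_factory=set)  # {(start_id, predicate_id, end_id)}

    def _node(self, value):
        # Reuse the existing ID if this exact value was seen before;
        # otherwise allocate the next integer ID.
        if value not in self.node_ids:
            self.node_ids[value] = len(self.node_ids) + 1
        return self.node_ids[value]

    def insert(self, subject, predicate, obj):
        """Store one triple as a link between two (possibly reused) nodes."""
        link = (self._node(subject), self._node(predicate), self._node(obj))
        self.links.add(link)  # a set, so re-inserting a triple adds no new link
        return link


if __name__ == "__main__":
    store = ToyTripleStore()
    print(store.insert("ex:alice", "foaf:knows", "ex:bob"))  # (1, 2, 3)
    print(store.insert("ex:bob", "foaf:knows", "ex:carol"))  # (3, 2, 4)
    print(len(store.node_ids), "values,", len(store.links), "links")  # 4 values, 2 links
```

In the usage above, the second triple reuses the IDs for `ex:bob` and `foaf:knows`, so only one new value (`ex:carol`) is allocated. Six positions across two triples resolve to just four stored values.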
Nodes (subjects and objects) are stored in the RDF_NODE$ table. Whenever a new triple is inserted, a record for its link is stored in the RDF_LINK$ table, whose START_NODE_ID and END_NODE_ID columns refer to the nodes it connects. Blank nodes can also appear in any triple; they are tracked in the RDF_BLANK_NODE$ table. An RDF model holds references to the RDF data that belongs to it and can be created by executing the sem_apis.create_sem_model procedure. To get started with semantic data management on Windows XP, see how to <a href="http://thesemanticway.wordpress.com/2009/01/04/configuring-semantic-web-technology-support-in-oracle-11g-release-1-on-windows-xp/" title="Configuring Semantic Web Technology Support in Oracle 11g Release 1 on Windows XP" rel="external">configure semantic web technology support in Oracle 11g Release 1</a>. This article gives an overview of semantic data storage; for more in-depth information on semantic data support in Oracle, here are some useful links:
<ul>
<li><a href="http://download.oracle.com/docs/cd/B19306_01/appdev.102/b19307/sdo_rdf_concepts.htm" title="RDF Overview" rel="external">RDF Overview</a></li>
<li><a href="http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28397/sdo_rdf_concepts.htm#CIHBFEGC" title="Oracle Semantic Technologies Overview" rel="external">Oracle Semantic Technologies Overview</a></li>
</ul>
References: <a href="http://www.oracle.com/technology/tech/semantic_technologies/pdf/semantic_tech_rdf_wp.pdf" title="RDF support in Oracle" rel="external">RDF support in Oracle</a> (.pdf)
http://www.semanticfocus.com/blog/entry/title/semantic-data-storage-in-oracle/

Calling All RDF Dumps
James Simmons, Thu, 18 Dec 08 12:34:50 +0000

Today on the <a href="http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData" title="Linking Open Data">Linking Open Data</a> mailing list, <a href="http://www.openlinksw.com/blog/~kidehen" title="Kingsley Idehen's blog">Kingsley Idehen</a> of <a href="http://www.openlinksw.com" title="OpenLink Software">OpenLink Software</a> announced that he is preparing to load the entire LOD cloud into Virtuoso 6.0 Cluster Edition. <a href="http://esw.w3.org/topic/DataSetRDFDumps" title="Linked data sets as RDF dumps">The datasets</a> are being added to a table on the ESW wiki, making it convenient for anyone doing Semantic Web research to get hold of them. Once all the datasets are added, we should have a better idea of how much linked data there really is out there. This may also raise the bar for other triple stores and push them to develop methods for storing several billion triples.
Here are his instructions for adding your dataset to the table:
<ul>
<li>Go to: <a href="http://esw.w3.org/topic/DataSetRDFDumps" title="Linked data sets as RDF dumps">http://esw.w3.org/topic/DataSetRDFDumps</a></li>
<li>Add your data set to the table (if it isn't already listed) or correct erroneous entries</li>
<li>Add a URL entry to the "Archive URL" column</li>
<li>Add a Publisher URI to the "Publisher / Maintainer" column (used for the construction of Attribution Triples)</li>
</ul>
If you don't have a URI for yourself, you can get one by <a href="http://community.linkeddata.org/ods" title="Register">registering</a>.
http://www.semanticfocus.com/blog/entry/title/calling-all-rdf-dumps/

Service Ontologies
Aditya Thatte, Sun, 14 Dec 08 18:32:58 +0000

Ontologies that classify and describe services are called service ontologies. The WSDL interface in current use describes a service by specifying its operation names, the inputs required to invoke it, its outputs, and its target address. Human intervention is required in this loop, since the current architecture addresses only the syntactic aspects of Web services and lacks choreography mechanisms. Service ontologies supplement the WSDL interface with the additional knowledge required to automate the discovery, invocation, and composition of services. The idea is to annotate web services semantically so that the web service life cycle can be automated.
The existing conceptual models for describing services include OWL-S, WSMO, WSDL-S, SWSF, and SAWSDL. Web services can be modeled in tools such as the OWL-S Editor, OWL-S IDE, Protege, IRS-III, and METEOR-S. For example, the OWL-S service ontology is divided into three parts: profile, model, and grounding. The service element is an instance of the service and is linked to its profile, model, and grounding by the properties presents, describedBy, and supports, respectively. The profile advertises what the service does, i.e., the functionality it offers, in terms of inputs, outputs, preconditions, and effects (IOPE). Inputs specify what is required to invoke the web service, and outputs specify what the client gets back. Preconditions are the conditions that must be satisfied for the web service to execute successfully, and effects describe the changes in the state of the world that result from its execution. The service model describes how the service works to achieve its functionality: its atomic processes, its composite processes, and the message choreography involved in invoking it. Atomic processes are executed directly in a single step, whereas composite processes combine other processes, possibly from different services. The service grounding specifies how the service can be accessed: the network protocols and data exchange formats required to invoke it. Like OWL-S, the other models add semantics to web service descriptions in an effort to automate the web service life cycle.
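The three-part OWL-S structure can be mirrored in a few Python dataclasses. They are paired with a deliberately naive profile match of the kind a discovery stage might apply. This is a sketch, not OWL-S syntax; real OWL-S descriptions are written in OWL/RDF. All class names, the `matches` rule, and the example book-buying service and endpoint are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """What the service does, as IOPE: inputs, outputs, preconditions, effects."""
    inputs: set
    outputs: set
    preconditions: set = field(default_factory=set)
    effects: set = field(default_factory=set)


@dataclass
class Process:
    """How it works: a process with no sub-steps is atomic, otherwise composite."""
    name: str
    steps: list = field(default_factory=list)

    @property
    def is_atomic(self):
        return not self.steps


@dataclass
class Grounding:
    """How to access it: protocol, message format, and address."""
    protocol: str
    message_format: str
    endpoint: str


@dataclass
class Service:
    name: str
    profile: Profile  # OWL-S: the service "presents" a profile
    model: Process  # OWL-S: the service is "describedBy" a model
    grounding: Grounding  # OWL-S: the service "supports" a grounding


def matches(service, available_inputs, wanted_outputs):
    """Hypothetical, naive matchmaking: the requester can supply every input the
    service needs, and the service produces every output the requester wants.
    Preconditions and effects are ignored here."""
    p = service.profile
    return p.inputs <= set(available_inputs) and set(wanted_outputs) <= p.outputs


if __name__ == "__main__":
    buy_book = Service(
        name="BuyBook",
        profile=Profile(
            inputs={"ISBN", "CreditCard"},
            outputs={"Receipt"},
            preconditions={"ValidCreditCard"},
            effects={"BookShipped"},
        ),
        model=Process("BuyBook", steps=[Process("FindBook"), Process("Pay")]),
        grounding=Grounding("SOAP/HTTP", "WSDL messages", "http://example.org/buybook"),
    )
    print(buy_book.model.is_atomic)  # False: composite of FindBook and Pay
    print(matches(buy_book, {"ISBN", "CreditCard"}, {"Receipt"}))  # True
    print(matches(buy_book, {"ISBN"}, {"Receipt"}))  # False: no CreditCard
```

Set containment keeps the matching rule easy to read. Real matchmakers typically also reason over ontology relationships, for example treating a subclass of a requested output as an acceptable match, which this sketch does not attempt.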
http://www.semanticfocus.com/blog/entry/title/service-ontologies/

Semantic Web Service Life Cycle and Service Modeling
Aditya Thatte, Sun, 14 Dec 08 16:14:12 +0000

Semantic Web services follow a life cycle, from deployment through invocation. The life cycle comprises four stages: service modeling, service discovery, service definition, and service delivery. It begins with the provider modeling the web service and the consumer modeling the service request. Web service descriptions are developed using models such as OWL-S and WSMO. In the discovery stage, discovery algorithms and matchmaking techniques are applied to these descriptions. Once a set of service providers has been identified for a service requester, the service definition stage selects the concrete service. Finally, the concrete service is delivered to the requester in the delivery phase. Web service modeling is a critical aspect of the life cycle, and annotating web services requires a great deal of human effort. Services can be modeled using two approaches: the code-driven approach and the model-driven approach.
<h3>Code Driven Approach</h3>
This approach assumes that the web service is already implemented and that the corresponding WSDL is generated from it. The web service is then annotated semantically by adding OWL-S specifications. Tools such as Java2WSDL and WSDL2OWL-S can be used along the way: Java2WSDL generates WSDL from Java code, and WSDL2OWL-S generates a partial OWL-S specification from that WSDL. The service description is later published to the registry for discovery and invocation. This approach is called the Code Driven Approach because its starting point is an implemented web service (code).
<h3>Model Driven Approach</h3>
This approach uses high-level service descriptions to generate partial code. Service descriptions are created using ontologies, and the process model is used to generate stubs for implementing the web service. The generated code is then used to create the WSDL, which is later published to the registry.
http://www.semanticfocus.com/blog/entry/title/semantic-web-service-life-cycle-and-service-modeling/

Can Graphd Scale to Meet Semantic Web Demands?
James Simmons, Tue, 09 Dec 08 14:45:48 +0000

<a href="http://www.freebase.com/" title="Freebase" rel="external"><img class="right" src="http://www.semanticfocus.com/media/insets/freebase-logo.png" width="114" height="54" alt="Freebase" title="Freebase" /></a><a href="http://www.freebase.com/" title="Freebase.com" rel="external">Freebase</a> stores millions of entities and assertions about nearly every topic one can ponder (thanks are owed to its seed dataset, Wikipedia, and its amazing community). The amount of information Freebase stores is incredible and is a testament to what can be accomplished with the help of a dedicated community and a little (or a lot of) clever software engineering. <a href="http://blog.freebase.com/2008/04/09/a-brief-tour-of-graphd/" title="A Brief Tour of Graphd" rel="external">Graphd</a> is the in-house tuple store powering Freebase's backend.
Written in C, Graphd runs on Unix-based machines (presumably some Linux distro) and processes commands in a simple, template-based query language called MQL. The query language looks strikingly similar to JSON and Python dictionary syntax, so developers familiar with either should find working with the API a cinch. On performance, Freebase's Scott Meyer stated on April 9, 2008 that Graphd sustained a throughput of about 200,000 simple queries per minute on a single AMD64 core (albeit on a graph of only 121 million tuples). As an example of a simple query, he gave "show me all people who are authors with names containing 'herman'." As of the same date, their graph of 121 million primitives (tuples) consumed about 12 GB on disk, including all index storage. That works out to a stunning sustained ~3,300 queries per second on a single AMD64 core, which is nothing to scoff at. However, the question I am finally getting around to is this: can Graphd scale to meet the demands of the Semantic Web? Eventually, Freebase will be much larger. 121 million tuples is small compared to the amount of data <a href="http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets" title="Linking Open Data Datasets" rel="external">currently available in RDF</a> (already on the order of billions of assertions). I have read in comments that Graphd runs completely in memory (or, perhaps more likely, keeps only the indices in memory). This would partly explain the amazing performance. On an AMD64 Phenom quad-core with 2 GB of RAM, I can run "simple" operations linearly over a flat file of 17 million Freebase tuples in under 6 seconds (in memory). On a slice of 1 million tuples, the test completed its iterations in ~0.003 seconds. The test was written in Python, so it doesn't even approach the speed possible in C, Graphd's implementation language.
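Two parts of this discussion can be made concrete in Python, the language of my own test. First, an MQL query is plain JSON, so it can be written as a Python list of dicts. The `/book/author` type and the `~=` pattern-match operator reflect MQL as it was documented around 2008 and should be treated as assumptions. Second, a brute-force linear scan over hypothetical in-memory (subject, predicate, value) tuples answers the same "authors named herman" question without any index. This is the kind of work that stops being cheap once the data no longer fits in RAM. The tuple layout, the `/m/…`-style IDs, and the `authors_named` helper are invented for illustration and say nothing about how Graphd actually stores data.

```python
import json

# 1) An MQL-style query written as Python data (MQL queries are JSON).
#    Assumption: the "/book/author" type and the "name~=" pattern operator
#    follow MQL as documented circa 2008; illustrative only.
query = [{"type": "/book/author", "name": None, "name~=": "herman"}]
print(json.dumps(query))  # [{"type": "/book/author", "name": null, "name~=": "herman"}]

# 2) Hypothetical in-memory tuples (subject, predicate, value); IDs are made up.
TUPLES = [
    ("/m/1", "type", "/book/author"),
    ("/m/1", "name", "Herman Melville"),
    ("/m/2", "type", "/book/author"),
    ("/m/2", "name", "Jane Austen"),
    ("/m/3", "type", "/people/person"),
    ("/m/3", "name", "Herman Smith"),  # a person, but not typed as an author
]


def authors_named(tuples, needle):
    """Single linear pass with no indices: collect author IDs and names that
    contain `needle` (case-insensitive substring), then join the two."""
    authors, names = set(), {}
    for subject, predicate, value in tuples:
        if predicate == "type" and value == "/book/author":
            authors.add(subject)
        elif predicate == "name" and needle.lower() in value.lower():
            names[subject] = value
    return sorted(name for subject, name in names.items() if subject in authors)


print(authors_named(TUPLES, "herman"))  # ['Herman Melville']

# 3) The quoted throughput converted to queries per second.
print(round(200_000 / 60))  # 3333
```

The scan touches every tuple once, so its cost grows linearly with the data. An index would let the store jump straight to the matching names instead.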
My test illustrates the performance you can achieve when processing entirely in memory. Once you can no longer keep your entire set of indices in memory (say, for 3 billion or more tuples), you have to apply some of that clever software engineering to locate data quickly, regardless of the number or distribution of indices. Can Freebase scale Graphd to meet the demands of the Semantic Web, or will they need to completely redesign their backend architecture to reach a scale it was not originally designed for? I cannot say, but I wish them the best of luck. I think I speak for everyone when I say I would really like to see Graphd open sourced! PS: Freebase, I promise I'll use the new logo in my posts going forward.
http://www.semanticfocus.com/blog/entry/title/can-graphd-scale-to-meet-semantic-web-demands/

The Map of Data: Over 10 Billion Pieces of Reusable Information
James Simmons, Wed, 19 Nov 08 10:58:45 +0000

I just stumbled upon a useful resource from <a href="http://sindice.com/" title="Sindice" rel="external">Sindice</a> (the Semantic Web search engine) called the Map of Data. The <a href="http://sindice.com/map" title="Sindice's Map of Data" rel="external">Map of Data</a> lists sites that export their information via Microformats and embedded RDF (as well as which format(s) each site uses). Each site has been categorized and conveniently placed into lists.
The categories include books, people, places, products and listings, social news, events, politics, and more. According to Sindice, over 10 billion pieces of reusable information can already be found across 100 million pages.
http://www.semanticfocus.com/blog/entry/title/the-map-of-data-over-10-billion-pieces-of-reusable-information/

Algorithms vs. Data: The Seesaw Effect
James Simmons, Thu, 30 Oct 08 19:16:50 +0000

<img class="right" src="http://www.semanticfocus.com/media/insets/seesaw.jpg" width="215" height="140" alt="The Seesaw Effect of Algorithms vs. Data" title="The Seesaw Effect of Algorithms vs. Data" />Over the years I've noticed that the importance of algorithms and data tends to shift back and forth, depending on which is harder to duplicate at the time (often from a business perspective). This effect seems to be caused by the availability of, or demand for, one side rising or falling, which shifts the balance of importance to the other. At one point the world of software was dominated by proprietary code. The organization with the best software (backend, algorithms, etc.) was the dominant entity, and data (from, say, a Web 2.0 perspective) was generally not the focus.
This may partly have been the product of a mindset formed during an era of very little storage space and before mass user activity on the Web. Things have changed, and the word proprietary has become a developer faux pas of sorts. Open source has caused a paradigm shift away from the old proprietary software models and has allowed organizations to focus their attention on the other side of the equation: data. As a result of this shift, we saw the start of the Web 2.0 era (perhaps a few years before the phrase started floating around). Now many organizations focus on the data they acquire and how they can leverage it to their advantage. As a result, we see many walled gardens built to preserve this advantage. However, we may be seeing another shift, this time back to software. The Semantic Web calls for making data open and ubiquitous. This is a strong paradigm shift away from the walled-garden mindset (and most people understand this, especially the business set). After writing about <a href="http://www.semanticfocus.com/blog/entry/title/cross-pollinating-dbpedia-and-freebase/" title="Cross-Pollinating DBpedia and Freebase">the cross-pollination of DBpedia and Freebase</a>, it occurred to me that the project with the most advanced proprietary information extraction algorithms would in a sense be the "dominant" project, because it would be able to leverage its software in a space where data is becoming a commodity. Freebase has a secret sauce, and that is probably its biggest advantage over competing projects. In the Semantic Web/Linked Data Web/Web 3.0 (whatever we feel like calling it at the time), data may decrease in value as it spreads and becomes commoditized, at least in the sense of value it once had: as a tool that only the walled gardens could leverage.
We are seeing the walls come down, possibly to be replaced once again by proprietary algorithms.
http://www.semanticfocus.com/blog/entry/title/algorithms-vs-data-the-seesaw-effect/