<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><!-- generator="Joomla! 1.5 - Open Source Content Management" --><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-gb">
	<title type="text">Unified Information Access Blog | Attivio Blog</title>
	<subtitle type="text">Attivio is a software company specializing in enterprise search solutions and unified information access (uia).  Our product is the Active Intelligence Engine, AIE.</subtitle>
	<link rel="alternate" type="text/html" href="http://www.attivio.com" />
	<id>http://www.attivio.com/blog.feed</id>
	<updated>2012-05-27T03:31:49Z</updated>
	<generator uri="http://joomla.org" version="1.5">Joomla! 1.5 - Open Source Content Management</generator>

	<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/AttivioBlog" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="attivioblog" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><feedburner:emailServiceId xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">AttivioBlog</feedburner:emailServiceId><feedburner:feedburnerHostname xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">http://feedburner.google.com</feedburner:feedburnerHostname><entry>
		<title>Communicating Across the Two Worlds of Structured and Unstructured Information, with SQL</title>
		<link rel="alternate" type="text/html" href="http://www.attivio.com/blog/57-unified-information-access/1128-communicating-across-the-two-worlds-of-structured-and-unstructured-information-with-sql.html" />
		<published>2012-05-17T17:56:25Z</published>
		<updated>2012-05-17T17:56:25Z</updated>
		<id>http://www.attivio.com/blog/57-unified-information-access/1128-communicating-across-the-two-worlds-of-structured-and-unstructured-information-with-sql.html</id>
		<author>
			<name>Rik Tamm-Daniels and Greg George</name>
		<email>help@attivio.com</email>
		</author>
		<summary type="html">&lt;p&gt;One of our colleagues at Attivio has a niece and nephew who are as fluent in Japanese as they are in English. Their mom is Japanese and their dad is American, so they have a completely bilingual household. One moment they might be talking to each other in English; the next, their mom or dad might call to them from another room in Japanese, and they will answer in kind, switching between their two languages as easily as switching TV channels.&lt;/p&gt;
&lt;p&gt;Unified information access (UIA) technology is a lot like being strongly bilingual: UIA also quickly and easily communicates information that spans different worlds, specifically structured data (databases) and unstructured content (documents/text), whether from internal or external sources.&lt;/p&gt;
&lt;p&gt;Just as our colleague's niece and nephew can communicate as easily with anyone when visiting Japan as they can at home, a true UIA platform can also freely communicate with disparate information sources and with other applications, particularly BI tools, self-service dashboards and analytic systems. Doing so requires supporting the widely used SQL (Structured Query Language) standard, via ODBC/JDBC connectivity.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Background&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Much energy and effort have gone into the production of tools and technologies to analyze data. From iPhone and iPad apps to spreadsheets, reporting tools, "self-service" dashboards, various analytic systems, right on up to full-blown ad hoc drag &amp;amp; drop BI tools, we live in an era where everything is analyzed, and the tools we use for that analysis actually contribute to better decisions. One of the keys to the interoperability of this huge ecosystem is a standards-based approach: the broad use of the Structured Query Language (SQL) is the reason the ecosystem exists.&lt;/p&gt;
&lt;p&gt;The downside to many of these tools is that they operate only on so-called structured data and, until recently, ignored valuable context contained in unstructured sources. Without an integrated and fully correlated view of the complete picture, organizations will miss out on a much wider world of business insights and understanding, not unlike relying on a really bad language interpreter (poor Bill Murray!):&lt;/p&gt;
&lt;iframe src="http://www.youtube-nocookie.com/embed/FiQnH450hPM?rel=0" frameborder="0" height="315" width="560"&gt;&lt;/iframe&gt;
&lt;p&gt; &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Unified Information Access&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Fortunately, Attivio’s Active Intelligence Engine (AIE) gives you “the best of both worlds.” Because AIE supports querying in SQL via ODBC and JDBC, organizations can use it to explore all information regardless of source or format. By deploying AIE as a back-end unified information source, your users can continue to use BI and other tools they are comfortable with, but now with the added ability to access a far more complete business informational picture, for more informed decisions and deeper understanding that are simply not possible working with structured or unstructured information alone.&lt;/p&gt;
&lt;p style="text-align: center;"&gt;&lt;img style="margin: 5px;" title="Screenshot of Attivio-Tibco Spotfire" alt="Attivio-TibcoSpotfire Screenshot" src="images/at_images/general/blog/tbco-sptfre-screen.jpg" height="354" width="600" /&gt;&lt;/p&gt;
&lt;p&gt;One key to making this happen is Active Intelligence SQL (AI-SQL), a set of full-text function extensions to SQL. AI-SQL functions make it easy for SQL query authors to incorporate AIE's unique features, including operations like:&lt;/p&gt;
&lt;ul style="padding-left: 25px;"&gt;
&lt;li&gt;&lt;span style="font-family: courier new,courier;"&gt;&lt;strong&gt;REGEX&lt;/strong&gt;&lt;/span&gt;: find rows based on a pattern in a field.&lt;/li&gt;
&lt;li&gt;&lt;span style="font-family: courier new,courier;"&gt;&lt;strong&gt;STARTSWITH&lt;/strong&gt;&lt;/span&gt;: find rows in which a given field starts with a specified string.&lt;/li&gt;
&lt;li&gt;&lt;span style="font-family: courier new,courier;"&gt;&lt;strong&gt;ENDSWITH&lt;/strong&gt;&lt;/span&gt;: find rows in which a given field ends with a specified string.&lt;/li&gt;
&lt;li&gt;&lt;span style="font-family: courier new,courier;"&gt;&lt;strong&gt;NEAR&lt;/strong&gt;&lt;/span&gt;: find rows based on two or more terms being within a specific number of words of each other.&lt;/li&gt;
&lt;li&gt;&lt;span style="font-family: courier new,courier;"&gt;&lt;strong&gt;ONEAR&lt;/strong&gt;&lt;/span&gt;: find rows based on two or more terms being within a specific number of words of each other, in the order they are passed to the function.&lt;/li&gt;
&lt;li&gt;&lt;span style="font-family: courier new,courier;"&gt;&lt;strong&gt;FULLTEXTSEARCH&lt;/strong&gt;&lt;/span&gt;: apply a simple query language query as a filter to a specified field.&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;p&gt;The &lt;strong&gt;&lt;span style="font-family: courier new,courier;"&gt;fulltextsearch&lt;/span&gt;&lt;/strong&gt; function enables blended search and analytic user interfaces: an application can plug the user's search box input into a SQL query, letting the user interact with the data in a controlled, simple way.&lt;/p&gt;
&lt;p&gt;Some examples of using AI-SQL extensions:&lt;/p&gt;
&lt;pre&gt;select r_regionkey,r_name from all_tables where r_name = regex('e.*e')&lt;br /&gt;select p_partkey,p_container from all_tables where p_container = startswith('brown')&lt;br /&gt;select p_partkey,p_container from all_tables where p_container = endswith('car')&lt;br /&gt;select p_partkey,p_container from all_tables where p_container = near('brown','car','set(distance=1)')&lt;br /&gt;select p_partkey,p_container from all_tables where p_container = onear('brown','car','set(distance=1)')&lt;br /&gt;&lt;br /&gt;select company.company, company.ticker, count(news.newsarticleid) &lt;br /&gt;FROM company INNER JOIN news ON company.ticker = news.ticker&lt;br /&gt;WHERE news.content=fulltextsearch(?UserSearch)&lt;br /&gt;GROUP BY company.company, company.ticker&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;The Importance of JOIN&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;It is no easy feat to support a detailed query language like SQL while also serving as a UIA platform that can ingest and search across countless information sources at massive scale.&lt;br /&gt; &lt;br /&gt;Some SQL capabilities, like a single-field GROUP BY, are easily accommodated by unstructured search and are relatively straightforward to handle. For example, given the following query:&lt;/p&gt;
&lt;pre&gt;&lt;strong&gt;SELECT name,count(*) FROM customers GROUP BY name&lt;/strong&gt;&lt;/pre&gt;
&lt;p&gt;&lt;br /&gt;This can be easily issued as an AIE &lt;a target="_blank" href="http://en.wikipedia.org/wiki/Faceted_search"&gt;facet query&lt;/a&gt;, asking for facet values and counts on the name field:&lt;/p&gt;
&lt;pre&gt;&lt;strong&gt;query-request&amp;gt;&lt;/strong&gt; table:customers&lt;strong&gt;&lt;br /&gt;facet-request&amp;gt;&lt;/strong&gt; name&lt;/pre&gt;
&lt;p&gt;&lt;br /&gt;However, other SQL features, like JOIN, are among the most difficult to support; but happily, they are handled by AIE’s &lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=1019:patent-issued-to-attivio-for-unique-method-of-unifying-multiple-content-and-data-sources&amp;amp;catid=33:newsroom&amp;amp;Itemid=38"&gt;patented&lt;/a&gt; ability to &lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=665:aie-join&amp;amp;catid=90:technology&amp;amp;Itemid=156"&gt;dynamically JOIN data and content without advance data modeling on an unstructured index&lt;/a&gt;. These advanced AIE capabilities allow us to execute all of the queries of the &lt;a target="_blank" href="http://www.tpc.org/tpch/"&gt;TPC-H&lt;/a&gt; benchmark, including TPC-H "Query 3" joining three tables and aggregating the results:&lt;/p&gt;
&lt;pre&gt;SELECT l_orderkey,  &lt;br /&gt;       SUM(l_extendedprice*(1-l_discount)) as revenue, &lt;br /&gt;       o_orderdate, &lt;br /&gt;       o_shippriority&lt;br /&gt;FROM customer,&lt;br /&gt;     orders,&lt;br /&gt;     lineitem&lt;br /&gt;WHERE c_mktsegment = 'BUILDING'&lt;br /&gt;      and c_custkey = o_custkey   &lt;br /&gt;      and l_orderkey = o_orderkey   &lt;br /&gt;      and o_orderdate &amp;lt; '1995-03-15'   &lt;br /&gt;      and l_shipdate &amp;gt;  '1995-03-15'  &lt;br /&gt;GROUP BY l_orderkey,   &lt;br /&gt;         o_orderdate,   &lt;br /&gt;         o_shippriority  &lt;br /&gt;ORDER BY revenue DESC,   &lt;br /&gt;         o_orderdate&lt;/pre&gt;
&lt;strong&gt;Partnering for Success&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;While AIE supports a wide range of SQL, as with any backend system, some SQL operations will not be available. According to a leading analyst, no relational database actually supports the full ANSI-92 standard.&lt;br /&gt;&lt;br /&gt;BI vendors recognize this and provide mechanisms to tune the SQL that a tool tries to issue to a specific backend. To help our customers in ensuring success with a given BI tool, Attivio has implemented a BI vendor certification program, which currently includes &lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=1099:tableau&amp;amp;catid=93:technology-alliances&amp;amp;Itemid=34"&gt;Tableau&lt;/a&gt;, &lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=1110:tibco-software&amp;amp;catid=93:technology-alliances&amp;amp;Itemid=34"&gt;TIBCO Spotfire&lt;/a&gt; and &lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=1098:qlikview&amp;amp;catid=93:technology-alliances&amp;amp;Itemid=34"&gt;QlikView&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;This program certifies tools into two levels:&lt;br /&gt; 
&lt;ul style="padding-left: 25px;"&gt;
&lt;li&gt;&lt;strong&gt;Gold&lt;/strong&gt;: The BI tool is certified to work with AIE using standard SQL.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Platinum&lt;/strong&gt;: The BI tool is certified to work with AIE using standard SQL and also supports the use of AI-SQL extension functions.&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;Opening up the flexibility of AIE’s universal index to support a powerful industry standard language like SQL via ODBC and JDBC, enabling BI tools and scores of other compatible applications to access and analyze unified information, has proven to be a compelling value proposition for organizations looking for new opportunities to build revenue, cut costs and/or increase competitiveness. Clearly, AIE also speaks the universal language of business success.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Learn more about AIE’s &lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=666:aie-sql-and-jdbc&amp;amp;catid=90:technology&amp;amp;Itemid=157"&gt;support for SQL via ODBC and JDBC&lt;/a&gt;, as well as &lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=665:aie-join&amp;amp;catid=90:technology&amp;amp;Itemid=156"&gt;AIE’s Query-Time JOIN&lt;/a&gt; with no data modeling required.&lt;/em&gt;</summary>
		<content type="html">&lt;p&gt;One of our colleagues at Attivio has a niece and nephew who are as fluent in Japanese as they are in English. Their mom is Japanese and their dad is American, so they have a completely bilingual household. One moment they might be talking to each other in English; the next, their mom or dad might call to them from another room in Japanese, and they will answer in kind, switching between their two languages as easily as switching TV channels.&lt;/p&gt;
&lt;p&gt;Unified information access (UIA) technology is a lot like being strongly bilingual: UIA also quickly and easily communicates information that spans different worlds, specifically structured data (databases) and unstructured content (documents/text), whether from internal or external sources.&lt;/p&gt;
&lt;p&gt;Just as our colleague's niece and nephew can communicate as easily with anyone when visiting Japan as they can at home, a true UIA platform can also freely communicate with disparate information sources and with other applications, particularly BI tools, self-service dashboards and analytic systems. Doing so requires supporting the widely used SQL (Structured Query Language) standard, via ODBC/JDBC connectivity.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Background&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Much energy and effort have gone into the production of tools and technologies to analyze data. From iPhone and iPad apps to spreadsheets, reporting tools, "self-service" dashboards, various analytic systems, right on up to full-blown ad hoc drag &amp;amp; drop BI tools, we live in an era where everything is analyzed, and the tools we use for that analysis actually contribute to better decisions. One of the keys to the interoperability of this huge ecosystem is a standards-based approach: the broad use of the Structured Query Language (SQL) is the reason the ecosystem exists.&lt;/p&gt;
&lt;p&gt;The downside to many of these tools is that they operate only on so-called structured data and, until recently, ignored valuable context contained in unstructured sources. Without an integrated and fully correlated view of the complete picture, organizations will miss out on a much wider world of business insights and understanding, not unlike relying on a really bad language interpreter (poor Bill Murray!):&lt;/p&gt;
&lt;iframe src="http://www.youtube-nocookie.com/embed/FiQnH450hPM?rel=0" frameborder="0" height="315" width="560"&gt;&lt;/iframe&gt;
&lt;p&gt; &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Unified Information Access&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Fortunately, Attivio’s Active Intelligence Engine (AIE) gives you “the best of both worlds.” Because AIE supports querying in SQL via ODBC and JDBC, organizations can use it to explore all information regardless of source or format. By deploying AIE as a back-end unified information source, your users can continue to use BI and other tools they are comfortable with, but now with the added ability to access a far more complete business informational picture, for more informed decisions and deeper understanding that are simply not possible working with structured or unstructured information alone.&lt;/p&gt;
&lt;p style="text-align: center;"&gt;&lt;img style="margin: 5px;" title="Screenshot of Attivio-Tibco Spotfire" alt="Attivio-TibcoSpotfire Screenshot" src="images/at_images/general/blog/tbco-sptfre-screen.jpg" height="354" width="600" /&gt;&lt;/p&gt;
&lt;p&gt;One key to making this happen is Active Intelligence SQL (AI-SQL), a set of full-text function extensions to SQL. AI-SQL functions make it easy for SQL query authors to incorporate AIE's unique features, including operations like:&lt;/p&gt;
&lt;ul style="padding-left: 25px;"&gt;
&lt;li&gt;&lt;span style="font-family: courier new,courier;"&gt;&lt;strong&gt;REGEX&lt;/strong&gt;&lt;/span&gt;: find rows based on a pattern in a field.&lt;/li&gt;
&lt;li&gt;&lt;span style="font-family: courier new,courier;"&gt;&lt;strong&gt;STARTSWITH&lt;/strong&gt;&lt;/span&gt;: find rows in which a given field starts with a specified string.&lt;/li&gt;
&lt;li&gt;&lt;span style="font-family: courier new,courier;"&gt;&lt;strong&gt;ENDSWITH&lt;/strong&gt;&lt;/span&gt;: find rows in which a given field ends with a specified string.&lt;/li&gt;
&lt;li&gt;&lt;span style="font-family: courier new,courier;"&gt;&lt;strong&gt;NEAR&lt;/strong&gt;&lt;/span&gt;: find rows based on two or more terms being within a specific number of words of each other.&lt;/li&gt;
&lt;li&gt;&lt;span style="font-family: courier new,courier;"&gt;&lt;strong&gt;ONEAR&lt;/strong&gt;&lt;/span&gt;: find rows based on two or more terms being within a specific number of words of each other, in the order they are passed to the function.&lt;/li&gt;
&lt;li&gt;&lt;span style="font-family: courier new,courier;"&gt;&lt;strong&gt;FULLTEXTSEARCH&lt;/strong&gt;&lt;/span&gt;: apply a simple query language query as a filter to a specified field.&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;p&gt;The &lt;strong&gt;&lt;span style="font-family: courier new,courier;"&gt;fulltextsearch&lt;/span&gt;&lt;/strong&gt; function enables blended search and analytic user interfaces: an application can plug the user's search box input into a SQL query, letting the user interact with the data in a controlled, simple way.&lt;/p&gt;
&lt;p&gt;Some examples of using AI-SQL extensions:&lt;/p&gt;
&lt;pre&gt;select r_regionkey,r_name from all_tables where r_name = regex('e.*e')&lt;br /&gt;select p_partkey,p_container from all_tables where p_container = startswith('brown')&lt;br /&gt;select p_partkey,p_container from all_tables where p_container = endswith('car')&lt;br /&gt;select p_partkey,p_container from all_tables where p_container = near('brown','car','set(distance=1)')&lt;br /&gt;select p_partkey,p_container from all_tables where p_container = onear('brown','car','set(distance=1)')&lt;br /&gt;&lt;br /&gt;select company.company, company.ticker, count(news.newsarticleid) &lt;br /&gt;FROM company INNER JOIN news ON company.ticker = news.ticker&lt;br /&gt;WHERE news.content=fulltextsearch(?UserSearch)&lt;br /&gt;GROUP BY company.company, company.ticker&lt;/pre&gt;
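&lt;p&gt;The difference between NEAR and ONEAR is easiest to see on a toy example. The following Python sketch is not AIE code and rests on several assumptions: text is split on whitespace, only two terms are compared, and "distance" is read as the maximum gap in word positions. The function names are illustrative only.&lt;/p&gt;

```python
def positions(text, term):
    """Return the word positions at which term occurs (case-insensitive)."""
    words = text.lower().split()
    return [i for i, word in enumerate(words) if word == term.lower()]


def near(text, first, second, distance=1):
    """True if the two terms occur within `distance` words, in either order."""
    return any(
        distance >= abs(i - j) > 0
        for i in positions(text, first)
        for j in positions(text, second)
    )


def onear(text, first, second, distance=1):
    """True if `first` is followed by `second` within `distance` words."""
    return any(
        distance >= j - i > 0
        for i in positions(text, first)
        for j in positions(text, second)
    )
```

&lt;p&gt;Under these assumptions, near matches both "brown car" and "car brown" for the terms brown and car, while onear matches only "brown car".&lt;/p&gt;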
&lt;p&gt;&lt;strong&gt;The Importance of JOIN&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;It is no easy feat to support a detailed query language like SQL while also serving as a UIA platform that can ingest and search across countless information sources at massive scale.&lt;br /&gt; &lt;br /&gt;Some SQL capabilities, like a single-field GROUP BY, are easily accommodated by unstructured search and are relatively straightforward to handle. For example, given the following query:&lt;/p&gt;
&lt;pre&gt;&lt;strong&gt;SELECT name,count(*) FROM customers GROUP BY name&lt;/strong&gt;&lt;/pre&gt;
&lt;p&gt;&lt;br /&gt;This can be easily issued as an AIE &lt;a target="_blank" href="http://en.wikipedia.org/wiki/Faceted_search"&gt;facet query&lt;/a&gt;, asking for facet values and counts on the name field:&lt;/p&gt;
&lt;pre&gt;&lt;strong&gt;query-request&amp;gt;&lt;/strong&gt; table:customers&lt;strong&gt;&lt;br /&gt;facet-request&amp;gt;&lt;/strong&gt; name&lt;/pre&gt;
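&lt;p&gt;Why the two forms are equivalent can be shown with a small Python sketch. It is only an illustration, not how AIE computes facets: the rows, field names and the facet_counts helper are all hypothetical. Counting how many rows carry each distinct value of one field is exactly what a single-field GROUP BY with count(*) returns.&lt;/p&gt;

```python
from collections import Counter


def facet_counts(rows, field):
    """Count how many rows carry each distinct value of `field`."""
    return dict(Counter(row[field] for row in rows))


# Hypothetical rows standing in for documents tagged table:customers.
customers = [
    {"name": "Acme", "city": "Boston"},
    {"name": "Globex", "city": "Newton"},
    {"name": "Acme", "city": "Salem"},
]

# Same result as: SELECT name,count(*) FROM customers GROUP BY name
print(facet_counts(customers, "name"))
```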
&lt;p&gt;&lt;br /&gt;However, other SQL features, like JOIN, are among the most difficult to support; but happily, they are handled by AIE’s &lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=1019:patent-issued-to-attivio-for-unique-method-of-unifying-multiple-content-and-data-sources&amp;amp;catid=33:newsroom&amp;amp;Itemid=38"&gt;patented&lt;/a&gt; ability to &lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=665:aie-join&amp;amp;catid=90:technology&amp;amp;Itemid=156"&gt;dynamically JOIN data and content without advance data modeling on an unstructured index&lt;/a&gt;. These advanced AIE capabilities allow us to execute all of the queries of the &lt;a target="_blank" href="http://www.tpc.org/tpch/"&gt;TPC-H&lt;/a&gt; benchmark, including TPC-H "Query 3" joining three tables and aggregating the results:&lt;/p&gt;
&lt;pre&gt;SELECT l_orderkey,  &lt;br /&gt;       SUM(l_extendedprice*(1-l_discount)) as revenue, &lt;br /&gt;       o_orderdate, &lt;br /&gt;       o_shippriority&lt;br /&gt;FROM customer,&lt;br /&gt;     orders,&lt;br /&gt;     lineitem&lt;br /&gt;WHERE c_mktsegment = 'BUILDING'&lt;br /&gt;      and c_custkey = o_custkey   &lt;br /&gt;      and l_orderkey = o_orderkey   &lt;br /&gt;      and o_orderdate &amp;lt; '1995-03-15'   &lt;br /&gt;      and l_shipdate &amp;gt;  '1995-03-15'  &lt;br /&gt;GROUP BY l_orderkey,   &lt;br /&gt;         o_orderdate,   &lt;br /&gt;         o_shippriority  &lt;br /&gt;ORDER BY revenue DESC,   &lt;br /&gt;         o_orderdate&lt;/pre&gt;
&lt;strong&gt;Partnering for Success&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;While AIE supports a wide range of SQL, as with any backend system, some SQL operations will not be available. According to a leading analyst, no relational database actually supports the full ANSI-92 standard.&lt;br /&gt;&lt;br /&gt;BI vendors recognize this and provide mechanisms to tune the SQL that a tool tries to issue to a specific backend. To help our customers in ensuring success with a given BI tool, Attivio has implemented a BI vendor certification program, which currently includes &lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=1099:tableau&amp;amp;catid=93:technology-alliances&amp;amp;Itemid=34"&gt;Tableau&lt;/a&gt;, &lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=1110:tibco-software&amp;amp;catid=93:technology-alliances&amp;amp;Itemid=34"&gt;TIBCO Spotfire&lt;/a&gt; and &lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=1098:qlikview&amp;amp;catid=93:technology-alliances&amp;amp;Itemid=34"&gt;QlikView&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;This program certifies tools into two levels:&lt;br /&gt; 
&lt;ul style="padding-left: 25px;"&gt;
&lt;li&gt;&lt;strong&gt;Gold&lt;/strong&gt;: The BI tool is certified to work with AIE using standard SQL.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Platinum&lt;/strong&gt;: The BI tool is certified to work with AIE using standard SQL and also supports the use of AI-SQL extension functions.&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;Opening up the flexibility of AIE’s universal index to support a powerful industry standard language like SQL via ODBC and JDBC, enabling BI tools and scores of other compatible applications to access and analyze unified information, has proven to be a compelling value proposition for organizations looking for new opportunities to build revenue, cut costs and/or increase competitiveness. Clearly, AIE also speaks the universal language of business success.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Learn more about AIE’s &lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=666:aie-sql-and-jdbc&amp;amp;catid=90:technology&amp;amp;Itemid=157"&gt;support for SQL via ODBC and JDBC&lt;/a&gt;, as well as &lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=665:aie-join&amp;amp;catid=90:technology&amp;amp;Itemid=156"&gt;AIE’s Query-Time JOIN&lt;/a&gt; with no data modeling required.&lt;/em&gt;</content>
	</entry>
	<entry>
		<title>How We Handle Open Source</title>
		<link rel="alternate" type="text/html" href="http://www.attivio.com/blog/56-java-development/1126-how-we-handle-open-source.html" />
		<published>2012-05-10T11:54:35Z</published>
		<updated>2012-05-10T11:54:35Z</updated>
		<id>http://www.attivio.com/blog/56-java-development/1126-how-we-handle-open-source.html</id>
		<author>
			<name>Will Johnson</name>
		<email>help@attivio.com</email>
		</author>
		<summary type="html">&lt;p&gt;Attivio, like many other companies, uses open source where appropriate. The &lt;a target="_blank" href="http://www.jcp.org/en/home/index"&gt;Java community&lt;/a&gt; in particular has great open source technologies powering some of the hottest technology trends today, from big data (&lt;a target="_blank" href="http://hadoop.apache.org/#What+Is+Apache+Hadoop%3F"&gt;Hadoop &lt;/a&gt;and &lt;a target="_blank" href="http://hadoop.apache.org/mapreduce/"&gt;MapReduce&lt;/a&gt;) to columnar storage (&lt;a target="_blank" href="http://www.monetdb.org/"&gt;MonetDB&lt;/a&gt;) to IDE frameworks (&lt;a target="_blank" href="http://www.eclipse.org/org/"&gt;Eclipse&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;There’s also a trove of infrastructure technologies out there for logging, dependency injection, character set normalization, etc. Knowing when to use open source and when to write your own code is where the real value comes in as a developer and, more importantly, can save you and the company a great deal of time and resources.&lt;/p&gt;
&lt;p&gt;All that being said, there are often bugs and functional gaps in open source code, and as a developer you need to have a system in place that allows you to handle these issues. For example, we may find a bug in a particular package and submit a patch, but since we don't control the release cycle of the open source projects, we can't simply wait around for the fix. We are also very picky about less critical run-time issues such as threads being left around after a process or unit test finishes. Many open source projects assume that they are going to be run as a standalone server that terminates with the &lt;a target="_blank" href="http://en.wikipedia.org/wiki/Java_virtual_machine"&gt;JVM&lt;/a&gt;, and these sorts of assumptions can break or disturb our unit test environments.&lt;/p&gt;
&lt;p&gt;We've recently switched to the vendor branching methodology &lt;a target="_blank" href="http://svnbook.red-bean.com/en/1.7/svn-book.html#svn.advanced.vendorbr"&gt;described here&lt;/a&gt; against our &lt;a target="_blank" href="http://subversion.apache.org/"&gt;Subversion&lt;/a&gt; repository. This allows us to import external projects into our revision control system.&lt;/p&gt;
&lt;p&gt;When we encounter an issue in open source code, we:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Check in a copy of the source code for the class(es) we plan to fix  or enhance. It's important to have this copy so we can easily compare  and merge new revisions of the upstream project with our own  modifications.&lt;/li&gt;
&lt;li&gt;Make changes to the class(es), annotating them with &lt;span style="font-family: courier new,courier;"&gt;//attivio start&lt;/span&gt; mod and &lt;span style="font-family: courier new,courier;"&gt;//attivio end&lt;/span&gt; mod so the changes are easier to tease out in a quick scan. We put a reference to the internal developer ticket in the code as well, so later reviewers can backtrack those changes easily when upgrading upstream code.&lt;/li&gt;
&lt;li&gt;Compile and build these changes into a &lt;a target="_blank" href="http://en.wikipedia.org/wiki/JAR_%28file_format%29"&gt;JAR file&lt;/a&gt; that contains not  only the class files, but also the source files in order to comply with  many open source licenses. Generally speaking, there is no intellectual  property in this code so we don't have to worry about anyone reviewing  our changes.&lt;/li&gt;
&lt;li&gt;Deploy this JAR to our internal &lt;a target="_blank" href="http://maven.apache.org/guides/introduction/introduction-to-repositories.html"&gt;Maven repository&lt;/a&gt; with an Attivio-specific revision. We then update our top-level POM (&lt;a target="_blank" href="http://maven.apache.org/pom.html#What_is_the_POM"&gt;Project Object Model&lt;/a&gt;) to reference this Attivio version instead of the public Maven repository version.&lt;/li&gt;
&lt;/ul&gt;
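&lt;p&gt;Because the markers are plain comments, a scan for modified regions can be scripted. The Python sketch below is illustrative only and is not part of the process described above: it assumes each marker sits on its own line and begins with the comment text shown in the list, and the function name and sample class are hypothetical.&lt;/p&gt;

```python
START_MARKER = "//attivio start"
END_MARKER = "//attivio end"


def find_modified_regions(source):
    """Return 1-based (first_line, last_line) pairs for each marked region."""
    regions = []
    start = None
    for number, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith(START_MARKER):
            start = number
        elif stripped.startswith(END_MARKER) and start is not None:
            regions.append((start, number))
            start = None
    return regions


# Hypothetical patched upstream class; the ticket text is a placeholder.
SAMPLE = """\
public class Worker {
    public void stop() {
        //attivio start mod (internal ticket reference goes here)
        executor.shutdownNow();
        //attivio end mod
    }
}
"""

print(find_modified_regions(SAMPLE))  # prints [(3, 5)]
```

&lt;p&gt;Running a scan like this over a vendor branch before merging a new upstream release gives a quick list of the places where local changes have to be carried forward.&lt;/p&gt;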
&lt;br /&gt;
&lt;p&gt;Lastly, we strive to create a formal ticket, patch and test for contribution back to the open source community. In all fairness, this is the weakest part of our process, but one we are working to improve. Many of our changes are small in nature and fix either esoteric edge cases or general code cleanliness issues like the thread example I mentioned above, but most changes are still useful for the wider community.&lt;/p&gt;
&lt;p&gt;Related:&lt;/p&gt;
&lt;p&gt;&lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=592-enterprise-strategy-group-report-todays-information-access-requirements-outpace-open-source-search-options&amp;amp;catid=69-analyst-reports&amp;amp;Itemid=213"&gt;Enterprise Strategy Group Report - Today's Information Access Requirements Outpace Open Source Search Options&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=639-thinking-about-replacing-your-search-engine-or-search-appliance&amp;amp;catid=54-enterprise-search&amp;amp;Itemid=245"&gt;Thinking About Replacing Your Search Engine or Search Appliance?&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=175-software-at-the-speed-of-light&amp;amp;catid=53-attivio&amp;amp;Itemid=245"&gt;Software at the Speed of Light&lt;/a&gt;&lt;/p&gt;</summary>
		<content type="html">&lt;p&gt;Attivio, like many other companies, uses open source where appropriate. The &lt;a target="_blank" href="http://www.jcp.org/en/home/index"&gt;Java community&lt;/a&gt; in particular has great open source technologies powering some of the hottest technology trends today, from big data (&lt;a target="_blank" href="http://hadoop.apache.org/#What+Is+Apache+Hadoop%3F"&gt;Hadoop &lt;/a&gt;and &lt;a target="_blank" href="http://hadoop.apache.org/mapreduce/"&gt;MapReduce&lt;/a&gt;) to columnar storage (&lt;a target="_blank" href="http://www.monetdb.org/"&gt;MonetDB&lt;/a&gt;) to IDE frameworks (&lt;a target="_blank" href="http://www.eclipse.org/org/"&gt;Eclipse&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;There's also a trove of infrastructure technologies out there for logging, dependency injection, character set normalization, etc. Knowing when to use open source and when to write your own code is where the real value comes in as a developer and, more importantly, can save you and the company a great deal of time and resources.&lt;/p&gt;
&lt;p&gt;All that being said, there are often bugs and functional gaps in open source code and as a developer you need to have a system in place that allows you to handle these issues. For example, we may find a bug in a particular package and submit a patch but since we don't control the release cycle of the open source projects we can't simply wait around for the fix. We are also very picky about less critical run-time issues such as threads being left around after a process or unit test finishes. Many open source projects assume that they are going to be run as a standalone server that terminates with the &lt;a target="_blank" href="http://en.wikipedia.org/wiki/Java_virtual_machine"&gt;JVM&lt;/a&gt; and these sorts of assumptions can break or disturb our unit test environments.&lt;/p&gt;
&lt;p&gt;We've recently switched to the vendor branching methodology &lt;a target="_blank" href="http://svnbook.red-bean.com/en/1.7/svn-book.html#svn.advanced.vendorbr"&gt;described here&lt;/a&gt; against our &lt;a target="_blank" href="http://subversion.apache.org/"&gt;Subversion&lt;/a&gt; repository. This allows us to import external projects into our revision control system.&lt;/p&gt;
&lt;p&gt;When we encounter an issue in open source code, we:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Check in a copy of the source code for the class(es) we plan to fix  or enhance. It's important to have this copy so we can easily compare  and merge new revisions of the upstream project with our own  modifications.&lt;/li&gt;
&lt;li&gt;Make changes to the class(es), annotating them with &lt;span style="font-family: courier new,courier;"&gt;//attivio start&lt;/span&gt; mod and &lt;span style="font-family: courier new,courier;"&gt;//attivio end&lt;/span&gt; mod so that a quick scan makes our changes easy to tease out. We also put a reference to the internal developer ticket in the code, so later reviewers can easily backtrack those changes when upgrading upstream code.&lt;/li&gt;
&lt;li&gt;Compile and build these changes into a &lt;a target="_blank" href="http://en.wikipedia.org/wiki/JAR_%28file_format%29"&gt;JAR file&lt;/a&gt; that contains not  only the class files, but also the source files in order to comply with  many open source licenses. Generally speaking, there is no intellectual  property in this code so we don't have to worry about anyone reviewing  our changes.&lt;/li&gt;
&lt;li&gt;Deploy this JAR to our internal &lt;a target="_blank" href="http://maven.apache.org/guides/introduction/introduction-to-repositories.html"&gt;Maven repository&lt;/a&gt; with an Attivio-specific revision. We then update our top-level POM (&lt;a target="_blank" href="http://maven.apache.org/pom.html#What_is_the_POM"&gt;Project Object Model&lt;/a&gt;) to reference this Attivio version instead of the public Maven repository version.&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
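&lt;p&gt;To make the annotation step concrete, here is a minimal sketch of what a patched upstream class might look like. The class name VendoredWorker, its methods and the ticket placeholder are all hypothetical and not taken from any real project; the fix shown (shutting down a thread pool so no thread lingers after a unit test) simply mirrors the thread example mentioned earlier.&lt;/p&gt;

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for a class copied from an upstream project. */
public class VendoredWorker {
  private final ExecutorService pool = Executors.newSingleThreadExecutor();

  /** Original upstream behavior: run a task on the worker thread. */
  public void submit(Runnable task) {
    pool.execute(task);
  }

  //attivio start mod (internal ticket reference would go here)
  /** Stops the worker thread so it does not outlive a process or unit test. */
  public void close() {
    pool.shutdown();
  }

  /** Reports whether close() has been called. */
  public boolean isClosed() {
    return pool.isShutdown();
  }
  //attivio end mod

  public static void main(String[] args) {
    VendoredWorker worker = new VendoredWorker();
    worker.submit(new Runnable() {
      @Override
      public void run() {
        System.out.println("task ran");
      }
    });
    worker.close(); // shutdown() still lets the already-submitted task finish
    System.out.println("closed: " + worker.isClosed());
  }
}
```

&lt;p&gt;Keeping the markers on their own lines means a plain text search for "attivio" should list every local modification when a new upstream release is merged.&lt;/p&gt;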
&lt;p&gt;Lastly, we strive to create a formal ticket, patch and test for contribution back to the open source community. In all fairness, this is our weakest part of the process, but one we are striving to improve. Many of our changes are small in nature and fix either esoteric edge cases or general code cleanliness like the thread example I mentioned above, but most changes are still useful for the wider community.&lt;/p&gt;
&lt;p&gt;Related:&lt;/p&gt;
&lt;p&gt;&lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=592-enterprise-strategy-group-report-todays-information-access-requirements-outpace-open-source-search-options&amp;amp;catid=69-analyst-reports&amp;amp;Itemid=213"&gt;Enterprise Strategy Group Report - Today's Information Access Requirements Outpace Open Source Search Options&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=639-thinking-about-replacing-your-search-engine-or-search-appliance&amp;amp;catid=54-enterprise-search&amp;amp;Itemid=245"&gt;Thinking About Replacing Your Search Engine or Search Appliance?&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=175-software-at-the-speed-of-light&amp;amp;catid=53-attivio&amp;amp;Itemid=245"&gt;Software at the Speed of Light&lt;/a&gt;&lt;/p&gt;</content>
	</entry>
	<entry>
		<title>More About Unstructured Information</title>
		<link rel="alternate" type="text/html" href="http://www.attivio.com/blog/55-industry-insights/1122-more-about-unstructured-information.html" />
		<published>2012-04-30T12:58:57Z</published>
		<updated>2012-04-30T12:58:57Z</updated>
		<id>http://www.attivio.com/blog/55-industry-insights/1122-more-about-unstructured-information.html</id>
		<author>
			<name>Sid Probstein</name>
		<email>help@attivio.com</email>
		</author>
		<summary type="html">&lt;p&gt;I recently attended the YES Boston panel on Big Data &amp;amp; Analytics at the Harvard Innovation Center. Overall it was an excellent discussion. At least one panelist indicated that "It feels like 1995 again", referring to that heady period when the Internet emerged and drove the dot-com era forward. Most of the focus was on the superb opportunities that Big Data creates for entrepreneurs. A few panelists also suggested that Big Data would lead to the death of the traditional relational database and data warehouse.&lt;/p&gt;
&lt;p&gt;Earlier in the discussion, one panelist characterized Big Data as having "three V's": Volume, Velocity and Variety. This has become the standard way to segment the various use cases that collectively add up to "Big Data", alongside other often-cited characteristics like value and complexity. However, another panelist then said that Big Data was mostly about unstructured information. I have &lt;a target="_blank" href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=875:missing-some-key-points-with-big-data&amp;amp;catid=55:industry-insights&amp;amp;Itemid=245"&gt;previously written about how not all unstructured data is the same&lt;/a&gt;. Most of what is called "unstructured" in the context of Big Data is really "data" (variable-length log files, etc.), not truly unstructured &lt;strong&gt;CONTENT&lt;/strong&gt;, which includes articles, web pages, documents, email, etc. It is important to understand the difference, especially from the entrepreneur's perspective. Much of the "volume" in Big Data is unstructured &lt;strong&gt;DATA&lt;/strong&gt; (again, mostly log files) whose individual items have very low value. It is only when analyzed in volume that it becomes interesting and valuable.&lt;/p&gt;
&lt;p&gt;One of the more interesting questions at the end of the panel came from Wikibon's Dave Vellante. He asked, more or less, "Why are big data and the new technologies that are emerging to analyze it going to be disruptive to the enterprise data warehouse?"&lt;/p&gt;
&lt;p&gt;Here, the panelists' answers seemed uncertain. Several spoke about the challenge of getting centralized IT to produce new information from legacy BI tools. While this is probably impossible to argue with, at least one panelist went too far, saying that data will just be kept in new systems and BI tools will work directly against them. I didn't buy this angle, and in a follow-up conversation with Dave, he agreed with me that the panel mostly missed the mark.&lt;/p&gt;
&lt;p&gt;I would have answered Dave's question like this: the key with Big Data is to take the volume of low value items and turn it into high-value analysis. That analysis then needs to be co-mingled with other information that has high item value. This includes email, documents, text in applications, rows in databases, ERP, CRM etc. That isn't disruptive to the eDW in and of itself. Most big data — like behavioral information — won't be interesting to typical corporate decision makers. A few data scientists etc. will analyze click streams and use it to optimize the end user experience. The transactions (hopefully sales) that result from that will go into the eDW. The yield improvements will also be analyzed and tracked over time — again, probably by traditional BI tools.&lt;/p&gt;
&lt;p&gt;The best example I can give of a real world case is from &lt;a target="_blank" href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=905:it-knowledge-expert&amp;amp;catid=91:by-business-need&amp;amp;Itemid=191"&gt;Attivio's IT Knowledge Expert solution&lt;/a&gt;. ITKE analyzes log files from operating systems and applications to identify events that are interesting and/or problematic. For example, we may drop or summarize informative messages, keep warnings, and correlate errors. This is great because it helps system administrators quickly discover the symptoms of an issue. However, it is the other data — the high value articles, knowledge bases, SharePoint articles created by previous admins, etc., in which the solution to the problem is found. This is why I refer to content as high-value. It explains WHY things happen.&lt;/p&gt;
&lt;p&gt;Related:&lt;/p&gt;
&lt;p&gt;&lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=875-missing-some-key-points-with-big-data&amp;amp;catid=55-industry-insights&amp;amp;Itemid=245"&gt;Missing Some Key Points with Big Data&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="index.php?Itemid=186"&gt;Extreme Information: Completing the Big Data Picture&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="index.php?Itemid=191"&gt;Attivio IT Knowledge Expert&lt;/a&gt;&lt;/p&gt;</summary>
		<content type="html">&lt;p&gt;I recently attended the YES Boston panel on Big Data &amp;amp; Analytics at the Harvard Innovation Center. Overall it was an excellent discussion. At least one panelist indicated that "It feels like 1995 again", referring to that heady period when the Internet emerged and drove the dot-com era forward. Most of the focus was on the superb opportunities that Big Data creates for entrepreneurs. A few panelists also suggested that Big Data would lead to the death of the traditional relational database and data warehouse.&lt;/p&gt;
&lt;p&gt;Earlier in the discussion, one panelist characterized Big Data as having "three V's": Volume, Velocity and Variety. This has become the standard way to segment the various use cases that collectively add up to "Big Data", alongside other often-cited characteristics like value and complexity. However, another panelist then said that Big Data was mostly about unstructured information. I have &lt;a target="_blank" href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=875:missing-some-key-points-with-big-data&amp;amp;catid=55:industry-insights&amp;amp;Itemid=245"&gt;previously written about how not all unstructured data is the same&lt;/a&gt;. Most of what is called "unstructured" in the context of Big Data is really "data" (variable-length log files, etc.), not truly unstructured &lt;strong&gt;CONTENT&lt;/strong&gt;, which includes articles, web pages, documents, email, etc. It is important to understand the difference, especially from the entrepreneur's perspective. Much of the "volume" in Big Data is unstructured &lt;strong&gt;DATA&lt;/strong&gt; (again, mostly log files) whose individual items have very low value. It is only when analyzed in volume that it becomes interesting and valuable.&lt;/p&gt;
&lt;p&gt;One of the more interesting questions at the end of the panel came from Wikibon's Dave Vellante. He asked, more or less, "Why are big data and the new technologies that are emerging to analyze it going to be disruptive to the enterprise data warehouse?"&lt;/p&gt;
&lt;p&gt;Here, the panelists' answers seemed uncertain. Several spoke about the challenge of getting centralized IT to produce new information from legacy BI tools. While this is probably impossible to argue with, at least one panelist went too far, saying that data will just be kept in new systems and BI tools will work directly against them. I didn't buy this angle, and in a follow-up conversation with Dave, he agreed with me that the panel mostly missed the mark.&lt;/p&gt;
&lt;p&gt;I would have answered Dave's question like this: the key with Big Data is to take the volume of low value items and turn it into high-value analysis. That analysis then needs to be co-mingled with other information that has high item value. This includes email, documents, text in applications, rows in databases, ERP, CRM etc. That isn't disruptive to the eDW in and of itself. Most big data — like behavioral information — won't be interesting to typical corporate decision makers. A few data scientists etc. will analyze click streams and use it to optimize the end user experience. The transactions (hopefully sales) that result from that will go into the eDW. The yield improvements will also be analyzed and tracked over time — again, probably by traditional BI tools.&lt;/p&gt;
&lt;p&gt;The best example I can give of a real world case is from &lt;a target="_blank" href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=905:it-knowledge-expert&amp;amp;catid=91:by-business-need&amp;amp;Itemid=191"&gt;Attivio's IT Knowledge Expert solution&lt;/a&gt;. ITKE analyzes log files from operating systems and applications to identify events that are interesting and/or problematic. For example, we may drop or summarize informative messages, keep warnings, and correlate errors. This is great because it helps system administrators quickly discover the symptoms of an issue. However, it is the other data — the high value articles, knowledge bases, SharePoint articles created by previous admins, etc., in which the solution to the problem is found. This is why I refer to content as high-value. It explains WHY things happen.&lt;/p&gt;
&lt;p&gt;Related:&lt;/p&gt;
&lt;p&gt;&lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=875-missing-some-key-points-with-big-data&amp;amp;catid=55-industry-insights&amp;amp;Itemid=245"&gt;Missing Some Key Points with Big Data&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="index.php?Itemid=186"&gt;Extreme Information: Completing the Big Data Picture&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="index.php?Itemid=191"&gt;Attivio IT Knowledge Expert&lt;/a&gt;&lt;/p&gt;</content>
	</entry>
	<entry>
		<title>RESTing Easy</title>
		<link rel="alternate" type="text/html" href="http://www.attivio.com/blog/56-java-development/1114-resting-easy.html" />
		<published>2012-04-05T12:05:15Z</published>
		<updated>2012-04-05T12:05:15Z</updated>
		<id>http://www.attivio.com/blog/56-java-development/1114-resting-easy.html</id>
		<author>
			<name>Martin Serrano</name>
		<email>help@attivio.com</email>
		</author>
		<summary type="html">&lt;h4&gt;Introduction&lt;/h4&gt;
&lt;p&gt;Modularity is a hallmark of good application development practice. When attempting to rapidly implement a web-based application, basing the front end on one or more REST APIs is a common choice. A REST API is simple to use and provides for easy client-side bookmarking. Since most of our customers develop some type of custom web UI for AIE-based applications, we've made it simple to build robust, testable REST APIs.&lt;/p&gt;
&lt;p&gt;Building a REST API in AIE requires extending &lt;span style="font-family: courier new,courier;"&gt;PlatformComponent&lt;/span&gt; and overriding two methods. Testing the API involves a few lines of code and use of shipped test framework classes (the same ones we use to test AIE). In the example below, I create a REST API that reports the free memory and the maximum memory available to the hosting JVM.&lt;/p&gt;
&lt;h4&gt;A simple REST API&lt;/h4&gt;
&lt;p&gt;Creating an AIE service that provides a REST API is as simple as extending the &lt;span style="font-family: courier new,courier;"&gt;PlatformComponent&lt;/span&gt; class and overriding a couple of methods.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;Override&lt;/strong&gt; &lt;span style="font-family: courier new,courier;"&gt;convertCgiRequest&lt;/span&gt;&lt;/h4&gt;
&lt;p&gt;Whenever a service that exposes an HTTP endpoint is accessed with a GET request, the request is converted by AIE into a &lt;span style="font-family: courier new,courier;"&gt;CgiRequest&lt;/span&gt; message. &lt;span style="font-family: courier new,courier;"&gt;CgiRequest&lt;/span&gt; messages contain all of the parameters and headers of the request. The &lt;span style="font-family: courier new,courier;"&gt;convertCgiRequest&lt;/span&gt; method is a hook for the developer to translate the request into a message that the REST service can handle. This can be as simple as translating the request into a &lt;span style="font-family: courier new,courier;"&gt;StringMessage&lt;/span&gt; with the relevant data, but I prefer to use simple inner classes to make requests concrete instead of opaque (this also makes testing easier). Returning null from this method indicates the CGI request is not recognized and the service should return nothing. Alternatively, you can throw an exception that will result in an error getting returned to the caller.&lt;/p&gt;
&lt;h4&gt;Sample GET request&lt;/h4&gt;
&lt;pre&gt;http://localhost:17001/memtracker?mem=true&lt;br /&gt;&lt;/pre&gt;
&lt;h4&gt;Overridden method&lt;/h4&gt;
&lt;pre&gt;/** {@inheritDoc} */
  @Override
  protected PlatformMessage convertCgiRequest(CgiRequest cgi) throws AttivioException {
    String mem = cgi.getCgiParameter("mem");
    if (mem != null) {&lt;br /&gt;      return new MemRequestMessage();  // concrete private message class&lt;br /&gt;    } else {&lt;br /&gt;      return null; // no valid request found... alternatively could throw an exception&lt;br /&gt;    }&lt;br /&gt;  }&lt;br /&gt;&lt;/pre&gt;
&lt;h4&gt;The inner message class&lt;/h4&gt;
&lt;pre&gt;public static class MemRequestMessage extends AbstractPlatformMessage {}&lt;/pre&gt;
&lt;h4&gt;Override &lt;span style="font-family: courier new,courier;"&gt;handleMessage&lt;/span&gt;&lt;/h4&gt;
&lt;pre&gt;&lt;strong&gt;handleMessage()&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;/** {@inheritDoc} */&lt;br /&gt;  @Override&lt;br /&gt;  protected PlatformMessage handleMessage(MessageContext context, PlatformMessage msg) throws AttivioException {&lt;br /&gt;    if (msg instanceof MemRequestMessage) {&lt;br /&gt;      return new CgiResponse() {&lt;br /&gt;        @Override&lt;br /&gt;        public void writeResponse(HttpServletResponse resp) throws IOException {&lt;br /&gt;          // write "free,max" in bytes&lt;br /&gt;          resp.getWriter().format("%d,%d", Runtime.getRuntime().freeMemory(), Runtime.getRuntime().maxMemory());&lt;br /&gt;        }&lt;br /&gt;      };&lt;br /&gt;    }&lt;br /&gt;    return null; // not a MemRequestMessage; assumption: a null result means no response&lt;br /&gt;  }&lt;br /&gt;&lt;/pre&gt;
&lt;h4&gt;Testing&lt;/h4&gt;
&lt;p&gt;As has been pointed out &lt;a target="_blank" href="blog/56-java-development/785-testing-complex-behavior-with-junit-for-highly-parallel-platforms.html"&gt;here&lt;/a&gt;, &lt;a target="_blank" href="blog/52-agile-development/211-quality-is-job-1-to-3000.html"&gt;here&lt;/a&gt; and &lt;a target="_blank" href="blog/52-agile-development/1043-thinking-like-a-tester.html"&gt;here&lt;/a&gt;, at Attivio we really believe in testing. We support that philosophy by making it as easy (fast to write/fast to run) to test functionality as possible. To that end, we have developed supporting classes for testing &lt;span style="font-family: courier new,courier;"&gt;PlatformComponents&lt;/span&gt; (including services) without having to start a full AIE system. With the AIE SDK, we ship those supporting classes so that our customers can test easily as well.&lt;/p&gt;
&lt;h4&gt;Testing the Service&lt;/h4&gt;
&lt;pre&gt;&lt;strong&gt;@Test&lt;/strong&gt;
  public void cgiTest() throws AttivioException, IOException {
    MemoryApiService srv = new MemoryApiService();
    TransformerTestUtils.startTransformer(srv); // mock-up system stuff and start the service
    &lt;br /&gt;    CgiRequest req = new CgiRequest(null);&lt;br /&gt;    req.setCgiParameters("mem=", IOUtils.DEFAULT_ENCODING); // provide the full cgi query string we are testing, use UTF-8 encoding&lt;br /&gt;    CgiResponse respMessage = (CgiResponse) srv.onCall(null, req); // execute the request&lt;br /&gt;    MockHttpServletResponse mockServletResponse = new MockHttpServletResponse();&lt;br /&gt;    respMessage.writeResponse(mockServletResponse);
&lt;br /&gt;    String[] results = mockServletResponse.getWrittenResponse().split(",");&lt;br /&gt;    Assert.assertTrue(Long.parseLong(results[0]) &amp;lt; Long.parseLong(results[1]));&lt;br /&gt;    Assert.assertTrue(Long.parseLong(results[0]) &amp;gt; 0);&lt;br /&gt; }&lt;br /&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;span style="font-family: courier new,courier;"&gt;MockHttpServletResponse&lt;/span&gt; that is available in AIE 3.1 is a very simple class. Below is a partial implementation for your use if needed.&lt;/p&gt;
&lt;pre&gt;/**
   * A mocked version of a HttpServletResponse
   */
  public class MockHttpServletResponse implements HttpServletResponse {
    private final ByteArrayOutputStream bos = new ByteArrayOutputStream();&lt;br /&gt;    private final PrintWriter writer = new PrintWriter(bos, true);
    /**&lt;br /&gt;     * @return the response written so far&lt;br /&gt;     */&lt;br /&gt;    public String getWrittenResponse() {&lt;br /&gt;      return bos.toString();&lt;br /&gt;    }
    /** {@inheritDoc} */&lt;br /&gt;    @Override&lt;br /&gt;    public PrintWriter getWriter() throws IOException {&lt;br /&gt;      return writer;&lt;br /&gt;    }
   ////  rest of default (unchanged from Eclipse auto-generation) implementation omitted for brevity.&lt;br /&gt;&lt;/pre&gt;</summary>
		<content type="html">&lt;h4&gt;Introduction&lt;/h4&gt;
&lt;p&gt;Modularity is a hallmark of good application development practice. When attempting to rapidly implement a web-based application, basing the front end on one or more REST APIs is a common choice. A REST API is simple to use and provides for easy client-side bookmarking. Since most of our customers develop some type of custom web UI for AIE-based applications, we've made it simple to build robust, testable REST APIs.&lt;/p&gt;
&lt;p&gt;Building a REST API in AIE requires extending &lt;span style="font-family: courier new,courier;"&gt;PlatformComponent&lt;/span&gt; and overriding two methods. Testing the API involves a few lines of code and use of shipped test framework classes (the same ones we use to test AIE). In the example below, I create a REST API that reports the free memory and the maximum memory available to the hosting JVM.&lt;/p&gt;
&lt;h4&gt;A simple REST API&lt;/h4&gt;
&lt;p&gt;Creating an AIE service that provides a REST API is as simple as extending the &lt;span style="font-family: courier new,courier;"&gt;PlatformComponent&lt;/span&gt; class and overriding a couple of methods.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;Override&lt;/strong&gt; &lt;span style="font-family: courier new,courier;"&gt;convertCgiRequest&lt;/span&gt;&lt;/h4&gt;
&lt;p&gt;Whenever a service that exposes an HTTP endpoint is accessed with a GET request, the request is converted by AIE into a &lt;span style="font-family: courier new,courier;"&gt;CgiRequest&lt;/span&gt; message. &lt;span style="font-family: courier new,courier;"&gt;CgiRequest&lt;/span&gt; messages contain all of the parameters and headers of the request. The &lt;span style="font-family: courier new,courier;"&gt;convertCgiRequest&lt;/span&gt; method is a hook for the developer to translate the request into a message that the REST service can handle. This can be as simple as translating the request into a &lt;span style="font-family: courier new,courier;"&gt;StringMessage&lt;/span&gt; with the relevant data, but I prefer to use simple inner classes to make requests concrete instead of opaque (this also makes testing easier). Returning null from this method indicates the CGI request is not recognized and the service should return nothing. Alternatively, you can throw an exception that will result in an error getting returned to the caller.&lt;/p&gt;
&lt;h4&gt;Sample GET request&lt;/h4&gt;
&lt;pre&gt;http://localhost:17001/memtracker?mem=true&lt;br /&gt;&lt;/pre&gt;
&lt;h4&gt;Overridden method&lt;/h4&gt;
&lt;pre&gt;/** {@inheritDoc} */
  @Override
  protected PlatformMessage convertCgiRequest(CgiRequest cgi) throws AttivioException {
    String mem = cgi.getCgiParameter("mem");
    if (mem != null) {&lt;br /&gt;      return new MemRequestMessage();  // concrete private message class&lt;br /&gt;    } else {&lt;br /&gt;      return null; // no valid request found... alternatively could throw an exception&lt;br /&gt;    }&lt;br /&gt;  }&lt;br /&gt;&lt;/pre&gt;
&lt;h4&gt;The inner message class&lt;/h4&gt;
&lt;pre&gt;public static class MemRequestMessage extends AbstractPlatformMessage {}&lt;/pre&gt;
&lt;h4&gt;Override &lt;span style="font-family: courier new,courier;"&gt;handleMessage&lt;/span&gt;&lt;/h4&gt;
&lt;pre&gt;&lt;strong&gt;handleMessage()&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;/** {@inheritDoc} */&lt;br /&gt;  @Override&lt;br /&gt;  protected PlatformMessage handleMessage(MessageContext context, PlatformMessage msg) throws AttivioException {&lt;br /&gt;    if (msg instanceof MemRequestMessage) {&lt;br /&gt;      return new CgiResponse() {&lt;br /&gt;        @Override&lt;br /&gt;        public void writeResponse(HttpServletResponse resp) throws IOException {&lt;br /&gt;          // write "free,max" in bytes&lt;br /&gt;          resp.getWriter().format("%d,%d", Runtime.getRuntime().freeMemory(), Runtime.getRuntime().maxMemory());&lt;br /&gt;        }&lt;br /&gt;      };&lt;br /&gt;    }&lt;br /&gt;    return null; // not a MemRequestMessage; assumption: a null result means no response&lt;br /&gt;  }&lt;br /&gt;&lt;/pre&gt;
&lt;h4&gt;Testing&lt;/h4&gt;
&lt;p&gt;As has been pointed out &lt;a target="_blank" href="blog/56-java-development/785-testing-complex-behavior-with-junit-for-highly-parallel-platforms.html"&gt;here&lt;/a&gt;, &lt;a target="_blank" href="blog/52-agile-development/211-quality-is-job-1-to-3000.html"&gt;here&lt;/a&gt; and &lt;a target="_blank" href="blog/52-agile-development/1043-thinking-like-a-tester.html"&gt;here&lt;/a&gt;, at Attivio we really believe in testing. We support that philosophy by making it as easy (fast to write/fast to run) to test functionality as possible. To that end, we have developed supporting classes for testing &lt;span style="font-family: courier new,courier;"&gt;PlatformComponents&lt;/span&gt; (including services) without having to start a full AIE system. With the AIE SDK, we ship those supporting classes so that our customers can test easily as well.&lt;/p&gt;
&lt;h4&gt;Testing the Service&lt;/h4&gt;
&lt;pre&gt;&lt;strong&gt;@Test&lt;/strong&gt;
  public void cgiTest() throws AttivioException, IOException {
    MemoryApiService srv = new MemoryApiService();
    TransformerTestUtils.startTransformer(srv); // mock-up system stuff and start the service
    &lt;br /&gt;    CgiRequest req = new CgiRequest(null);&lt;br /&gt;    req.setCgiParameters("mem=", IOUtils.DEFAULT_ENCODING); // provide the full cgi query string we are testing, use UTF-8 encoding&lt;br /&gt;    CgiResponse respMessage = (CgiResponse) srv.onCall(null, req); // execute the request&lt;br /&gt;    MockHttpServletResponse mockServletResponse = new MockHttpServletResponse();&lt;br /&gt;    respMessage.writeResponse(mockServletResponse);
&lt;br /&gt;    String[] results = mockServletResponse.getWrittenResponse().split(",");&lt;br /&gt;    Assert.assertTrue(Long.parseLong(results[0]) &amp;lt; Long.parseLong(results[1]));&lt;br /&gt;    Assert.assertTrue(Long.parseLong(results[0]) &amp;gt; 0);&lt;br /&gt; }&lt;br /&gt;&lt;/pre&gt;
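&lt;p&gt;The capture trick behind the mock can be tried with nothing but the JDK. The following is a rough sketch, not AIE code: the class MemoryResponseSketch and its methods are made up for illustration, and only PrintWriter, ByteArrayOutputStream and Runtime are real JDK classes. It writes the same "free,max" pair as the service above and reads it back from the in-memory buffer.&lt;/p&gt;

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintWriter;

/** Hypothetical, JDK-only illustration of capturing written output. */
public class MemoryResponseSketch {

  /** Writes "free,max" (in bytes), mirroring the format used by the service. */
  static void writeMemory(PrintWriter out) {
    Runtime rt = Runtime.getRuntime();
    out.format("%d,%d", rt.freeMemory(), rt.maxMemory());
  }

  /** Runs writeMemory against an in-memory buffer and returns what was written. */
  static String capture() {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    PrintWriter writer = new PrintWriter(bos, true); // autoflush applies to format()
    writeMemory(writer);
    writer.flush(); // explicit flush, so the result does not depend on autoflush
    return bos.toString();
  }

  public static void main(String[] args) {
    String[] parts = capture().split(",");
    System.out.println("free bytes: " + Long.parseLong(parts[0]));
    System.out.println("max bytes:  " + Long.parseLong(parts[1]));
  }
}
```

&lt;p&gt;Because the writer wraps a byte buffer instead of a socket, a test can assert on the exact text without starting a servlet container.&lt;/p&gt;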
&lt;p&gt;The &lt;span style="font-family: courier new,courier;"&gt;MockHttpServletResponse&lt;/span&gt; that is available in AIE 3.1 is a very simple class. Below is a partial implementation for your use if needed.&lt;/p&gt;
&lt;pre&gt;/**
   * A mocked version of a HttpServletResponse
   */
  public class MockHttpServletResponse implements HttpServletResponse {
    private final ByteArrayOutputStream bos = new ByteArrayOutputStream();&lt;br /&gt;    private final PrintWriter writer = new PrintWriter(bos, true);
    /**&lt;br /&gt;     * @return the response written so far&lt;br /&gt;     */&lt;br /&gt;    public String getWrittenResponse() {&lt;br /&gt;      return bos.toString();&lt;br /&gt;    }
    /** {@inheritDoc} */&lt;br /&gt;    @Override&lt;br /&gt;    public PrintWriter getWriter() throws IOException {&lt;br /&gt;      return writer;&lt;br /&gt;    }
   ////  rest of default (unchanged from Eclipse auto-generation) implementation omitted for brevity.&lt;br /&gt;&lt;/pre&gt;</content>
	</entry>
	<entry>
		<title>“Something is not Right!” – Don’t Ignore Your Gut When Analyzing Information</title>
		<link rel="alternate" type="text/html" href="http://www.attivio.com/blog/55-industry-insights/1109-something-is-not-right-dont-ignore-your-gut-when-analyzing-information.html" />
		<published>2012-03-27T12:32:51Z</published>
		<updated>2012-03-27T12:32:51Z</updated>
		<id>http://www.attivio.com/blog/55-industry-insights/1109-something-is-not-right-dont-ignore-your-gut-when-analyzing-information.html</id>
		<author>
			<name>Mike Urbonas, Director of Product Marketing</name>
		<email>help@attivio.com</email>
		</author>
		<summary type="html">&lt;p&gt;&lt;img style="float: right; margin: 5px;" title="Madeline_Cover.png" alt="Madeline_Cover.png" src="images/at_images/general/blog/Madeline_Cover.png" height="135" width="100" /&gt;Ludwig Bemelmans' classic picture book &lt;a target="_blank" href="http://en.wikipedia.org/wiki/Madeline"&gt;Madeline&lt;/a&gt; has been enjoyed by generations of children — my own daughters included — who are perennially attracted to the story of Madeline, the smallest yet most adventurous of twelve little girls in a Paris boarding school.&lt;/p&gt;
&lt;p&gt;The story also has an important lesson on analyzing — and questioning — information.&lt;/p&gt;
&lt;p&gt;In Madeline, Miss Clavel, the girls' teacher and caregiver, suddenly awoke one night sensing trouble:&lt;/p&gt;
&lt;p style="padding-left: 30px;"&gt;&lt;em&gt;In the middle of the night&lt;br /&gt;Miss Clavel turned on her light&lt;br /&gt;and said, "Something is not right!"&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Sure enough, she found Madeline in her bed, in pain from appendicitis. Of course, all turns out well, thanks to Miss Clavel listening to her personal sense that something was not right.&lt;/p&gt;
&lt;p&gt;That scene from Madeline came to mind recently while I was reading &lt;em&gt;Know What You Don't Know: How Great Leaders Prevent Problems Before They Happen&lt;/em&gt; (2009) by best-selling author and business professor &lt;a target="_blank" href="https://plus.google.com/100945254816047625845/about"&gt;Michael Roberto&lt;/a&gt;. One of the most troubling causes of unseen problems mushrooming into catastrophes, Roberto writes, is an organizational culture that dismisses intuition in favor of hard data:&lt;/p&gt;
&lt;p style="padding-left: 30px; padding-right: 30px;"&gt;&lt;em&gt;Some organizations exhibit an intensely analytical culture. They apply quantitative analysis and structured frameworks to solve problems and make decisions. "Data rule the day; without a wealth of statistics and information, one does not persuade others to adopt his or her proposals".&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;img style="float: right; margin: 5px;" title="Know_What_You_Dont_Know_Cover.png" alt="Know_What_You_Dont_Know_Cover.png" src="images/at_images/general/blog/Know_What_You_Dont_Know_Cover.png" height="147" width="100" /&gt;Roberto convincingly drives this point home with a serious real-world medical problem: many hospitals experience high levels of cardiac arrest among admitted patients. According to one study, hospital personnel who did observe some advance warning sign(s) of cardiac arrest alerted a doctor only 25% of the time. Why? Nurses and other staff members might have noticed a change in patient monitoring data, but when observed in isolation, the data did not clearly indicate an urgent problem. Other warning signs are not based on quantifiable data, such as the patient's mental condition and level of fatigue or discomfort.&lt;/p&gt;
&lt;p&gt;Often nurses and other staff felt a Miss Clavel-like sense that "Something is not right" with a patient who was indeed approaching cardiac arrest; but, with nominal hard data, if any, to support their concern, they did not feel comfortable alerting a doctor. The consequences of hospital cultures that unwittingly compel caregivers to ignore their intuition are high: once the window of opportunity to avert cardiac arrest closes, a "Code Blue" crisis is at hand — with a survival rate of less than 15%.&lt;/p&gt;
&lt;p&gt;Many hospitals nationwide have since implemented a highly successful program to sharply reduce Code Blue incidents.  Nurses and staff are actively encouraged to report observed warning signs, as well as concerns not yet supported by observed data, to a new Rapid Response Team. The team will arrive at an affected patient's bedside within minutes and actively diagnose whether further testing or treatment to prevent a cardiac arrest is warranted. Unlike a Code Blue team that "fights the fire" of a full-on heart attack, Roberto writes, a Rapid Response Team "detects the smoke" of a potential heart attack.&lt;/p&gt;
&lt;p&gt;Traditional data warehousing and data analytics vendors often present their solutions as a way to make decisions "based on objective facts" rather than relying on "emotional gut feel." The problem, however, is that the known "objective facts" may not provide a complete, or even accurate, picture of what's really going on.&lt;/p&gt;
&lt;p&gt;As Roberto's hospital case study illustrates, a gnawing sense that "Something is not right" should be interpreted as an alert that you probably do not have "all the facts," but rather just some facts. That is, you don't know what you don't know. On this key point, Roberto writes:&lt;/p&gt;
&lt;p style="padding-left: 30px; padding-right: 30px;"&gt;&lt;em&gt;In highly analytical cultures, my research suggests that employees also may self-censor their concerns...In one case, a manager told me, "I was trained to rely on data [which] pointed in the opposite direction of my [correct] hunch that we had a problem. I relied on the data and ignored that nagging feeling in my gut."&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;So, listen to your gut, your intuition, as a signal that you need to dig deeper into the matter at hand. Actively seek out further information beyond the hard data available to you. Compare that information with your hard data and "connect the dots" for a far more complete picture, which may well yield surprising new insights.&lt;/p&gt;
&lt;p&gt;What I find very exciting is that &lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=1037:insight-that-matters&amp;amp;catid=96:showcase&amp;amp;Itemid=247"&gt;unified information access (UIA)&lt;/a&gt; is playing a vital role in empowering managers and leaders to connect those dots between data and other silos of information to realize those critical new insights.&lt;/p&gt;
&lt;p&gt;UIA integrates, joins and presents all related information, both &lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=1041:real-time-real-savings&amp;amp;catid=96:showcase&amp;amp;Itemid=251"&gt;structured data and unstructured content&lt;/a&gt;, to complete the informational picture and significantly expand what organizations "know," so they can determine with confidence whether "Something is not right."&lt;/p&gt;
&lt;p&gt;Analyzing just a single data type or source of information would not only fail to detect those new insights, but — even worse —  such a limited, incomplete analysis may well point leaders in the wrong direction.&lt;/p&gt;
&lt;p&gt;Stay tuned here on the Attivio blog for more insights on how UIA can help you get that complete informational picture, and, as Roberto writes, how to go beyond being a "problem solver" to becoming a "problem finder".&lt;/p&gt;</summary>
		<content type="html">&lt;p&gt;&lt;img style="float: right; margin: 5px;" title="Madeline_Cover.png" alt="Madeline_Cover.png" src="images/at_images/general/blog/Madeline_Cover.png" height="135" width="100" /&gt;Ludwig Bemelmans' classic picture book &lt;a target="_blank" href="http://en.wikipedia.org/wiki/Madeline"&gt;Madeline&lt;/a&gt; has been enjoyed by generations of children — my own daughters included — who are perennially attracted to the story of Madeline, the smallest yet most adventurous of twelve little girls in a Paris boarding school.&lt;/p&gt;
&lt;p&gt;The story also has an important lesson on analyzing — and questioning — information.&lt;/p&gt;
&lt;p&gt;In Madeline, Miss Clavel, the girls' teacher and caregiver, suddenly awoke one night sensing trouble:&lt;/p&gt;
&lt;p style="padding-left: 30px;"&gt;&lt;em&gt;In the middle of the night&lt;br /&gt;Miss Clavel turned on her light&lt;br /&gt;and said, "Something is not right!"&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Sure enough, she found Madeline in her bed, in pain from appendicitis. Of course, all turns out well, thanks to Miss Clavel listening to her personal sense that something was not right.&lt;/p&gt;
&lt;p&gt;That scene from Madeline came to mind recently while I was reading &lt;em&gt;Know What You Don't Know: How Great Leaders Prevent Problems Before They Happen&lt;/em&gt; (2009) by best-selling author and business professor &lt;a target="_blank" href="https://plus.google.com/100945254816047625845/about"&gt;Michael Roberto&lt;/a&gt;. One of the most troubling causes of unseen problems mushrooming into catastrophes, Roberto writes, is an organizational culture that dismisses intuition in favor of hard data:&lt;/p&gt;
&lt;p style="padding-left: 30px; padding-right: 30px;"&gt;&lt;em&gt;Some organizations exhibit an intensely analytical culture. They apply quantitative analysis and structured frameworks to solve problems and make decisions. "Data rule the day; without a wealth of statistics and information, one does not persuade others to adopt his or her proposals".&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;img style="float: right; margin: 5px;" title="Know_What_You_Dont_Know_Cover.png" alt="Know_What_You_Dont_Know_Cover.png" src="images/at_images/general/blog/Know_What_You_Dont_Know_Cover.png" height="147" width="100" /&gt;Roberto convincingly drives this point home with a serious real-world medical problem: many hospitals experience high levels of cardiac arrest among admitted patients. According to one study, hospital personnel who did observe some advance warning sign(s) of cardiac arrest alerted a doctor only 25% of the time. Why? Nurses and other staff members might have noticed a change in patient monitoring data, but when observed in isolation, the data did not clearly indicate an urgent problem. Other warning signs are not based on quantifiable data, such as the patient's mental condition and level of fatigue or discomfort.&lt;/p&gt;
&lt;p&gt;Often nurses and other staff felt a Miss Clavel-like sense that "Something is not right" with a patient who was indeed approaching cardiac arrest; but, with nominal hard data, if any, to support their concern, they did not feel comfortable alerting a doctor. The consequences of hospital cultures that unwittingly compel caregivers to ignore their intuition are high: once the window of opportunity to avert cardiac arrest closes, a "Code Blue" crisis is at hand — with a survival rate of less than 15%.&lt;/p&gt;
&lt;p&gt;Many hospitals nationwide have since implemented a highly successful program to sharply reduce Code Blue incidents.  Nurses and staff are actively encouraged to report observed warning signs, as well as concerns not yet supported by observed data, to a new Rapid Response Team. The team will arrive at an affected patient's bedside within minutes and actively diagnose whether further testing or treatment to prevent a cardiac arrest is warranted. Unlike a Code Blue team that "fights the fire" of a full-on heart attack, Roberto writes, a Rapid Response Team "detects the smoke" of a potential heart attack.&lt;/p&gt;
&lt;p&gt;Traditional data warehousing and data analytics vendors often present their solutions as a way to make decisions "based on objective facts" rather than relying on "emotional gut feel." The problem, however, is that the known "objective facts" may not provide a complete, or even accurate, picture of what's really going on.&lt;/p&gt;
&lt;p&gt;As Roberto's hospital case study illustrates, a gnawing sense that "Something is not right" should be interpreted as an alert that you probably do not have "all the facts," but rather just some facts. That is, you don't know what you don't know. On this key point, Roberto writes:&lt;/p&gt;
&lt;p style="padding-left: 30px; padding-right: 30px;"&gt;&lt;em&gt;In highly analytical cultures, my research suggests that employees also may self-censor their concerns...In one case, a manager told me, "I was trained to rely on data [which] pointed in the opposite direction of my [correct] hunch that we had a problem. I relied on the data and ignored that nagging feeling in my gut."&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;So, listen to your gut, your intuition, as a signal that you need to dig deeper into the matter at hand. Actively seek out further information beyond the hard data available to you. Compare that information with your hard data and "connect the dots" for a far more complete picture, which may well yield surprising new insights.&lt;/p&gt;
&lt;p&gt;What I find very exciting is that &lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=1037:insight-that-matters&amp;amp;catid=96:showcase&amp;amp;Itemid=247"&gt;unified information access (UIA)&lt;/a&gt; is playing a vital role in empowering managers and leaders to connect those dots between data and other silos of information to realize those critical new insights.&lt;/p&gt;
&lt;p&gt;UIA integrates, joins and presents all related information, both &lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=1041:real-time-real-savings&amp;amp;catid=96:showcase&amp;amp;Itemid=251"&gt;structured data and unstructured content&lt;/a&gt;, to complete the informational picture and significantly expand what organizations "know," so they can determine with confidence whether "Something is not right."&lt;/p&gt;
&lt;p&gt;Analyzing just a single data type or source of information would not only fail to detect those new insights, but — even worse —  such a limited, incomplete analysis may well point leaders in the wrong direction.&lt;/p&gt;
&lt;p&gt;Stay tuned here on the Attivio blog for more insights on how UIA can help you get that complete informational picture, and, as Roberto writes, how to go beyond being a "problem solver" to becoming a "problem finder".&lt;/p&gt;</content>
	</entry>
	<entry>
		<title>Apple almost bigger than all retailers combined</title>
		<link rel="alternate" type="text/html" href="http://www.attivio.com/blog/55-industry-insights/1105-apple-almost-bigger-than-all-retailers-combined.html" />
		<published>2012-03-19T17:18:21Z</published>
		<updated>2012-03-19T17:18:21Z</updated>
		<id>http://www.attivio.com/blog/55-industry-insights/1105-apple-almost-bigger-than-all-retailers-combined.html</id>
		<author>
			<name>Sid Probstein</name>
		<email>help@attivio.com</email>
		</author>
		<summary type="html">&lt;p&gt;&lt;a target="_blank" href="http://www.cbsnews.com/8301-505124_162-57396966/apple-is-almost-bigger-than-all-retailers-combined/"&gt;As reported by CBS News MoneyWatch recently&lt;/a&gt;, Apple's market cap now hovers over $560 billion. The rest of the entire US retail industry is valued just a bit higher, as the chart shows.&lt;/p&gt;
&lt;p style="text-align: center;"&gt;&lt;img style="margin: 5px;" title="Credit: Bloomberg" alt="Apple-retail-semiconductors.png" src="images/at_images/general/blog/Apple-retail-semiconductors.png" width="600" height="330" /&gt;&lt;/p&gt;
&lt;p style="text-align: center;"&gt;(Credit: Bloomberg)&lt;/p&gt;
&lt;p&gt;One thing the article makes quite clear is that Apple's platform really is just that: products that are platforms, integrating numerous experiences into a unique experience, be that digital (iOS, iTunes) or physical (iPad, iPhone, Macs, Apple Stores). Look at the floor space of the Apple store. You know they all look the same: white and wood, glass stairs, colorfully shirted employees with iPod touches in hand. That is convergence: putting the entire thing together in one place, with consistency.&lt;/p&gt;
&lt;p&gt;Convergence is the ultimate competitive advantage.&lt;/p&gt;
&lt;p&gt;One way companies can pursue convergence is to start thinking of their information as a strategic asset. Converge it, and you will reap a multitude of benefits.  More about convergence...&lt;/p&gt;
&lt;p&gt;&lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=988-convergence-a-winning-business-strategy-now-ready-for-enterprise-information&amp;amp;catid=55-industry-insights&amp;amp;Itemid=245"&gt;Convergence: A Winning Business Strategy, Now Ready for Enterprise Information&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=989-convergence-a-winning-business-strategy-now-ready-for-enterprise-information-part-two&amp;amp;catid=55-industry-insights&amp;amp;Itemid=245"&gt;Convergence: A Winning Business Strategy, Now Ready for Enterprise Information - Part Two&lt;/a&gt;&lt;/p&gt;</summary>
		<content type="html">&lt;p&gt;&lt;a target="_blank" href="http://www.cbsnews.com/8301-505124_162-57396966/apple-is-almost-bigger-than-all-retailers-combined/"&gt;As reported by CBS News MoneyWatch recently&lt;/a&gt;, Apple's market cap now hovers over $560 billion. The rest of the entire US retail industry is valued just a bit higher, as the chart shows.&lt;/p&gt;
&lt;p style="text-align: center;"&gt;&lt;img style="margin: 5px;" title="Credit: Bloomberg" alt="Apple-retail-semiconductors.png" src="images/at_images/general/blog/Apple-retail-semiconductors.png" width="600" height="330" /&gt;&lt;/p&gt;
&lt;p style="text-align: center;"&gt;(Credit: Bloomberg)&lt;/p&gt;
&lt;p&gt;One thing the article makes quite clear is that Apple's platform really is just that: products that are platforms, integrating numerous experiences into a unique experience, be that digital (iOS, iTunes) or physical (iPad, iPhone, Macs, Apple Stores). Look at the floor space of the Apple store. You know they all look the same: white and wood, glass stairs, colorfully shirted employees with iPod touches in hand. That is convergence: putting the entire thing together in one place, with consistency.&lt;/p&gt;
&lt;p&gt;Convergence is the ultimate competitive advantage.&lt;/p&gt;
&lt;p&gt;One way companies can pursue convergence is to start thinking of their information as a strategic asset. Converge it, and you will reap a multitude of benefits.  More about convergence...&lt;/p&gt;
&lt;p&gt;&lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=988-convergence-a-winning-business-strategy-now-ready-for-enterprise-information&amp;amp;catid=55-industry-insights&amp;amp;Itemid=245"&gt;Convergence: A Winning Business Strategy, Now Ready for Enterprise Information&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=989-convergence-a-winning-business-strategy-now-ready-for-enterprise-information-part-two&amp;amp;catid=55-industry-insights&amp;amp;Itemid=245"&gt;Convergence: A Winning Business Strategy, Now Ready for Enterprise Information - Part Two&lt;/a&gt;&lt;/p&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/AttivioBlog?a=HlyGZpA-pJE:mlcgRqODiRk:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/AttivioBlog?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/AttivioBlog?a=HlyGZpA-pJE:mlcgRqODiRk:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/AttivioBlog?i=HlyGZpA-pJE:mlcgRqODiRk:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/AttivioBlog?a=HlyGZpA-pJE:mlcgRqODiRk:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/AttivioBlog?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/AttivioBlog?a=HlyGZpA-pJE:mlcgRqODiRk:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/AttivioBlog?i=HlyGZpA-pJE:mlcgRqODiRk:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/AttivioBlog?a=HlyGZpA-pJE:mlcgRqODiRk:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/AttivioBlog?i=HlyGZpA-pJE:mlcgRqODiRk:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/AttivioBlog?a=HlyGZpA-pJE:mlcgRqODiRk:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/AttivioBlog?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/AttivioBlog/~4/HlyGZpA-pJE" height="1" width="1"/&gt;</content>
	</entry>
	<entry>
		<title>Active Security in AIE</title>
		<link rel="alternate" type="text/html" href="http://www.attivio.com/blog/53-attivio/1093-active-security-in-aie.html" />
		<published>2012-02-13T19:37:57Z</published>
		<updated>2012-02-13T19:37:57Z</updated>
		<id>http://www.attivio.com/blog/53-attivio/1093-active-security-in-aie.html</id>
		<author>
			<name>Will Johnson</name>
		<email>help@attivio.com</email>
		</author>
		<summary type="html">&lt;p&gt;&lt;img style="float: right; margin: 5px;" alt="attivio security" src="images/at_images/general/blog/security.jpg" height="250" width="250" /&gt;Information retrieval systems have classically been interested in simply retrieving information based on a user's query. Typically these applications were going after data that was either public, such as job listings, or semi-private, such as research papers that were viewable by anyone with an account. Most enterprise data does not fit into either of these categories, though.&lt;/p&gt;
&lt;p&gt;Content like last year's performance reviews, employee salary information or M&amp;amp;A research is usually accessible only to select groups of people within an organization. Any system that provides access to this information needs to respect those same permission structures in order to be viable inside a company.&lt;/p&gt;
&lt;p&gt;There were two common patterns that legacy search applications used in order to deal with these issues:&lt;/p&gt;
&lt;h2&gt;Late-bound security&lt;/h2&gt;
&lt;p&gt;In a late-bound security model, each search result is checked at run time to determine if the user is allowed to view the information. This has the nice effect that any permission changes are propagated immediately to the search experience, but it also has some significant drawbacks. First off, the back end has to be able to support a high volume of requests for "can user X see document Y?" If you have a highly secure system, producing a single page of 10 results might mean checking hundreds or thousands of candidates to find the ten items the user is allowed to see.&lt;/p&gt;
&lt;p&gt;Second, it's impossible to give users accurate summarization information such as facet counts or even total number of search results. Doing either of these would require scanning the full corpus, which might work for a few thousand documents, but would never scale to a few million documents.&lt;/p&gt;
&lt;p&gt;At Attivio, we like to call this the easy way out. As long as you can write a &lt;em&gt;for loop&lt;/em&gt; and build a web service, REST API or intelligent pigeon to answer yes or no to "can user X see document Y?", you have "security". The code looks like this:&lt;/p&gt;
&lt;pre&gt;// Late-bound filtering: check each hit against the security service until a page is full&lt;br /&gt;int resultsToShow = 10;&lt;br /&gt;List&amp;lt;Result&amp;gt; resultsToDisplay = new ArrayList&amp;lt;Result&amp;gt;();&lt;br /&gt;for (Result result : mySearchResults) {&lt;br /&gt;  if (securityService.canRead(result, searchUser)) {&lt;br /&gt;    resultsToDisplay.add(result);&lt;br /&gt;    if (resultsToDisplay.size() &amp;gt;= resultsToShow) {&lt;br /&gt;      break;&lt;br /&gt;    }&lt;br /&gt;  }&lt;br /&gt;}&lt;br /&gt;// print out the results&lt;br /&gt;&lt;/pre&gt;
&lt;h2&gt;Early-bound security&lt;/h2&gt;
&lt;p&gt;In early-bound security, user, group and permission information is stored along with the documents in the index. Essentially, you index the list of users and groups (and parent groups) who are allowed to see each document in a field of that document. Then at query time, you append a user's id and their groups (and parent groups) to the query in an &lt;em&gt;AND clause&lt;/em&gt;. This has the nice benefit of allowing for fast queries and exact counts for faceting and other aggregations. The downside is that updating security information can be a huge pain. If a user joins a group, it's usually not a big deal, but if a group joins another group due to a company reorg, or a new parent group is created, you end up having to update large portions of the index.&lt;/p&gt;
&lt;p&gt;Doing this scheme well is complicated, but it's been the prevailing pattern for implementing security for most search applications for the last 10 years or so.&lt;/p&gt;
&lt;p&gt;Using this design a document might look like this:&lt;/p&gt;
&lt;div style="padding-left: 25px;"&gt;
&lt;table cellpadding="3"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="width: 75px;"&gt;&lt;strong&gt;field&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;value&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;url&lt;/td&gt;
&lt;td&gt;http://www.acme.com/someSecretProject/verySecretStuff&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;blah blah blah&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;users&lt;/td&gt;
&lt;td&gt;joe, mik&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;groups&lt;/td&gt;
&lt;td&gt;headHonchos, superVillians&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;br /&gt;
&lt;p&gt;If a user named 'Tim' who is a member of the 'engineering' and 'qa' groups logged into the system and ran a query for 'secret', his query would be rewritten as follows:&lt;/p&gt;
&lt;pre&gt;AND(&lt;br /&gt;  "secret",&lt;br /&gt;  OR(&lt;br /&gt;    users:tim,&lt;br /&gt;    groups:engineering,&lt;br /&gt;    groups:qa&lt;br /&gt;  )&lt;br /&gt;)&lt;br /&gt;&lt;/pre&gt;
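&lt;p&gt;A minimal Java sketch of that rewrite step follows. The class and method names (EarlyBoundQueryRewriter, rewrite) are hypothetical and are not part of any AIE API, and the output only mimics the query notation used in this post. The sketch assumes a document is visible when the user matches either its users field or one of its groups fields, so both kinds of clause go into a single OR.&lt;/p&gt;
&lt;pre&gt;public class EarlyBoundQueryRewriter {

    // Hypothetical helper: wraps the user's query in an AND with an access filter.
    // The groups array is assumed to already include the user's parent groups.
    static String rewrite(String userQuery, String userId, String[] groups) {
        StringBuilder acl = new StringBuilder("users:" + userId);
        for (String group : groups) {
            acl.append(", groups:").append(group);
        }
        return "AND(\"" + userQuery + "\", OR(" + acl + "))";
    }

    public static void main(String[] args) {
        String[] groups = {"engineering", "qa"};
        // prints AND("secret", OR(users:tim, groups:engineering, groups:qa))
        System.out.println(rewrite("secret", "tim", groups));
    }
}
&lt;/pre&gt;
&lt;p&gt;The rewrite itself is cheap. The cost of the early-bound approach sits on the index side, where the same expanded user and group lists have to be kept current on every document.&lt;/p&gt;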
&lt;h2&gt;Attivio's Answer — Active Security&lt;/h2&gt;
&lt;p&gt;The problem with both of the previous methods is that they miss the key design of the source systems, namely that the security information is stored separately from the actual content and is linked by some sort of pointer. In a database that relationship might be modeled as a foreign key from the content table to the ACL table. In a file system, it's a pointer from the file's content to a record in the NTFS/ext3/xyz file system's database for permissioning information.&lt;/p&gt;
&lt;p&gt;The active security model maintains this information in a relational structure and then uses our patented &lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=665:aie-join&amp;amp;catid=76:aie-features&amp;amp;Itemid=156"&gt;JOIN&lt;/a&gt; operator to link the two records at query time. We are able to index users and groups as native objects in our index, and traverse up and down the hierarchy in order to determine the full set of permissions for a given document.&lt;/p&gt;
&lt;p&gt;All that's required at search time is for the application to pass in a user's id. This scheme also allows us to perform rapid updates of the security information. If one group joins another group, all we need to do is update the one parent group's record.&lt;/p&gt;
&lt;p&gt;In this world we have the following objects in our index:&lt;/p&gt;
&lt;div style="padding-left: 25px;"&gt;
&lt;p&gt;&lt;span style="text-decoration: underline;"&gt;&lt;strong&gt;A Document&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table border="1" cellpadding="3"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="width: 75px;"&gt;&lt;strong&gt;field&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;value&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;id&lt;/td&gt;
&lt;td&gt;doc1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;url&lt;/td&gt;
&lt;td&gt;http://www.acme.com/someSecretProject/verySecretStuff&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;blah blah blah&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;br /&gt;
&lt;p&gt;&lt;span style="text-decoration: underline;"&gt;&lt;strong&gt;A User&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table border="1" cellpadding="3"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="width: 75px;"&gt;&lt;strong&gt;field&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;value&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;userid&lt;/td&gt;
&lt;td&gt;user1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;username&lt;/td&gt;
&lt;td&gt;jsmith&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;realm&lt;/td&gt;
&lt;td&gt;acme&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;br /&gt;
&lt;p&gt;&lt;span style="text-decoration: underline;"&gt;&lt;strong&gt;A Group&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table border="1" cellpadding="3"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="width: 75px;"&gt;&lt;strong&gt;field&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;value&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;groupid&lt;/td&gt;
&lt;td&gt;group1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;groupname&lt;/td&gt;
&lt;td&gt;engineering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;members&lt;/td&gt;
&lt;td&gt;qa, patty&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;realm&lt;/td&gt;
&lt;td&gt;acme&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;br /&gt;
&lt;p&gt;&lt;span style="text-decoration: underline;"&gt;&lt;strong&gt;An ACL&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table border="1" cellpadding="3"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="width: 75px;"&gt;&lt;strong&gt;field&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;value&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;docid&lt;/td&gt;
&lt;td&gt;doc1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;users&lt;/td&gt;
&lt;td&gt;user57, user88&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;groups&lt;/td&gt;
&lt;td&gt;group1, group77&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;br /&gt;
&lt;p&gt;The user's query simply needs to have the principal set on it:&lt;/p&gt;
&lt;pre&gt;QueryRequest req = new QueryRequest("secret");&lt;br /&gt;req.setPrincipal(new AttivioPrincipal("acme", "jsmith"));&lt;br /&gt;&lt;/pre&gt;
&lt;br /&gt;
&lt;p&gt;This scheme allows us to independently update each individual record. For example, when a group's members change, you update the group record. When permissions on a document change, you update the ACL. We're able to make these smaller updates much faster than a full document update in an early-bound security model.&lt;/p&gt;
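&lt;p&gt;To make the record-level updates concrete, here is a small self-contained Java sketch. It does not use AIE or its JOIN operator: the Group and Acl records and the isMember and canRead methods are hypothetical in-memory stand-ins for the group and ACL objects listed above, and the group hierarchy is assumed to contain no cycles.&lt;/p&gt;
&lt;pre&gt;public class ActiveSecuritySketch {

    // Hypothetical stand-ins for the index records (Java 16+ record syntax).
    record Group(String id, String[] members) {}
    record Acl(String docId, String[] users, String[] groups) {}

    // True if principal is a direct or nested member of groupId.
    // Assumes the group hierarchy has no cycles.
    static boolean isMember(String principal, String groupId, Group[] groups) {
        for (Group g : groups) {
            if (!g.id().equals(groupId)) {
                continue;
            }
            for (String member : g.members()) {
                if (member.equals(principal) || isMember(principal, member, groups)) {
                    return true;
                }
            }
        }
        return false;
    }

    // A user may read a document if the ACL names the user directly
    // or names a group the user belongs to, at any depth.
    static boolean canRead(String user, Acl acl, Group[] groups) {
        for (String u : acl.users()) {
            if (u.equals(user)) {
                return true;
            }
        }
        for (String g : acl.groups()) {
            if (isMember(user, g, groups)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Group[] groups = {
            new Group("engineering", new String[] {"qa", "patty"}),
            new Group("qa", new String[] {"tim"})
        };
        Acl acl = new Acl("doc1", new String[] {"joe"}, new String[] {"engineering"});
        System.out.println(canRead("tim", acl, groups));     // prints true (tim is in qa, nested in engineering)
        System.out.println(canRead("mallory", acl, groups)); // prints false
    }
}
&lt;/pre&gt;
&lt;p&gt;In this toy model, nesting qa inside engineering changes only the engineering record. Neither doc1 nor its ACL is touched, which is the property the paragraph above describes.&lt;/p&gt;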
&lt;p&gt;In addition, this method lets us set up simple connectors to fetch each of the record types from Active Directory, LDAP, or a custom database schema, without having to merge all of that information in with the documents, which also speeds up ingestion time.&lt;/p&gt;
&lt;p&gt;Learn more about Active Security and AIE JOIN:&lt;/p&gt;
&lt;p&gt;&lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=845-introducing-active-security&amp;amp;catid=53-attivio&amp;amp;Itemid=52"&gt;Introducing Active Security&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=1019-patent-issued-to-attivio-for-unique-method-of-unifying-multiple-content-and-data-sources&amp;amp;catid=33-newsroom&amp;amp;Itemid=38"&gt;Patent Issued to Attivio for Unique Method of Unifying  Multiple Content and Data Sources&lt;/a&gt;&lt;/p&gt;</summary>
		<content type="html">&lt;p&gt;&lt;img style="float: right; margin: 5px;" alt="attivio security" src="images/at_images/general/blog/security.jpg" height="250" width="250" /&gt;Information retrieval systems have classically been interested in simply retrieving information based on a user's query. Typically these applications were going after data that was either public, such as job listings, or semi-private, such as research papers that were viewable by anyone with an account. Most enterprise data does not fit into either of these categories, though.&lt;/p&gt;
&lt;p&gt;Content like last year's performance reviews, employee salary information or M&amp;amp;A research is usually only accessible by a select group(s) of people within an organization. Any system that provides access to this information needs to respect those same permission structures in order to be viable inside a company.&lt;/p&gt;
&lt;p&gt;There were two common patterns that legacy search applications used in order to deal with these issues:&lt;/p&gt;
&lt;h2&gt;Late-bound security&lt;/h2&gt;
&lt;p&gt;In a late-bound security model, each search result is checked at run time to determine if the user is allowed to view the information. This has the nice effect that any changes are propagated immediately to the search experience, but it also has some significant drawbacks. First off, the back end has to be able to support a high volume of requests for ‘can user X see document Y?' If you have a highly secure system, a single page of 10 results might need to filter through hundreds or thousands of results in order to return the ten specific items the user is allowed to see.&lt;/p&gt;
&lt;p&gt;Second, it's impossible to give users accurate summarization information such as facet counts or even total number of search results. Doing either of these would require scanning the full corpus, which might work for a few thousand documents, but would never scale to a few million documents.&lt;/p&gt;
&lt;p&gt;At Attivio, we like to call this the easy way out. As long as you can write a &lt;em&gt;for loop&lt;/em&gt; and build a web service, REST API or intelligent pigeon to answer yes or no to "can user X see document Y?", you have 'security'. The code looks like this:&lt;/p&gt;
&lt;pre&gt;int resultsToShow = 10;&lt;br /&gt;List&amp;lt;Result&amp;gt; resultsToDisplay = new ArrayList&amp;lt;Result&amp;gt;();&lt;br /&gt;for (Result result : mySearchResults) {&lt;br /&gt;  // one call to the security service per candidate result&lt;br /&gt;  if (securityService.canRead(result, searchUser)) {&lt;br /&gt;    resultsToDisplay.add(result);&lt;br /&gt;    if (resultsToDisplay.size() &amp;gt;= resultsToShow) {&lt;br /&gt;      break;&lt;br /&gt;    }&lt;br /&gt;  }&lt;br /&gt;}&lt;br /&gt;// print out the results&lt;br /&gt;&lt;/pre&gt;
&lt;h2&gt;Early-bound security&lt;/h2&gt;
&lt;p&gt;In early-bound security, user, group and permission information is stored along with the documents in the index. Essentially, you index the list of users and groups (and parent groups) who are allowed to see each document in a field of that document. Then at query time, you append a user's id and their groups (and parent groups) to the query in an &lt;em&gt;AND clause&lt;/em&gt;. This has the nice benefit of allowing for fast queries and exact counts for faceting and other aggregations. The downside is that updating security information can be a huge pain. If a user joins a group, it's usually not a big deal, but if a group joins another group due to a company reorg, or a new parent group is created, you end up having to update large portions of the index.&lt;/p&gt;
&lt;p&gt;Doing this scheme well is complicated, but it's been the prevailing pattern for implementing security for most search applications for the last 10 years or so.&lt;/p&gt;
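&lt;p&gt;The update cost described above can be sketched roughly as follows. This is a toy in-memory model, not a real search API: the &lt;em&gt;IndexedDoc&lt;/em&gt; class, the &lt;em&gt;docsToReindex&lt;/em&gt; helper and the sample URLs are hypothetical, and it assumes group membership is flattened onto each document at index time, so a document readable by a group must also list every group nested inside it.&lt;/p&gt;
&lt;pre&gt;
```java
import java.util.Arrays;

// Hypothetical in-memory model of an early-bound index; not a real search API.
class EarlyBoundReindexSketch {

    static final class IndexedDoc {
        final String url;
        final String[] groups; // groups allowed to read, flattened at index time

        IndexedDoc(String url, String... groups) {
            this.url = url;
            this.groups = groups;
        }
    }

    // When a group is nested under parentGroup, every document readable by
    // parentGroup must be re-indexed to add the new group to its groups field.
    static int docsToReindex(IndexedDoc[] index, String parentGroup) {
        int count = 0;
        for (IndexedDoc doc : index) {
            if (Arrays.asList(doc.groups).contains(parentGroup)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        IndexedDoc[] index = {
            new IndexedDoc("/specs/a", "engineering"),
            new IndexedDoc("/specs/b", "engineering", "headHonchos"),
            new IndexedDoc("/hr/reviews", "headHonchos")
        };
        // qa joins engineering: both engineering documents need a rewrite.
        System.out.println(docsToReindex(index, "engineering")); // 2
    }
}
```
&lt;/pre&gt;
&lt;p&gt;In a real corpus the count is in documents, not rows in a toy array, which is why a single reorg can touch a large share of the index.&lt;/p&gt;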
&lt;p&gt;Using this design a document might look like this:&lt;/p&gt;
&lt;div style="padding-left: 25px;"&gt;
&lt;table cellpadding="3"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="width: 75px;"&gt;&lt;strong&gt;field&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;value&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;url&lt;/td&gt;
&lt;td&gt;http://www.acme.com/someSecretProject/verySecretStuff&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;blah blah blah&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;users&lt;/td&gt;
&lt;td&gt;joe, mik&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;groups&lt;/td&gt;
&lt;td&gt;headHonchos, superVillains&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;br /&gt;
&lt;p&gt;If a user named 'Tim', who is a member of the 'engineering' and 'qa' groups, logged into the system and ran a query for 'secret', his query would be rewritten so that a document matches only if it contains 'secret' and lists Tim or one of his groups:&lt;/p&gt;
&lt;pre&gt;AND(&lt;br /&gt;  "secret",&lt;br /&gt;  OR(&lt;br /&gt;    users:tim,&lt;br /&gt;    groups:engineering,&lt;br /&gt;    groups:qa&lt;br /&gt;  )&lt;br /&gt;)&lt;br /&gt;&lt;/pre&gt;
&lt;h2&gt;Attivio's Answer — Active Security&lt;/h2&gt;
&lt;p&gt;The problem with both of the previous methods is that they miss the key design of the source systems, namely that the security information is stored separately from the actual content and is linked by some sort of pointer. In a database that relationship might be modeled as a foreign key from the content table to the ACL table. In a file system, it's a pointer from the file's content to a record in the NTFS/ext3/xyz file system's database for permissioning information.&lt;/p&gt;
&lt;p&gt;The active security model maintains this information in a relational structure and then uses our patented &lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=665:aie-join&amp;amp;catid=76:aie-features&amp;amp;Itemid=156"&gt;JOIN&lt;/a&gt; operator to link the two records at query time. We are able to index users and groups as native objects in our index, and traverse up and down the hierarchy in order to determine the full set of permissions for a given document.&lt;/p&gt;
&lt;p&gt;All that's required at search time is for the application to pass in a user's id. This scheme also allows us to perform rapid updates of the security information. If one group joins another group, all we need to do is update the one parent group's record.&lt;/p&gt;
&lt;p&gt;In this world we have the following objects in our index:&lt;/p&gt;
&lt;div style="padding-left: 25px;"&gt;
&lt;p&gt;&lt;span style="text-decoration: underline;"&gt;&lt;strong&gt;A Document&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table border="1" cellpadding="3"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="width: 75px;"&gt;&lt;strong&gt;field&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;value&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;id&lt;/td&gt;
&lt;td&gt;doc1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;url&lt;/td&gt;
&lt;td&gt;http://www.acme.com/someSecretProject/verySecretStuff&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;blah blah blah&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;br /&gt;
&lt;p&gt;&lt;span style="text-decoration: underline;"&gt;&lt;strong&gt;A User&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table border="1" cellpadding="3"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="width: 75px;"&gt;&lt;strong&gt;field&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;value&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;userid&lt;/td&gt;
&lt;td&gt;user1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;username&lt;/td&gt;
&lt;td&gt;jsmith&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;realm&lt;/td&gt;
&lt;td&gt;acme&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;br /&gt;
&lt;p&gt;&lt;span style="text-decoration: underline;"&gt;&lt;strong&gt;A Group&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table border="1" cellpadding="3"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="width: 75px;"&gt;&lt;strong&gt;field&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;value&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;groupid&lt;/td&gt;
&lt;td&gt;group1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;groupname&lt;/td&gt;
&lt;td&gt;engineering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;members&lt;/td&gt;
&lt;td&gt;qa, patty&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;realm&lt;/td&gt;
&lt;td&gt;acme&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;br /&gt;
&lt;p&gt;&lt;span style="text-decoration: underline;"&gt;&lt;strong&gt;An ACL&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table border="1" cellpadding="3"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="width: 75px;"&gt;&lt;strong&gt;field&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;value&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;docid&lt;/td&gt;
&lt;td&gt;doc1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;users&lt;/td&gt;
&lt;td&gt;user57, user88&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;groups&lt;/td&gt;
&lt;td&gt;group1, group77&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;br /&gt;
&lt;p&gt;The user's query simply needs to have the principal set on it:&lt;/p&gt;
&lt;pre&gt;// Search for "secret" on behalf of user jsmith in the acme realm&lt;br /&gt;QueryRequest req = new QueryRequest("secret");&lt;br /&gt;req.setPrincipal(new AttivioPrincipal("acme", "jsmith"));&lt;br /&gt;&lt;/pre&gt;
&lt;br /&gt;
&lt;p&gt;This scheme allows us to independently update each individual record. For example, when a group's members change, you update the group record. When permissions on a document change, you update the ACL. We're able to make these smaller updates much faster than a full document update in an early-bound security model.&lt;/p&gt;
&lt;p&gt;In addition, this method lets us set up simple connectors to fetch each of the record types from Active Directory, LDAP, or a custom database schema, without having to merge all of that information in with the documents, which also speeds up ingestion time.&lt;/p&gt;
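&lt;p&gt;As a rough illustration of how the Group and ACL records above fit together, the sketch below resolves a read check by walking group membership at query time. It is plain Java over in-memory arrays, not the AIE API or the JOIN operator: the &lt;em&gt;Group&lt;/em&gt; and &lt;em&gt;Acl&lt;/em&gt; classes, the &lt;em&gt;isMember&lt;/em&gt; and &lt;em&gt;canRead&lt;/em&gt; helpers, the depth limit of 16 and the nested &lt;em&gt;qa&lt;/em&gt; group containing user1 are all assumptions made for the example.&lt;/p&gt;
&lt;pre&gt;
```java
import java.util.Arrays;

// Hypothetical in-memory stand-ins for the Group and ACL records.
// None of these types are part of the AIE API.
class ActiveSecuritySketch {

    static final class Group {
        final String id;
        final String[] members; // ids of users or of nested groups

        Group(String id, String... members) {
            this.id = id;
            this.members = members;
        }
    }

    static final class Acl {
        final String docId;
        final String[] users;
        final String[] groups;

        Acl(String docId, String[] users, String[] groups) {
            this.docId = docId;
            this.users = users;
            this.groups = groups;
        }
    }

    // True if the principal is a member of the group, directly or through
    // nested groups. depthLeft bounds the walk so that a membership cycle
    // cannot recurse forever.
    static boolean isMember(String principal, String groupId, Group[] allGroups, int depthLeft) {
        if (depthLeft == 0) {
            return false;
        }
        for (Group group : allGroups) {
            if (!group.id.equals(groupId)) {
                continue;
            }
            for (String member : group.members) {
                if (member.equals(principal)) {
                    return true;
                }
                if (isMember(principal, member, allGroups, depthLeft - 1)) {
                    return true;
                }
            }
        }
        return false;
    }

    // The check resolved at query time: the ACL lists the user directly, or
    // lists a group the user belongs to.
    static boolean canRead(String principal, Acl acl, Group[] allGroups) {
        if (Arrays.asList(acl.users).contains(principal)) {
            return true;
        }
        for (String groupId : acl.groups) {
            if (isMember(principal, groupId, allGroups, 16)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Group qa = new Group("qa", "user1");
        Group engineering = new Group("group1", "qa", "patty");
        Acl acl = new Acl("doc1", new String[] {"user57", "user88"}, new String[] {"group1", "group77"});

        Group[] before = {qa, engineering};
        System.out.println(canRead("user1", acl, before)); // true: user1 is in qa, qa is in group1

        // A reorg moves qa out of engineering: only the group1 record changes.
        Group[] after = {qa, new Group("group1", "patty")};
        System.out.println(canRead("user1", acl, after)); // false
    }
}
```
&lt;/pre&gt;
&lt;p&gt;Note that the reorg in the example replaces a single group record; the document and its ACL are left untouched.&lt;/p&gt;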
&lt;p&gt;Learn more about Active Security and AIE JOIN:&lt;/p&gt;
&lt;p&gt;&lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=845-introducing-active-security&amp;amp;catid=53-attivio&amp;amp;Itemid=52"&gt;Introducing Active Security&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=1019-patent-issued-to-attivio-for-unique-method-of-unifying-multiple-content-and-data-sources&amp;amp;catid=33-newsroom&amp;amp;Itemid=38"&gt;Patent Issued to Attivio for Unique Method of Unifying  Multiple Content and Data Sources&lt;/a&gt;&lt;/p&gt;</content>
	</entry>
	<entry>
		<title>Thinking Like a Tester</title>
		<link rel="alternate" type="text/html" href="http://www.attivio.com/blog/52-agile-development/1043-thinking-like-a-tester.html" />
		<published>2012-01-23T16:30:48Z</published>
		<updated>2012-01-23T16:30:48Z</updated>
		<id>http://www.attivio.com/blog/52-agile-development/1043-thinking-like-a-tester.html</id>
		<author>
			<name>John McEleney</name>
		<email>help@attivio.com</email>
		</author>
		<summary type="html">&lt;p&gt;As a member of what was, back then, just a three-person QA team, my heart sank when I read the title of one of our &lt;a target="_blank" href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=211:quality-is-job-1-to-3000&amp;amp;catid=52:agile-development&amp;amp;Itemid=52"&gt;early blog posts stating that quality is job 1 to 3000+&lt;/a&gt;. My manager had recently transitioned into another group and his vacancy had yet to be filled. Our CTO, Sid Probstein, had recently closed Attivio's maiden blog post with a promise to expound on Attivio's deliberate approach to quality. So here it is, I thought: had I taken on an impossible task with unrealistic objectives?&lt;br /&gt; &lt;br /&gt;Fortunately, though, thinking like a tester once again resuscitated me. The logic of abductive inference compelled me to continue reading Sid's post. While I had to hold my breath until the closing statements convinced me sufficiently that my initial interpretation had been misguided, I was relieved to discover that 3000+ was actually a reference to the number of unit tests covering 81.2% of AIE V1.2. I eyed the 81.2% assertion suspiciously... Ah, things were going to be OK (we are now on AIE V3, and actually have almost 19,000 automated unit tests covering close to 85% of the code base).&lt;/p&gt;
&lt;p&gt;Sid's post highlighted the premium Attivio puts on product quality. With a tip of the hat to Lessons Learned in Software Testing (Cem Kaner, James Bach, Bret Pettichord), I'd like to introduce another dynamic and indexable facet of our approach to quality at Attivio: Exploratory Testing.&lt;/p&gt;
&lt;p&gt;AIE, Attivio's unified information access platform, is incredibly flexible, configurable, and extensible. You can feed multi-language documents into sophisticated workflows via custom clients, command line connectors and in-process configured connectors. You may want your timely insight that matters pushed to an active dashboard, prefer searching/exploring your unified information store via a custom GUI, or perhaps have occasion when only a sick multi-level SQL join will suffice. Of course, it goes without saying that all of this must be secure, stable, scalable, performant, highly available and fault-tolerant.&lt;/p&gt;
&lt;p&gt;While we continue to review, expand, and augment the unit tests that are the foundation of our quality strategy, our now fully-staffed QA team has also formally embarked on an exploratory voyage. More than tourists, like C.T. Granville setting sail on a whale watch, Attivio QA is on a quest, a never-ending journey to boldly go where no test or tester has gone before. Everything is fair game: stories, requirements, design, usability and documentation. We delve into all of it, bringing our own experience, curiosity, and skepticism to bear. Most importantly, we take inspiration from our customers who are constantly finding new and interesting ways to deploy our platform, challenging us to look at things from an entirely new perspective.&lt;/p&gt;
&lt;p&gt;Recently, a friend sent me a &lt;a target="_blank" href="http://www.slate.com/id/2288402/"&gt;Slate article&lt;/a&gt; about two studies that appeared in Cognition, an international journal that publishes theoretical and experimental papers on the study of the mind, which focused on how children learn. The article's author, Alison Gopnik, a professor at UC-Berkeley, who conducted one of the studies, says the studies "provide scientific support for the intuitions many teachers have had all along: Direct instruction really can limit young children's learning. Teaching is a very effective way to get children to learn something specific...But it also makes children less likely to discover unexpected information and to draw unexpected conclusions."&lt;br /&gt; &lt;br /&gt;I think this assertion has relevance to people of any age. In the province of software testing, it's reasonable to equate 'direct instruction' with test scripts, whether manual or automated. There is significant value in them (e.g., regression, smoke, and config testing). However, they have the potential to hamstring a tester's most valuable assets: creativity, curiosity, and judgment.&lt;br /&gt; &lt;br /&gt;At Attivio, we strive to take full advantage of those assets. The effort produces benefits beyond simply identifying additional defects and regressions. It augments our culture of communication, collaboration, and continuous learning/discovery and enables fresh and deep insight. That this is commensurate with AIE's capabilities is only fitting.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="font-size: x-small;"&gt;Author Bio&lt;/span&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-size: x-small;"&gt;John McEleney is a senior member of the Attivio QA team and has been with Attivio for over three years. Prior to Attivio, John worked on and managed teams at BEA and Plumtree.&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;</summary>
		<content type="html">&lt;p&gt;As a member of what was, back then, just a three-person QA team, my heart sank when I read the title of one of our &lt;a target="_blank" href="index.php?option=com_content&amp;amp;view=article&amp;amp;id=211:quality-is-job-1-to-3000&amp;amp;catid=52:agile-development&amp;amp;Itemid=52"&gt;early blog posts stating that quality is job 1 to 3000+&lt;/a&gt;. My manager had recently transitioned into another group and his vacancy had yet to be filled. Our CTO, Sid Probstein, had recently closed Attivio's maiden blog post with a promise to expound on Attivio's deliberate approach to quality. So here it is, I thought: had I taken on an impossible task with unrealistic objectives?&lt;br /&gt; &lt;br /&gt;Fortunately, though, thinking like a tester once again resuscitated me. The logic of abductive inference compelled me to continue reading Sid's post. While I had to hold my breath until the closing statements convinced me sufficiently that my initial interpretation had been misguided, I was relieved to discover that 3000+ was actually a reference to the number of unit tests covering 81.2% of AIE V1.2. I eyed the 81.2% assertion suspiciously... Ah, things were going to be OK (we are now on AIE V3, and actually have almost 19,000 automated unit tests covering close to 85% of the code base).&lt;/p&gt;
&lt;p&gt;Sid's post highlighted the premium Attivio puts on product quality. With a tip of the hat to Lessons Learned in Software Testing (Cem Kaner, James Bach, Bret Pettichord), I'd like to introduce another dynamic and indexable facet of our approach to quality at Attivio: Exploratory Testing.&lt;/p&gt;
&lt;p&gt;AIE, Attivio's unified information access platform, is incredibly flexible, configurable, and extensible. You can feed multi-language documents into sophisticated workflows via custom clients, command line connectors and in-process configured connectors. You may want your timely insight that matters pushed to an active dashboard, prefer searching/exploring your unified information store via a custom GUI, or perhaps have occasion when only a sick multi-level SQL join will suffice. Of course, it goes without saying that all of this must be secure, stable, scalable, performant, highly available and fault-tolerant.&lt;/p&gt;
&lt;p&gt;While we continue to review, expand, and augment the unit tests that are the foundation of our quality strategy, our now fully-staffed QA team has also formally embarked on an exploratory voyage. More than tourists, like C.T. Granville setting sail on a whale watch, Attivio QA is on a quest, a never-ending journey to boldly go where no test or tester has gone before. Everything is fair game: stories, requirements, design, usability and documentation. We delve into all of it, bringing our own experience, curiosity, and skepticism to bear. Most importantly, we take inspiration from our customers who are constantly finding new and interesting ways to deploy our platform, challenging us to look at things from an entirely new perspective.&lt;/p&gt;
&lt;p&gt;Recently, a friend sent me a &lt;a target="_blank" href="http://www.slate.com/id/2288402/"&gt;Slate article&lt;/a&gt; about two studies that appeared in Cognition, an international journal that publishes theoretical and experimental papers on the study of the mind, which focused on how children learn. The article's author, Alison Gopnik, a professor at UC-Berkeley, who conducted one of the studies, says the studies "provide scientific support for the intuitions many teachers have had all along: Direct instruction really can limit young children's learning. Teaching is a very effective way to get children to learn something specific...But it also makes children less likely to discover unexpected information and to draw unexpected conclusions."&lt;br /&gt; &lt;br /&gt;I think this assertion has relevance to people of any age. In the province of software testing, it's reasonable to equate 'direct instruction' with test scripts, whether manual or automated. There is significant value in them (e.g., regression, smoke, and config testing). However, they have the potential to hamstring a tester's most valuable assets: creativity, curiosity, and judgment.&lt;br /&gt; &lt;br /&gt;At Attivio, we strive to take full advantage of those assets. The effort produces benefits beyond simply identifying additional defects and regressions. It augments our culture of communication, collaboration, and continuous learning/discovery and enables fresh and deep insight. That this is commensurate with AIE's capabilities is only fitting.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="font-size: x-small;"&gt;Author Bio&lt;/span&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-size: x-small;"&gt;John McEleney is a senior member of the Attivio QA team and has been with Attivio for over three years. Prior to Attivio, John worked on and managed teams at BEA and Plumtree.&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;</content>
	</entry>
	<entry>
		<title>What AIE and unified information access mean for developers</title>
		<link rel="alternate" type="text/html" href="http://www.attivio.com/blog/57-unified-information-access/1030-what-aie-and-unified-information-access-mean-for-developers.html" />
		<published>2012-01-12T18:55:33Z</published>
		<updated>2012-01-12T18:55:33Z</updated>
		<id>http://www.attivio.com/blog/57-unified-information-access/1030-what-aie-and-unified-information-access-mean-for-developers.html</id>
		<author>
			<name>Will Johnson</name>
		<email>help@attivio.com</email>
		</author>
		<summary type="html">&lt;p&gt;There has been a lot of press recently on unified information access and how it enables business users and IT staff to reduce the time it takes to provide information to make better business decisions. The Active Intelligence Engine not only provides value to business users; it also offers developers a number of advantages that make life much easier:&lt;/p&gt;
&lt;h2&gt;Index now, think later&lt;/h2&gt;
&lt;p&gt;One of the greatest advantages of AIE is the ability to facet and join on fields without having to do a lot of preprocessing or, more importantly, design work. For example, you might not know ahead of time that your customer database records are linkable to customer comments on your website, but you can easily find out with a single query after both information sources are indexed. A number of POCs and development spikes we have conducted have followed the pattern of indexing everything possible and then trying to infer relationships using queries. Unlike a database, where primary and foreign keys must be set up ahead of time, the index does not require this sort of predefined and rigid schema definition. Also, many of these features are so cheap to leave on that tuning isn't always necessary.&lt;br /&gt; &lt;br /&gt;That's not to say that there aren't advantages to doing some tuning of the schema or the ingestion workflows, but it isn't required in order to start seeing the power and ROI of using our system.&lt;/p&gt;
&lt;h2&gt;Develop locally, deploy globally&lt;/h2&gt;
&lt;p&gt;Developing for a single-node, single-JVM system is a straightforward process on almost any platform. Some platforms also make it fairly easy to write business logic for a large distributed system. The key advantage we've found is being able to develop and test in a small localized environment, but then deploy to a large distributed system and not be surprised by system behaviors. In addition, it's important to be able to use a standard debugger when developing locally in a fully functional system, but then also to be able to use the same debugger in a distributed environment. AIE topology files provide an abstraction that separates the system functionality from the system deployment. This lets operations teams scale the system for QA, staging and production environments without having to worry about functional issues with the configuration.&lt;/p&gt;
&lt;p&gt;Some other systems like Hadoop force users into somewhat complex development models. AIE's development models strive to support the "I want to do X to Y" in the simplest possible manner. We ship a sample transformer that implements some simple business logic and more importantly, we ship a unit test for the transformer.&lt;/p&gt;
&lt;h2&gt;Learn one API, let us handle the details&lt;/h2&gt;
&lt;p&gt;One of the hardest parts of building enterprise-wide applications is the need to work with multiple APIs. In addition, each system has its own idea of what a user is, what it means to have permission to read a document and, more importantly, what a document is to begin with. If you can't define and normalize all of these concepts, it's impossible to join, group, categorize and make decisions based on the data. AIE not only provides connectors to these back-end systems, we also handle normalizing each system's data to a standard format that is accessible via our API. A user in Active Directory can have permissions to a document in Documentum and a document in SharePoint. More importantly, the permissions are applied transparently at search time so that developers don't have to worry about doing any sort of post-filtering of results.&lt;/p&gt;
&lt;p&gt;Attivio's development environment is meant to hide all of the enterprise ugliness from developers and present a single user, group, document, ACL, and query concept. If userX can see records in 10 repositories, we handle the details. If you want to join data from your internal SharePoint server to your CRM system based on a support person's contact information, we can do that for you as well.&lt;/p&gt;
&lt;p&gt;&lt;span style="font-size: x-small;"&gt;&lt;span style="font-size: small;"&gt;&lt;strong&gt;Author Bio&lt;/strong&gt;&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;span style="font-size: x-small;"&gt; &lt;/span&gt;
&lt;p&gt;&lt;span style="font-size: x-small;"&gt;&lt;span style="font-size: x-small;"&gt;Since graduating from MIT with a degree in Computer Science, Will Johnson has worked for Altavista and FAST for over 7 years. At Altavista, Will developed AV's real-time indexing solution used by news aggregators who demanded instantaneous access to news as it arrived. In addition, he was one of two engineers responsible for developing the Altavista QIndexer product that was used by the large majority of AV's customers. At FAST, Will developed high-speed database connectors as well as search UIs and tool sets used across the organization. Will also worked on many of the largest and most complex sales engagements and deployments for customers around the world, specializing in distributed systems for many of the largest internet publishers and directories as well as internal knowledge management systems. Will is a founder, one of the Chief Architects at Attivio and a really nice guy.&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;</summary>
		<content type="html">&lt;p&gt;There has been a lot of press recently on unified information access and how it enables business users and IT staff to reduce the time it takes to provide information to make better business decisions. The Active Intelligence Engine not only provides value to business users; it also offers developers a number of advantages that make life much easier:&lt;/p&gt;
&lt;h2&gt;Index now, think later&lt;/h2&gt;
&lt;p&gt;One of the greatest advantages of AIE is the ability to facet and join on fields without having to do a lot of preprocessing or, more importantly, design work. For example, you might not know ahead of time that your customer database records are linkable to customer comments on your website, but you can easily find out with a single query after both information sources are indexed. A number of POCs and development spikes we have conducted have followed the pattern of indexing everything possible and then trying to infer relationships using queries. Unlike a database, where primary and foreign keys must be set up ahead of time, the index does not require this sort of predefined and rigid schema definition. Also, many of these features are so cheap to leave on that tuning isn't always necessary.&lt;br /&gt; &lt;br /&gt;That's not to say that there aren't advantages to doing some tuning of the schema or the ingestion workflows, but it isn't required in order to start seeing the power and ROI of using our system.&lt;/p&gt;
&lt;h2&gt;Develop locally, deploy globally&lt;/h2&gt;
&lt;p&gt;Developing for a single-node, single-JVM system is a straightforward process on almost any platform. Some platforms also make it fairly easy to write business logic for a large distributed system. The key advantage we've found is being able to develop and test in a small, localized environment, then deploy to a large distributed system without being surprised by system behaviors. In addition, it's important to be able to use a standard debugger when developing locally in a fully functional system, and to be able to use the same debugger in a distributed environment. AIE topology files provide an abstraction that separates the system functionality from the system deployment. This lets operations teams scale the system for QA, staging and production environments without having to worry about functional issues with the configuration.&lt;/p&gt;
&lt;p&gt;Some other systems, like Hadoop, force users into somewhat complex development models. AIE's development models strive to support the "I want to do X to Y" case in the simplest possible manner. We ship a sample transformer that implements some simple business logic and, more importantly, we ship a unit test for the transformer.&lt;/p&gt;
&lt;h2&gt;Learn one API, let us handle the details&lt;/h2&gt;
&lt;p&gt;One of the hardest parts of building enterprise-wide applications is the need to work with many different APIs. In addition, each system has its own idea of what a user is, what it means to have permission to read a document and, more importantly, what a document is to begin with. If you can't define and normalize all of these concepts, it's impossible to join, group, categorize and make decisions based on the data. AIE not only provides connectors to these back-end systems but also normalizes each system's data to a standard format that is accessible via our API. A user in Active Directory can have permissions to a document in Documentum and a document in SharePoint. More importantly, the permissions are applied transparently at search time so that developers don't have to worry about doing any sort of post-filtering of results.&lt;/p&gt;
&lt;p&gt;Attivio's development environment is meant to hide all of the enterprise ugliness from developers and present a single user, group, document, ACL, and query concept. If userX can see records in 10 repositories, we handle the details. If you want to join data from your internal SharePoint server to your CRM system based on a support person's contact information, we can do that for you as well.&lt;/p&gt;
&lt;p&gt;&lt;span style="font-size: x-small;"&gt;&lt;span style="font-size: small;"&gt;&lt;strong&gt;Author Bio&lt;/strong&gt;&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-size: x-small;"&gt;&lt;span style="font-size: x-small;"&gt;Since graduating from MIT with a degree in Computer Science, Will Johnson has worked for Altavista and FAST for over 7 years. At Altavista, Will developed AV's real-time indexing solution, used by news aggregators who demanded instantaneous access to news as it arrived. In addition, he was one of two engineers responsible for developing the Altavista QIndexer product that was used by the large majority of AV's customers. At FAST, Will developed high-speed database connectors as well as search UIs and tool sets used across the organization. Will also worked on many of the largest and most complex sales engagements and deployments for customers around the world, specializing in distributed systems for many of the largest internet publishers and directories, as well as internal knowledge management systems. Will is a founder and one of the Chief Architects at Attivio, and a really nice guy.&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;</content>
	</entry>
	<entry>
		<title>The (Real) Semantic Web Requires Machine Learning</title>
		<link rel="alternate" type="text/html" href="http://www.attivio.com/blog/55-industry-insights/1026-the-real-semantic-web-requires-machine-learning.html" />
		<published>2012-01-05T17:57:21Z</published>
		<updated>2012-01-05T17:57:21Z</updated>
		<id>http://www.attivio.com/blog/55-industry-insights/1026-the-real-semantic-web-requires-machine-learning.html</id>
		<author>
			<name>John O'Neil</name>
		<email>help@attivio.com</email>
		</author>
		<summary type="html">&lt;p&gt;We think about the semantic web in two complementary (and equivalent) ways.  It can be viewed as:&lt;/p&gt;
&lt;p&gt;• A large set of subject-verb-object triples, where the verb is a relation and the subject and object are entities&lt;/p&gt;
&lt;p style="text-align: center;"&gt;OR&lt;/p&gt;
&lt;p&gt;• A large graph or network, where the nodes of the graph are entities and the graph's directed edges or arrows are the relations between nodes.&lt;/p&gt;
&lt;p&gt;As a reminder, entities are things with proper names, like people, places, companies, and so on. Relations are meaningful events, outcomes or states, like BORN-IN, WORKS-FOR, MARRIED-TO, and so on. Each entity (like "John O'Neil", "Attivio" or "Newton, MA") has a type (like "PERSON", "COMPANY" or "LOCATION"), and each relation is constrained to accept only certain types of entities. For example, WORKS-FOR may require a PERSON as the subject and a COMPANY as the object.&lt;/p&gt;
&lt;p&gt;How semantic web information is organized and transmitted is described by a blizzard of technical standards and XML namespaces. Once you escape from that, the basic goals of the semantic web are (1) to allow a lot of useful information about the world to be simply expressed, in a way that (2) allows computers to do useful things with it.&lt;/p&gt;
&lt;p&gt;Almost immediately, some problems crop up. As generations of artificial intelligence researchers have learned, it can be really difficult to encode real-world knowledge into predicate logic, which is more or less what the semantic web is. The same AI researchers also learned that different people will almost inevitably create knowledge encodings that can't easily be compared, because they use different (sometimes subtly, maddeningly different) basic definitions and concepts. Another difficult problem is deciding when entity names refer to the "same" real-world thing. Even worse, if the entity names are defined in two separate places, when and how should they be merged? For example, do an Internet search for "John O'Neil", and try to decide how many different people the results refer to. Believe me, the results are not all for the same person.&lt;/p&gt;
&lt;p&gt;&lt;img style="float: right; margin: 5px;" title="Image courtesy of AlchemyAPIBlog" alt="idata-semantic-web.jpg" src="images/at_images/general/blog/idata-semantic-web.jpg" height="305" width="400" /&gt;As for relations, it's difficult to tell when they really mean the same thing across different knowledge encodings. No matter how careful you are, if you want to use relations to infer new facts, you have few resources for checking whether the combined information is valid.&lt;/p&gt;
&lt;p&gt;So, when each web site can define its own entities and relations, independently of any other web site, how do you reconcile entities and relations defined by different people?&lt;/p&gt;
&lt;p&gt;One technique is to require (or STRONGLY SUGGEST) the use of a shared ontology. (For our purposes, an ontology is one person's, or one company's, semantic web.)&lt;/p&gt;
&lt;p&gt;Perhaps, if it were carefully designed, it would be possible to allow anyone to add to it without making it unusable. Wikipedia might serve as an inspiration here. However, this is generally impractical, for a number of reasons:&lt;/p&gt;
&lt;ol style="padding-left: 25px;"&gt;
&lt;li&gt;A lot of smart people have tried to do this in the past, and they've obviously failed.&lt;/li&gt;
&lt;li&gt;Wikipedia has grown a community that is good (perhaps too good) at discussing how articles should be written. However, it's not clear that any community could become competent to discuss semantic web issues in detail and to reach agreement about them.&lt;/li&gt;
&lt;/ol&gt;&lt;br /&gt;
&lt;p&gt;The major problem is the "open-world" requirement implicit in the semantic web. In a closed world or a limited domain, even if the limited domain isn't small, it's possible to agree on the ontological issues and get to work. Many companies have put a lot of effort into creating their domain ontologies, and some have even found a day-to-day use for them. However, it takes a lot of continuous work to maintain a good domain ontology.&lt;/p&gt;
&lt;p&gt;Even if companies were willing to open-source their ontologies, their domain is closed — and once you start trying to knit different domain ontologies together, you quickly start seeing the problems discussed above.&lt;/p&gt;
&lt;p&gt;By the way, the fact that the semantic web has failed to be widely adopted has, I think, a simple explanation: it's really difficult, much more so than learning HTML, and the practical payoff is not obvious, to put it mildly.&lt;/p&gt;
&lt;p&gt;As an aside, Attivio's unified information access architecture allows corporate ontologies to be directly imported, so a user can search through them or perform SQL queries on them, including joins. Joins, in particular, are a powerful tool for understanding semantic web ontologies, and for using them to improve search and other kinds of business intelligence work. (Our join patent was recently awarded.)&lt;/p&gt;
&lt;p&gt;Is there a solution? Can the creation of domain ontologies be automated — or at least made easier? Will something make it possible to combine different domain (and different site) semantic webs — at least with some minimum guarantees about reliability? I think so, and here's why.&lt;/p&gt;
&lt;p&gt;At Attivio, we've been using statistical machine learning to learn how to extract relations from plain text. We're still working on it (it's a difficult problem), but we're making real progress, and I'm pretty sure that we'll discuss the details of our work in future blog posts. For now, though, it's clear to us that there's a real advantage in being able to associate probabilities with the entities and relations that we find in a document, especially when we can accumulate information from millions of documents (or more). If we build a knowledge graph with weights on the entity nodes and relational edges, we begin to have a way to measure the reliability of different parts of a semantic web. We can also determine, for two separate semantic webs, which entities and relations we know are the same or different, and where we're unsure.&lt;/p&gt;
&lt;p&gt;Human ontology builders can't create probabilities like that, since humans are even worse at statistics than they are at semantics. (No blame here: both are really confusing to think about!) However, there's been a lot of research into relation and event extraction, as well as into machine learning using big data (or extreme information, if you prefer). So it's now possible to create tools that substantially help with the process of building ontologies.&lt;/p&gt;
&lt;p&gt;And, making no promises we'll regret, we hope that we'll be able to talk more about it soon.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Author Bio&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-size: x-small;"&gt;John O'Neil has written and designed software for search, natural language processing and machine learning for 10 years. After receiving a Ph.D. in computational linguistics from Harvard University, he worked for Lingo Motors, where he designed their main commercial product and ended up with his name on a number of their patents, as well as for other search engine companies, where he worked to increase search relevancy and accuracy. He also worked for over five years at Basis Technology, Inc., where he was the designer and lead developer for the Rosette Linguistics Platform, their language processing and entity extraction suite of products.&lt;/span&gt;&lt;/p&gt;</summary>
		<content type="html">&lt;p&gt;We think about the semantic web in two complementary (and equivalent) ways.  It can be viewed as:&lt;/p&gt;
&lt;p&gt;• A large set of subject-verb-object triples, where the verb is a relation and the subject and object are entities&lt;/p&gt;
&lt;p style="text-align: center;"&gt;OR&lt;/p&gt;
&lt;p&gt;• A large graph or network, where the nodes of the graph are entities and the graph's directed edges or arrows are the relations between nodes.&lt;/p&gt;
&lt;p&gt;As a reminder, entities are things with proper names, like people, places, companies, and so on. Relations are meaningful events, outcomes or states, like BORN-IN, WORKS-FOR, MARRIED-TO, and so on. Each entity (like "John O'Neil", "Attivio" or "Newton, MA") has a type (like "PERSON", "COMPANY" or "LOCATION"), and each relation is constrained to accept only certain types of entities. For example, WORKS-FOR may require a PERSON as the subject and a COMPANY as the object.&lt;/p&gt;
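&lt;p&gt;To make the two views concrete, here is a minimal Python sketch of typed triples and the graph they imply. The class names, the RELATION_SCHEMA table and the type pairs chosen for BORN-IN and MARRIED-TO are assumptions made for illustration; they are not part of any semantic web standard or of Attivio's products.&lt;/p&gt;

```python
from typing import NamedTuple


class Entity(NamedTuple):
    name: str  # e.g. "Attivio"
    kind: str  # e.g. "COMPANY"


class Triple(NamedTuple):
    subject: Entity
    relation: str
    obj: Entity


# Hypothetical schema: relation name mapped to the required
# (subject type, object type) pair.
RELATION_SCHEMA = {
    "WORKS-FOR": ("PERSON", "COMPANY"),
    "BORN-IN": ("PERSON", "LOCATION"),
    "MARRIED-TO": ("PERSON", "PERSON"),
}


def is_valid(triple):
    """Return True if the relation is known and both entity types match it."""
    expected = RELATION_SCHEMA.get(triple.relation)
    if expected is None:
        return False
    return (triple.subject.kind, triple.obj.kind) == expected


def to_graph(triples):
    """View the same triples as a directed graph: node name to labeled edges."""
    graph = {}
    for triple in triples:
        edges = graph.setdefault(triple.subject.name, [])
        edges.append((triple.relation, triple.obj.name))
    return graph
```

&lt;p&gt;With this schema, a WORKS-FOR triple from a PERSON to a COMPANY passes the check, while the same triple with subject and object swapped is rejected.&lt;/p&gt;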
&lt;p&gt;How semantic web information is organized and transmitted is described by a blizzard of technical standards and XML namespaces. Once you escape from that, the basic goals of the semantic web are (1) to allow a lot of useful information about the world to be simply expressed, in a way that (2) allows computers to do useful things with it.&lt;/p&gt;
&lt;p&gt;Almost immediately, some problems crop up. As generations of artificial intelligence researchers have learned, it can be really difficult to encode real-world knowledge into predicate logic, which is more or less what the semantic web is. The same AI researchers also learned that different people will almost inevitably create knowledge encodings that can't easily be compared, because they use different (sometimes subtly, maddeningly different) basic definitions and concepts. Another difficult problem is deciding when entity names refer to the "same" real-world thing. Even worse, if the entity names are defined in two separate places, when and how should they be merged? For example, do an Internet search for "John O'Neil", and try to decide how many different people the results refer to. Believe me, the results are not all for the same person.&lt;/p&gt;
&lt;p&gt;&lt;img style="float: right; margin: 5px;" title="Image courtesy of AlchemyAPIBlog" alt="idata-semantic-web.jpg" src="images/at_images/general/blog/idata-semantic-web.jpg" height="305" width="400" /&gt;As for relations, it's difficult to tell when they really mean the same thing across different knowledge encodings. No matter how careful you are, if you want to use relations to infer new facts, you have few resources for checking whether the combined information is valid.&lt;/p&gt;
&lt;p&gt;So, when each web site can define its own entities and relations, independently of any other web site, how do you reconcile entities and relations defined by different people?&lt;/p&gt;
&lt;p&gt;One technique is to require (or STRONGLY SUGGEST) the use of a shared ontology. (For our purposes, an ontology is one person's, or one company's, semantic web.)&lt;/p&gt;
&lt;p&gt;Perhaps, if it were carefully designed, it would be possible to allow anyone to add to it without making it unusable. Wikipedia might serve as an inspiration here. However, this is generally impractical, for a number of reasons:&lt;/p&gt;
&lt;ol style="padding-left: 25px;"&gt;
&lt;li&gt;A lot of smart people have tried to do this in the past, and they've obviously failed.&lt;/li&gt;
&lt;li&gt;Wikipedia has grown a community that is good (perhaps too good) at discussing how articles should be written. However, it's not clear that any community could become competent to discuss semantic web issues in detail and to reach agreement about them.&lt;/li&gt;
&lt;/ol&gt;&lt;br /&gt;
&lt;p&gt;The major problem is the "open-world" requirement implicit in the semantic web. In a closed world or a limited domain, even if the limited domain isn't small, it's possible to agree on the ontological issues and get to work. Many companies have put a lot of effort into creating their domain ontologies, and some have even found a day-to-day use for them. However, it takes a lot of continuous work to maintain a good domain ontology.&lt;/p&gt;
&lt;p&gt;Even if companies were willing to open-source their ontologies, their domain is closed — and once you start trying to knit different domain ontologies together, you quickly start seeing the problems discussed above.&lt;/p&gt;
&lt;p&gt;By the way, the fact that the semantic web has failed to be widely adopted has, I think, a simple explanation: it's really difficult, much more so than learning HTML, and the practical payoff is not obvious, to put it mildly.&lt;/p&gt;
&lt;p&gt;As an aside, Attivio's unified information access architecture allows corporate ontologies to be directly imported, so a user can search through them or perform SQL queries on them, including joins. Joins, in particular, are a powerful tool for understanding semantic web ontologies, and for using them to improve search and other kinds of business intelligence work. (Our join patent was recently awarded.)&lt;/p&gt;
&lt;p&gt;Is there a solution? Can the creation of domain ontologies be automated — or at least made easier? Will something make it possible to combine different domain (and different site) semantic webs — at least with some minimum guarantees about reliability? I think so, and here's why.&lt;/p&gt;
&lt;p&gt;At Attivio, we've been using statistical machine learning to learn how to extract relations from plain text. We're still working on it (it's a difficult problem), but we're making real progress, and I'm pretty sure that we'll discuss the details of our work in future blog posts. For now, though, it's clear to us that there's a real advantage in being able to associate probabilities with the entities and relations that we find in a document, especially when we can accumulate information from millions of documents (or more). If we build a knowledge graph with weights on the entity nodes and relational edges, we begin to have a way to measure the reliability of different parts of a semantic web. We can also determine, for two separate semantic webs, which entities and relations we know are the same or different, and where we're unsure.&lt;/p&gt;
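&lt;p&gt;As a rough illustration of why accumulating evidence helps, the sketch below merges repeated extractions of the same triple into a single edge weight using a noisy-OR combination. Noisy-OR, the independence assumption behind it, the threshold value and the function names are all choices made for this example; they are not a description of Attivio's method.&lt;/p&gt;

```python
from collections import defaultdict


def noisy_or(probabilities):
    """Probability that at least one of several independent extractions is right."""
    miss = 1.0
    for p in probabilities:
        miss *= 1.0 - p  # chance that every extraction seen so far is wrong
    return 1.0 - miss


def build_weighted_graph(extractions):
    """Merge (subject, relation, object, probability) tuples into weighted edges."""
    evidence = defaultdict(list)
    for subject, relation, obj, probability in extractions:
        evidence[(subject, relation, obj)].append(probability)
    return {edge: noisy_or(ps) for edge, ps in evidence.items()}


def confident_edges(graph, threshold=0.9):
    """Edges whose combined weight reaches the (assumed) threshold."""
    return {edge for edge, weight in graph.items() if weight >= threshold}


def shared_confident_edges(graph_a, graph_b, threshold=0.9):
    """Edges that two separately built graphs both hold with confidence."""
    confident_a = confident_edges(graph_a, threshold)
    return confident_a.intersection(confident_edges(graph_b, threshold))
```

&lt;p&gt;Under these assumptions, two sightings of the same triple at probability 0.5 each combine into an edge weight of 0.75, so weak evidence repeated across many documents can still produce a well-supported edge.&lt;/p&gt;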
&lt;p&gt;Human ontology builders can't create probabilities like that, since humans are even worse at statistics than they are at semantics. (No blame here: both are really confusing to think about!) However, there's been a lot of research into relation and event extraction, as well as into machine learning using big data (or extreme information, if you prefer). So it's now possible to create tools that substantially help with the process of building ontologies.&lt;/p&gt;
&lt;p&gt;And, making no promises we'll regret, we hope that we'll be able to talk more about it soon.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Author Bio&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-size: x-small;"&gt;John O'Neil has written and designed software for search, natural language processing and machine learning for 10 years. After receiving a Ph.D. in computational linguistics from Harvard University, he worked for Lingo Motors, where he designed their main commercial product and ended up with his name on a number of their patents, as well as for other search engine companies, where he worked to increase search relevancy and accuracy. He also worked for over five years at Basis Technology, Inc., where he was the designer and lead developer for the Rosette Linguistics Platform, their language processing and entity extraction suite of products.&lt;/span&gt;&lt;/p&gt;</content>
	</entry>
</feed>

