<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Text Technologies</title>
	<atom:link href="http://www.texttechnologies.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.texttechnologies.com</link>
	<description>Understanding technology ... in both senses of the phrase</description>
	<lastBuildDate>Tue, 18 Oct 2016 02:38:17 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.9.26</generator>
	<item>
		<title>The future of search</title>
		<link>http://www.texttechnologies.com/2012/11/25/the-future-of-search/</link>
		<comments>http://www.texttechnologies.com/2012/11/25/the-future-of-search/#comments</comments>
		<pubDate>Mon, 26 Nov 2012 03:07:34 +0000</pubDate>
		<dc:creator><![CDATA[Curt Monash]]></dc:creator>
				<category><![CDATA[Autonomy]]></category>
		<category><![CDATA[Coveo]]></category>
		<category><![CDATA[Endeca]]></category>
		<category><![CDATA[Enterprise search]]></category>
		<category><![CDATA[FAST]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Mercado]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[Search engines]]></category>
		<category><![CDATA[Speech recognition]]></category>
		<category><![CDATA[Structured search]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/?p=559</guid>
		<description><![CDATA[I believe there are two ways search will improve significantly in the future. First, since talking is easier than typing, speech recognition will allow longer and more accurate input strings. Second, search will be informed by much more persistent user information, with search companies having very detailed understanding of searchers. Based on that, I expect: [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>I believe there are two ways search will improve significantly in the future. First, since talking is easier than typing, speech recognition will allow longer and more accurate input strings. Second, search will be informed by much more persistent user information, with search companies having very detailed understanding of searchers. Based on that, I expect:</p>
<ul>
<li><strong>A small oligopoly dominating the conjoined businesses of mobile device software and search.</strong> The companies most obviously positioned for membership are Google and Apple.</li>
<li><strong>The continued and growing combination of search, advertisement/recommendation, and alerting.</strong> The same user-specific data will be needed for all three.</li>
<li><strong>A whole lot of <a href="http://www.dbms2.com/2012/03/01/where-the-privacy-discussion-needs-to-head/">privacy concerns</a>.</strong></li>
</ul>
<p>My reasoning starts from several observations:</p>
<ul>
<li><strong>Enterprise search is greatly disappointing.</strong> My main reason for saying that is anecdotal evidence &#8212; I don&#8217;t notice users being much happier with search than they were 15 years ago. But business results are suggestive too:
<ul>
<li>HP just disclosed serious problems with Autonomy.</li>
<li>Microsoft&#8217;s acquisition of <a href="http://www.texttechnologies.com/2008/07/08/recent-reporting-on-the-shenanigans-at-fast/">FAST</a> was a similar debacle.</li>
<li>Lesser enterprise search outfits never prospered much. (E.g., when&#8217;s the last time you heard mention of <a href="http://www.texttechnologies.com/2008/02/07/coveo-highlights/">Coveo</a>?)</li>
<li><a href="http://www.texttechnologies.com/2007/05/01/huge-e-commerce-gains-claimed-by-everybody/">My favorable impressions of the e-commerce site search business</a> turned out to be overdone. (E.g., Mercado&#8217;s assets were sold for a pittance soon after I wrote that, while Endeca and Inquira were absorbed into Oracle.)</li>
<li>Lucene/Solr&#8217;s recent stirrings aren&#8217;t really in the area of search.</li>
</ul>
</li>
<li><strong>Web search, while superior to the enterprise kind, is disappointing people as well.</strong> Are Google&#8217;s results any better than they were 8 years ago? Google&#8217;s ongoing <a href="http://searchengineland.com/2012-google-seo-year-in-review-139780">hard</a> <a href="http://insidesearch.blogspot.com/search/label/search%20quality">work</a> notwithstanding, are they even as good?</li>
<li><strong>Consumer computer usage is swinging toward mobile devices</strong>. I hope I don&#8217;t have to convince you about that one. <img src="https://s.w.org/images/core/emoji/11/72x72/1f642.png" alt="🙂" class="wp-smiley" style="height: 1em; max-height: 1em;" /></li>
</ul>
<p>In principle, there are two main ways to make search better:</p>
<ul>
<li><strong>Understand more about the documents being searched over.</strong> But Google&#8217;s travails, combined with the rather dismal history of enterprise search, suggest we&#8217;re well into the diminishing-returns part of that project.</li>
<li><strong>Understand more about what the searcher wants.</strong></li>
</ul>
<p>The latter, I think, is where significant future improvement will be found.</p>
<p><span id="more-559"></span>So how does a search engine understand what you want? It can <strong>listen</strong> to you directly, parsing your search string. It can <strong>ask</strong> for more clarity, through some kind of disambiguation interface. Or it can make <strong>inferences,</strong> based on &#8212; well, based on just about any kind of information that might exist about you and your online behavior.</p>
<p>Search strings are short, typically four words or fewer. That doesn&#8217;t leave room for a lot of innovative parsing. Not a lot of progress can be made until search strings get a lot longer, and that is unlikely except perhaps through the convenience of speech recognition.</p>
<p>Faceted/parameterized selection has its place. For example, when I search on Amazon.com, the site encourages me to also select a department from its dropdown menu; otherwise, it refuses to rank the search results. And when I buy shirts from Lands&#8217; End, I just click through and never search at all. Still, Google&#8217;s been around for 15 years, and about all its successes in searcher-does-the-work disambiguation boil down to is:</p>
<ul>
<li>A list of a few major subcategories to search (News, YouTube, etc.).</li>
<li>Spelling correction.</li>
<li>A desultory list of related/more specific searches, perhaps just longer search strings other people have recently entered.</li>
<li>Well-hidden &#8220;Advanced Search&#8221; features, which look much like AltaVista&#8217;s and AllTheWeb&#8217;s similar features did late in the 20th Century.</li>
</ul>
<p>Whatever the user attitudes and behaviors are that constrain Google&#8217;s or its competitors&#8217; success in this area, I can&#8217;t imagine them changing much &#8212; except, once again, in the event that speech recognition leads to richer human-computer conversations.</p>
<p>I&#8217;ve now highlighted two different ways in which there&#8217;s <strong>a search-interface challenge that will be tough to beat without turning to speech recognition.</strong> But the case for speech recognition is even stronger than that. We&#8217;re moving to small, mobile devices, and:</p>
<ul>
<li><strong>Traditional search interfaces work worse on mobile devices than on desktop computers.</strong> Typing is harder. So is dealing with picky forms.</li>
<li><strong>Speech may work as well or better on mobile devices than at your desk.</strong> If you have upgraded your Apple device to iOS 6, you have both a microphone and Siri. The same may not be true of your desktop gear.</li>
</ul>
<p>And so I conclude that speech recognition is a big part of the future of search.</p>
<p>What will that allow? Since talking is easier than typing, speech is a way to get longer text strings as search inputs, or more of them. It&#8217;s plausible that people might speak queries as complex as:</p>
<ul>
<li>&#8220;I want to buy a recharger for an iPad 3 with delivery this week.&#8221;</li>
<li>&#8220;Where is 10gen&#8217;s Northern California office?&#8221; &#8230; &#8220;Which nearby restaurants have good Yelp reviews?&#8221;</li>
<li>&#8220;Tell me about the David Reed who went to the Kennedy School of Government around 1977, went to Dartmouth before that, and worked for the Federal Communications Commission.&#8221;</li>
</ul>
<p>Getting search engines to the point that they can handle such queries will be difficult but straightforward &#8212; but even more progress is needed. Search results for various queries will be greatly improved if the search engine &#8220;knows&#8221; things like:</p>
<ul>
<li>The location of your home and office, and the distance you&#8217;re willing to go from them to eat or shop.</li>
<li>Your tastes in food, clothing, and gadgetry.</li>
<li>The level of sophistication at which you like to read about medicine, finance, or electronics.</li>
<li>Which people are or might be in your extended social network.</li>
</ul>
<p>And that will cement internet search squarely in the world of &#8212; for once I approve of the term &#8212; <a href="http://www.dbms2.com/2011/09/11/big-data-has-jumped-the-shark/">big data</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2012/11/25/the-future-of-search/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>SOPA&#8217;s potentially chilling effect on public debate</title>
		<link>http://www.texttechnologies.com/2012/01/18/sopas-potentially-chilling-effect-on-public-debate/</link>
		<comments>http://www.texttechnologies.com/2012/01/18/sopas-potentially-chilling-effect-on-public-debate/#comments</comments>
		<pubDate>Wed, 18 Jan 2012 17:02:59 +0000</pubDate>
		<dc:creator><![CDATA[Curt Monash]]></dc:creator>
				<category><![CDATA[Online media]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/?p=538</guid>
		<description><![CDATA[SOPA (Stop Online Piracy Act) is getting blasted all over the Internet. Even so, one of its major dangers has not yet been widely discussed. People seem to realize that SOPA can create censorship by governments, or businesses, or as collateral damage when governments and businesses pursue other interests. But they may not yet grasp [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>SOPA (Stop Online Piracy Act) is getting blasted all over the Internet. Even so, one of its major dangers has not yet been widely discussed. People seem to realize that SOPA can create censorship by governments, or businesses, or as collateral damage when governments and businesses pursue other interests. But they may not yet grasp that <strong>SOPA can allow individuals to stifle free speech</strong> as well.</p>
<p>To quote the owner of a popular sports fan discussion forum (emphasis mine):</p>
<blockquote><p>The problem is several of the provisions in SOPA will force ISPs hosting  websites (ie: the company that hosts our servers) to potentially  disconnect us from the Internet if there’s a claim &#8211; unsubstantiated or  not &#8211; that we&#8217;re infringing against copyright, regardless of if it has  not been fully proved in court. The argument is that<strong> this would make it  easy for someone to make false or weak claims against the site to take a  us offline until we went to court. </strong></p>
<p><strong>That&#8217;s a headache I&#8217;m not prepared to deal with.</strong> The number of threats I  get each year via e-mail from angry members from other teams we remove  are pretty unreal and obviously you guys don&#8217;t see them, so<strong> giving any  additional ammunition backed up by a law like this would be a  potentially huge issue.</strong> I&#8217;ve been talking with other sites and it&#8217;s a  very real concern that we&#8217;re all potentially going to be faced with if  this goes through, unless it&#8217;s rewritten to better target the sites that  are really the ones they&#8217;re looking to address.</p></blockquote>
<p>And that&#8217;s just from the passions of sports fandom. The passions of politics &#8212; or the commercial interests of those being criticized &#8212; are of even greater concern.</p>
<p>Indeed, SOPA-like legislation creates an easy way to take down any forum, blog, or other site that allows user-generated content: flood it with copyrighted content, then run to the regulators. <strong>We must never, ever, ever accept a legal regime in which publishers may be censored before they are PROVED to be guilty of wrongdoing.</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2012/01/18/sopas-potentially-chilling-effect-on-public-debate/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>Freemium journalism business models, or the Launch of the Spawn of TechCrunch</title>
		<link>http://www.texttechnologies.com/2012/01/17/freemium-journalism-business-models-or-the-launch-of-the-spawn-of-techcrunch/</link>
		<comments>http://www.texttechnologies.com/2012/01/17/freemium-journalism-business-models-or-the-launch-of-the-spawn-of-techcrunch/#respond</comments>
		<pubDate>Tue, 17 Jan 2012 10:44:41 +0000</pubDate>
		<dc:creator><![CDATA[Curt Monash]]></dc:creator>
				<category><![CDATA[Online media]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/?p=526</guid>
		<description><![CDATA[In case you missed it, Sarah Lacy has launched Pando Daily, aka &#8220;Spawn of TechCrunch&#8221;. It has a clear mission statement, which she phrased as the site-of-record for that startup root-system and everything that springs up from it, cycle-after-cycle and mentor/investor/board member Mike Arrington simply called to be the paper of record for Silicon Valley [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>In case you missed it, Sarah Lacy has launched <em>Pando Daily,</em> aka &#8220;Spawn of <em>TechCrunch&#8221;.</em> It has a clear mission statement, which <a href="http://pandodaily.com/2012/01/16/why-i-started-pandodaily/">she phrased</a> as</p>
<blockquote><p>the site-of-record for that startup root-system and everything that springs up from it, cycle-after-cycle</p></blockquote>
<p>and mentor/investor/board member <a href="http://uncrunched.com/2012/01/16/sarah-lacy-lauches-pando-daily-your-new-favorite-news-site/">Mike Arrington simply called</a></p>
<blockquote><p>to be the paper of record for Silicon Valley</p></blockquote>
<p>That, I believe, is in <strong>the form a journalistic mission statement should take:</strong></p>
<ul>
<li>&#8220;We (will) offer the best X about Y&#8221;, where &#8230;</li>
<li>&#8230; &#8220;X&#8221; is something like news or analysis or opinion and &#8230;</li>
<li>&#8230; &#8220;Y&#8221; is a particular subject area.</li>
</ul>
<p>But there&#8217;s a problem with that template. One would ideally wish a mission statement of the form &#8220;We do the best A&#8221; to be followed up by &#8220;and, obviously, people will pay lots of money for A&#8221;. Journalistic mission statements don&#8217;t have that nice property.</p>
<p>Fortunately, at least in the case of tech blogging, they do tend to have a nice substitute. Let me explain.</p>
<p><span id="more-526"></span><em>TechCrunch</em> and <em>Pando Daily</em> seem to have the same business plan:</p>
<ul>
<li>Create a popular and respected blog.</li>
<li>Use the access provided by that popularity and respect to populate great conferences.</li>
<li>Use the readership provided by that blog to promote the conferences.</li>
<li>Ka-ching.</li>
</ul>
<p>I have an analogous plan for <em><a href="http://www.dbms2.com">DBMS 2</a>:</em></p>
<ul>
<li>Create a popular and respected blog.</li>
<li>Use the access provided by that popularity and respect to inform great consulting.</li>
<li>Use the readership provided by that blog to promote the consulting.</li>
<li>Ka-ching.</li>
</ul>
<p>Other business models, such as <em>GigaOm&#8217;s,</em> would seem to be a hybrid of our two. All are what could be called &#8220;freemium&#8221; models, even if the other guys (and gals) sell a few ads as well. All seem to work.</p>
<p>Here&#8217;s what I think is the non-obvious part of our models:</p>
<p><strong>Different parts of our readership are important for different reasons.</strong></p>
<p>To a first approximation:</p>
<ul>
<li>Everybody who reads our work and benefits from it makes us feel good, and motivates us to do more.</li>
<li>Everybody who reads our work and is influenced by it makes tech vendors want to be on our good side, talk to us, give us insight, please us by speaking at our events, and so on.</li>
<li>A moderate fraction of our readers help us expand our readership by word-of-mouth.</li>
<li>Only a small fraction of our readers chip in with helpful blog comments, insightful/tip-off e-mail, and the like, or by publicly throwing us links/tweets.</li>
<li>Only a small fraction of our readers are likely to ever give us money.</li>
</ul>
<p>I think a lot of successful journalistic (or quasi-journalistic) business models will be similarly layered.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2012/01/17/freemium-journalism-business-models-or-the-launch-of-the-spawn-of-techcrunch/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Social technology in the enterprise</title>
		<link>http://www.texttechnologies.com/2011/09/14/social-technology-in-the-enterprise/</link>
		<comments>http://www.texttechnologies.com/2011/09/14/social-technology-in-the-enterprise/#comments</comments>
		<pubDate>Wed, 14 Sep 2011 06:04:36 +0000</pubDate>
		<dc:creator><![CDATA[Curt Monash]]></dc:creator>
				<category><![CDATA[E-discovery]]></category>
		<category><![CDATA[Social software and online media]]></category>
		<category><![CDATA[Voice of the Customer]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/?p=510</guid>
		<description><![CDATA[The recent Dreamforce conference (i.e., salesforce.com&#8217;s extravaganza) focused attention on &#8220;the social enterprise&#8221; or, more generally, enterprises&#8217; uses of social technology. Salesforce is evidently serious about this push, with development/acquisition investment (e.g. Chatter, Radian6), marketing focus (e.g. much of Dreamforce) and sales effort (Marc Benioff says he got thrown out of a CIO&#8217;s office [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>The recent Dreamforce conference (i.e., salesforce.com&#8217;s extravaganza) focused attention on &#8220;the social enterprise&#8221; or, more generally, enterprises&#8217; uses of social technology. Salesforce is evidently serious about this push, with development/acquisition investment (e.g. Chatter, Radian6), marketing focus (e.g. much of Dreamforce) and sales effort (Marc Benioff says he got thrown out of a CIO&#8217;s office because he wouldn&#8217;t stop talking about the &#8220;social&#8221; subject) all aligned.</p>
<p><em><a href="http://www.enterpriseirregulars.com/41437/some-economic-consequences-of-dreamforce/?utm_source=feedburner&amp;utm_medium=twitter&amp;utm_campaign=Feed%3A+EIblogs+%28Enterprise+Irregulars%29">Denis Pombriant</a> obviously attended the same Marc Benioff session I did. <a href="http://www.zdnet.com/blog/hinchcliffe/the-promise-and-challenges-of-benioffs-social-enterprise-vision/1722">Dion Hinchcliffe</a> blogged the whole story in considerable detail.</em></p>
<p>It&#8217;s a cool story, and worthy of attention. But I&#8217;d like to step back and remind us that there are numerous different ways to use social technology in the enterprise, which probably shouldn&#8217;t be confused with each other. And then I&#8217;d like to discuss one area of social technology that&#8217;s relatively new to me: <strong>integration between social and operational applications.</strong></p>
<p><span id="more-510"></span>Suppose we split up social technology use cases by saying it can help you:</p>
<ul>
<li>Communicate and collaborate internally &#8230;</li>
<li>&#8230; and also with small groups of outsiders, such as your supply chain.</li>
<li>Observe, listen to, and interact with consumers (and the world at large).</li>
</ul>
<p>The biggest buzz, of course, is around social technology that reaches out to the buying public or world at large. You can use social technology to:</p>
<ul>
<li>Observe and listen to consumers &#8212; i.e., classic <a href="../../../../../category/text-analytics-applications/voice-of-the-customer/">Voice of the Customer/Voice of the Market</a> text analytics.</li>
<li>Publish to consumers, influencers, etc., via blogging, broadcast-oriented Twitter, and other social media, or go even further and &#8230;</li>
<li>&#8230; communicate with consumers interactively, whether through loosely-structured interaction (e.g. Twitter), or in the more structured ways that <a href="../../../../../2010/12/01/state-of-the-art-text-analytics-mining-applications/">Attensity</a> and others provide.</li>
</ul>
<p>I support all that, and indeed participate ferociously myself. But for now, let&#8217;s move on.</p>
<p>On the internal collaboration/communication side, I&#8217;d say:</p>
<ul>
<li>Any communication tool useful for communicating with the public may be valuable internally as well &#8212; <a href="http://www.monashreport.com/2006/01/20/the-power-of-portals/">portals</a>, blogs, Twitter-imitators, and so on.</li>
<li>Pure email &#8220;push&#8221; may not always be the best tool for point-to-point internal communication.</li>
<li>Text analytics on internal communication can have a variety of uses, e.g.:
<ul>
<li>Compliance (yet another privacy intrusion, but sometimes a legitimate one).</li>
<li>Internal expert-finding. (In principle, this is the traditional genuine benefit of elaborate &#8220;knowledge management&#8221; implementations, but without the burdens of traditional knowledge management. In practice, that didn&#8217;t work out so great for <a href="http://en.wikipedia.org/wiki/Tacit_Software">Tacit Software</a>.)</li>
<li><a href="../../../../../2006/07/11/google-project-knowledge-management/">Project management</a>.</li>
</ul>
</li>
</ul>
<p>That all gives plenty of scope for useful adoption, on both the email-replacement and text-analytic sides. But again, let&#8217;s keep going.</p>
<p>The part of the social technology story that is relatively new to me &#8212; notwithstanding the &#8220;portals&#8221; link above &#8212; is <strong>integration between social and operational applications.</strong> While at Dreamforce, I talked with two manufacturing application SaaS vendors &#8212; Kenandy and Rootstock Software. In both cases I asked &#8220;So what are you doing that&#8217;s an advance over where MRP* was 20 years ago?&#8221; In both cases the main answer was &#8220;Now users can use social technology to track and communicate about particular orders or issues.&#8221;</p>
<p><em>*MRP stood for &#8220;Material Requirements Planning&#8221; and then &#8220;Manufacturing Resources Planning&#8221;, and is essentially the forerunner of ERP. By &#8220;Kenandy&#8221; I specifically mean Kenandy&#8217;s founder &#8212; ASK Computer Systems founder and thus MRP legend Sandy Kurtzig.</em></p>
<p>Good point. Of course, it can be generalized; <strong>one can communicate and collaborate around almost any kind of business process. </strong>I&#8217;ve mentioned this before in analytic contexts; it&#8217;s an important concept on the monitoring-oriented side of <a href="http://www.dbms2.com/2009/05/30/reinventing-business-intelligence/">business intelligence</a> and &#8212; if <a href="http://www.dbms2.com/2010/10/06/ebay-followup-greenplum-out-teradata-10-petabytes-hadoop-has-some-value-and-more/">Oliver Ratzesberger</a> is to be believed &#8212; in investigative analytics as well. But the operational side may actually be more important.</p>
<p>Some things one does in the business world actually involve using one&#8217;s body, from manufacturing products to repairing power stations to standing in a store and serving customers. Most of the rest fits into one or more of three buckets:</p>
<ul>
<li>Creating (a product, a marketing plan, a marketing document, a compensation plan, a program for internal use, an analytic insight, &#8230;)</li>
<li>Relating (to an employee, a sales prospect, a reporter, &#8230;)</li>
<li>Participating in a fairly routine business process (data entry, accounting, mortgage approval, parts ordering, &#8230;)</li>
</ul>
<p>And why can&#8217;t we just automate those routine business processes away? Because there&#8217;s so often a need for manual intervention. And <strong>when there&#8217;s a need for manual intervention, there&#8217;s usually also an element of communicating with other people.</strong> This is almost always true in cases of trouble-shooting or exception-handling (an order is late, a system is down, the automated result violates common sense). It may be present in other cases as well (the new account calls for a personal thank you note, the food order needs to be annotated with special requests). General email is commonly an awkward medium for these communications; automated messages are worse. Newer social technologies, however, have the potential to do much better.</p>
<p><em>So what do you think? Have I drunk too much Kool-Aid, or is this stuff for real?</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2011/09/14/social-technology-in-the-enterprise/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>The Text Analytics Summit needs to be replaced</title>
		<link>http://www.texttechnologies.com/2011/05/12/the-text-analytics-summit-needs-to-be-replaced/</link>
		<comments>http://www.texttechnologies.com/2011/05/12/the-text-analytics-summit-needs-to-be-replaced/#comments</comments>
		<pubDate>Fri, 13 May 2011 00:17:34 +0000</pubDate>
		<dc:creator><![CDATA[Curt Monash]]></dc:creator>
				<category><![CDATA[Text Analytics Summit]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/?p=485</guid>
		<description><![CDATA[I wasn&#8217;t asked to moderate a panel at the Text Analytics Summit because the guy running it &#8212; NOT Seth Grimes &#8212; didn&#8217;t feel &#8220;comfortable&#8221; with me doing so.  (I wanted real discussion; Ezra evidently just wanted to buy off sponsors and partners with marketing-opportunity slots.)  I also wasn&#8217;t given a press pass.* (Although uninterested [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>I wasn&#8217;t asked to moderate a panel at the Text Analytics Summit because the guy running it &#8212; NOT Seth Grimes &#8212; didn&#8217;t feel &#8220;comfortable&#8221; with me doing so.  (I wanted real discussion; Ezra evidently just wanted to buy off sponsors and partners with marketing-opportunity slots.)  I also wasn&#8217;t given a press pass.* (Although uninterested in the sessions, I was interested in stopping by and meeting some newer vendors.)</p>
<p><em>*This is even though I&#8217;ve spoken at four prior versions of the event, and responded to their request for free consulting as recently as this year.</em></p>
<p>OK, that might have been personal in some way &#8212; but Nick Patience <a href="http://twitter.com/#!/nickpatience/status/68802265036759040">tweets</a> a very similar story. Even Seth himself <a href="http://twitter.com/#!/SethGrimes/status/68818936245919744">tweets</a> that</p>
<blockquote><p>They have a business model that does not apply well to the IT conference  space.</p></blockquote>
<p><a href="http://www.texttechnologies.com/2007/01/11/text-analytics-summit-a-promising-idea-gone-bad/">The Text Analytics Summit has been troubled for years</a>, but evidently things have gotten worse.</p>
<p>This is more than an incidental problem. Interest in text data is exploding, and marketplace confusion about text analytic technology abounds. More clarity is needed, but too few folks have found an economic model for providing it. (The industry shares some of the blame for that.) I&#8217;m glad Seth is doing other conference work &#8212; notably on sentiment analysis &#8212; but yet more is needed.</p>
<p>If I get into the conference business &#8212; and it seems natural that I would &#8212; I&#8217;ll try to help fill the gap. But if somebody else beats me to the punch, more power to you, and please let me know how I can help.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2011/05/12/the-text-analytics-summit-needs-to-be-replaced/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>The state of the art in text analytics applications</title>
		<link>http://www.texttechnologies.com/2010/12/01/state-of-the-art-text-analytics-mining-applications/</link>
		<comments>http://www.texttechnologies.com/2010/12/01/state-of-the-art-text-analytics-mining-applications/#comments</comments>
		<pubDate>Thu, 02 Dec 2010 02:06:54 +0000</pubDate>
		<dc:creator><![CDATA[Curt Monash]]></dc:creator>
				<category><![CDATA[Attensity]]></category>
		<category><![CDATA[BI integration]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[SPSS]]></category>
		<category><![CDATA[Text mining]]></category>
		<category><![CDATA[Voice of the Customer]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/?p=443</guid>
		<description><![CDATA[Text analytics application areas typically fall into one or more of three broad, often overlapping domains: Understanding the opinions of customers, prospects, or other groups. This can be based on any combination of documents the user organization controls (email, surveys, warranty reports, call center logs, etc.) or public-domain documents such [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Text analytics application areas typically fall into one or more of three broad, often overlapping domains:</p>
<ul>
<li><strong>Understanding the opinions of customers, prospects, or other groups.</strong> This can be based on any combination of documents the user organization controls (email, surveys, warranty reports, call center logs, etc.) or public-domain documents such as blogs, forum posts, and tweets. The former is usually called <strong>Voice of the Customer (VotC),</strong> while the latter is <strong>Voice of the Market (VotM).</strong></li>
<li><strong>Detecting and identifying problems.</strong> This can happen across many domains &#8212; VotC, VotM, diagnosing equipment malfunctions, identifying bad guys (from terrorists to fraudsters), or even getting early warnings of infectious disease outbreaks.</li>
<li><strong>Aiding text search, custom publishing, and other electronic document-shuffling use cases,</strong> often via document <a href="http://www.dbms2.com/2010/11/29/data-that-is-derived-augmented-enhanced-adjusted-or-cooked/">augmentation</a>.</li>
</ul>
<p>For several years, I&#8217;ve been distressed at the lack of progress in text analytics or, as it used to be called, text mining. Yes, the rise of <a href="../../../../../category/text-mining/sentiment-analysis/">sentiment analysis</a> has been impressive, and higher volumes of text data are being processed than were before. But otherwise, there&#8217;s been a lot of the same old, same old. Most actual deployed applications of text analytics or text mining go something like this:</p>
<ul>
<li>A bunch of documents are analyzed to ascertain the ideas expressed in them.</li>
<li>A count is made as to how many times each idea turns up.</li>
<li>The application user notices any surprisingly large numbers, and as a result pays attention to the corresponding ideas.</li>
</ul>
<p>Often, it seems desirable to integrate text analytics with business intelligence and/or predictive analytics tools that operate on tabular data. Even so, such <strong>integration is most commonly weak or nonexistent.</strong> Apart from the usual reasons for silos of automation, I blame this lack on a mismatch in precision, among <a href="../../../../../2008/10/24/text-mining-data-warehousin/">other reasons</a>. A 500% increase in mentions of a subject could be simple coincidence, or the result of a single identifiable press article. In comparison, a 5% increase in a conventional business metric might be much more important.</p>
<p>But in fairness, <strong>the text analytics innovation picture hasn&#8217;t been quite as bleak as what I&#8217;ve been painting so far. </strong><span id="more-443"></span>While standalone, passively-reported text analytics is indeed the baseline, there are some interesting exceptions. For example:</p>
<ul>
<li>I once confirmed that SPSS customer <a href="http://www.spss.com/press/template_view.cfm?PR_ID=1059">Cablecom</a>&#8217;s statistical models for churn and the like absolutely included text data; Cablecom even assigned different weights to the same apparent level of emotion depending on whether the text was in German, French, or Italian. Vertica recently told me of a <a href="http://www.dbms2.com/2010/10/12/vertica-hadoop-connector-integration/">Vertica/Hadoop</a> customer doing something similar, except for the multilingual aspect. And the end of a <a href="http://www2.sas.com/proceedings/forum2008/123-2008.pdf">2008 SAS-based paper</a> makes similar claims.</li>
<li>There long* have been some examples of fact extraction that don&#8217;t really fit into my three buckets above. For example, researchers mine collections of articles to try to determine biochemical or biological pathways that would not be apparent from examining single research studies alone.</li>
<li>It also has long* been the case that some bad-guy-finding applications &#8212; especially in the anti-terrorism area &#8212; used text analytics to populate state-of-the-art <a href="http://www.dbms2.com/2009/08/21/social-network-analysis-aka-relationship-analytics/">graph-oriented data analysis tools</a>.</li>
</ul>
<p><em>*When it comes to text analytics, &#8220;long&#8221; means &#8220;at least for the past several years.&#8221;</em></p>
<p>In more recent examples:</p>
<ul>
<li><a href="http://www.dbms2.com/category/products-and-vendors/greenplum/">Greenplum</a> built a document recommender for law firms that does hard-core statistical analysis to determine which 0.1% of a document set lawyers might actually want to see, and which then learns from users&#8217; feedback after they respond to initial result sets.</li>
<li><a href="../../../../../2008/09/19/low-latency-text-mining-in-the-investment-market/">Information extracted from investment news</a> gets incorporated into automated trading algorithms. This was unusual technology a couple of years ago, but is more common today.</li>
<li>After a series of mergers, <a href="../../../../../2009/04/20/the-new-attensity-deal-overview/">Attensity</a> now uses marketing-oriented text analytics in at least three different ways:
<ul>
<li>Attensity text analytics feeds marketing dashboards just as it always did.</li>
<li>Attensity text analytics triggers alerts, as I wish dashboards and business intelligence tools more often did, <a href="http://www.dbms2.com/2010/07/25/alerts-metrics-dashboards/">the false positives problem</a> notwithstanding.</li>
<li>Attensity text analytics triggers concrete workflows, for example <a href="http://www.attensity.com/2010/10/05/attensity-announces-respond-for-social-media/">routing specific social media hits for priority response</a>.</li>
</ul>
</li>
<li>And in one example that did not actually get into production, a very large social networking company correlated word usage (e.g., choice among different synonyms) against user characteristics such as age and gender.</li>
</ul>
<p>Finally there are some applications that, while fitting the standard template, just strike me as getting to unusually sophisticated levels of analysis. For example, Vertica told me of another Vertica/Hadoop case where VotM document analysis is carried out to the level of observing which order brand names appear in, and adjusting that for whether or not it was just an alphabetical list.</p>
<p>I suspect <strong>text analytics is about to become more interesting again.</strong></p>
<p><strong><em>Related links</em></strong></p>
<ul>
<li>The enabling <a href="../../../../../2006/06/24/attensity-extractive-exhaustion-and-the-frn/">technology for text/tabular data integration</a> has existed for years.</li>
<li>In 2006, I listed <a href="http://www.monashreport.com/2006/09/08/where-does-data-mining-succeed-and-why/">major application areas for data mining/predictive analytics</a>. That list overlaps pretty closely with the corresponding one for text mining/text analytics.</li>
<li>Before being acquired by IBM, <a href="../../../../../2008/06/17/spss-update/">SPSS boasted a rather large text mining user base</a>.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2010/12/01/state-of-the-art-text-analytics-mining-applications/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Notes, links, and comments, October 24, 2010</title>
		<link>http://www.texttechnologies.com/2010/10/24/notes-links-and-comments-october-24-2010/</link>
		<comments>http://www.texttechnologies.com/2010/10/24/notes-links-and-comments-october-24-2010/#respond</comments>
		<pubDate>Sun, 24 Oct 2010 08:58:25 +0000</pubDate>
		<dc:creator><![CDATA[Curt Monash]]></dc:creator>
				<category><![CDATA[Blogosphere]]></category>
		<category><![CDATA[Online media]]></category>
		<category><![CDATA[Sentiment analysis]]></category>
		<category><![CDATA[Social software and online media]]></category>
		<category><![CDATA[Text mining]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/?p=433</guid>
		<description><![CDATA[Time for a notes/links/comments post just for Text Technologies:  TechCrunch got sold, GigaOm raised money, and VentureBeat/MediaBeat provided a good starting link for both those stories and more.  Since TechCrunch and GigaOm are/were both private, financial details are murky, but: TechCrunch is variously reported as having revenue in the $6-10 million range, probably mainly from [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Time for a notes/links/comments post just for <em>Text Technologies:  <span id="more-433"></span><br />
</em></p>
<ul>
<li><a href="http://venturebeat.com/2010/10/22/gigaom-raises-2-5m-claims-10000-pro-subscribers/">TechCrunch got sold, GigaOm raised money</a>, and VentureBeat/MediaBeat provided a good starting link for both those stories and more.  Since TechCrunch and GigaOm are/were both private, financial details are murky, but:
<ul>
<li>TechCrunch is variously reported as having revenue in the $6-10 million range, probably mainly from events. (If you believe that they sell ~3000 total tickets at ~$2000 each to two annual versions of TechCrunch Disrupt, that makes sense.)</li>
<li>GigaOm reports &gt;10,000 subscribers to market research service (sort of) GigaOm Pro, at $199, apparently concentrated on the vendor side.</li>
</ul>
</li>
<li>John Gruber straightforwardly posts <a href="http://daringfireball.net/feeds/sponsors/">both ad rates and circulation</a> for his blog. It&#8217;s a simple $5000/week for readership that exceeds mine by &gt;1 order of magnitude.</li>
<li>The <em>New Yorker</em> points out <a href="http://www.newyorker.com/online/blogs/johncassidy/2010/10/nick-denton.html">Gawker Media may not yet have crossed $20 million in revenue</a>.</li>
<li>An <a href="http://www.niemanlab.org/2010/10/aps-ascap-for-news-%E2%80%94-new-ecosystem-new-revenue-streams-new-enterprise-opportunities/">&#8220;ASCAP for news&#8221;</a> seems to finally be on the way.</li>
<li>Business Week/Bloomberg notices a trend that <a href="http://www.businessweek.com/magazine/content/10_44/b4201020317862.htm">social-media/Voice of the Customer/Voice of the Market text analytics firms are getting acquired by bigger marketing-oriented firms</a>. Seth Grimes, however, argues that the same trend is <a href="http://intelligent-enterprise.informationweek.com/blog/archives/2010/10/social_market_l.html">already passé</a>.</li>
<li><a href="http://techcrunch.com/2010/10/22/wall-street-journal-investigation-into-myspace-was-quietly-killed/">TechCrunch</a> accused the Wall Street Journal of killing a story about sister company MySpace, then quickly running it after TechCrunch caught them.</li>
<li>LinkedIn has a really cool-looking tech blog. One recent post describes LinkedIn&#8217;s approach to <a href="http://sna-projects.com/blog/2010/10/linkedin-signal-a-look-under-the-hood/">socially-informed search</a>. I read about it in a thoughtful post on <a href="http://thenoisychannel.com/2010/10/02/linkedin-signal-exploratory-search-for-twitter/">Daniel Tunkelang&#8217;s blog</a>.</li>
<li><a href="http://sports.espn.go.com/espn/page2/story?page=simmons/101013">Bill Simmons took 3843 words to explain the story of a two-word tweet</a> &#8212; &#8220;moss Vikings.&#8221; Somewhere in there are a few interesting ruminations about media in the current age.</li>
<li>Some notes and links that actually belong here instead went up on <em><a href="http://www.dbms2.com/2010/10/03/notes-and-links-october-3-2010/">DBMS 2</a></em> a few weeks ago.</li>
<li>About half of what I write about <a href="http://www.dbms2.com/category/liberty-privacy/">liberty and privacy</a> is highly relevant to the subjects of this blog, including almost all of <a href="http://www.dbms2.com/2010/10/24/the-privacy-discussion-is-heating-up/">today&#8217;s post</a>.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2010/10/24/notes-links-and-comments-october-24-2010/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A framework for thinking about New Media journalism</title>
		<link>http://www.texttechnologies.com/2010/09/28/a-framework-for-thinking-about-new-media-journalism/</link>
		<comments>http://www.texttechnologies.com/2010/09/28/a-framework-for-thinking-about-new-media-journalism/#comments</comments>
		<pubDate>Tue, 28 Sep 2010 05:54:10 +0000</pubDate>
		<dc:creator><![CDATA[Curt Monash]]></dc:creator>
				<category><![CDATA[Online media]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/?p=414</guid>
		<description><![CDATA[Jonathan Stray reminds us of an excellent point: New Media journalism should be thought of as a product that people use, not as a collection of stories or other pieces. In particular, he argues: The value of journalism can only be assessed in connection with how people use it &#8230; &#8230; and their lack of enthusiasm [&#8230;]]]></description>
				<content:encoded><![CDATA[<p><a href="http://jonathanstray.com/designing-journalism-to-be-used">Jonathan Stray</a> reminds us of an excellent point:<br />
<strong><br />
New Media journalism should be thought of as a product that people use, not as a collection of stories or other pieces.</strong></p>
<p>In particular, he argues:</p>
<ul>
<li>The value of journalism can only be assessed in connection with how people use it &#8230;</li>
<li>&#8230; and their lack of enthusiasm about New Media news is a warning sign.</li>
<li>Technology and form factor matter; imitating old media is likely not the best way to go.</li>
<li>Personalization and targeting need to be a lot better. In particular:
<ul>
<li>What&#8217;s most important is getting stories to the people who are likely to want to act on what&#8217;s in them. <strong>The true value of journalism lies in informing people&#8217;s choices and actions.</strong> (By contrast, he seems to denigrate the other main benefits of news, which are pure entertainment and/or the facilitation of social interaction.)</li>
<li>It&#8217;s OK and natural that <strong>the people inclined to act</strong> &#8212; on  a given story or indeed at all &#8212; <strong>are only a small fraction of the overall population.</strong></li>
</ul>
</li>
</ul>
<p>I am in vehement agreement with much of what Stray has to say, although I think he understates the importance of general knowledge and the often serendipitous benefits of pursuing same. <span id="more-414"></span>For example:</p>
<ul>
<li>I tend to assume that what we write <a href="http://www.strategicmessaging.com/further-notes-on-ethics-and-analyst-research/2010/08/02/">affects people&#8217;s choices</a> by supporting their <a href="http://www.texttechnologies.com/2009/03/29/where-i-think-the-information-ecosystem-is-headed/">informed judgments</a>.</li>
<li>I think it is neither necessary nor acceptable to let <a href="http://www.texttechnologies.com/2010/09/26/how-to-preserve-investigative-reporting-in-the-new-media-era/">investigative reporting</a> wane.</li>
<li>I have witheringly negative opinions about <a href="http://www.strategicmessaging.com/often-the-best-press-release-is-the-one-you-dont-issue/2010/04/01/">vacuous</a> &#8220;<a href="http://www.strategicmessaging.com/barney-partnerships/2010/08/12/">news</a>.&#8221;</li>
</ul>
<p>And I indeed try to practice what Stray preaches. Most of my own posts &#8212; especially when you weight them by length and/or time spent researching and writing them &#8212; are designed to help at least some people make on-the-job decisions.</p>
<ul>
<li>I do just mean &#8220;help,&#8221; the assumption being that people read my work as part of a general research process.</li>
<li>That lots of you read more for general interest or education is great. I suspect you still like the standard of quality to which I aspire, namely that what I write should in most cases actually be <strong>informative even to people who have reason to be well-informed in the area already.</strong></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2010/09/28/a-framework-for-thinking-about-new-media-journalism/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>How to preserve investigative reporting in the New Media Era</title>
		<link>http://www.texttechnologies.com/2010/09/26/how-to-preserve-investigative-reporting-in-the-new-media-era/</link>
		<comments>http://www.texttechnologies.com/2010/09/26/how-to-preserve-investigative-reporting-in-the-new-media-era/#comments</comments>
		<pubDate>Sun, 26 Sep 2010 12:18:36 +0000</pubDate>
		<dc:creator><![CDATA[Curt Monash]]></dc:creator>
				<category><![CDATA[Online media]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/?p=405</guid>
		<description><![CDATA[It is common to say that “On the whole, journalism will be fine even as the media industry is disrupted – but the investigative part of journalism may not fare so well.” Indeed, I took something like that stance in my May, 2009 post on where the information ecosystem is headed and even more directly [&#8230;]]]></description>
				<content:encoded><![CDATA[<p style="margin-bottom: 0in;">It is common to say that “On the whole, journalism will be fine even as the media industry is disrupted – but the investigative part of journalism may not fare so well.” Indeed, I took something like that stance in my May, 2009 post on <a href="http://www.texttechnologies.com/2009/03/29/where-i-think-the-information-ecosystem-is-headed/">where the information ecosystem is headed</a> and even more directly in <a href="http://www.texttechnologies.com/2009/05/08/consumer-reports-national-enquirer-the-future-of-free-societies/">an earlier piece that month</a>. However, I&#8217;ve changed my mind in an optimistic direction, and now believe:</p>
<p><strong>There are still some things we need to do to preserve and extend the societal benefits of investigative reporting. But they are straightforward and very likely to happen.</strong></p>
<p style="margin-bottom: 0in;">Specifically, I recommend:  <span id="more-405"></span></p>
<ul>
<li>Public-spirited law-oriented types should do a better job of popularizing <strong>tips for how to get information out of government</strong> (Freedom of Information Act and all that). And back it up with more <strong>pro bono or charitably-funded legal assistance</strong><span style="font-weight: normal;"> – not just for specific causes, but for general corruption investigations as well. </span>
<ul>
<li><span style="font-weight: normal;">I&#8217;m 	sure quite a bit</span> of that is happening, but it should be much 	more visible and active.</li>
</ul>
</li>
<li><strong>Domain-specific websites</strong> should be created and promoted that <strong>seek out and call attention 	to negative stories in their particular areas,</strong> especially for 	specific industries or geographical regions.
<ul>
<li>A lot of those exist targeted at 	specific large companies people have grudges against, but otherwise 	they&#8217;re much too hard to find.</li>
</ul>
</li>
<li><strong>Reporters need to be in the 	habit of seeking out stories first uncovered by other people.</strong>
<ul>
<li>They do this already, but they 	need to get better.</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;">Below, at considerable length, is why I think those developments are both necessary and sufficient to carry the tradition of investigative journalism forward into the new media era.</p>
<p style="margin-bottom: 0in;">For there to be public benefit from reporting, three things generally need to occur:</p>
<ul>
<li><strong>Disclosure or discovery</strong> of 	the raw facts. Without that, you don&#8217;t have reporting or news.</li>
<li><strong>Analysis or interpretation.</strong> This stage can be optional when the purpose of news is 	entertainment, societal bonding, or whatever. But it&#8217;s pretty 	central to investigative journalism.</li>
<li><strong>Distribution and 	popularization.</strong> It doesn&#8217;t do much good to uncover an important 	story unless people notice and care about it. Old media, with its 	emphases on writing, curation, and physical distribution, almost 	defines itself by this stage. (E.g., “paper” is part of the word 	“newspaper.”)</li>
</ul>
<p style="margin-bottom: 0in;">Disclosure and discovery come in two main forms:</p>
<ul>
<li>Serendipity.</li>
<li>Spadework.</li>
</ul>
<p style="margin-bottom: 0in;">The <strong>serendipity</strong> part often seems to work well in the new media. Let&#8217;s go to some examples.</p>
<ul>
<li><a href="http://wikileaks.org/">Wikileaks</a> is a hugely successful case – people send Wikileaks documents or 	other files (a process that only makes sense with modern 	technology), and Wikileaks posts them.
<ul>
<li><em>Note: There was an  article yesterday about 	<a href="http://www.spiegel.de/international/germany/0,1518,719561,00.html">“internal 	strife” at Wikileaks</a> – but the gist turned out to be that 	Wikileaks, already highly influential, could be doing even more than 	it already is.</em></li>
</ul>
</li>
<li>Michael Arrington found out about 	a meeting of major angel investors – perhaps originally via a 	tweet – and kicked off <a href="http://techcrunch.com/2010/09/21/so-a-blogger-walks-into-a-bar/">a 	major technology industry news story now known as “Angelgate”</a>.</li>
<li>An anonymous tipster spent 2 ½ 	hours IMing with me to reveal <a href="http://www.dbms2.com/2010/09/17/jp-morgan-chase-oracle-database-outage/">the true 	cause of the JP Morgan Chase site outages</a>.
<ul>
<li>Motivation: Because s/he felt 	Chase&#8217;s technology organization was being unfairly maligned by prior 	coverage.</li>
<li>Why me: Because <a href="http://www.dbms2.com/2010/09/16/chase-authentication-database-outage/">my previous speculative post about the JP Morgan Chase outages</a> had shown up in the search engines and looked pretty credible.</li>
<li>Result:  Enough accurate tech 	details of a major consumer embarrassment to create a “teachable 	moment,” even though the concerned parties were trying to cover 	them up.</li>
</ul>
</li>
<li>An assisted living/nursing home in 	Dublin, Ohio called <a href="http://www.dbms2.com/2010/09/13/friendship-village-of-dublin-medical-information/">Friendship 	Village</a> misbehaved toward my parents and me. I blogged about the 	problem, and it&#8217;s in the search engines now. If this turns out to be 	a pattern of behavior rather than an isolated incident, they&#8217;ll have 	some deserved trouble.</li>
</ul>
<p style="margin-bottom: 0in;">The story on the <strong>spadework</strong> side is more mixed. For example, there&#8217;s evidence I did as good a job on the JP Morgan Chase story as conventional media could today – <a href="http://www.dbms2.com/2010/09/24/a-little-more-on-the-jpmorgan-chase-oracle-outage/"><em>Computerworld</em> ran a story based on my post</a>, without being able to uncover a single detail I hadn&#8217;t already found. But in the old-media-economics days, perhaps <em>Computerworld</em> would have had the resources to try harder and find something I didn&#8217;t. (E.g., I screwed up and didn&#8217;t actually get the details of the specific Oracle bug.) A bigger problem is outlined in this <a href="http://www.npr.org/templates/story/story.php?storyId=130108851">story on the uncovering of massive corruption in the California town of Bell</a>. To wit (emphasis mine):</p>
<blockquote>
<p style="margin-bottom: 0in;">The new media ecosystem, in which citizen bloggers, small news outlets and big old-school media outlets effectively draw upon one another&#8217;s work to collaborate, didn&#8217;t quite work out in this case.</p>
<p style="margin-bottom: 0in;">One blogger actually has anonymously and exhaustively alleged corruption in Bell for years …</p>
<p>The paper&#8217;s reporters say the blogger gave them tips. Though he&#8217;s a bit frustrated not to get more credit, he says the newspaper&#8217;s reporting muscle and much bigger audience gave life to the story in a way his website simply couldn&#8217;t. He counts his readers in the scores; The <em>L.A. Times </em>has hundreds of thousands of subscribers &#8230;</p>
<p>&#8230; some residents said they had gone to city hall to get their own answers. In essence, they were trying to do their own reporting on why their tax bills were so high and on rumors city officials were making a ton of money.</p>
<p>They got nowhere. &#8230;</p>
<p><strong>&#8220;As a common citizen, I don&#8217;t know what my rights are with the city. I don&#8217;t know really how to attack them,&#8221;</strong> Sanchez said. <strong>&#8220;The <em>Times</em>, they have their legal departments. Of course, they&#8217;re able to get it more than a regular Joe like me.&#8221;</strong></p></blockquote>
<p style="margin-bottom: 0in;"><strong>The citizens of Bell needed some place to turn for help, other than the overworked </strong><em><strong>LA Times</strong></em><strong> reporters who eventually uncovered the story on their own.</strong> Hence my first recommendation near the top of this post.</p>
<p style="margin-bottom: 0in;">In many ways, <strong>analysis and interpretation</strong> work well in the new media era already. After all, there&#8217;s a whole world-wide web of self-appointed volunteer analysts on any issue you&#8217;d care to name! Yes, there are legitimate concerns about fragmentation and echo chambers, in which people only listen to the analysis of those folks who shared their biases to begin with. But those are hardly a barrier to muckraking – if anything, quite the contrary, as illustrated by the <a href="http://www.huffingtonpost.com/john-atlas/fake-acorn-pimp-pleads-gu_b_591708.html">bogus ACORN prostitute/pimp advice scandal</a>. (If your politics lean to the conservative side, think instead of something like a Michael Moore film.)</p>
<p style="margin-bottom: 0in;">Or returning to the examples above:</p>
<ul>
<li>Wikileaks&#8217; biggest leaks are 	widely analyzed by all sorts of commentators, including top-flight 	mainstream media people and a broad variety of online commentators 	alike. I&#8217;ll confess I didn&#8217;t find any analysis of Wikileaks&#8217; 	revelations about, say, Iceland or the Turks &amp; Caicos Islands, 	but I&#8217;ll also confess to not looking very hard.</li>
<li>For the technology news uncovered respectively 	by Arrington and me, pretty much the ideal people to analyze it 	were, respectively – well, they were Arrington and me.
<ul>
<li>In the case of Angelgate, much 	<a href="http://gigaom.com/2010/09/22/angelgate-update-what-the-web-is-saying/">other</a> <a href="http://techcrunch.com/2010/09/23/ron-conway-angel-email/">analysis</a> (and news) ensued.</li>
<li>Analysis of the JP Morgan Chase 	outage details hasn&#8217;t yet gone all that far past me – but I 	already turned it into <a href="http://www.dbms2.com/2010/09/24/a-little-more-on-the-jpmorgan-chase-oracle-outage/">a 	“don&#8217;t make the same mistake JP Morgan Chase did” lesson</a>.</li>
</ul>
</li>
<li>The Friendship Village case is 	being used as a cornerstone of my slowly-unfolding analysis of the 	general problem with medical records.</li>
</ul>
<p style="margin-bottom: 0in;">And that brings us to <strong>distribution and popularization.</strong> The most brilliant sleuthing in the world doesn&#8217;t help people very much if they – or their lawmakers/regulators/advisers/whatever – don&#8217;t find out about it.</p>
<ul>
<li>Wikileaks has that problem solved 	for its biggest leaks, but perhaps not for the others.</li>
<li>Arrington&#8217;s <a href="http://www.texttechnologies.com/2009/05/23/techcrunch-offers-to-pay-a-sources-legal-expenses/">TechCrunch</a> is a top news outlet in 	his area, so the problem was automatically solved for him.</li>
<li><a href="http://www.dbms2.com/"><em>DBMS 	2</em></a> is a fairly serious outlet for database-related news. But 	in any case the JP Morgan Chase story was picked up by general trade 	press and financial-industry-specific press alike.</li>
<li>As noted in the story on Bell, CA, 	nobody was paying attention to a blogger who apparently had worked 	quite a bit of it out.</li>
<li>And if there&#8217;s anything you found lacking in my list of analysis/interpretation examples – well, if a story were picked up more broadly, then analysis/interpretation might be stronger as well.</li>
</ul>
<p>Almost nobody would ever see my 	Friendship Village story if I didn&#8217;t happen to own some websites 	with strong search engine authority. And how high it stays in the 	rankings as it ages still remains to be seen.</p>
<p style="margin-bottom: 0in;">Possible answers take two main forms:</p>
<ul>
<li><strong>Aggregation and <a href="http://recoveringjournalist.typepad.com/recovering_journalist/2009/04/editors-as-curators-whats-taking-so-long.html">curation</a>,*</strong> in which various contributions are bundled together at go-to 	websites or the like.</li>
<li>A <strong>reporting feeding chain,</strong> in which journalists with broader reach:
<ul>
<li>Steal/borrow/take ideas from more 	specialized contributors.</li>
<li>Repackage them.</li>
<li>Perhaps add additional value in 	reporting, analysis, or presentation. (Several examples of this may 	be found in the links above.)</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Investigative reporting needs more of each.</strong></p>
<p style="margin-bottom: 0in;"><em>*The latter is the more high-falutin&#8217; version of the former.</em></p>
<p style="margin-bottom: 0in; font-style: normal;">Consider my story about Friendship Village. Standing alone, it&#8217;s not going to influence much of anybody, except insofar as I can personally influence the course of medical database design or privacy law. But suppose one person each reported similar things at 20 different institutions. A journalist who wrote a story based on those reports could carry a lot of sway, perhaps:</p>
<ul>
<li><span style="font-style: normal;">Influencing 	the course of medical information exchange in the United States, or 	at least</span></li>
<li><span style="font-style: normal;">Alerting 	people to the lengths they have to go to get proper information 	about and before their sick relatives. </span></li>
</ul>
<p style="margin-bottom: 0in; font-style: normal;">Similarly, suppose there were a go-to website for complaints about assisted living facilities. Well, people considering moving into Friendship Village would have a little concern to address. Even better, the very existence of that site might help motivate people to share more stories. Bad institutions would need to reform, and bad practices might be reformed under the spotlight of public scrutiny.</p>
<p style="margin-bottom: 0in;"><em>If this isn&#8217;t my longest blog post ever, it&#8217;s surely close. So while I have much more to say on these subjects, I&#8217;ll stop here. Comments and examples are warmly encouraged.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2010/09/26/how-to-preserve-investigative-reporting-in-the-new-media-era/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Ike Pigott on the future of reporting</title>
		<link>http://www.texttechnologies.com/2010/04/04/ike-pigott-on-the-future-of-reporting/</link>
		<comments>http://www.texttechnologies.com/2010/04/04/ike-pigott-on-the-future-of-reporting/#comments</comments>
		<pubDate>Sun, 04 Apr 2010 13:47:06 +0000</pubDate>
		<dc:creator><![CDATA[Curt Monash]]></dc:creator>
				<category><![CDATA[Blogosphere]]></category>
		<category><![CDATA[Mark Logic]]></category>
		<category><![CDATA[Online media]]></category>
		<category><![CDATA[Social software and online media]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/?p=392</guid>
		<description><![CDATA[Ike Pigott argues that, as the number of conventional journalists plummets, corporations will have to hire their own &#8220;embedded&#8221; journalists to fill the void. As he puts it: The embeds of the future will work for the company, and be paid by the company to provide news about the company in a multitude of formats. [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Ike Pigott argues that, as the number of conventional journalists plummets, corporations will have to hire their own <a href="http://www.mediabullseye.com/mb/2010/04/dear-journalist.html">&#8220;embedded&#8221; journalists</a> to fill the void. <span id="more-392"></span>As he puts it:</p>
<blockquote><p>The embeds of the future will work for the company, and be paid by the company to provide news about the company in a multitude of formats. Print, newsletter, video, blog, podcast, moving billboards, tattoos — whatever it takes. Because the bits and pieces of Corporate America that have a story to tell will still have their stories – just no ready outlets.</p>
<p>How is this different than what you have today? Surely there are corporate PR departments and external agencies already doing these things, right?</p>
<p>No.</p>
<p>What is required is an internal producer who writes in external voice — like the neutral point-of-view so often described by Wikipedia. People can smell marketing and propaganda coming around the corner, and they know when the pitches and puff pieces are missing that edge of neutrality. An accurate and fair piece is accurate and fair, no matter who writes it.</p></blockquote>
<p>It&#8217;s an interesting theory, but it seems to presuppose dual marketing communication efforts, with separate departments of &#8220;Straightforwardness&#8221; and &#8220;Hype&#8221;. That may work at some companies, but in most cases I think it will be more practical to try to infuse straightforwardness through multiple parts of the marcom effort.</p>
<p>My more specific quick responses include:</p>
<ul>
<li>That sure sounds a lot like Robert Scoble in his Microsoft days.</li>
<li>It also sounds like &#8220;community managers&#8221; at MMO game companies. (Both of the MMOs I&#8217;ve played have had great ones.) They often only use one or two channels (forums and the associated general website), but otherwise they fit the bill.</li>
<li>Ike&#8217;s views fit very well with mine on <a href="http://www.texttechnologies.com/2009/03/29/where-i-think-the-information-ecosystem-is-headed/">the future of the information ecosystem</a>.</li>
<li>I&#8217;m getting ever more sympathetic to the idea that you need people whose main job is external communication of a straightforward kind. Reasons include:
<ul>
<li>Senior executives who write great blogs commonly don&#8217;t keep them up. And even when they&#8217;re active, the blogging is pretty sparse. E.g., among companies I follow closely, <a href="http://databasecolumn.vertica.com/">Vertica</a>, <a href="http://www.asterdata.com/blog/index.php/category/statements/">Aster Data</a>, and <a href="http://www.netezzacommunity.com/people/pfrancisco?view=overview">Netezza</a> have all done some outstanding blogging in the past, but do very little of it now. Only <a href="http://www.kellblog.com/">Dave Kellogg</a> at Mark Logic really keeps going.</li>
<li>It&#8217;s not obvious that senior executives are wrong to spend their time at something other than blogging. One of the greatest vendor blogs ever was <a href="http://blogs.sun.com/jonathan/">Jonathan Schwartz&#8217;s</a> at Sun.  Umm &#8212; how sure are we that he actually did much good for his company with that effort?</li>
<li>I frequently tell vendors, &#8220;If you tell Story X in your own words, I&#8217;ll gladly point to it or post it for you.&#8221; They usually agree this is a wonderful idea, but then they don&#8217;t free up the rather limited resources that would be required to take me up on it.</li>
</ul>
</li>
<li>That said, the kinds of people who provide customer support (pre- or post-sales) are often very well suited to fill the role Ike is describing. At least, that&#8217;s the case in enterprise tech companies.</li>
<li>The media mix isn&#8217;t really as complex as Ike suggests. It basically falls into two groups: text/graphics and audio/video.</li>
<li>That said, text/graphics and audio/video media people are increasingly the same. (Just think of sports media, where the newspaper folks make their big bucks on radio or TV. That&#8217;s a harbinger of the future. Or think again of Scoble.)</li>
<li>One flaw of Ike&#8217;s idea is that in its pure form it only makes sense for companies large enough to have multi-person PR staffs. Other firms would have to use part-timers, or outsource.  And if you&#8217;re going to do that, might it not make more sense to pay part of the cost of sponsoring, you guessed it, an independent blog?</li>
<li>I know that&#8217;s text/graphics-only, or at least text/graphics-mainly, but I happen to think audio/video business news/PR is minor anyway. People may pay enough attention to, for example, listen to audio from a company if it purports to teach them something. But news ABOUT a company? Who&#8217;s interested enough in that to sit still for audio/video, unless they happen to be employees of the company or investors in its stock?</li>
</ul>
<p><strong>Bottom line:</strong> I think he&#8217;s wrong about some of his detailed views, but Ike Pigott is directionally very right in suggesting that <strong>newsmakers will increasingly become content creators</strong> for news about themselves.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2010/04/04/ike-pigott-on-the-future-of-reporting/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>