<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Not this&#8230;</title>
	<atom:link href="https://blog.timbunce.org/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog.timbunce.org</link>
	<description>Listen. Reflect. Explore. Solve.</description>
	<lastBuildDate>Sat, 23 May 2020 12:03:41 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<site xmlns="com-wordpress:feed-additions:1">2562816</site><cloud domain='blog.timbunce.org' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>https://s0.wp.com/i/buttonw-com.png</url>
		<title>Not this&#8230;</title>
		<link>https://blog.timbunce.org</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="https://blog.timbunce.org/osd.xml" title="Not this..." />
	<atom:link rel='hub' href='https://blog.timbunce.org/?pushpress=hub'/>
	<item>
		<title>A Comparison of Automatic Speech Recognition (ASR) Systems, part 3</title>
		<link>https://blog.timbunce.org/2020/05/17/a-comparison-of-automatic-speech-recognition-asr-systems-part-3/</link>
					<comments>https://blog.timbunce.org/2020/05/17/a-comparison-of-automatic-speech-recognition-asr-systems-part-3/#comments</comments>
		
		<dc:creator><![CDATA[TimBunce]]></dc:creator>
		<pubDate>Sun, 17 May 2020 18:45:31 +0000</pubDate>
				<category><![CDATA[software]]></category>
		<category><![CDATA[transcription]]></category>
		<guid isPermaLink="false">http://blog.timbunce.org/?p=1782</guid>

					<description><![CDATA[In my two previous posts I evaluated a number of Automatic Speech Recognition systems and selected Google and Speechmatics as the best fit for my needs. Here, after another long gap, I&#8217;m returning with updated results and discussion, including excellent new results from Rev.ai, 3Scribe and AssemblyAI. For this evaluation I&#8217;m using the same method and the &#8230; <a href="https://blog.timbunce.org/2020/05/17/a-comparison-of-automatic-speech-recognition-asr-systems-part-3/" class="more-link">Continue reading <span class="screen-reader-text">A Comparison of Automatic Speech Recognition (ASR) Systems, part&#160;3</span></a>]]></description>
										<content:encoded><![CDATA[<p>In my two <a href="https://blog.timbunce.org/2018/05/15/a-comparison-of-automatic-speech-recognition-asr-systems/" target="_blank" rel="noopener">previous</a> <a href="https://blog.timbunce.org/2019/02/11/a-comparison-of-automatic-speech-recognition-asr-systems-part-2/" target="_blank" rel="noopener">posts</a> I evaluated a number of Automatic Speech Recognition systems and selected Google and Speechmatics as the best fit for my needs. Here, after another long gap, I&#8217;m returning with updated results and discussion, including excellent new results from <a href="https://www.rev.ai/" target="_blank" rel="noopener">Rev.ai</a>, <a href="https://3scri.be/" target="_blank" rel="noopener">3Scribe</a> and <a href="https://www.assemblyai.com" rel="noopener" target="_blank">AssemblyAI</a>.</p>
<p><span id="more-1782"></span></p>
<p>For this evaluation I&#8217;m using the same method and the same twelve audio clips as I described in my <a href="https://blog.timbunce.org/2019/02/11/a-comparison-of-automatic-speech-recognition-asr-systems-part-2/">previous post</a>.</p>
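<p>As a quick refresher: WER (word error rate) is the word-level edit distance between an ASR transcript and the ground truth, divided by the number of words in the ground truth. Here&#8217;s a minimal Python sketch of the calculation, not my actual analysis script (which also handles things like compound words):</p>
<pre><code>def wer(truth, hypothesis):
    """Word Error Rate: word-level edit distance / ground-truth word count."""
    t, h = truth.lower().split(), hypothesis.lower().split()
    # Levenshtein distance over words, one dynamic-programming row at a time.
    prev = list(range(len(h) + 1))
    for i, truth_word in enumerate(t, 1):
        cur = [i]
        for j, hyp_word in enumerate(h, 1):
            cost = 0 if truth_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # word dropped by the ASR
                           cur[j - 1] + 1,       # word inserted by the ASR
                           prev[j - 1] + cost))  # word substituted
        prev = cur
    return 100.0 * prev[-1] / len(t)

print(round(wer("the cat sat on the mat", "the cat sat mat"), 2))  # 33.33
</code></pre>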
<h2>Results</h2>
<p>The table below presents the results. The first column includes the approximate date of the ASR processing in YYYY-MM format. The rows are ordered by the median of their WER scores across all 12 files. Each cell is color coded according to the degree to which the WER score is better (lower, deeper green) or worse (higher, deeper red) than the median of this set of results for that file.</p>
<table style="border:1px #aaa;border-style:solid;border-collapse:collapse;border-spacing:0;">
<tbody>
<tr>
<td style="padding:2px;">Service</td>
<td style="padding:2px;">Median</td>
<td style="padding:2px;">F10<br />
A41</td>
<td style="padding:2px;">F11<br />
A97</td>
<td style="padding:2px;">F13<br />
B52</td>
<td style="padding:2px;">F14<br />
C18</td>
<td style="padding:2px;">F14<br />
C42</td>
<td style="padding:2px;">F15<br />
C96</td>
<td style="padding:2px;">F16<br />
D64</td>
<td style="padding:2px;">F17<br />
D83</td>
<td style="padding:2px;">F17<br />
E03</td>
<td style="padding:2px;">F18<br />
E82</td>
<td style="padding:2px;">F18<br />
E83</td>
<td style="padding:2px;">F18<br />
E84</td>
</tr>
<tr>
<td style="padding:2px;">3Scribe<br />
2020-05 <span style="float:right;text-align:right;">$7.8/hr</span></td>
<td style="text-align:center;padding:2px;" title="8.72">9</td>
<td style="background-color:#e9ffbf;text-align:center;padding:2px;" title="14.22">14</td>
<td style="background-color:#e9ffbf;text-align:center;padding:2px;" title="8.12">8</td>
<td style="background-color:#e9ffbf;text-align:center;padding:2px;" title="7.88">8</td>
<td style="background-color:#e9ffbf;text-align:center;padding:2px;" title="6.33">6</td>
<td style="background-color:#e9ffbf;text-align:center;padding:2px;" title="8.39">8</td>
<td style="background-color:#f5ffe2;text-align:center;padding:2px;" title="7.40">7</td>
<td style="background-color:#ecffc7;text-align:center;padding:2px;" title="12.10">12</td>
<td style="background-color:#e9ffbf;text-align:center;padding:2px;" title="8.77">9</td>
<td style="background-color:#e9ffbf;text-align:center;padding:2px;" title="11.51">12</td>
<td style="background-color:#e9ffbf;text-align:center;padding:2px;" title="14.15">14</td>
<td style="background-color:#e9ffbf;text-align:center;padding:2px;" title="8.67">9</td>
<td style="background-color:#e9ffbf;text-align:center;padding:2px;" title="10.59">11</td>
</tr>
<tr>
<td style="padding:2px;">Rev.ai<br />
2020-04 <span style="float:right;text-align:right;">$2.1/hr</span></td>
<td style="text-align:center;padding:2px;" title="10.98">11</td>
<td style="background-color:#fcfff8;text-align:center;padding:2px;" title="18.44">18</td>
<td style="background-color:#fff2eb;text-align:center;padding:2px;" title="11.67">12</td>
<td style="background-color:#fff9f6;text-align:center;padding:2px;" title="10.29">10</td>
<td style="background-color:#ebffc4;text-align:center;padding:2px;" title="6.51">7</td>
<td style="background-color:#efffd0;text-align:center;padding:2px;" title="9.14">9</td>
<td style="background-color:#e9ffbf;text-align:center;padding:2px;" title="6.59">7</td>
<td style="background-color:#ffffff;text-align:center;padding:2px;" title="14.47">14</td>
<td style="background-color:#f7ffe7;text-align:center;padding:2px;" title="9.27">9</td>
<td style="background-color:#f5ffe1;text-align:center;padding:2px;" title="12.18">12</td>
<td style="background-color:#f5ffe3;text-align:center;padding:2px;" title="16.79">17</td>
<td style="background-color:#ecffc8;text-align:center;padding:2px;" title="9.36">9</td>
<td style="background-color:#fcfff7;text-align:center;padding:2px;" title="12.18">12</td>
</tr>
<tr>
<td style="padding:2px;">AssemblyAI<br />
2020-05 <span style="float:right;text-align:right;">$0.9/hr</span></td>
<td title="11.04" style="text-align:center;padding:2px;">11</td>
<td title="20.06" style="background-color:#fffaf7;text-align:center;padding:2px;">20</td>
<td title="9.83" style="background-color:#fafff0;text-align:center;padding:2px;">10</td>
<td title="11.09" style="background-color:#ffede4;text-align:center;padding:2px;">11</td>
<td title="8.86" style="background-color:#fff8f5;text-align:center;padding:2px;">9</td>
<td title="10.48" style="background-color:#fafff1;text-align:center;padding:2px;">10</td>
<td title="9.73" style="background-color:#ffdccb;text-align:center;padding:2px;">10</td>
<td title="13.82" style="background-color:#fcfff6;text-align:center;padding:2px;">14</td>
<td title="9.88" style="background-color:#fff8f5;text-align:center;padding:2px;">10</td>
<td title="13.11" style="background-color:#fff5f1;text-align:center;padding:2px;">13</td>
<td title="16.00" style="background-color:#f2ffda;text-align:center;padding:2px;">16</td>
<td title="10.98" style="background-color:#f7ffe8;text-align:center;padding:2px;">11</td>
<td title="11.98" style="background-color:#fafff2;text-align:center;padding:2px;">12</td>
</tr>
<tr>
<td style="padding:2px;">Google<br />
2019-07</td>
<td style="text-align:center;padding:2px;" title="11.54">12</td>
<td style="background-color:#ffefe7;text-align:center;padding:2px;" title="20.64">21</td>
<td style="background-color:#f0ffd2;text-align:center;padding:2px;" title="8.94">9</td>
<td style="background-color:#f8ffeb;text-align:center;padding:2px;" title="9.34">9</td>
<td style="background-color:#ffffff;text-align:center;padding:2px;" title="8.38">8</td>
<td style="background-color:#fffcfa;text-align:center;padding:2px;" title="11.34">11</td>
<td style="background-color:#ffffff;text-align:center;padding:2px;" title="8.04">8</td>
<td style="background-color:#e9ffbf;text-align:center;padding:2px;" title="11.74">12</td>
<td style="background-color:#fbfff5;text-align:center;padding:2px;" title="9.45">9</td>
<td style="background-color:#f2ffd8;text-align:center;padding:2px;" title="12.00">12</td>
<td style="background-color:#ffdac8;text-align:center;padding:2px;" title="22.71">23</td>
<td style="background-color:#fffaf8;text-align:center;padding:2px;" title="13.97">14</td>
<td style="background-color:#ffddcd;text-align:center;padding:2px;" title="13.78">14</td>
</tr>
<tr>
<td style="padding:2px;">Google<br />
2020-02 <span style="float:right;text-align:right;">$1.4/hr</span></td>
<td style="text-align:center;padding:2px;" title="11.64">12</td>
<td style="background-color:#fff2ec;text-align:center;padding:2px;" title="20.29">20</td>
<td style="background-color:#f6ffe5;text-align:center;padding:2px;" title="9.76">10</td>
<td style="background-color:#fff6f1;text-align:center;padding:2px;" title="10.44">10</td>
<td style="background-color:#fff8f5;text-align:center;padding:2px;" title="8.68">9</td>
<td style="background-color:#fdfffa;text-align:center;padding:2px;" title="10.97">11</td>
<td style="background-color:#ffede5;text-align:center;padding:2px;" title="8.62">9</td>
<td style="background-color:#eeffcc;text-align:center;padding:2px;" title="12.31">12</td>
<td style="background-color:#ffffff;text-align:center;padding:2px;" title="9.57">10</td>
<td style="background-color:#ffffff;text-align:center;padding:2px;" title="12.74">13</td>
<td style="background-color:#ffdccb;text-align:center;padding:2px;" title="22.45">22</td>
<td style="background-color:#ffffff;text-align:center;padding:2px;" title="13.47">13</td>
<td style="background-color:#ffdac8;text-align:center;padding:2px;" title="13.91">14</td>
</tr>
<tr>
<td style="padding:2px;">Speechmatics<br />
2018-12</td>
<td style="text-align:center;padding:2px;" title="11.77">12</td>
<td style="background-color:#ffffff;text-align:center;padding:2px;" title="18.90">19</td>
<td style="background-color:#ffffff;text-align:center;padding:2px;" title="10.85">11</td>
<td style="background-color:#fcfff6;text-align:center;padding:2px;" title="9.71">10</td>
<td style="background-color:#fff7f3;text-align:center;padding:2px;" title="8.74">9</td>
<td style="background-color:#ffffff;text-align:center;padding:2px;" title="11.15">11</td>
<td style="background-color:#ffeae0;text-align:center;padding:2px;" title="8.74">9</td>
<td style="background-color:#ffe6da;text-align:center;padding:2px;" title="16.05">16</td>
<td style="background-color:#ffccb2;text-align:center;padding:2px;" title="10.56">11</td>
<td style="background-color:#ffeee5;text-align:center;padding:2px;" title="13.23">13</td>
<td style="background-color:#fffbf9;text-align:center;padding:2px;" title="19.16">19</td>
<td style="background-color:#fff7f3;text-align:center;padding:2px;" title="14.35">14</td>
<td style="background-color:#ffffff;text-align:center;padding:2px;" title="12.38">12</td>
</tr>
<tr>
<td style="padding:2px;">Rev.ai<br />
2019-07</td>
<td style="text-align:center;padding:2px;" title="11.91">12</td>
<td style="background-color:#ffece3;text-align:center;padding:2px;" title="20.92">21</td>
<td style="background-color:#ffe8dd;text-align:center;padding:2px;" title="12.29">12</td>
<td style="background-color:#ffe4d7;text-align:center;padding:2px;" title="11.31">11</td>
<td style="background-color:#efffd0;text-align:center;padding:2px;" title="6.87">7</td>
<td style="background-color:#fff9f6;text-align:center;padding:2px;" title="11.53">12</td>
<td style="background-color:#f8ffec;text-align:center;padding:2px;" title="7.63">8</td>
<td style="background-color:#ffd5c0;text-align:center;padding:2px;" title="17.13">17</td>
<td style="background-color:#ffd7c4;text-align:center;padding:2px;" title="10.31">10</td>
<td style="background-color:#ffd2bc;text-align:center;padding:2px;" title="14.03">14</td>
<td style="background-color:#fcfff6;text-align:center;padding:2px;" title="18.17">18</td>
<td style="background-color:#f2ffd8;text-align:center;padding:2px;" title="10.54">11</td>
<td style="background-color:#ffe3d6;text-align:center;padding:2px;" title="13.52">14</td>
</tr>
<tr>
<td style="padding:2px;">Speechmatics<br />
2020-04 <span style="float:right;text-align:right;">$4.4/hr</span></td>
<td style="text-align:center;padding:2px;" title="12.10">12</td>
<td style="background-color:#fcfff7;text-align:center;padding:2px;" title="18.38">18</td>
<td style="background-color:#fffdfd;text-align:center;padding:2px;" title="10.92">11</td>
<td style="background-color:#ffffff;text-align:center;padding:2px;" title="10.00">10</td>
<td style="background-color:#ffe8dd;text-align:center;padding:2px;" title="9.46">9</td>
<td style="background-color:#fff2ec;text-align:center;padding:2px;" title="11.95">12</td>
<td style="background-color:#ffe8dd;text-align:center;padding:2px;" title="8.80">9</td>
<td style="background-color:#ffe1d3;text-align:center;padding:2px;" title="16.34">16</td>
<td style="background-color:#ffd0b9;text-align:center;padding:2px;" title="10.44">10</td>
<td style="background-color:#fff8f5;text-align:center;padding:2px;" title="12.92">13</td>
<td style="background-color:#ffffff;text-align:center;padding:2px;" title="18.76">19</td>
<td style="background-color:#fff6f2;text-align:center;padding:2px;" title="14.41">14</td>
<td style="background-color:#fdfffa;text-align:center;padding:2px;" title="12.25">12</td>
</tr>
</tbody>
</table>
<p>One of the few benefits of taking <em>years</em> to work through this process is that I can see how ASR results for a service change over time. While Google&#8217;s and Speechmatics&#8217; scores dropped a little, Rev.ai has improved significantly. 3Scribe and AssemblyAI are newcomers to my testing.</p>
<p>The prices shown are the approximate USD cost per hour, ignoring any free tier or bulk discounts.</p>
<p>It&#8217;s important to note that these results are <em>all very good</em>. The nature of the informal testing I&#8217;m doing means there&#8217;s really little value in distinguishing between small differences in WER scores. At this level the scores are significantly affected by differences in how &#8220;verbatim&#8221; the systems try to be, such as when a speaker hesitates and repeats a word or two. For example, here&#8217;s a section of vimdiff showing Google, Rev.ai, and Speechmatics making different choices:</p>
<p><a href="https://blog.timbunce.org/wp-content/uploads/2020/04/untitled-2.png"><img data-attachment-id="1797" data-permalink="https://blog.timbunce.org/2020/05/17/a-comparison-of-automatic-speech-recognition-asr-systems-part-3/untitled-2/" data-orig-file="https://blog.timbunce.org/wp-content/uploads/2020/04/untitled-2.png" data-orig-size="1696,1098" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="Example differences" data-image-description="" data-image-caption="" data-medium-file="https://blog.timbunce.org/wp-content/uploads/2020/04/untitled-2.png?w=300" data-large-file="https://blog.timbunce.org/wp-content/uploads/2020/04/untitled-2.png?w=676" class="aligncenter size-large wp-image-1797" src="https://blog.timbunce.org/wp-content/uploads/2020/04/untitled-2.png?w=676&#038;h=438" alt="Example differences" width="676" height="438" srcset="https://blog.timbunce.org/wp-content/uploads/2020/04/untitled-2.png?w=676 676w, https://blog.timbunce.org/wp-content/uploads/2020/04/untitled-2.png?w=1352 1352w, https://blog.timbunce.org/wp-content/uploads/2020/04/untitled-2.png?w=150 150w, https://blog.timbunce.org/wp-content/uploads/2020/04/untitled-2.png?w=300 300w, https://blog.timbunce.org/wp-content/uploads/2020/04/untitled-2.png?w=768 768w, https://blog.timbunce.org/wp-content/uploads/2020/04/untitled-2.png?w=1024 1024w" sizes="(max-width: 676px) 100vw, 676px" /></a></p>
<p>The effect of actual transcription errors on the WER score has become less significant, and I don&#8217;t have the time to sift through which differences matter and which don&#8217;t. I&#8217;m content that these services are all good enough for my needs.</p>
<h2>Google</h2>
<p>Google&#8217;s score has dropped insignificantly (-0.1) since July last year. The problem from my previous test, where the transcript of the F18.E82 clip was missing a chunk of text, was still present.</p>
<h2>Speechmatics</h2>
<p>When I submitted each audio file to Speechmatics, a pop-up alert said &#8220;Duplicate file. You already have a job that used a file with this name. Are you sure you want to select it again?&#8221; I said yes. Speechmatics uploaded the files, took some time to transcribe each one, and charged me for the service. When I downloaded the transcripts I found that they were identical to the previous transcripts generated in 2018. This seemed suspicious, so I edited an audio file to remove a tiny moment of silence and tried again. This time the transcript was different, so I did the same for all the other files. That sure seems like a bug.</p>
<p>Speechmatics&#8217; score has dropped since December 2018. It&#8217;s a slightly larger drop (-0.3) than Google&#8217;s but still small.</p>
<h2>Rev.ai</h2>
<p>Rev.ai is a newcomer to my testing. Jay Lee, the General Manager of speech services at Rev.com, contacted me in July, prompted by my <a href="https://blog.timbunce.org/2019/02/11/a-comparison-of-automatic-speech-recognition-asr-systems-part-2/">previous blog post</a>. Rev.ai is the enterprise version of their ASR API. We had a call where we talked over the project, my methods, and the results. Full disclosure: Jay very kindly donated enough minutes of Rev.ai time to cover my needs for this project.</p>
<p>I tested their Rev.ai service in July and the results were good then. They&#8217;re even better now (+0.9).</p>
<h2>3Scribe</h2>
<p>As I was drafting this post, Eddie Gahan from 3Scribe contacted me with an invitation to try out their new service. Their scores are impressive. They have an API but don&#8217;t yet offer features like word-level timings or confidence scores. They&#8217;re one to watch. I wish them well, not least because they&#8217;re <a href="https://3scri.be/aboutus" target="_blank" rel="noopener">an Irish company</a>.</p>
<h2>AssemblyAI</h2>
<p>I&#8217;d overlooked AssemblyAI until now. They only offer <a href="https://docs.assemblyai.com/overview/getting-started" rel="noopener" target="_blank">an API</a>, though it&#8217;s simple to use and well documented. They don&#8217;t provide speaker diarisation, but they do deliver excellent results at an excellent price, with punctuation, word-level timings, and confidence scores. Their free tier is 300 minutes/month.</p>
<h2>Differential Analysis</h2>
<p>Each service has a relatively high WER score when using the transcript from one of the others as the ground truth. <a href="https://blog.timbunce.org/2018/05/15/a-comparison-of-automatic-speech-recognition-asr-systems/#differential-analysis" target="_blank" rel="noopener">This is good</a>. It means the services are making <em>different</em> mistakes/decisions and those differences could be used to highlight likely errors in the others.</p>
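<p>Here&#8217;s a rough sketch of that pairwise comparison. It uses a simple word-sequence similarity rather than my full WER calculation, and the transcript file names are placeholders:</p>
<pre><code>from difflib import SequenceMatcher
from itertools import combinations

def word_diff_rate(a, b):
    """Rough disagreement score: 100 minus the percentage of matching words."""
    return 100.0 * (1 - SequenceMatcher(None, a.lower().split(),
                                        b.lower().split()).ratio())

# Placeholder file names: one transcript of the same clip per service.
transcripts = {name: open(name + ".txt").read()
               for name in ("google", "revai", "speechmatics", "3scribe")}

# High pairwise scores mean the services make *different* mistakes,
# so their disagreements point a human editor at the likely errors.
for a, b in combinations(transcripts, 2):
    print(a, "vs", b, round(word_diff_rate(transcripts[a], transcripts[b]), 1))
</code></pre>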
<h2>Diversions</h2>
<p>A couple of issues diverted me for a while.</p>
<h4>Exploring the Parameter Space of the Google API</h4>
<p>Unlike the other services, the Google API offers <a href="https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/RecognitionConfig" target="_blank" rel="noopener"><em>many</em> configuration options</a> which provide &#8220;information to the recognizer that specifies how to process the request&#8221;. There&#8217;s an implication, to me at least, that providing more detailed configuration options <em>could</em> result in more accurate transcriptions. But which ones would have a significant effect?</p>
<p>I picked a number of parameters and ran transcriptions with various combinations of likely-looking values. I was especially hopeful that specifying a <a href="https://www.naics.com/search/" target="_blank" rel="noopener">NAICS code</a> for the topic of the podcast would have a positive effect. To cut a long story short, nothing made a significant difference except providing a vocabulary. I suspect Google may use the configuration details provided by users to help train their system.</p>
<h4>Extra Vocabulary</h4>
<p>Both the Google and Rev.ai APIs provide a way to improve transcription accuracy by specifying extra words and phrases.</p>
<p>Google&#8217;s <a href="https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/RecognitionConfig" target="_blank" rel="noopener">SpeechContext</a> has <code>phrases</code>: &#8220;A list of strings containing words and phrases &#8216;hints&#8217; so that the speech recognition is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases.&#8221; There&#8217;s also a numeric <code>boost</code> value: &#8220;Positive value will increase the probability that a specific phrase will be recognized over other similar sounding phrases. The higher the boost, the higher the chance of false positive recognition as well. We recommend using a binary search approach to finding the optimal value for your use case.&#8221;</p>
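<p>As a sketch, here&#8217;s roughly how a phrase list and boost are passed via the google-cloud-speech Python client. The bucket URI and phrases are placeholders, and the exact client calls should be treated as approximate:</p>
<pre><code>from google.cloud import speech_v1p1beta1 as speech

client = speech.SpeechClient()
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
    sample_rate_hertz=16000,
    language_code="en-US",
    enable_automatic_punctuation=True,
    speech_contexts=[speech.SpeechContext(
        phrases=["Speechmatics", "diarisation"],  # placeholder vocabulary
        boost=3.0,  # the level that worked best in my tests
    )],
)
audio = speech.RecognitionAudio(uri="gs://my-bucket/clip.flac")  # placeholder
operation = client.long_running_recognize(config=config, audio=audio)
for result in operation.result().results:
    print(result.alternatives[0].transcript)
</code></pre>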
<p>Rev.ai describe <a href="https://www.rev.ai/docs#operation/SubmitTranscriptionJob" target="_blank" rel="noopener">custom_vocabularies</a> like this: &#8220;An array of words or phrases not found in the normal dictionary. Add your specific technical jargon, proper nouns and uncommon phrases as strings in this array to add them to the lexicon for this job.&#8221;</p>
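<p>And the Rev.ai equivalent, submitting an asynchronous job over their REST API (the token and media URL are placeholders):</p>
<pre><code>import requests

API = "https://api.rev.ai/speechtotext/v1"
headers = {"Authorization": "Bearer YOUR_REVAI_TOKEN"}  # placeholder token

job = requests.post(API + "/jobs", headers=headers, json={
    "media_url": "https://example.com/clip.mp3",  # placeholder audio URL
    "custom_vocabularies": [
        # Jargon, proper nouns, and uncommon phrases for this job.
        {"phrases": ["Speechmatics", "diarisation", "NAICS"]},
    ],
}).json()
print(job["id"], job["status"])
</code></pre>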
<p>Specifying a vocabulary of extra words seemed like a very appealing way to improve transcription accuracy, certainly worth spending some time exploring. I needed a list of words that the transcriptions tended to get wrong, so for each file I compared the ground-truth transcript with all the ASR-generated transcripts and extracted a list of all the words the ASRs had got wrong, regardless of circumstances. I called this a &#8216;commonly wrong words&#8217; list.</p>
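<p>The extraction itself was a blunt instrument, something like this sketch, which treats any ground-truth word absent from an ASR transcript as wrong:</p>
<pre><code>from collections import Counter

def commonly_wrong_words(truth_text, asr_texts):
    """Ground-truth words that more than one ASR transcript failed to produce."""
    truth_words = set(truth_text.lower().split())
    misses = Counter()
    for asr_text in asr_texts:
        misses.update(truth_words - set(asr_text.lower().split()))
    # Most-missed words first, keeping those at least two ASRs got wrong.
    return [word for word, n in misses.most_common() if n &gt;= 2]
</code></pre>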
<p>I tried Google first, submitting jobs with the commonly wrong words and various boost levels. A boost level of 3 had the best effect, reducing the WER from around 11.5 to 9.5. An impressive gain!</p>
<p>Then I tried the same with Rev.ai. This time the results got worse. I was rather puzzled and disappointed. I contacted Rev and they kindly arranged a meeting to explain how the feature worked. My take-aways: it&#8217;s useful for specific terms, and especially phrases, that aren&#8217;t already known to their system; my &#8220;shotgun&#8221; approach of using lots of individual words wasn&#8217;t a good fit; and the effect is hard to predict.</p>
<p>Later on it dawned on me that my &#8216;commonly wrong words&#8217; approach was simply not valid. I was effectively cheating by strongly hinting to Google which words it had got wrong. Moreover, it would not be possible to <em>automatically</em> generate a suitable list of words and phrases for each audio file to be transcribed. The closest viable approach might be to extract unusual words and phrases, such as uncommon names, from podcast show notes. I may return to exploring the creation and use of a custom vocabulary later, but for now I&#8217;m shelving it.</p>
<h2>Other ASR Services Using Rev?</h2>
<p>When talking to Rev.ai they mentioned <a href="https://fireflies.ai" target="_blank" rel="noopener">Fireflies.ai</a>, a service which simplifies recording and transcription of business meetings: you invite their bot, called Fred, to join the meeting via your calendar app and the rest is automatic. Fireflies use Rev.ai as the ASR. I tried them out and was puzzled to see some results were better than Rev.ai&#8217;s.</p>
<p>That reminded me of <a href="https://www.descript.com" target="_blank" rel="noopener">Descript</a> who, when I tested them <a href="https://blog.timbunce.org/2019/02/11/a-comparison-of-automatic-speech-recognition-asr-systems-part-2/" target="_blank" rel="noopener">previously</a>, were using Google as the ASR yet had some results that were better than Google&#8217;s.</p>
<p>It seems there are two factors at play: pre-processing of the audio before it&#8217;s sent to the ASR, and post-processing of the raw ASR results.</p>
<p>Here&#8217;s a comparison of results from Fireflies.ai, Rev.ai, and Descript:</p>
<table style="border:1px #aaa;border-style:solid;border-collapse:collapse;border-spacing:0;">
<tbody>
<tr>
<td style="padding:2px;">Service</td>
<td style="padding:2px;">Median</td>
<td style="padding:2px;">F10<br />
A41</td>
<td style="padding:2px;">F11<br />
A97</td>
<td style="padding:2px;">F13<br />
B52</td>
<td style="padding:2px;">F14<br />
C18</td>
<td style="padding:2px;">F14<br />
C42</td>
<td style="padding:2px;">F15<br />
C96</td>
<td style="padding:2px;">F16<br />
D64</td>
<td style="padding:2px;">F17<br />
D83</td>
<td style="padding:2px;">F17<br />
E03</td>
<td style="padding:2px;">F18<br />
E82</td>
<td style="padding:2px;">F18<br />
E83</td>
<td style="padding:2px;">F18<br />
E84</td>
</tr>
<tr>
<td style="padding:2px;">Fireflies 2020-04</td>
<td style="text-align:center;padding:2px;" title="10.02">10</td>
<td style="background-color:#e9ffbf;text-align:center;padding:2px;" title="18.09">18</td>
<td style="background-color:#e9ffbf;text-align:center;padding:2px;" title="10.24">10</td>
<td style="background-color:#e9ffbf;text-align:center;padding:2px;" title="9.20">9</td>
<td style="background-color:#ffccb2;text-align:center;padding:2px;" title="7.53">8</td>
<td style="background-color:#ffffff;text-align:center;padding:2px;" title="9.80">10</td>
<td style="background-color:#ffffff;text-align:center;padding:2px;" title="6.82">7</td>
<td style="background-color:#ffffff;text-align:center;padding:2px;" title="14.54">15</td>
<td style="background-color:#ffffff;text-align:center;padding:2px;" title="9.33">9</td>
<td style="background-color:#ffccb2;text-align:center;padding:2px;" title="12.55">13</td>
<td style="background-color:#e9ffbf;text-align:center;padding:2px;" title="15.40">15</td>
<td style="background-color:#ffffff;text-align:center;padding:2px;" title="9.05">9</td>
<td style="background-color:#ffffff;text-align:center;padding:2px;" title="12.58">13</td>
</tr>
<tr>
<td style="padding:2px;">Rev.ai 2020-04</td>
<td style="text-align:center;padding:2px;" title="10.98">11</td>
<td style="background-color:#ffffff;text-align:center;padding:2px;" title="18.44">18</td>
<td style="background-color:#ffffff;text-align:center;padding:2px;" title="11.67">12</td>
<td style="background-color:#ffffff;text-align:center;padding:2px;" title="10.29">10</td>
<td style="background-color:#e9ffbf;text-align:center;padding:2px;" title="6.51">7</td>
<td style="background-color:#e9ffbf;text-align:center;padding:2px;" title="9.14">9</td>
<td style="background-color:#e9ffbf;text-align:center;padding:2px;" title="6.59">7</td>
<td style="background-color:#e9ffbf;text-align:center;padding:2px;" title="14.47">14</td>
<td style="background-color:#e9ffbf;text-align:center;padding:2px;" title="9.27">9</td>
<td style="background-color:#e9ffbf;text-align:center;padding:2px;" title="12.18">12</td>
<td style="background-color:#fffcfb;text-align:center;padding:2px;" title="16.79">17</td>
<td style="background-color:#ffccb2;text-align:center;padding:2px;" title="9.36">9</td>
<td style="background-color:#e9ffbf;text-align:center;padding:2px;" title="12.18">12</td>
</tr>
<tr>
<td style="padding:2px;">Descript 2020-04</td>
<td style="text-align:center;padding:2px;" title="11.84">12</td>
<td style="background-color:#ffccb2;text-align:center;padding:2px;" title="20.35">20</td>
<td style="background-color:#ffeee6;text-align:center;padding:2px;" title="12.22">12</td>
<td style="background-color:#ffdfd0;text-align:center;padding:2px;" title="11.09">11</td>
<td style="background-color:#ffffff;text-align:center;padding:2px;" title="6.75">7</td>
<td style="background-color:#ffccb2;text-align:center;padding:2px;" title="11.46">11</td>
<td style="background-color:#ffccb2;text-align:center;padding:2px;" title="7.34">7</td>
<td style="background-color:#ffccb2;text-align:center;padding:2px;" title="15.98">16</td>
<td style="background-color:#ffffff;text-align:center;padding:2px;" title="9.33">9</td>
<td style="background-color:#ffffff;text-align:center;padding:2px;" title="12.31">12</td>
<td style="background-color:#ffffff;text-align:center;padding:2px;" title="16.72">17</td>
<td style="background-color:#e9ffbf;text-align:center;padding:2px;" title="8.86">9</td>
<td style="background-color:#ffccb2;text-align:center;padding:2px;" title="13.12">13</td>
</tr>
</tbody>
</table>
<p>I&#8217;ve included Descript because the differential WER scores between the three are low. Specifically, the WER between Descript and Rev.ai is half that between Descript and Google, suggesting that Descript is now using Rev in their ASR process.</p>
<p>It&#8217;s interesting that Fireflies.ai did especially well on the three oldest files, which have relatively lower-quality audio, and on the more recent F18.E82 file that had clipping. It made me wonder if I should experiment with some audio pre-processing of my own but, after spending four years getting this far, I&#8217;ll pass!</p>
<h2>Revisiting The Past</h2>
<p>I thought it would be interesting to revisit the 2-hour audio file I used in the <a href="https://blog.timbunce.org/2018/05/15/a-comparison-of-automatic-speech-recognition-asr-systems/" target="_blank" rel="noopener">first</a> of my ASR comparison posts, back in May 2018.</p>
<p>The figures in the Sentences, Commas, and Questions columns are the number of full-stop, comma, and question-mark characters in the transcript. The figures in the Names column are a rough approximation of the number of proper nouns.</p>
<table class="tg" style="border:1px #aaa;border-style:solid;border-collapse:collapse;border-spacing:0;">
<tbody>
<tr>
<th class="tg-yw4l">Service</th>
<th class="tg-yw4l">WER</th>
<th class="tg-yw4l">Sentences</th>
<th class="tg-yw4l">Commas</th>
<th class="tg-yw4l">Questions</th>
<th class="tg-yw4l">Names</th>
</tr>
<tr>
<td class="tg-yw4l">Human range<br />
low – high</td>
<td class="tg-yw4l" style="text-align:right;">4.10<br />
— 5.10</td>
<td class="tg-yw4l" style="text-align:right;">840<br />
— 1261</td>
<td class="tg-yw4l" style="text-align:right;">1450<br />
— 1748</td>
<td class="tg-yw4l" style="text-align:right;">49<br />
— 76</td>
<td class="tg-yw4l" style="text-align:right;">1056<br />
— 1208</td>
</tr>
<tr>
<td class="tg-yw4l">Rev.ai<br />
2020-05</td>
<td class="tg-yw4l" style="text-align:right;">8.48</td>
<td class="tg-yw4l" style="text-align:right;">731</td>
<td class="tg-yw4l" style="text-align:right;">1383</td>
<td class="tg-yw4l" style="text-align:right;">59</td>
<td class="tg-yw4l" style="text-align:right;">884</td>
</tr>
<tr>
<td class="tg-yw4l">3Scribe<br />
2020-05</td>
<td class="tg-yw4l" style="text-align:right;">8.71</td>
<td class="tg-yw4l" style="text-align:right;">688</td>
<td class="tg-yw4l" style="text-align:right;">664</td>
<td class="tg-yw4l" style="text-align:right;">67</td>
<td class="tg-yw4l" style="text-align:right;">995</td>
</tr>
<tr>
<td class="tg-yw4l">AssemblyAI<br />
2020-05</td>
<td class="tg-yw4l" style="text-align:right;">9.34</td>
<td class="tg-yw4l" style="text-align:right;">805</td>
<td class="tg-yw4l" style="text-align:right;">643</td>
<td class="tg-yw4l" style="text-align:right;">69</td>
<td class="tg-yw4l" style="text-align:right;">996</td>
</tr>
<tr>
<td class="tg-yw4l">Speechmatics<br />
2020-05</td>
<td class="tg-yw4l" style="text-align:right;">9.61</td>
<td class="tg-yw4l" style="text-align:right;">667</td>
<td class="tg-yw4l" style="text-align:right;">0</td>
<td class="tg-yw4l" style="text-align:right;">0</td>
<td class="tg-yw4l" style="text-align:right;">931</td>
</tr>
<tr>
<td class="tg-yw4l">Google<br />
2018-04</td>
<td class="tg-yw4l" style="text-align:right;">10.03</td>
<td class="tg-yw4l" style="text-align:right;">641</td>
<td class="tg-yw4l" style="text-align:right;">421</td>
<td class="tg-yw4l" style="text-align:right;">29</td>
<td class="tg-yw4l" style="text-align:right;">1232</td>
</tr>
<tr>
<td class="tg-yw4l">Google<br />
2020-05</td>
<td class="tg-yw4l" style="text-align:right;">10.36</td>
<td class="tg-yw4l" style="text-align:right;">462</td>
<td class="tg-yw4l" style="text-align:right;">238</td>
<td class="tg-yw4l" style="text-align:right;">20</td>
<td class="tg-yw4l" style="text-align:right;">1325</td>
</tr>
<tr>
<td class="tg-yw4l">Speechmatics<br />
2018-02</td>
<td class="tg-yw4l" style="text-align:right;">11.65</td>
<td class="tg-yw4l" style="text-align:right;">672</td>
<td class="tg-yw4l" style="text-align:right;">0</td>
<td class="tg-yw4l" style="text-align:right;">0</td>
<td class="tg-yw4l" style="text-align:right;">892</td>
</tr>
</tbody>
</table>
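<p>The punctuation figures are simple character counts; the Names figure comes from a crude capitalisation heuristic, along these lines (a sketch, not my actual script):</p>
<pre><code>def transcript_stats(text):
    words = text.split()
    # Rough proper-noun proxy: capitalised words that don't start a sentence.
    names = sum(1 for prev, word in zip(words, words[1:])
                if word[:1].isupper() and prev[-1] not in ".?!")
    return {
        "sentences": text.count("."),
        "commas": text.count(","),
        "questions": text.count("?"),
        "names": names,
    }
</code></pre>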
<p>The Rev.ai results are particularly impressive. Beyond the good WER score they also come remarkably close to the Human transcripts in terms of identifying sentences, commas and questions. As do 3Scribe and AssemblyAI. Accurate recognition of sentences and especially questions should be helpful in segmenting the transcript into topics.</p>
<p>Over the last two years Google&#8217;s results have got slightly worse and Speechmatics have improved enough to jump ahead of them. (The 2018 figures for Google and Speechmatics don&#8217;t exactly match those in my earlier post due to small changes in my analysis scripts, such as recognizing more compound words.)</p>
<h2>Conclusions</h2>
<p>For my needs, on this project, <a href="https://www.rev.ai" target="_blank" rel="noopener">Rev.ai</a> have the best results. Their kind donation of time credits is the icing on the cake. <a href="https://www.assemblyai.com" rel="noopener" target="_blank">AssemblyAI</a> lacks speaker identification. <a href="https://www.speechmatics.com" target="_blank" rel="noopener">Speechmatics</a> has a better WER score than <a href="https://cloud.google.com/speech-to-text/" target="_blank" rel="noopener">Google</a> but doesn&#8217;t recognise questions. All except 3Scribe have word-level timing and confidence indicators.</p>
<p>It&#8217;s time for me to start doing some bulk processing. Finally.</p>
<p>I&#8217;ll start with a few recent 2-hour podcast audio files. Transcribe them via the <a href="https://www.rev.ai/docs" target="_blank" rel="noopener">Rev.ai API</a>. Then process the transcripts from the raw JSON returned by the API into various formats, including Markdown and basic HTML. Once I&#8217;ve got a pipeline set up I&#8217;ll start working backwards through older episodes. Then I&#8217;ll iterate on whatever extra features seem most interesting at the time. Probably starting with search, and probably using <a href="https://www.elastic.co/what-is/elasticsearch" target="_blank" rel="noopener">Elasticsearch</a>.</p>
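<p>As a sketch of that post-processing step, assuming the monologues-of-elements shape the Rev.ai API returns (the file name is a placeholder):</p>
<pre><code>import json

def revai_json_to_markdown(path):
    """Flatten a Rev.ai transcript into speaker-labelled Markdown paragraphs."""
    with open(path) as f:
        doc = json.load(f)
    paragraphs = []
    for mono in doc["monologues"]:
        # Each element is a word or punctuation token; join them verbatim.
        text = "".join(el["value"] for el in mono["elements"])
        paragraphs.append("**Speaker %d:** %s" % (mono["speaker"], text.strip()))
    return "\n\n".join(paragraphs)

print(revai_json_to_markdown("episode.json"))  # placeholder file name
</code></pre>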
<p>Remember, these are just my results for this project and with these specific audio files and subject matter. Your mileage <em>will</em> vary. Work out what features you need and do your own testing with your own audio to work out which services will work best for you. Have fun.</p>
<p>Updates:</p>
<ul>
<li>22nd May 2020: Added results for <a href="https://assemblyai.com/" target="_blank">AssemblyAI</a>.</li>
<li>23rd May 2020: Added the approximate cost per hour for the services.</li>
</ul>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.timbunce.org/2020/05/17/a-comparison-of-automatic-speech-recognition-asr-systems-part-3/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1782</post-id>
		<media:content url="https://0.gravatar.com/avatar/c1f8fff6645793f1615f748a0e33dfd3a4bf238f63095a180d01899515f628c7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">TimBunce</media:title>
		</media:content>

		<media:content url="https://blog.timbunce.org/wp-content/uploads/2020/04/untitled-2.png?w=676" medium="image">
			<media:title type="html">Example differences</media:title>
		</media:content>
	</item>
		<item>
		<title>A Comparison of Automatic Speech Recognition (ASR) Systems, part 2</title>
		<link>https://blog.timbunce.org/2019/02/11/a-comparison-of-automatic-speech-recognition-asr-systems-part-2/</link>
					<comments>https://blog.timbunce.org/2019/02/11/a-comparison-of-automatic-speech-recognition-asr-systems-part-2/#comments</comments>
		
		<dc:creator><![CDATA[TimBunce]]></dc:creator>
		<pubDate>Mon, 11 Feb 2019 20:38:50 +0000</pubDate>
				<category><![CDATA[software]]></category>
		<category><![CDATA[tech]]></category>
		<category><![CDATA[transcription]]></category>
		<guid isPermaLink="false">http://blog.timbunce.org/?p=1739</guid>

					<description><![CDATA[In my previous post I evaluated a number of Automatic Speech Recognition systems. That evaluation was useful but limited in an important way: it only used a single good quality audio file with a single pair of speakers (who both happened to be males with clear North American accents). Consequently there was no evaluation of &#8230; <a href="https://blog.timbunce.org/2019/02/11/a-comparison-of-automatic-speech-recognition-asr-systems-part-2/" class="more-link">Continue reading <span class="screen-reader-text">A Comparison of Automatic Speech Recognition (ASR) Systems, part&#160;2</span></a>]]></description>
										<content:encoded><![CDATA[<p>In my <a href="https://blog.timbunce.org/2018/05/15/a-comparison-of-automatic-speech-recognition-asr-systems/">previous post</a> I evaluated a number of Automatic Speech Recognition systems. That evaluation was useful but limited in an important way: it only used a single good-quality audio file with a single pair of speakers (who both happened to be male with clear North American accents). Consequently there was no evaluation of performance across a variety of accents, varying audio quality, and so on.</p>
<p>To address that limitation I&#8217;ve tested 14 ASR systems with 12 different audio files, covering a range of accents and audio quality. This post presents the results.</p>
<p><span id="more-1739"></span><br />
<strong>Update: In May 2020 I wrote a <a href="https://blog.timbunce.org/2020/05/17/a-comparison-of-automatic-speech-recognition-asr-systems-part-3/">follow-up post, part 3, with updated results for the best systems</a>, including Rev.ai, AssemblyAI, Google, Speechmatics, and 3Scribe.</strong></p>
<h2>Audio Samples</h2>
<p>For this evaluation I picked a number of interviews, spread over a range of years with a mix of accents and audio qualities, and used a 10-minute section of each one. Below I&#8217;ve listed some details of the audio files. Label is the identifier for the audio file used in the results table; the first two digits are the year of the recording.</p>
<table>
<tbody>
<tr>
<th>Label</th>
<th>MP3 Attributes (all 16-bit)</th>
<th>Interviewees</th>
</tr>
<tr>
<td>F10.A41</td>
<td>48 kbps, 44.1 kHz, Joint Stereo</td>
<td>Female, Irish accent</td>
</tr>
<tr>
<td>F11.A97</td>
<td>96 kbps, 44.1kHz, Mono</td>
<td>Male, Caribbean accent</td>
</tr>
<tr>
<td>F13.B52</td>
<td>64 kbps, 44.1kHz, Joint Stereo</td>
<td>Female, British accent</td>
</tr>
<tr>
<td>F14.C18</td>
<td>96 kbps, 44.1kHz, Mono</td>
<td>Female, North American accent</td>
</tr>
<tr>
<td>F14.C42</td>
<td>96 kbps, 44.1kHz, Mono</td>
<td>Male, North American accent</td>
</tr>
<tr>
<td>F15.C96</td>
<td>96 kbps, 44.1kHz, Mono</td>
<td>Male, North American accent</td>
</tr>
<tr>
<td>F16.D64</td>
<td>64 kbps, 48kHz, Mono</td>
<td>Male, Indian accent</td>
</tr>
<tr>
<td>F17.D83</td>
<td>64 kbps, 48kHz, Mono</td>
<td>Male, North American accent</td>
</tr>
<tr>
<td>F17.E03</td>
<td>64 kbps, 48kHz, Mono</td>
<td>Female, North American accent</td>
</tr>
<tr>
<td>F18.E82</td>
<td>128 kbps, 44.1 kHz, Joint Stereo</td>
<td>One male, two female (<em>crosstalk, clipping</em>)</td>
</tr>
<tr>
<td>F18.E83</td>
<td>256 kbps, 48 kHz, Joint Stereo</td>
<td>Male, French accent</td>
</tr>
<tr>
<td>F18.E84</td>
<td>256 kbps, 48 kHz, Joint Stereo</td>
<td>Male, North American accent</td>
</tr>
</tbody>
</table>
<p>I used roughly the same methodology as <a href="https://blog.timbunce.org/2018/05/15/a-comparison-of-automatic-speech-recognition-asr-systems/" target="_blank" rel="noopener noreferrer">before</a>. I purchased verbatim transcripts, made and checked by humans, from three services: <a href="https://www.rev.com" target="_blank" rel="noopener noreferrer">Rev</a>, <a href="https://scribie.com/" target="_blank" rel="noopener noreferrer">Scribie</a>, and <a href="https://cielo24.com" target="_blank" rel="noopener noreferrer">Cielo24</a>. I compared the transcripts and wherever they differed I listened to the audio and decided on the &#8216;ground truth&#8217; to use for the evaluation.</p>
<p>I want to take a moment to give credit to <a href="https://www.rev.com" target="_blank" rel="noopener noreferrer">Rev</a> for great service. They cost $1/min yet delivered all the transcripts within 4 hours and had the lowest WER score of 3.8, compared with 4.2 for Scribie ($1/min) and 5.5 for Cielo24&#8217;s top &#8220;Best+&#8221; service ($2/min).</p>
<p>For Microsoft I had to convert the files to WAV format (16-bit mono 16kHz) because that&#8217;s the only format their SDK supports. Similarly, for Google I converted the files to FLAC (16-bit mono 16kHz). Both WAV and FLAC are lossless formats, though the downsample to 16kHz mono does discard some detail. All the other services accepted the original MP3 format.</p>
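<p>Conversions like that are a one-liner with ffmpeg. Here&#8217;s the kind of invocation I mean, wrapped in Python (file names are placeholders; swap <code>.wav</code> for <code>.flac</code> to get FLAC):</p>
<pre><code>import subprocess

# Convert an MP3 to 16-bit mono 16kHz WAV, as required by the Microsoft SDK.
subprocess.run(
    ["ffmpeg", "-i", "clip.mp3",
     "-ac", "1",            # mix down to mono
     "-ar", "16000",        # resample to 16 kHz
     "-sample_fmt", "s16",  # 16-bit samples
     "clip.wav"],
    check=True,
)
</code></pre>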
<h2>Results</h2>
<p>The table below presents the results. The &#8216;Humans&#8217; row of the table shows the median WER score for the three human transcripts. The service rows are ordered by the median of their WER scores across all 12 files. Each cell is color coded according to the degree to which the WER score is better (lower, deeper green) or worse (higher, deeper red) than the median of the ASR results for that file (shown in a middle row).</p>
<table style="border:1px #aaa;border-style:solid;border-collapse:collapse;border-spacing:0;">
<tbody>
<tr>
<td style="padding:2px;">Service</td>
<td style="padding:2px;">Median</td>
<td style="padding:2px;">F10<br />
A41</td>
<td style="padding:2px;">F11<br />
A97</td>
<td style="padding:2px;">F13<br />
B52</td>
<td style="padding:2px;">F14<br />
C18</td>
<td style="padding:2px;">F14<br />
C42</td>
<td style="padding:2px;">F15<br />
C96</td>
<td style="padding:2px;">F16<br />
D64</td>
<td style="padding:2px;">F17<br />
D83</td>
<td style="padding:2px;">F17<br />
E03</td>
<td style="padding:2px;">F18<br />
E82</td>
<td style="padding:2px;">F18<br />
E83</td>
<td style="padding:2px;">F18<br />
E84</td>
</tr>
<tr>
<td style="padding:2px;">Humans</td>
<td style="text-align:center;padding:2px;" title="4.24">&nbsp;&nbsp;4.2</td>
<td style="text-align:center;padding:2px;" title="4.22">4</td>
<td style="text-align:center;padding:2px;" title="3.96">4</td>
<td style="text-align:center;padding:2px;" title="2.85">3</td>
<td style="text-align:center;padding:2px;" title="3.32">3</td>
<td style="text-align:center;padding:2px;" title="4.90">5</td>
<td style="text-align:center;padding:2px;" title="3.50">4</td>
<td style="text-align:center;padding:2px;" title="2.95">3</td>
<td style="text-align:center;padding:2px;" title="5.50">6</td>
<td style="text-align:center;padding:2px;" title="4.25">4</td>
<td style="text-align:center;padding:2px;" title="11.06">11</td>
<td style="text-align:center;padding:2px;" title="6.05">6</td>
<td style="text-align:center;padding:2px;" title="4.86">5</td>
</tr>
<tr>
<td style="padding:2px;">Google Enh. Video</td>
<td style="text-align:center;padding:2px;" title="11.27">11.3</td>
<td style="background-color:#e9ffbf;text-align:center;padding:2px;" title="20.64">21</td>
<td style="background-color:#c1ff45;text-align:center;padding:2px;" title="9.28">9</td>
<td style="background-color:#d9ff8f;text-align:center;padding:2px;" title="10.07">10</td>
<td style="background-color:#cfff71;text-align:center;padding:2px;" title="7.96">8</td>
<td style="background-color:#c6ff54;text-align:center;padding:2px;" title="10.91">11</td>
<td style="background-color:#ccff65;text-align:center;padding:2px;" title="8.33">8</td>
<td style="background-color:#beff3c;text-align:center;padding:2px;" title="12.24">12</td>
<td style="background-color:#b2ff19;text-align:center;padding:2px;" title="8.77">9</td>
<td style="background-color:#caff61;text-align:center;padding:2px;" title="11.63">12</td>
<td style="background-color:#ffe2d3;text-align:center;padding:2px;" title="22.45">22</td>
<td style="background-color:#f2ffd9;text-align:center;padding:2px;" title="13.85">14</td>
<td style="background-color:#f5ffe1;text-align:center;padding:2px;" title="13.12">13</td>
</tr>
<tr>
<td style="padding:2px;">Descript</td>
<td style="text-align:center;padding:2px;" title="11.38">11.4</td>
<td style="background-color:#e7ffb7;text-align:center;padding:2px;" title="20.35">20</td>
<td style="background-color:#beff3d;text-align:center;padding:2px;" title="9.15">9</td>
<td style="background-color:#c7ff57;text-align:center;padding:2px;" title="9.05">9</td>
<td style="background-color:#d4ff7f;text-align:center;padding:2px;" title="8.14">8</td>
<td style="background-color:#cfff70;text-align:center;padding:2px;" title="11.40">11</td>
<td style="background-color:#baff32;text-align:center;padding:2px;" title="7.63">8</td>
<td style="background-color:#c3ff4b;text-align:center;padding:2px;" title="12.67">13</td>
<td style="background-color:#b2ff19;text-align:center;padding:2px;" title="9.08">9</td>
<td style="background-color:#e1ffa5;text-align:center;padding:2px;" title="13.05">13</td>
<td style="background-color:#c6ff56;text-align:center;padding:2px;" title="18.10">18</td>
<td style="background-color:#c0ff42;text-align:center;padding:2px;" title="11.35">11</td>
<td style="background-color:#f7ffe9;text-align:center;padding:2px;" title="13.25">13</td>
</tr>
<tr>
<td style="padding:2px;">Speechmatics</td>
<td style="text-align:center;padding:2px;" title="11.77">11.8</td>
<td style="background-color:#daff90;text-align:center;padding:2px;" title="18.90">19</td>
<td style="background-color:#e1ffa5;text-align:center;padding:2px;" title="10.85">11</td>
<td style="background-color:#d3ff7c;text-align:center;padding:2px;" title="9.71">10</td>
<td style="background-color:#e4ffaf;text-align:center;padding:2px;" title="8.74">9</td>
<td style="background-color:#caff62;text-align:center;padding:2px;" title="11.16">11</td>
<td style="background-color:#d6ff84;text-align:center;padding:2px;" title="8.74">9</td>
<td style="background-color:#e9ffbd;text-align:center;padding:2px;" title="16.05">16</td>
<td style="background-color:#c7ff57;text-align:center;padding:2px;" title="10.56">11</td>
<td style="background-color:#e4ffae;text-align:center;padding:2px;" title="13.23">13</td>
<td style="background-color:#dcff97;text-align:center;padding:2px;" title="19.42">19</td>
<td style="background-color:#fcfff7;text-align:center;padding:2px;" title="14.35">14</td>
<td style="background-color:#e6ffb6;text-align:center;padding:2px;" title="12.38">12</td>
</tr>
<tr>
<td style="padding:2px;">TranscribeMe</td>
<td style="text-align:center;padding:2px;" title="11.89">11.9</td>
<td style="background-color:#d0ff72;text-align:center;padding:2px;" title="17.80">18</td>
<td style="background-color:#c6ff56;text-align:center;padding:2px;" title="9.56">10</td>
<td style="background-color:#d3ff7c;text-align:center;padding:2px;" title="9.71">10</td>
<td style="background-color:#dfffa1;text-align:center;padding:2px;" title="8.56">9</td>
<td style="background-color:#d6ff85;text-align:center;padding:2px;" title="11.77">12</td>
<td style="background-color:#c2ff48;text-align:center;padding:2px;" title="7.93">8</td>
<td style="background-color:#f2ffd8;text-align:center;padding:2px;" title="16.85">17</td>
<td style="background-color:#ccff68;text-align:center;padding:2px;" title="10.81">11</td>
<td style="background-color:#d0ff73;text-align:center;padding:2px;" title="12.00">12</td>
<td style="background-color:#e5ffb3;text-align:center;padding:2px;" title="20.01">20</td>
<td style="background-color:#e7ffb7;text-align:center;padding:2px;" title="13.29">13</td>
<td style="background-color:#edffca;text-align:center;padding:2px;" title="12.72">13</td>
</tr>
<tr>
<td style="padding:2px;">Temi</td>
<td style="text-align:center;padding:2px;" title="12.65">12.7</td>
<td style="background-color:#fbfff4;text-align:center;padding:2px;" title="22.60">23</td>
<td style="background-color:#ffbd9d;text-align:center;padding:2px;" title="13.92">14</td>
<td style="background-color:#fcfff8;text-align:center;padding:2px;" title="11.97">12</td>
<td style="background-color:#d3ff7b;text-align:center;padding:2px;" title="8.08">8</td>
<td style="background-color:#c8ff5c;text-align:center;padding:2px;" title="11.04">11</td>
<td style="background-color:#f4ffdf;text-align:center;padding:2px;" title="9.97">10</td>
<td style="background-color:#ffd8c5;text-align:center;padding:2px;" title="19.65">20</td>
<td style="background-color:#eaffc0;text-align:center;padding:2px;" title="12.11">12</td>
<td style="background-color:#ffceb6;text-align:center;padding:2px;" title="16.43">16</td>
<td style="background-color:#d2ff7a;text-align:center;padding:2px;" title="18.83">19</td>
<td style="background-color:#cdff6b;text-align:center;padding:2px;" title="12.04">12</td>
<td style="background-color:#f6ffe5;text-align:center;padding:2px;" title="13.18">13</td>
</tr>
<tr>
<td style="padding:2px;">Otter.ai</td>
<td style="text-align:center;padding:2px;" title="12.99">13.0</td>
<td style="background-color:#f8ffeb;text-align:center;padding:2px;" title="22.25">22</td>
<td style="background-color:#f0ffd3;text-align:center;padding:2px;" title="11.60">12</td>
<td style="background-color:#fffaf8;text-align:center;padding:2px;" title="12.19">12</td>
<td style="background-color:#e7ffb9;text-align:center;padding:2px;" title="8.86">9</td>
<td style="background-color:#f4ffdf;text-align:center;padding:2px;" title="13.37">13</td>
<td style="background-color:#ffece2;text-align:center;padding:2px;" title="10.78">11</td>
<td style="background-color:#f3ffdb;text-align:center;padding:2px;" title="16.92">17</td>
<td style="background-color:#f5ffe1;text-align:center;padding:2px;" title="12.60">13</td>
<td style="background-color:#fffefd;text-align:center;padding:2px;" title="14.95">15</td>
<td style="background-color:#daff90;text-align:center;padding:2px;" title="19.29">19</td>
<td style="background-color:#d5ff82;text-align:center;padding:2px;" title="12.41">12</td>
<td style="background-color:#fff1ea;text-align:center;padding:2px;" title="13.98">14</td>
</tr>
<tr>
<td style="padding:2px;">SimonSays.ai</td>
<td style="text-align:center;padding:2px;" title="13.43">13.4</td>
<td style="background-color:#fff8f4;text-align:center;padding:2px;" title="23.35">23</td>
<td style="background-color:#ffe1d3;text-align:center;padding:2px;" title="13.04">13</td>
<td style="background-color:#fbfff5;text-align:center;padding:2px;" title="11.90">12</td>
<td style="background-color:#ffad85;text-align:center;padding:2px;" title="11.27">11</td>
<td style="background-color:#f9ffed;text-align:center;padding:2px;" title="13.61">14</td>
<td style="background-color:#ffe0d1;text-align:center;padding:2px;" title="11.01">11</td>
<td style="background-color:#ffe7db;text-align:center;padding:2px;" title="19.01">19</td>
<td style="background-color:#e7ffb7;text-align:center;padding:2px;" title="11.98">12</td>
<td style="background-color:#fcfff7;text-align:center;padding:2px;" title="14.77">15</td>
<td style="background-color:#e8ffba;text-align:center;padding:2px;" title="20.14">20</td>
<td style="background-color:#ffffff;text-align:center;padding:2px;" title="14.47">14</td>
<td style="background-color:#f7ffe9;text-align:center;padding:2px;" title="13.25">13</td>
</tr>
<tr>
<td style="padding:2px;"><em>Median of ASR results</em></td>
<td style="text-align:center;padding:2px;" title="13.77"><em>13.8</em></td>
<td style="text-align:center;padding:2px;" title="22.98"><em>23</em></td>
<td style="text-align:center;padding:2px;" title="12.32"><em>12</em></td>
<td style="text-align:center;padding:2px;" title="12.08"><em>12</em></td>
<td style="text-align:center;padding:2px;" title="9.74"><em>10</em></td>
<td style="text-align:center;padding:2px;" title="13.92"><em>14</em></td>
<td style="text-align:center;padding:2px;" title="10.40"><em>10</em></td>
<td style="text-align:center;padding:2px;" title="17.97"><em>18</em></td>
<td style="text-align:center;padding:2px;" title="13.04"><em>13</em></td>
<td style="text-align:center;padding:2px;" title="14.92"><em>15</em></td>
<td style="text-align:center;padding:2px;" title="21.56"><em>22</em></td>
<td style="text-align:center;padding:2px;" title="14.47"><em>14</em></td>
<td style="text-align:center;padding:2px;" title="13.62"><em>14</em></td>
</tr>
<tr>
<td style="padding:2px;">Go Transcribe</td>
<td style="text-align:center;padding:2px;" title="14.22">14.2</td>
<td style="background-color:#ffdfcf;text-align:center;padding:2px;" title="24.74">25</td>
<td style="background-color:#d3ff7b;text-align:center;padding:2px;" title="10.17">10</td>
<td style="background-color:#fff8f5;text-align:center;padding:2px;" title="12.26">12</td>
<td style="background-color:#fcfff7;text-align:center;padding:2px;" title="9.64">10</td>
<td style="background-color:#fff3ee;text-align:center;padding:2px;" title="14.22">14</td>
<td style="background-color:#efffd1;text-align:center;padding:2px;" title="9.79">10</td>
<td style="background-color:#ffb18b;text-align:center;padding:2px;" title="21.38">21</td>
<td style="background-color:#ffcaaf;text-align:center;padding:2px;" title="14.21">14</td>
<td style="background-color:#fcfff7;text-align:center;padding:2px;" title="14.77">15</td>
<td style="background-color:#ffad84;text-align:center;padding:2px;" title="24.09">24</td>
<td style="background-color:#ffab82;text-align:center;padding:2px;" title="16.53">17</td>
<td style="background-color:#f5ffe1;text-align:center;padding:2px;" title="13.12">13</td>
</tr>
<tr>
<td style="padding:2px;">Spext</td>
<td style="text-align:center;padding:2px;" title="14.51">14.5</td>
<td style="background-color:#fff4ee;text-align:center;padding:2px;" title="23.58">24</td>
<td style="background-color:#efffcf;text-align:center;padding:2px;" title="11.54">12</td>
<td style="background-color:#f2ffd8;text-align:center;padding:2px;" title="11.39">11</td>
<td style="background-color:#ffaa80;text-align:center;padding:2px;" title="11.33">11</td>
<td style="background-color:#ffe5d9;text-align:center;padding:2px;" title="14.59">15</td>
<td style="background-color:#ffe0d1;text-align:center;padding:2px;" title="11.01">11</td>
<td style="background-color:#d8ff8a;text-align:center;padding:2px;" title="14.54">15</td>
<td style="background-color:#ffebe1;text-align:center;padding:2px;" title="13.47">13</td>
<td style="background-color:#ffb38d;text-align:center;padding:2px;" title="17.29">17</td>
<td style="background-color:#ff6519;text-align:center;padding:2px;" title="30.28">30</td>
<td style="background-color:#ffffff;text-align:center;padding:2px;" title="14.47">14</td>
<td style="background-color:#ff894f;text-align:center;padding:2px;" title="16.64">17</td>
</tr>
<tr>
<td style="padding:2px;">Happy Scribe</td>
<td style="text-align:center;padding:2px;" title="14.94">14.9</td>
<td style="background-color:#ddff9b;text-align:center;padding:2px;" title="19.31">19</td>
<td style="background-color:#ff6519;text-align:center;padding:2px;" title="28.53">29</td>
<td style="background-color:#fff2ec;text-align:center;padding:2px;" title="12.41">12</td>
<td style="background-color:#fffaf7;text-align:center;padding:2px;" title="9.83">10</td>
<td style="background-color:#ffe8dc;text-align:center;padding:2px;" title="14.53">15</td>
<td style="background-color:#f5ffe2;text-align:center;padding:2px;" title="10.02">10</td>
<td style="background-color:#eaffc0;text-align:center;padding:2px;" title="16.13">16</td>
<td style="background-color:#ffbf9f;text-align:center;padding:2px;" title="14.45">14</td>
<td style="background-color:#fefffd;text-align:center;padding:2px;" title="14.89">15</td>
<td style="background-color:#ffad84;text-align:center;padding:2px;" title="24.09">24</td>
<td style="background-color:#ffb895;text-align:center;padding:2px;" title="16.22">16</td>
<td style="background-color:#ffcaaf;text-align:center;padding:2px;" title="14.98">15</td>
</tr>
<tr>
<td style="padding:2px;">AWS Transcribe</td>
<td style="text-align:center;padding:2px;" title="17.43">17.4</td>
<td style="background-color:#ffaf88;text-align:center;padding:2px;" title="27.34">27</td>
<td style="background-color:#ffb28c;text-align:center;padding:2px;" title="14.20">14</td>
<td style="background-color:#ff6519;text-align:center;padding:2px;" title="17.74">18</td>
<td style="background-color:#ff9764;text-align:center;padding:2px;" title="11.69">12</td>
<td style="background-color:#ff864a;text-align:center;padding:2px;" title="17.11">17</td>
<td style="background-color:#ff6519;text-align:center;padding:2px;" title="13.52">14</td>
<td style="background-color:#ff945f;text-align:center;padding:2px;" title="22.68">23</td>
<td style="background-color:#ff6519;text-align:center;padding:2px;" title="18.22">18</td>
<td style="background-color:#ffe0d1;text-align:center;padding:2px;" title="15.88">16</td>
<td style="background-color:#ffd1ba;text-align:center;padding:2px;" title="22.98">23</td>
<td style="background-color:#ff6519;text-align:center;padding:2px;" title="21.27">21</td>
<td style="background-color:#ff915a;text-align:center;padding:2px;" title="16.44">16</td>
</tr>
<tr>
<td style="padding:2px;">Scribie Auto</td>
<td style="text-align:center;padding:2px;" title="18.65">18.7</td>
<td style="background-color:#ff6519;text-align:center;padding:2px;" title="37.23">37</td>
<td style="background-color:#ff6519;text-align:center;padding:2px;" title="20.55">21</td>
<td style="background-color:#ff6519;text-align:center;padding:2px;" title="20.36">20</td>
<td style="background-color:#ff9764;text-align:center;padding:2px;" title="11.69">12</td>
<td style="background-color:#ff9e6d;text-align:center;padding:2px;" title="16.49">16</td>
<td style="background-color:#ffe6da;text-align:center;padding:2px;" title="10.90">11</td>
<td style="background-color:#ff6519;text-align:center;padding:2px;" title="30.02">30</td>
<td style="background-color:#ff6519;text-align:center;padding:2px;" title="17.36">17</td>
<td style="background-color:#ff6c23;text-align:center;padding:2px;" title="19.51">20</td>
<td style="background-color:#f0ffd3;text-align:center;padding:2px;" title="20.67">21</td>
<td style="background-color:#ff9c6b;text-align:center;padding:2px;" title="16.91">17</td>
<td style="background-color:#ff6519;text-align:center;padding:2px;" title="17.78">18</td>
</tr>
<tr>
<td style="padding:2px;">Cielo24</td>
<td style="text-align:center;padding:2px;" title="19.07">19.1</td>
<td style="background-color:#ff6519;text-align:center;padding:2px;" title="32.14">32</td>
<td style="background-color:#ff6519;text-align:center;padding:2px;" title="17.54">18</td>
<td style="background-color:#ff6519;text-align:center;padding:2px;" title="20.22">20</td>
<td style="background-color:#ff6519;text-align:center;padding:2px;" title="14.23">14</td>
<td style="background-color:#ffa477;text-align:center;padding:2px;" title="16.31">16</td>
<td style="background-color:#ff6519;text-align:center;padding:2px;" title="13.69">14</td>
<td style="background-color:#ff6519;text-align:center;padding:2px;" title="32.25">32</td>
<td style="background-color:#ff6519;text-align:center;padding:2px;" title="17.91">18</td>
<td style="background-color:#ff6519;text-align:center;padding:2px;" title="20.37">20</td>
<td style="background-color:#ff6519;text-align:center;padding:2px;" title="30.68">31</td>
<td style="background-color:#ff6519;text-align:center;padding:2px;" title="24.58">25</td>
<td style="background-color:#ff6d24;text-align:center;padding:2px;" title="17.38">17</td>
</tr>
<tr>
<td style="padding:2px;">Microsoft</td>
<td style="text-align:center;padding:2px;" title="20.25">20.3</td>
<td style="background-color:#ff9662;text-align:center;padding:2px;" title="28.73">29</td>
<td style="background-color:#ff6519;text-align:center;padding:2px;" title="16.18">16</td>
<td style="background-color:#ff712b;text-align:center;padding:2px;" title="15.91">16</td>
<td style="background-color:#ff6519;text-align:center;padding:2px;" title="12.78">13</td>
<td style="background-color:#ffa97e;text-align:center;padding:2px;" title="16.19">16</td>
<td style="background-color:#ff6519;text-align:center;padding:2px;" title="21.04">21</td>
<td style="background-color:#ffb38d;text-align:center;padding:2px;" title="21.31">21</td>
<td style="background-color:#ff8c53;text-align:center;padding:2px;" title="15.57">16</td>
<td style="background-color:#ff6e26;text-align:center;padding:2px;" title="19.45">19</td>
<td style="background-color:#ff6519;text-align:center;padding:2px;" title="29.56">30</td>
<td style="background-color:#ff6519;text-align:center;padding:2px;" title="25.33">25</td>
<td style="background-color:#ff6519;text-align:center;padding:2px;" title="21.64">22</td>
</tr>
</tbody>
</table>
<p>I tested <a href="https://www.descript.com" target="_blank" rel="noopener noreferrer">Descript</a> as an afterthought. Descript <a href="https://help.descript.com/faq-and-troubleshooting/app-features-and-functionality/how-accurate-is-descripts-automatic-transcription" target="_blank" rel="noopener noreferrer">use Google as the backend ASR service</a> (with some custom post-processing, I&#8217;m told) and have a very nice app with a rich feature set. Testing Descript turned out to be helpful in highlighting what appears to be a bug in the Google service.</p>
<p>Let&#8217;s explore the odd results for F18.E82. That audio was by far the most challenging in this evaluation. There were four speakers, informal banter and cross-talking, and the audio was slightly <a href="https://en.wikipedia.org/wiki/Clipping_(audio)" target="_blank" rel="noopener noreferrer">clipped</a>. The Human WER score of 11 reflects differences in how the humans rendered the speakers talking over one another and their <a href="https://en.wikipedia.org/wiki/Speech_disfluency" target="_blank" rel="noopener noreferrer">disfluencies</a>.</p>
<p>Google&#8217;s unusually poor result for this file was due to missing chunks of the transcript. When I first tried it there were two large chunks (~50s each) and some smaller chunks missing, and the WER score was 35! I tried rerunning the transcription, and then again with different audio formats, but it didn&#8217;t help.</p>
<p>A few days later I tested Descript. It scored an inconsistent mix of good and bad results with a median of 14. That seemed odd for a service that uses Google, especially as it had a better score (28) than Google for F18.E82. I retested Google and it improved to 22 (with 254 more words than in the previous Google transcript). I retested Descript and it improved to 18 (with 183 more words than in the previous Descript transcript). Those results haven&#8217;t changed with further testing. Using Google directly for that file still gets a worse result than using Descript, mostly due to Google&#8217;s transcript missing a 16 second chunk. Odd.</p>
<p>I regenerated Descript transcripts for the six files that had much worse results than Google&#8217;s and they all improved (F10.A41 23.6→20.4; F11.A97 14.6→9.2; F14.B18 11.87→8.14; F14.C42 14.53→11.40; F15.C96 17.60→7.63; F16.D64 15.91→12.67).</p>
<p>This seems like a significant problem with the Google service. I&#8217;ve reported it to Descript and had an acknowledgement but haven&#8217;t heard back yet.</p>
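<p>If you want to check your own results for this problem, here&#8217;s a minimal sketch of one way to spot dropped chunks automatically, assuming you have word-level timings; the tuple shape and threshold below are illustrative:</p>
<pre><code># Flag suspiciously long silences in a word-timed transcript.
# A genuine pause is usually short; a multi-second gap in an
# interview is a hint that the service dropped a chunk of audio.

def find_gaps(words, threshold_secs=5.0):
    """words: list of (word, start_secs, end_secs), in time order."""
    gaps = []
    for prev, cur in zip(words, words[1:]):
        gap = cur[1] - prev[2]  # next word's start minus previous word's end
        if gap &gt;= threshold_secs:
            gaps.append((prev[2], cur[1], gap))
    return gaps

# Illustrative data with a 16-second hole in it:
words = [("so", 0.0, 0.3), ("anyway", 0.4, 0.9), ("right", 17.1, 17.4)]
for start, end, gap in find_gaps(words):
    print(f"possible missing chunk: {start:.1f}s to {end:.1f}s ({gap:.1f}s)")
</code></pre>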
<h2>Non-runners</h2>
<p>I didn&#8217;t retest <strong>Trint</strong> or <strong>Sonix</strong> because, as noted in my previous post, Trint, Sonix, and Speechmatics have very little difference between their transcripts, a differential WER of just 1.4. That suggests those three services are using very similar models and training data.</p>
<p><strong>VoiceBase</strong> are now represented by Cielo24, who have taken over the web service.</p>
<p>I had included IBM&#8217;s <strong>Watson</strong> service in this test, hoping it had improved (especially as it now takes MP3 so I didn&#8217;t need to transcode as I had before). It was consistently the worst performer, with a median WER of 24, so I dropped it from the results.</p>
<p>I&#8217;d also planned to include <a href="https://remeeting.com/" target="_blank" rel="noopener noreferrer"><strong>Remeeting</strong></a>, which I came across after my previous testing and which looked promising. Their results were generally similar to, or worse than, Cielo24&#8217;s, with a couple of transcripts much worse due to extra duplicated fragments of text. They seem to do a good job with speaker identification so I&#8217;ll include them in any future testing I do for that.</p>
<p>I was contacted by <a href="https://unravelhq.com">Unravel</a> shortly before posting this. They, like Descript, use Google to provide the transcripts. Their service is basic and their pricing is low ($15 for 300 mins/month) with a free tier (60 mins/month). While testing the service I encountered the same problem with missing chunks that I described above.</p>
<h2>Pre-trainers</h2>
<p>A valid concern with the previous evaluation was that a transcript for the audio I used was available on the internet and so may have been included in the training data for the ASR systems. I doubted that would make much difference in practice, given the quantity of training data needed by ASR systems, but wanted to check.</p>
<p>The last three files (F18.E82, F18.E83, and F18.E84) in this new evaluation were all transcribed before being published on the internet. It&#8217;s interesting to note that Scribie was one of the services I used to generate human transcripts and the Scribie Auto ASR service did unusually well on the F18.E82 file. Scribie also did well in my <a href="https://blog.timbunce.org/2018/05/15/a-comparison-of-automatic-speech-recognition-asr-systems/">previous testing</a> where I&#8217;d also used them to generate the human transcript. (The F15.C96 file in this test is a 10-minute section of that same file and again Scribie Auto ASR did unusually well on that file.)</p>
<p>On the other hand, Scribie Auto ASR did poorly on all the other files even though I&#8217;d used Scribie for the human transcripts of them. Similarly Cielo24 doesn&#8217;t appear to have gained noticeable advantage from having generated human transcripts of the files.</p>
<p>Another data point is that Microsoft performed poorly for those last three files. If those files are removed from the results then Microsoft&#8217;s ranking rises above Amazon&#8217;s.</p>
<h2>Conclusions</h2>
<p>The clear winners in this test are Google&#8217;s enhanced video model <em>($0.048/min)</em> and Speechmatics <em>($0.08/min)</em>; Speechmatics came a close second on both accuracy <em>and</em> price. (Though clearly there&#8217;s an issue with Google missing chunks in the transcript.)</p>
<p>TranscribeMe <em>($0.25/min)</em> is relatively accurate but also three times the price and lacks features I want. Temi <em>($0.10/min)</em> is only slightly worse yet less than half the price of TranscribeMe. Otter.ai <em>($0 up to 600 mins/month, 6,000 mins for $9.99/mo)</em> is good, though not as good as they appeared to be in my previous test.</p>
<p>Remember, these are just my results with these specific audio files and subject matter. Your mileage <em>will</em> vary. Do your own testing with your own audio to work out which services will work best for you.</p>
<p>Automatic Speech Recognition is amazingly good, yet still <em>far</em> from human levels of accuracy, especially for poor quality audio. Comparing transcripts from multiple services still looks like an appealing way to identify likely errors to aid human editing.</p>
<h2>What Next?</h2>
<p>Now that there&#8217;s a clear winner (Google) that I have confidence in, the next step is to start generating transcripts for all the podcast episodes. Finally.</p>
<p>Once I&#8217;ve a workflow in place for that I can circle back and investigate how to add a workflow for human review and editing. That&#8217;s where I&#8217;d look more deeply into comparing the &#8216;master&#8217; transcript from Google with another, e.g. from Speechmatics, to identify and highlight likely errors.</p>
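<p>As a sketch of what that highlighting step could look like: the snippet below aligns the two word sequences and wraps the spans where the &#8216;master&#8217; transcript disagrees with the second one. The transcripts and the <code>&lt;mark&gt;</code> output are just illustrative choices:</p>
<pre><code>import difflib

def highlight_disagreements(master_text, other_text):
    """Wrap words in the master transcript that differ from a second
    transcript in &lt;mark&gt; tags, as candidates for human review."""
    master = master_text.split()
    other = other_text.split()
    out = []
    matcher = difflib.SequenceMatcher(a=master, b=other, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.extend(master[i1:i2])
        elif master[i1:i2]:  # 'replace' or 'delete': master has words here
            out.append("&lt;mark&gt;" + " ".join(master[i1:i2]) + "&lt;/mark&gt;")
        # 'insert' means only the other transcript has words; nothing to mark
    return " ".join(out)

print(highlight_disagreements("we live just on the road from there",
                              "we live just down the road from there"))
# -&gt; we live just &lt;mark&gt;on&lt;/mark&gt; the road from there
</code></pre>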
<p>I also have ideas for a simple way to compare the quality of speaker identification across services, which will likely prompt another blog post, one day.</p>
<p>There are more of my rambling thoughts in the What Next? section of my <a href="https://blog.timbunce.org/2018/05/15/a-comparison-of-automatic-speech-recognition-asr-systems/">previous post</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.timbunce.org/2020/05/17/a-comparison-of-automatic-speech-recognition-asr-systems-part-3/feed/</wfw:commentRss>
			<slash:comments>3</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1782</post-id>
		<media:content url="https://0.gravatar.com/avatar/c1f8fff6645793f1615f748a0e33dfd3a4bf238f63095a180d01899515f628c7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">TimBunce</media:title>
		</media:content>
	</item>
		<item>
		<title>A Comparison of Automatic Speech Recognition (ASR) Systems</title>
		<link>https://blog.timbunce.org/2018/05/15/a-comparison-of-automatic-speech-recognition-asr-systems/</link>
					<comments>https://blog.timbunce.org/2018/05/15/a-comparison-of-automatic-speech-recognition-asr-systems/#comments</comments>
		
		<dc:creator><![CDATA[TimBunce]]></dc:creator>
		<pubDate>Tue, 15 May 2018 14:19:00 +0000</pubDate>
				<category><![CDATA[software]]></category>
		<category><![CDATA[tech]]></category>
		<category><![CDATA[transcription]]></category>
		<guid isPermaLink="false">http://blog.timbunce.org/?p=1542</guid>

					<description><![CDATA[Back in March 2016 I wrote Semi-automated podcast transcription about my interest in finding ways to make archives of podcast content more accessible. Please read that post for details of my motivations and goals. Some 11 months later, in February 2017, I wrote Comparing Transcriptions describing how I was exploring measuring transcription accuracy. That turned out to &#8230; <a href="https://blog.timbunce.org/2018/05/15/a-comparison-of-automatic-speech-recognition-asr-systems/" class="more-link">Continue reading <span class="screen-reader-text">A Comparison of Automatic Speech Recognition (ASR)&#160;Systems</span></a>]]></description>
										<content:encoded><![CDATA[<p>Back in March 2016 I wrote <a href="https://blog.timbunce.org/2016/03/22/semi-automated-podcast-transcription-2/">Semi-automated podcast transcription</a> about my interest in finding ways to make archives of podcast content more accessible. Please read that post for details of my motivations and goals.</p>
<p>Some 11 months later, in February 2017, I wrote <a href="https://blog.timbunce.org/2017/02/09/comparing-transcriptions/">Comparing Transcriptions</a> describing how I was exploring measuring transcription accuracy. That turned out to be more tricky, and interesting, than I’d expected. Please read that post for details of the methods I&#8217;m using and what the WER (word error rate) score means.</p>
<p>Here, after another over-long gap, I&#8217;m returning to post the current results, and start thinking about next steps. One cause of the delay has been that whenever I returned to the topic there had been significant changes in at least one of the results, most recently when <a href="https://siliconangle.com/blog/2018/04/09/google-improves-transcription-new-training-models-cloud-speech-text/" target="_blank" rel="noopener noreferrer">Google announced</a> their <a href="https://cloud.google.com/speech-to-text/docs/enhanced-models" target="_blank" rel="noopener noreferrer">enhanced models</a>. In the end the delay turned out to be helpful.</p>
<p><span id="more-1542"></span></p>
<h1>The Scores</h1>
<p>The table below shows the results of my tests on many automated speech recognition services, ordered by WER score (lower is better). I&#8217;ll note a major caveat up front: <strong>I only used a single audio file for these tests</strong>: an almost two-hour interview in English between two North American males with no strong accents and good audio quality. I can&#8217;t be sure how the results would differ for female voices, more accented voices, lower audio quality etc. I plan to retest the top-tier services with at least one other file in due course.</p>
<p><strong>Updates:</strong></p>
<ul>
<li>In February 2019 I wrote a <a href="https://blog.timbunce.org/2019/02/11/a-comparison-of-automatic-speech-recognition-asr-systems-part-2/">follow-up post, part 2, which presents the results of evaluating 14 ASR systems with 12 different audio files</a> covering a variety of speakers, accents, and audio quality. Naturally that gives more representative results.</li>
<li>In May 2020 I wrote a further <a href="https://blog.timbunce.org/2020/05/17/a-comparison-of-automatic-speech-recognition-asr-systems-part-3/">follow-up post, part 3, with updated results for the best systems</a>, including Rev.ai, AssemblyAI, Google, Speechmatics, and 3Scribe.</li>
</ul>
<p>You can&#8217;t beat a human, at least not yet. All the human services scored between 4 and 6. I described them in my <a href="https://blog.timbunce.org/2017/02/09/comparing-transcriptions/" target="_blank" rel="noopener noreferrer">previous post</a>, so I won&#8217;t dwell on them here.</p>
<table class="tg" style="border:1px #aaa;border-style:solid;border-collapse:collapse;border-spacing:0;">
<tbody>
<tr>
<th class="tg-yw4l">Service</th>
<th class="tg-yw4l">WER</th>
<th class="tg-yw4l">Punctuation<br />
( <code>.</code> / <code>,</code> / <code>?</code> / names )</th>
<th class="tg-yw4l">Timing</th>
<th class="tg-yw4l">Other Features</th>
<th class="tg-yw4l">Approx Cost<br />
(not bulk)</th>
</tr>
<tr>
<td class="tg-yw4l">Human (<a href="https://www.voicebase.com" target="_blank" rel="noopener noreferrer">Voicebase</a>)</td>
<td class="tg-yw4l">4.10</td>
<td class="tg-yw4l">1090/1626/57/1056</td>
<td class="tg-yw4l"></td>
<td class="tg-yw4l"></td>
<td class="tg-yw4l">$1.5/min</td>
</tr>
<tr>
<td class="tg-yw4l">Human (<a href="http://www.3playmedia.com/" target="_blank" rel="noopener noreferrer">3PlayMedia</a>)</td>
<td class="tg-yw4l">4.11</td>
<td class="tg-yw4l">1261/1470/76/1064</td>
<td class="tg-yw4l"></td>
<td class="tg-yw4l"></td>
<td class="tg-yw4l">$3/min</td>
</tr>
<tr>
<td class="tg-yw4l">Human (<a href="https://scribie.com/" target="_blank" rel="noopener noreferrer">Scribie</a>)</td>
<td class="tg-yw4l">4.72</td>
<td class="tg-yw4l">923/1450/49/1153</td>
<td class="tg-yw4l"></td>
<td class="tg-yw4l"></td>
<td class="tg-yw4l">$0.75/min</td>
</tr>
<tr>
<td class="tg-yw4l">Human (Volunteer)</td>
<td class="tg-yw4l">5.10</td>
<td class="tg-yw4l">840/1748/60/1208</td>
<td class="tg-yw4l"></td>
<td class="tg-yw4l"></td>
<td class="tg-yw4l">Goodwill</td>
</tr>
<tr style="border-top:3px double #aaa;">
<td class="tg-yw4l"><a href="https://cloud.google.com/speech-to-text/" target="_blank" rel="noopener noreferrer">Google Speech-to-Text</a> (video model, not enhanced)</td>
<td class="tg-yw4l">10.06</td>
<td class="tg-yw4l">792/421/29/1238</td>
<td class="tg-yw4l">Words</td>
<td class="tg-yw4l">C, A, V</td>
<td class="tg-yw4l">$0.048/min</td>
</tr>
<tr>
<td class="tg-yw4l"><a href="https://www.spext.co/" target="_blank" rel="noopener noreferrer">Spext</a></td>
<td class="tg-yw4l">10.44</td>
<td class="tg-yw4l">813/369/30/1263</td>
<td class="tg-yw4l">Lines</td>
<td class="tg-yw4l">E</td>
<td class="tg-yw4l">$0.16/min</td>
</tr>
<tr>
<td class="tg-yw4l"><a href="https://otter.ai/" target="_blank" rel="noopener noreferrer">Otter AI</a></td>
<td class="tg-yw4l">10.79</td>
<td class="tg-yw4l">786/1166/35/1030</td>
<td class="tg-yw4l">Pgfs</td>
<td class="tg-yw4l">E, S</td>
<td class="tg-yw4l">Free up to 600 mins/month</td>
</tr>
<tr>
<td class="tg-yw4l"><a href="https://www.speechmatics.com" target="_blank" rel="noopener noreferrer">Speechmatics</a></td>
<td class="tg-yw4l">11.35</td>
<td class="tg-yw4l">955/0/0/929</td>
<td class="tg-yw4l">Words</td>
<td class="tg-yw4l">S, C</td>
<td class="tg-yw4l">$0.08/min</td>
</tr>
<tr>
<td class="tg-yw4l"><a href="https://trint.com" target="_blank" rel="noopener noreferrer">Trint</a></td>
<td class="tg-yw4l">11.39</td>
<td class="tg-yw4l">968/0/0/894</td>
<td class="tg-yw4l">Lines</td>
<td class="tg-yw4l">E</td>
<td class="tg-yw4l">$0.33/min</td>
</tr>
<tr>
<td class="tg-yw4l"><a href="https://go-transcribe.com" target="_blank" rel="noopener noreferrer">Go-Transcribe</a></td>
<td class="tg-yw4l">11.46</td>
<td class="tg-yw4l">979/0/0/922</td>
<td class="tg-yw4l">Pgfs</td>
<td class="tg-yw4l">E</td>
<td class="tg-yw4l">$0.22/min</td>
</tr>
<tr>
<td class="tg-yw4l"><a href="https://simonsays.ai" target="_blank" rel="noopener noreferrer">SimonSays</a></td>
<td class="tg-yw4l">11.64</td>
<td class="tg-yw4l">941/0/0/893</td>
<td class="tg-yw4l">Line</td>
<td class="tg-yw4l">E, S</td>
<td class="tg-yw4l">$0.17/min</td>
</tr>
<tr>
<td class="tg-yw4l"><a href="https://sonix.ai" target="_blank" rel="noopener noreferrer">Sonix</a></td>
<td class="tg-yw4l">11.66</td>
<td class="tg-yw4l">943/0/0/900</td>
<td class="tg-yw4l">Lines</td>
<td class="tg-yw4l">D, S, E</td>
<td class="tg-yw4l">$0.083/min+$15/mon</td>
</tr>
<tr>
<td class="tg-yw4l"><a href="https://www.temi.com" target="_blank" rel="noopener noreferrer">Temi</a></td>
<td class="tg-yw4l">11.95</td>
<td class="tg-yw4l">915/1329/51/862</td>
<td class="tg-yw4l">Pgfs</td>
<td class="tg-yw4l">S, E</td>
<td class="tg-yw4l">$0.10/min</td>
</tr>
<tr>
<td class="tg-yw4l"><a href="https://scribie.com/transcription/free" target="_blank" rel="noopener noreferrer">Scribie ASR</a></td>
<td class="tg-yw4l">12.36</td>
<td class="tg-yw4l">970/1307/48/973</td>
<td class="tg-yw4l">None</td>
<td class="tg-yw4l">E</td>
<td class="tg-yw4l">Currently free</td>
</tr>
<tr>
<td class="tg-yw4l"><a href="https://transcribeme.com/machine-express/" target="_blank" rel="noopener noreferrer">TranscribeMe</a></td>
<td class="tg-yw4l">12.55</td>
<td class="tg-yw4l">1203/0/63/836</td>
<td class="tg-yw4l">Lines</td>
<td class="tg-yw4l"></td>
<td class="tg-yw4l">$0.25/min</td>
</tr>
<tr>
<td class="tg-yw4l"><a href="https://support.google.com/youtube/answer/6373554?hl=en&amp;ref_topic=7296114" target="_blank" rel="noopener noreferrer">YouTube Captions</a></td>
<td class="tg-yw4l">13.68</td>
<td class="tg-yw4l">0/0/0/1075</td>
<td class="tg-yw4l">Lines</td>
<td class="tg-yw4l">S</td>
<td class="tg-yw4l">Currently free</td>
</tr>
<tr>
<td class="tg-yw4l"><a href="https://www.voicebase.com" target="_blank" rel="noopener noreferrer">Voicebase</a></td>
<td class="tg-yw4l">15.40</td>
<td class="tg-yw4l">116/0/0/1119</td>
<td class="tg-yw4l">Lines</td>
<td class="tg-yw4l">E, V</td>
<td class="tg-yw4l">$0.02/min</td>
</tr>
<tr style="border-top:3px double #aaa;">
<td class="tg-yw4l"><a href="https://aws.amazon.com/transcribe/" target="_blank" rel="noopener noreferrer">AWS Transcribe</a></td>
<td class="tg-yw4l">21.70</td>
<td class="tg-yw4l">772/0/85/67</td>
<td class="tg-yw4l">Words</td>
<td class="tg-yw4l">S, C, A, V</td>
<td class="tg-yw4l">$0.02/min</td>
</tr>
<tr>
<td class="tg-yw4l"><a href="https://www.ibm.com/watson/services/speech-to-text/" target="_blank" rel="noopener noreferrer">IBM Watson</a></td>
<td class="tg-yw4l">24.50</td>
<td class="tg-yw4l">11/0/0/896</td>
<td class="tg-yw4l">Words</td>
<td class="tg-yw4l">C, A, V</td>
<td class="tg-yw4l">$0.02/min</td>
</tr>
<tr>
<td class="tg-yw4l"><a href="http://shop.nuance.co.uk/store/nuanceeu/en_GB/Content/pbPage.microsite-dragon-mac" target="_blank" rel="noopener noreferrer">Dragon</a> +vocabulary</td>
<td class="tg-yw4l">24.86</td>
<td class="tg-yw4l">9/7/0/967</td>
<td class="tg-yw4l">None</td>
<td class="tg-yw4l"></td>
<td class="tg-yw4l">Free + €300 for app</td>
</tr>
<tr>
<td class="tg-yw4l"><a href="https://www.deepgram.com" target="_blank" rel="noopener noreferrer">Deepgram</a></td>
<td class="tg-yw4l">27.54</td>
<td class="tg-yw4l">715/1262/52/443</td>
<td class="tg-yw4l">Pgfs</td>
<td class="tg-yw4l">S, E</td>
<td class="tg-yw4l">$0.0183</td>
</tr>
<tr style="border-top:3px double #ccc;">
<td class="tg-yw4l"><a href="https://www.spokendata.com" target="_blank" rel="noopener noreferrer">SpokenData</a></td>
<td class="tg-yw4l">35.92</td>
<td class="tg-yw4l">1457/0/0/680</td>
<td class="tg-yw4l">Words</td>
<td class="tg-yw4l">S, E</td>
<td class="tg-yw4l">$0.12/min</td>
</tr>
</tbody>
</table>
<ul>
<li><strong>WER</strong>: Word error rate (lower is better).</li>
<li><strong>Punctuation</strong>: Number of sentences / commas / question marks / capital letters not at the start of a sentence (a rough proxy for proper nouns).</li>
<li><strong>Timing</strong>: The approximate highest-precision timing available: <strong>Words</strong> typically means a data format like JSON or XML with timing information for each word, <strong>Lines</strong> typically means a subtitle format like SRT, <strong>Pgfs</strong> (paragraphs) means something lower precision still. (The sketch after this list shows turning word-level timings into coarser formats like SRT.)</li>
<li><strong>Other Features</strong>: <strong>E</strong>=online editor, <strong>S</strong>=speaker identification (diarisation), <strong>A</strong>=suggested alternatives, <strong>C</strong>=confidence score, <strong>V</strong>=custom vocabulary (not used in these tests).</li>
<li><strong>Approx Cost</strong>: base cost, before any bulk discount, in USD.</li>
</ul>
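<p>To illustrate why word-level timing is the most flexible of those: given per-word timestamps you can derive any coarser format yourself. Here&#8217;s a minimal sketch that emits SRT subtitle cues; the tuple shape is hypothetical, as each service uses its own JSON/XML layout:</p>
<pre><code>def srt_timestamp(secs):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(secs * 1000))
    h, ms = divmod(ms, 3600000)
    m, ms = divmod(ms, 60000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def words_to_srt(words, words_per_cue=8):
    """words: list of (word, start_secs, end_secs), in time order."""
    cues = []
    for n, i in enumerate(range(0, len(words), words_per_cue), start=1):
        chunk = words[i:i + words_per_cue]
        text = " ".join(w for w, _, _ in chunk)
        cues.append(f"{n}\n{srt_timestamp(chunk[0][1])} --&gt; "
                    f"{srt_timestamp(chunk[-1][2])}\n{text}\n")
    return "\n".join(cues)

print(words_to_srt([("hello", 0.0, 0.4), ("and", 0.5, 0.6),
                    ("welcome", 0.7, 1.2)], words_per_cue=2))
</code></pre>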
<p>Note the clustering of WER scores. After the human services scoring from 4–6, the top-tier ASR services all score 10–16, with most around 12. The scores in the next tier are roughly double: 22–28. It seems likely that the top-tier systems are using more <a href="https://en.wikipedia.org/wiki/Speech_recognition#Modern_systems" target="_blank" rel="noopener noreferrer">modern technology</a>.</p>
<p>For <a href="https://blog.timbunce.org/2016/03/22/semi-automated-podcast-transcription-2/">my goals</a> I prioritise these features:</p>
<ul>
<li><strong>Accuracy</strong> is a priority, naturally, so most systems in the top tier would do.</li>
<li>A <strong>custom vocabulary</strong> would further improve accuracy.</li>
<li><strong>Cost</strong>. Clearly $0.02/min is <em>much</em> more attractive than $0.33/min when there are hundreds of hours of archives to transcribe. (I&#8217;m ignoring bulk discounts for now.)</li>
<li><strong>Word level timing</strong> enables accurate linking to audio segments and helps enable comparison/merging of transcripts from multiple sources (such as taking punctuation from one transcript and applying it to another).</li>
<li>Good <strong>punctuation</strong> reduces the manual review effort required to polish the automated transcript into something pleasantly readable. Recognition of questions would also help with <a href="https://en.wikipedia.org/wiki/Text_segmentation#Topic_segmentation" target="_blank" rel="noopener noreferrer">topic segmentation</a>.</li>
<li><strong>Speaker identification</strong> would also help identify questions and enable multiple &#8216;timelines&#8217; to help resolve transcripts where there&#8217;s cross-talk.</li>
</ul>
<p>Before Google released their updated Speech-to-Text service in April there wasn&#8217;t a clear winner for me. Now there is. Their new <code>video</code> premium model is significantly better than anything else I&#8217;ve tested.</p>
<p>I also tested their <a href="https://cloud.google.com/speech-to-text/docs/enhanced-models" target="_blank" rel="noopener noreferrer">enhanced models</a> a few weeks after I initially posted this. It didn&#8217;t help for my test file. I also tried setting <code>interactionType</code> and <a href="https://www.naics.com/search/" target="_blank" rel="noopener noreferrer"><code>industryNaicsCodeOfAudio</code></a> in the <a href="https://cloud.google.com/speech-to-text/docs/recognition-metadata" target="_blank" rel="noopener noreferrer">recognition metadata</a> of the video model but that made the WER slightly worse. Perhaps they will improve over time.</p>
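<p>For reference, here&#8217;s a minimal sketch of the kind of request involved, using the Google Cloud Speech-to-Text Python client. The exact API surface and field names may differ between client library versions, and the bucket path is hypothetical:</p>
<pre><code># Sketch of requesting the premium "video" model with recognition
# metadata, via the google-cloud-speech beta client. Field names follow
# the v1p1beta1 API; check the docs for your client library version.
from google.cloud import speech_v1p1beta1 as speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    language_code="en-US",
    model="video",                      # the premium video model
    use_enhanced=True,                  # opt in to the enhanced models
    enable_automatic_punctuation=True,
    enable_word_time_offsets=True,      # per-word timing
    metadata=speech.RecognitionMetadata(
        interaction_type=speech.RecognitionMetadata.InteractionType.DISCUSSION,
        industry_naics_code_of_audio=519130,  # illustrative NAICS code
    ),
)
# Hypothetical bucket path; long audio must use the asynchronous API.
audio = speech.RecognitionAudio(uri="gs://my-bucket/episode.flac")

operation = client.long_running_recognize(config=config, audio=audio)
response = operation.result(timeout=3600)
</code></pre>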
<p>Punctuation is clearly subjective but both Temi and Scribie get much closer than Google to the number of question marks and commas used by the human transcribers. Google did very well on capital letters though (a rough proxy for proper nouns).</p>
<p>I think we&#8217;ll see a growing ecosystem of tools and services using Google Speech-to-Text service as a backend. The <a href="https://www.descript.com" target="_blank" rel="noopener noreferrer">Descript app</a> is an interesting example.</p>
<h3 id="differential-analysis">Differential Analysis</h3>
<p>While working on <a href="https://blog.timbunce.org/2017/02/09/comparing-transcriptions/">Comparing Transcriptions</a> I&#8217;d realized that comparing transcripts from multiple services is a good way to find errors because they tend to make different mistakes.</p>
<p>So for this post I also compared most of the top-tier services against one another, i.e. using the transcript from one as the &#8216;ground truth&#8217; for scoring others. <em>A higher WER score in this test is good</em>. It means the services are making <em>different</em> mistakes and those differences would highlight errors.</p>
<p>Google, Otter AI, Temi, Voicebase, Scribie, and TranscribeMe all scored a high WER, over 10, against all the others. Go-Transcribe vs Speechmatics had a WER of 6.1. SimonSays had a WER of 5.2 against Sonix, Trint, and Speechmatics. Trint, Sonix, and Speechmatics have very little difference between the transcripts, a WER of just 1.4. That suggests those three services are using very similar models and training data.</p>
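<p>Here&#8217;s a minimal sketch of that pairwise scoring, with toy transcripts standing in for the real ones; <code>wer()</code> is a compact Levenshtein distance over words:</p>
<pre><code>import itertools

def wer(ref, hyp):
    """Word error rate: minimum word edits (substitute/insert/delete)
    needed to turn hyp into ref, divided by the length of ref."""
    d = list(range(len(hyp) + 1))  # d[j] = distance(ref[:i], hyp[:j])
    for i, r in enumerate(ref, 1):
        diag, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            diag, d[j] = d[j], min(d[j] + 1,         # hyp dropped a word
                                   d[j - 1] + 1,     # hyp has an extra word
                                   diag + (r != h))  # substitution (or match)
    return d[-1] / max(len(ref), 1)

# Toy transcripts standing in for real ASR output:
transcripts = {
    "google": "we moved there in two thousand two".split(),
    "temi":   "we moved here in two thousand two".split(),
    "trint":  "we moved there in two thousand and two".split(),
}
for a, b in itertools.combinations(transcripts, 2):
    score = 100 * wer(transcripts[a], transcripts[b])
    print(f"{a} vs {b}: differential WER {score:.1f}")
</code></pre>
<p>Note that this simple form isn&#8217;t symmetric when the transcripts differ in length, since the denominator is always the first argument; for differential analysis that asymmetry doesn&#8217;t matter much.</p>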
<h3>What Next?</h3>
<p>My primary goal is to get the transcripts available and searchable, so the next phase would be developing a simple process to transcribe each podcast and convert the result into web pages. That much seems straightforward using the Google API. Then there&#8217;s working with the podcast host to integrate with their website, style, menus etc.</p>
<p>After that the steps are more fuzzy. I&#8217;ll be crossing the river by feeling the stones&#8230;</p>
<p>The automated transcripts will naturally have errors that people notice (and more that they won&#8217;t). To improve the quality it&#8217;s important to make it <em>very</em> easy for them to contribute corrections. Being able to listen to the corresponding section of audio would be a great help. All that will require a web-based user interface backed by a service and a suitable data model.</p>
<p>The suggested corrections will need reviewing and merging. That will require its own low-friction workflow. I have a vague notion of using <a href="https://guides.github.com" target="_blank" rel="noopener noreferrer">GitHub</a> for this.</p>
<p>Generating transcripts from at least one other service would provide a way to highlight possible errors, in both words and punctuation. Those highlights would be useful for readers and also encourage the contribution of corrections. Otter AI, Speechmatics and Voicebase are attractive low-cost options for these extra transcriptions, as are any contributed by volunteers. This kind of multi-transcription functionality has significant implications for the data model.</p>
<p>I&#8217;d like to directly support translations of the transcriptions. The original transcription is a moving target as corrections are submitted over time, so the translations would need to track corrections applied to the original transcription since the translation was created. Translators are also very likely to notice errors in the original, especially if they&#8217;re working from the audio.</p>
<p>Before getting into any design or development work, beyond the basic transcriptions, I&#8217;d want to do another round of due-diligence research, looking for what services and open source projects might be useful components or form good foundations. <a href="https://www.amara.org/en/" target="_blank" rel="noopener noreferrer">Amara</a> springs to mind. If you know of any existing projects or services that may be relevant please add a comment or let me know in some other way.</p>
<p>I&#8217;m not sure when, or even if, I&#8217;ll have any further updates on this hobby project. If you&#8217;re interested in helping out feel free to email me.</p>
<p>I hope you&#8217;ve found my rambling explorations interesting.</p>
<p>Updates:</p>
<ul>
<li>25th May 2018: Updated SimonSays.ai with much improved score</li>
<li>10th June 2018: Updated notes about Google enhanced model (not helping WER score).</li>
<li>8th September 2018: Added Otter AI, prompted by <a href="https://medium.com/descript/which-automatic-transcription-service-is-the-most-accurate-2018-2e859b23ed19" target="_blank" rel="noopener noreferrer">a note in a blog post by Descript comparing ASR systems</a>.</li>
<li>10th September 2018: Emphasised that I only used a single audio file for these tests. Noted that Otter.ai is free up to 600 mins/month.</li>
<li>14th September 2018: Added Spext.</li>
<li>14th September 2018: <a href="https://news.ycombinator.com/item?id=17986941">Discussion</a> about this post on Hacker News.</li>
<li>15th November 2018: Removed results for Vocapia at their request since they &#8220;do not consider that the testing was done in a scientifically rigorous manner&#8221;.</li>
<li>9th January 2019: Updated WER scores resulting from updated cleanup and normalization code. (The code now removes terms that are commonly <a href="https://en.wikipedia.org/wiki/Speech_disfluency">speech disfluencies</a> such as &#8220;you know&#8221; and &#8220;like&#8221;. This avoids penalizing services due to differences in how &#8220;verbatim&#8221; their results are.) All results got better, but some more than others. Spext overtook Otter.ai to take 2nd place. Trint overtook GoTranscribe and SimonSays to take 4th place, and Scribie ASR overtook TranscribeMe. You can see the <a href="https://web.archive.org/web/20181129211756/https://blog.timbunce.org/2018/05/15/a-comparison-of-automatic-speech-recognition-asr-systems/">previous results on archive.org</a>. Note that I reused the same transcripts, only the scoring code changed. I&#8217;m working on a new blog post comparing many ASR services with 12 different audio files.</li>
<li>12th January 2019: Another update like the previous one, this time removing &#8220;yeah&#8221; which I had neglected to do previously. It&#8217;s one of the most frequent word errors in the ASR transcripts. (&#8220;Yeah&#8221; plays an interesting role in English discourse. Whole papers have been written about it, such as <a href="https://files.eric.ed.gov/fulltext/EJ1176966.pdf">Turn-initial Yeah in Nonnative Speakers’ Speech: A Routine Token for Not-so-routine Interactional Projects</a>.) Again, all the scores improved as expected, but some more than others. Speechmatics&#8217; score dropped from 11.71 to 11.35, raising it 3 places and overtaking SimonSays, GoTranscribe, and Trint. Otherwise the ASR ranking was unchanged.</li>
<li>Feb 11th 2019: I&#8217;ve written a <a href="https://blog.timbunce.org/2019/02/11/a-comparison-of-automatic-speech-recognition-asr-systems-part-2/">follow-up post which presents the results of evaluating 14 ASR systems with 12 different audio files</a> covering a variety of speakers, accents, and audio quality.</li>
<li>May 20th 2020: I&#8217;ve added a link to my <a href="https://blog.timbunce.org/2020/05/17/a-comparison-of-automatic-speech-recognition-asr-systems-part-3/">follow-up post, part 3, which has updated results for the best systems</a>, including Rev.ai, AssemblyAI, Google, Speechmatics, and 3Scribe.</li>
</ul>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.timbunce.org/2018/05/15/a-comparison-of-automatic-speech-recognition-asr-systems/feed/</wfw:commentRss>
			<slash:comments>21</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1542</post-id>
		<media:content url="https://0.gravatar.com/avatar/c1f8fff6645793f1615f748a0e33dfd3a4bf238f63095a180d01899515f628c7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">TimBunce</media:title>
		</media:content>
	</item>
		<item>
		<title>Comparing Transcriptions</title>
		<link>https://blog.timbunce.org/2017/02/09/comparing-transcriptions/</link>
					<comments>https://blog.timbunce.org/2017/02/09/comparing-transcriptions/#comments</comments>
		
		<dc:creator><![CDATA[TimBunce]]></dc:creator>
		<pubDate>Thu, 09 Feb 2017 18:18:57 +0000</pubDate>
				<category><![CDATA[software]]></category>
		<category><![CDATA[tech]]></category>
		<category><![CDATA[transcription]]></category>
		<guid isPermaLink="false">http://blog.timbunce.org/?p=681</guid>

					<description><![CDATA[After a pause I am working again on my semi-automated podcast transcription project. The first part involves evaluating the quality of various methods of transcription. But how? In this post I&#8217;ll explore how I&#8217;ve been comparing transcripts to evaluate transcription services. I&#8217;ll include the results for some human-powered services. I&#8217;ll write up the results for automated services in a &#8230; <a href="https://blog.timbunce.org/2017/02/09/comparing-transcriptions/" class="more-link">Continue reading <span class="screen-reader-text">Comparing Transcriptions</span></a>]]></description>
										<content:encoded><![CDATA[<p>After a pause I am working again on my <a href="https://blog.timbunce.org/2016/03/22/semi-automated-podcast-transcription-2/">semi-automated podcast transcription</a> project. The first part involves evaluating the quality of various methods of transcription. But how?</p>
<p>In this post I&#8217;ll explore how I&#8217;ve been comparing transcripts to evaluate transcription services. I&#8217;ll include the results for some human-powered services. I&#8217;ll write up the results for automated services in a later post.</p>
<p><span id="more-681"></span></p>
<h1>Accuracy</h1>
<p>The key metric for transcription is accuracy: how closely the words in the generated transcript match the spoken words in the original audio.</p>
<p>To compare the words in the transcript with the audio you need a <em>reference</em> transcript that&#8217;s deemed to be completely accurate; a <a href="https://en.wikipedia.org/wiki/Ground_truth" target="_blank">ground truth</a> upon which comparisons can be based. Then the sequence of words in that reference transcript can be compared against the sequence of words in the <em>hypothesis</em> transcript from each system being compared.</p>
<h2>Word Error Rate</h2>
<p>The <a href="https://en.wikipedia.org/wiki/Word_error_rate" target="_blank">Word Error Rate</a> (WER) is a very simple and widely used metric for transcription accuracy. It&#8217;s a number, calculated as the number of words that need to be changed or inserted or deleted to convert the hypothesis transcript into the reference transcript, divided by the number of words in the reference transcript. (It&#8217;s the <a href="https://en.wikipedia.org/wiki/Levenshtein_distance" target="_blank">Levenshtein distance</a> for words, measuring the minimum number of single word edits to correct the transcript.) A perfect match has a WER of zero; larger values indicate lower accuracy and thus more editing.</p>
<p>Of course there are <a href="https://en.wikipedia.org/wiki/Word_error_rate#Other_metrics" target="_blank">other metrics</a>, and arguments pointing out that <a href="https://www.microsoft.com/en-us/research/publication/why-word-error-rate-is-not-a-good-metric-for-speech-recognizer-training-for-the-speech-translation-task/" target="_blank">not all words are equally important</a>. For our purposes though, the simplicity of WER is very appealing, and it&#8217;s <a href="https://medium.com/descript/challenges-in-measuring-automatic-transcription-accuracy-f322bf5994f">widely used in the industry</a>.</p>
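<p>A tiny worked example: if the reference reads &#8220;just down the road&#8221; and the hypothesis reads &#8220;just on the road again&#8221;, then converting the hypothesis into the reference takes one substitution (&#8220;on&#8221; becomes &#8220;down&#8221;) and one deletion (&#8220;again&#8221;): two edits against a four-word reference, giving a WER of 2/4 = 0.5, or 50%.</p>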
<h2>One Word Per Line</h2>
<p>I decided early on that I&#8217;d simply convert a transcript text file into a file containing one word per line, and then use a simple <a href="https://en.wikipedia.org/wiki/Diff_utility" target="_blank">diff command</a> to identify the words that need to be changed/inserted/deleted, and the <a href="https://linux.die.net/man/1/diffstat" target="_blank">diffstat command</a> to count them up.</p>
<p>Simple, in theory. In practice a significant amount of &#8216;normalization&#8217; work, which I&#8217;ll describe below, was needed to reduce spurious differences.</p>
<h2>Visualizing the Differences</h2>
<p>A very useful command for inspecting and comparing these &#8216;word files&#8217; is <a href="http://vimcasts.org/episodes/comparing-buffers-with-vimdiff/" target="_blank">vimdiff</a>. It gives a clear colour-coded view of the differences between up to four files.</p>
<p>Here&#8217;s an example comparing word files that haven&#8217;t been normalized. The left-most column is the transcript produced by a human volunteer. The other three columns, from left to right, were generated by <em>automated</em> systems (in this case VoiceBase API, Dragon by Nuance, and Watson by IBM).</p>
<p><img data-attachment-id="855" data-permalink="https://blog.timbunce.org/2017/02/09/comparing-transcriptions/vimdiff-nonorm/" data-orig-file="https://blog.timbunce.org/wp-content/uploads/2016/12/vimdiff-nonorm.png" data-orig-size="1954,1310" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="vimdiff-nonorm" data-image-description="" data-image-caption="" data-medium-file="https://blog.timbunce.org/wp-content/uploads/2016/12/vimdiff-nonorm.png?w=300" data-large-file="https://blog.timbunce.org/wp-content/uploads/2016/12/vimdiff-nonorm.png?w=676" class="alignnone size-full wp-image-855" src="https://blog.timbunce.org/wp-content/uploads/2016/12/vimdiff-nonorm.png?w=676" alt="vimdiff-nonorm.png"   srcset="https://blog.timbunce.org/wp-content/uploads/2016/12/vimdiff-nonorm.png 1954w, https://blog.timbunce.org/wp-content/uploads/2016/12/vimdiff-nonorm.png?w=150&amp;h=101 150w, https://blog.timbunce.org/wp-content/uploads/2016/12/vimdiff-nonorm.png?w=300&amp;h=201 300w, https://blog.timbunce.org/wp-content/uploads/2016/12/vimdiff-nonorm.png?w=768&amp;h=515 768w, https://blog.timbunce.org/wp-content/uploads/2016/12/vimdiff-nonorm.png?w=1024&amp;h=687 1024w, https://blog.timbunce.org/wp-content/uploads/2016/12/vimdiff-nonorm.png?w=1440&amp;h=965 1440w" sizes="(max-width: 1954px) 100vw, 1954px" /></p>
<p>This very small section of the transcript has several interesting differences.</p>
<p>Before I talk about normalization I want to draw your attention to the second column, the automated transcript by VoiceBase. Note the &#8220;two thousand <em>and</em> two&#8221; vs &#8220;two thousand two&#8221; in the fourth column, and &#8220;just <em>down</em> the road&#8221; vs &#8220;just <em>on</em> the road&#8221; in the other three columns. The phrases &#8220;just <em>down</em> the road&#8221; and &#8220;two thousand <em>and</em> two&#8221; would be more common than the alternatives, yet the alternatives are correct in this case.</p>
<p>I suspect this is an example of automated services giving too much weight to their training data when selecting the best hypothesis for a sentence. (It&#8217;s a similar situation to autocorrect correcting your misspellings with correctly spelt but <a href="http://www.damnyouautocorrect.com/75844/best-autocorrects-april-2014/" target="_blank">inappropriate replacements</a>.) A key point here is that, because the chosen hypothesis is likely to read well, it&#8217;s harder for a human to notice this kind of error.</p>
<h2>Normalization</h2>
<p>You can see from the vimdiff output above that numbers can cause differences to be flagged even though the words convey the correct meaning: &#8220;2,000&#8221; vs &#8220;2000&#8221; vs &#8220;two thousand&#8221;, and &#8220;2002&#8221; vs &#8220;two thousand and two&#8221; vs &#8220;two thousand two&#8221;. To address this I wrote some code to &#8216;normalize&#8217; the words. For numbers it converts them to word form and handles years (&#8220;1960&#8221; to &#8220;nineteen sixty&#8221;) and pluralized years (&#8220;1960s&#8221; to &#8220;nineteen sixties&#8221;) as special cases.</p>
<p>Words containing apostrophes are another source of differences: I&#8217;m splitting on any non-word character so &#8220;they&#8217;re&#8221; is being split into two words. Some transcription systems might produce &#8220;they are&#8221; (not shown in this example). <a href="http://www.grammarbook.com/punctuation/apostro.asp" target="_blank">Apostrophes are tricky</a>. To address this the normalization code expands common contractions, like &#8220;they&#8217;re&#8221; into &#8220;they are&#8221;, handles some non-possessive uses of &#8220;&#8216;s&#8221; and removes the apostrophe in all other cases. It&#8217;s a fudge of course but seems to work well enough.</p>
<p>Another significant area for normalization is <a href="https://www.grammarly.com/handbook/mechanics/compound-words/" target="_blank">compound words</a>. Some people, and systems, might write &#8220;audio book&#8221; while others write &#8220;audiobook&#8221; or &#8220;audio-book&#8221;. To address this the normalization code expands closed compound words, like &#8220;audiobook&#8221; into the separate words &#8220;audio book&#8221;. It seemed to be too much work to generalize this so the normalization code has a hard-coded list that detects around 70 compound words that I encountered while testing. (Remember that the goal here isn&#8217;t perfection, it&#8217;s simply reducing the number of insignificant differences for my test cases so the WER scores are more meaningful.)</p>
<p>Other normalizations the code handles include ordinals (&#8220;20th&#8221; becomes &#8220;twentieth&#8221;), informal terms (&#8220;gotta&#8221; becomes &#8220;got to&#8221;), spellings (&#8220;realise&#8221; becomes &#8220;realize&#8221;), and some abbreviations are collapsed (&#8220;L.A.&#8221; becomes &#8220;LA&#8221;).</p>
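<p>To give a flavour of these rules, here&#8217;s a minimal sketch of that kind of normalizer; the word tables are tiny illustrative stand-ins for the real, much longer, hard-coded lists:</p>
<pre><code>import re

# Tiny illustrative stand-ins for the real, much longer, hard-coded tables.
CONTRACTIONS = {"they're": "they are", "i'm": "i am", "gotta": "got to"}
COMPOUNDS = {"audiobook": "audio book"}
SPELLINGS = {"realise": "realize"}

ONES = ("zero one two three four five six seven eight nine ten eleven "
        "twelve thirteen fourteen fifteen sixteen seventeen eighteen "
        "nineteen").split()
TENS = ("", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety")

def two_digits(n):
    t, o = divmod(n, 10)
    return ONES[n] if n &lt; 20 else TENS[t] + (" " + ONES[o] if o else "")

def year_words(year):
    """'1960' -&gt; 'nineteen sixty' (skipping edge cases like 1900 or 2005)."""
    hi, lo = divmod(int(year), 100)
    return two_digits(hi) + " " + two_digits(lo)

def normalize(text):
    out = []
    for word in text.lower().split():
        word = word.strip('.,?!"')
        m = re.fullmatch(r"(1[0-9]{3})(s?)", word)
        if m:  # a year like 1960, or a pluralized decade like 1960s
            spoken = year_words(m.group(1))
            word = spoken[:-1] + "ies" if m.group(2) else spoken
        word = CONTRACTIONS.get(word, word)
        word = COMPOUNDS.get(word, word)
        word = SPELLINGS.get(word, word)
        out.extend(word.split())
    return out

print(normalize("They're listening to an audiobook from the 1960s."))
# -&gt; ['they', 'are', 'listening', 'to', 'an', 'audio', 'book',
#     'from', 'the', 'nineteen', 'sixties']
</code></pre>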
<h2>Transcripts Produced by Humans</h2>
<p>For my evaluation I chose a podcast episode of just under two hours in length that had good quality audio and already had a transcript produced by a volunteer. The primary voice was an American male who spoke clearly but quickly and without a strong accent.</p>
<p>I suspected that a single human-produced transcript wouldn&#8217;t be sufficiently reliable so I ordered human-produced transcripts from three separate transcription services. This turned out to be more interesting, and more useful, than I&#8217;d anticipated.</p>
<ul>
<li><a href="http://www.3playmedia.com" target="_blank">3PlayMedia</a> &#8220;Extended (10 Days)&#8221;
<ul>
<li>Rate $3/min = $333.44 total.</li>
<li>Completed in 7 days.</li>
<li>Seven &#8220;flags&#8221; in the transcript, mostly &#8220;inaudible&#8221; or &#8220;interposing voices&#8221;.</li>
</ul>
</li>
<li><a href="https://scribie.com/" target="_blank">Scribie</a> &#8220;Flex 5 (3-5 Days)&#8221;
<ul>
<li>Rate $0.75/min = $83.35 total.</li>
<li>Completed in 3 days. Formats: TXT, DOC, ODF, PDF. (No timecodes)</li>
</ul>
</li>
<li><a href="http://www.voicebase.com/human-transcription/" target="_blank">VoiceBase</a> &#8220;Premium 99%, 5-7 Days, 3 human reviews&#8221;
<ul>
<li>Rate $1.5/min = $168.00 total.</li>
<li>Completed in 4 hours. Formats: PDF, RTF, SRT (timecodes).</li>
<li>Note that VoiceBase provide both human and automated transcription services.</li>
</ul>
</li>
</ul>
<p>The 4:1 difference in cost is notable! 3PlayMedia certainly charge a premium. Let&#8217;s see how their transcripts compare.</p>
<p>With the normalization I was expecting these transcripts, all produced by humans, to be closer to each other than they turned out to be. Here&#8217;s an example of some differences:</p>
<p><img data-attachment-id="1001" data-permalink="https://blog.timbunce.org/2017/02/09/comparing-transcriptions/ssp_temp_capture-6/" data-orig-file="https://blog.timbunce.org/wp-content/uploads/2016/12/ssp_temp_capture5.png" data-orig-size="1952,430" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="ssp_temp_capture" data-image-description="" data-image-caption="" data-medium-file="https://blog.timbunce.org/wp-content/uploads/2016/12/ssp_temp_capture5.png?w=300" data-large-file="https://blog.timbunce.org/wp-content/uploads/2016/12/ssp_temp_capture5.png?w=676" class="alignnone size-full wp-image-1001" src="https://blog.timbunce.org/wp-content/uploads/2016/12/ssp_temp_capture5.png?w=676" alt="ssp_temp_capture.png"   srcset="https://blog.timbunce.org/wp-content/uploads/2016/12/ssp_temp_capture5.png 1952w, https://blog.timbunce.org/wp-content/uploads/2016/12/ssp_temp_capture5.png?w=150&amp;h=33 150w, https://blog.timbunce.org/wp-content/uploads/2016/12/ssp_temp_capture5.png?w=300&amp;h=66 300w, https://blog.timbunce.org/wp-content/uploads/2016/12/ssp_temp_capture5.png?w=768&amp;h=169 768w, https://blog.timbunce.org/wp-content/uploads/2016/12/ssp_temp_capture5.png?w=1024&amp;h=226 1024w, https://blog.timbunce.org/wp-content/uploads/2016/12/ssp_temp_capture5.png?w=1440&amp;h=317 1440w" sizes="(max-width: 1952px) 100vw, 1952px" /></p>
<p>(In this vimdiff image, and all the ones that follow, the columns from left-to-right are: Volunteer, 3PlayMedia, Scribie, VoiceBase.)</p>
<p>That shows three different transcriptions of the &#8220;to use&#8221; phrase. It&#8217;s not very clear on the audio and I&#8217;d agree with the majority here and say that &#8220;you used&#8221; was correct. In this example it isn&#8217;t significant but it does illustrate that imperfect human judgement is needed when the audio isn&#8217;t completely clear. Transcribers have to write <em>something</em> and can&#8217;t easily express their degree of confidence. If their confidence falls too low they might write something like &#8220;[inaudible]&#8221; or &#8220;[crosstalk]&#8221;, but above that threshold they take a guess <em>and</em> <em>you don&#8217;t know that</em>. And because the guess is likely to read well it&#8217;s harder for a human to notice this kind of error.</p>
<p>The second difference shown in that image relates to the difference between a <em>clean transcript</em> and a <em>verbatim transcript</em>. In a clean transcript confirmational affirmations (“Uh-huh.”, “I see.”), filler words (&#8220;ah&#8221;, &#8220;um&#8221;) and other forms of <a href="https://en.wikipedia.org/wiki/Speech_disfluency" target="_blank">speech disfluency</a> aren&#8217;t included.</p>
<p>That &#8220;you are&#8221; is in the audio, so would be in a verbatim transcript, but three of the four humans decided that it wasn&#8217;t significant and should be left out of the clean transcript. But one of the four decided it should be kept in. Other common examples include &#8220;I mean&#8221; and &#8220;you know&#8221;. There&#8217;s no right answer here, it&#8217;s a judgement call case-by-case.</p>
<p>It works the other way as well. Sometimes the transcriber will <em>add</em> a word or two that they think makes the text more clear. Compare &#8220;everything submits to and is accountable to&#8221; with &#8220;everything submits to <em>it</em> and is accountable to&#8221;. Two of the four humans decided to add an &#8220;it&#8221; that wasn&#8217;t in the audio. Similarly, with &#8220;believe it&#8221; vs &#8220;believe <em>in</em> it&#8221;, again two of the four added an &#8220;in&#8221;, only this time it was not the same transcribers adding the word. Transcribers are likely to &#8220;clean&#8221; transcripts in a way that&#8217;s biased towards their own speaking style. Speakers have a distinct verbal style and changes like these by a transcriber can be more distracting than helpful if not in keeping with the speaker&#8217;s own style.</p>
<p>Generally these interpretations of the audio, and writing the corresponding text, are made with care and don&#8217;t alter the meaning for the reader. At least that&#8217;s what I was thinking until I came across this: <img loading="lazy" data-attachment-id="1071" data-permalink="https://blog.timbunce.org/2017/02/09/comparing-transcriptions/ssp_temp_capture-7/" data-orig-file="https://blog.timbunce.org/wp-content/uploads/2016/12/ssp_temp_capture6.png" data-orig-size="1952,108" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="ssp_temp_capture" data-image-description="" data-image-caption="" data-medium-file="https://blog.timbunce.org/wp-content/uploads/2016/12/ssp_temp_capture6.png?w=300" data-large-file="https://blog.timbunce.org/wp-content/uploads/2016/12/ssp_temp_capture6.png?w=676" class="alignnone size-full wp-image-1071" src="https://blog.timbunce.org/wp-content/uploads/2016/12/ssp_temp_capture6.png?w=676" alt="ssp_temp_capture.png"   srcset="https://blog.timbunce.org/wp-content/uploads/2016/12/ssp_temp_capture6.png 1952w, https://blog.timbunce.org/wp-content/uploads/2016/12/ssp_temp_capture6.png?w=150&amp;h=8 150w, https://blog.timbunce.org/wp-content/uploads/2016/12/ssp_temp_capture6.png?w=300&amp;h=17 300w, https://blog.timbunce.org/wp-content/uploads/2016/12/ssp_temp_capture6.png?w=768&amp;h=42 768w, https://blog.timbunce.org/wp-content/uploads/2016/12/ssp_temp_capture6.png?w=1024&amp;h=57 1024w, https://blog.timbunce.org/wp-content/uploads/2016/12/ssp_temp_capture6.png?w=1440&amp;h=80 1440w" sizes="(max-width: 1952px) 100vw, 1952px" /> To be fair this part of the audio is a little garbled due to crosstalk. I&#8217;m sure the speaker said &#8220;doesn&#8217;t matter&#8221; (which got normalized to &#8220;does not matter&#8221;). It&#8217;s another example of where the lack of confidence indicators in human transcripts is a problem.</p>
<p>Here are a few other examples of differences in these human transcripts that caught my eye:</p>
<p><img loading="lazy" data-attachment-id="1324" data-permalink="https://blog.timbunce.org/2017/02/09/comparing-transcriptions/ssp_temp_capture-12/" data-orig-file="https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture4.png" data-orig-size="1950,256" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="ssp_temp_capture" data-image-description="" data-image-caption="" data-medium-file="https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture4.png?w=300" data-large-file="https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture4.png?w=676" class="alignnone size-full wp-image-1324" src="https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture4.png?w=676" alt="ssp_temp_capture.png"   srcset="https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture4.png 1950w, https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture4.png?w=150&amp;h=20 150w, https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture4.png?w=300&amp;h=39 300w, https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture4.png?w=768&amp;h=101 768w, https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture4.png?w=1024&amp;h=134 1024w, https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture4.png?w=1440&amp;h=189 1440w" sizes="(max-width: 1950px) 100vw, 1950px" /><img loading="lazy" data-attachment-id="1314" data-permalink="https://blog.timbunce.org/2017/02/09/comparing-transcriptions/ssp_temp_capture-8/" data-orig-file="https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture.png" data-orig-size="1946,110" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="ssp_temp_capture" data-image-description="" data-image-caption="" data-medium-file="https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture.png?w=300" data-large-file="https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture.png?w=676" class="alignnone size-full wp-image-1314" src="https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture.png?w=676" alt="ssp_temp_capture.png"   srcset="https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture.png 1946w, https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture.png?w=150&amp;h=8 150w, https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture.png?w=300&amp;h=17 300w, https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture.png?w=768&amp;h=43 768w, https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture.png?w=1024&amp;h=58 1024w, https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture.png?w=1440&amp;h=81 1440w" sizes="(max-width: 1946px) 100vw, 1946px" /><img loading="lazy" data-attachment-id="1317" 
data-permalink="https://blog.timbunce.org/2017/02/09/comparing-transcriptions/ssp_temp_capture-9/" data-orig-file="https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture1.png" data-orig-size="1952,188" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="ssp_temp_capture" data-image-description="" data-image-caption="" data-medium-file="https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture1.png?w=300" data-large-file="https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture1.png?w=676" class="alignnone size-full wp-image-1317" src="https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture1.png?w=676" alt="ssp_temp_capture.png"   srcset="https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture1.png 1952w, https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture1.png?w=150&amp;h=14 150w, https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture1.png?w=300&amp;h=29 300w, https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture1.png?w=768&amp;h=74 768w, https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture1.png?w=1024&amp;h=99 1024w, https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture1.png?w=1440&amp;h=139 1440w" sizes="(max-width: 1952px) 100vw, 1952px" /><img loading="lazy" data-attachment-id="1319" data-permalink="https://blog.timbunce.org/2017/02/09/comparing-transcriptions/ssp_temp_capture-10/" data-orig-file="https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture2.png" data-orig-size="1950,184" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="ssp_temp_capture" data-image-description="" data-image-caption="" data-medium-file="https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture2.png?w=300" data-large-file="https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture2.png?w=676" class="alignnone size-full wp-image-1319" src="https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture2.png?w=676" alt="ssp_temp_capture.png"   srcset="https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture2.png 1950w, https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture2.png?w=150&amp;h=14 150w, https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture2.png?w=300&amp;h=28 300w, https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture2.png?w=768&amp;h=72 768w, https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture2.png?w=1024&amp;h=97 1024w, https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture2.png?w=1440&amp;h=136 1440w" sizes="(max-width: 1950px) 100vw, 1950px" /><img loading="lazy" data-attachment-id="1321" 
data-permalink="https://blog.timbunce.org/2017/02/09/comparing-transcriptions/ssp_temp_capture-11/" data-orig-file="https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture3.png" data-orig-size="1948,114" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="ssp_temp_capture" data-image-description="" data-image-caption="" data-medium-file="https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture3.png?w=300" data-large-file="https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture3.png?w=676" class="alignnone size-full wp-image-1321" src="https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture3.png?w=676" alt="ssp_temp_capture.png"   srcset="https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture3.png 1948w, https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture3.png?w=150&amp;h=9 150w, https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture3.png?w=300&amp;h=18 300w, https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture3.png?w=768&amp;h=45 768w, https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture3.png?w=1024&amp;h=60 1024w, https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture3.png?w=1440&amp;h=84 1440w" sizes="(max-width: 1948px) 100vw, 1948px" /></p>
<h2>Getting to Ground Truth</h2>
<p>To compare transcription services I needed a <em>reference</em> <em>transcript</em> &#8211; a <a href="https://en.wikipedia.org/wiki/Ground_truth" target="_blank">ground truth</a> &#8211; against which to compare the others.</p>
<p>Since the transcripts varied significantly I had little choice but to create my own &#8216;ground truth&#8217; transcript manually. I copied the transcript generated by the volunteer, then listened to the audio in the places where the various transcripts differed in a non-obvious way &#8211; well over 200 of them. For each place I decided what the most accurate transcription was and edited the ground truth transcript to match. The most difficult places were where multiple speakers talk over one another. It&#8217;s very hard to convey the intent clearly and accurately in a linear sequence of words. Often &#8216;clearly&#8217; and &#8216;accurately&#8217; are at odds with each other.</p>
<p>I repeated this process with the automated transcripts in order to add in the disfluencies, confirmational affirmations, etc. present in the audio &#8211; in other words, to shift the ground truth from a &#8216;clean&#8217; transcript to something closer to a &#8216;verbose&#8217; transcript. Without that work the apparent Word Error Rate of the automated transcripts would be unfairly inflated. They would all be equally affected, but that effect would reduce the visibility of the genuine differences. (With hindsight I should have used a separate file for this step, but the overall process was iterative and exploratory rather than the linear sequence outlined here.)</p>
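<p>If you want to reproduce this kind of comparison, locating the places where two transcripts differ is easy to automate. Here&#8217;s a minimal sketch using only the Python standard library (the file names are hypothetical):</p>
<pre>import difflib

def differing_spans(path_a, path_b):
    # Compare two transcripts word by word and print each differing span.
    words_a = open(path_a).read().split()
    words_b = open(path_b).read().split()
    matcher = difflib.SequenceMatcher(None, words_a, words_b)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != 'equal':
            print(tag, repr(' '.join(words_a[i1:i2])), 'vs', repr(' '.join(words_b[j1:j2])))

differing_spans('ground_truth.txt', 'service_transcript.txt')  # hypothetical files</pre>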
<p>The final ground truth transcript has 21,629 words.</p>
<h1>Other Attributes</h1>
<h2>Speaker Diarisation</h2>
<p>The transcripts produced by the volunteer and by Scribie identified the speakers. The transcript from Voicebase identified transitions from one speaker to another, but didn&#8217;t identify the specific speaker. The transcript from 3PlayMedia didn&#8217;t identify speakers or transitions, despite costing three to four times as much.</p>
<h2>Quality Flags</h2>
<p>3PlayMedia flagged seven places in the transcript with [? &#8230; ?] where the transcriber was unsure of the words but had made a reasonable guess, plus three instances of [INTERPOSING VOICES] and nine [INAUDIBLE]. Voicebase flagged five [CROSSTALK] and two [INAUDIBLE]. Scribie flagged none.</p>
<p>Most of the text 3PlayMedia flagged as unsure turned out to be correct. For about half of the [INTERPOSING VOICES] and [INAUDIBLE] flags in the 3PlayMedia transcript, one or more of the other services had produced an accurate transcription.</p>
<h2>Segmentation / Punctuation</h2>
<pre>Service      Sentences   Paragraphs
==========   =========   ==========
Volunteer          842          158
3PlayMedia        1259          331
Scribie            915          223
Voicebase         1077          648</pre>
<p>I&#8217;m not sure what to make of those wide differences. The figures are a little noisy due to artifacts in the way the files were processed, but most of the differences seem to be due simply to style.</p>
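<p>Counts like these can be produced with some very simple text processing along the following lines &#8211; a naive sketch, which also illustrates why such figures are noisy:</p>
<pre>import re

def rough_counts(text):
    # Treat blank lines as paragraph breaks and . ! ? as sentence ends.
    # A naive sketch; abbreviations, ellipses, etc. will skew the counts.
    paragraphs = [p for p in text.split('\n\n') if p.strip()]
    sentences = re.findall(r'[.!?]+', text)
    return len(sentences), len(paragraphs)</pre>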
<p>The figures for the automated systems (in a future post) highlight those which do a very poor job of segmentation. The big downside of Dragon by Nuance, for example, is that it doesn&#8217;t do segmentation at all. You simply get a very long stream of words. So no matter how accurate the words might be, you&#8217;ll still have a lot of work to do to make the transcript usable.</p>
<h1>Results for Human Transcription</h1>
<pre>Service      WER   Diarisation     Timing      Cost
==========   ===   ===========     ======      =======
3PlayMedia   4.5   None            Subtitles   $333.44
VoiceBase    4.6   Transitions     Subtitles   $168.00
Scribie      5.1   Speaker names   Paragraph   $ 83.35
Volunteer    5.3   Speaker names   None        N/A</pre>
<p>The important number here is the Word Error Rate (WER). Lower is better. The difference between 4.5 and 5.3 is quite small in practice. Most of the &#8216;errors&#8217; are due to insignificant differences, or to ambiguous sections caused by cross-talk.</p>
<p>I suspect a WER around 5 represents a reasonable &#8216;best case&#8217; for transcription of an interview. For comparison, the best of the automated transcription services I&#8217;m testing have WERs of 12 to 16, with some in the 30 to 40 range.</p>
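<p>For reference, WER is the word-level edit distance between the hypothesis and the reference, divided by the number of reference words. Here&#8217;s a minimal sketch of the standard dynamic-programming calculation (real scoring tools also normalise case, punctuation, and number formats before comparing):</p>
<pre>def wer(reference, hypothesis):
    # WER = (substitutions + deletions + insertions) / reference length
    r, h = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return 100.0 * d[len(r)][len(h)] / len(r)

print(wer('the cat sat on the mat', 'the cat sat in the hat'))  # ~33.3</pre>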
<p>All this work was to understand how to judge the accuracy of a transcription in order to evaluate automated systems. Comparing human transcription services turned out to be a useful way to understand the issues and to develop the tools.</p>
<p>It&#8217;s clear that for the highest accuracy it&#8217;s very helpful to use more than one service and check the places where they differ. Of course that significantly increases the cost and effort.</p>
<p>I&#8217;m testing a number of automated systems currently and I&#8217;ll include those results in a later blog post.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.timbunce.org/2017/02/09/comparing-transcriptions/feed/</wfw:commentRss>
			<slash:comments>19</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">681</post-id>
		<media:content url="https://0.gravatar.com/avatar/c1f8fff6645793f1615f748a0e33dfd3a4bf238f63095a180d01899515f628c7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">TimBunce</media:title>
		</media:content>

		<media:content url="https://blog.timbunce.org/wp-content/uploads/2016/12/vimdiff-nonorm.png" medium="image">
			<media:title type="html">vimdiff-nonorm.png</media:title>
		</media:content>

		<media:content url="https://blog.timbunce.org/wp-content/uploads/2016/12/ssp_temp_capture5.png" medium="image">
			<media:title type="html">ssp_temp_capture.png</media:title>
		</media:content>

		<media:content url="https://blog.timbunce.org/wp-content/uploads/2016/12/ssp_temp_capture6.png" medium="image">
			<media:title type="html">ssp_temp_capture.png</media:title>
		</media:content>

		<media:content url="https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture4.png" medium="image">
			<media:title type="html">ssp_temp_capture.png</media:title>
		</media:content>

		<media:content url="https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture.png" medium="image">
			<media:title type="html">ssp_temp_capture.png</media:title>
		</media:content>

		<media:content url="https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture1.png" medium="image">
			<media:title type="html">ssp_temp_capture.png</media:title>
		</media:content>

		<media:content url="https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture2.png" medium="image">
			<media:title type="html">ssp_temp_capture.png</media:title>
		</media:content>

		<media:content url="https://blog.timbunce.org/wp-content/uploads/2017/01/ssp_temp_capture3.png" medium="image">
			<media:title type="html">ssp_temp_capture.png</media:title>
		</media:content>
	</item>
		<item>
		<title>Semi-automated podcast transcription</title>
		<link>https://blog.timbunce.org/2016/03/22/semi-automated-podcast-transcription-2/</link>
					<comments>https://blog.timbunce.org/2016/03/22/semi-automated-podcast-transcription-2/#comments</comments>
		
		<dc:creator><![CDATA[TimBunce]]></dc:creator>
		<pubDate>Tue, 22 Mar 2016 23:01:17 +0000</pubDate>
				<category><![CDATA[software]]></category>
		<category><![CDATA[transcription]]></category>
		<guid isPermaLink="false">http://timbunce.wordpress.com/?p=609</guid>

					<description><![CDATA[The medium of podcasting continues to grow in popularity. Americans, for example, now listen to over 21 million hours of podcasts per day. Few of those podcasts have transcripts available, so the content isn&#8217;t discoverable, searchable, linkable, reusable. It&#8217;s lost. The typical solution is to pay a commercial transcription service, which charge roughly $1/minute and &#8230; <a href="https://blog.timbunce.org/2016/03/22/semi-automated-podcast-transcription-2/" class="more-link">Continue reading <span class="screen-reader-text">Semi-automated podcast transcription</span></a>]]></description>
										<content:encoded><![CDATA[<p>The medium of podcasting <a href="http://www.journalism.org/2015/04/29/podcasting-fact-sheet/">continues to grow in popularity</a>. Americans, for example, now listen to <a href="http://www.macrumors.com/2015/02/25/podcasts-growth-2014-serial/">over 21 million hours of podcasts per day</a>. Few of those podcasts have transcripts available, so the content <a href="https://scribie.com/blog/2015/09/english-podcast-and-transcription/">isn&#8217;t discoverable, searchable, linkable, reusable</a>. It&#8217;s lost.</p>
<p>The typical solution is to pay a commercial transcription service. These charge roughly $1/minute and claim around 98% accuracy. For a podcast producing an hour of content a week, that would add an overhead of around $250 a month. A back catalogue of a year of podcasts would cost over $3,100 to transcribe.</p>
<p>When I remember fragments of some story or idea that I recall hearing on a podcast, I&#8217;d like to be able to find it again. Without searchable transcripts I can&#8217;t. It&#8217;s impractical to listen to hundreds of old episodes, so the content is effectively lost.</p>
<p>Given the advances in automated speech recognition in recent years, I began to wonder if some kind of automated transcription system would be practical. This led on to some thinking about interesting user interfaces.</p>
<p>This (long) post is a record of my research and ponderings around this topic. I sketch out some goals, constraints, and a rough outline of what I&#8217;m thinking of, along with links to many tools, projects, and references to information that might help. I&#8217;ve also been updating it as I&#8217;ve come across extra information and new services.</p>
<p>I&#8217;m hoping someone will tell me that such a system, or parts of it, already exist so that I can contribute to those existing projects. If not then I&#8217;m interested in starting a new project – or projects – and would welcome any help. Read on if you&#8217;re interested&#8230;<span id="more-609"></span></p>
<h1>My Goals</h1>
<p>Here is an outline of functionality that I&#8217;d like from a basic automated system:</p>
<ol>
<li>Produce podcast transcripts as plain text on static web pages that are indexed by search engines.</li>
<li>Provide anchors to make it easy for people to link to a particular section, or sections, in the transcript.</li>
<li>Provide buttons to play the audio/video from that point. This requires the transcription to have <a href="https://en.wikipedia.org/wiki/Timecode">timecode</a> data.</li>
<li>Identify and show who is speaking, e.g. via <a href="https://en.wikipedia.org/wiki/Speaker_diarisation">speaker diarisation</a>.</li>
</ol>
<p>Of course, an automated transcription is likely to have errors. <a href="https://scribie.com/blog/2015/06/humans-are-better-than-machines-for-transcription/">Perhaps many</a>. For a popular podcast there are likely to be <em>some</em> members of the audience (perhaps many) who are willing to contribute <em>some</em> amount of time to checking and correcting errors, somewhat like <a href="https://en.wikipedia.org/wiki/Wikipedia:Contributing_to_Wikipedia">Wikipedia</a>. A low-friction user experience makes that more likely.</p>
<p>In other words, <a href="https://en.wikipedia.org/wiki/Crowdsourcing">crowdsourcing</a> of error checking and correction may be a viable way to close the &#8220;quality gap&#8221; between manual and automated transcriptions. At this point I&#8217;ve no idea how big that gap will be, though I&#8217;m confident it can be made small enough for this whole endeavour to be worthwhile. (I&#8217;m assuming that the podcasts will have clear high-quality audio.)</p>
<p>I have explored the options for transcription in more detail below.</p>
<p>Beyond the basic transcription, presentation, and editing features there are many interesting possibilities for future enhancements.</p>
<h2>Natural language processing</h2>
<p>Automated <a href="https://en.wikipedia.org/wiki/Natural_language_processing">natural language processing</a> is becoming <a href="http://cacm.acm.org/magazines/2016/3/198856-deep-or-shallow-nlp-is-breaking-out/fulltext">a lot more powerful</a> and could be used to <em>enrich</em> the transcript with extra information. <a href="http://gitxiv.com/category/natural-language-processing-nlp">For example</a>:</p>
<ul>
<li>Using <a href="https://en.wikipedia.org/wiki/Keyword_extraction">keyword extraction </a>to automatically identify suitable keywords for indexing, to aid search and discovery (see the sketch at the end of this section). Also <a href="https://en.wikipedia.org/wiki/Named-entity_recognition" target="_blank" rel="noopener">entity extraction</a> to identify the names of things, such as people, companies, or locations.</li>
<li>Identification of <em><a href="https://en.wikipedia.org/wiki/Text_segmentation#Topic_segmentation">topic segments</a></em> within a podcast is much more difficult, but also more useful. This is an interesting area of research, e.g. <a href="http://maui-indexer.blogspot.ie/2009/05/what-is-maui-about.html">Maui</a> (<a href="http://www.medelyan.com/software" target="_blank" rel="noopener">software</a>). I&#8217;d like to support overlapping segments to cover both high-level themes and the specifics within them.</li>
<li>The keyword extraction could then be applied to individual segments, as well as whole podcasts, to aid finer-grained indexing.</li>
<li>Some kind of classification of topics into, or with, a <a href="https://en.wikipedia.org/wiki/Taxonomy_(general)">taxonomy</a> might also be helpful for someone exploring a large topic space.</li>
<li>Generate <a href="https://en.wikipedia.org/wiki/Automatic_summarization">automatic summaries</a> of segments. The summaries for all the segments would form a summary of the episode.</li>
</ul>
<p>Those would open up alternative ways to search and explore a collection of podcasts. You&#8217;d be able to easily read or listen to all the segments that touch on a given topic across many episodes. Perhaps stitching them into a thread or &#8216;playlist&#8217; you can share with others, somewhat like <a href="https://storify.com">Storify.com</a>.</p>
<p>There are also more immediate, practical problems such as <a href="https://en.wikipedia.org/wiki/Sentence_boundary_disambiguation">recognising the boundary between sentences</a> and <a href="https://en.wikipedia.org/wiki/Truecasing">fixing the casing of words</a>. These aren&#8217;t critical but would significantly reduce the error checking and correction required to create a high quality transcript.</p>
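<p>As a toy illustration of the keyword extraction idea mentioned above, here&#8217;s a naive term-frequency sketch in Python. The real tools linked above are far more sophisticated, using phrase detection, corpus statistics, and controlled vocabularies:</p>
<pre>import collections
import re

# A tiny stopword list for illustration; real extractors use far more.
STOPWORDS = set('the a an and or of to in is it that this for on as are was be'.split())

def keywords(text, n=10):
    # Rank words by frequency, skipping stopwords and very short words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = collections.Counter(
        w for w in words if w not in STOPWORDS and len(w) &gt; 3)
    return [word for word, _ in counts.most_common(n)]</pre>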
<h2>Database Storage</h2>
<p>It should be clear by now that the underlying transcript data will need to be stored in some kind of database where it can be augmented with timecodes, speakers, segment details, keywords etc.</p>
<p>The database would also support user interfaces for error checking and correction, fine-tuning segments, and keywords etc.</p>
<p>From there the transcripts could be output in a variety of forms, from static web pages to rich interactive tools for exploration and sharing.</p>
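<p>To make that concrete, here&#8217;s one possible shape for such a store &#8211; a sketch using SQLite for illustration, with hypothetical table and column names:</p>
<pre>import sqlite3

db = sqlite3.connect('transcripts.db')
db.executescript('''
    CREATE TABLE IF NOT EXISTS words (
        episode_id  INTEGER,
        start_ms    INTEGER,   -- timecode of the word
        end_ms      INTEGER,
        speaker     TEXT,      -- filled in by diarisation
        word        TEXT,
        confidence  REAL       -- from the recogniser, if available
    );
    CREATE TABLE IF NOT EXISTS segments (
        episode_id  INTEGER,
        start_ms    INTEGER,
        end_ms      INTEGER,
        topic       TEXT,      -- from topic segmentation
        summary     TEXT       -- from automatic summarisation
    );
''')</pre>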
<h2>Full Text Search</h2>
<p>Web search engines like Google and Bing are very good at what they do. Yet they are very general tools, trying to do the best they can for <em>all</em> the web pages on the internet. There are better tools for specific jobs.</p>
<p>One that I&#8217;m familiar with is <a href="https://www.elastic.co/products/elasticsearch">Elasticsearch</a> which has a rich set of features for <a href="https://www.elastic.co/guide/en/elasticsearch/guide/current/languages.html">dealing with human language</a> and powerful <a href="https://www.elastic.co/guide/en/elasticsearch/guide/current/full-text-search.html">full-text search capabilities</a>. Beyond its general capabilities, it can be taught <a href="https://www.elastic.co/guide/en/elasticsearch/guide/current/synonyms.html">synonyms</a> <em>specific to the topics covered by the podcast</em>. This would significantly improve the quality of search results.</p>
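<p>As a sketch of what that might look like, here&#8217;s an index created with podcast-specific synonyms via the Elasticsearch REST API (the index name and synonym list are made up, and Elasticsearch is assumed to be running locally):</p>
<pre>import requests

settings = {
    'settings': {
        'analysis': {
            'filter': {
                'podcast_synonyms': {
                    'type': 'synonym',
                    'synonyms': [
                        'asr, speech recognition, speech-to-text',
                        'diarisation, diarization',
                    ],
                },
            },
            'analyzer': {
                'podcast_text': {
                    'tokenizer': 'standard',
                    'filter': ['lowercase', 'podcast_synonyms'],
                },
            },
        },
    },
}
requests.put('http://localhost:9200/podcasts', json=settings)</pre>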
<h2>Video Subtitles/Captions</h2>
<p>I generally listen to podcasts as audio, while driving or resting, even those that are available as videos. I hadn&#8217;t given any thought to subtitles as another output format until I started researching what transcription tools, projects and services already existed. I&#8217;ll talk more about it below.</p>
<h2>Schematic</h2>
<p>Here&#8217;s a simple schematic, for what it&#8217;s worth: <img style="display:block;margin-left:auto;margin-right:auto;" src="https://blog.timbunce.org/wp-content/uploads/2016/02/transcription-data-flow-001.png?w=392&#038;h=239" alt="Transcription Data Flow Schematic" width="392" height="239" border="0" /></p>
<h1>What&#8217;s Out There</h1>
<h2>Applications that Facilitate Manual Transcription</h2>
<p>These tools typically provide a user-interface that combines a media player with a text editor. You play the media and start typing what you hear (as fast as you can), pause, rewind a bit, repeat.</p>
<p>Here are a selection for reference, in no particular order:</p>
<ul>
<li><a href="https://www.inqscribe.com/compare.html">InqScribe</a> for Mac and Windows. $39-99.</li>
<li><a href="http://www.researchware.com/products/hypertranscribe.html">HyperTRANSCRIBE</a>, Mac and Windows. $40.</li>
<li><a href="http://www.transana.org">Transana</a>, Mac and Windows. $75.</li>
<li><a href="http://transcriber-pro.com/en">Transcriber Pro</a>, Windows only. €10/year</li>
<li><a href="http://www.transcriptiongear.com/gearplayer-transcription-software">GearPlayer</a>, Windows only. $120.</li>
<li><a href="https://pmtrans.codeplex.com">pmTrans</a>, open source for Linux, Mac, and Windows. Free.</li>
<li><a href="http://www.nch.com.au/scribe/">Express Scribe</a>, for Mac and Windows. Free.</li>
<li><a href="https://transcribe.wreally.com">Transcribe</a>, web service, $20/year.</li>
<li><a href="https://scribie.com/transcription/editor">Scribie transcription editor</a>, web service. Free.</li>
<li><a href="http://otranscribe.com">oTranscribe</a>, web app, <a href="https://github.com/oTranscribe">open source</a>.</li>
<li><a href="http://nowtranscribe.com">NowTranscribe</a> combines automatic generation of a draft with <em>predictive correction</em> and automatic control of the audio playback. It&#8217;s an innovative approach that&#8217;s worth <a href="https://www.youtube.com/watch?v=mja9A0KPxLA&amp;list=PLV3AgXByReXw312ApNjjedQYXXFwzjmjn">seeing in action</a>.</li>
</ul>
<p>If you&#8217;re performing manual transcription at the moment, especially with a standard word processor, I&#8217;d urge you to try some of these. They may smooth out the process in many small ways that accumulate to save you a lot of time and effort.</p>
<p>When performing manual transcription it obviously helps to be able to type fast, ideally fast enough to keep up with the speakers. Approximate <a href="https://en.wikipedia.org/wiki/Words_per_minute">words per minute</a> rates are around 150–200 for typical podcast speakers, and 40–80 for average-to-good typists. So speech arrives roughly two to four times faster than most people can type it. That difference creates a problem.</p>
<p>Users of <a href="https://en.wikipedia.org/wiki/Dvorak_Simplified_Keyboard">Dvorak keyboards</a> often report significantly faster typing speeds. For maximum speed you might be interested in the <a href="http://www.openstenoproject.org">Open Stenography Project</a>.</p>
<p>Very few transcribers can keep up with typical speakers. The usual solution is to use a <a href="http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Delectronics&amp;field-keywords=transcription+foot+pedal&amp;rh=n%3A172282%2Ck%3Atranscription+foot+pedal">foot pedal</a> to rewind the media by a few seconds whenever needed, so that your fingers can stay on the home row of the keyboard. Yet every time you rewind there&#8217;s a break in your flow and productivity falls.</p>
<p>An alternative approach is to slow the media playback down to match a comfortable typing rate. This can be done with <a href="https://en.wikipedia.org/wiki/Audio_time-scale/pitch_modification">audio time-scale/pitch modification</a> techniques such as <a href="https://en.wikipedia.org/wiki/PSOLA">PSOLA</a> which can change the speed without altering the pitch. Most of the tools I&#8217;ve listed above support variable speed playback, but only a few explicitly mention maintaining the correct pitch. The free web-based <a href="https://scribie.com/transcription/editor">Scribie transcription editor</a> seems particularly good at this.</p>
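<p>If you want to prepare slowed-down audio yourself, here&#8217;s a minimal sketch using a phase-vocoder approach rather than PSOLA (assuming the Python librosa and soundfile packages are installed, plus ffmpeg for MP3 decoding; the file names are hypothetical):</p>
<pre>import librosa
import soundfile

# Load the audio at its native sample rate.
audio, sample_rate = librosa.load('podcast.mp3', sr=None)

# Stretch to 75% speed without changing the pitch.
slowed = librosa.effects.time_stretch(audio, rate=0.75)

soundfile.write('podcast_slow.wav', slowed, sample_rate)</pre>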
<h2>Commercial Transcription Services</h2>
<p>These provide a service where you upload an audio or video file and get back a file containing the transcription. You&#8217;re paying some amount of money for someone to use an application (like those above) on your behalf, plus some level of quality checking. I&#8217;ll only list here a few services that provide timecoded transcriptions, including subtitling services.</p>
<p>At the very high-end, <a href="http://www.3playmedia.com/plans-pricing/">3play Media</a> are a traditional transcription service provider offering &#8220;Premium quality with +99% accuracy&#8221; for prices ranging from $2 to $3 per minute. They provide an API for upload/download.</p>
<p>At the very low-end, if you&#8217;re willing to handle the management of the work then <a href="https://www.fiverr.com/">Fiverr</a> have a number of <a href="https://www.fiverr.com/search/gigs?utf8=✓&amp;search_in=category&amp;source=top-bar&amp;locale=en&amp;query=transcribe&amp;category=5&amp;sub_category=109&amp;page=1&amp;layout=auto">people offering transcription services for $5</a> (typically for 10 to 20 minutes of transcription). Your mileage will vary.</p>
<p>In the innovative-middle-ground, <a href="https://scribie.com">Scribie</a> guarantee +98% accuracy, offer prices down to $0.70/min for 20-30 day turnaround, and include time-coding. There&#8217;s an additional charge of $1.00/minute for producing subtitles (SBV/SRT). They provide an <a href="https://scribie.com/docs/api">API</a> and have an interesting <a href="https://scribie.com/blog/">blog</a>. They also make their own <a href="https://scribie.com/transcription/editor">transcription editor</a> web application freely available for anyone to use. I like their technology and &#8216;<a href="https://scribie.com/blog/2014/10/how-is-crowdsourcing-better-than-outsourcing/">managed crowdsourcing</a>&#8216; approach.</p>
<h2>Commercial Transcription Services (behind the scenes)</h2>
<p>Speaking of crowdsourcing, while researching this post I came across <a href="http://crowdsurfwork.com">CrowdSurfWork</a>. This site is an interface for freelance transcribers to work on &#8220;micro-tasks&#8221; related to transcription. Their system is built on Amazon.com&#8217;s <a href="https://www.mturk.com">Mechanical Turk</a> service, which provides a marketplace for &#8220;Human Intelligence Tasks&#8221;. <a href="https://www.mturk.com/mturk/sortsearchbar?searchSpec=HITGroupSearch%23T%231%2310%23-1%23T%23%21keyword_list%212%21rO0ABXQACUNyb3dkc3VyZg--%21Reward%216%21rO0ABXQABDAuMDA-%21%23%21NumHITs%211%21%23%21&amp;selectedSearchType=hitgroups&amp;searchWords=Crowdsurf&amp;sortType=NumHITs%3A1&amp;pageSize=100">Typical micro-tasks</a> include transcribing a chunk of audio (&#8220;up to 35 seconds&#8221;), reviewing and scoring a chunk of transcript, quality checking a whole transcript etc. CrowdSurfWork don&#8217;t say who their clients are. They&#8217;re certainly <a href="https://www.mturk.com/mturk/searchbar?selectedSearchType=hitgroups&amp;searchWords=transcribe&amp;minReward=0.00&amp;x=0&amp;y=0">not the only ones</a> using Mechanical Turk for transcription work.</p>
<p>Commercial services provide a complete transcription service: audio in, high quality transcript out. Internally that work is usually broken down into a transcription phase and a quality check/edit phase. I wonder if some companies could offer a service that takes a raw initial transcript (e.g. generated by an automated transcription system) and just perform the quality check/edit phase, at a lower cost.</p>
<p><del>I also wonder if</del> It seems very likely that some companies are already using automated transcription systems, especially for regular clients where the system could be trained for the clients voice.</p>
<h2>Free Automated Transcription</h2>
<p>Automatic speech recognition has come a long way in recent years, with untrained <em>speaker-independent</em> systems achieving useful levels of accuracy.</p>
<p><strong>Google Docs</strong> now supports <a href="https://support.google.com/docs/answer/4492226?hl=en">Voice typing</a> which you can use to transcribe your voice, or other audio being played at the time. It only works in the Chrome browser, or the Docs app on iOS or Android. Here&#8217;s a <a href="https://www.youtube.com/watch?v=iWNCPj5jTWM">demo</a>. (See also <a href="https://speechlogger.appspot.com/en/">Speechlogger</a> which uses the same underlying Google technology and has some handy tips on improving the quality when transcribing audio files by using a &#8220;virtual line-in cable&#8221;. See also <a href="http://rogueamoeba.com/loopback/">Loopback</a> for Mac.)</p>
<p>Another relevant way to access Google&#8217;s speaker-independent speech recognition is to upload a video to <strong>YouTube</strong> and let it provide <a href="https://support.google.com/youtube/answer/2734796?hl=en&amp;ref_topic=3014331">Automatic Captioning</a> for you. More on that below.</p>
<p>On a <strong>Mac</strong> you can <a href="https://support.apple.com/en-ie/HT202584">use your voice to enter text</a> into almost any application. The default mode uses a web service but you can enable <a href="https://support.apple.com/en-ie/HT202584">Enhanced Dictation</a> which installs the recognition code locally so you don&#8217;t need an internet connection and can &#8220;dictate continuously&#8221;.</p>
<p>These don&#8217;t offer any customisation or training to improve the accuracy.</p>
<p>Microsoft <strong>Windows</strong> offers a similar <a href="http://windows.microsoft.com/en-ie/windows/dictate-text-speech-recognition#1TC=windows-7">Speech Recognition service</a>. It supports a customisable speech dictionary and <a href="https://en.wikipedia.org/wiki/Windows_Speech_Recognition#Overview_and_features">accuracy improves with usage</a>. As far as I can tell this is <a href="https://en.wikipedia.org/wiki/Microsoft_Speech_API">built in to the operating system</a> and doesn&#8217;t use a network service.</p>
<p>There are a number of <a href="https://en.wikipedia.org/wiki/Speech_recognition_software_for_Linux">speech recognition projects for <strong>Linux</strong></a>. I have not looked into them in detail. If you have experience with any that would fit this project I&#8217;d be grateful if you would get in touch with me.</p>
<h2>Commercial Speech-to-text Services</h2>
<p>The <strong>Google</strong> <a href="https://cloud.google.com/speech/" target="_blank" rel="noopener">Cloud Speech API</a> offers access to APIs for applications to “see, hear and translate”. It&#8217;s based on the same neural network tech that powers Google’s<a href="http://googleresearch.blogspot.com/2015/09/google-voice-search-faster-and-more.html" target="_blank" rel="noopener"> voice search</a> in the Google app and voice typing in Google’s Keyboard and Chrome described above. It offers some customization in the form of a list of <a href="https://cloud.google.com/speech/reference/rest/v1beta1/RecognitionConfig#SpeechContext">phrases</a> (up to 500, provided with the API request) that act as &#8220;hints&#8221; to the speech recognizer to favor specific words and phrases in the results. The current <a href="https://cloud.google.com/speech/limits">limits</a> cap audio length at 80 minutes and require use of <em>uncompressed</em> audio.</p>
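<p>As a sketch of how those phrase &#8216;hints&#8217; are supplied, here&#8217;s roughly what a request might look like, with field names taken from the v1beta1 documentation linked above (they may well have changed since; the bucket and API key are hypothetical):</p>
<pre>import requests

body = {
    'config': {
        'encoding': 'LINEAR16',
        'sampleRate': 16000,
        # Up to 500 phrases to bias the recogniser towards.
        'speechContext': {'phrases': ['diarisation', 'Speechmatics']},
    },
    'audio': {'uri': 'gs://my-bucket/podcast.wav'},
}
response = requests.post(
    'https://speech.googleapis.com/v1beta1/speech:syncrecognize',
    params={'key': 'YOUR_API_KEY'},
    json=body,
)
print(response.json())</pre>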
<p><strong>Nuance</strong>, who currently provide the <a href="http://www.forbes.com/sites/rogerkay/2014/03/24/behind-apples-siri-lies-nuances-speech-recognition/#b5c3956421c0">technology behind Apple&#8217;s Siri and dictation services</a>, offer a HTTP REST <a href="https://developer.nuance.com/public/Help/HttpInterface/HTTP_web_services_for_NCS_clients_1.0_programmer_s_guide.pdf">Cloud speech recognition</a> service that&#8217;s targeted at mobile devices. (I presume this is the service behind their <a href="https://www.yahoo.com/tech/dragon-dictation-offers-superior-voice-recognition-200101487.html">new and expensive, Dragon Anywhere</a> mobile dictation app.)</p>
<p>The service supports uploading <a href="https://developer.nuance.com/downloads/custom_vocabulary/Guide_to_Custom_Vocabularies_v1.5.pdf">custom phrases and vocabularies</a>. It also allows you to specify an ID for the speaker which is used for Speaker-Dependent Acoustic Model Adaptation (SD-AMA). This &#8220;creates adapted acoustic model profiles from audio collected from each user to improve recognition performance over time.&#8221; Both of these should help improve accuracy beyond what&#8217;s possible with speaker-independent services like those from Google or Apple.</p>
<p>The pricing is <a href="https://developer.nuance.com/public/index.php?task=memberServices">$.008 / transaction</a> where a &#8216;transaction&#8217; is a successful HTTP request, presumably about a sentence (I&#8217;ve seen references to 30 seconds as a maximum). Their terms require &#8216;Emerald Level&#8217; payment when the client isn&#8217;t a mobile device. Some negotiation might be required!</p>
<p><strong>Microsoft</strong> provide a <a href="https://www.microsoft.com/cognitive-services/en-us/speech-api">Bing Speech API</a>. The REST API only supports 10 seconds of audio per request, similar to Nuance &#8216;transactions&#8217; described above. Their Client Library supports streaming.</p>
<p><strong>IBM</strong> offers their Watson Developer Cloud <a href="https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/speech-to-text.html">Speech to Text</a> service. It has both <a href="https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/speech-to-text/http.shtml">HTTP REST</a> and <a href="https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/speech-to-text/websockets.shtml">WebSocket</a> APIs. The pricing is free for the first thousand minutes per month, then $0.02 per minute. The IBM service doesn&#8217;t support SD-AMA <del>or custom vocabularies</del>. Support for <a href="http://www.ibm.com/watson/developercloud/doc/speech-to-text/custom.shtml" target="_blank" rel="noopener">custom vocabularies</a> was <a href="http://www.ibm.com/watson/developercloud/doc/speech-to-text/relnotes.shtml#September2016" target="_blank" rel="noopener">added</a> in September 2016. (They&#8217;ve said they&#8217;re <a href="http://stackoverflow.com/a/36314354/77193">working on speaker diarization</a>.) The results include timestamps, confidence indicators and alternative suggestions. Here&#8217;s an <a href="https://github.com/dannguyen/watson-word-watcher">example use to translate a ProPublica podcast</a>.</p>
<p><strong>Vocapia</strong> provide a <a href="http://www.vocapia.com/speech-to-text-api.html">Speech to Text API service called VoxSigma</a>. It returns &#8220;XML with speaker diarization, language identification tags, word transcription, punctuation, confidence measures, numeral entities and other specific entities&#8221;.  They also support customization in the form of &#8216;Language Model Adaptation&#8217; by uploading sample text. I&#8217;ve requested technical documentation and pricing details, neither of which are on their web site. They&#8217;ve given me a trial account to test the service.</p>
<p><strong>Speechmatics</strong> provide <a href="https://www.speechmatics.com">speech to text services</a> with a simple <a href="https://www.speechmatics.com/api-details">REST API</a>. The transcript data includes speaker diarization, word transcription, punctuation, and confidence measures. They don&#8217;t offer any customization. <a href="https://speechmatics.com/pricing">Pricing</a> is £0.06/minute (£3.60/hour), with the first hour free. Speechmatics claim to be the <a href="https://www.youtube.com/watch?v=4KdhM_mCVo8">world&#8217;s most accurate</a> transcription service.</p>
<p><strong>Voicebase</strong> provide a <a href="https://voicebase.wpengine.com/transcription/">transcription service</a>. They&#8217;re using a <a href="https://blog.speechmatics.com/2016/09/02/speechmatics-is-considerably-better-concludes-the-sydney-morning-herald-when-comparing-cloud-based-transcription-services/">different version of Speechmatics technology</a> (with <a href="http://www.smh.com.au/digital-life/hometech/m20techknow-20160802-gqje2r.html">slightly lower accuracy</a> it seems). I&#8217;m including them here because they provide interesting keyword extraction features. From a two hour interview I uploaded they extracted 94 keywords (like &#8220;ecological limits&#8221;, &#8220;symbolic language&#8221; etc.) and grouped them under 170 headings (like &#8220;Bioethics&#8221;, &#8220;Ontology&#8221; etc.). Clicking on a keyword or group, or entering search terms manually, shows all the places in the audio timeline where the topic is spoken about. You can then easily listen to just those parts. As you do the relevant portion of the transcript is highlighted. When you sign up they give you $60 (US) free credit. I didn&#8217;t see any rates quoted but it appears to be $0.02/minute. Output formats are PDF, RTF, and SRT.</p>
<p><strong>SpokenData</strong> offers <a href="https://spokendata.com" target="_blank" rel="noopener">automated transcription</a> with an interactive transcription editor, API, and optional human transcription services. It&#8217;s a project of Czech company <a href="http://www.replaywell.com" target="_blank" rel="noopener">ReplayWell</a>. <a href="https://spokendata.com/pricing" target="_blank" rel="noopener">Pricing</a> is €0.10/minute down to under €0.05/minute for bulk. The first hour is free. Other services, including speaker segmentation (diarization), are currently free. Transcript formats include SRT, TXT, TRS, XML.</p>
<p><strong>Deepgram</strong> also provide an <a href="https://www.deepgram.com" target="_blank" rel="noopener">automated transcription</a> service. Pricing is under $0.02/minute. They have a basic transcription viewer and a minimal dashboard. To download a transcript you have to make an API call with a &#8220;get_object_transcript&#8221; action that&#8217;s not currently documented in their rather minimal <a href="https://api.deepgram.com/doc" target="_blank" rel="noopener">API documentation</a>. The transcript format is JSON with per-paragraph timings.</p>
<p><strong>Trint</strong> don&#8217;t yet have an API for their <a href="https://trint.com" target="_blank" rel="noopener">automated transcription</a> service, but they do have a nice interactive editor with pitch-corrected speed control. Pricing is $0.25-$0.20/minute. Trint &#8220;automatically <em>identifies different speakers</em> and segments them into separate paragraphs&#8221; (emphasis mine). That doesn&#8217;t seem quite right. The transcript is segmented into paragraphs but there&#8217;s no identification of speakers that I can find. The editor lets you label the speaker for each paragraph, but you still need to do that <em>manually for every single paragraph</em>. Transcript formats are DOCX, SRT, VTT or &#8220;Interactive Transcript&#8221; which is a zip containing HTML and JavaScript. So there&#8217;s no pure-data transcript format available. (The &#8220;Interactive Transcript&#8221; zip contains the transcript in the form of HTML with a span with attributes for each word.) <a href="http://www.johntedesco.net/blog/2017/01/21/how-to-transcribe-with-trint-an-interview-with-ceo-and-chief-beta-tester-jeff-kofman/" target="_blank" rel="noopener">Review</a>.</p>
<p><strong>Pop Up Archive</strong> offers a <a href="https://www.popuparchive.com" target="_blank" rel="noopener">service</a> that seems an ideal fit for these requirements. You upload a file and they tag, index &amp; transcribe it automatically, including timestamps and speaker <a href="https://popuparchiveorg.zendesk.com/hc/en-us/articles/204030770-How-do-I-assign-speakers-" target="_blank" rel="noopener">diarization</a>. They provide an interactive transcript editor synced to the audio, team plans allow concurrent editing by multiple people. Download transcripts in .TXT, .XML, .JSON, .WEBVTT, and .SRT formats, and there&#8217;s an <a href="https://www.popuparchive.com/developer" target="_blank" rel="noopener">API</a>. (Looking at the output it looks like they&#8217;re using Speechmatics as the backend transcription service.) They provide a <a href="https://www.popuparchive.com/explore" target="_blank" rel="noopener">search and browse interface</a> for the thousands of podcast transcripts they&#8217;re hosting, plus a HTML code generator for embedding players on your own website. <a href="https://www.popuparchive.com/pricing" target="_blank" rel="noopener">Pricing</a> ranges from $0.25/min down to $0.20/min on monthly plans. One hour free credit.</p>
<p>Pop Up Archive have an interesting project called <strong><a href="http://audiosear.ch/" target="_blank" rel="noopener">Audiosear.ch</a> </strong>which is billed as &#8220;a full–text search and intelligence engine for podcasts and radio&#8221;. It includes a <a href="http://blog.popuparchive.com/take-the-audiosear-ch-clipmaker-for-a-spin/" target="_blank" rel="noopener">ClipMaker</a> feature that makes it easy for anyone to search for and select a favorite podcast moment and share it on social media as a short auto-playing video of the audio and transcript. <a href="http://blog.popuparchive.com/take-the-audiosear-ch-clipmaker-for-a-spin/" target="_blank" rel="noopener">Take a look</a> and <a href="https://www.audiosear.ch/a/52763/director-jim-jarmusch--sundance-recap" target="_blank" rel="noopener">try it out</a>.</p>
<p><strong><a href="http://spreza.co" target="_blank" rel="noopener">Spreza</a></strong> and <strong><a href="https://Voyz.es" target="_blank" rel="noopener">Voyz.es</a></strong> are two other service providers in this space. They&#8217;re both currently in private beta. I&#8217;ve applied for access.</p>
<p>In November 2017, almost two years after I originally wrote this post, Amazon launched their <a href="https://aws.amazon.com/blogs/aws/amazon-transcribe-scalable-and-accurate-automatic-speech-recognition/">Amazon Transcribe</a> service, which adds inferred punctuation, word-level timestamps, and recognises multiple speakers.</p>
<p>See also Pop Up Podcasting&#8217;s <a href="https://popuppodcasting.ca/blog/automatic-transcription-services-compared">review of automated transcription tools</a>.</p>
<h2>Commercial Speech-to-text Applications</h2>
<p>These are applications which you install and run on your own machine. Modern machines and software are fast enough for high quality results in realtime. A key feature is the ability to <em>train</em> the software to improve the recognition <em>of a particular voice</em>. This, combined with custom vocabularies, greatly improves the accuracy.</p>
<p>Ignoring companies offering niche products (like <a href="http://www.vestec.com/products/">vestec</a>, <a href="http://www.speechatsri.com/products/dynaspeak.shtml">SRI</a>, and <a href="http://www.verbio.com/product/speech-recognition/">verbio</a>) which don&#8217;t provide documentation or prices online, there&#8217;s only one major player <a href="http://www.em-t.com/articles/nuance-acquire-part-ibms-speech-technology">left</a> in this field: <strong>Nuance</strong>, with their Dragon line of products for <a href="http://www.nuance.com/for-individuals/by-product/dragon-for-pc/premium-version/index.htm">PC</a> and <a href="http://www.nuance.com/for-individuals/by-product/dragon-for-mac/index.htm">Mac</a>.</p>
<p>Dragon can learn your vocabulary and likely phrases by <a href="http://www.nuance.com/products/help/dragon/dragon-for-mac/enx/Content/Accuracy/VocabularyTraining.html?Highlight=vocabulary">reading documents</a> or emails you&#8217;ve written. For transcribing podcasts it could be given some existing transcriptions, if you have any. It will also <a href="http://www.nuance.com/products/help/dragon/dragon-for-mac/enx/content/Correction/CorrectionMenu.html">learn from the corrections you make while dictating</a>. All this training is tied to a single voice profile so Dragon will only work well with a single voice at a time.</p>
<p>It&#8217;s also important to note that Dragon, unlike services such as Trint and Speechmatics, will only give you a bare stream of words. There&#8217;s no segmentation into sentences and paragraphs. You&#8217;ll have to do that by hand, along with capitalizing the first word. So even if Dragon was very accurate you&#8217;ll always be left with a lot of work to do.</p>
<h1>Anecdotal Accuracy</h1>
<p>This <a href="https://news.ycombinator.com/item?id=11347872">thread on Ycombinator</a> from March 2016 includes a variety of opinions, including &#8220;As someone who&#8217;s worked with a lot of these engines, Nuance and IBM are the only really high quality players in the space&#8221;; &#8220;If Nuance is 100%, I&#8217;d say CMUSphinx is at least 40%&#8221;; &#8220;As someone who has actually done objective tests, Google are by far the best, Nuance are a clear second. IBM Watson is awful though. Actually the worst I&#8217;ve tested.&#8221;</p>
<p>Spoiler: in my testing so far Trint.com and Speechmatics.com are <em>much</em> better than Watson and (untrained) Nuance. I&#8217;ll post detailed results when I&#8217;ve finished testing.</p>
<h1>State of the Art</h1>
<p>The state of the art in speech recognition is advancing very rapidly at the moment as <a href="https://en.wikipedia.org/wiki/Deep_learning" target="_blank" rel="noopener">Deep Learning</a> and other modern machine learning techniques are applied ever more successfully. One of the most difficult of all human speech recognition tasks is conversational telephone speech, very similar to the conversational podcast speech we&#8217;re exploring here. Recent research, published in October 2016, has shown that it is now possible to <a href="https://blog.acolyer.org/2016/11/22/achieving-human-parity-in-conversational-speech-recognition/" target="_blank" rel="noopener">achieve human parity in conversational speech recognition</a>. That&#8217;s a significant research milestone that should be reflected in commercial systems in the future.</p>
<h1>Verbatim vs Clean Transcription</h1>
<p>Informal speech is often littered with stutters, filler words (&#8216;ah&#8217;, &#8216;um&#8217;, &#8216;like&#8217; etc.), and other forms of <a href="https://en.wikipedia.org/wiki/Speech_disfluency">speech disfluency</a>. Conversational speech also often contains confirmational affirmations such as &#8220;Uh-huh&#8221; and &#8220;I see&#8221;.</p>
<p>Commercial transcription services will, by default, provide you with a &#8216;clean&#8217; transcript that doesn&#8217;t include every utterance in the audio. The disfluencies and confirmational affirmations are skipped. A &#8216;verbatim&#8217; transcription service is often available at a higher cost, to account for the extra work needed to capture those details.</p>
<p>Depending on the amount of disfluency, a clean transcript can be significantly easier to read than a verbatim transcript.</p>
<p>An automated transcription system will naturally produce a verbatim transcript. Cleaning up a verbatim transcript automatically is an <a href="http://research.microsoft.com/pubs/218310/IS14-hany.PDF">active</a> <a href="http://www.aclweb.org/anthology/E14-4009">area</a> <a href="http://www.cs.cmu.edu/~tanja/Papers/HonalSchultz_ICASSP05.pdf">of</a> <a href="https://www.sri.com/sites/default/files/publications/automatic_disfluency_removal_for_improving_spoken_language.pdf">research</a>. For our purposes in the short term I imagine some typical cases could be recognised and edited out automatically. The rest would have to be dealt with as part of the crowdsourced manual QA process.</p>
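<p>In the spirit of &#8220;typical cases recognised and edited out automatically&#8221;, here&#8217;s a deliberately naive sketch; anything beyond simple fillers quickly needs the research linked above:</p>
<pre>import re

# Common fillers; a crude list for illustration only.
FILLERS = r"\b(um+|uh+|ah+|er+|you know|i mean|sort of|kind of)[,.]?\s*"

def clean(verbatim):
    text = re.sub(FILLERS, '', verbatim, flags=re.IGNORECASE)
    return re.sub(r'\s{2,}', ' ', text).strip()

print(clean('So, um, I think, you know, it works.'))
# -&gt; 'So, I think, it works.'</pre>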
<h1>Podcast Transcription</h1>
<p>So can a viable automated podcast transcription solution be built from these options?</p>
<p>Dragon applications offer the highest accuracy but only work well with a single voice, don&#8217;t provide automatic timecodes, and are hard to automate.</p>
<p>Free automated transcription services offer no training or customisation and don&#8217;t provide timecodes directly.</p>
<p>The Watson Developer Cloud Speech to Text service offers timecodes but no training or customisation. It might be workable but is likely to be relatively poor quality, especially without <a href="https://en.wikipedia.org/wiki/Speaker_diarisation">diarisation</a>.</p>
<p>The Nuance Cloud Speech Recognition service would require me to pre-process the audio into small chunks, presumably based on pauses. That would mean I&#8217;d effectively generate timecodes myself but at the cost of significant extra audio processing upfront. Quality is bound to suffer, especially in segments where pauses aren&#8217;t clear.</p>
<p>Considering pre-processing the audio opens up extra possibilities. In addition to identifying pauses, I could also implement <a href="https://en.wikipedia.org/wiki/Speaker_diarisation">diarisation</a> (e.g. using one of the <a href="https://en.wikipedia.org/wiki/Speaker_diarisation#Open_source_speaker_diarisation_software">open source tools</a>). That would not only improve the chunking, where one speaker starts talking over another, but also open up interesting solutions for the single speaker problem&#8230;</p>
<p>Given the details of who is speaking when, <em>a separate audio file for each speaker</em> could be generated, with the voice of the other speaker replaced by silence. (An audio editor that supports a <a href="https://en.wikipedia.org/wiki/Cue_sheet_(computing)">cue sheet</a> would make that simple.) Each per-speaker file could then be fed to Dragon <em>with the appropriate profile for that speaker</em>. After a short period of training the rest of the transcription for that speaker could proceed automatically and with higher accuracy.</p>
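<p>Here&#8217;s a sketch of that slicing step, assuming the Python pydub package and a list of (start_ms, end_ms, speaker) turns from the diarisation step:</p>
<pre>from pydub import AudioSegment

def per_speaker_file(audio_path, turns, speaker, out_path):
    # Keep this speaker's turns; replace everyone else's with silence,
    # so (assuming the turns are contiguous) the file's timing is preserved.
    audio = AudioSegment.from_file(audio_path)
    result = AudioSegment.empty()
    for start_ms, end_ms, who in turns:
        if who == speaker:
            result += audio[start_ms:end_ms]
        else:
            result += AudioSegment.silent(duration=end_ms - start_ms)
    result.export(out_path, format='wav')</pre>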
<p>That would solve the single speaker problem but there&#8217;s still a lack of timecodes in the transcript. A few approaches spring to mind but the most interesting is to insert the timecodes into the audio stream as spoken words, e.g. &#8220;zero seven colon one five space&#8221;, perhaps using a text-to-speech tool. Then there would be no need to keep the periods of silence for the other speaker. The audio file would dictate its own timecodes!</p>
<p>The transcripts generated for each speaker could then be merged using the timecodes in the text to interleave them in the correct order. (Though in practice they&#8217;d probably simply be written into a database with the timecode as a key.)</p>
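<p>The merge itself is then trivial &#8211; a sketch, with each per-speaker transcript as a list of (timecode_ms, text) tuples:</p>
<pre>def merge_transcripts(**per_speaker):
    # Flatten to (timecode, speaker, text) and sort by timecode.
    entries = []
    for speaker, lines in per_speaker.items():
        for timecode_ms, text in lines:
            entries.append((timecode_ms, speaker, text))
    return sorted(entries)

merged = merge_transcripts(
    alice=[(0, 'Hello.'), (9500, 'Fine, thanks.')],
    bob=[(2300, 'Hi, how are you?')],
)</pre>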
<p>Slicing the audio per-speaker would also enable a neat solution to the problem of poor quality recordings of interviews where the remote person has a poor internet connection. If they made a <em>separate local recording</em> of their voice then that audio file could be sliced up and used for the transcription of the parts of the interview where they were speaking. Neat!</p>
<h1>Video Subtitles/Captions</h1>
<p>When you listen to someone you absorb more than when just reading their words. Transcriptions help you search and discover sections of interest, but then it should be easy to <em>listen</em> to the words.</p>
<p>This is why having timecodes is important. Having searched transcripts to discover sections of interest you could click a button to listen to <em>just those parts</em>. For video podcasts you might choose to <em>watch</em>, giving you the added dimension of all the <a href="https://en.wikipedia.org/wiki/Nonverbal_communication">non-verbal communication</a>.</p>
<p>Where do <a href="https://en.wikipedia.org/wiki/Subtitle_(captioning)">subtitles</a>, and their more feature-full modern cousin, <a href="https://en.wikipedia.org/wiki/Closed_captioning">captions</a>, fit in? For the deaf, the hard of hearing, and non-native speakers, they offer the opportunity to read the words in sync with the added richness of the non-verbal communication.</p>
<p>In theory subtitles/captions could be generated directly from a transcript if it has sufficiently frequent and accurate timecodes. Speaker diarisation would also help. That should be enough to generate at least a good quality draft. Which raises the &#8220;quality gap&#8221; question again: could automatically generated subtitles/captions be made &#8220;good enough&#8221; that the <em>effort of manual correction is significantly less than the effort of manual creation</em>? I think so.</p>
<p>Note that there will almost always be a need for some manual editing. For example, carefully condensing the number of words to fit within typical reading speeds, or adding captions for sounds, like &#8220;[dog barking]&#8221;.</p>
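<p>Generating the subtitle file itself is the easy part. Here&#8217;s a minimal sketch that emits SRT (see the appendix below for the format) from a list of timed cues; the hard parts &#8211; good cue boundaries, timing, and condensing &#8211; are exactly what&#8217;s discussed above and below:</p>
<pre>def to_timestamp(ms):
    # SRT timestamps look like 00:01:02,345
    hours, ms = divmod(ms, 3600000)
    minutes, ms = divmod(ms, 60000)
    seconds, ms = divmod(ms, 1000)
    return '%02d:%02d:%02d,%03d' % (hours, minutes, seconds, ms)

def to_srt(cues):
    # cues: a list of (start_ms, end_ms, text) tuples
    blocks = []
    for i, (start, end, text) in enumerate(cues, 1):
        blocks.append('%d\n%s --&gt; %s\n%s\n'
                      % (i, to_timestamp(start), to_timestamp(end), text))
    return '\n'.join(blocks)</pre>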
<p>Syncing the timing of the appearance (and disappearance) of each subtitle is a painstaking process that <a href="https://www.dcmp.org/ciy/converting-youtube-to-srt.html">consumes the most time of any portion of the captioning process</a>. Here&#8217;s an <a href="https://www.youtube.com/watch?v=7j4ytsunao8&amp;index=2&amp;list=PLjdLzz0k39ykXZJ91DcSd5IIXrm4YuGgE">example video of the manual syncing process</a>.</p>
<p>One way to avoid the effort is to let YouTube perform <a href="https://support.google.com/youtube/answer/2734796?hl=en&amp;ref_topic=3014331">Set Timings</a> on a plain-text transcript for you (<a href="https://www.youtube.com/watch?v=w4BRY56u2xw#t=41m29s">Video of announcement and demo</a> in 2009.) It&#8217;s &#8220;not recommended for videos that are over an hour long or have poor audio quality&#8221;. If that does work well then it would remove the need to generate timecodes myself.</p>
<p>I presume that having a &#8216;verbatim&#8217; transcript, rather than a &#8216;clean&#8217; one, would help the YouTube Set Timings processing to be more reliable.</p>
<h2>Applications and Services</h2>
<p>Wikipedia has a <a href="https://en.wikipedia.org/wiki/Comparison_of_subtitle_editors">comparison of subtitle editors</a> that provides an incomplete list of free and commercial editors for various platforms. There&#8217;s also a list in the &#8220;Use captioning software &amp; services&#8221; section of the <a href="https://support.google.com/youtube/answer/2734796?hl=en&amp;ref_topic=3014331">Add subtitles &amp; closed captions</a> YouTube help page.</p>
<p>I&#8217;ll just highlight a few interesting ones here:</p>
<p><strong>Voxcribe</strong> offer a commercial Windows application called <a href="https://voxcribe.com/Video%20Speech%20Recognition%20Captioning%20Subtitling%20Software%20VoxcribeCC.html">VoxcribeCC</a> that uses speaker-independent speech recognition technology to automatically caption a video. The first 60 minutes is free, then you pay-as-you-go for $7-$10 per hour. Output formats are Subrip (srt) and Timed Text (xml). It doesn&#8217;t support training or custom vocabularies.</p>
<p><strong>Amara</strong> deserves a special mention: <a href="https://www.amara.org/en/">Amara</a> is an open-source and non-profit collaboration community for captioning and subtitling video. A ‘Wikipedia for Subtitles’, Amara enables volunteers to make videos accessible for people who are deaf and hard of hearing and anyone who doesn’t speak the language of the original video. Amara has more than 100,000 subtitling volunteers and organizations like TED, Khan Academy, and PBS use it to make video accessible.</p>
<p>Amara is a project of the <a href="http://pculture.org">Participatory Culture Foundation</a>. (YouTube also supports <a href="https://support.google.com/youtube/answer/6052538?hl=en&amp;ref_topic=3014331">community-contributed</a> subtitles directly, along with <a href="https://support.google.com/youtube/answer/2780526?hl=en&amp;ref_topic=3014331">paid translations</a>.) The <a href="https://github.com/pculture/unisubs">open-source code</a> is being <a href="https://github.com/pculture/unisubs/pulse/monthly">actively developed</a> and includes a rich <a href="http://universal-subtitles.readthedocs.org/en/editor-review-with-actions/api.html">API</a>.</p>
<h2>Workflow</h2>
<p>Here&#8217;s an outline for one (of many) possible workflows; a small code sketch of the subtitle-generation step follows the list:</p>
<ul>
<li>Generate a verbatim transcript from the audio.</li>
<li>Generate and upload a transcript file <a href="https://support.google.com/youtube/answer/2734799?hl=en&amp;ref_topic=3014331">formatted for YouTube</a>.</li>
<li>Request YouTube to perform a Set Timings operation.</li>
<li>Download the subtitles and timecode data.</li>
<li>Clean up the verbatim transcript.</li>
<li>Combine with the speaker diarisation data, if available.</li>
<li>Generate the interactive transcript pages.</li>
<li>Condense subtitle wording to fit reading speed, if needed.</li>
<li>Upload condensed and diarised subtitles back to YouTube.</li>
</ul>
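<p>To make the subtitle-generation step concrete, here&#8217;s a minimal sketch that turns a (hypothetical) list of timecoded caption lines into a SubRip file. Real ASR output would first need cleaning and condensing, as noted above:</p>
<pre><code>use strict;
use warnings;

# Hypothetical input: caption lines with start/end times in seconds.
my @captions = (
    { start =&gt; 0.0, end =&gt; 2.4, text =&gt; 'Welcome back to the show.' },
    { start =&gt; 2.4, end =&gt; 5.1, text =&gt; '[dog barking]' },
);

# Format seconds as an SRT timestamp: HH:MM:SS,mmm
sub srt_time {
    my ($secs) = @_;
    my $ms = int(($secs - int($secs)) * 1000 + 0.5);
    return sprintf '%02d:%02d:%02d,%03d',
        int($secs / 3600), int($secs / 60) % 60, int($secs) % 60, $ms;
}

my $n = 1;
for my $c (@captions) {
    printf "%d\n%s --&gt; %s\n%s\n\n",
        $n++, srt_time($c-&gt;{start}), srt_time($c-&gt;{end}), $c-&gt;{text};
}
</code></pre>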
<h1>What Next?</h1>
<h2>Proof of Concept Testing</h2>
<p>I have mentioned lots of services in this post. Next I&#8217;m planning to do some very basic testing of the ones that seem likely to be useful. I&#8217;ll use some podcast audio for which I also have manual transcriptions. I want to get some experience with the various tools from the low-end (speaker independent) through to the high-end (Dragon with vocabulary and voice training). That will give me some sense of how big the &#8220;quality gap&#8221; really is. I&#8217;ll post some results when I have them.</p>
<p>I&#8217;ve written a follow-up post about how I&#8217;m <a href="https://blog.timbunce.org/2017/02/09/comparing-transcriptions/">Comparing Transcriptions</a> &#8211; which turned out to be more tricky, and interesting, than I&#8217;d expected.</p>
<h2>A Project?</h2>
<p>Naturally I&#8217;m glossing over <em>lots</em> of details here, and I know there&#8217;s lots I don&#8217;t know. At this stage I&#8217;m very much in exploratory mode, discovering possibilities to see what might be viable. I&#8217;m encouraged by what I&#8217;ve found so far and can see interesting paths worth exploring.</p>
<p>I have no particular experience with audio processing or bulk transcription, but I am interested in helping more podcasts to have rich searchable transcripts available.</p>
<p>Are you? Great! Please get in touch.</p>
<hr />
<h2>Appendix of Random Notes</h2>
<p>Some of the most common Subtitle and Caption File Formats are listed below, with sample cues after the list:</p>
<ul>
<li><a href="https://en.wikipedia.org/wiki/SubRip">SRT</a> &#8211; &#8220;SubRip Text&#8221; &#8211; a standard subtitle format supported by most video players</li>
<li><a href="https://en.wikipedia.org/wiki/SubStation_Alpha">SSA</a> &#8211; &#8220;SubStation Alpha&#8221; format that allows more advanced subtitles than the conventional SRT format</li>
<li><a href="https://en.wikipedia.org/wiki/Timed_Text_Markup_Language">TTML</a> &#8211; &#8220;Timed Text Markup Language&#8221;, an XML format that is one of W3C&#8217;s standards regulating timed text</li>
<li><a href="https://en.wikipedia.org/wiki/Timed_Text_Markup_Language">DFXP</a> &#8211; &#8220;Distribution Format Exchange Profile&#8221;, the old name for TTML</li>
<li><a href="https://en.wikipedia.org/wiki/SubViewer">SBV</a> &#8211; &#8220;SubViewer&#8221; plain text format, similar to SRT. Also known as .SUB</li>
<li><a href="https://en.wikipedia.org/wiki/WebVTT">VTT</a> &#8211; &#8220;Web Video Text Tracks&#8221;, very similar to SubRip, supported by most browsers</li>
</ul>
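<p>For a sense of how similar SRT and WebVTT are, here&#8217;s the same (hypothetical) cue in both. SRT uses a numeric counter and a comma before the milliseconds:</p>
<pre><code>1
00:00:00,000 --&gt; 00:00:02,400
Welcome back to the show.
</code></pre>
<p>while WebVTT adds a <code>WEBVTT</code> header and uses a dot:</p>
<pre><code>WEBVTT

00:00:00.000 --&gt; 00:00:02.400
Welcome back to the show.
</code></pre>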
<p>YouTube supports <a href="https://support.google.com/youtube/answer/2734698?hl=en">many subtitle and caption file formats</a>.</p>
<p>Updates:</p>
<ul>
<li>2016-04-23: Added <a href="https://www.speechmatics.com">Speechmatics</a>, <a href="http://nowtranscribe.com">NowTranscribe</a>, Google <a href="https://cloud.google.com/speech/">Cloud Speech API</a>, Microsoft <a href="https://www.microsoft.com/cognitive-services/en-us/speech-api">Bing Speech API</a>, and the Anecdotal Accuracy section.</li>
<li>2016-11-08: Updated IBM Watson entry to note that support for <a href="http://www.ibm.com/watson/developercloud/doc/speech-to-text/custom.shtml" target="_blank" rel="noopener">custom vocabularies</a> was <a href="http://www.ibm.com/watson/developercloud/doc/speech-to-text/relnotes.shtml#September2016" target="_blank" rel="noopener">added</a> in September 2016.</li>
<li>2016-11-22: Added &#8220;State of the Art&#8221; section with a link to the recent <a href="https://blog.acolyer.org/2016/11/22/achieving-human-parity-in-conversational-speech-recognition/" target="_blank" rel="noopener">Achieving human parity in conversational speech recognition</a> paper.</li>
<li>2016-11-27: Added <a href="https://voicebase.wpengine.com/transcription/">voicebase</a> with details of the keyword extraction UI.</li>
<li>2016-12-03: Updated Google Speech API details. Added a link to <a href="https://www.youtube.com/watch?v=4KdhM_mCVo8">a talk</a> where Speechmatics claim to be the world&#8217;s most accurate. Some other minor edits.</li>
<li>2016-12-30: Added  details for SpokenData, Deepgram, Trint, Spreza and Voyz.es.</li>
<li>2017-02-01: Added Pop Up Archive and AudioSear.ch. Plus a note on Dragon pointing out that there&#8217;s no segmentation into sentences.</li>
<li>2017-02-09: Added link to the <a href="https://blog.timbunce.org/2017/02/09/comparing-transcriptions/">Comparing Transcriptions</a> follow-up post.</li>
<li>2018-04-03: Added link to <a href="http://maui-indexer.blogspot.ie/2009/05/what-is-maui-about.html">Maui</a> topic-extraction <a href="http://www.medelyan.com/software" target="_blank" rel="noopener">software</a>, thanks to Rob Wilkinson.</li>
<li>2018-04-10: Added <a href="https://aws.amazon.com/blogs/aws/amazon-transcribe-scalable-and-accurate-automatic-speech-recognition/">Amazon Transcribe</a> and a link to <a href="https://popuppodcasting.ca/blog/automatic-transcription-services-compared">another review of tools</a>.</li>
</ul>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.timbunce.org/2016/03/22/semi-automated-podcast-transcription-2/feed/</wfw:commentRss>
			<slash:comments>39</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">609</post-id>
		<media:content url="https://0.gravatar.com/avatar/c1f8fff6645793f1615f748a0e33dfd3a4bf238f63095a180d01899515f628c7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">TimBunce</media:title>
		</media:content>

		<media:content url="https://blog.timbunce.org/wp-content/uploads/2016/02/transcription-data-flow-001.png" medium="image">
			<media:title type="html">Transcription Data Flow Schematic</media:title>
		</media:content>
	</item>
		<item>
		<title>Introducing Data::Tumbler and Test::WriteVariants</title>
		<link>https://blog.timbunce.org/2014/03/23/introducing-datatumbler-and-testwritevariants/</link>
		
		<dc:creator><![CDATA[TimBunce]]></dc:creator>
		<pubDate>Sun, 23 Mar 2014 14:20:11 +0000</pubDate>
				<category><![CDATA[perl]]></category>
		<category><![CDATA[testing]]></category>
		<guid isPermaLink="false">http://timbunce.wordpress.com/?p=600</guid>

					<description><![CDATA[For some time now Jens Rehsack (‎Sno‎), H.Merijn Brand (‎Tux‎) and I have been working on bootstrapping a large project to provide a common test suite for the DBI that can be reused by drivers to test their conformance to the DBI specification. This post isn&#8217;t about that. This post is about two spin-off modules &#8230; <a href="https://blog.timbunce.org/2014/03/23/introducing-datatumbler-and-testwritevariants/" class="more-link">Continue reading <span class="screen-reader-text">Introducing Data::Tumbler and Test::WriteVariants</span></a>]]></description>
										<content:encoded><![CDATA[<p>For some time now <a href="http://act.qa-hackathon.org/qa2014/user/8669">Jens Rehsack (‎Sno‎)</a>, <a href="http://act.qa-hackathon.org/qa2014/user/268">H.Merijn Brand (‎Tux‎)</a> and I have been working on bootstrapping a large project to provide a common test suite for the DBI that can be reused by drivers to test their conformance to the DBI specification.</p>
<p>This post isn&#8217;t about that. This post is about two spin-off modules that might seem unrelated: <a href="https://metacpan.org/pod/Data::Tumbler">Data::Tumbler</a> and <a href="https://metacpan.org/pod/Test::WriteVariants">Test::WriteVariants</a>, and the Perl QA Hackathon that saw them released.</p>
<p><span id="more-600"></span></p>
<hr />
<p>This was my first year attending a <a href="http://act.qa-hackathon.org/qa2014/">Perl QA Hackathon</a>, an annual event where <a href="http://www.flickr.com/photos/wendyga/13194549304/in/set-72157642437424235">key developers</a> get together to discuss and develop the code, services, and standards at the core of the Perl ecosystem.</p>
<p>See the <a href="http://act.qa-hackathon.org/qa2014/wiki?node=Results">Results</a> and <a href="http://act.qa-hackathon.org/qa2014/wiki?node=Blogs">Blogs</a> pages to get a sense of the important work that <em>gets done</em> at these events and in the weeks that follow. What&#8217;s less visible but just as important are the personal connections made and renewed there.</p>
<p>These events take a lot of work to put together. Special thanks are due to <a href="http://act.qa-hackathon.org/qa2014/user/1">Philippe Bruhat (BooK)</a> and <a href="https://metacpan.org/author/ELBEHO">Laurent Boivin (elbeho)</a> for organising it so well; to Wendy for looking after our nourishment and caffeination so joyfully; to <a href="http://booking.com">Booking.com</a> for the venue; and to all the other sponsors for helping to make this QA Hackathon the great success it was. In no particular order, <a href="http://www.splio.com/">SPLIO</a>, <a href="http://www.grantstreet.com/">Grant Street Group</a>, <a href="http://www.dyn.com/">DYN</a>, <a href="http://www.campusexplorer.com/">Campus Explorer</a>, <a href="http://www.evozon.com/">EVOZON</a>, <a href="http://www.elasticsearch.com/">elasticsearch</a>, <a href="http://www.eligo.co.uk/">Eligo</a>, <a href="http://www.mongueurs.pm/">Mongueurs de Perl</a>, WenZPerl for <a href="http://perl6.org/">the Perl6 Community</a>, <a href="http://www.procura.nl/">PROCURA</a>, <a href="http://madeinlove.co.uk/">Made In Love</a> and <a href="http://www.perlfoundation.org/">The Perl Foundation</a>. Thank you one and all.</p>
<p>My focus at the hackathon was on pushing the DBI Test project forward with Sno and Tux. Getting Data::Tumbler and Test::WriteVariants polished up and released was a key part of that. We also had valuable discussions with BooK about useful enhancements to <a href="https://metacpan.org/pod/Test::Database">Test::Database</a>.</p>
<hr />
<p>So, what are Data::Tumbler and Test::WriteVariants? To explain that I&#8217;ll start 10 years ago&#8230;</p>
<p>The DBI distribution includes <a href="https://metacpan.org/source/TIMB/DBI-1.631/lib/DBI/PurePerl.pm#L1099">DBI::PurePerl</a>, a fairly-complete implementation of DBI in pure-perl, and <a href="https://metacpan.org/pod/DBD::Gofer">DBD::Gofer</a>, a fairly-transparent proxy.</p>
<p>Both these modules need testing, and both should behave very much like using the normal DBI. The best way to test that was to re-run the DBI tests while using DBI::PurePerl, re-run them again using DBD::Gofer, and re-run them again using DBI::PurePerl and DBD::Gofer at the same time. So, since 2004, that&#8217;s what the DBI does.</p>
<p>When you run Makefile.PL in the DBI distribution it looks at the 44 test files and generates 141 new test files with various combinations of contexts. These generated test files look something like this:</p>
<blockquote>
<pre>#!perl -w
$ENV{DBI_AUTOPROXY} = 'dbi:Gofer:transport=null;policy=pedantic';
END { delete $ENV{DBI_AUTOPROXY}; }; # for VMS
$ENV{DBI_PUREPERL} = 2;
END { delete $ENV{DBI_PUREPERL}; }; # for VMS
require './t/06attrs.t';</pre>
</blockquote>
<p>They set up a &#8216;context&#8217; and then execute the original test. In this case the context is DBD::Gofer + DBI::PurePerl.</p>
<p>This arrangement has proved to be extremely effective. I&#8217;ve frequently made a change to the DBI and forgotten to make corresponding changes to DBD::Gofer and/or DBI::PurePerl, only to be forcefully reminded when tests that worked for plain DBI failed noisily in the extra test contexts.</p>
<p>It was clear that something like this was needed for the DBI Test project. We wanted to generate test variants not only for DBI::PurePerl and DBD::Gofer but also each available database driver. Each driver might also want to add test variants of their own. (DBD::DBM, for example, supports a number of <a href="https://metacpan.org/pod/DBD::DBM#dbm_type">DBM backends</a> and <a href="https://metacpan.org/pod/DBD::DBM#dbm_mldbm">serialization formats</a> that all need testing in combination).</p>
<p>After lots of experimentation and refactoring the relevant logic was extracted out into the Data::Tumbler and Test::WriteVariants modules, generalised, polished up and released during the hackathon.</p>
<hr />
<p>For some reason I struggle when trying to explain what <a href="https://metacpan.org/pod/Data::Tumbler">Data::Tumbler</a> is or does. The summary in the documentation says &#8220;Dynamic generation of nested combinations of variants&#8221;, which is a bit of a mouthful.</p>
<p>It&#8217;s basically a <a href="https://metacpan.org/source/TIMB/Data-Tumbler-0.003/lib/Data/Tumbler.pm#L141">single simple subroutine</a> that recurses into itself driven by the results of calling <em>provider</em> callbacks. As it recurses it builds up a <em>path</em> and a <em>context</em> from the keys and values returned by the providers.</p>
<p>The provider callbacks are passed the current path and context plus a cloned copy of a <em>payload</em> which they can edit. Because it&#8217;s cloned, any changes made to the payload will only be visible to any later providers and the <em>consumer</em>.</p>
<p>The recursion bottoms-out when there are no more providers. At this point a <em>consumer</em> callback is called with the current path, context, and payload.</p>
<p>That&#8217;s an abstract description, which is fitting as it&#8217;s an abstract algorithm. I hope it&#8217;s reasonably clear. There are a couple of examples in the <a href="https://metacpan.org/pod/Data::Tumbler#SYNOPSIS">documentation synopsis</a>. Currently Test::WriteVariants, described next, is the only use-case. I&#8217;d love to find some more, if only to help improve the documentation. Let me know if you can think of any!</p>
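<p>In the meantime, here&#8217;s a minimal hand-rolled sketch of the algorithm as described above &#8211; <em>not</em> the module&#8217;s actual API, just the shape of the recursion, with made-up provider names and values:</p>
<blockquote>
<pre>use strict;
use warnings;
use Storable qw(dclone);

# Recurse through the providers, extending the path and context
# from each provider's key/value pairs. When the providers run out,
# hand the accumulated path, context, and payload to the consumer.
sub tumble {
    my ($providers, $path, $context, $payload, $consumer) = @_;

    return $consumer-&gt;($path, $context, $payload)
        unless @$providers;

    my ($provider, @rest) = @$providers;
    my %variants = $provider-&gt;($path, $context, $payload);

    for my $key (sort keys %variants) {
        tumble(
            \@rest,
            [ @$path, $key ],               # extend the path
            [ @$context, $variants{$key} ], # extend the context
            dclone($payload),               # clone: edits stay local
            $consumer,
        );
    }
}

# Two providers yield a 2x2 set of leaf calls, as in the DBI example.
tumble(
    [
        sub { (pureperl =&gt; { DBI_PUREPERL =&gt; 2 }, xs =&gt; {}) },
        sub { (gofer =&gt; { DBI_AUTOPROXY =&gt; 'dbi:Gofer:transport=null' }, plain =&gt; {}) },
    ],
    [], [], { tests =&gt; [ 't/06attrs.t' ] },
    sub {
        my ($path, $context, $payload) = @_;
        print join('/', @$path), ": @{ $payload-&gt;{tests} }\n";
    },
);</pre>
</blockquote>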
<hr />
<p><a href="https://metacpan.org/pod/Test::WriteVariants">Test::WriteVariants</a> directly addresses the use-case of writing a tree of perl <code>.../*.t</code> test files, each setting up various combinations of context values before invoking the test code.</p>
<p>Hopefully you can see where Data::Tumbler fits in: the <em>payload</em> is a hash of tests for which you&#8217;d like extra variant tests written; the <em>providers</em> define variants of the contexts in which you&#8217;d like the tests executed, typically by setting environment variables. The <em>consumer</em> writes a new <code>*.t</code> file for each element in the payload hash, using the <em>path</em> to build a directory tree, and using the <em>context</em> to set environment variables, etc., in each test file written.</p>
<p>The providers can also remove tests from the <em>payload</em> that aren&#8217;t relevant in a given <em>context</em>, or add more that are only relevant to a given context.</p>
<p>Test::WriteVariants allows providers to be specified not just as code references but also as namespaces. In this case it uses <a href="https://metacpan.org/pod/Module::Pluggable::Object">Module::Pluggable::Object</a> to find installed plugins within that namespace and wraps them in a code reference for Data::Tumbler. This allows extra test variants to be added by installing other modules.</p>
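<p>From memory, typical usage looks roughly like the sketch below. The argument names are my best recollection rather than gospel &#8211; check the module&#8217;s synopsis, since the API is still evolving:</p>
<blockquote>
<pre>use Test::WriteVariants;

my $writer = Test::WriteVariants-&gt;new;
$writer-&gt;write_test_variants(
    # The tests to write variants of (the payload).
    input_tests =&gt; {
        'core/dbi' =&gt; { require =&gt; 't/core_dbi.t' },
    },
    # Code refs and/or plugin namespaces to search for providers.
    variant_providers =&gt; [ 'DBI::Test::VariantDBD' ],
    output_dir =&gt; 'xt/variants',
);</pre>
</blockquote>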
<p>This is used in DBI Test. The DBD::DBM driver, for example, can install <a href="https://github.com/perl5-dbi/DBI-Test/blob/28efa4e34e9a98fc7c5491631f6d5bbc0208bf9e/sandbox/tim/lib/DBI/Test/VariantDBD/DBM.pm">a provider plugin module that adds extra variants</a> when the context indicates that DBD::DBM is being tested. The plugin also arranges to <a href="https://github.com/perl5-dbi/DBI-Test/blob/master/sandbox/tim/lib/DBI/Test/VariantDBD/DBM.pm#L45">add DBD::DBM specific tests</a> in those contexts.</p>
<p>Although Test::WriteVariants is new, and still evolving quite fast, it&#8217;s already proving very useful. Jens is experimenting with using it for improving the testing of <a href="https://github.com/perl5-utils/List-MoreUtils">List::MoreUtils</a>, especially covering both the <a href="https://github.com/perl5-utils/List-MoreUtils/blob/155b94eb11ce9ac031cdb27cbf75f08e7bc317d5/inc/Tumble.pm#L137">XS and pure-perl</a> variants.</p>
<p>I hope you can see uses for Test::WriteVariants in improving the testing of your own modules. If so, please do try it out and let me know how it works out for you and whether there&#8217;s anything that needs improving.</p>
<p>Happy testing!</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">600</post-id>
		<media:content url="https://0.gravatar.com/avatar/c1f8fff6645793f1615f748a0e33dfd3a4bf238f63095a180d01899515f628c7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">TimBunce</media:title>
		</media:content>
	</item>
		<item>
		<title>Migrating a complex search query from DBIx::Class to Elasticsearch</title>
		<link>https://blog.timbunce.org/2013/07/29/migrating-a-complex-search-query-from-dbixclass-to-elasticsearch/</link>
		
		<dc:creator><![CDATA[TimBunce]]></dc:creator>
		<pubDate>Mon, 29 Jul 2013 21:25:13 +0000</pubDate>
				<category><![CDATA[software]]></category>
		<category><![CDATA[dbi]]></category>
		<category><![CDATA[elasticsearch]]></category>
		<category><![CDATA[postgresql]]></category>
		<guid isPermaLink="false">http://timbunce.wordpress.com/?p=584</guid>

					<description><![CDATA[At the heart of one of our major web applications at TigerLead is a property listing search. The search supports all the obvious criteria, like price range and bedrooms, more complex ones like school districts, plus a &#8220;full-text&#8221; search field. This is the story of moving the property listing search logic from querying a PostgreSQL &#8230; <a href="https://blog.timbunce.org/2013/07/29/migrating-a-complex-search-query-from-dbixclass-to-elasticsearch/" class="more-link">Continue reading <span class="screen-reader-text">Migrating a complex search query from DBIx::Class to&#160;Elasticsearch</span></a>]]></description>
										<content:encoded><![CDATA[<p>At the heart of one of our major web applications at <a href="http://www.tigerlead.com">TigerLead</a> is a property listing search. The search supports all the obvious criteria, like price range and bedrooms, more complex ones like school districts, plus a &#8220;full-text&#8221; search field.</p>
<p>This is the story of moving the property listing search logic from querying a PostgreSQL instance to querying an ElasticSearch cluster.<span id="more-584"></span></p>
<p>The initial motivation for using ElasticSearch was to improve the full-text search feature. We&#8217;d been using the <a href="http://www.postgresql.org/docs/current/static/textsearch.html">full text search features built into PostgreSQL</a>, which were functional but limited. I&#8217;m sure we could have made better use of them but we wanted to take a bigger leap forward.</p>
<p>At the time, early in 2012, we looked at various options, including Sphinx and Solr. Elasticsearch was new and relatively immature but had a compelling feature set and momentum. The availability of powerful feature-rich APIs for Perl, i.e., the <a href="https://metacpan.org/module/ElasticSearch">ElasticSearch</a>, <a href="https://metacpan.org/module/ElasticSearch::SearchBuilder">ElasticSearch::SearchBuilder</a>, and <a href="https://metacpan.org/module/Elastic::Model">Elastic::Model</a> modules, was also a key factor. We began to see Elasticsearch as not just a solution for full-text search but as a strategic technology, applicable to a wide range of applications.</p>
<p>I found the learning curve quite steep. There was little in the way of guides and tutorials at the time and the reference documentation was patchy and often assumed familiarity with the terminology for Lucene, the foundation that underlies both Solr and Elasticsearch. Thankfully the <a href="http://www.elasticsearch.org/guide/">documentation</a> and other <a href="http://www.elasticsearch.org/resources/">resources</a> have improved since then. Also many companies are using Elasticsearch now (github, stackoverflow, foursquare, <a href="http://backstage.soundcloud.com/tag/elastic-search/">soundcloud</a>, <a href="http://blog.wajam.com/2013/08/scalable-architecture-behind-wajam-social-search/">Wajam</a> and <a href="http://www.kickstarter.com/backing-and-hacking/elasticsearch-at-kickstarter">kickstarter</a>, to name a few) and blogging about their experience of what to do and <a href="https://github.com/blog/1397-recent-code-search-outages">what not to do</a>.</p>
<p>I&#8217;d especially like to thank <a href="https://github.com/clintongormley">Clinton Gormley</a> for kindly giving me much help and support as I climbed the learning curve and stumbled over assorted issues.</p>
<h2>Index Building</h2>
<p>Our PostgreSQL database remains the &#8216;source of truth&#8217;. We build a new Elasticsearch index from the PostgreSQL data each day and feed changes into Elasticsearch every few minutes. Each new index has a name that includes the date and time it was created and we use <a href="http://www.elasticsearch.org/guide/reference/api/admin-indices-aliases/">aliases</a> to <a href="http://www.elasticsearch.org/guide/reference/api/search/">route</a> queries relating to subsets of the data to the appropriate index.</p>
<p>Index and alias definition and management, along with the loading of data, is managed via the delightful <a href="https://metacpan.org/module/Elastic::Manual::Intro">Elastic::Model</a> module. (Clinton&#8217;s <a href="http://www.slideshare.net/clintongormley/to-infinity-and-beyond-14027777">presentation</a> is well worth a look.)</p>
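<p>As a bare-bones illustration of the alias switch &#8211; using raw REST calls rather than Elastic::Model, and with made-up index names:</p>
<blockquote>
<pre>use strict;
use warnings;
use HTTP::Tiny;
use JSON::PP qw(encode_json);

# Atomically repoint the 'listings' alias from yesterday's index
# to the one we've just finished building.
my $actions = { actions =&gt; [
    { remove =&gt; { index =&gt; 'listings-20130728', alias =&gt; 'listings' } },
    { add    =&gt; { index =&gt; 'listings-20130729', alias =&gt; 'listings' } },
] };

my $res = HTTP::Tiny-&gt;new-&gt;post(
    'http://localhost:9200/_aliases',
    { content =&gt; encode_json($actions) },
);
die "alias switch failed: $res-&gt;{status}\n" unless $res-&gt;{success};</pre>
</blockquote>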
<p>Aside: We&#8217;re exploring the use of <a href="http://skytools.projects.pgfoundry.org/skytools-3.0/">PgQ</a> for in-PostgreSQL transaction-safe queuing. That would let us keep Elasticsearch in sync in near-realtime in a more efficient way using triggers on relevant tables to queue change messages.</p>
<p>We only sync on-market property listings to Elasticsearch, of which there are millions across the US and Canada. Each has many numeric fields, many boolean fields, a number of text fields, plus a number of lists-of-integers fields.</p>
<h2>Query Building</h2>
<p>We use <a href="https://metacpan.org/module/DBIx::Class">DBIx::Class</a> as an abstract interface to the property listings in PostgreSQL, and that means using <a href="https://metacpan.org/module/SQL::Abstract">SQL::Abstract</a> to construct the search query. So we have a module that, for each web search query parameter, adds the corresponding elements to the SQL::Abstract data structure.</p>
<p>Most are pretty trivial, like</p>
<blockquote><p><code>$sql_abstract-&gt;{price} = { '&gt;=' =&gt; $price_min };</code></p></blockquote>
<p>Others are a little more tricky, like our basic textsearch query:</p>
<blockquote><p><code>$sql_abstract-&gt;{ts_index_col} = { '@@' =&gt; \[ "plainto_tsquery(?)", [ ts_index_col =&gt; $plain_ft_query ] ] };</code></p></blockquote>
<p>Elasticsearch has a <em>very</em> rich query language. It&#8217;s not actually a language at all, but something more like an <a href="http://www.elasticsearch.org/guide/reference/query-dsl">Abstract Syntax Tree expressed in JSON</a>. The Perl interface to this is the <a href="https://metacpan.org/module/ElasticSearch::SearchBuilder">ElasticSearch::SearchBuilder</a>. It looks a little like SQL::Abstract but is much richer.</p>
<p>I thought for a while about translating the SQL::Abstract data structure that we already generated into a corresponding ElasticSearch::SearchBuilder structure. In the end I decided this wouldn&#8217;t leave us in a good place. It proved better to modify every place that built the SQL::Abstract data structure to also build an ElasticSearch::SearchBuilder structure, tuned to the semantics of the field. For example, in some cases it can be better to use &#8216;<code>lte</code>&#8216; and &#8216;<code>gte</code>&#8216; instead of &#8216;<code>&lt;=</code>&#8216; and &#8216;<code>&gt;=</code>&#8216; as comparison operators.</p>
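<p>For example, a simple &#8220;price between&#8221; criterion ends up expressed twice, once per back end (field and variable names here are illustrative):</p>
<blockquote>
<pre># SQL::Abstract structure, consumed by DBIx::Class search():
$sqla-&gt;{price} = { '&gt;=' =&gt; $price_min, '&lt;=' =&gt; $price_max };

# ElasticSearch::SearchBuilder structure, using range operators:
$esb-&gt;{price} = { gte =&gt; $price_min, lte =&gt; $price_max };</pre>
</blockquote>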
<h2>Full Records or IDs Only?</h2>
<p>Another early design decision was whether to store (and return) <em>all</em> the property details in Elasticsearch, or just enough to perform searches and return only IDs that could then be used to fetch the full details from PostgreSQL.</p>
<p>In the end I decided to store only search fields and return only IDs. The big downside was that every query would require two round-trips: one to query Elasticsearch to get the IDs and one to query PostgreSQL to get the full details. That might seem a little odd. The major motivation was how the new code would interface with the existing logic in the web application.</p>
<h2>Execution</h2>
<p>The existing code executed the listing search using the standard DBIx::Class search() method:</p>
<blockquote><p><code>$c-&gt;model($schema_name)-&gt;search( $sqla, \%attr )<br />
</code></p></blockquote>
<p>Here %attr contained two joins, two prefetches, paging, order by, cache control, and some extra fields via <code>'+select'</code>. The resulting resultset was then inflated via a series of five <code>with_*()</code> method calls based on <a href="https://metacpan.org/module/DBIx::Class::ResultSet::WithMetaData">DBIx::Class::ResultSet::WithMetaData</a> (which was fashionable at the time the code was written).</p>
<p>At this stage using Elasticsearch was just an experiment and, frankly, I didn&#8217;t want to mess with all that code! Returning just IDs let me integrate Elasticsearch with hardly any changes.</p>
<p>The trick was to replace the $sqla data structure that had been constructed to perform the full search with one that would just fetch the IDs that had been returned by Elasticsearch:</p>
<blockquote><p><code>$sqla = { 'me.id' =&gt; { -in =&gt; \@ids_from_es } };</code></p></blockquote>
<p>There was a little fiddling with paging and ordering, but that trick was the heart of it, and it made the integration quite simple.</p>
<p>Another benefit is that we have a simple way to recover from problems: if ES fails for any reason we simply don&#8217;t alter $sqla, so the original query runs against PG.</p>
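<p>Putting those pieces together, the integration amounts to something like this sketch, where <code>search_es_for_ids()</code> is a hypothetical helper wrapping the Elasticsearch query:</p>
<blockquote>
<pre># Try Elasticsearch first; on any failure leave $sqla untouched
# so the original query runs against PostgreSQL as before.
my @ids_from_es = eval { search_es_for_ids($esb_query) };
if (!$@ &amp;&amp; @ids_from_es) {
    $sqla = { 'me.id' =&gt; { -in =&gt; \@ids_from_es } };
}
my $rs = $c-&gt;model($schema_name)-&gt;search($sqla, \%attr);</pre>
</blockquote>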
<h2>Runtime Control</h2>
<p>We needed to be able to soft-launch this, to enable it for only a subset of requests. We already had the infrastructure for that, so once the code was in production we could enable it for specific users and control the overall percentage of search requests using Elasticsearch.</p>
<p>This was obviously very useful but, as it turned out, our initial timidity also hid some interesting performance behaviour.</p>
<h2>Performance</h2>
<p>Property search is a key feature of the service and performance was naturally a concern. Would the <em>presumed</em> benefits of Elasticsearch (ES) outweigh the cost of having to run two queries, on two different databases? I was fairly confident but not certain.</p>
<p>We were using a cluster of three ES nodes, each with 8GB memory and 4 CPUs. Once the code was ready I&#8217;d done some performance testing, firing randomly generated search requests at the web servers, in our staging environment. Those stress test results looked good.</p>
<p>When we started routing 2-5% of actual production search requests to ES, however, the results were not good. Here&#8217;s a chart of the performance of PostgreSQL (PG) in green with Elasticsearch+PostgreSQL (ES+PG) in red:<br />
<a href="https://blog.timbunce.org/wp-content/uploads/2013/07/es-vs-pg-low-traffic-hr.png" target="_blank"><img style="display:block;margin-left:auto;margin-right:auto;border:0;" alt="Chart of ES and PG low traffic" src="https://blog.timbunce.org/wp-content/uploads/2013/07/es-and-pg-low-traffic.png?w=600&#038;h=323" width="600" height="323" border="0" /></a></p>
<p>The mean search time using ES+PG was worse than the 90th percentile time for PG alone. That was disappointing and puzzling. I embarked on a review of all the (many) things that might not be optimal in the ES server configuration, in the <a href="http://www.elasticsearch.org/guide/reference/mapping/">mapping</a> applied to the fields, and the particular way we were constructing the queries. Here Clinton Gormley was beyond helpful, again. We found and tuned many little things, which was great, but none were clearly the cause of the apparent slowness.</p>
<p>To cut a long story short, the cause turned out to be the fact we were running the ES nodes in virtual machines (KVM). More specifically, although we&#8217;d configured ES to lock the physical memory pages via <a href="http://www.elasticsearch.org/guide/reference/setup/installation/">bootstrap.mlockall=true</a>, mlockall() within a <em>guest</em> operating system doesn&#8217;t stop the <em>host</em> operating system stealing the physical pages.</p>
<p>From the host&#8217;s point of view those memory pages weren&#8217;t busy enough to keep assigned to the ES VM, so the solution was simple: give more traffic to ES. Sure enough, as we increased the number of requests going to ES it got faster!</p>
<p>Here&#8217;s a chart showing the final ramp up from around 15% of requests going to ES up to 100%:</p>
<p><a href="https://blog.timbunce.org/wp-content/uploads/2013/07/es-vs-pg-high-traffic-hr.png" target="_blank"><img style="display:block;margin-left:auto;margin-right:auto;border:0;" alt="Chart ES and PG at higher traffic" src="https://blog.timbunce.org/wp-content/uploads/2013/07/es-and-pg-at-higher-traffic.png?w=600&#038;h=320" width="600" height="320" border="0" /></a></p>
<p>You can see that at 15% the mean and 90th percentile performance of ES+PG closely matched that of PG alone. At 100% ES+PG was not only clearly faster than PG alone, but the 90th percentile was close to the mean of PG alone. Since then we&#8217;ve upgraded ES to a more recent version and increased the memory on each node to 16GB. Now the mean search time is a steady 100ms and the 90th percentile hovers around 150ms.</p>
<h2>Scalability</h2>
<p>We&#8217;re using multicast discovery so there&#8217;s zero configuration. We can deploy a new server and the new Elasticsearch node will join the cluster and automatically distribute the data and query workload. It really is as simple as that.</p>
<h2>Reliability</h2>
<p>We&#8217;ve only had one problem that I can recall where Elasticsearch behaved strangely. Even that didn&#8217;t stop search requests, it only affected building a new index. Restarting the cluster fixed it.</p>
<p>That was with an early 0.20.x release and we&#8217;ve had no recurrence after upgrading. We&#8217;re on the latest 0.20.x now and plan to move to 0.90.x before long. (An upgrade that should significantly boost performance again.)</p>
<h2>Next Steps</h2>
<p>We&#8217;ve been impressed with Elasticsearch as a search solution,  in terms of functionality, reliability and performance. Delighted by the support from Clinton and the IRC community. And amazed at the range of <a href="http://www.elasticsearch.org/guide/reference/modules/plugins/">plugins</a> being developed.</p>
<p>We&#8217;re pushing full listing data into Elasticsearch now, and writing modules to better abstract the searching so it can be used more easily in other applications. We&#8217;re also happily cooking up plans to use more Elasticsearch features, like <a href="http://www.elasticsearch.org/guide/reference/api/percolate/">percolate</a>, in other projects.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">584</post-id>
		<media:content url="https://0.gravatar.com/avatar/c1f8fff6645793f1615f748a0e33dfd3a4bf238f63095a180d01899515f628c7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">TimBunce</media:title>
		</media:content>

		<media:content url="https://blog.timbunce.org/wp-content/uploads/2013/07/es-and-pg-low-traffic.png" medium="image">
			<media:title type="html">Chart of ES and PG low traffic</media:title>
		</media:content>

		<media:content url="https://blog.timbunce.org/wp-content/uploads/2013/07/es-and-pg-at-higher-traffic.png" medium="image">
			<media:title type="html">Chart ES and PG at higher traffic</media:title>
		</media:content>
	</item>
		<item>
		<title>NYTProf v5 &#8211; Flaming Precision</title>
		<link>https://blog.timbunce.org/2013/04/08/nytprof-v5-flaming-precision/</link>
					<comments>https://blog.timbunce.org/2013/04/08/nytprof-v5-flaming-precision/#comments</comments>
		
		<dc:creator><![CDATA[TimBunce]]></dc:creator>
		<pubDate>Mon, 08 Apr 2013 22:27:32 +0000</pubDate>
				<category><![CDATA[perl]]></category>
		<category><![CDATA[nytprof]]></category>
		<category><![CDATA[performance]]></category>
		<guid isPermaLink="false">http://timbunce.wordpress.com/?p=557</guid>

					<description><![CDATA[As soon as I saw a Flame Graph visualization I knew it would make a great addition to NYTProf. So I&#8217;m delighted that the new Devel::NYTProf version 5.00, just released, has a Flame Graph as the main feature of the index page. In this post I&#8217;ll explain the Flame Graph visualization, the new &#8216;subroutine calls &#8230; <a href="https://blog.timbunce.org/2013/04/08/nytprof-v5-flaming-precision/" class="more-link">Continue reading <span class="screen-reader-text">NYTProf v5 &#8211; Flaming&#160;Precision</span></a>]]></description>
										<content:encoded><![CDATA[<p>As soon as I saw a <a href="http://dtrace.org/blogs/brendan/2011/12/16/flame-graphs/">Flame Graph</a> visualization I knew it would make a great addition to NYTProf. So I&#8217;m delighted that the new <a href="https://metacpan.org/module/Devel::NYTProf">Devel::NYTProf</a> version 5.00, just released, has a Flame Graph as the main feature of the index page.</p>
<p><a href="https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph.png"><img loading="lazy" data-attachment-id="556" data-permalink="https://blog.timbunce.org/2013/04/08/nytprof-v5-flaming-precision/nytprof-v5-flamegraph-png/" data-orig-file="https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph.png" data-orig-size="1200,618" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;}" data-image-title="nytprof-v5-flamegraph.png" data-image-description="" data-image-caption="" data-medium-file="https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph.png?w=300" data-large-file="https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph.png?w=676" src="https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph.png?w=676" alt="nytprof-v5-flamegraph.png"   class="alignnone size-full wp-image-556" srcset="https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph.png?w=440&amp;h=227 440w, https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph.png?w=880&amp;h=453 880w, https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph.png?w=150&amp;h=77 150w, https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph.png?w=300&amp;h=155 300w, https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph.png?w=768&amp;h=396 768w" sizes="(max-width: 440px) 100vw, 440px" /></a></p>
<p>In this post I&#8217;ll explain the Flame Graph visualization, the new &#8216;subroutine calls event stream&#8217; that makes the Flame Graph possible, and other recent changes, including improved precision in the subroutine profiler.<span id="more-557"></span></p>
<h2>Precision</h2>
<p>Let&#8217;s start with the improved precision. That work was actually released a few months ago in Devel::NYTProf 4.23 but not announced.</p>
<p>Devel::NYTProf started life as a line/statement profiler, writing a stream of events, one per statement. It&#8217;s important for speed that the stream is space efficient, so statement times were expressed as integer microseconds (a &#8216;tick&#8217;) and written in a compressed form. Values less than 128&micro;s use a single byte. This worked very well for v1. Back in early 2008 minimum statement times were typically just a few microseconds.</p>
<p>When I added the subroutine profiler I chose to use <a href="http://en.wikipedia.org/wiki/Double-precision_floating-point_format">double precision floating point</a> values to hold the subroutine call times with seconds as the units. I presume that seemed reasonable at the time as microseconds (multiples of 1e-6) can be stored accurately as double precision floating point values and are significantly above the typical <a href="http://en.wikipedia.org/wiki/Machine_epsilon">machine epsilon</a> of 2.220446e-16.</p>
<p>I&#8217;d assumed the values weren&#8217;t at risk from the pernicious effect of <a href="http://en.wikipedia.org/wiki/Floating_point#Machine_precision_and_backward_error_analysis">cumulative round-off errors</a>. The situation got worse with NYTProf v2 because that switched the clock &#8216;tick&#8217; from 1&micro;s to 100ns on some systems (those with POSIX realtime clock API and OS X). And then worse again when profiling of &#8216;slowops&#8217; was added in NYTProf v3 since slowops are often far from slow.</p>
<p><code>$ perl -we '$n=10_000_000; $t=0.0; $i=3/$n; $t+=$i while $n--; print "$t\n";'<br />
2.99999999961925</code></p>
<p>The way the subroutine profiler works, calculating inclusive and exclusive times as it goes, makes it sensitive to these accumulated errors. (Sometimes a subroutine that did nothing but call a very fast subroutine many times could be reported as having taken less time than the sum of the times in the subroutine it called.)</p>
<p>The subroutine profiler still uses double precision floating point values to accumulate the times, but now accumulates integer ticks instead of fractional seconds.</p>
<p><code>$ perl -we '$n=10_000_000; $t=0.0; $i=3.0; $t+=$i while $n--; $t/=10_000_000; print "$t\n";'<br />
3</code></p>
<p>(The <code>$t=0.0</code> and <code>$i=3.0</code> ensure perl is using floating point values in that example. I checked it with <a href="https://metacpan.org/module/Devel::Peek">Devel::Peek</a>.)</p>
<h2>Subroutine Call Events</h2>
<p>There&#8217;s one thing the old and <a href="https://blog.timbunce.org/2008/07/12/devel-dprof-broken-by-the-passage-of-time/">deeply flawed</a> Devel::DProf profiler can do that NYTProf hasn&#8217;t been able to: the DProf <a href="https://metacpan.org/module/dprofpp">dprofpp</a> utility can generate a <em>subroutine call tree</em>.</p>
<p>NYTProf hasn&#8217;t been able to do that because its subroutine profiler worked entirely in memory, accumulating aggregate data about each  <em>call arc</em>, but not outputting anything until the end of the profile. So all the calls on any given arc are merged together.</p>
<p>NYTProf v5 adds a new <code>calls</code> option that enables streaming of subroutine call events as they happen. With <code>calls=2</code> subroutine call and return events are generated. With <code>calls=1</code> (the default) only subroutine return events are generated. (A curious side effect of perl internals and the way NYTProf works means it can&#8217;t <em>reliably</em> know the name of the subroutine at call entry time. So the call entry event isn&#8217;t very useful at the moment.)</p>
<p>The call return events are sufficient to recreate a call tree, albeit with some expensive massaging of the data. NYTProf does this with the new <code>nytprofcalls</code> utility which reads and processes the stream of call return events. At the moment it&#8217;s undocumented, rather hackish, and only generates the call data in a collapsed form suitable for generating a flamegraph (more below). It could be extended to produce a call tree without too much work. Then, finally, the ghost of Devel::DProf can be laid to rest.</p>
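<p>For example, enabling call events via the <code>NYTPROF</code> environment variable and feeding the collapsed stacks to <code>flamegraph.pl</code> by hand might look something like this (the exact <code>nytprofcalls</code> invocation may differ &#8211; it&#8217;s undocumented, as noted):</p>
<pre><code>NYTPROF=calls=2 perl -d:NYTProf myscript.pl
nytprofcalls nytprof.out &gt; all_stacks_by_time.calls
flamegraph.pl all_stacks_by_time.calls &gt; calls.svg
</code></pre>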
<h2>Flame Graph</h2>
<p><a href="http://twitter.com/brendangregg">Brendan Gregg</a> developed the Flame Graph as a way to visualize very large volumes of stack traces sampled by <a href="http://dtrace.org/blogs/about/">DTrace</a>.</p>
<p>It&#8217;s a wonderfully compact and information-rich way to visualize where a program is spending its time. It&#8217;s also unusual and potentially confusing, so a little explanation is required. Keep in mind that it&#8217;s a visualization of <em>distinct call stacks</em> and that the colors are not meaningful.</p>
<p>The y-axis represents stack depth. Each box represents the time spent in a particular subroutine <em>when called by the subroutine below it</em>. So a particular subroutine will appear in multiple places if called via different call stacks.</p>
<p>The x-axis spans the time the profiler was running. It does not show the passing of time from left to right, as most graphs do. The left to right ordering has no meaning (it&rsquo;s sorted alphabetically).</p>
<p>The width of a box shows the inclusive time the subroutine was running, or was part of the ancestry of subroutines that were running (the boxes above it). A wider box may mean a slower subroutine, <em>or</em> simply one that&#8217;s called more often &#8211; the call count is not shown.</p>
<p>Brendan&#8217;s original flamegraph script generated an SVG that wasn&#8217;t well suited to embedding in an application like NYTProf. He&#8217;s kindly accepted a series of <a href="https://github.com/brendangregg/FlameGraph/pulls/timbunce?direction=desc&amp;page=1&amp;sort=created&amp;state=closed">pull requests</a> to add the key features I was looking for. The most important being the ability to make the boxes clickable: click on a box and you&#8217;ll be taken to the report for that subroutine!</p>
<p>Let&#8217;s take a closer look at a simple example using a recursive Fibonacci function:</p>
<blockquote>
<pre>sub fib {
    my $n = shift;
    return $n if $n &lt; 2;
    fib($n-1) + fib($n-2);
}
sub foo { fib(8) }
sub bar { fib(8) }
foo();
bar();</pre>
</blockquote>
<p>That gives us a Flame Graph like this:</p>
<p><a href="https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph-fib2.png"><img loading="lazy" data-attachment-id="565" data-permalink="https://blog.timbunce.org/2013/04/08/nytprof-v5-flaming-precision/nytprof-v5-flamegraph-fib/" data-orig-file="https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph-fib2.png" data-orig-size="2397,449" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;}" data-image-title="nytprof-v5-flamegraph-fib" data-image-description="" data-image-caption="" data-medium-file="https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph-fib2.png?w=300" data-large-file="https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph-fib2.png?w=676" src="https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph-fib2.png?w=676" alt="nytprof-v5-flamegraph-fib"   class="alignnone size-full wp-image-565" srcset="https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph-fib2.png?w=440&amp;h=82 440w, https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph-fib2.png?w=880&amp;h=165 880w, https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph-fib2.png?w=150&amp;h=28 150w, https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph-fib2.png?w=300&amp;h=56 300w, https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph-fib2.png?w=768&amp;h=144 768w" sizes="(max-width: 440px) 100vw, 440px" /></a></p>
<p>The line at the bottom that spans the full width represents the entire profile run. In this case it was 778&micro;s. (Hover over any block to see the time &#8211; you can see one in the image, along with the bold and bordered box it relates to).</p>
<p>The first line above that shows the calls to <code>foo</code> and <code>bar</code>. The line for those is shorter than the total line because the total includes the time perl spent compiling the script. It shows up clearly here because this script is so fast.</p>
<p>Then, above the blocks for both <code>foo</code> and <code>bar</code>, you can see the recursive calls to <code>fib</code> rising like flames (okay, with a little imagination). Two things to note here. Firstly <code>bar</code> is shown to the left of <code>foo</code> simply because the names at each level are in lexicographic order. There&#8217;s no deeper meaning in the ordering.</p>
<p>Secondly, you can easily see that <code>bar</code> was faster (narrower) than <code>foo</code>, even though they contain the same code. Why&#8217;s that? When <code>foo</code> ran first it would have paid the price for growing the stacks and warming the memory pages. Then when <code>bar</code> was called it gained from <code>foo</code>&#8216;s work.</p>
<h2>Flame Graph Generator</h2>
<p>Behind the scenes <code>nytprofhtml</code> runs <code>nytprofcalls</code> to generate a file in the report directory called <code>all_stacks_by_time.calls</code>. It then calls <code>flamegraph.pl</code> to read that file and generate the <code>all_stacks_by_time.svg</code> that&#8217;s shown in the report.</p>
<p>The <code>all_stacks_by_time.calls</code> has a very simple format. One line per distinct call stack, with subroutine names separated by semicolons, followed by a number (which is either in 1&micro;s or 100ns units depending on the platform). Here&#8217;s an example running the code above but calling <code>fib(2)</code> instead of <code>fib(8)</code> to keep it small:</p>
<pre><code>main::bar 37
main::bar;main::fib 45
main::bar;main::fib;main::fib 19
main::foo 416
main::foo;main::fib 222
main::foo;main::fib;main::fib 61
</code></pre>
<p>This simple format is perfect for grep&#8217;ing! You can effectively zoom-in on any subset of the call stacks by generating a flamegraph of just the stacks that contain the functions you&#8217;re interested in. For example, running this command on the profile of <a href="https://metacpan.org/module/Perl::Critic">perlcritic</a> shown at the top:</p>
<p><code>grep -w Perl::Critic::Policy::new nytprof/all_stacks_by_time.calls | flamegraph.pl &gt; tmp.svg &amp;&amp; open tmp.svg</code></p>
<p>gives you this Flame Graph:</p>
<p><a href="https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph-grep.png"><img loading="lazy" data-attachment-id="573" data-permalink="https://blog.timbunce.org/2013/04/08/nytprof-v5-flaming-precision/nytprof-v5-flamegraph-grep/" data-orig-file="https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph-grep.png" data-orig-size="2399,769" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;}" data-image-title="nytprof-v5-flamegraph-grep" data-image-description="" data-image-caption="" data-medium-file="https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph-grep.png?w=300" data-large-file="https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph-grep.png?w=676" src="https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph-grep.png?w=676" alt="nytprof-v5-flamegraph-grep"   class="alignnone size-full wp-image-573" srcset="https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph-grep.png?w=440&amp;h=141 440w, https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph-grep.png?w=880&amp;h=282 880w, https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph-grep.png?w=150&amp;h=48 150w, https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph-grep.png?w=300&amp;h=96 300w, https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph-grep.png?w=768&amp;h=246 768w" sizes="(max-width: 440px) 100vw, 440px" /></a></p>
<p>You can see that a lot of time is being spent gathering stack traces for exceptions (this is with perlcritic 1.118 on perl v5.14.2).</p>
<p>It would be nice to have a Flame Graph generated for each of the top-N files/modules, showing just the subset of call stacks that involve any of the subroutines defined in that file. I didn&#8217;t get around to that for v5.00. Feel free to <a href="https://github.com/timbunce/devel-nytprof">fork the code</a>, add that in, and send me a pull request!</p>
<h2>Minor Changes</h2>
<p>The very old and very limited <code>nytprofcsv</code> utility has been deprecated. Let me know if you use it, otherwise it won&#8217;t be around much longer.</p>
<p>The <code>blocks</code> option is no longer on by default &#8211; it seems that few people used the ability to view statement times rolled up at the block level. You can always enable it with <code>blocks=1</code> in the options.</p>
<h2>What Next?</h2>
<p>For NYTProf? I don&#8217;t know.</p>
<p>Next up on <em>my</em> to-do list is giving <a href="https://blog.timbunce.org/2012/10/05/introducing-develsizeme-visualizing-perl-memory-use/">Devel::SizeMe</a> the love it needs. There&#8217;s some deep work I&#8217;d really like to get done before <a href="http://www.yapcna.org">YAPC::NA</a> in June.</p>
<p>Maybe I&#8217;ll see you there.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.timbunce.org/2013/04/08/nytprof-v5-flaming-precision/feed/</wfw:commentRss>
			<slash:comments>7</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">557</post-id>
		<media:content url="https://0.gravatar.com/avatar/c1f8fff6645793f1615f748a0e33dfd3a4bf238f63095a180d01899515f628c7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">TimBunce</media:title>
		</media:content>

		<media:content url="https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph.png" medium="image">
			<media:title type="html">nytprof-v5-flamegraph.png</media:title>
		</media:content>

		<media:content url="https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph-fib2.png" medium="image">
			<media:title type="html">nytprof-v5-flamegraph-fib</media:title>
		</media:content>

		<media:content url="https://blog.timbunce.org/wp-content/uploads/2013/04/nytprof-v5-flamegraph-grep.png" medium="image">
			<media:title type="html">nytprof-v5-flamegraph-grep</media:title>
		</media:content>
	</item>
		<item>
		<title>Suggested Alternatives as a MetaCPAN feature</title>
		<link>https://blog.timbunce.org/2013/03/10/suggested-alternatives-as-a-metacpan-feature/</link>
					<comments>https://blog.timbunce.org/2013/03/10/suggested-alternatives-as-a-metacpan-feature/#comments</comments>
		
		<dc:creator><![CDATA[TimBunce]]></dc:creator>
		<pubDate>Sun, 10 Mar 2013 22:16:06 +0000</pubDate>
				<category><![CDATA[perl]]></category>
		<category><![CDATA[cpan]]></category>
		<category><![CDATA[metacpan]]></category>
		<guid isPermaLink="false">http://timbunce.wordpress.com/?p=549</guid>

					<description><![CDATA[I expressed this idea recently in a tweet and then started writing it up in more detail as a comment to Brendan Byrd&#8217;s The Four Major Problems with CPAN blog post. It grew in detail until I figured I should just write it up as a blog post of my own.(I fell out of the &#8230; <a href="https://blog.timbunce.org/2013/03/10/suggested-alternatives-as-a-metacpan-feature/" class="more-link">Continue reading <span class="screen-reader-text">Suggested Alternatives as a MetaCPAN&#160;feature</span></a>]]></description>
<content:encoded><![CDATA[<p>I expressed this idea recently in <a href="https://twitter.com/timbunce/status/310385898971869186">a tweet</a> and then started writing it up in more detail as a comment to Brendan Byrd&#8217;s <a href="http://blogs.perl.org/users/brendan_byrd/2013/03/the-four-major-problems-with-cpan.html">The Four Major Problems with CPAN</a> blog post. It grew in detail until I figured I should just write it up as a blog post of my own.<span id="more-549"></span></p>
<p>(I fell out of the way of blogging over the two years or so of focus and distraction that our major <a href="https://blog.timbunce.org/2011/06/29/building-a-different-kind-of-extension/">house extension</a> took to go from conception to reality. I&#8217;ve been meaning to start blogging again more regularly anyway. I&#8217;ve a few blog posts brewing in the back of my mind, so we&#8217;ll see how it goes.)</p>
<p>In Brendan&#8217;s <a href="http://blogs.perl.org/users/brendan_byrd/2013/03/the-four-major-problems-with-cpan.html">post</a> he describes four problems with CPAN:</p>
<ol>
<li>Too many modules are unmaintained; abandoned but not marked as such.</li>
<li>There is not enough data on what modules are mature; which ones are the &#8220;right ones&#8221; to use.</li>
<li>Many modules are only used for semi-private needs.</li>
<li>Modules cannot be renamed or deleted, even with a long-term deprecation process.</li>
</ol>
<p>I&#8217;d like to propose a feature that doesn&#8217;t seem to address these issues directly but would, I believe, greatly reduce the significance of all of them. </p>
<p>Olaf Alders responded to Brendan&#8217;s post with <a href="http://blogs.perl.org/users/olaf_alders/2013/03/sifting-through-the-cpan.html">Sifting Through the CPAN</a> and pointed out the need for better search tools and specifically suggests tagging. While tagging might be helpful in general I think we need a way to explicitly guide users from one module to another.</p>
<h2>Suggested Alternatives</h2>
<p>I&#8217;ve long thought that CPAN would benefit from a mechanism to track &#8220;suggested alternative modules&#8221;. (And/or perhaps &#8220;suggested alternative <em>distributions</em>&#8220;, but I&#8217;ll just talk about modules for now.)</p>
<p>I envisage a &#8220;Suggested Alternatives&#8221; section in the right sidebar on every module page. It would show the top-N suggestions, with a [++] icon beside each, ordered by the number of people who have made the suggestion or agreed to it by pressing the [++] icon. And naturally it would have a text field to enter an existing module name, with type-ahead suggestions. Finally, the Suggested Alternatives heading would be a link to a details page.</p>
<p>The details page would show, for that module, every instance of a suggestion being made or up-voted, with the user and the date. That would let people see who made the suggestion and when. Users would be able to remove their own suggestions.</p>
<p>For modules that are the suggested alternative for some other module, their page could show something like &#8220;Suggested as the alternative to X other modules by Y people&#8221; with a link to a page that would show the corresponding details.</p>
<p>With something like this in place &#8220;unmaintained, abandoned&#8221; modules would gather suggested alternatives. Mature &#8216;good&#8217; modules would tend to accumulate suggestions pointing towards them, while mature &#8216;poor&#8217; modules would tend to accumulate suggestions pointing away. Experiments and obscure &#8220;private needs&#8221; modules wouldn&#8217;t gather suggestions and that, combined with the higher ranking of modules with votes and inward pointing suggestions, means they&#8217;d languish in obscurity doing little harm.</p>
<h2>The Alternatives Graph</h2>
<p>This &#8220;alternatives&#8221; data creates a <em>graph of relationships</em> among similar modules in a powerful and directly useful way.</p>
<p>For search results it would be useful not only for ranking but also for widening the search. Modules that are the suggested alternatives for modules in the &#8216;natural&#8217; results could be included. That&#8217;s potentially a big win.</p>
<p>Of course it would be perfectly reasonable for a pair of modules to have suggestions pointing to each other. Or for there to be loops of suggestions. That&#8217;s fine and simply expresses the conflicting views of the users making the suggestions.</p>
<h2>Similar Modules (a digression)</h2>
<p>I also had the idea that there may be value in having a &#8216;similar modules&#8217; link that shows the list of modules produced by traversing the graph of suggestions for some number of hops in both directions, and ranked by some combination of votes and placement in the graph.</p>
<p>But then I wondered if that would be better implemented as an explicit way to suggest a &#8216;similar module&#8217;. In other words, generalize the idea of a &#8220;suggested alternative&#8221; into a &#8220;related module&#8221; relationship plus attributes like a &#8220;weight&#8221;, where a positive weight denotes a &#8220;suggested alternative&#8221; and a zero weight is simply a &#8220;similar module&#8221; or a &#8220;see also&#8221;. Perhaps there&#8217;s also value in having a &#8220;complementary module&#8221; relationship.</p>
<p>This is all a bit vague. It suggests to me that any code to support a &#8220;module relationship&#8221; mechanism should be kept generic to allow for other kinds of relationships in future.</p>
<h2>The Whys and Wherefores</h2>
<p>The primary data of the graph is a link from one module to another with a count of the number of people who agreed with that suggestion.</p>
<p>That surface data is built from a deeper layer that records, for each link, which users made the suggestion and when.</p>
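<p>To make this concrete, here&#8217;s a minimal sketch of how the two layers might be stored, using SQLite via DBI. The table and column names are hypothetical, and the <code>weight</code>, <code>reason</code> and <code>url</code> columns anticipate ideas discussed elsewhere in this post:</p>
<pre>use DBI;

my $dbh = DBI-&gt;connect("dbi:SQLite:dbname=alternatives.db", "", "",
    { RaiseError =&gt; 1 });

# Deeper layer: one row per user per suggestion (hypothetical schema)
$dbh-&gt;do(q{
    CREATE TABLE suggestion (
        from_module TEXT NOT NULL,
        to_module   TEXT NOT NULL,
        user        TEXT NOT NULL,
        made_at     TEXT NOT NULL,     -- ISO date
        weight      INTEGER DEFAULT 1, -- 1 = alternative, 0 = see-also
        reason      TEXT,              -- optional, kept very short
        url         TEXT,              -- optional supporting material
        PRIMARY KEY (from_module, to_module, user)
    )
});

# Surface layer: the per-link counts, derived as a view
$dbh-&gt;do(q{
    CREATE VIEW suggestion_count AS
    SELECT from_module, to_module, COUNT(*) AS votes
    FROM suggestion
    GROUP BY from_module, to_module
});</pre>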
<p>A helpful extra feature would be to let users optionally give a short reason for <em>why</em> <em>they</em> are suggesting <em>this</em> particular alternative. Perhaps because they feel it&#8217;s unmaintained, or lacks specific features that their suggested alternative has.</p>
<p>Suggestions without the whys would be very useful, and I&#8217;d suggest that that much is implemented first. But suggestions without explanations are also very limited. Knowing what motivated someone to suggest a particular alternative would be <em>very</em> helpful to others trying to pick a module for a task. For example, people might make multiple alternative suggestions recommending Bar instead of Foo if you want a certain feature, and Baz instead of Foo if you want another.</p>
<p>I don&#8217;t think there&#8217;s much risk of this becoming a comment battlefield because on any given page all the comments share the same direction &#8216;away&#8217; from the module. Someone with an opposing viewpoint would add a separate suggestion with their own comments on the &#8216;opposite&#8217; module.</p>
<p>I&#8217;d suggest the comment field be kept <em>very</em> short, say 50 characters, and provide a separate url field to encourage referencing supporting material such as a blog post or mailing list archive.</p>
<p>Other approaches might be to have a few checkboxes with typical reasons (very limited), or perhaps tags, or link in with <a href="http://cpanratings.perl.org">cpanratings</a> in some way (possibly complex).</p>
<h2>Alternative Distributions</h2>
<p>The best way to build and present Alternative Distributions data is probably to simply derive it from the Alternative Modules data.</p>
<p>It would be a read-only view that collapses the module-level graph data down to links between the corresponding distributions.</p>
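<p>Sticking with the hypothetical schema sketched earlier, the collapse could be a single extra view, given an (equally hypothetical) <code>module_dist</code> table mapping each module name to its distribution:</p>
<pre># Hypothetical: module_dist maps each module name to its distribution
$dbh-&gt;do(q{
    CREATE VIEW dist_suggestion_count AS
    SELECT f.dist AS from_dist, t.dist AS to_dist, COUNT(*) AS votes
    FROM suggestion s
    JOIN module_dist f ON f.module = s.from_module
    JOIN module_dist t ON t.module = s.to_module
    GROUP BY f.dist, t.dist
});</pre>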
<h2>Yanick Steps Up</h2>
<p>After writing a draft of this post I saw a <a href="https://twitter.com/yenzie/status/310839547250499586">tweet</a> from <a href="https://twitter.com/yenzie">Yanick</a> with a link to <a href="http://babyl.dyndns.org/techblog/entry/metacpan-recommendations">a specific proposal on his blog</a>. I skimmed it, realised it was similar to mine and replied saying I&#8217;d reference it here. I decided I&#8217;d finish my post before reading it properly.</p>
<p>So here are my thoughts on Yanick&#8217;s suggestions:</p>
<p>Distributions vs Modules: Modules are the fundamental unit of use and the natural focus of attention and reviews. It&#8217;s relatively easy to derive distribution suggestions from module suggestions, but not the other way around. Using modules as the focus also means the suggestions will still be valid if a module moves from one distribution to another.</p>
<p>Adding notes: I agree that comments are best avoided for the <em>initial</em> system. I also feel strongly that their value outweighs their risks if implemented and presented carefully, so they should at least be taken into account in the initial design work.</p>
<p>User interface for recommending an alternative: Having a button beside the existing high-profile vote button doesn&#8217;t feel right to me. The vote button is a positive action and encouraging low-friction drive-by voting makes sense. Suggesting an alternative is a more negative action, and one to be considered more carefully. Using the sidebar seems more appropriate.</p>
<p>User interface for viewing suggested alternatives: I&#8217;d rather not include any user names on the module page. It complicates the code and confuses the user experience (&#8220;which names are shown and why?&#8221; etc). The full details are available on the detail page if anyone wants to take the extra step to see them.</p>
<p>Volunteering to <em>do something</em>: Awesome!</p>
<p>Thanks Yanick.</p>
<p>Update: Implementation is being discussed on <a href="https://github.com/CPAN-API/cpan-api/issues/253">this cpan-api ticket</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.timbunce.org/2013/03/10/suggested-alternatives-as-a-metacpan-feature/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">549</post-id>
		<media:content url="https://0.gravatar.com/avatar/c1f8fff6645793f1615f748a0e33dfd3a4bf238f63095a180d01899515f628c7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">TimBunce</media:title>
		</media:content>
	</item>
		<item>
		<title>Introducing Devel::SizeMe &#8211; Visualizing Perl Memory Use</title>
		<link>https://blog.timbunce.org/2012/10/05/introducing-develsizeme-visualizing-perl-memory-use/</link>
					<comments>https://blog.timbunce.org/2012/10/05/introducing-develsizeme-visualizing-perl-memory-use/#comments</comments>
		
		<dc:creator><![CDATA[TimBunce]]></dc:creator>
		<pubDate>Fri, 05 Oct 2012 11:49:03 +0000</pubDate>
				<category><![CDATA[perl]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[sizeme]]></category>
		<guid isPermaLink="false">http://timbunce.wordpress.com/?p=540</guid>

					<description><![CDATA[For a long time I&#8217;ve wanted to create a module that would shed light on how perl uses memory. This year I decided to do something about it. My research and development didn&#8217;t yield much fruit in time for OSCON in July, where my talk ended up being about my research and plans. (I also &#8230; <a href="https://blog.timbunce.org/2012/10/05/introducing-develsizeme-visualizing-perl-memory-use/" class="more-link">Continue reading <span class="screen-reader-text">Introducing Devel::SizeMe &#8211; Visualizing Perl Memory&#160;Use</span></a>]]></description>
										<content:encoded><![CDATA[<p>For a long time I&#8217;ve wanted to create a module that would shed light on how perl uses memory. This year I decided to do something about it.</p>
<p>My research and development didn&#8217;t yield much fruit in time for OSCON in July, where <a href="http://www.slideshare.net/Tim.Bunce/perl-memory-use-201207">my talk</a> ended up being about my research and plans. (I also tried to explain that RSS isn&#8217;t a useful measurement for this, and that malloc buffering means even total process size isn&#8217;t a very useful measurement.) I was invited to speak at <a href="http://yapcasia.org/2012/">YAPC::Asia</a> in Tokyo in September and <em>really</em> wanted to have something worthwhile to demonstrate there.</p>
<p>I&#8217;m delighted to say that some frantic hacking (aka Conference Driven Development) yielded a working demo just in time and, after a little more polish, I&#8217;ve now uploaded <a href="http://search.cpan.org/perldoc?Devel%3A%3ASizeMe">Devel::SizeMe</a> to CPAN.</p>
<p>In this post I want to introduce you to Devel::SizeMe, show some screenshots, a <a href="http://blip.tv/timbunce/perl-memory-use-and-devel-sizeme-at-yapc-asia-2012-6381282">screencast of the talk and demo</a>, and outline current issues and plans for future development.<span id="more-540"></span></p>
<p>For a while I thought <a href="https://blog.timbunce.org/tag/nytprof/">Devel::NYTProf</a> might be a useful framework for building some kind of &#8220;memory profiler&#8221;. Something that would measure changes in memory use over time between lines and subroutines. Nicholas Clark even created a clever experimental hack to demo the concept. Sadly the data just didn&#8217;t seem to be very useful. It turns out that knowing where memory is <em>allocated</em> and <em>freed</em> isn&#8217;t nearly as important as knowing where memory is being <em>held</em>.</p>
<h2>The Plan</h2>
<p>It was clear that some kind of &#8216;snapshot&#8217; mechanism was needed. Something that would:</p>
<ol>
<li>crawl <em>all</em> the data structures within a perl interpreter</li>
<li>have some way of <em>naming</em> the path to each data structure</li>
<li>stream the data out for external storage and processing</li>
<li>be fast enough that snapshots could be taken frequently</li>
<li>visualize the vast amount of data</li>
<li>compare different snapshots</li>
</ol>
<p>Luckily the hardest part, step 1, was already covered by <a href="https://metacpan.org/module/Devel::Size">Devel::Size</a>. Originally written by Dan Sugalski in 2005, then maintained by Tels and BrowserUK, it had been picked up and polished by Nicholas Clark to stay in sync with the many internal optimizations he and others were adding to the perl core. It&#8217;s not without problems, and I&#8217;ll outline those below, but it was a great base for me.</p>
<p>I added a callback mechanism, so my code and others could &#8220;hitch a ride&#8221; on the back of Devel::Size as it crawled the data structures, and came up with a very lightweight way to track and output the &#8220;name path&#8221;.</p>
<h2>Textual Output</h2>
<p>My initial code just wrote a tree-like textual representation to prove the concept:</p>
<pre style="line-height:normal;">$ SIZEME='' perl -MDevel::SizeMe=total_size -e 'total_size([ 1, "hi", [] ])'
SV(PVAV) fill=2/2		[#1 @0] 
:   +24 sv_head =24
:   +40 sv_body =64
:   +24 av_max =88
:   ~note av_len 2
:   AVelem-&gt;		[#2 @1] 
:   :   SV(RV)		[#3 @2] 
:   :   :   +24 sv_head =112
:   :   :   RV-&gt;		[#4 @3] 
:   :   :   :   SV(PVAV) fill=-1/-1		[#5 @4] 
:   :   :   :   :   +24 sv_head =136
:   :   :   :   :   +40 sv_body =176
:   :   ~note i 2
:   AVelem-&gt;		[#6 @1] 
:   :   SV(PV)		[#7 @2] 
:   :   :   +24 sv_head =200
:   :   :   +16 sv_body =216
:   :   :   +16 SvLEN =232
:   :   ~note i 1
:   AVelem-&gt;		[#8 @1] 
:   :   SV(IV)		[#9 @2] 
:   :   :   +24 sv_head =256
:   :   ~note i 0
</pre>
<p>There you can see the array (PVAV) &#8216;node&#8217; with &#8216;leaf&#8217; sizes for the sv_head (24 bytes), sv_body (40 bytes), and the array of element pointers (av_max, 24 bytes). Below that you can see a &#8216;link&#8217; called AVelem pointing to a reference (RV) to an array with no elements. The &#8220;~note&#8221; lines are &#8216;attributes&#8217; that can be used to provide extra information about nodes. The &#8216;<code>=<em>NNN</em></code>&#8216; gives a running total of the accumulated size.</p>
<p>The terminology here (sv_head, sv_body, av_max etc.) might not be familiar to you unless you&#8217;ve spent time <a href="http://cpansearch.perl.org/src/RURBAN/illguts-0.42/index.html">delving into perl guts</a>. Hopefully, though, it&#8217;s clear that Devel::SizeMe gives access to <em>immense detail</em>.</p>
<h2>Graph Visualization</h2>
<p>That detail can quickly become overwhelming for non-trivial data structures. Some kind of visualization was needed. So I added a more compact &#8216;raw&#8217; output format and a script (sizeme_store.pl) to process it. The script &#8216;decorates&#8217; the nodes with the leaf and attribute data, gives the links better names, and adds extra details like the total size of the children.</p>
<pre>$ SIZEME='|sizeme_store.pl --dot=sizeme.dot' perl -MDevel::SizeMe=total_size -e 'total_size([ 1, "hi", [] ])'</pre>
<p>The SIZEME env var gives the name of the file to write the raw data to, or in this case the name of a program to pipe the data into. Here I&#8217;m asking sizeme_store.pl to write a <a href="http://www.graphviz.org/content/dot-language">dot format</a> file which, when rendered by <a href="http://www.graphviz.org">Graphviz</a>, produces a graph like this:</p>
<p><img style="display:block;margin-left:auto;margin-right:auto;" src="https://blog.timbunce.org/wp-content/uploads/2012/10/screen-shot-2012-10-04-at-22-55-181.png?w=400&#038;h=228" alt="Screen Shot 2012-10-04 at 22.55.18.png" border="0" width="400" height="228" /></p>
<p>You can see the links have been labeled with the index attribute, and the nodes show how the size is calculated (self+children=total) and the sizes accumulate up the graph.</p>
<p>That&#8217;s lovely, and works well for modestly sized data structures. It doesn&#8217;t scale well though. You quickly find yourself looking at diagrams like this:</p>
<p><img style="display:block;margin-left:auto;margin-right:auto;" src="https://blog.timbunce.org/wp-content/uploads/2012/10/screen-shot-2012-10-04-at-22-31-09.png?w=600&#038;h=193" border="0" width="600" height="193" /></p>
<h2>Treemap Visualization</h2>
<p>The graph visualization is rather more impressive than it is practical. A more useful visualization for this kind of data is an interactive <a href="http://en.wikipedia.org/wiki/Treemapping">treemap</a>, where the size of the boxes represents the memory use and you can drill down into the data structures. To do that, and have it work on massive data dumps, I needed some kind of database and tree map code that supported on-demand loading. I opted for <a href="http://sqlite.org">SQLite</a> as the data store, the <a href="http://thejit.org">JavaScript InfoVis Toolkit</a> for the tree map code, and <a href="https://metacpan.org/module/Mojolicious::Lite">Mojolicious::Lite</a> as the web app framework.</p>
<pre>$ SIZEME='|sizeme_store.pl --db=sizeme.db' perl -MDevel::SizeMe=total_size -e 'total_size([ 1, "hi", [] ])'</pre>
<p>That&#8217;s asking sizeme_store.pl to produce a sizeme.db file. Then, to visualize the data you can run sizeme_graph.pl to launch the web app:</p>
<pre>$ sizeme_graph.pl --db=sizeme.db daemon</pre>
<p>then visit <a href="http://127.0.0.1:3000/" rel="nofollow">http://127.0.0.1:3000/</a> to see the result:</p>
<p><img style="display:block;margin-left:auto;margin-right:auto;" src="https://blog.timbunce.org/wp-content/uploads/2012/10/screen-shot-2012-10-04-at-22-58-18.png?w=600&#038;h=364" alt="Screen Shot 2012 10 04 at 22 58 18" border="0" width="600" height="364" /></p>
<p>The overall grey area, which has a title bar labeled &#8220;SV(PVAV)&#8221;, represents the total memory used by the structure. The area is divided into three parts for the three elements of the array. The smallest, labeled &#8220;[0]-&gt; SV(IV)&#8221;, is the integer. The next larger one, labeled &#8220;[1]-&gt; SV(PV)&#8221;, is the string. The largest area is the array reference. Because the referenced array was empty the logic in sizeme_graph.pl has &#8216;collapsed&#8217; the array into the parent node to simplify the tree map. This is reflected in the label &#8220;[2]-&gt; SV(RV) RV-&gt; SV(AV)&#8221;.</p>
<p>The darker box is a tooltip that moves with the pointer and displays extra detail about whatever node the pointer hovers over. In this case it&#8217;s showing that the total memory use is 88 bytes (the head and the body size of the RV and the AV have been summed up). The rest of the content is mostly debugging information. There&#8217;ll be more useful info here in future.</p>
<h2>The Whole Picture</h2>
<p>The total_size($ref) function dumps the contents of a particular data structure. But it&#8217;s not enough to get the whole picture. For that I wanted to be able to dump <em>everything</em> in a perl interpreter. Executing total_size(\%main::) gets closer to everything, but it&#8217;s still a long way off.</p>
<p>So I added a <code>perl_size()</code> function. That starts by dumping the stashes (<code>\%main::</code>, or in internals speak PL_defstash) but then goes on to dump many more items you might never have realized existed. PL_stashcache, PL_regex_padav, PL_encoding, PL_modglobal, and PL_parser to name but a few. It then records the amount of unused space in perl&#8217;s arenas.</p>
<p>Finally, it scans the arenas looking for any values that haven&#8217;t been seen yet. Currently this finds quite a lot because the <code>perl_size()</code> code isn&#8217;t complete yet. (Many thanks to <a href="https://metacpan.org/author/FLORA">rafl</a> for helping improve the coverage here.) Once it&#8217;s complete, any unseen values found in the arenas will be leaks. So Devel::SizeMe may turn into a useful leak detection tool.</p>
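<p>By analogy with the <code>total_size()</code> examples above, a whole-interpreter dump would presumably be invoked like this (same SIZEME conventions assumed):</p>
<pre>$ SIZEME='|sizeme_store.pl --db=sizeme.db' perl -MDevel::SizeMe=perl_size -e 'perl_size()'</pre>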
<p>Taking this idea further, there&#8217;s also a <code>heap_size()</code> function. The goal here is to try to account for everything in the heap. (See <a href="http://www.slideshare.net/Tim.Bunce/perl-memory-use-yapcasia2012">my slides</a> if you&#8217;re not familiar with that term.) The one key item here is asking malloc for information about how much memory it&#8217;s using and, especially, how much &#8216;free&#8217; memory it&#8217;s holding on to, for mallocs that support that.</p>
<h2>See It In Action</h2>
<p>This explanation is rather dry. To get a real sense of what Devel::SizeMe can do you need to see it in action with some non-trivial data. Here&#8217;s a screencast of my Perl Memory Use talk at YAPC::Asia (also available as a raw mov <a href="http://blip.tv/file/get/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012311.mov">here</a> and <a href="http://a20.video4.blip.tv/3990001922812/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012311.mov">here</a>, mv4 <a href="http://blip.tv/file/get/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012447.m4v">here</a> and <a href="http://a1.video2.blip.tv/13610011292959/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012447.m4v">here</a>, and mp4 <a href="http://blip.tv/file/get/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012614.mp4">here</a> and <a href="http://a11.video2.blip.tv/9620011293108/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012614.mp4">here</a>). The demonstration starts at 13:00.</p>

<h2>Simple Usage</h2>
<p>Just four steps:</p>
<ol>
<li>cpanm Devel::SizeMe # install the module</li>
<li>perl -d:SizeMe &#8230;your.script.here&#8230;</li>
<li>sizeme_graph.pl daemon</li>
<li>open <a href="http://127.0.0.1:3000/" rel="nofollow">http://127.0.0.1:3000/</a></li>
</ol>
<p>Devel::SizeMe notices that it&#8217;s been run as <code>perl -d:SizeMe</code> and arranges to automatically call <code>perl_size()</code> in an <code>END</code> block. Simple.</p>
<h2>Current Issues</h2>
<p>There are two weaknesses with the current Devel::Size logic that affect Devel::SizeMe.</p>
<p>The first is that it uses a simple depth-first search. That&#8217;s fine when just calculating a total, but for Devel::SizeMe it means that chasing references held by one named item, like a subroutine, can lead to all sorts of other items, including entire stashes, appearing to be &#8220;within&#8221; the item that held the reference. The second is that Devel::Size doesn&#8217;t have a well-defined sense of when to stop chasing references because it doesn&#8217;t consider reference counts.</p>
<p>So I plan to add a multi-phase search mechanism. References with a count of 1 will be followed immediately. References with a count greater than one will be queued, along with a count of how many times the reference has been seen so far. In this way all the &#8216;named&#8217; data reachable from <code>%main::</code> will be found first and identified with their natural names before the queued items are crawled. This should greatly improve the output.</p>
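<p>In pseudo-Perl the planned traversal might look something like this. It&#8217;s a sketch of the idea only, not Devel::Size&#8217;s actual internals; all the helper functions here are hypothetical:</p>
<pre>my @queue;  # references seen more than once, deferred
my %seen;   # addresses already visited

sub crawl {
    my ($sv, $path) = @_;
    return if $seen{ address_of($sv) }++;       # hypothetical helper
    record_size($sv, $path);                    # hypothetical helper
    for my $ref (references_held_by($sv)) {     # hypothetical helper
        if (ref_count($ref) == 1) {             # hypothetical helper
            crawl($ref, "$path/" . name_of($ref));  # follow at once
        }
        else {
            push @queue, $ref;                  # defer shared data
        }
    }
}

crawl($main_stash, '%main::');      # phase 1: named data gets natural names
crawl($_, '(shared)') for @queue;   # phase 2: crawl the deferred items</pre>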
<p>More coverage is needed in perl_size() to reduce the number of &#8216;unseen&#8217; items that show up in the arenas, as seen in the screencast.</p>
<h2>Future Plans</h2>
<p>A priority is to get my changes to the core of Devel::Size integrated back in. It would be crazy to have two modules duplicating this sometimes complex and perl-version-specific logic. My goal is to have a single C file that&#8217;s used by both modules. Each would compile it with different macros to enable the required behavior. This should ensure that Devel::Size suffers no performance loss from the extra logic that Devel::SizeMe has added.</p>
<p>I&#8217;ve already started adding some support for &#8220;named&#8221; runs. The idea is to enable the size functions to be called multiple times within a single process, and to store the data in separate tables within the database. This is an important step towards being able to compare multiple runs to see how the memory use has changed.</p>
<p>Lots of refactoring is needed to turn my conference-driven-dash-for-the-finish-line hacking into more robust and reusable code. In particular I&#8217;d like to get a reasonably stable and useful database schema so other people can write modules to process the data generated by Devel::SizeMe.</p>
<p>Further in the future I can imagine having an option to record the existence of pointers to data that&#8217;s already been seen. That information is currently discarded but would add a great deal of detail to the output. Reference loops would be much easier to see for example. It would turn the output &#8216;tree&#8217; into a <a href="http://en.wikipedia.org/wiki/Directed_graph">directed graph</a> and enable much richer visualizations.</p>
<p>We&#8217;re just at the start.</p>
<p>Enjoy.</p>
<hr />
<p>This page has been translated into <a href="http://www.webhostinghub.com/support/es/misc/presentando-devel-sizeme" rel="nofollow">Spanish</a> language by Maria Ramos  from <a href="http://www.webhostinghub.com/support/edu" rel="nofollow">Webhostinghub.com/support/edu</a>. Thank you Maria.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.timbunce.org/2012/10/05/introducing-develsizeme-visualizing-perl-memory-use/feed/</wfw:commentRss>
			<slash:comments>4</slash:comments>
		
		<enclosure url="http://blip.tv/file/get/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012311.mov" length="0" type="video/quicktime" />
<enclosure url="http://blip.tv/file/get/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012311.mov" length="0" type="video/quicktime" />
<enclosure url="http://blip.tv/file/get/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012447.m4v" length="44043750" type="video/mp4" />
<enclosure url="http://blip.tv/file/get/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012447.m4v" length="44043750" type="video/mp4" />
<enclosure url="http://blip.tv/file/get/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012614.mp4" length="39601412" type="video/mp4" />
<enclosure url="http://blip.tv/file/get/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012614.mp4" length="39601412" type="video/mp4" />
<enclosure url="http://blip.tv/file/get/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012614.mp4" length="39601412" type="video/mp4" />
<enclosure url="http://blip.tv/file/get/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012614.mp4" length="39601412" type="video/mp4" />
<enclosure url="http://a20.video4.blip.tv/3990001922812/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012311.mov" length="124493455" type="video/quicktime" />
<enclosure url="http://a20.video4.blip.tv/3990001922812/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012311.mov" length="124493455" type="video/quicktime" />
<enclosure url="http://a20.video4.blip.tv/3990001922812/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012311.mov" length="124493455" type="video/quicktime" />
<enclosure url="http://a20.video4.blip.tv/3990001922812/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012311.mov" length="124493455" type="video/quicktime" />
<enclosure url="http://a20.video4.blip.tv/3990001922812/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012311.mov" length="124493455" type="video/quicktime" />
<enclosure url="http://a1.video2.blip.tv/13610011292959/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012447.m4v" length="44043750" type="video/mp4" />
<enclosure url="http://a1.video2.blip.tv/13610011292959/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012447.m4v" length="44043750" type="video/mp4" />
<enclosure url="http://a1.video2.blip.tv/13610011292959/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012447.m4v" length="44043750" type="video/mp4" />
<enclosure url="http://a1.video2.blip.tv/13610011292959/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012447.m4v" length="44043750" type="video/mp4" />
<enclosure url="http://a1.video2.blip.tv/13610011292959/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012447.m4v" length="44043750" type="video/mp4" />
<enclosure url="http://a1.video2.blip.tv/13610011292959/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012447.m4v" length="44043750" type="video/mp4" />
<enclosure url="http://a1.video2.blip.tv/13610011292959/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012447.m4v" length="44043750" type="video/mp4" />
<enclosure url="http://a1.video2.blip.tv/13610011292959/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012447.m4v" length="44043750" type="video/mp4" />
<enclosure url="http://a1.video2.blip.tv/13610011292959/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012447.m4v" length="44043750" type="video/mp4" />
<enclosure url="http://a1.video2.blip.tv/13610011292959/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012447.m4v" length="44043750" type="video/mp4" />
<enclosure url="http://a11.video2.blip.tv/9620011293108/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012614.mp4" length="39601412" type="video/mp4" />
<enclosure url="http://a11.video2.blip.tv/9620011293108/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012614.mp4" length="39601412" type="video/mp4" />
<enclosure url="http://a11.video2.blip.tv/9620011293108/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012614.mp4" length="39601412" type="video/mp4" />
<enclosure url="http://a11.video2.blip.tv/9620011293108/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012614.mp4" length="39601412" type="video/mp4" />
<enclosure url="http://a11.video2.blip.tv/9620011293108/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012614.mp4" length="39601412" type="video/mp4" />
<enclosure url="http://a11.video2.blip.tv/9620011293108/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012614.mp4" length="39601412" type="video/mp4" />
<enclosure url="http://a11.video2.blip.tv/9620011293108/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012614.mp4" length="39601412" type="video/mp4" />
<enclosure url="http://a11.video2.blip.tv/9620011293108/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012614.mp4" length="39601412" type="video/mp4" />
<enclosure url="http://a11.video2.blip.tv/9620011293108/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012614.mp4" length="39601412" type="video/mp4" />
<enclosure url="http://a11.video2.blip.tv/9620011293108/Timbunce-PerlMemoryUseAndDevelSizeMeAtYAPCAsia2012614.mp4" length="39601412" type="video/mp4" />

		<post-id xmlns="com-wordpress:feed-additions:1">540</post-id>
		<media:content url="https://0.gravatar.com/avatar/c1f8fff6645793f1615f748a0e33dfd3a4bf238f63095a180d01899515f628c7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">TimBunce</media:title>
		</media:content>

		<media:content url="https://blog.timbunce.org/wp-content/uploads/2012/10/screen-shot-2012-10-04-at-22-55-181.png" medium="image">
			<media:title type="html">Screen Shot 2012-10-04 at 22.55.18.png</media:title>
		</media:content>

		<media:content url="https://blog.timbunce.org/wp-content/uploads/2012/10/screen-shot-2012-10-04-at-22-31-09.png" medium="image" />

		<media:content url="https://blog.timbunce.org/wp-content/uploads/2012/10/screen-shot-2012-10-04-at-22-58-18.png" medium="image">
			<media:title type="html">Screen Shot 2012 10 04 at 22 58 18</media:title>
		</media:content>
	</item>
		<item>
		<title>A Space For Thought</title>
		<link>https://blog.timbunce.org/2012/04/08/a-space-for-thought/</link>
		
		<dc:creator><![CDATA[TimBunce]]></dc:creator>
		<pubDate>Sun, 08 Apr 2012 18:15:33 +0000</pubDate>
				<category><![CDATA[health]]></category>
		<category><![CDATA[life]]></category>
		<category><![CDATA[oscon]]></category>
		<category><![CDATA[toastmasters]]></category>
		<guid isPermaLink="false">http://timbunce.wordpress.com/?p=511</guid>

					<description><![CDATA[This is the text of a speech I originally wrote for the International Speech Competition at my Toastmasters club in April 2012. (I won the club competition and came second in the area competition a week or so later.) In July I gave a slightly modified version, reproduced here, as a 5 minute Lightning Talk &#8230; <a href="https://blog.timbunce.org/2012/04/08/a-space-for-thought/" class="more-link">Continue reading <span class="screen-reader-text">A Space For&#160;Thought</span></a>]]></description>
										<content:encoded><![CDATA[<p>This is the text of a speech I originally wrote for the International Speech Competition at my <a href="http://www.toastmasters.org/" target="_blank">Toastmasters</a> club in April 2012. (I won the club competition and came second in the area competition a week or so later.)</p>
<p>In July I gave a slightly modified version, reproduced here, as a 5 minute Lightning Talk at OSCON in Portland OR.<span id="more-511"></span>I wrote early drafts in the first person, which I prefer to do for material rooted in personal experience, then changed it to be mostly second person as that seemed to be more effective in this case. In written form you&#8217;ll miss the gestures and delivery but hopefully the text is clear enough.</p>
<p>It&#8217;s written to be spoken quite slowly, with pauses, so please read it that way when you&#8217;ve some time to spare.</p>
<h2>A SPACE FOR THOUGHT</h2>
<p>What is the difference between thought, and the quiet awareness in the space between thoughts?</p>
<p>~pause~</p>
<p>I want to share with you the single most important thing I&#8217;ve learned in my life.</p>
<p>It&#8217;s a shift in how I relate to myself and the world around me.<br />
A change in perspective that has revealed answers to many mysteries;<br />
so much more of the world makes sense to me now.</p>
<p>I really want to share this with you, but I have a problem.</p>
<p>The key idea is so simple that, if you&#8217;re not familiar with it, you probably won&#8217;t believe me.</p>
<p>Or if you are, you may dismiss it as obvious and of no value. Missing the depth and implications of it.</p>
<p>To persuade you I could quote countless great examples from literature, science, art, and everyday life.<br />
Showing you how they fit together and make sense when viewed in this light.</p>
<p>But I don&#8217;t have time.</p>
<p>I only have time to give you a starting point, to plant a seed,<br />
and some suggestions for how to nurture it, in the hope that it can grow and blossom for you too.</p>
<p>Before I share this simple insight with you, before I plant this seed,<br />
I need <em>your</em> help to prepare the ground.<br />
I need <em>you</em> to <em>experience</em> something for yourself.</p>
<p>So please join me in a simple exercise in awareness. In paying attention.</p>
<p>Start paying attention now, to the feeling of your left foot.<br />
Just experience your left foot for a while, <em>without thinking about it</em>.</p>
<p>~pause~</p>
<p>What you&#8217;re paying attention <em>to</em> is your foot.<br />
What you&#8217;re paying attention <em>with</em> is in your head.</p>
<p>~pause~</p>
<p>We&#8217;ll do that again now but this time I&#8217;ll say something to prompt some thought.<br />
I want you to notice what happens to your attention when you start thinking.</p>
<p>Return your attention to your foot now.</p>
<p>~pause~</p>
<p>Nine plus seven.</p>
<p>~pause~</p>
<p>Did you notice your attention move away from your foot when you started thinking?<br />
The focus of your awareness moved from your foot into your mind.</p>
<p>Your full attention can&#8217;t be on a thought and something else at the same time.<br />
You need to be <em>aware</em> of the thought, just as you need to be aware of the feeling.</p>
<p>Awareness is primary. Thinking and feeling are secondary.</p>
<p>~pause~</p>
<p>So here&#8217;s the seed I want to plant:</p>
<p><em>You</em> are not your thoughts, just as you are not your feelings.<br />
You, the essence of who you really are, <em>is</em> the awareness.<br />
The conscious awareness within which your thoughts and feelings arise.</p>
<p>~pause~</p>
<p>That&#8217;s it.</p>
<p>It&#8217;s so simple, and yet so delicate.</p>
<p>Easily crushed by the weight of your own thoughts, that are constantly seeking to define you.</p>
<p>~pause~</p>
<p>Having planted the seed, I want to give you three tips for nurturing it, that I have found very helpful.</p>
<p>1st &#8211; Give your thoughts and opinions some space.</p>
<p>View them from a little distance.<br />
Note their contents but <em>don&#8217;t judge them</em>.<br />
Judging involves the thinking mind and you won&#8217;t break free.<br />
Simply note their contents and let them go.</p>
<p>Treat your thoughts as suggestions from a <em>much loved friend</em>.<br />
But a friend who you know is vain, insecure, and untrustworthy.</p>
<p>Noticing how this friend reacts to situations in your life<br />
is a <em>fascinating and rewarding</em> pastime.</p>
<p>You don&#8217;t need to watch a soap opera on TV<br />
when you can watch the one going on in your thinking mind!</p>
<p>2nd &#8211; Practice taking your attention away from thoughts<br />
whenever they&#8217;re unhappy, unproductive or unhelpful.<br />
Which, let&#8217;s face it, can be much of the time.</p>
<p>Simply bring your attention to your breathing, your foot,<br />
or anything else in the present moment.</p>
<p>3rd &#8211; Slow the momentum of the mind by bringing moments of stillness into your life regularly.</p>
<p>The phone rings &mdash; take a conscious breath with an empty mind before answering.<br />
Get in the car &mdash; take a few breaths before starting the engine.<br />
Look at nature, birds, trees, flowers, people, without labeling, judging, or other mental activity.</p>
<p>~pause~</p>
<p>The more often <em>I</em> remember to do these simple things,<br />
the more my sense of self shifts,<br />
from the noise and turmoil of the thinking mind,<br />
to being rooted in the peace beyond it.</p>
<p>So what is the difference between thought, and the quiet awareness in the space between thoughts?<br />
That&#8217;s for you to discover in your own way, if you want to,<br />
<em>but you won&#8217;t find out by thinking about it</em>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">511</post-id>
		<media:content url="https://0.gravatar.com/avatar/c1f8fff6645793f1615f748a0e33dfd3a4bf238f63095a180d01899515f628c7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">TimBunce</media:title>
		</media:content>
	</item>
		<item>
		<title>What&#8217;s actually installed in that perl library?</title>
		<link>https://blog.timbunce.org/2011/11/16/whats-actually-installed-in-that-perl-library/</link>
					<comments>https://blog.timbunce.org/2011/11/16/whats-actually-installed-in-that-perl-library/#comments</comments>
		
		<dc:creator><![CDATA[TimBunce]]></dc:creator>
		<pubDate>Wed, 16 Nov 2011 21:52:01 +0000</pubDate>
				<category><![CDATA[perl]]></category>
		<category><![CDATA[conference]]></category>
		<category><![CDATA[cpan]]></category>
		<category><![CDATA[metacpan]]></category>
		<category><![CDATA[presentation]]></category>
		<guid isPermaLink="false">http://timbunce.wordpress.com/?p=502</guid>

					<description><![CDATA[A key part of my plan for Upgrading from Perl 5.8 is the ability to take a perl library installed for one version of perl, and reinstall it for a different version of perl. To do that you have to know exactly what distributions were installed in the original library. And not just which distributions, &#8230; <a href="https://blog.timbunce.org/2011/11/16/whats-actually-installed-in-that-perl-library/" class="more-link">Continue reading <span class="screen-reader-text">What&#8217;s actually installed in that perl&#160;library?</span></a>]]></description>
										<content:encoded><![CDATA[<p>A key part of my plan for <a href="https://blog.timbunce.org/2011/07/21/upgrading-from-perl-5-8/">Upgrading from Perl 5.8</a> is the ability to take a perl library installed for one version of perl, and reinstall it for a different version of perl.</p>
<p>To do that you have to know exactly what distributions were installed in the original library.  And not just which distributions, but which versions of those distributions.</p>
<p>I&#8217;ve a solution for that now. It turned out to be rather harder to solve than I&#8217;d thought&#8230; <span id="more-502"></span>As I mentioned <a href="https://blog.timbunce.org/2011/07/21/upgrading-from-perl-5-8/">previously</a>, I had developed a &#8220;distinctly hackish solution&#8221; that seemed to be working well. Sadly it didn&#8217;t withstand battle testing.</p>
<p>We have a library with almost 5000 modules installed from CPAN over many years. I ran that hackish script and it duly listed the distributions it thought were installed. Using that list I reinstalled them into a new library and ran <code>diff -r</code> to compare the two. That found a bunch of differences that led me into a vortex of hacking and rerunning. Eventually I had to admit that the whole approach wasn&#8217;t robust enough and I started to explore other ideas.</p>
<p>Some searching turned up <a href="http://search.cpan.org/perldoc?BackPAN::Version::Discover">BackPAN::Version::Discover</a> which is meant to &#8220;Figure out exactly which dist versions you have installed&#8221;. Perfect. Sadly it simply didn&#8217;t work well for me. Probably because it&#8217;s using a similarly flawed approach to my own.</p>
<p>I knew brian d foy&#8217;s <a href="http://blogs.perl.org/users/brian_d_foy/2011/03/recreating-a-perl-installation-with-mycpan.html">MyCPAN</a> project was working towards a similar goal. His approach required us either to run a large BackPAN indexing process ourselves or to pay to license the data to offset his costs for doing so. Neither seemed attractive.</p>
<p>I wondered about using <a href="https://github.com/gitpan">GitPAN</a> and the github API to match git blob hashes of local modules with files in the gitpan repos. Sadly GitPAN has fallen out of date and isn&#8217;t being maintained at the moment. With hindsight I&#8217;m thankful for that because it led me to a better solution.</p>
<h2>MetaCPAN</h2>
<p><a href="http://metacpan.org/about">MetaCPAN</a> is full of awesome. On the surface it looks like another kind of search.cpan.org site. Don&#8217;t be fooled. Underneath is a vast repository of CPAN metadata powered by an <a href="http://www.elasticsearch.org/">ElasticSearch</a> distributed database (based on Lucene). How vast? Every file in every distribution on CPAN (<em>and</em>, critically for me, the BackPAN archive) has been indexed in great detail. Including details like the file size and which spans of lines are code and which are pod.</p>
<p>The cherry on the cake is the <a href="https://github.com/CPAN-API/cpan-api/wiki/Beta-API-docs">RESTful API</a> that provides full access to <a href="http://www.elasticsearch.org/guide/reference/query-dsl/">ElasticSearch query expressions</a>.</p>
<p>The key &#8220;lightbulb over head&#8221; moment came when I realized I could ask MetaCPAN to &#8220;<a href="http://explorer.metacpan.org/?url=%2Ffile&amp;content=%7B%22query%22%3A%7B%22filtered%22%3A%7B%22query%22%3A%7B%22match_all%22%3A%7B%7D%7D%2C%22filter%22%3A%7B%22and%22%3A%5B%7B%22term%22%3A%7B%22file.module.name%22%3A%22DBI%3A%3AProfile%22%7D%7D%2C%7B%22term%22%3A%7B%22file.module.version%22%3A%222.014123%22%7D%7D%5D%7D%7D%7D%2C%22fields%22%3A%5B%22release%22%5D%7D">find all releases that contain a particular version of a module</a>&#8220;. Bingo!</p>
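<p>Reproducing that query programmatically is straightforward. Here&#8217;s a minimal sketch using HTTP::Tiny and JSON::PP, posting the same kind of ElasticSearch query to the MetaCPAN <code>/file</code> endpoint as in the linked example (the endpoint path and response shape are assumed to match that example):</p>
<pre>use strict;
use warnings;
use HTTP::Tiny;
use JSON::PP qw(encode_json decode_json);

my ($module, $version) = ('DBI::Profile', '2.014123');

my $query = {
    query =&gt; { filtered =&gt; {
        query  =&gt; { match_all =&gt; {} },
        filter =&gt; { and =&gt; [
            { term =&gt; { 'file.module.name'    =&gt; $module  } },
            { term =&gt; { 'file.module.version' =&gt; $version } },
        ]},
    }},
    fields =&gt; [ 'release' ],
};

my $res = HTTP::Tiny-&gt;new-&gt;post(
    'http://api.metacpan.org/v0/file',  # endpoint as in the linked example
    { content =&gt; encode_json($query) },
);
die "$res-&gt;{status} $res-&gt;{reason}\n" unless $res-&gt;{success};

# Each hit names a release containing that exact module version
print "$_-&gt;{fields}{release}\n"
    for @{ decode_json($res-&gt;{content})-&gt;{hits}{hits} };</pre>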
<h2>The Method</h2>
<p>The next step was how to work out which of those candidates was the one actually installed. The key realization here was that I could use MetaCPAN to get version and file size info for all the modules in each candidate release and see how well they matched what was currently installed.</p>
<p>The whole process falls into several distinct phases&#8230;</p>
<p>The first phase finds the name, version, and file size of all the modules in the library being surveyed. (Taking care to handle an archlib nested within the main lib.)</p>
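<p>That first phase might look something like this sketch, using File::Find and <a href="https://metacpan.org/module/Module::Metadata">Module::Metadata</a> (how Dist::Surveyor actually implements it may well differ):</p>
<pre>use strict;
use warnings;
use File::Find;
use Module::Metadata;

my $lib = shift @ARGV or die "usage: $0 libdir\n";
my %installed;  # module name =&gt; { version, size }

find(sub {
    return unless /\.pm\z/;
    my $info = Module::Metadata-&gt;new_from_file($File::Find::name)
        or return;
    my $name = $info-&gt;name or return;
    $installed{$name} = {
        version =&gt; $info-&gt;version,
        size    =&gt; -s $File::Find::name,
    };
}, $lib);

printf "%s %s (%d bytes)\n",
    $_, $installed{$_}{version} || 'undef', $installed{$_}{size}
    for sort keys %installed;</pre>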
<p>Then, for every module it asks MetaCPAN for all the distribution releases that included that module version. For rarely changed modules in frequently released distributions there might be many candidates, so it tries to limit the number of candidates by also matching the file size. This is especially helpful for modules that don&#8217;t have a version number.</p>
<p>Then, for every candidate distribution release, MetaCPAN is queried to get the modules in the release, along with their version numbers and file sizes. These are compared to the data it gathered about the locally installed modules to yield a &#8220;fraction installed&#8221; figure between 0 and 1. The candidates that share the highest fraction installed are returned.</p>
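<p>The comparison itself is then simple. A sketch, assuming the <code>%installed</code> hash from the survey phase and a list of <code>{ name, version, size }</code> hashrefs for the release&#8217;s modules from MetaCPAN (both shapes are hypothetical):</p>
<pre># Returns 0..1: how much of this candidate release matches what's installed
sub fraction_installed {
    my ($installed, @release_modules) = @_;
    my $matched = 0;
    for my $m (@release_modules) {
        my $local = $installed-&gt;{ $m-&gt;{name} } or next;
        my $same_version = ($local-&gt;{version} // '') eq ($m-&gt;{version} // '');
        my $same_size    = $local-&gt;{size} == $m-&gt;{size};
        ++$matched if $same_version and $same_size;
    }
    return @release_modules ? $matched / @release_modules : 0;
}</pre>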
<p>Typically there&#8217;s just one candidate that has fraction installed of 1. A perfect match. Sometimes the fraction is less than 1 for various obscure but valid reasons. Sometimes life isn&#8217;t so simple. There may be multiple candidates that have the same highest fraction installed value. So the next phase attempts to narrow the choice from among the &#8220;best candidates&#8221; for each module. The results are gathered into a two level hash of distributions and candidate releases.</p>
<p>The final phase is the first to work in terms of distributions instead of modules. For each distribution it tries to choose among the candidate releases.</p>
<h2>The Results</h2>
<p>The method seems to work well. It identifies files with local changes. It deals gracefully with &#8216;remnant&#8217; modules that were included in an old release but not in later ones. And it copes with distributions that have been split into separate distributions.</p>
<p>It reports progress and anything unusual to stderr and writes the list of distributions to stdout. You should investigate anything that&#8217;s reported to ensure that the chosen distribution is the right one.</p>
<p>I checked the results by creating a new library (see below) and running <code>diff -r <em>old_lib new_lib</em></code>. I didn&#8217;t see any differences that I couldn&#8217;t account for.</p>
<p>The survey process is not fast. It can take a couple of hours on the first run for a large library. Most of that time is spent making MetaCPAN calls (<em>lots and lots</em> of MetaCPAN calls) so you&#8217;re dependent on network and MetaCPAN performance. Most of the calls are cached in an external file so later runs are much faster.</p>
<h2>Using The Results</h2>
<p>Using a list of distributions to recreate a library isn&#8217;t as straightforward as it might seem. You can&#8217;t just give the list to <a href="http://search.cpan.org/perldoc?cpanm">cpanm</a> because it would try to install the <em>latest</em> version of any prerequisites. I looked at using <code>--scandeps</code> or topological sorting to reorder the list to put the prerequisites first. It didn&#8217;t work out. I also looked at using <a href="http://search.cpan.org/perldoc?mcpani">CPAN::Mini::Inject</a> (and <a href="http://search.cpan.org/dist/OrePAN/" target="_blank">OrePAN</a> and <a href="http://search.cpan.org/perldoc?Pinto" target="_blank">Pinto</a>) to create a local MiniCPAN for cpanm to fetch from. They didn&#8217;t work out either, for various reasons.</p>
<p>In the end I added a <code>--makecpan <em>dir</em></code> option so that the surveyor script itself would fetch the distributions and create a MiniCPAN for cpanm to use.</p>
<p>So now a typical initial run looks like this:</p>
<p><code>    dist_surveyor --makecpan my_cpan /some/perl/lib/dir &gt; installed_dists.txt</code></p>
<p>followed by building a new library from the results:</p>
<p><code>    cpanm --mirror file:$PWD/my_cpan --mirror-only -l new_lib &lt; installed_dists.txt</code></p>
<p>If you need to rebuild the library, perhaps due to test failures, then it&#8217;s <em>much</em> faster to use a list of modules to drive cpanm. Fortunately dist_surveyor writes one for you:</p>
<p><code>    cpanm --mirror file:$PWD/my_cpan --mirror-only -l new_lib &lt; my_cpan/dist_surveyor/token_packages.txt</code></p>
<h2>Testing Bonus</h2>
<p>Speaking of test failures, I was surprised to see how often tests failed due to problems with prerequisites even though the distribution and its prerequisites had passed their tests when originally installed. For example, imagine distribution A v1, and its prerequisite B v1 are installed. Later, distribution B gets upgraded to v2 but the tests for distribution A don&#8217;t get rerun.</p>
<p>Reinstalling all the distributions forces all distributions to be tested with the prerequisites that are actually being used.</p>
<h2>Presentation Slides</h2>
<p>I gave a lightning talk on Dist::Surveyor at the <a href="http://conferences.yapceurope.org/lpw2011/">2011 London Perl Workshop</a> (always a great event) and uploaded the <a href="http://www.slideshare.net/Tim.Bunce/perl-distsurveyor-2011">slides</a>.</p>
<h2>Source Code</h2>
<p>The <a href="https://github.com/timbunce/Dist-Surveyor">repository</a> is on github and I&#8217;ve made a <a href="http://search.cpan.org/dist/Dist-Surveyor/">release</a> to CPAN.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.timbunce.org/2011/11/16/whats-actually-installed-in-that-perl-library/feed/</wfw:commentRss>
			<slash:comments>7</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">502</post-id>
		<media:content url="https://0.gravatar.com/avatar/c1f8fff6645793f1615f748a0e33dfd3a4bf238f63095a180d01899515f628c7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">TimBunce</media:title>
		</media:content>
	</item>
		<item>
		<title>Upgrading from Perl 5.8</title>
		<link>https://blog.timbunce.org/2011/07/21/upgrading-from-perl-5-8/</link>
					<comments>https://blog.timbunce.org/2011/07/21/upgrading-from-perl-5-8/#comments</comments>
		
		<dc:creator><![CDATA[TimBunce]]></dc:creator>
		<pubDate>Thu, 21 Jul 2011 14:50:26 +0000</pubDate>
				<category><![CDATA[perl]]></category>
		<category><![CDATA[cpan]]></category>
		<category><![CDATA[testing]]></category>
		<guid isPermaLink="false">http://timbunce.wordpress.com/?p=499</guid>

					<description><![CDATA[Imagine&#8230; You have a production system, with many different kinds of application services running on many servers, all using the perl 5.8.8 supplied by the system. You want to upgrade to use perl 5.14.1 You don&#8217;t want to change the system perl. You&#8217;re using CPAN modules that are slightly out of date but you can&#8217;t &#8230; <a href="https://blog.timbunce.org/2011/07/21/upgrading-from-perl-5-8/" class="more-link">Continue reading <span class="screen-reader-text">Upgrading from Perl&#160;5.8</span></a>]]></description>
										<content:encoded><![CDATA[<p>Imagine&#8230;</p>
<ol>
<li>You have a production system, with many different kinds of application services running on many servers, all using the perl 5.8.8 supplied by the system.
</li>
<li>You want to upgrade to use perl 5.14.1
</li>
<li>You don&#8217;t want to change the system perl.
</li>
<li>You&#8217;re using CPAN modules that are slightly out of date but you can&#8217;t upgrade them because newer versions have dependencies that require perl 5.10.
</li>
<li>The perl application codebase is large and has poor test coverage.
</li>
<li>You want developers to be able to easily test their code with different versions of perl.
</li>
<li>You don&#8217;t want a risky all-at-once &#8220;<a href="http://en.wikipedia.org/wiki/Big_bang_adoption">big bang</a>&#8221; upgrade. Individual production installations should be able to use different perl versions, even if only for a few days, and to switch back and forth easily.
</li>
<li>You want to simplify future perl upgrades.
</li>
</ol>
<p>I imagine there are lots of people in similar situations.</p>
<p>In this post I want to explore how I&#8217;m tackling a similar problem, both for my own benefit and in the hope it&#8217;ll be useful to others.<span id="more-499"></span></p>
<h2>Incremental Upgrades</h2>
<p>Perl now has an explicit <a href="http://metacpan.org/module/perlpolicy#BACKWARD-COMPATIBILITY-AND-DEPRECATION">deprecation policy</a> that requires a mandatory warning for at least one major perl version before a feature is removed. So a feature that&#8217;s removed in perl 5.14 will generate a mandatory warning, at compile time if possible, in perl 5.12.</p>
<p>This means we should <em>not</em> jump straight from perl 5.8.8 to 5.14.1. It&#8217;s important to test our code with the latest 5.10.x and 5.12.x releases along the way. That way if we do hit a problem it&#8217;ll be easier to determine the cause.</p>
<p>This also fits in with our desire to simplify future upgrades. Effectively we&#8217;re not doing one perl version upgrade but three, although we may only do one or two actual upgrades on production machines.</p>
<h2>Multiple Perls</h2>
<p>We want the developers to be able to easily test their code with different versions of perl, so we need to allow multiple versions to be installed at the same time. Fortunately <a href="http://metacpan.org/module/perlbrew">perlbrew</a> makes that easy.</p>
<p>We&#8217;ll probably have the systems team install ready-built and read-only perlbrew perls on all the machines via scp. We&#8217;ll use perlbrew as a way to get a set of perls installed, but we&#8217;ll handle the actual selection of a perl, via PATH etc., ourselves.</p>
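<p>For a local developer setup the perlbrew commands are simple (the version numbers are just examples):</p>
<pre>$ perlbrew install perl-5.10.1
$ perlbrew install perl-5.12.4
$ perlbrew install perl-5.14.1
$ perlbrew use perl-5.14.1   # select a perl for the current shell</pre>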
<h2>Multiple CPAN Install Trees</h2>
<p>Major versions of perl aren&#8217;t binary compatible with each other. This means extension modules, like DBI, which were installed for one major version of perl can&#8217;t be reused with another.</p>
<p>We keep all the code installed from CPAN in a repository, separate from the perl installation. Perl finds them using the PERL5LIB env var, and installers install there using the PERL_MB_OPT and PERL_MM_OPT env vars to set it as the &#8216;install_base&#8217;.</p>
<p>Since we want developers to switch easily between perl versions, this means we need multiple CPAN installation directories, one per <em>major</em> perl version. We&#8217;ll rebuild and reinstall the extension modules into each immediately after building and installing the corresponding perl version.</p>
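<p>For example, selecting the 5.14 tree might look like this (the directory layout here is hypothetical, but the env var semantics are standard ExtUtils::MakeMaker and Module::Build behavior):</p>
<pre>$ export PERL5LIB=/opt/cpan-5.14/lib/perl5
$ export PERL_MM_OPT="INSTALL_BASE=/opt/cpan-5.14"
$ export PERL_MB_OPT="--install_base /opt/cpan-5.14"</pre>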
<p>If we have to rebuild and reinstall the extension modules then we can easily rebuild and reinstall <em>all</em> our CPAN modules. That way we get to rerun all their test suites against each version of perl plus the specific versions of their prerequisite modules that we&#8217;re using.</p>
<h2>Reinstalling CPAN Distributions</h2>
<p>This is where it gets tricky. </p>
<p>Identifying what CPAN distributions we have installed is fairly easy. You can use tools like CPAN.pm or <a href="https://github.com/bingos/throwaway/blob/master/whatdists.pl">whatdists.pl</a> to generate a list. But there&#8217;s a catch. They&#8217;ll only tell you what <em>current distributions</em> you need to install to get the same set of modules. That&#8217;s not what we need.</p>
<p>We need a list of the <em>specific distribution versions</em> that are currently installed. It turns out that that information isn&#8217;t recorded in the installation and it&#8217;s amazingly difficult to recreate <em>reliably</em>. (The perllocal.pod file ought to have this information but isn&#8217;t updated by the Module::Build installer and doesn&#8217;t record the actual distribution name.)</p>
<p>In an extension of his MyCPAN work, brian d foy is trying to tackle this problem by creating MD5 hashes for the <em>millions</em> of files on BackPAN (the CPAN archive) but there&#8217;s <a href="http://blogs.perl.org/users/brian_d_foy/2011/03/recreating-a-perl-installation-with-mycpan.html">still much hard work ahead</a>.</p>
<p>Why do we need the specific versions, why not simply upgrade everything to the latest version first as a separate project? Two reasons.</p>
<p>First, we&#8217;re caught by the fact that some latest distributions, either directly or indirectly, require a later version of perl. (David Cantrell&#8217;s <a href="http://cpxxxan.barnyard.co.uk/">cpxxxan</a> project offers an interesting approach to this problem. E.g., use <a href="http://cp5.8.8an.barnyard.co.uk/" rel="nofollow">http://cp5.8.8an.barnyard.co.uk/</a> as the CPAN mirror to get a &#8220;latest that works on 5.8.8&#8221; view. [Thanks to ribasushi++ for the reminder.])</p>
<p>Second, having a complete list of <em>exactly</em> what we have installed also gives us easy reproducibility. Future installs will always yield exactly the same set of files, without risk of silent changes due to new releases on CPAN. The cpxxxan indices for older perls are much less likely to change, but still may. Also, if we upgraded everything to the latest using cp5.8.8an we&#8217;d need an extra testing cycle to check for problems with that upgrade before we even start on the perl upgrade.</p>
<p>After contemplating the large, ambitious, and incomplete MyCPAN project, I decided I&#8217;d try a distinctly hackish solution to this problem by extending the whatdists.pl script with a perllocal.pod parser and some heuristics. It <em>seems</em> to have worked out well. I&#8217;m going to check it by installing the distributions into a different directory and diff&#8217;ing that against the original. </p>
<p>If that works out I&#8217;ll release the code and write up a blog post about it.</p>
<h2>Installing Only Specific CPAN Distributions</h2>
<p>Normally when you install a distribution from CPAN you&#8217;re happy for the installer to fetch and install the latest version of any prerequisite modules it might need. In our situation we want to install only a specific version of each.</p>
<p>In theory we could arrange that by ordering the list such that the prerequisite modules are installed first. The <a href="http://metacpan.org/module/CPANDB">CPANDB</a> module combined with a topological sort of the requires, test_requires, and build_requires dependencies via the <a href="http://metacpan.org/module/Graph#Topological-Sort">Graph module</a> should do the trick. [Hat tip to ribasushi++ for the CPANDB suggestion.] But there&#8217;s a simpler approach&#8230;</p>
<p>I&#8217;ll probably simply duck that issue by using <a href="http://metacpan.org/module/CPAN::Mini::Inject">CPAN::Mini::Inject</a> to create a miniature CPAN that contains <em>only</em> the specific versions of the specific distributions we&#8217;re using. Then we can use the <a href="http://metacpan.org/module/cpanm">cpanm</a> <code>--mirror</code> and <code>--mirror-only</code> options to install from that mini CPAN.</p>
<h2>Extending Test Coverage</h2>
<p>All the above will give developers the ability to switch perl versions with ease, while keeping exactly the same set of CPAN modules. So now we can turn our attention to testing.</p>
<p>Our test coverage could charitably be described as spotty. Getting it up to a good level across all our code is simply not viable in the short term.</p>
<p>So for now I&#8217;m setting a <em>very</em> low goal: simply get <em>all</em> the perl modules and scripts compiled. You could say I&#8217;m aiming for 100% &#8220;compilation coverage&#8221; :-)</p>
<p>This will make all the developers aware of the basic mechanics of testing, like <a href="http://metacpan.org/module/Test::Most">Test::Most</a> and <a href="http://metacpan.org/module/prove">prove</a>, and it gives us a good baseline to increase coverage from. More importantly, in the short term it lets us detect any compile-time deprecation warnings as we test with perl 5.10 and 5.12.</p>
<p>To ensure 100% (compilation) coverage I&#8217;ll use <a href="http://metacpan.org/module/Devel::Cover">Devel::Cover</a> to do coverage analysis and write a utility, probably using <a href="http://metacpan.org/module/Devel::CoverX::Covered">Devel::CoverX::Covered</a>, to find <em>all</em> our perl scripts and modules and check that each of them has at least been compiled.</p>
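<p>The simplest form of that baseline is a single test file along these lines, assuming the modules live under <code>lib/</code> (scripts would get a similar loop running <code>perl -c</code>):</p>
<pre style="background-color:#ddd;margin:1em;padding:1em;">use strict;
use warnings;
use Test::Most;
use File::Find;

# Sketch of a t/00-compile.t: require() every module under lib/.
my @modules;
find(sub {
    return unless /\.pm$/;
    my $name = $File::Find::name;
    $name =~ s{^lib/}{};
    $name =~ s{/}{::}g;
    $name =~ s{\.pm$}{};
    push @modules, $name;
}, 'lib');

plan tests =&gt; scalar @modules;
require_ok($_) for @modules;
</pre>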
<h2>Summary</h2>
<ul>
<li>Multiple perl versions, via perlbrew.
</li>
<li>Multiple identical CPAN install trees, one per major perl version.
</li>
<li>Proven 100% compilation coverage as a minimum.
</li>
</ul>
<p>So, that&#8217;s the plan.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.timbunce.org/2011/07/21/upgrading-from-perl-5-8/feed/</wfw:commentRss>
			<slash:comments>10</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">499</post-id>
		<media:content url="https://0.gravatar.com/avatar/c1f8fff6645793f1615f748a0e33dfd3a4bf238f63095a180d01899515f628c7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">TimBunce</media:title>
		</media:content>
	</item>
		<item>
		<title>Building a different kind of extension</title>
		<link>https://blog.timbunce.org/2011/06/29/building-a-different-kind-of-extension/</link>
					<comments>https://blog.timbunce.org/2011/06/29/building-a-different-kind-of-extension/#comments</comments>
		
		<dc:creator><![CDATA[TimBunce]]></dc:creator>
		<pubDate>Wed, 29 Jun 2011 21:44:29 +0000</pubDate>
				<category><![CDATA[local]]></category>
		<guid isPermaLink="false">http://timbunce.wordpress.com/?p=496</guid>

					<description><![CDATA[For the past year I&#8217;ve been rather distracted, with little time to devote to Open Source projects. I&#8217;ve been working on a different kind of project, adding an extension to our home. It&#8217;s been quite a journey. After much planning (the plumbing Statement of Works, for example, covers four pages), and our fair share of &#8230; <a href="https://blog.timbunce.org/2011/06/29/building-a-different-kind-of-extension/" class="more-link">Continue reading <span class="screen-reader-text">Building a different kind of&#160;extension</span></a>]]></description>
										<content:encoded><![CDATA[<p>For the past year I&#8217;ve been rather distracted, with little time to devote to Open Source projects. I&#8217;ve been working on a different kind of project, adding an extension to our home. It&#8217;s been quite a journey.</p>
<p>After much planning (the plumbing Statement of Works, for example, covers four pages), and our fair share of trials and tribulations, the builders broke ground two weeks ago. Now, after days of digging and rock-breaking, the foundation trenches are all dug out and the concrete will be poured tomorrow morning. Finally, we&#8217;ll be &#8220;out of the ground&#8221;.</p>
<p><img style="display:block;margin-left:auto;margin-right:auto;" src="https://blog.timbunce.org/wp-content/uploads/2011/06/img_0404.jpg?w=600&#038;h=450" alt="Digging foundations" border="0" width="600" height="450" /></p>
<p>Naturally I want to be around to handle issues as they arise, so this year I won&#8217;t be going to OSCON or YAPC::EU. If all goes well we should be completed in time for me to attend the <a href="http://lanyrd.com/2011/london-perl-workshop/">London Perl Workshop</a> in November.</p>
<p>Meanwhile I hope to find a little time for catching up on outstanding issues with DBI and NYTProf and perhaps a little more blogging.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.timbunce.org/2011/06/29/building-a-different-kind-of-extension/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">496</post-id>
		<media:content url="https://0.gravatar.com/avatar/c1f8fff6645793f1615f748a0e33dfd3a4bf238f63095a180d01899515f628c7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">TimBunce</media:title>
		</media:content>

		<media:content url="https://blog.timbunce.org/wp-content/uploads/2011/06/img_0404.jpg" medium="image">
			<media:title type="html">Digging foundations</media:title>
		</media:content>
	</item>
		<item>
		<title>Looking for a Senior Developer job? TigerLead is Hiring again in West LA</title>
		<link>https://blog.timbunce.org/2011/04/14/looking-for-a-senior-developer-job-tigerlead-is-hiring-again-in-west-la/</link>
		
		<dc:creator><![CDATA[TimBunce]]></dc:creator>
		<pubDate>Thu, 14 Apr 2011 17:50:58 +0000</pubDate>
				<category><![CDATA[perl]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[jobs]]></category>
		<category><![CDATA[postgresql]]></category>
		<guid isPermaLink="false">http://timbunce.wordpress.com/?p=486</guid>

					<description><![CDATA[The company I work for, TigerLead.com, has another job opening in West LA: As a Senior Developer, you will be playing a central role in the design, development, and delivery of cutting-edge web applications for one of the most heavily-trafficked networks of real estate sites on the web. You will work in a small, collaborative &#8230; <a href="https://blog.timbunce.org/2011/04/14/looking-for-a-senior-developer-job-tigerlead-is-hiring-again-in-west-la/" class="more-link">Continue reading <span class="screen-reader-text">Looking for a Senior Developer job? TigerLead is Hiring again in West&#160;LA</span></a>]]></description>
										<content:encoded><![CDATA[<p>The company I work for, <a href="http://www.tigerlead.com/">TigerLead.com</a>, has another job  opening in West LA: </p>
<blockquote><p>As a Senior Developer, you will be playing a central role in the design, development, and delivery of cutting-edge web applications for one of the most heavily-trafficked networks of real estate sites on the web. You will work in a small, collaborative environment with other seasoned pros and with the direct support of the company&rsquo;s owners and senior management. Your canvas and raw materials include rich data sets totaling several million property listings replenished daily by hundreds of external data feeds. This valuable data and our powerful end-user tools to access it are deployed across several thousand real estate search sites used by more than a million home-buyer leads and growing by 50K+ users each month. The 1M+ leads using our search tools are in turn tracked and cultivated by the several thousand real estate professionals using our management software. This is an outstanding opportunity to see your creations immediately embraced by a large community of users as you work within a creative and supportive environment that is both professional and non-bureaucratic, offering the positives of a start-up culture without the drama and instability.</p></blockquote>
<p>If that sounds like interesting work to you then take a look at the <a href="http://www.tigerlead.com/jobs/senior-web-developer.html">full job posting</a>.</p>
<p>TigerLead is a lovely company to work for and this is a great opportunity. Highly recommended.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">486</post-id>
		<media:content url="https://0.gravatar.com/avatar/c1f8fff6645793f1615f748a0e33dfd3a4bf238f63095a180d01899515f628c7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">TimBunce</media:title>
		</media:content>
	</item>
		<item>
		<title>java2perl6api &#8211; Java to Perl 6 API translation &#8211; What, Why, and Whereto</title>
		<link>https://blog.timbunce.org/2010/07/16/java2perl6api-java-to-perl-6-api-tranalation-what-why-and-whereto/</link>
					<comments>https://blog.timbunce.org/2010/07/16/java2perl6api-java-to-perl-6-api-tranalation-what-why-and-whereto/#comments</comments>
		
		<dc:creator><![CDATA[TimBunce]]></dc:creator>
		<pubDate>Fri, 16 Jul 2010 17:12:59 +0000</pubDate>
				<category><![CDATA[perl]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[dbdi]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[perl6]]></category>
		<guid isPermaLink="false">http://timbunce.wordpress.com/?p=460</guid>

					<description><![CDATA[In this post I&#8217;m going to talk about the java2perl6api project. What its goals are, why I think it&#8217;s important, how it relates to a Perl 6 DBI, what exists now, what needs doing, and how you can help. Firstly I&#8217;d like to point out that, funnily enough, I&#8217;m not very familiar with Java or &#8230; <a href="https://blog.timbunce.org/2010/07/16/java2perl6api-java-to-perl-6-api-tranalation-what-why-and-whereto/" class="more-link">Continue reading <span class="screen-reader-text">java2perl6api &#8211; Java to Perl 6 API translation &#8211; What, Why, and&#160;Whereto</span></a>]]></description>
										<content:encoded><![CDATA[<p>In this post I&#8217;m going to talk about the java2perl6api project. What its goals are, why I think it&#8217;s important, how it relates to a Perl 6 DBI, what exists now, what needs doing, and how you can help.<br />
<span id="more-460"></span></p>
<p>Firstly I&#8217;d like to point out that, funnily enough, I&#8217;m not very familiar with Java or Perl6. It&#8217;s entirely possible that I&#8217;ll make all sorts of errors in the following details. If you spot any do please let me know.</p>
<h2>Background</h2>
<p>The Java language ecosystem is big and mature after years of heavy investment of time and money.</p>
<p>It doesn&#8217;t have a central repository of Open Source modules like CPAN (though <a href="http://en.wikipedia.org/wiki/Apache_Maven">Maven</a> repositories <a href="http://download.java.net/maven/1/">like</a> <a href="http://repo1.maven.org/maven2/">these</a> are similar I guess). It does, however, have a number of mature high quality class libraries, and a very large number of developers familiar with those libraries (more on that below).</p>
<h2>Goals</h2>
<p>The primary goal of the java2perl6api project is to make it easy to create Perl 6 class libraries that <em>mirror</em> Java equivalents. By <em>mirror</em> I mean share the same method names and semantics at a high level (though not at a low-level, more on that below).</p>
<p>Secondary goals are to do that well enough that:</p>
<ul>
<li>the documentation for Java classes can serve as the primary documentation for the corresponding Perl 6 classes. The Perl 6 classes need only document the differences in behavior, which should be minimal and &#8216;natural&#8217;. The same applies to books describing the Java classes.
</li>
<li>Java developers familiar with the Java classes should feel comfortable working with the corresponding Perl 6 classes.
</li>
<li>and, hopefully, some way can be found to convert test suites for the Java classes into Perl 6 code that&#8217;ll test the corresponding Perl 6 classes. (I appreciate that this is a non-trivial proposition, but there are viable approaches available, like <a href="http://www.xmlvm.org/overview/">xmlvm</a>.) Even if that can&#8217;t be done, extracting and translating tests manually is less work, and more effective, than creating them from scratch for a new API.
</li>
</ul>
<h2>Why?</h2>
<p>Firstly, creating good APIs is hard. Java APIs like <a href="http://download.oracle.com/docs/cd/E17409_01/javase/6/docs/technotes/guides/jdbc/">JDBC 3.0</a> and <a href="http://java.sun.com/developer/technicalArticles/javase/nio/">NIO.2</a> are the result of years of professional effort and demanding commercial experience. Why not build on that experience?</p>
<p>I appreciate that Java APIs are often limited by the constraints of the language, such as the lack of closures, and that Perl 6 can probably express any given set of semantics more effectively than Java. My point here is that some Java APIs embody, however inelegantly, years of hard won experience that we can benefit from. I&#8217;d rather make new mistakes than repeat old ones.</p>
<p>Secondly, there are many more Java developers than Perl developers. Many <em>many</em> more if job vacancies are any indication:</p>
<p><img src="https://i0.wp.com/www.indeed.com/trendgraph/jobgraph.png" alt="job vacancy trends for perl developer and java developer" height="300" width="540" /></p>
<p>I think we&#8217;d be foolish not to try to smooth the path for any Java developers who might be interested in Perl 6. The java2perl6api project is just one small aspect of that.</p>
<p>I really hope someone starts writing a &#8220;Perl 6 for Java Developers&#8221; tutorial. Perl 6 has the potential to become a very popular language<sup><a href="#1">1</a></sup>. Getting just a tiny percentage of Java developers (and Computer Science majors and their teachers) interested in it could be a big help.</p>
<p>Thirdly, any future DBI for Perl 6 and Parrot needs a much better foundation than the very limited and poorly defined one that <a href="http://search.cpan.org/~timb/DBI-1.611/lib/DBI/DBD.pm">underlies the Perl 5 DBI</a>. I plan to adopt the JDBC 3.0 API <em>and test suite</em> for that <em>internal</em> role. (You could call this a &#8220;Test Suite Driven Strategy&#8221;.) I&#8217;ll talk more about that in a future blog post.</p>
<h2>The History of java2perl6api</h2>
<p>I&#8217;ve been kicking around various ideas for integrating Java and Perl6/Parrot for years. I think I first decided to use JDBC as the inspiration for the DBI-to-driver API in 2006.</p>
<p>You may remember back in 2004, around the 10th anniversary of the DBI, the <a href="http://www.perlfoundation.org/">Perl Foundation</a> set up a &#8220;DBI Development Fund&#8221; that people could <a href="http://dbi.perl.org/donate/">donate</a> to. I&#8217;ve never drawn any money from that fund. I want to use it to oil other people&#8217;s wheels.</p>
<p>In 2007 <a href="http://news.perlfoundation.org/2007/03/best-practical-sponsors-perl-6.html">Best Practical sponsored Perl 6 Microgrants</a> through the Perl Foundation. I asked if I could piggyback my idea for a Java to Perl 6 API translator onto their microgrant management process but using money from the DBI Development Fund. TPF and Best Practical kindly agreed. I posted a description of the task and Phil Crow volunteered and was <a href="http://news.perlfoundation.org/2007/04/phil-crow-to-create-jdbc-api-f.html">awarded the microgrant</a> in April 2007.</p>
<p>At OSCON in July 2007 I gave a lightning talk called &#8220;<a href="http://www.slideshare.net/Tim.Bunce/dbi-for-parrot-and-perl-6-lightning-talk-2007">Database interfaces for open source languages suck</a>&#8221; which explained the rationale for using JDBC as a foundation for the DBI-to-driver API and mentioned Phil&#8217;s java2perl6 project.</p>
<p>Development ground to a halt around the end of 2007 for various reasons. It picked up again for a few months after OSCON 2009 (where I gave a short lightning talk asking for help) then stalled again in October, partly because we seemed to have hit a limitation with Rakudo and partly because I was focussed on Devel::NYTProf <a href="https://blog.timbunce.org/2009/12/24/nytprof-v3-worth-the-wait/">version 3</a> and then <a href="https://blog.timbunce.org/2010/06/09/nytprof-v4-now-with-string-eval-x-ray-vision/">version 4</a>, which took <em>way</em> more time than I expected.</p>
<p>There&#8217;s life in the project again now. We&#8217;ve dodged the earlier problem, put the <a href="http://github.com/timbunce/java2perl6">code on github</a>, brought it into sync with current <a href="http://rakudo.org/">Rakudo</a> Perl 6 syntax, and generally instilled some momentum.</p>
<h2>The Current java2perl6api</h2>
<p>Let&#8217;s take a look at a simple example.</p>
<p>To generate a perl6 file that mirrors the API of the java.sql.Savepoint class you&#8217;d just execute java2perl6api like this:</p>
<pre style="background-color:#ddd;margin:2em;padding:1em;">$ java2perl6api java.sql.Savepoint
loading java.sql.Savepoint
wrote java/sql/Savepoint.pm6 - interface java.sql.Savepoint
checking java/sql/Savepoint.pm6 - interface java.sql.Savepoint
</pre>
<p>That&#8217;s loaded and parsed the description of the java.sql.Savepoint class (from the <a href="http://download.oracle.com/docs/cd/E17476_01/javase/1.5.0/docs/tooldocs/windows/javap.html">javap</a> command), generated a corresponding perl6 module, and run perl6 to validate it.</p>
<p>The generated module (with some whitespace and cruft removed) looks like this:</p>
<pre style="background-color:#ddd;margin:1em;padding:1em;">use v6;
role java::sql::Savepoint {
    method getSavepointId (
    --&gt; Int   #  int
    ) { ... }
    method getSavepointName (
    --&gt; Str   #  java.lang.String
    ) { ... }
};
=begin pod
=head1 Java
  Compiled from "Savepoint.java"
  public interface java.sql.Savepoint{
      public abstract int getSavepointId() throws java.sql.SQLException;
      public abstract java.lang.String getSavepointName() throws java.sql.SQLException;
  }
=end pod
</pre>
<p>The pod section shows the description of the class that javap returned. The java2perl6api utility parsed that <a href="http://download.oracle.com/docs/cd/E17409_01/javase/tutorial/java/concepts/interface.html">Java interface</a> and generated the corresponding <a href="http://perlcabal.org/syn/S14.html#Roles">Perl6 role</a>. The name &#8216;java.sql.Savepoint&#8217; has been mapped to &#8216;java::sql::Savepoint&#8217;. The generated methods are stubs using <code>...</code> (the &#8220;yada, yada, yada&#8221; operator). The types int and java.lang.String have been mapped to Int and Str. Because the only types used were built-ins, no type declarations were added.</p>
<p>Currently java2perl6api handles the above, plus overloaded methods (which generate <a href="http://perlcabal.org/syn/S12.html#Multisubs_and_Multimethods">multi methods</a>) and multiple implements clauses (which generate multiple <a href="http://perlcabal.org/syn/S14.html#Compile-time_Composition">does</a> clauses). There&#8217;s also partial support for class/interface constants (which currently generate exported methods).</p>
<p>The default behavior is to recursively process any Java types referenced by the class which aren&#8217;t mapped to Perl 6 types. So executing <code>java2perl6api java.sql.Connection</code>, for example, will generate 48 Perl 6 modules! (Because <code>java.sql.Connection</code> refers to many types, including <code>java.sql.Array</code> which refers to many types including <code>java.sql.ResultSet</code> which refers to <code>java.net.URL</code> which refers to <code>java.net.Proxy</code> etc. etc.) The <code>--norecurse</code> option disables this behavior.</p>
<p>Normally you&#8217;ll want to use the recursion but, instead of letting it drill <em>all</em> the way into the Java types, you can supply your own &#8216;typemap&#8217; specification via an option. That tells java2perl6api which Java types you want to map to which Perl 6 types. So instead of recursing into the <code>java.net.URL</code> type to generate a <code>java/net/URL.pm6</code> file, for example, you can tell java2perl6api to use a specific Perl 6 type. Perhaps just <code>Str</code> for now.</p>
<h2>How this relates to JDBC / DBDI / DBI v2</h2>
<p>I want to start applying java2perl6api to the <a href="http://download.oracle.com/docs/cd/E17409_01/javase/6/docs/technotes/guides/jdbc">JDBC</a> classes now to create a &#8220;Database Driver Interface&#8221; or &#8220;DBDI&#8221; for Perl 6.</p>
<p>Starting with the <a href="http://download-llnw.oracle.com/docs/cd/E17409_01/javase/6/docs/api/java/sql/DriverManager.html">DriverManager</a> class and the <a href="http://download-llnw.oracle.com/docs/cd/E17409_01/javase/6/docs/api/java/sql/Connection.html">Connection</a>  interface I&#8217;ll use java2perl6api to generate corresponding Perl 6 roles with <em>heavy</em> stubbing out of types. Basically anything I don&#8217;t need to think about right now will be mapped to the <code>Any</code> type.</p>
<p>I&#8217;ll start fleshing out some basic implementation logic for each in a Perl 6 class that <a href="http://perlcabal.org/syn/S14.html#Compile-time_Composition">does</a> the corresponding role. I&#8217;ll probably use PostgreSQL as the first driver and the guts of <a href="http://github.com/mberends/MiniDBI/blob/master/lib/MiniDBD/Pg.pm6">MiniDBD::Pg</a> as inspiration.</p>
<p>The first minor milestones will be creating connections, then executing non-selects, then selects, then prepared statements. Somewhere along the way I expect there&#8217;ll be a Perl 6 DBDI driver implemented for the <a href="http://blogs.perl.org/users/martin_berends/2010/06/rakudo-perl-6-gets-into-databases.html">Perl 6 MiniDBI project</a>. The next key step would be to start refactoring the code heavily so anyone wanting to implement a new driver should only have to implement the driver-specific parts. (There are some JDBC driver toolkits that can provide useful ideas for that.)</p>
<h2>What needs doing</h2>
<p>There&#8217;s a <a href="http://github.com/timbunce/java2perl6/blob/master/TODO">TODO file in the repository</a> that lists the current items that need working on.</p>
<p>One fairly simple item is to add a <code>--prefix</code> option to specify an extra leading name for the generated role. So <code>java.sql.Savepoint</code> with a prefix of <code>DBDI</code> would generate a <code>DBDI::java::sql::Savepoint</code> role.</p>
<p>Another item, less simple but more important, is to automatically discover the values of constants and embed them into the generated file. Probably the best way to do that is to extend <a href="http://github.com/timbunce/java2perl6/blob/master/lib/Java/Javap/javap.grammar">the parser</a> (which uses <a href="http://search.cpan.org/perldoc?Parse::RecDescent">Parse::RecDescent</a>) to parse the verbose-mode output of javap, which includes those details.</p>
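<p>For anyone unfamiliar with Parse::RecDescent, here&#8217;s a toy grammar in the same style (not the project&#8217;s actual grammar) parsing the kind of field declaration javap emits:</p>
<pre style="background-color:#ddd;margin:1em;padding:1em;">use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent-&gt;new(q{
    field    : modifier(s) type name ';'
        { $return = { type =&gt; $item{type}, name =&gt; $item{name} }; }
    modifier : 'public' | 'static' | 'final' | 'abstract'
    type     : /[\w.\[\]]+/
    name     : /\w+/
}) or die "bad grammar";

my $f = $parser-&gt;field('public static final int TYPE_FORWARD_ONLY;');
print "$f-&gt;{name} has type $f-&gt;{type}\n" if $f;
</pre>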
<p>There are <a href="http://github.com/timbunce/java2perl6/blob/master/TODO">plenty of others</a>.</p>
<h2>How you can get involved</h2>
<p>Firstly, come and say &#8220;Hi!&#8221; in the <a href="irc://chat.freenode.net/#dbdi">#dbdi</a> IRC channel on irc.freenode.net.</p>
<p>The code is on <a href="http://github.com/timbunce/java2perl6">github</a>. You can get commit access by asking on the <a href="irc://chat.freenode.net/#perl6">#perl6</a> channel.</p>
<p>There&#8217;s also a mailing list at <a href="mailto:dbdi-dev@perl.org">dbdi-dev@perl.org</a> which you can <a href="mailto:dbdi-dev-subscribe@perl.org">subscribe</a> to.</p>
<p>I look forward to hearing from you!</p>
<hr />
<ol>
<li><a name="1"></a><br />
When I say &#8220;Perl 6 has the potential to become a very popular language&#8221; I do so with typical British <a href="http://en.wikipedia.org/wiki/Understatement">Understatement</a>.
</li>
</ol>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.timbunce.org/2010/07/16/java2perl6api-java-to-perl-6-api-tranalation-what-why-and-whereto/feed/</wfw:commentRss>
			<slash:comments>10</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">460</post-id>
		<media:content url="https://0.gravatar.com/avatar/c1f8fff6645793f1615f748a0e33dfd3a4bf238f63095a180d01899515f628c7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">TimBunce</media:title>
		</media:content>

		<media:content url="http://www.indeed.com/trendgraph/jobgraph.png?q=%22perl+developer%22%2C%22java+developer%22" medium="image">
			<media:title type="html">job vacancy trends for perl developer and java developer</media:title>
		</media:content>
	</item>
		<item>
		<title>NYTProf 4.04 &#8211; Came, Saw Ampersand, and Conquered</title>
		<link>https://blog.timbunce.org/2010/07/09/nytprof-4-04-came-saw-ampersand-and-conquered/</link>
					<comments>https://blog.timbunce.org/2010/07/09/nytprof-4-04-came-saw-ampersand-and-conquered/#comments</comments>
		
		<dc:creator><![CDATA[TimBunce]]></dc:creator>
		<pubDate>Fri, 09 Jul 2010 21:06:24 +0000</pubDate>
				<category><![CDATA[perl]]></category>
		<category><![CDATA[nytprof]]></category>
		<guid isPermaLink="false">http://timbunce.wordpress.com/?p=457</guid>

					<description><![CDATA[Please forgive the title! Perl has three regular expression match variables ( $&#38; $` $' ) which hold the string that the last regular expression matched, the string before the match, and the string after the match, respectively. As you&#8217;re probably aware, the mere presence of any of these variables, anywhere in the code, even &#8230; <a href="https://blog.timbunce.org/2010/07/09/nytprof-4-04-came-saw-ampersand-and-conquered/" class="more-link">Continue reading <span class="screen-reader-text">NYTProf 4.04 &#8211; Came, Saw Ampersand, and&#160;Conquered</span></a>]]></description>
										<content:encoded><![CDATA[<p><em>Please forgive the title!</em></p>
<p>Perl has three regular expression match variables ( <code>$&amp; $` $'</code> ) which hold the string that the last regular expression matched, the string before the match, and the string after the match, respectively.</p>
<p>As you&#8217;re probably aware, the mere presence of <em>any</em> of these variables, <em>anywhere</em> in the code, even if never accessed, will slow down <em>all</em> regular expression matches in the <em>entire</em> program. (See the WARNING at the end of the <a href="http://perldoc.perl.org/perlre.html#Capture-buffers">Capture Buffers section of the perlre documentation</a> for more information.)</p>
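<p>A small illustration, including the <code>/p</code> flag that perl 5.10 introduced as a penalty-free alternative:</p>
<pre style="background-color:#ddd;margin:1em;padding:1em;">use strict;
use warnings;

my $string = "flocks of birds";

# Mentioning $&amp; anywhere makes *every* match in the program
# pay a copying penalty:
if ($string =~ /\bbirds\b/) {
    print "matched: $&amp;\n";
}

# On perl 5.10+ the /p flag provides per-match equivalents
# that avoid the global slowdown:
if ($string =~ /\bbirds\b/p) {
    print "matched: ${^MATCH}\n";
}
</pre>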
<p>Clearly this is not good.<br />
<span id="more-457"></span></p>
<p>I&#8217;ve long planned to add detection and reporting of this to <a href="http://search.cpan.org/dist/Devel-NYTProf/">Devel::NYTProf</a>, along with things like method cache invalidation, but it&#8217;s never risen to the top of the list. In fact, now I look, I see it never even got entered into the ever-growing collection of ideas recorded in the <a href="http://cpansearch.perl.org/src/TIMB/Devel-NYTProf-4.04/HACKING">HACKING</a> file.</p>
<p>After the 4.00 release, plus a few minor releases, I&#8217;d put NYTProf on hold and was starting to focus on my java2perl6 API translation project (more news on that soon).</p>
<p>Then I saw a recent <a href="http://www.effectiveperlprogramming.com/blog/140">blog post by Josh McAdams</a>, one of the authors of <a href="http://www.amazon.com/exec/obidos/ASIN/0321496949/theeffeperl-20">Effective Perl Programming</a> (along with Joseph N. Hall and brian d foy) about detecting these variables using the <a href="http://search.cpan.org/perldoc?Devel::SawAmpersand">Devel::SawAmpersand</a> and <a href="http://search.cpan.org/perldoc?Devel::FindAmpersand">Devel::FindAmpersand</a> modules. Firstly it reminded me of the issue, and then it struck me that few people would bother using those tools because they simply <em>wouldn&#8217;t know they had the problem</em> in the first place.</p>
<p>Someone with a performance problem is likely to use a profiler like NYTProf to see where time is being spent in their code. That might point out that significant time is being spent in regular expressions, but even then they might not make the leap to consider these special match variables as a possible cause. <em>The profiler should point it out to them!</em></p>
<p>NYTProf version 4.03 didn&#8217;t. Clearly that was not good. So NYTProf version 4.04 now does!</p>
<p>In the list of files on the index page it highlights the file and adds a comment:</p>
<p><img src="https://blog.timbunce.org/wp-content/uploads/2010/07/nytprof-highlighted-file-on-index-page.png?w=815&#038;h=157" alt="highlighted file on index page" border="0" width="815" height="157" /></p>
<p>On the report page for the file itself it adds an unmissable, and hopefully self-explanatory, note to the top of the page:</p>
<p><img src="https://blog.timbunce.org/wp-content/uploads/2010/07/nytprof-note-on-report-page.png?w=670&#038;h=186" alt="note on report page" border="0" width="670" height="186" /></p>
<p>I&#8217;d be very interested to hear from anyone who now discovers these problem variables lurking in their application code or any CPAN modules.</p>
<p>Go take a look!</p>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.timbunce.org/2010/07/09/nytprof-4-04-came-saw-ampersand-and-conquered/feed/</wfw:commentRss>
			<slash:comments>4</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">457</post-id>
		<media:content url="https://0.gravatar.com/avatar/c1f8fff6645793f1615f748a0e33dfd3a4bf238f63095a180d01899515f628c7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">TimBunce</media:title>
		</media:content>

		<media:content url="https://blog.timbunce.org/wp-content/uploads/2010/07/nytprof-highlighted-file-on-index-page.png" medium="image">
			<media:title type="html">highlighted file on index page</media:title>
		</media:content>

		<media:content url="https://blog.timbunce.org/wp-content/uploads/2010/07/nytprof-note-on-report-page.png" medium="image">
			<media:title type="html">note on report page</media:title>
		</media:content>
	</item>
		<item>
		<title>Reflections on Perl and DBI from an Early Contributor</title>
		<link>https://blog.timbunce.org/2010/07/08/reflections-on-perl-and-dbi-from-an-early-contributor/</link>
					<comments>https://blog.timbunce.org/2010/07/08/reflections-on-perl-and-dbi-from-an-early-contributor/#comments</comments>
		
		<dc:creator><![CDATA[TimBunce]]></dc:creator>
		<pubDate>Thu, 08 Jul 2010 12:48:27 +0000</pubDate>
				<category><![CDATA[perl]]></category>
		<category><![CDATA[cpan]]></category>
		<category><![CDATA[dbi]]></category>
		<category><![CDATA[perl4]]></category>
		<guid isPermaLink="false">http://timbunce.wordpress.com/?p=449</guid>

					<description><![CDATA[The name Buzz Moschetti probably isn&#8217;t familiar to you. Buzz was the author of the Perl 4 database for Interbase known as Interperl. Back in those days Perl 5 was barely a twinkle in Larry&#8217;s eye and database interfaces for Perl 4 required building a custom perl binary. Buzz was one of the four people &#8230; <a href="https://blog.timbunce.org/2010/07/08/reflections-on-perl-and-dbi-from-an-early-contributor/" class="more-link">Continue reading <span class="screen-reader-text">Reflections on Perl and DBI from an Early&#160;Contributor</span></a>]]></description>
										<content:encoded><![CDATA[<p>The name Buzz Moschetti probably isn&#8217;t familiar to you. Buzz was the author of the Perl 4 database for <a href="http://en.wikipedia.org/wiki/InterBase#History">Interbase</a> known as <a href="http://cpan.perl.org/modules/dbperl/perl4/interperl/README">Interperl</a>.</p>
<p>Back in those days Perl 5 was barely a twinkle in Larry&#8217;s eye and <a href="http://cpan.perl.org/modules/dbperl/perl4/">database interfaces for Perl 4</a> required building a custom perl binary.</p>
<p>Buzz was one of the four people to get the email on September 29th 1992 from Ted Lemon that started the <a href="http://cpan.perl.org/modules/dbperl/DBI/perldb-interest/">perldb-interest</a> project which defined a <a href="http://cpan.perl.org/modules/dbperl/DBI/dbispec.v04">specification</a> that ultimately lead to the DBI. (The other people were Kurt Andersen re informix, Kevin Stock re <a href="http://cpan.perl.org/modules/dbperl/perl4/oraperl/">oraperl</a>, and Michael Peppler re <a href="http://cpan.perl.org/modules/dbperl/perl4/sybperl/">sybperl</a>. I joined a few days later.)</p>
<p><strong>Update</strong>: It turns out that it was actually Buzz who sent that original email, Ted just forwarded it on to others, including me. So Buzz can be said to have started the process that led to the DBI!</p>
<p>I hadn&#8217;t heard from Buzz for <em>many</em> years until he sent me an email recently.</p>
<p>This is his story:<span id="more-449"></span></p>
<hr />
<p>Thought I&#8217;d share a quick story with you.</p>
<p>Recently, I was frustrated with a development team&#8217;s efforts in putting together some DB-oriented reconciliations.  The candidate solution was a blend of precompiled SQL in COBOL code, file dumps and ftps, programs to read files, more programs to read other DBs, etc. etc.   Not only was the process orchestration a project in its own right, the end-to-end logic required to accurately perform the reconciliation was distributed across several programs and platforms, diluting the knowledge base.  I knew a perl program using multiple DBD drivers to different DB engines could do it in a much cleaner way, but over the years my job has changed and although I still use perl regularly, I don&#8217;t do much in the way of DBD/DBI.   To make matters worse, one of the targets was mainframe DB2 and very little work had been done here with DBD::DB2.   Also, the Sybperl module continues to be heavily used in addition to DBD::Sybase, so local DBD/DBI expertise in general is thin.  I decided to get it working on my own.</p>
<p>The infrastructure team spun up for me a Linux virtual machine with a modern build environment on it.  This had the latest gcc compilers and a firm-approved build of perl 5.8.5 right out of the box.  It took a few days of low-priority requests to get the appropriate 32bit Linux client-side SDKs for the DB2 and Sybase products but soon enough I had an environment set up with headers and shared libs.  I was ready to build some perl modules, something I haven&#8217;t done in years.</p>
<p>I went to CPAN and downloaded DBD::DB2, untar&#8217;d it, and ran perl Makefile.PL and make.  Everything worked perfectly and the whole exercise took minutes.  &#8216;make test&#8217; sets PERL_DL_NONLAZY and warned of some unused symbols not being found, but that was OK.  The rest of the tests that I expected to work with my level of permissions worked fine. &#8216;make install&#8217; worked perfectly.  Buoyed by this success, I wrote a 4-liner test program just to connect and fetch some data from a table I knew about.  Outside of the test environment, however, the shared libs for DB2 were not found so I cheated and relinked and reinstalled DB2.so with the -Wl,-rpath option to &#8220;cement in&#8221; the location of those libs so I wouldn&#8217;t have to fuss with LD_LIBRARY_PATH.   My test program now worked fine.  Newly comfortable with the process, I downloaded DBD::Sybase and built and installed the module in scarcely more time than it took for the compiler to run.  In my excitement I skipped over the DBD::Sybase 4-liner test program and went straight to a slightly bigger script that used both modules and grabbed data from both DBs.  It quietly and quickly executed.  </p>
<p>Total time from initial download with almost no clues to a running example: about 40 minutes.  Later, for grin&#8217;s sake, I threw in DBD::Oracle for good measure.  That went even faster &#8212; about 5 minutes &#8212; from CPAN download to printing &#8220;Oracle connected!&#8221; because I was more familiar with the connection string syntax that is bespoke for each engine.</p>
<p>As I watched the program run, it made me reflect on how far we&#8217;ve come and how easy yet sophisticated the perl module ecosystem has become. There is no question that this multi-DBD perl program is easier to understand and support than a solution involving a set of disconnected programs, platforms, and files.  But I think it is the organization and design of the resources as a whole &#8212; DBI, DBD, CPAN, MakeMaker, pod, binary and non-binary library locations, etc. &#8212; that makes the whole environment so clear, symmetric, and easy to use with confidence.  I think back to the build environment that I used to create <a href="http://cpan.perl.org/modules/dbperl/perl4/interperl/README">interperl</a>, and the progress that has been made in terms of both breadth of module functionality and depth of framework for module build portability is simply amazing.  Perl has grown far beyond just being another language. It has a value proposition as an able integrator of widely disparate functionality.  </p>
<p>I exited the Perl mainstream some time ago but I am watching from the side and I applaud the work you&#8217;ve done in this space.</p>
<p>Take care.</p>
<hr />
<p>Thanks Buzz!</p>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.timbunce.org/2010/07/08/reflections-on-perl-and-dbi-from-an-early-contributor/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">449</post-id>
		<media:content url="https://0.gravatar.com/avatar/c1f8fff6645793f1615f748a0e33dfd3a4bf238f63095a180d01899515f628c7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">TimBunce</media:title>
		</media:content>
	</item>
		<item>
		<title>Looking for a new job? TigerLead is also Hiring in Ann Arbor MI</title>
		<link>https://blog.timbunce.org/2010/07/02/looking-for-a-new-job-tigerlead-is-also-hiring-in-ann-arbor-mi/</link>
		
		<dc:creator><![CDATA[TimBunce]]></dc:creator>
		<pubDate>Fri, 02 Jul 2010 19:41:27 +0000</pubDate>
				<category><![CDATA[software]]></category>
		<category><![CDATA[jobs]]></category>
		<category><![CDATA[perl]]></category>
		<category><![CDATA[postgresql]]></category>
		<category><![CDATA[ruby]]></category>
		<guid isPermaLink="false">http://timbunce.wordpress.com/?p=443</guid>

					<description><![CDATA[In addition to the job vacancy in West LA, the company I work for, TigerLead.com, has an opening for a &#8220;skilled developer&#8221; in Ann Arbor, Michigan: Our work involves manipulating and warehousing external data feeds and developing web interfaces to create home search tools for prospective buyers and lead management tools for real estate agents. &#8230; <a href="https://blog.timbunce.org/2010/07/02/looking-for-a-new-job-tigerlead-is-also-hiring-in-ann-arbor-mi/" class="more-link">Continue reading <span class="screen-reader-text">Looking for a new job? TigerLead is also Hiring in Ann Arbor&#160;MI</span></a>]]></description>
										<content:encoded><![CDATA[<p>In addition to the <a href="https://blog.timbunce.org/2010/07/02/looking-for-a-new-job-tigerlead-is-hiring-in-west-la/">job vacancy in West LA</a>, the company I work for, <a href="http://www.tigerlead.com/">TigerLead.com</a>, has an opening for a &#8220;skilled developer&#8221; in Ann Arbor, Michigan:</p>
<blockquote><p> Our work involves manipulating and warehousing external data feeds and developing web interfaces to create home search tools for prospective buyers and lead management tools for real estate agents. We&#8217;re looking for a skilled coder to join our small team of talented engineers in Ann Arbor. We hope to find an experienced programmer who is a good fit with our team, well-versed in multiple languages, able to learn quickly and work independently. We work in a Linux environment, and tools and languages we use include Perl, Ruby on Rails, PostgreSQL, and GIT. Perl experience is a significant plus, but your current comfort level with any of these specific tools is less important than overall technical aptitude and ability to learn quickly and fit in well with the current team.</p></blockquote>
<p>That&#8217;s a little thin on details partly because the work is varied. If you think you might be interested, take a look at the <a href="http://annarbor.craigslist.org/eng/1804836163.html">full job posting</a>.</p>
<p>TigerLead is a lovely company to work for and this is a great opportunity. Highly recommended.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">443</post-id>
		<media:content url="https://0.gravatar.com/avatar/c1f8fff6645793f1615f748a0e33dfd3a4bf238f63095a180d01899515f628c7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">TimBunce</media:title>
		</media:content>
	</item>
		<item>
		<title>Looking for a new job? TigerLead is Hiring in West LA</title>
		<link>https://blog.timbunce.org/2010/07/02/looking-for-a-new-job-tigerlead-is-hiring-in-west-la/</link>
					<comments>https://blog.timbunce.org/2010/07/02/looking-for-a-new-job-tigerlead-is-hiring-in-west-la/#comments</comments>
		
		<dc:creator><![CDATA[TimBunce]]></dc:creator>
		<pubDate>Fri, 02 Jul 2010 17:37:22 +0000</pubDate>
				<category><![CDATA[software]]></category>
		<category><![CDATA[jobs]]></category>
		<category><![CDATA[perl]]></category>
		<category><![CDATA[postgresql]]></category>
		<category><![CDATA[ruby]]></category>
		<guid isPermaLink="false">http://timbunce.wordpress.com/?p=439</guid>

					<description><![CDATA[The company I work for, TigerLead.com, has an opening for a &#8220;skilled coder / database wrangler&#8221;. We&#8217;re looking for a skilled coder / database wrangler to play a key role within our Operations and Engineering teams. The various responsibilities of the job include working with the large databases underlying our real estate search tools, setting &#8230; <a href="https://blog.timbunce.org/2010/07/02/looking-for-a-new-job-tigerlead-is-hiring-in-west-la/" class="more-link">Continue reading <span class="screen-reader-text">Looking for a new job? TigerLead is Hiring in West&#160;LA</span></a>]]></description>
										<content:encoded><![CDATA[<p>The company I work for, <a href="http://www.tigerlead.com/">TigerLead.com</a>, has an opening for a &#8220;skilled coder / database wrangler&#8221;. </p>
<blockquote><p>We&#8217;re looking for a skilled coder / database wrangler to play a key role within our Operations and Engineering teams. The various responsibilities of the job include working with the large databases underlying our real estate search tools, setting up services for new clients, communicating with clients to evaluate bug reports, troubleshooting technical issues escalated by our client services team, and interfacing with the engineering team on systems maintenance and development. The scope of work that we do involves managing hundreds of external data feeds that feed into in-house databases totaling several million property listings. These listing databases power hundreds of real estate search sites used by more than a million home-buyer leads, who are tracked and cultivated by the thousands of Realtors using our management software. This position is critical to the robustness of these systems.</p></blockquote>
<p>If that sounds like interesting work to you then take a look at the <a href="http://losangeles.craigslist.org/wst/eng/1821042952.html">full job posting</a>.</p>
<p>TigerLead is a lovely company to work for and this is a great opportunity. Highly recommended.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.timbunce.org/2010/07/02/looking-for-a-new-job-tigerlead-is-hiring-in-west-la/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">439</post-id>
		<media:content url="https://0.gravatar.com/avatar/c1f8fff6645793f1615f748a0e33dfd3a4bf238f63095a180d01899515f628c7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">TimBunce</media:title>
		</media:content>
	</item>
	</channel>
</rss>
