<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>Lightbox Technologies, Inc.</title>
	
	<link>http://www.lightboxtechnologies.com</link>
	<description>Intelligence From Evidence</description>
	<lastBuildDate>Fri, 18 May 2012 01:48:49 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/LightboxTechnologies" /><feedburner:info uri="lightboxtechnologies" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
		<title>Lightgrep Search for EnCase Beta is Ready!</title>
		<link>http://feedproxy.google.com/~r/LightboxTechnologies/~3/yLyaj11Vaws/</link>
		<comments>http://www.lightboxtechnologies.com/2012/04/19/lightgrep-search-for-encase-beta-is-ready/#comments</comments>
		<pubDate>Thu, 19 Apr 2012 04:02:59 +0000</pubDate>
		<dc:creator>Geoff</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[HomePage]]></category>
		<category><![CDATA[lightgrep]]></category>
		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://www.lightboxtechnologies.com/?p=503</guid>
		<description><![CDATA[We&#8217;re pleased to announce the first Beta of Lightgrep Search for EnCase! It&#8217;s been a long time coming, but Lightgrep Search for EnCase is finally ready! Emails went out to our Beta subscribers on Sunday. If you received our request for information, please send us your contact info and EnCase dongle ID at info@lightboxtechnologies.com. Any [...]]]></description>
			<content:encoded><![CDATA[We&#8217;re pleased to announce the first Beta of Lightgrep Search for EnCase! It&#8217;s been a long time coming, but Lightgrep Search for EnCase is finally ready! Emails went out to our Beta subscribers on Sunday. If you received our request for information, please send us your contact info and EnCase dongle ID at <a href="mailto:info@lightboxtechnologies.com">info@lightboxtechnologies.com</a>. Any Beta subscribers who provided us with a dongle ID should have received an email with download links for the installer and license files. If you haven&#8217;t received the email, please check your spam folder and <a href="mailto:info@lightboxtechnologies.com">contact us</a> if you still don&#8217;t have it.
<br /><br />
We&#8217;re very excited about the opportunity to have our new product reviewed by our peers in the digital forensics world, and to receive your feedback. We know your time is valuable, and we really appreciate the effort put in by our Beta testers. We truly believe Lightgrep is the fastest and most robust search tool in forensics, and with your help it can be even better. Please drop us a line if you need any assistance getting the Beta set up and running.
<br /><br />]]></content:encoded>
			<wfw:commentRss>http://www.lightboxtechnologies.com/2012/04/19/lightgrep-search-for-encase-beta-is-ready/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.lightboxtechnologies.com/2012/04/19/lightgrep-search-for-encase-beta-is-ready/</feedburner:origLink></item>
		<item>
		<title>EnScript Tutorial for Parsing Windows System Restore Logs</title>
		<link>http://feedproxy.google.com/~r/LightboxTechnologies/~3/jHs-hOIP3PA/</link>
		<comments>http://www.lightboxtechnologies.com/2012/03/05/enscript-tutorial-for-parsing-windows-system-restore-logs/#comments</comments>
		<pubDate>Mon, 05 Mar 2012 14:36:50 +0000</pubDate>
		<dc:creator>Jon</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[HomePage]]></category>

		<guid isPermaLink="false">http://www.lightboxtechnologies.com/?p=446</guid>
		<description><![CDATA[Our good friend and former colleague Yogesh Khatri recently posted an EnScript tutorial on his blog, detailing how to parse Windows XP System Restore logs. In addition to the EnScript code, it&#8217;s a good overview of how to parse restore point log information. Steve Bunting also has some excellent information related to System Restore.]]></description>
			<content:encoded><![CDATA[Our good friend and former colleague <a href="http://www.swiftforensics.com/">Yogesh Khatri</a> recently posted an <a title="Enscript Tutorial 1 - Parse XP System Restore Logs" href="http://www.swiftforensics.com/2012/03/enscript-tutorial-1-parse-xp-system.html">EnScript tutorial</a> on his blog, detailing how to parse Windows XP System Restore logs. In addition to the EnScript code, it&#8217;s a good overview of how to parse restore point log information. Steve Bunting also has some <a title="Restore Point Forensics" href="http://www.stevebunting.org/udpd4n6/forensics/restorepoints.htm">excellent information related to System Restore</a>.]]></content:encoded>
			<wfw:commentRss>http://www.lightboxtechnologies.com/2012/03/05/enscript-tutorial-for-parsing-windows-system-restore-logs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.lightboxtechnologies.com/2012/03/05/enscript-tutorial-for-parsing-windows-system-restore-logs/</feedburner:origLink></item>
		<item>
		<title>DoD CyberCrime Conference</title>
		<link>http://feedproxy.google.com/~r/LightboxTechnologies/~3/gb4Ksfrax7M/</link>
		<comments>http://www.lightboxtechnologies.com/2012/01/26/dod-cybercime-conference/#comments</comments>
		<pubDate>Thu, 26 Jan 2012 06:28:57 +0000</pubDate>
		<dc:creator>Jon</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Conferences]]></category>
		<category><![CDATA[HomePage]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[presentations]]></category>

		<guid isPermaLink="false">http://www.lightboxtechnologies.com/?p=429</guid>
		<description><![CDATA[We presented Wednesday morning at the 2012 DoD CyberCrime Conference in Atlanta, about &#8220;Forensic Clusters: Advanced Processing with Open Source Software.&#8221; This wasn&#8217;t a talk about clustering related items (although we did touch on that briefly), but more about building clusters of servers to scale up to the storage and processing demands of large-scale evidence [...]]]></description>
			<content:encoded><![CDATA[<p>
We presented Wednesday morning at the <a href="http://www.dodcybercrime.com/12CC/index.asp">2012 DoD CyberCrime Conference</a> in Atlanta, about &#8220;<a href="http://www.dodcybercrime.com/12CC/Breakout_Session_Descriptions.asp#F6" title="Session Description">Forensic Clusters: Advanced Processing with Open Source Software</a>.&#8221; This wasn&#8217;t a talk about clustering related items (although we did touch on that briefly), but more about building clusters of servers to scale up to the storage and processing demands of large-scale evidence sets.
</p>

<p>
In a nutshell, we&#8217;ve used <a href="http://hadoop.apache.org/">Apache Hadoop</a> and <a href="http://hbase.apache.org/">Apache HBase</a> as the foundation for a new way of processing evidence files, with some key assists from <a href="http://sleuthkit.org" title="The Sleuthkit">The Sleuthkit</a>, and a <a href="http://tika.apache.org">deep</a> <a href="http://mahout.apache.org">bench</a> <a href="http://opencv.willowgarage.com/wiki/">of</a> <a href="http://numpy.scipy.org/">other</a> <a href="http://ffmpeg.org/">open</a> <a href="http://brandynwhite.com/hadoopy-cython-based-mapreduce-library-for-py">source</a> <a href="http://people.redhat.com/~rjones/febootstrap/">projects</a>. By using Hadoop, we&#8217;re able to stripe evidence data across multiple machines without creating a storage bottleneck, and we&#8217;re able to process the evidence in parallel.
</p>

<p>
The project is still in the prototype phase, but it&#8217;s already proven itself to be a viable approach. While ripping apart an evidence file with 20 machines is a lot of fun, we&#8217;re even more excited about using intensive processing algorithms (like clustering graphics and documents) with all the CPU cycles we can now harness, and about being able to warehouse evidence files over time for comparative analysis.
</p>

<p>
You can <a href="http://www.lightboxtechnologies.com/wp-content/uploads/2012/01/LightboxTechnologies_ForensicClusters_DODCyberCrime2012.pdf">download our slides</a> (PDF) from the presentation, but feel free to give us a ring or drop us a line if you&#8217;d like to know more.
</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lightboxtechnologies.com/2012/01/26/dod-cybercime-conference/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.lightboxtechnologies.com/2012/01/26/dod-cybercime-conference/</feedburner:origLink></item>
		<item>
		<title>Forensic Clusters Presentation at DOD CyberCrime Conference 2012</title>
		<link>http://feedproxy.google.com/~r/LightboxTechnologies/~3/GaFtACO1uds/</link>
		<comments>http://www.lightboxtechnologies.com/2012/01/21/forensic-clusters-presentation-at-dod-cybercrime-conference-2012/#comments</comments>
		<pubDate>Sat, 21 Jan 2012 21:50:21 +0000</pubDate>
		<dc:creator>Geoff</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Conferences]]></category>
		<category><![CDATA[HomePage]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[presentations]]></category>

		<guid isPermaLink="false">http://www.lightboxtechnologies.com/?p=404</guid>
		<description><![CDATA[Jon and I will be at the Department of Defense CyberCrime Conference this week presenting Forensic Clusters: Advanced Processing with Open Source Software. The session will be on Wednesday at 11 AM in the Learning Center room. We&#8217;re looking forward to hearing feedback after the session. There&#8217;s a lot of concern in the forensics community [...]]]></description>
			<content:encoded><![CDATA[Jon and I will be at the Department of Defense CyberCrime Conference this week presenting <em>Forensic Clusters: Advanced Processing with Open Source Software</em>. The session will be on <a href="http://www.dodcybercrime.com/12CC/trackagendawednesday.asp">Wednesday</a> at 11 AM in the Learning Center room. We&#8217;re looking forward to hearing feedback after the session.
<br /><br />
There&#8217;s a lot of concern in the forensics community about increasing data volumes and how we can look at less data. We believe there&#8217;s a lot of value in learning from larger data sets, and would love to speak with anyone interested in implementing large-scale processing for forensics. Please call or email us if you&#8217;re interested in meeting &#8211; we&#8217;ll be in Atlanta all week. Here&#8217;s the description of the session:
<br /><br />
<em>The Sleuth Kit (TSK) is a well-known cross-platform library and set of command line tools for computer forensics. We will discuss the development and capabilities of a new open source tool and automated forensic processing system that incorporates TSK, and is built on top of Hadoop from the Apache Software Foundation. Hadoop underpins a reliable and scalable architecture very similar to that used by Google to index and categorize Internet content and is in use at many familiar companies such as Facebook and Yahoo! We have leveraged that capability to create a processing platform that works in conjunction with TSK to provide forensically sound hard drive extraction while automating forensic processes and providing safe data storage. The tool includes advanced features, such as face detection, graphics clustering, video analysis, and document clustering, and also provides an easy to use plugin interface for community members to extend functionality. Speeding up processes is as simple as adding another machine to the cluster, scaling as far and as fast as the user desires, not limited simply to three or four nodes. We will also demonstrate measured performance increases to be gained on different size clusters.</em>
<br /><br />
Be on the lookout for a Lightgrep for EnCase update very soon. While it has taken a little longer than we would have liked, we&#8217;re extremely excited about the capabilities of Lightgrep and hope you will be, too.
<br /><br />
<a href="mailto:info@lightboxtechnologies.com">info@lightboxtechnologies.com</a>
<br />
(646) 719-0317
<br /><br />]]></content:encoded>
			<wfw:commentRss>http://www.lightboxtechnologies.com/2012/01/21/forensic-clusters-presentation-at-dod-cybercrime-conference-2012/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.lightboxtechnologies.com/2012/01/21/forensic-clusters-presentation-at-dod-cybercrime-conference-2012/</feedburner:origLink></item>
		<item>
		<title>SANS 360 Lightning Talk: Factory Forensics</title>
		<link>http://feedproxy.google.com/~r/LightboxTechnologies/~3/w8H3n1EbtkM/</link>
		<comments>http://www.lightboxtechnologies.com/2011/12/14/sans-360-lightning-talk-factory-forensics/#comments</comments>
		<pubDate>Wed, 14 Dec 2011 21:55:17 +0000</pubDate>
		<dc:creator>Jon</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[HomePage]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[SANS]]></category>

		<guid isPermaLink="false">http://www.lightboxtechnologies.com/?p=333</guid>
		<description><![CDATA[Last night&#8217;s SANS 360 session was a blast. It was much more intense than a normal lightning/work-in-progress talk, and the speakers were great. Big props to Rob Lee and the SANS crew for organizing it. For those who couldn&#8217;t make it, the rest of this blogpost is a recreation of my talk, &#8220;Factory Forensics.&#8221; Geoff [...]]]></description>
			<content:encoded><![CDATA[<p>
Last night&#8217;s <a href="http://computer-forensics.sans.org/blog/2011/12/13/dfir-sans360-event-is-today">SANS 360 session</a> was a blast. It was much more intense than a normal lightning/work-in-progress talk, and the speakers were great. Big props to Rob Lee and the SANS crew for organizing it.
</p>

<p>
For those who couldn&#8217;t make it, the rest of this blogpost is a recreation of my talk, &#8220;Factory Forensics.&#8221; Geoff and I will be speaking about the technology in a bit more depth at the <a href="http://www.technologyforums.com/12cc/index.asp">2012 DoD Cybercrime Conference</a> in Atlanta, on Wednesday, January 25, at 1100 in the Learning Center room (&#8220;Forensic Clusters&#8221;). We&#8217;re up against <a href="http://jessekornblum.livejournal.com/">Jesse Kornblum</a> in the same time slot, but, hey, he always puts his slides up online, right?
</p>

<h2>Factory Forensics</h2>

<p>
<a href="http://www.flickr.com/photos/caguard/5812323158/" title="Watchful guard by The California National Guard, on Flickr"><img class="alignleft" src="http://farm6.staticflickr.com/5025/5812323158_85a224cfee_m.jpg" width="161" height="240" alt="Watchful guard"></a><a href="http://thedigitalstandard.blogspot.com/">Chris Pogue</a> has talked a lot this past year about his approach to casework, which he calls &#8220;Sniper Forensics.&#8221; I haven&#8217;t had the pleasure of meeting Chris in person, but his blogposts about Sniper Forensics are great (Parts <a href="http://blog.spiderlabs.com/2011/01/spiderlabs-blog-post-sniper-forensics-part-1.html">One</a>, <a href="http://blog.spiderlabs.com/2011/01/sniper-forensics-part-two-target-acquisition-in-part-one-of-the-sniper-forensics-post-we-discussed-the-history-of-forensic.html">Two</a>, <a href="http://blog.spiderlabs.com/2011/02/sniper-forensics-monthly-february-2011-part-iii-target-identification-one-of-the-keys-needed-for-a-sniper-to-successfully.html">Three</a>, <a href="http://blog.spiderlabs.com/2011/03/sniper-forensics-part-4.html">Four</a>, and <a href="http://blog.spiderlabs.com/2011/07/sniper-forensics-part-v-finding-evil-part-ii.html">Five</a>). The basic idea behind Sniper Forensics is that you keep in mind what question you’re trying to answer in your case, and you work backwards from it <em>[Ed.: admittedly a gross simplification, but, hey, it was a 6 minute talk. Read Chris's posts if you haven't already.]</em>. That way, you stay on target and don’t get overwhelmed by the evidence or fall down a rabbit hole.
</p>
<p>
<em>But</em>&#8230;
</p>
<p>
<a href="http://www.wired.com/dangerroom/2008/09/old-russian-tan/"><img class="alignright" title="Quantity has a quality all its own — Joseph Stalin [traditionally]" src="http://www.wired.com/images_blogs/dangerroom/images/2008/09/02/t72b_minsk_parade_may_2005_1_2.jpg" alt="" width="120" height="180" /></a>The problem with snipers is that they can’t deal with every situation. They need backup. We’re dealing with more raw data as input than ever before, and, as the public has grown aware of computer forensics, we&#8217;re now more in demand than ever before. Working smarter (a la Sniper Forensics) and doing less (&#8220;triage&#8221;) will only take you so far: we have to increase our productivity and get more done. The tool to do that is the <a href="http://en.wikipedia.org/wiki/Assembly_line" alt="Wikipedia: the source of all knowledge">assembly line</a>.
</p>

<p>
According to Henry Ford, there are three main principles of the assembly line. Kind of wordy, but they are:
<div style="position:relative; left:10%; width:80%;">
<ol style="list-style-type:decimal;">
<li>Place the tools and the men in the sequence of the operation so that each component part shall travel the least possible distance while in the process of finishing.</li>
<li>Use work slides or some other form of carrier so that when a workman completes his operation, he drops the part always in the same place&#8211;which place must always be the most convenient place to his hand&#8211;and if possible have gravity carry the part to the next workman for his operation.</li>
<li>Use sliding assembling lines by which the parts to be assembled are delivered at convenient distances.</li>
</ol>
</div>
</p>
<p>
And here&#8217;s how I interpret them:
<div style="position:relative; left:10%; width:80%;">
<ol style="list-style-type:decimal;">
<li>Make a sequence of operations, in the right order</li>
<li>The output from one stage is the input to the next</li>
<li>The flow between stages should be automatic</li>
</ol>
</div>
</p>

<p>
<a href="http://en.wikipedia.org/wiki/File:Henry_ford_1919.jpg" title="Henry Ford, kind of a weird dude"><img class="alignleft" src="http://upload.wikimedia.org/wikipedia/commons/thumb/1/18/Henry_ford_1919.jpg/188px-Henry_ford_1919.jpg"/></a>Ford really wasn&#8217;t the inventor of the assembly line, but its refinement at Ford Motor Company early in the twentieth century allowed him to cut the price of the <a href="http://www.mtfca.com/HOF/Fame.htm" title="Model T Ford Club of America Speedster and Racer Hall of Fame">Model T</a> by a third, produce over a thousand of &#8216;em a day, and put competitors out of business.
</p>

<p>
So, I went looking around for a way to build an assembly line for forensics, something that would let me process lots of evidence reliably. I looked into how other tech companies were dealing with large data sets, and the most popular solution they use for storing and processing large unstructured data sets is an open source framework, <a href="http://hadoop.apache.org">Apache Hadoop</a>.
</p>

<p>
<a href="http://www.youtube.com/watch?v=qxC4urJOchs" title="Doug Cutting talks about founding Hadoop"><img class="aligntop" src="http://hadoop.apache.org/images/hadoop-logo.jpg"/></a>
Hadoop is software that you can run on a cluster of 1U or 2U servers, your typical entry-level servers with a few disks and a few cores, nothing fancy. It scales up to thousands of machines storing petabytes of data, but clusters can be built incrementally without much hassle.
</p>

<p>
<a href="http://www.cloudera.com/what-is-hadoop/hadoop-overview/" title="Hadoop Overview, by Cloudera"><img class="alignright" src="http://www.cloudera.com/assets/images/diagram-1.gif" width="224" height="188"/></a>
Hadoop has two main components. The first is <a href="http://hadoop.apache.org/common/docs/current/hdfs_design.html" title="HDFS Architectural Guide">HDFS</a>, a distributed filesystem. HDFS breaks files up into blocks and stores the blocks on different machines. The blocks are automatically replicated and checksummed, so it’s fault-tolerant. <span style="font-size: smaller"><i>[These images come from <a href="http://www.cloudera.com/what-is-hadoop/hadoop-overview/">Cloudera's Hadoop Overview</a>, which is a great place to start learning about Hadoop.]</i></span>
</p>
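<p>
To make the block idea concrete, here is a minimal Python sketch of splitting a file into blocks, checksumming each one, and assigning copies to several machines. It is an illustration only, not how HDFS is actually implemented: the function names, the tiny block size, the node names, and the round-robin placement are all assumptions made up for this example.
</p>

```python
import zlib

def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(blocks, nodes, replication=3):
    """Assign every block to several distinct nodes (simple round-robin)
    and record a CRC32 checksum so corruption can be detected later."""
    copies = min(replication, len(nodes))  # cannot keep more copies than nodes
    table = []
    for index, block in enumerate(blocks):
        holders = [nodes[(index + r) % len(nodes)] for r in range(copies)]
        table.append({"index": index, "crc32": zlib.crc32(block), "nodes": holders})
    return table

def is_intact(block, entry):
    """Re-compute the checksum and compare it with the recorded one."""
    return zlib.crc32(block) == entry["crc32"]

# Hypothetical 19-byte "evidence file" and a four-node cluster.
blocks = split_into_blocks(b"0123456789abcdefXYZ", 8)
table = place_blocks(blocks, ["n1", "n2", "n3", "n4"])
for entry in table:
    print(entry["index"], entry["nodes"])
```

<p>
Because every block lives on more than one node, losing a single machine does not lose any data, and a failed checksum tells the system which copy to throw away and re-replicate.
</p>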

<p style="padding-top: 20px;">
<a href="http://www.cloudera.com/what-is-hadoop/hadoop-overview/" title="Hadoop Overview, by Cloudera"><img class="alignright" src="http://www.cloudera.com/assets/images/diagram-2.gif" width="204" height="188"/></a>
The second component is called <a href="http://research.google.com/archive/mapreduce.html" title="Google invented the MapReduce paradigm and described it in this paper">MapReduce</a>, which is a batch processing service. You write your program to a specific API, and then MapReduce sends it to all the nodes in the cluster and starts running it. The nodes only process the blocks of data they have locally, so the disk accesses remain local, and then the system collates all the output automatically. Network traffic is thus conserved, and Hadoop generally does large streaming reads and writes to disk instead of seeking all over the place. The important thing to remember is that it&#8217;s faster to send the program to the data than it is to send the data to the program.
</p>
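<p>
The map-then-collate flow can be sketched in a few lines of plain Python. This is a single-process toy rather than the Hadoop API: <code>map_block</code>, <code>reduce_counts</code>, <code>run_job</code>, and the sample data are hypothetical names invented for illustration, and the keywords are assumed to be lowercase.
</p>

```python
from collections import defaultdict

def map_block(block_text, keywords):
    """Map step: runs next to the block and emits (keyword, count) pairs."""
    lowered = block_text.lower()
    return [(kw, lowered.count(kw)) for kw in keywords if kw in lowered]

def reduce_counts(pairs):
    """Reduce step: collate the pairs emitted by every node into totals."""
    totals = defaultdict(int)
    for keyword, count in pairs:
        totals[keyword] += count
    return dict(totals)

def run_job(blocks_by_node, keywords):
    """Pretend scheduler: each node maps only the blocks it holds locally,
    and only the small (keyword, count) pairs travel on to the reducer."""
    emitted = []
    for node, local_blocks in blocks_by_node.items():
        for block in local_blocks:
            emitted.extend(map_block(block, keywords))
    return reduce_counts(emitted)

# Hypothetical cluster contents: two nodes, two text blocks each.
cluster = {
    "n1": ["invoice sent to bob", "nothing here"],
    "n2": ["Bob paid the INVOICE", "bob bob"],
}
print(run_job(cluster, ["invoice", "bob"]))
```

<p>
Note what moves over the network in this toy: a handful of small pairs, never the blocks themselves. One limitation of a sketch this simple is that a keyword straddling a block boundary would be missed.
</p>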

<p>
<a href="http://www.lightboxtechnologies.com/wp-content/uploads/2011/12/Slide12.png"><img src="http://www.lightboxtechnologies.com/wp-content/uploads/2011/12/Slide12-300x225.png" alt="" title="Pipeline" width="300" height="225" class="alignright size-medium wp-image-375"/></a>
So, earlier this year we worked on creating a system for doing forensics on top of Hadoop. There are three main steps. First, we have to ingest the data into the system, doing all the filesystem stuff. This is kind of complicated, but using the <a href="http://sleuthkit.org/" title="http://sleuthkit.org">Sleuthkit</a> was a big help. Second, we have all our processing tasks, like text extraction, keyword searching, and some other cool things. Finally, the output from the system is a set of static HTML reports that contain all the results, the idea being that we could produce useful information automatically for use as a starting point without having to learn another GUI.
</p>
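<p>
The three steps line up with the assembly-line idea: each stage&#8217;s output is the next stage&#8217;s input, and the hand-off is automatic. Here is a minimal Python sketch of that shape. Everything in it is a stand-in invented for illustration: the real ingest step parses filesystems with the Sleuthkit rather than reading a dictionary, and the real reports are HTML, where this toy emits plain text.
</p>

```python
def ingest(evidence):
    """Stage 1 stand-in: turn a {path: text} mapping into file records."""
    return [{"path": p, "text": t} for p, t in sorted(evidence.items())]

def make_search(keywords):
    """Stage 2 factory: build a stage that tags each record with keyword hits."""
    def search(records):
        return [dict(r, hits=[k for k in keywords if k in r["text"].lower()])
                for r in records]
    return search

def report(records):
    """Stage 3: render a plain-text summary of the files that had hits."""
    lines = ["{}: {}".format(r["path"], ", ".join(r["hits"]))
             for r in records if r["hits"]]
    return "\n".join(lines)

def run_pipeline(evidence, stages):
    """Feed each stage's output to the next one, with no manual hand-offs."""
    data = evidence
    for stage in stages:
        data = stage(data)
    return data

# Hypothetical evidence set and keyword list.
evidence = {
    "/docs/a.txt": "Wire the money",
    "/docs/b.txt": "lunch plans",
    "/mail/c.eml": "MONEY and passport",
}
summary = run_pipeline(evidence, [ingest, make_search(["money", "passport"]), report])
print(summary)
```

<p>
Adding a new processing task in this shape means inserting one more function into the list of stages, which is the property that makes the assembly-line framing attractive.
</p>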

<p>
Here are some screenshots of the output reports. The first one gives an overview of the evidence and the search results.
</p>

<a href="http://www.lightboxtechnologies.com/wp-content/uploads/2011/12/Slide13.png"><img src="http://www.lightboxtechnologies.com/wp-content/uploads/2011/12/Slide13-300x225.png" alt="" title="Device Overview and Search Results" width="300" height="225" class="aligncenter size-large wp-image-376"/></a>

<p>
The second one shows you some results from our image clustering routine, where we arrange similar images into groups.
</p>
<a href="http://www.lightboxtechnologies.com/wp-content/uploads/2011/12/Slide14.png"><img src="http://www.lightboxtechnologies.com/wp-content/uploads/2011/12/Slide14-300x225.png" alt="" title="Image Clustering" width="300" height="225" class="aligncenter size-large wp-image-377" /></a>

<p>
The third shows you key frames that we’ve extracted from a video file, so you can just look over the gallery view of key frames without watching the whole video.
</p>
<a href="http://www.lightboxtechnologies.com/wp-content/uploads/2011/12/Slide15.png"><img src="http://www.lightboxtechnologies.com/wp-content/uploads/2011/12/Slide15-300x225.png" alt="" title="Video Keyframe Analysis" width="300" height="225" class="aligncenter size-large wp-image-378" /></a>

<p>
How’d it end up? Well, the good news is that it basically works. We ran it with 5 nodes, 10 nodes, and 20 nodes and all the processing tasks worked. Performance was pretty good and it scaled. The bad thing we found is that we had to learn a lot about Hadoop and use some complicated techniques to get good performance&#8212;you can write pretty simple code with Hadoop but performance won’t be as good as it should be. The ugly part of it was that Hadoop is all Java and, while Java is very fast, resolving the dependencies between all the different open source libraries, working with their build systems, and configuring it all on a cluster was a big pain. We ended up contributing a <a href="https://issues.apache.org/jira/browse/WHIRR-334" title="...which was then cleaned up and improved by other developers. Open source, FTW!">patch</a> to Apache to make this easier.
</p>
<a href="http://www.lightboxtechnologies.com/wp-content/uploads/2011/12/Slide17.png"><img src="http://www.lightboxtechnologies.com/wp-content/uploads/2011/12/Slide17.png" alt="" title="The Good, the Bad, and the Ugly" width="300" height="225" class="aligncenter" /></a>

<p>
<a href="http://www.flickr.com/photos/mikebehnken/5343485804/" title="Bull'z Eyez by MikeBehnken, on Flickr"><img src="http://farm6.staticflickr.com/5241/5343485804_dd5ea2a9f4_m.jpg" width="240" height="160" alt="Bull'z Eyez" class="alignleft"></a>
The other big thing we realized after we created the initial prototype was that it’s not really enough to produce static output for you to use. It’s a good start, but if we’re going to incorporate factory processes into forensics, what we need to output are tools for you to use to help you zero in on your cases (like the snipers you are). Initial results always spur further questions, and if you have to go back to the drawing board to answer them, then we haven’t helped you as much as we could. So, we’re working on that.
</p>

<p>
This is the machine shop at the first assembly line at Ford. It looks pretty primitive to us now. But it changed history. Many people think that the golden age of forensics is behind us, what with full disk encryption, cellphones, the cloud, and a host of other challenges. But we think we’re just getting started in forensics, and that we have a very bright future in front of us.
</p>
<a href="http://www.flickr.com/photos/ellenm1/4279879689/" title="Machine shop at Highland Park by ellenm1, on Flickr"><img src="http://farm5.staticflickr.com/4055/4279879689_28ca86607d.jpg" width="500" height="314" alt="Machine shop at Highland Park" class="aligncenter"></a>
]]></content:encoded>
			<wfw:commentRss>http://www.lightboxtechnologies.com/2011/12/14/sans-360-lightning-talk-factory-forensics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.lightboxtechnologies.com/2011/12/14/sans-360-lightning-talk-factory-forensics/</feedburner:origLink></item>
		<item>
		<title>Beta Demos at CEIC 2011</title>
		<link>http://feedproxy.google.com/~r/LightboxTechnologies/~3/1iCBLNtbG04/</link>
		<comments>http://www.lightboxtechnologies.com/2011/06/14/beta-demos-at-ceic-2011/#comments</comments>
		<pubDate>Tue, 14 Jun 2011 04:25:19 +0000</pubDate>
		<dc:creator>Geoff</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Conferences]]></category>
		<category><![CDATA[HomePage]]></category>
		<category><![CDATA[lightgrep]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[testing]]></category>

		<guid isPermaLink="false">http://www.lightboxtechnologies.com/?p=306</guid>
		<description><![CDATA[Jon and I had the opportunity to present at CEIC in Orlando this year. Jon presented a session called Deep Dive Grep, in which he showed a method of representing GREP keywords as simple flowcharts. I haven&#8217;t seen anyone present keywords in this manner before, and I think it makes the topic really easy to [...]]]></description>
			<content:encoded><![CDATA[<p>Jon and I had the opportunity to present at <a href="http://www.ceicconference.com/agenda2011.htm">CEIC</a> in Orlando this year. Jon presented a session called Deep Dive Grep, in which he showed a method of representing GREP keywords as simple flowcharts. I haven&#8217;t seen anyone present keywords in this manner before, and I think it makes the topic really easy to understand. The basic idea is that no matter how complicated your GREP, you can always draw a flow chart that shows the path the search engine will take when searching data.
</p>
<p>I ran a session called Statistical Analysis and Data Sampling for eDiscovery; the presentation can be found <a href="http://www.geoffblack.com/presentations/">here</a>. The idea is that you can use statistical sampling to ease the burden in eDiscovery matters, and to validate keywords on smaller sets of data. I also presented an EnScript that helps practitioners by automatically choosing a random sample from EnCase Logical Evidence Files. I think the presentation was generally well received and there was a lot of good discussion. My only regret was that there was so much good discussion that we didn&#8217;t have time to get into keyword testing in EnCase eDiscovery, which is really the final piece to complete the sampling. If anyone has questions for either Jon or me, feel free to <a href="http://www.lightboxtechnologies.com/contact/">drop us a line</a>.
</p>
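<p>As a rough illustration of the sampling idea (not the EnScript itself), here is a minimal Python sketch. The function names, the seed, and the notion of an integer item ID are all assumptions made for the example; the point is only that a seeded random draw is repeatable, and that a keyword can be checked against the sample before running it on the full set.</p>

```python
import random


def draw_sample(item_ids, sample_size, seed):
    """Pick sample_size distinct items; the seed makes the draw repeatable."""
    rng = random.Random(seed)
    return sorted(rng.sample(list(item_ids), sample_size))


def hit_rate(sample, is_hit):
    """Fraction of sampled items for which the keyword test is_hit is true."""
    return sum(1 for item in sample if is_hit(item)) / len(sample)


# Hypothetical usage: 1,000 items identified by number, sample 50 of them.
sample = draw_sample(range(1000), 50, seed=7)
```

<p>Recording the seed alongside the results means someone else can redraw exactly the same sample later.</p>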
<p>
On another exciting note, we gave some limited early beta demos of Lightgrep for EnCase to friends and colleagues while we were in the Sunshine State. Everyone who took part in the interactive demo told us that Lightgrep would definitely save them a lot of time, not only by searching more quickly but also by producing more pertinent results. We met a lot of great people and received some very useful feedback and suggestions.
</p>
<p>
Based on the feedback we received, we&#8217;re making some changes before we release the public beta &#8211; things that we feel will make Lightgrep even easier to use and improve on its already game-changing performance. Stay tuned for more updates!
</p>]]></content:encoded>
			<wfw:commentRss>http://www.lightboxtechnologies.com/2011/06/14/beta-demos-at-ceic-2011/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.lightboxtechnologies.com/2011/06/14/beta-demos-at-ceic-2011/</feedburner:origLink></item>
		<item>
		<title>Lightgrep presentation at 7th IFIP WG 11.9 International Conference on Digital Forensics</title>
		<link>http://feedproxy.google.com/~r/LightboxTechnologies/~3/2z99G4H1648/</link>
		<comments>http://www.lightboxtechnologies.com/2011/02/16/lightgrep-presentation-at-7th-ifip-wg-11-9-international-conference-on-digital-forensics/#comments</comments>
		<pubDate>Wed, 16 Feb 2011 16:19:59 +0000</pubDate>
		<dc:creator>joel</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Conferences]]></category>
		<category><![CDATA[HomePage]]></category>

		<guid isPermaLink="false">http://www.lightboxtechnologies.com/?p=287</guid>
		<description><![CDATA[I (Joel) attended the Seventh Annual IFIP WG 11.9 International Conference on Digital Forensics in sunny Orlando, Florida two weeks ago. While there, I presented an extended version of our NeFX paper on lightgrep, soon to appear in Advances in Digital Forensics VII.  For those interested, you can see my slides and read a preprint [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.lightboxtechnologies.com/wp-content/uploads/2011/02/IMAG00121.jpg"><img class="alignright size-medium wp-image-293" title="IMAG0012" src="http://www.lightboxtechnologies.com/wp-content/uploads/2011/02/IMAG00121-300x179.jpg" alt="Beware the alligators!" width="300" height="179" /></a>I (Joel) attended the <a href="http://www.ifip119.org/Conferences/">Seventh Annual IFIP WG 11.9 International Conference on Digital Forensics</a> in sunny Orlando, Florida two weeks ago. While there, I presented an extended version of our NeFX paper on lightgrep, soon to appear in <em>Advances in Digital Forensics VII</em>.  For those interested, you can see my <a href="http://staff.science.uva.nl/~juckelma/slides/ifip-wg-11.9-2011-slides.pdf">slides</a> and read a <a href="http://staff.science.uva.nl/~juckelma/pubs/StewartUckelmanIFIP2011.pdf">preprint</a> of our paper.</p>

<p>It was great meeting everyone; there seems to be some exciting work happening in digital forensics these days, from abstract things like frameworks to very concrete things like mobile device forensics. I particularly enjoyed talks by <a href="http://people.seas.harvard.edu/~tmoore/">Tyler Moore</a>, detailing his empirical research on phishing and typosquatting, as well as by Heather McCalley on how rootkits surreptitiously call back to their authors.</p>

<p>Special thanks to <a href="http://www.daytonastate.edu/CampusDirectory/empInfo.jsp?id=7823932290415926">Mark Pollitt</a>, one of the local organizers, who took us to a wildlife refuge (with alligators!) as an excursion after the conference ended. We saw lots of alligators, no forensic scientists were eaten, and a good time was had by all.</p>]]></content:encoded>
			<wfw:commentRss>http://www.lightboxtechnologies.com/2011/02/16/lightgrep-presentation-at-7th-ifip-wg-11-9-international-conference-on-digital-forensics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.lightboxtechnologies.com/2011/02/16/lightgrep-presentation-at-7th-ifip-wg-11-9-international-conference-on-digital-forensics/</feedburner:origLink></item>
		<item>
		<title>Lightgrep Search for EnCase at NYC4Sec</title>
		<link>http://feedproxy.google.com/~r/LightboxTechnologies/~3/KI49TTQ_k7I/</link>
		<comments>http://www.lightboxtechnologies.com/2011/02/09/lightgrep-search-for-encase-at-nyc4sec/#comments</comments>
		<pubDate>Wed, 09 Feb 2011 05:05:18 +0000</pubDate>
		<dc:creator>Geoff</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[HomePage]]></category>
		<category><![CDATA[lightgrep]]></category>
		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://www.lightboxtechnologies.com/?p=38</guid>
		<description><![CDATA[Jon gave a presentation on the inner workings of keyword search at the NYC4Sec Meetup on January 19, &#8220;Lightgrep &#8211; Fast Keyword Searching for Forensics.&#8221; There was a good crowd of about 30 with a mix of active industry professionals and grad students from John Jay&#8217;s Forensic Computing program. It&#8217;s nice to see the students [...]]]></description>
			<content:encoded><![CDATA[<p>Jon gave a presentation on the inner workings of keyword search at the <a href="http://www.nyc4sec.info/events/15814644/" target="_blank">NYC4Sec Meetup</a> on January 19, &#8220;Lightgrep &#8211; Fast Keyword Searching for Forensics.&#8221; There was a good crowd of about 30 with a mix of active industry professionals and grad students from <a href="http://www.jjay.cuny.edu/academics/690.php" target="_blank">John Jay&#8217;s Forensic Computing program</a>. It&#8217;s nice to see the students who will be working in our industry come out to the Meetups and show an interest in upcoming computer forensics technologies and techniques.
</p>

<p><a href="http://photos2.meetupstatic.com/photos/event/1/9/4/highres_21000404.jpeg" target="_blank"><img class="alignright" src="http://photos4.meetupstatic.com/photos/event/1/9/4/event_21000404.jpeg" alt="Jon diagramming keyword paths" /></a>Exposing attendees to his awful handwriting on the chalkboard, Jon introduced the basic theory behind how grep works. Each search term is used to construct a data structure, formally known as a <a href="http://en.wikipedia.org/wiki/Finite-state_machine" target="_blank">finite state machine</a> (or, collectively, as <a href="http://en.wikipedia.org/wiki/Automata_theory" target="_blank">finite automata</a>). Those are ten-dollar words for something every investigator is familiar with: a flowchart.</p>

<p>
A simple keyword, like <tt>abc</tt>, can be represented with a straightforward flowchart, where there&#8217;s a box for &#8216;a&#8217;, which leads to a box for &#8216;b&#8217;, and finally to a box for &#8216;c&#8217;. When conducting a search, the computer starts at the beginning of the flowchart and follows it based upon the bytes it reads, trying to move one box for each byte read. If it can&#8217;t go forward, then it starts over from the beginning of the flowchart.</p>
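<p>The walk described above can be sketched in a few lines of Python. This is only an illustration of the flowchart idea, not Lightgrep&#8217;s code; the function name <tt>find_keyword</tt> is made up for the example. Note that &#8220;starting over&#8221; here means going back to the first box and resuming one byte past where the failed attempt began, so that overlapping candidates are not missed.</p>

```python
def find_keyword(keyword: bytes, data: bytes) -> list[int]:
    """Return the offsets in data where keyword starts."""
    hits = []
    for start in range(len(data) - len(keyword) + 1):
        box = 0  # index of the next box in the flowchart
        # Move forward one box for each byte that matches.
        while box != len(keyword) and data[start + box] == keyword[box]:
            box += 1
        if box == len(keyword):
            hits.append(start)  # walked off the end of the flowchart: a hit
        # Otherwise we could not go forward, so start over at the next offset.
    return hits
```

<p>For example, searching for <tt>aab</tt> in <tt>aaab</tt> fails at offset 0 on the third byte, then succeeds from offset 1.</p>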

<p>What the various grep operators (e.g., *, +, ?, |, {}) do is alter the shape of the flowchart. For example, | lets you move to either of two different boxes, + lets you loop back to the box you&#8217;re already on, ? lets you skip ahead to some other box, and * lets you both skip the next box and loop back to the current box. So, while regular expressions may look complicated, they aren&#8217;t magic; doodling out a flowchart for a regular expression is a great way to understand how it works.
</p>
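<p>To make the operator shapes concrete, here is a small Python sketch that writes out four flowcharts by hand and checks whether a whole string traces a path through one of them. The box numbering, the dictionary layout, and the function name <tt>follows</tt> are assumptions for illustration; a real engine builds these structures automatically from the pattern.</p>

```python
# Each flowchart has boxes labelled with one character and arrows between boxes.
LABELS = {1: "a", 2: "b", 3: "c"}

# Arrows for three patterns over the same boxes a(1), b(2), c(3).
ARROWS = {
    "ab+c": {1: {2}, 2: {2, 3}},      # + : box b loops back to itself
    "ab?c": {1: {2, 3}, 2: {3}},      # ? : box a may skip ahead to c
    "ab*c": {1: {2, 3}, 2: {2, 3}},   # * : both the skip and the loop
}

# a(b|c)d needs a fourth box; | gives box a two different successors.
ALT_LABELS = {1: "a", 2: "b", 3: "c", 4: "d"}
ALT_ARROWS = {1: {2, 3}, 2: {4}, 3: {4}}


def follows(labels, arrows, last, text, first=1):
    """Return True if text traces a path from the first box to the last box."""
    candidates = {first}   # boxes we are allowed to step onto next
    current = set()        # boxes we are standing on
    for ch in text:
        current = {box for box in candidates if labels[box] == ch}
        if not current:
            return False   # no arrow leads to a box with this character
        candidates = set()
        for box in current:
            candidates |= arrows.get(box, set())
    return last in current
```

<p>With these charts, <tt>follows(LABELS, ARROWS["ab?c"], 3, "ac")</tt> succeeds because of the skip arrow, while the same string fails against <tt>ab+c</tt>, which has no way around box b.</p>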

<p>After Jon finished, I showed a demo of Lightgrep for EnCase. We were only at the alpha stage at this point, but everything went very smoothly. Not liking to bore people with a mindless sales pitch, we got down to the nitty-gritty quickly. We&#8217;ve been talking it up quite a bit, so it was gratifying for us to finally show the working product to our peers. We&#8217;re really looking forward to our beta testers putting it through the wringer under some different stress loads in the near future. If you&#8217;re interested, please sign up for our beta mailing list!
</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lightboxtechnologies.com/2011/02/09/lightgrep-search-for-encase-at-nyc4sec/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.lightboxtechnologies.com/2011/02/09/lightgrep-search-for-encase-at-nyc4sec/</feedburner:origLink></item>
		<item>
		<title>Speed leads, quality kills</title>
		<link>http://feedproxy.google.com/~r/LightboxTechnologies/~3/B4N0wXhNRPY/</link>
		<comments>http://www.lightboxtechnologies.com/2010/10/27/speed-leads-quality-kills/#comments</comments>
		<pubDate>Wed, 27 Oct 2010 05:04:20 +0000</pubDate>
		<dc:creator>Jon</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[HomePage]]></category>
		<category><![CDATA[lightgrep]]></category>
		<category><![CDATA[testing]]></category>

		<guid isPermaLink="false">http://www.lightboxtechnologies.com/?p=131</guid>
		<description><![CDATA[Speed is a feature. An Information Week article today talked about the success and growth of the FBI&#8217;s RCFL program. In 2009, the RCFLs processed 2.3 petabytes. If you&#8217;re processing petabytes, you need to go big to go fast. Going 17,000 miles per hour isn&#8217;t easy, though. As in rocketry, mistakes in forensics often aren&#8217;t [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: left;"><a href="http://www.flickr.com/photos/suzanmarie/5054447093/in/photostream/"><img class="alignright" title="Saturn V" src="http://farm5.static.flickr.com/4091/5054447093_563f4c7090_d.jpg" alt="Saturn V rocket, by Suzan Marie, licensed under the Creative Commons" width="248" height="375" /></a></p>
<p>Speed is a feature. An <a title="FBI Reports Digital Forensics Progress " href="http://www.informationweek.com/news/government/security/showArticle.jhtml?articleID=227900634&amp;subSection=All+Stories" target="_blank">Information Week article</a> today talked about the success and growth of the FBI&#8217;s <a title="Regional Computer Forensics Laboratory" href="http://www.rcfl.gov/" target="_blank">RCFL</a> program. In 2009, the RCFLs processed 2.3 petabytes.</p>
<p>If you&#8217;re processing petabytes, you need to go big to go fast.</p>
<p style="text-align: left;">Going <a title="roughly, orbital velocity" href="http://science.howstuffworks.com/satellite3.htm" target="_blank">17,000 miles per hour</a> isn&#8217;t easy, though. As in rocketry, mistakes in forensics often aren&#8217;t survivable. As evidence sets get bigger, and more demands are placed on investigators, it&#8217;s conceivable that problems can slip by&#8230; until it&#8217;s too late.</p>

<p style="text-align: left;"><a href="http://www.flickr.com/photos/jurvetson/1468055021/in/photostream/"><img class="alignleft" style="margin-left: 20px; margin-right: 20px;" title="Spontaneous Combustion" src="http://farm2.static.flickr.com/1397/1468055021_93696fa86b_d.jpg" alt="Spontaneous Combustion by Steve Jurvetson, licensed under the Creative Commons" width="273" height="375" /></a>The issue I&#8217;m concerned with right now is not about making <a href="http://www.lightboxtechnologies.com/2010/09/25/nefx-2010-presentation-on-lightgrep/" target="_self">lightgrep</a> fast, it&#8217;s about ensuring lightgrep&#8217;s quality. How do you test grep&#8230; for forensics? What if it misses hits and investigators find out? Worse yet: what if it misses hits, in a subtle manner, and no one notices?</p>
<p style="text-align: left;">User-level acceptance testing is a good start, where you run typical keywords on known evidence files and compare the results to other tools. This gives a basic level of assurance, but it hardly rules out the subtle bugs.</p>

<p style="text-align: left;">Better yet, develop <a href="http://www.extremeprogramming.org/rules/unittests.html">automated unit tests</a>. Good software has separate parts, and those parts can be tested in isolation. Lightgrep has unit tests covering its parser, analyzer, search engine, and even some serialization routines. Still, there&#8217;s a reason why people find regular expressions hard to learn; they express a lot of options in a handful of characters. It&#8217;s hard to write unit tests to cover every scenario.</p>

<p style="text-align: left;">But the nice thing about regular expressions is that they&#8217;re <em>regular</em>: they have a structure. This leads us to <a href="http://qtp.blogspot.com/2009/11/exhaustive-testing.html" target="_blank">exhaustive testing</a>: generate all possible inputs and verify they yield the expected output. The problem with exhaustive testing often is that, computationally, we could count all the stars in the sky before the tests completed.</p>

<p style="text-align: left;"><a href="http://www.flickr.com/photos/jurvetson/872758351/"><img class="alignright" title="Rocket's Red Glare" src="http://farm2.static.flickr.com/1063/872758351_2c17b1a2f1_d.jpg" alt="Rocket's Red Glare by Steve Jurvetson, licensed under the Creative Commons" width="252" height="375" /></a>In forensics, we deal with bytes, and a byte has 256 possible values. If we tried searching all possible regular expressions with all possible byte streams, we could maybe deal with 3-4 bytes of data before our CPUs melted. However, we can take a cue from genetics. DNA has only four types of bases. What if we didn&#8217;t test every possible byte and, instead, restricted ourselves to all possible regular expressions, up to a given length, using a very small set of allowed byte values (known as an &#8220;alphabet&#8221;)? Then the goal becomes much more feasible.</p>
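<p style="text-align: left;">To give a feel for the numbers: there are 256**4 = 4,294,967,296 distinct four-byte streams, but only 2**4 = 16 four-character strings over a two-letter alphabet. The Python sketch below enumerates patterns and test strings over such an alphabet. It is an assumption-laden illustration, not the generator used for lightgrep: &#8220;size&#8221; here counts literals plus operators, every sub-pattern is wrapped in a non-capturing group to keep precedence unambiguous, and the names <tt>patterns</tt> and <tt>texts</tt> are invented for the example.</p>

```python
import itertools
from functools import lru_cache

ALPHABET = "ab"  # the deliberately tiny alphabet


@lru_cache(maxsize=None)
def patterns(size):
    """All patterns built from exactly `size` literals and operators."""
    if size == 1:
        return tuple(ALPHABET)
    out = []
    # Unary operators applied to a pattern one node smaller.
    for sub in patterns(size - 1):
        for op in "*+?":
            out.append(f"(?:{sub}){op}")
    # Binary operators: split the remaining nodes between left and right.
    for left_size in range(1, size - 1):
        for left in patterns(left_size):
            for right in patterns(size - 1 - left_size):
                out.append(f"(?:{left})(?:{right})")   # concatenation
                out.append(f"(?:{left})|(?:{right})")  # alternation
    return tuple(out)


def texts(max_len):
    """Every string over ALPHABET of length 0 through max_len."""
    for n in range(max_len + 1):
        for chars in itertools.product(ALPHABET, repeat=n):
            yield "".join(chars)
```

<p style="text-align: left;">The counts grow quickly with size, but over two letters they stay small enough to run every pattern against every short string.</p>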

<p style="text-align: left;">Testing all forms of regular expressions on even a small alphabet puts any grep engine through its paces. That&#8217;s what happened to lightgrep this week, and the initial results are about what you&#8217;d expect. There&#8217;s work to do. But with <a title="the Perl Compatible Regular Expressions library" href="http://www.pcre.org/" target="_blank">PCRE</a> as a benchmark, lightgrep will get there.</p>
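<p style="text-align: left;">Comparing against a benchmark engine can be sketched as follows. This is a simplified stand-in, not the actual test harness: Python&#8217;s built-in <tt>re</tt> module plays the role of the reference (it is not PCRE), whole-string matching is used instead of searching for hits, and <tt>buggy_engine</tt> is a toy with a planted bug so that the comparison has something to find.</p>

```python
import itertools
import re


def reference(pattern, text):
    """Reference verdict: does the whole text match the pattern?"""
    return re.fullmatch(pattern, text) is not None


def disagreements(engine, patterns, alphabet, max_len):
    """List every (pattern, text) pair where engine and reference differ."""
    found = []
    for pattern in patterns:
        for n in range(max_len + 1):
            for chars in itertools.product(alphabet, repeat=n):
                text = "".join(chars)
                if engine(pattern, text) != reference(pattern, text):
                    found.append((pattern, text))
    return found


def buggy_engine(pattern, text):
    """Toy engine with a planted bug: it never matches the empty string."""
    return text != "" and reference(pattern, text)
```

<p style="text-align: left;">Running <tt>disagreements(buggy_engine, ["a*", "a+"], "ab", 2)</tt> reports only the pair of <tt>a*</tt> and the empty string, which is exactly the kind of subtle miss that typical keywords on typical evidence would never exercise.</p>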

<p style="text-align: left;">There&#8217;ll be a few more posts concerning the specifics of this testing procedure, getting deep into the weeds. In the meantime, it&#8217;s worth thinking about how problems in forensics can be reduced to make exhaustive testing feasible. And it&#8217;s worth thinking about how forensics tools are tested now, and whether that&#8217;s enough to get us where we need to go.</p>

<p style="text-align: left;"><a href="http://www.flickr.com/photos/gsfc/3549815873/"><img class="aligncenter" title="Not Lost in Space" src="http://farm4.static.flickr.com/3299/3549815873_d1076a42ca_d.jpg" alt="Not Lost in Space, NASA Goddard Photo and Video, licensed under the Creative Commons" width="246" height="375" /></a></p>]]></content:encoded>
			<wfw:commentRss>http://www.lightboxtechnologies.com/2010/10/27/speed-leads-quality-kills/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.lightboxtechnologies.com/2010/10/27/speed-leads-quality-kills/</feedburner:origLink></item>
		<item>
		<title>NeFX 2010 Presentation on lightgrep</title>
		<link>http://feedproxy.google.com/~r/LightboxTechnologies/~3/11WxYgh5btQ/</link>
		<comments>http://www.lightboxtechnologies.com/2010/09/25/nefx-2010-presentation-on-lightgrep/#comments</comments>
		<pubDate>Sat, 25 Sep 2010 18:09:39 +0000</pubDate>
		<dc:creator>Jon</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Conferences]]></category>
		<category><![CDATA[HomePage]]></category>
		<category><![CDATA[lightgrep]]></category>
		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://www.lightboxtechnologies.com/?p=67</guid>
		<description><![CDATA[I spent last Monday and Tuesday at NeFX, the ACM&#8217;s Northeast Forensic Exchange, held at Georgetown University. Thanks go to Clay Shields and Yong Guan for organizing it. There were a number of interesting talks, many about network forensics and covert channels that were well over my head. I particularly enjoyed talks by Nicole Beebe [...]]]></description>
			<content:encoded><![CDATA[<p>I spent last Monday and Tuesday at <a href="http://nefx.cs.georgetown.edu/program.html" target="_blank">NeFX</a>, the ACM&#8217;s Northeast Forensic Exchange, held at Georgetown University. Thanks go to <a href="http://www.cs.georgetown.edu/~clay/" target="_blank">Clay Shields</a> and <a href="http://home.engineering.iastate.edu/~guan/" target="_blank">Yong Guan</a> for organizing it. There were a number of interesting talks, many about network forensics and covert channels that were well over my head. I particularly enjoyed talks by <a href="http://faculty.business.utsa.edu/nbeebe/" target="_blank">Nicole Beebe</a> and <a href="http://www.afit.edu/directory/faclook.cfm?id=200" target="_blank">Gilbert Peterson</a>, which were a bit closer to my own interests.</p>

<p>I gave a brief talk on <em>lightgrep</em>, a new digital forensics grep tool I&#8217;ve been working on.</p>

<p>It was surprising to find that there wasn&#8217;t anything more sophisticated than &#8220;strings | grep&#8221; in the open source forensics world. Regular expression searching is useful not only for finding relevant text, but also for file signature analysis, artifact extraction, and file carving. Having a good grep tool that can search binary data for many keywords at once, as EnCase can, is important.</p>
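<p>As a small illustration of what &#8220;many keywords at once&#8221; over binary data means, the Python sketch below joins several byte keywords into one pattern and scans a buffer in a single pass. It uses Python&#8217;s standard <tt>re</tt> module purely as a stand-in and says nothing about how lightgrep or EnCase work internally; the function name <tt>search_all</tt> and the sample keywords are assumptions for the example, and overlapping hits are not reported.</p>

```python
import re


def search_all(keywords, data):
    """Return (offset, keyword) for each non-overlapping hit, in one pass."""
    # Escape each keyword so its bytes are taken literally, then join with |.
    combined = re.compile(b"|".join(re.escape(k) for k in keywords))
    return [(m.start(), m.group()) for m in combined.finditer(data)]


# Hypothetical buffer containing a ZIP local-file signature and a JFIF marker.
buffer = b"\x00\x00PK\x03\x04xxJFIF\xff"
hits = search_all([b"PK\x03\x04", b"JFIF"], buffer)
```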

<p><a href="/wp-content/uploads/2010/09/LightgrepPerformance.012.png"><img class="alignleft size-medium wp-image-78" title="LightgrepPerformance.012" src="/wp-content/uploads/2010/09/LightgrepPerformance.012-300x225.png" alt="" width="300" height="225" /></a></p>

<p>Regular expression search algorithms are well-known, and I thought it would be fun to see whether I could code up something halfway decent. The initial results were shocking, and reinforced in my mind that this is a problem worth pursuing.</p>

<p>Lightgrep still has some kinks that need to be worked out before release. If you&#8217;re willing to push the envelope, <a href="mailto:info@lightboxtechnologies.com">drop me a line</a> and I&#8217;ll contact you about beta testing.</p>

<p><a title="NeFX Lightgrep Slides" href="http://www.lightboxtechnologies.com/wp-content/uploads/2010/09/NeFX-Lightgrep-withNotes-Public.pdf">Lightgrep Slides for NeFX2010</a> (pdf)</p>]]></content:encoded>
			<wfw:commentRss>http://www.lightboxtechnologies.com/2010/09/25/nefx-2010-presentation-on-lightgrep/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.lightboxtechnologies.com/2010/09/25/nefx-2010-presentation-on-lightgrep/</feedburner:origLink></item>
	</channel>
</rss>

