<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Four Years Remaining</title>
	<atom:link href="https://fouryears.eu/feed/" rel="self" type="application/rss+xml" />
	<link>https://fouryears.eu</link>
	<description>Preparing the consequences</description>
	<lastBuildDate>Thu, 04 Jan 2024 23:45:53 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.9.8</generator>
	<item>
		<title>BreadboardBot</title>
		<link>https://fouryears.eu/2024/01/02/breadboardbot/</link>
		<comments>https://fouryears.eu/2024/01/02/breadboardbot/#respond</comments>
		<pubDate>Tue, 02 Jan 2024 10:15:05 +0000</pubDate>
		<dc:creator><![CDATA[Konstantin]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Fun]]></category>
		<category><![CDATA[How to]]></category>
		<category><![CDATA[Procrastination]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Project]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Robotics]]></category>
		<category><![CDATA[Teaching]]></category>

		<guid isPermaLink="false">http://fouryears.eu/?p=2437</guid>
		<description><![CDATA[This article has been cross-posted to Hackster. What is the best way to learn (or teach) introductory robotics? What could be the "hello world" project in this field, that would be simple for a beginner to get started with, complex enough to be exciting and qualify as a "robot", and deep enough to allow further creative [&#8230;]]]></description>
				<content:encoded><![CDATA[<blockquote><p>This article has been cross-posted to <a href="https://www.hackster.io/konstantint/breadboardbot-ec993d">Hackster</a>.</p></blockquote>
<p>What is the best way to learn (or teach) introductory robotics? What could be the "hello world" project in this field, that would be simple for a beginner to get started with, complex enough to be exciting and qualify as a "robot", and deep enough to allow further creative extensions and experiments? Let us try solving this problem logically, step by step!</p>
<h3>First things first: what is a "robot"?</h3>
<p>Axiom number one: for anything to feel like a robot, it must have some moving parts. Ideally, it could have the freedom move around, and probably the easiest way to achieve that would be to equip it with two wheels. Could concepts other than a two-wheeler qualify as the simplest robot? Perhaps, but let us proceed with this one as a reasonably justified assumption.</p>
<h3>The wheels</h3>
<p><img class="alignright wp-image-2441" src="http://fouryears.eu/wp-content/uploads/2024/01/geekservo.jpg" alt="" width="100" height="104" /></p>
<p>To drive the two upcoming wheels we will need two motors along with all that driver circuitry, ugh! But wait, what about those small <a href="https://en.wikipedia.org/wiki/Servo_(radio_control)#Continuous-rotation_servos">continuous rotation servos</a>! They are simple, cheap, and you just need to wire them to a single microcontroller pin. Some of them are even conveniently sold along with with a LEGO-style wheel!</p>
<h3>The microcontroller</h3>
<p>Rule <img class="alignleft wp-image-2448" src="http://fouryears.eu/wp-content/uploads/2024/01/Adafruit_blinka_angles-left.svg_.jpg" alt="" width="100" height="105" /> number two: it is not a real robot, unless it has a programmable brain, like a microcontroller. But which microcontroller is the best for a beginner in year 2024? Some fifteen years ago the answer would certainly have the "Arduino" keyword in it. Nowadays, I'd say, the magic keyword is "<a href="https://circuitpython.org/">CircuitPython</a>". Programming just cannot get any simpler than plugging it into your computer and editing <code>code.py</code> on the newly appeared external drive. No need for "compiling", "flashing" or any of the "install these serial port drivers to fix obscure error messages" user experiences (especially valuable if you need to give a workshop at a school computer class). Moreover, the code to control a wheel looks as straightforward as <code>motor.throttle = 0.5</code>.</p>
<p>OK, <img class="alignright wp-image-2451" src="http://fouryears.eu/wp-content/uploads/2024/01/102010428_Preview-07.jpg" alt="" width="150" height="113" /> CircuitPython it is, but what specific CircuitPython-compatible microcontroller should we choose? We need something cheap and well-supported by the manufacturer and the community. The <a href="https://www.raspberrypi.com/products/raspberry-pi-pico/">Raspberry Pi Pico</a> would be a great match due to its popularity, price and feature set, except it is a bit too bulky (as will become clear below). Check out the <a href="https://www.seeedstudio.com/xiao-series-page">Seeedstudio Xiao Series</a> instead. The Xiao RP2040 is built upon the same chip as the Raspberry Pi Pico, is just as cheap, but has a more compact size. Interestingly, as there is a whole series of pin-compatible boards with the same shape, we can later try other models if we do not like this one. Moreover, we are not even locked into a single vendor, as the <a href="https://www.adafruit.com/search?q=qt+py">QT Py series from Adafruit</a> is also sharing the same form factor. How cool is that? (I secretly hope this particular footprint would get more adoption by other manufacturers).</p>
<h3>The body</h3>
<p>OK, cool. We have our wheels and a microcontroller to steer them. We now need to build the body for our robot and wire things together somehow. What would be the simplest way to achieve <em>that</em>?</p>
<p>Let us start with the wiring, as here we know an obviously correct answer - a <a href="https://en.wikipedia.org/wiki/Breadboard">breadboard</a>. No simpler manual wiring technology has been invented so far, right? Which breadboard model and shape should we pick? This will depend on how we decide to build the frame for our robot, so let us put this question aside for a moment.</p>
<p>How do we build the frame? The options seem infinite: we can design it in a CAD tool and laser cut it from acrylic sheets, print on a 3D printer, use some profile rails, wooden blocks, LEGO bricks,... just ask Google for some inspiration:</p>
<p><img class="aligncenter wp-image-2453" src="http://fouryears.eu/wp-content/uploads/2024/01/FireShot-Capture-030-two-wheel-robot-platform-Google-Search-www.google.com_.jpg" alt="" width="400" height="225" /></p>
<p>Ugh, this starts to feel too complicated and possibly inaccessible to a normal schoolkid.... But wait, what is it there on the back of our breadboard? Is it sticky tape? Could we just... <em>stick...</em> our motors to it directly and use <em>the breadboard itself </em>as the frame?</p>
<p>If we now try out a few breadboard models, we will quickly discover that the widespread, "mini-sized", 170-pin breadboard has the perfect size to host our two servos:</p>
<p><img class="aligncenter wp-image-2456 size-full" src="http://fouryears.eu/wp-content/uploads/2024/01/assembly-servo-glue-2.jpg" alt="" width="400" height="311" /></p>
<p>By an extra lucky coincindence, the breadboard even has screw holes in the right places to properly attach the motors! If we now hot glue the servo connectors to the side as shown above, we can wire them comfortably, just as any other components on the breadboard itself. Finally, there is just enough space in the back to pack all the servo wires and stick a 3xAAA battery box, which happens to be enough to power both our servos and the Xiao RP2040:</p>
<div id="attachment_2458" style="width: 315px" class="wp-caption aligncenter"><img class="wp-image-2458" src="http://fouryears.eu/wp-content/uploads/2024/01/assembly-battery-stick-1.jpg" alt="" width="305" height="400" /><p class="wp-caption-text">This picture shows the FS90R servos. These work just as well as the GeekServo ones shown in in the picture above.</p></div>
<p>... and thus, the BreadboardBot is born! Could a breadboard with two motors and a battery stuck to its back qualify as the simplest, lowest-tech ever robot platform? I think it could. Note that the total cost of this whole build, including the microcontroller is under $20 (under $15 if you order at the right discounts from the right places).</p>
<p>But what can it do?</p>
<h3>Line following</h3>
<p>Rule number 3: it is not a real robot if it is not able to sense the environment and react to it somehow. Hence, we need some sensors, preferably ones that would help us steer the robot (wheels are our only "actuators" so far). What are the simplest steering techniques in introductory robotics? I'd say <em>line following</em> and <em>obstacle avoidance</em>!</p>
<p>Let us do line following first then. Oh, but how do we attach the line tracking sensors, we should have built some complicated frame after all, right? Nope, by a yet another happy design coincidence, we can just plug them right here into our breadboard:</p>
<p><img class="aligncenter wp-image-2461" src="http://fouryears.eu/wp-content/uploads/2024/01/example-linefollower.jpg" alt="" width="358" height="400" /></p>
<p class="hckui__typography__bodyL">Thanks to the magic of CircuitPython, all it takes now to teach our robot to (even if crudely) follow a line is just three lines of code (boilerplate definitions excluded):</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python">while True:
  servo_left.throttle = 0.1 * line_left.value
  servo_right.throttle = -0.1 * line_right.value</pre>
<p><iframe style="margin: 0 auto; display: block;" title="YouTube video player" src="https://www.youtube.com/embed/DHixDNT65I0?si=r3HCm8pSzSd9DuwK" width="400" height="225" frameborder="0" allowfullscreen="allowfullscreen"><span data-mce-type="bookmark" style="display: inline-block; width: 0px; overflow: hidden; line-height: 0;" class="mce_SELRES_start">﻿</span></iframe></p>
<h3>Obstacle avoidance</h3>
<p>OK, a line follower implemented with a three-line algorithm is cool, can we do anything else? Sure, look how conveniently the ultrasonic distance sensor fits onto our breadboard:</p>
<p><img class="aligncenter wp-image-2466" src="http://fouryears.eu/wp-content/uploads/2024/01/example-sonar.jpg" alt="" width="400" height="329" /></p>
<p>Add a few more lines of uncomplicated Python code and you get a shy robot that can follow a line but will also happily go roaming around your room, steering away from any walls or obstacles:<br />
<iframe style="margin: 0 auto; display: block;" title="YouTube video player" src="https://www.youtube.com/embed/cBlWqL9eYWU?si=iurXeOPBzQZmXa8O" width="400" height="225" frameborder="0" allowfullscreen="allowfullscreen"></iframe></p>
<p>Note how you can hear the sonar working in the video (not normally audible).</p>
<h3>... and more</h3>
<p>There is still enough space on the breadboard to add a button and a buzzer:</p>
<p><iframe style="margin: 0 auto; display: block;" title="YouTube video player" src="https://www.youtube.com/embed/FUKQwfPJc14?si=zGZOh1gf3iaF5yWN" width="400" height="225" frameborder="0" allowfullscreen="allowfullscreen"></iframe></p>
<p>.. or splash some color with a <a href="https://www.seeedstudio.com/6x10-RGB-MATRIX-for-XIAO-p-5771.html">Xiao RGB matrix</a>:</p>
<p><iframe style="margin: 0 auto; display: block;" width="400" height="225" src="https://www.youtube.com/embed/A6sq72KMCfU?si=cA1cYcfP2DvAod_k" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe></p>
<p>.. or plug in an OLED screen to give your BreadboardBot a funny animated face:<br />
<img class="aligncenter wp-image-2467" src="http://fouryears.eu/wp-content/uploads/2024/01/example-oled-dht11.jpg" alt="" width="365" height="400" />The one on the image above also had the <a href="https://www.keyestudio.com/free-shipping-keyestudio-dht11-temperature-humidity-moisture-sensor-detection-module-for-arduino-p0374-p0374.html">DHT11</a> sensor plugged into it, so it shows humidity and temperature while it is doing its line following. Why? Because it can!</p>
<p>Tired of programming and just want a remote-controllable toy? Sure, plug in a <a href="https://components101.com/wireless/hc-06-bluetooth-module-pinout-datasheet">HC06</a> bluetooth serial module, program one last time, and go steer the robot yourself with a smartphone:</p>
<p><img class="aligncenter wp-image-2468" src="http://fouryears.eu/wp-content/uploads/2024/01/example-oled-bt.jpg" alt="" width="360" height="554" /></p>
<h3>Other microcontroller boards, BLE, Wifi and FPV</h3>
<p>Xiao RP2040 is great, but it has no built-in wireless connectivity! Plug in its ESP32-based cousin <a href="https://www.seeedstudio.com/XIAO-ESP32S3-p-5627.html">Xiao ESP32S3</a> and you can steer your robot with a Bluetooth gamepad, or via Wifi, by opening a webpage served by the robot itself!<br />
If you have the <a href="https://www.seeedstudio.com/XIAO-ESP32S3-Sense-p-5639.html">Xiao ESP32S3 Sense</a> (the version with a microphone and camera), that webpage can even show you the live first-person view feed of the robot's camera (conveniently positioned to look forward):</p>
<p><img class="aligncenter size-full wp-image-2469" src="http://fouryears.eu/wp-content/uploads/2024/01/example-webfpv.jpg" alt="" width="400" height="351" /></p>
<p><img class="aligncenter wp-image-2470" src="http://fouryears.eu/wp-content/uploads/2024/01/example-webfpv-2.jpg" alt="" width="256" height="400" /></p>
<p>In case you were wondering, the resulting FPV bot is not a very practical solution for spying on your friends, as the microcontroller heats up a lot and consumes the battery way too fast, but it is still a fun toy and an educational project.</p>
<p>Finally, although the Xiao boards fit quite well with all other gadgets on the mini breadboard you are not limited by that choice either and can try plugging in any other appropriately-sized microcontroller board. Here is, for example, a BreadboardBot with an <a href="https://shop.m5stack.com/products/atom-lite-esp32-development-kit">M5Stack Atom Lite</a> running a line-follower program implemented as an <a href="https://github.com/konstantint/BreadboardBot/blob/main/code/esphome/m5atom_line_follower.yaml">ESPHome config</a> (which you can flash over the air!).</p>
<h3><iframe style="margin: 0 auto; display: block;" title="YouTube video player" src="https://www.youtube.com/embed/XpShH5Vs8lY?si=-BP8W1f4Qf9AP6OY" width="400" height="225" frameborder="0" allowfullscreen="allowfullscreen"></iframe><br />
Conclusion</h3>
<p>The examples above are just the tip of the iceberg. Have any other sensors lying around from those "<a href="https://www.amazon.com/Alician-Arduino-Sensors-Modules-MEGA2560/dp/B08BZN4LV7">getting started with Arduino</a>" kits that you did not find sufficiently exciting uses for so far? Maybe plugging them into the BreadboardBot is the chance to shine they were waiting for all this time! Teaching the robot to recognize familiar faces and follow them? Solving a maze? Following voice commands? Self-balancing? Programming coordinated movement of multiple robots? Interfacing the robot with your smart home or one of these fancy AI chatbots? Organizing competitions among multiple BreadboardBots? Vacuuming your desk? The sky (and your free time) is the limit here!</p>
<p>Consider making one and spreading the joy of <em>breadboard robotics</em>!</p>
<p>Detailed build instructions along with wiring diagrams and example code are available on the project's <a href="http://konstantint.github.io/BreadboardBot">Github page</a>. Contributions are warmly welcome!</p>
]]></content:encoded>
			<wfw:commentRss>https://fouryears.eu/2024/01/02/breadboardbot/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Interning of Small Integers in Python</title>
		<link>https://fouryears.eu/2019/10/21/interning-of-small-integers-in-python/</link>
		<comments>https://fouryears.eu/2019/10/21/interning-of-small-integers-in-python/#respond</comments>
		<pubDate>Sun, 20 Oct 2019 21:00:56 +0000</pubDate>
		<dc:creator><![CDATA[Konstantin]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Computer science]]></category>
		<category><![CDATA[Fun]]></category>
		<category><![CDATA[Hacks]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://fouryears.eu/?p=2406</guid>
		<description><![CDATA[Can't help sharing this lovely example, illustrating the way Python interns small integers, the lower level use of the id function as well as the power of the ctypes module. import ctypes def deref(addr, type_): return ctypes.cast(addr, ctypes.POINTER(type_)) deref(id(29), ctypes.c_int)[4] = 100 &#62; 29 100 &#62; 29*100 10000 &#62; 25+4 100 Note that depending on [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Can't help sharing <a href="https://www.reddit.com/r/Python/comments/2441cv/can_you_change_the_value_of_1/" target="_blank" rel="noopener">this</a> lovely example, illustrating the way Python <a href="https://en.wikipedia.org/wiki/String_interning" target="_blank" rel="noopener">interns</a> small integers, the lower level use of the <code class="EnlighterJSRAW" data-enlighter-language="python">id</code> function as well as the power of the <code class="EnlighterJSRAW" data-enlighter-language="python">ctypes</code> module.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python">import ctypes
def deref(addr, type_):
    return ctypes.cast(addr, ctypes.POINTER(type_))

deref(id(29), ctypes.c_int)[4] = 100</pre>
<pre class="EnlighterJSRAW" data-enlighter-language="python">&gt; 29
100

&gt; 29*100
10000

&gt; 25+4
100</pre>
<p>Note that depending on the version of Python the value of the integer may be stored at either offset [4] or [6].</p>
]]></content:encoded>
			<wfw:commentRss>https://fouryears.eu/2019/10/21/interning-of-small-integers-in-python/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to Prepare A Good Presentation</title>
		<link>https://fouryears.eu/2019/02/11/how-to-prepare-a-good-presentation/</link>
		<comments>https://fouryears.eu/2019/02/11/how-to-prepare-a-good-presentation/#comments</comments>
		<pubDate>Sun, 10 Feb 2019 22:35:53 +0000</pubDate>
		<dc:creator><![CDATA[Konstantin]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[How to]]></category>
		<category><![CDATA[Teaching]]></category>

		<guid isPermaLink="false">http://fouryears.eu/?p=2361</guid>
		<description><![CDATA[Presenting is hard. Although I have had the opportunity to give hundreds of talks and lectures on various topics and occasions by now, preparing every new presentation still takes me a considerable amount of effort. I have had a fair share of positive feedback, though, and have developed a small set of principles which, I [&#8230;]]]></description>
				<content:encoded><![CDATA[<p><img class="alignright wp-image-2395" src="http://fouryears.eu/wp-content/uploads/2019/02/video-recorder-clipart-video-presentation-816111-3908096-300x162.png" alt="Presentation Clipart" width="240" height="130" srcset="https://fouryears.eu/wp-content/uploads/2019/02/video-recorder-clipart-video-presentation-816111-3908096-300x162.png 300w, https://fouryears.eu/wp-content/uploads/2019/02/video-recorder-clipart-video-presentation-816111-3908096.png 626w" sizes="(max-width: 240px) 100vw, 240px" />Presenting is hard. Although I have had the opportunity to give hundreds of talks and lectures on various topics and occasions by now, preparing every new presentation still takes me a considerable amount of effort. I have had a fair share of positive feedback, though, and have developed a small set of principles which, I believe, are key to preparing (or at least learning to prepare) good presentations. Let me share them with you.</p>
<h3>1. Start off without slides</h3>
<p><strong>Plan your talk as if you had to give it using only a blackboard</strong>. Think about <em>what</em> to say and <em>how</em> to say it so that the listener would be interested to <em>look at you</em> and <em>listen to you</em>. You will then discover moments in your speech, where you need to write or draw something on the blackboard. <em>These will be your slides</em>. Remember, that your slides are only there to <em>enhance</em> and <em>illustrate</em> your presentation. <i>Your speech </i>is its main component.</p>
<p>Of course, preparing slides can be a very convenient way to <em>plan</em> your talk, but try not to fall into the popular trap, where your slides end up <em>being the talk</em>, whilst your own role boils down to basically "playing them back" in front of the audience, so that one may start wondering why are you even there.</p>
<p>When you are done with your slides, a useful practice is to go over them and <em>remove all of the text</em>, besides the parts which you would <em>really have to scribble on the blackboard</em>, if the slides were not available. For any piece of text you currently have on the slide, ask yourself, <em>what is its purpose</em>. Usually there are just two possible answers:</p>
<ol>
<li>It is there to help <em>the listener. </em>In this case, what you usually need instead is an <em>illustration. </em>An actual chart, photo or a schematic cartoon, providing the listener with a useful visual aid. Yes, plain text can also be a visual aid sometimes, but it should be your last resort.</li>
<li>Alternatively, you put it there only to help <em>yourself</em> - as a reminder of what to speak about next or what specific wording to use. Delete such texts completely and put some effort into <i>memorizing </i>the talk instead. You can make a paper cheatsheet to peek at during the trickier spots. In general, however, if you find it hard to navigate or memorize your talk, perhaps you should have <em>organized</em> it better in the first place. Which leads us to the second rule:</li>
</ol>
<h3>2. Structure around questions</h3>
<p><strong>Imagine that you are constrained to explain your point <em>by only asking questions to the audience</em>.</strong> The audience would effectively have to present the topic "by themselves" by answering these questions in order. Surprisingly many seemingly complicated topics may be explained <a href="http://kt.era.ee/eio/pakkimine.pdf" target="_blank" rel="noopener">in that manner</a>. Of course, you do not need to literally perform the talk by only querying the audience (although I would suggest you try this format at least once - it can be rather fun). The idea is that the imaginary order of questions that need to be asked, <em>is the best order of exposition</em> for your topic. It is this precise order, in which most new statements tend to follow logically from the previous ones not just for you, but also for the audience, who will thus find it easier to track your talk.</p>
<h3>3. Presenting is show business</h3>
<p><strong>Remember that every presentation is a show.</strong> Your goal is not to explain something in the <em>most precise manner</em>, but to do it in the most <em>interesting</em> manner. There is a caveat, of course: the notion of "interestingness" depends on your audience. Presentations made for schoolkids, students, professors and grandmothers differ not because they need to be told <em>different things</em>, but because you need to tell them <em>the same things in different ways</em>.<em> </em>Thus<em>, always tailor your talk to the audience.</em></p>
<p>One basic rule which fits most audiences, though, is the following: if you <em>may</em> include interactivity in your talk (i.e. if the format and formalities allow for that), <em>do so</em>. The most basic element of interactivity in a typical presentation would be <em>a question to the audience</em>. There are two main kinds of questions:</p>
<ol>
<li>"Leading questions", where you ask the audience to explain the concepts on their own (see part 2 above).</li>
<li>"Quiz questions", for checking whether the listener is still on board. This can provide you with real-time feedback and let you know when you need to slow down or speed up.</li>
</ol>
<p>Smart combination of these two kinds of interactions would nearly always bump the interestingness of your talk up a notch. It is not enough to know <em>what</em> and <em>when</em> to ask, however. The choice of <em>who to direct your questions to </em>is just as important. Try to avoid the widely spread practice of simply throwing questions into the audience "in general" and giving word to whoever jumps up first. This would often engage just the few most extraverted or excited listeners, leaving the rest feeling a bit like outsiders (although <a href="https://www.mentimeter.com/" target="_blank" rel="noopener">semi-anonymous surveys</a> can be fun sometimes). Unfortunately, communicating with the audience is an art which requires a lot of practice and failed attempts to master. While you are not there yet, let me suggest one simple recipe, which I found to be unexpectedly useful for student audiences - <em>the talking pillow game</em>. Pick an object (not necessarily a pillow) and give it to a random listener. That listener now has to answer your next question and pass the object on to their neighbor.</p>
<p>Of course, asking questions is just the most basic way of introducing useful interaction. Some concepts can be explained in the format of an interactive demo. Try devising one, whenever you can. Keep your listeners awake at all costs.</p>
<h3>4. Experiment a lot</h3>
<p><strong>Remember that every presentation is a chance to try something new.</strong> Feel free to experiment with the various styles and formats every time (this is especially true while you are a student and "failing" a yet-another seminar talk is not really a big deal). Try practicing blackboard-only talks. Do a talk where you only ask questions. Try making a talk where you have no text on the slides at all. Make your talk into an interactive workshop. Experiment with the <a href="http://fouryears.eu/2016/07/27/slide-a-maze/" target="_blank" rel="noopener">slide types and formats</a>. Try sitting/standing/walking during the talk. Dance and sing (if you're up to). Trying is the only way to figure out what works and what does not work for you.  In any case, carefully rehearse all the elements of your talk which do not yet come naturally. For the first 100 times this would mean rehearsing all of the talk.</p>
<h3>5. Film yourself</h3>
<p><strong>Always ask someone to film your presentation.</strong> Watching yourself is the only way to discover that you are holding your hands strangely, not looking into the audience, swinging back and forth, closing your eyes, and so on. Watching yourself is often a painfully embarrassing, but a pretty much mandatory procedure, if you want to get better.</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>https://fouryears.eu/2019/02/11/how-to-prepare-a-good-presentation/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The Data Science Workflow</title>
		<link>https://fouryears.eu/2018/11/29/the-data-science-workflow/</link>
		<comments>https://fouryears.eu/2018/11/29/the-data-science-workflow/#comments</comments>
		<pubDate>Thu, 29 Nov 2018 19:00:40 +0000</pubDate>
		<dc:creator><![CDATA[Konstantin]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Data science]]></category>
		<category><![CDATA[Explanation]]></category>
		<category><![CDATA[How to]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Software devlopment]]></category>
		<category><![CDATA[Survey]]></category>

		<guid isPermaLink="false">http://fouryears.eu/?p=2217</guid>
		<description><![CDATA[By popular suggestion this text was cross-posted to Medium. Suppose you are starting a new data science project (which could either be a short analysis of one dataset, or a complex multi-year collaboration). How should your organize your workflow? Where do you put your data and code? What tools do you use and why? In [&#8230;]]]></description>
				<content:encoded><![CDATA[<blockquote><p>By popular suggestion this text was cross-posted <a href="https://medium.com/story/the-data-science-workflow-43859db0415" target="_blank" rel="noopener">to Medium</a>.</p></blockquote>
<p>Suppose you are starting a new data science project (which could either be a short analysis of one dataset, or a complex multi-year collaboration). How should your organize your workflow? Where do you put your data and code? What tools do you use and why? In general, what should you think about before diving head first into your data? In the software engineering industry such questions have some commonly known answers. Although every software company might have its unique traits and quirks, the core processes in most of them are based on the same established principles, practices and tools. These principles are described in textbooks and taught in universities.</p>
<p>Data science is a less mature industry, and things are different. Although you can find a variety of <a href="https://drivendata.github.io/cookiecutter-data-science/" target="_blank" rel="noopener">template projects</a>, <a href="https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000424" target="_blank" rel="noopener">articles</a>, <a href="https://towardsdatascience.com/structure-and-automated-workflow-for-a-machine-learning-project-2fa30d661c1e" target="_blank" rel="noopener">blo</a>g<a href="https://medium.com/datadriveninvestor/the-data-science-method-dsm-a-framework-on-how-to-take-your-data-science-projects-to-the-next-91f9fd81e5d1" target="_blank" rel="noopener">po</a>s<a href="https://medium.com/thelaunchpad" target="_blank" rel="noopener">ts</a>, <a href="https://stackoverflow.com/questions/3759723/best-way-to-organize-bioinformatics-projects" target="_blank" rel="noopener">discussions</a>, or <a href="https://docs.google.com/spreadsheets/u/1/d/1ZHLzlMdBEHDorCUEeeiSx3WAgfLGw34pC08C9T0IMkE/edit?usp=sharing" target="_blank" rel="noopener">specialized platforms</a> (open-source [<a href="https://github.com/mitdbg/modeldb" target="_blank" rel="noopener">1</a>,<a href="https://dvc.org/" target="_blank" rel="noopener">2</a>,<a href="https://github.com/datmo/datmo" target="_blank" rel="noopener">3</a>,<a href="https://www.mlflow.org/" target="_blank" rel="noopener">4</a>,<a href="https://github.com/TensorLab/tensorfx" target="_blank" rel="noopener">5</a>,<a href="https://github.com/IDSIA/sacred" target="_blank" rel="noopener">6</a>,<a href="https://github.com/instacart/lore" target="_blank" rel="noopener">7</a>,<a href="https://pythonhosted.org/Sumatra/" target="_blank" rel="noopener">8</a>,<a href="http://cknowledge.org/" target="_blank" rel="noopener">9</a>,<a href="https://kaixhin.github.io/FGLab/" target="_blank" rel="noopener">10</a>], commercial [<a href="https://hyperdash.io/" target="_blank" rel="noopener">11</a>,<a href="https://www.dataiku.com/" target="_blank" rel="noopener">12</a>,<a href="https://quiltdata.com/" target="_blank" rel="noopener">13</a>,<a href="https://neptune.ml/" target="_blank" rel="noopener">14</a>,<a href="http://www.pachyderm.io/" target="_blank" rel="noopener">15</a>,<a href="https://www.floydhub.com/" target="_blank" rel="noopener">16</a>,<a href="https://www.comet.ml/" target="_blank" rel="noopener">17</a>] and in-house [<a href="https://code.fb.com/ml-applications/introducing-fblearner-flow-facebook-s-ai-backbone/" target="_blank" rel="noopener">18</a>,<a href="https://www.tensorflow.org/tfx/" target="_blank" rel="noopener">19</a>,<a href="https://eng.uber.com/michelangelo/" target="_blank" rel="noopener">20</a>]) to help you organize various parts of your workflow, there is no textbook yet to provide universally accepted answers. Every data scientist eventually develops their personal preferences, mostly learned from experience and mistakes. I am no exception. Over time I have developed my understanding of what is a typical "data science project", how it should be structured, what tools to use, and what should be taken into account. I would like to share my vision in this post.</p>
<h3>The workflow</h3>
<p>Although data science projects can range widely in terms of their aims, scale, and technologies used, at a certain level of abstraction most of them could be implemented as the following workflow:</p>
<p><a href="http://fouryears.eu/wp-content/uploads/2018/11/pipeline.png"><img class="aligncenter wp-image-2235" src="http://fouryears.eu/wp-content/uploads/2018/11/pipeline-300x118.png" alt="Data science project workflow" width="500" height="196" srcset="https://fouryears.eu/wp-content/uploads/2018/11/pipeline-300x118.png 300w, https://fouryears.eu/wp-content/uploads/2018/11/pipeline-768x301.png 768w, https://fouryears.eu/wp-content/uploads/2018/11/pipeline-1024x401.png 1024w" sizes="(max-width: 500px) 100vw, 500px" /></a></p>
<p>Colored boxes denote the key processes while icons are the respective inputs and outputs. Depending on the project, the focus may be on one process or another. Some of them may be rather complex while others trivial or missing. For example, scientific data analysis projects would often lack the "Deployment" and "Monitoring" components. Let us now consider each step one by one.</p>
<h3>Source data access</h3>
<p>Whether you are working on the human genome or playing with <a href="https://en.wikipedia.org/wiki/Iris_flower_data_set" target="_blank" rel="noopener"><code>iris.csv</code></a>, you typically have some notion of "raw source data" you start your project with. It may be a directory of <code>*.csv</code> files, a table in an SQL server or a HDFS cluster. The data may be fixed, constantly changing, automatically generated or streamed. It could be stored locally or in the cloud. In any case, your first step is to <em>define access to the source data</em>. Here are some examples of how this may look like:</p>
<ul>
<li>Your source data is provided as a set of <code>*.csv</code> files. You follow the <a href="https://drivendata.github.io/cookiecutter-data-science/" target="_blank" rel="noopener">cookiecutter-data-science</a> approach, make a <code>data/raw</code> subdirectory in your project's root folder, and put all the files there. You create the <code>docs/data.rst</code> file, where you describe the meaning of your source data. (Note: Cookiecutter-DataScience template actually recommends <code>references/</code> as the place for data dictionaries, while I pesonally prefer <code>docs</code>. Not that it matters much).</li>
<li>Your source data is provided as a set of <code>*.csv</code> files. You set up an SQL server, create a schema named <code>raw</code> and import all your CSV files as separate tables. You create the <code>docs/data.rst</code> file, where you describe the meaning of your source data as well as the location and access procedures for the SQL server.</li>
<li>Your source data is a messy collection of genome sequence files, patient records, Excel files and Word documents, which may later grow in unpredicted ways. In addition, you know that you will need to query several external websites to receive extra information. You create an SQL database server in the cloud and import most of the tables from Excel files there. You create the <code>data/raw</code> directory in your project, put all the huge genome sequence files into the <span style="color: #222222; font-family: monospace;"><span style="background-color: #e9ebec;">dna</span></span> subdirectory. Some of the Excel files were too dirty to be imported into a database table, so you store them in <code>data/raw/unprocessed</code> directory along with the Word files. You create an Amazon S3 bucket and push your whole <code>data/raw</code> directory there using <a href="https://dvc.org/" target="_blank" rel="noopener">DVC</a>. You create a Python package for querying the external websites. You create the <code>docs/data.rst</code> file, where you specify the location of the SQL server, the S3 bucket, the external websites, describe how to use DVC to download the data from S3 and the Python package to query the websites. You also describe, to the best of your understanding, the meaning and contents of all the Excel and Word files as well as the procedures to be taken when new data is added.</li>
<li>Your source data consists of constantly updated website logs. You set up the <a href="https://www.elastic.co/elk-stack" target="_blank" rel="noopener">ELK stack</a> and configure the website to stream all the new logs there. You create <code>docs/data.rst</code>, where you describe the contents of the log records as well as the information needed to access and configure the ELK stack.</li>
<li>Your source data consists of 100'000 colored images of size 128x128. You put all the images together into a single tensor of size 100'000 x 128 x 128 x 3 and save it in an HDF5 file <code>images.h5</code>. You create a <a href="https://quiltdata.com/" target="_blank" rel="noopener">Quilt</a> data package and push it to your private Quilt repository. You create the <code>docs/data.rst</code> file, where you describe that in order to use the data it must first be pulled into the workspace via <code>quilt install mypkg/images</code> and then imported in code via <code>from quilt.data.mypkg import images</code>.</li>
<li>Your source data is a simulated dataset. You implement the dataset generation as a Python class and document its use in a <code>README.txt</code> file.</li>
</ul>
<p>In general, remember the following rules of thumb when setting up the source data:</p>
<ol>
<li>Whenever you <em>can</em> meaningfully store your source data in a conveniently queryable/indexable form (an SQL database, the ELK stack, an HDF5 file or a <a href="http://www.rasdaman.org/" target="_blank" rel="noopener">raster database</a>), you <em>should</em> do it. Even if your source data is a single <code>csv</code> and you are reluctant to set up a server, do yourself a favor and import it into an SQLite file, for example. If your data is nice and clean, it can be as simple as:
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="enlighter">import sqlalchemy as sa
import pandas as pd
e = sa.create_engine("sqlite:///raw_data.sqlite")
pd.read_csv("raw_data.csv").to_sql("raw_data", e)</pre>
</li>
<li>If you work in a team, make sure the data is easy to share. Use an NFS partition, an S3 bucket, a Git-LFS repository, a Quilt package, etc.</li>
<li>Make sure your source data is always <em>read-only</em> and you have a <em>backup copy</em>.</li>
<li>Take your time to <em>document</em> the meaning of all of your data as well as its location and access procedures.</li>
<li>In general, take this step very seriously. Any mistake you make here, be it an invalid source file, a misunderstood feature name, or a misconfigured server may waste you a lot of time and effort down the road.</li>
</ol>
<h3>Data processing</h3>
<p>The aim of the <em>data processing </em>step is to turn the source data into a "clean" form, suitable for use in the following <em>modeling</em> stage. This "clean" form is, in most cases, a table of features, hence the gist of "data processing" often boils down to <a href="https://www.logicalclocks.com/feature-store/" target="_blank" rel="noopener">various forms</a> of <em>feature engineering</em>. The core requirements here are to ensure that the feature engineering logic is <em>maintainable</em>, the target datasets are <em>reproducible </em>and, sometimes, that the whole pipeline is <em>traceable</em> to the source representations (otherwise you would not be able to deploy the model). All these requirements can be satisfied, if the processing is organized in an explicitly described <em>computation graph</em>. There are different possibilities for implementing this graph, however. Here are some examples:</p>
<ul>
<li>You follow the <a href="https://drivendata.github.io/cookiecutter-data-science/" target="_blank" rel="noopener">cookiecutter-data-science</a> route and <a href="https://towardsdatascience.com/structure-and-automated-workflow-for-a-machine-learning-project-2fa30d661c1e" target="_blank" rel="noopener">use Makefiles</a> to describe the computation graph. Each step is implemented in a script, which takes some data file as input and produces a new data file as output, which you store in the <code>data/interim</code> or <code>data/processed</code> subdirectories of your project. You enjoy easy parallel computation via <code>make -j &lt;njobs&gt;</code>.</li>
<li>You rely on <a href="https://dvc.org/" target="_blank" rel="noopener">DVC</a> rather than Makefiles to describe and execute the computation graph. The overall procedure is largely similar to the solution above, but you get some extra convenience features, such as easy sharing of the resulting files.</li>
<li>You use <a href="https://github.com/spotify/luigi" target="_blank" rel="noopener">Luigi</a>, <a href="https://airflow.apache.org" target="_blank" rel="noopener">Airflow</a> or <a href="https://github.com/meirwah/awesome-workflow-engines" target="_blank" rel="noopener">any other</a> dedicated <a href="https://en.wikipedia.org/wiki/Workflow_management_system" target="_blank" rel="noopener">workflow management system</a> instead of Makefiles to describe and execute the computation graph. Among other things this would typically let you observe the progress of your computations on a fancy web-based dashboard, integrate with a computing cluster's job queue, or provide some other tool-specific benefits.</li>
<li>All of your source data is stored in an SQL database as a set of tables. You implement all of the feature extraction logic in terms of <a href="https://en.wikipedia.org/wiki/View_(SQL)" target="_blank" rel="noopener">SQL views</a>. In addition, you use SQL views to describe the <i>samples </i>of objects. You can then use these feature- and sample-views to create the final modeling datasets using auto-generated queries like
<pre class="EnlighterJSRAW" data-enlighter-language="sql">select
   s.Key
   v1.AverageTimeSpent,
   v1.NumberOfClicks,
   v2.Country
   v3.Purchase as Target
from vw_TrainSample s
left join vw_BehaviourFeatures v1 on v1.Key = s.Key
left join vw_ProfileFeatures v2 on v2.Key = s.Key
left join vw_TargetFeatures v3 on v3.Key = s.Key</pre>
<p>This particular approach is extremely versatile, so let me expand on it a bit. Firstly, it lets you keep track of all the currently defined features easily without having to store them in huge data tables - the feature definitions are only kept as code until you actually query them. Secondly, the deployment of models to production becomes rather straightforward - assuming the live database uses the same schema, you only need to copy the respective views. Moreover, you may even compile all the feature definitions into a single query along with the final model prediction computation using a sequence of CTE statements:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="sql">with _BehaviourFeatures as (
 ... inline the view definition ...
),
_ProfileFeatures as (
 ... inline the view definition ...
),
_ModelInputs as (
 ... concatenate the feature columns ...
)
select
     Key,
     1/(1.0 + exp(-1.2 + 2.1*Feature1 - 0.2*Feature3)) as Prob
from _ModelInputs</pre>
<p>This technique has been implemented in one in-house data science workbench tool of my design (not publicly available so far, unfortunately) and provides a very streamlined workflow.</p>
<p><div id="attachment_2239" style="width: 460px" class="wp-caption aligncenter"><img class="wp-image-2239" src="http://fouryears.eu/wp-content/uploads/2018/11/Adam.png" alt="Example of an SQL-based feature engineering pipeline" width="450" height="271" srcset="https://fouryears.eu/wp-content/uploads/2018/11/Adam.png 637w, https://fouryears.eu/wp-content/uploads/2018/11/Adam-300x181.png 300w" sizes="(max-width: 450px) 100vw, 450px" /><p class="wp-caption-text">Example of an SQL-based feature engineering pipeline</p></div></li>
</ul>
<p>No matter which way you choose, keep these points in mind:</p>
<ol>
<li>Always organize the processing in the form of a <em>computation graph</em> and keep <em>reproducibility</em> in mind.</li>
<li>This is the place where you have to think about the compute infrastructure you may need. Do you plan to run long computations? Do you need to parallelize or rent a cluster? Would you benefit from a job queue with a management UI for tracking task execution?</li>
<li>If you plan to deploy the models into production later on, make sure your system will support this use case out of the box. For example, if you are developing a model to be included in a Java Android app, yet you prefer to do your data science in Python, one possibility for avoiding a <em>lot</em> of hassle down the road would be to express all of your data processing in a specially designed DSL rather than free-from Python. This DSL may then be translated into Java or an intermediate format like PMML.</li>
<li>Consider storing some metadata about your designed features or interim computations<em>. </em>This does not have to be complicated - you can save each feature column to a separate file, for example, or use Python function annotations to annotate each feature-generating function with a list of its outputs. If your project is long and involves several people designing features, having such a registry may end up quite useful.</li>
</ol>
<h3>Modeling</h3>
<p>Once you have done cleaning your data, selecting appropriate samples and engineering useful features, you enter the realm of <em>modeling</em>. In some projects all of the modeling boils down to a single <code>m.fit(X,y)</code> command or a click of a button. In others it may involve weeks of iterations and experiments. Often you would start with modeling way back in the "feature engineering" stage, when you decide that outputs of one model make for great features themselves, so the actual boundary between this step and the previous one is vague. Both steps should be <em>reproducible </em>and must make part of your <em>computation graph</em>. Both revolve around computing, sometimes involving job queues or clusters. None the less, it still makes sense to consider the modeling step separately, because it tends to have a special need: <em>experiment management</em>. As before, let me explain what I mean by example.</p>
<ul>
<li>You are training models for classifying Irises in the <code>iris.csv</code> dataset. You need to try out ten or so standard <code>sklearn</code> models, applying each with a number of different parameter values and testing different subsets of your handcrafted features. You do not have a proper computation graph or computing infrastructure set up - you just work in a single Jupyter notebook. You make sure, however, that the results of all training runs are saved in separate pickle files, which you can later analyze to select the best model.</li>
<li>You are designing a neural-network-based model for image classification. You use <a href="http://modeldb.csail.mit.edu:3000/projects/" target="_blank" rel="noopener">ModelDB</a> (or an alternative experiment management tool, such as <a href="https://www.tensorflow.org/guide/summaries_and_tensorboard" target="_blank" rel="noopener">TensorBoard</a>, <a href="https://github.com/IDSIA/sacred" target="_blank" rel="noopener">Sacred</a>, <a href="https://kaixhin.github.io/FGLab/" target="_blank" rel="noopener">FGLab</a>, <a href="https://hyperdash.io/" target="_blank" rel="noopener">Hyperdash</a>, <a href="https://www.floydhub.com/" target="_blank" rel="noopener">FloydHub</a>, <a href="https://www.comet.ml/" target="_blank" rel="noopener">Comet.ML</a>, <a href="https://github.com/datmo/datmo" target="_blank" rel="noopener">DatMo</a>, <a href="https://www.mlflow.org/" target="_blank" rel="noopener">MLFlow</a>, ...) to record the learning curves and the results of all the experiments in order to choose the best one later on.</li>
<li>You implement your whole pipeline using Makefiles (or DVC, or a workflow engine). Model training is just one of the steps in the computation graph, which outputs a <code>model-&lt;id&gt;.pkl</code> file, appends the model final AUC score to a CSV file and creates a <code>model-&lt;id&gt;.html</code> report, with a bunch of useful model performance charts for later evaluation.</li>
<li>This is how experiment management / model versioning looks in the UI of the custom workbench mentioned above:
<p><div id="attachment_2244" style="width: 460px" class="wp-caption aligncenter"><img class="wp-image-2244" src="http://fouryears.eu/wp-content/uploads/2018/11/Adam2.png" alt="Experiment management" width="450" height="223" srcset="https://fouryears.eu/wp-content/uploads/2018/11/Adam2.png 694w, https://fouryears.eu/wp-content/uploads/2018/11/Adam2-300x149.png 300w" sizes="(max-width: 450px) 100vw, 450px" /><p class="wp-caption-text">Experiment management</p></div></li>
</ul>
<p>The takeaway message: decide on how you plan to manage fitting multiple models with different hyperparameters and then selecting the best result. You do not have to rely on complex tools - sometimes even a manually updated Excel sheet works well, when used consistently. If you plan lengthy neural network trainings, however, do consider using a web-based dashboard. All the cool kids do it.</p>
<h3>Model deployment</h3>
<p>Unless your project is purely exploratory, chances are you will need to deploy your final model to production. Depending on the circumstances this can turn out to be a rather painful step, but careful planning will alleviate the pain. Here are some examples:</p>
<ul>
<li>Your modeling pipeline spits out a pickle file with the trained model. All of your data access and feature engineering code was implemented as a set of Python functions. You need to deploy your model into a Python application. You create a Python package which includes the necessary function and the model pickle file as a <a href="https://docs.python.org/3/library/importlib.html#module-importlib.resources" target="_blank" rel="noopener">file resource</a> inside. You remember to test your code. The deployment procedure is a simple package installation followed by a run of integration tests.</li>
<li>Your pipeline spits out a pickle file with the trained model. To deploy the model you <a href="https://towardsdatascience.com/a-flask-api-for-serving-scikit-learn-models-c8bcdaa41daa" target="_blank" rel="noopener">create a REST service using Flask</a>, package it as a docker container and serve via your company's Kubernetes cloud. Alternatively, you <a href="https://datascience.com.co/creating-an-api-using-scikit-learn-aws-lambda-s3-and-amazon-api-gateway-d9d10317e38d" target="_blank" rel="noopener">upload the saved model to an S3 bucket and serve it via Amazon Lambda</a>. You make sure your deployment is tested.</li>
<li>Your training pipeline produces a TensorFlow model. You use <a href="https://medium.com/zendesk-engineering/how-zendesk-serves-tensorflow-models-in-production-751ee22f0f4b" target="_blank" rel="noopener">Tensorflow Serving</a> (or <a href="https://medium.com/@vikati/the-rise-of-the-model-servers-9395522b6c58" target="_blank" rel="noopener">any of the alternatives</a>) to serve it as a REST service. You do not forget to create tests and run them every time you update the model.</li>
<li>Your pipeline <a href="https://stackoverflow.com/questions/33221331/export-python-scikit-learn-models-into-pmml" target="_blank" rel="noopener">produces</a> <a href="https://cran.r-project.org/web/packages/pmml/index.html" target="_blank" rel="noopener">a</a> <a href="https://en.wikipedia.org/wiki/Predictive_Model_Markup_Language" target="_blank" rel="noopener">PMML</a> file. Your Java application can read it using the <a href="https://github.com/jpmml" target="_blank" rel="noopener">JPMML</a> library. You make sure that your PMML exporter includes model validation tests in the PMML file.</li>
<li>Your pipeline saves the model and the description of the preprocessing steps in a custom JSON format. To deploy it into your C# application you have developed a dedicated component which knows how to load and execute these JSON-encoded models. You make sure you have 100% test coverage of your model export code in Python, the model import code in C# and predictions of each new model you deploy.</li>
<li>Your pipeline compiles the model into an SQL query using <a href="https://github.com/konstantint/SKompiler" target="_blank" rel="noopener">SKompiler</a>. You hard-code this query into your application. You remember about testing.</li>
<li>You train your models via a paid service, which also offers a way to publish them as REST (e.g. <a href="https://studio.azureml.net/" target="_blank" rel="noopener">Azure ML Studio</a>, <a href="http://docs.yhat.com/" target="_blank" rel="noopener">YHat ScienceOps</a>). You pay a lot of money, but you still test the deployment.</li>
</ul>
<p>Summarizing this:</p>
<ol>
<li>There are many ways how a model can be deployed. Make sure you understand your circumstances and plan ahead. Will you need to deploy the model into a codebase written in a different language than the one you use to train it? If you decide to serve it via REST, what load does the service expect, should it be capable of predicting in batches? If you plan to buy a service, estimate how much it will cost you. If you decide to use PMML, make sure it can support your expected preprocessing logic and that fancy Random Forest implementation you plan to use. If you used third party data sources during training, think whether you will need to integrate with them in production and how will you encode this access information in the model exported from your pipeline.</li>
<li>As soon as you deploy your model to production, it turns from an artefact of data science to <em>actual code</em>, and should therefore be subject to all the requirements of application code. This means <em>testing</em>. Ideally, your deployment pipeline should produce both the model package for deployment as well as everything needed to test this model (e.g. sample data). It is not uncommon for the model to stop working correctly after being transferred from its birthplace to a production environment. It may be be a bug in the export code, a mismatch in the version of <code>pickle</code>, a wrong input conversion in the REST call. Unless you explicitly test the predictions of the deployed model for correctness, you risk running an invalid model without even knowing it. Everything would look fine, as it will keep predicting some values, just the wrong ones.</li>
</ol>
<h3>Model monitoring</h3>
<p>Your data science project does not end when you deploy the model to production. The heat is still on. Maybe the distribution of inputs in your training set differs from the real life. Maybe this distribution drifts slowly and the model needs to be retrained or recalibrated. Maybe the system does not work as you expected it to. Maybe you are into A-B testing. In any case you should set up the infrastructure to continuously collect data about model performance and monitor it. This typically means setting up a visualization dashboard, hence the primary example would be the following:</p>
<ul>
<li>For every request to your model you save the inputs and the model's outputs to <a href="https://www.elastic.co/guide/en/logstash/current/getting-started-with-logstash.html" target="_blank" rel="noopener">logstash</a> or a database table (making sure you stay <a href="https://en.wikipedia.org/wiki/General_Data_Protection_Regulation" target="_blank" rel="noopener">GDPR</a>-compliant somehow). You set up <a href="https://www.metabase.com/" target="_blank" rel="noopener">Metabase</a> (or <a href="https://www.tableau.com/" target="_blank" rel="noopener">Tableau</a>, <a href="https://mydbr.com/" target="_blank" rel="noopener">MyDBR</a>, <a href="https://grafana.com/" target="_blank" rel="noopener">Grafana</a>, <a href="https://www.quora.com/Is-there-an-inexpensive-alternative-to-Tableau" target="_blank" rel="noopener">etc</a>) and create reports which visualize the performance and calibration metrics of your model.</li>
</ul>
<h3>Exploration and reporting</h3>
<p>Throughout the life of the data science project you will constantly have to sidestep from the main modeling pipeline in order to explore the data, try out various hypotheses, produce charts or reports. These tasks differ from the main pipeline in two important aspects.</p>
<p>Firstly, most of them do not <em>have</em> to be reproducible. That is, you do not absolutely need to include them in the computation graph as you would with your data preprocessing and model fitting logic. You should always <em>try</em> to stick to reproducibility, of course - it is great when you have all the code to regenerate a given report from raw data, but there would still be many cases where this hassle is unnecessary. Sometimes making some plots manually in Jupyter and pasting them into a Powerpoint presentation serves the purpose just fine, no need to overengineer.</p>
<p>The second, <em>actually</em> problematic particularity of these "Exploration" tasks is that they tend to be somewhat disorganized and unpredictable. One day you might need to analyze a curious outlier in the performance monitoring logs. Next day you want to test a new algorithm, etc. If you do not decide on a suitable folder structure, soon your project directory will be filled with notebooks with weird names, and no one in the team would understand what is what. Over the years I have only found one more or less working solution to this problem: <i>ordering subprojects by date</i>. Namely:</p>
<ul>
<li>You create a <code>projects</code> directory in your project folder. You agree that each "exploratory" project must create a folder named <code>projects/YYYY-MM-DD - Subproject title</code>, where <code>YYYY-MM-DD</code> is the date when the subproject was initiated. After a year of work your <code>projects</code> folder looks as follows:
<pre class="EnlighterJSRAW" data-enlighter-language="no-highlight">./2017-01-19 - Training prototype/
                (README, unsorted files)
./2017-01-25 - Planning slides/
                (README, slides, images, notebook)
./2017-02-03 - LTV estimates/
                 README
                 tasks/
                   (another set of 
                    date-ordered subfolders)
./2017-02-10 - Cleanup script/
                 README
                 script.py
./... 50 folders more ...
</pre>
<p>Note that you are free to organize the internals of each subproject as you deem necessary.  In particular, it may even be a "data science project" in itself, with its own <code>raw/processed</code> data subfolders, its own Makefile-based computation graph, as well as own subprojects (which I would tend to name <code>tasks</code> in this case). In any case, always document each subproject (have a <code>README</code> file at the very least). Sometimes it helps to also have a root <code>projects/README.txt</code> file, which briefly lists the meaning of each subproject directory.</p>
<p>Eventually you may discover that the project list becomes too long, and decide to reorganize the <code>projects</code> directory. You compress some of them and move to an <code>archive</code> folder. You regroup some related projects and move them to the <code>tasks</code> subdirectory of some parent project.</li>
</ul>
<p>Exploration tasks come in two flavors. Some tasks are truly one-shot analyses, which can be solved using a Jupyter notebook that will never be executed again. Others aim to produce <em>reusable</em> code (not to be confused with <em>reproducible</em> outputs). I find it important to establish some conventions for how the reusable code should be kept. For example, the convention may be to have a file named <code>script.py</code> in the subproject's root which outputs an <code>argparse</code>-based help message when executed. Or you may decide to require providing a <code>run</code> function, configured as a <a href="http://docs.celeryproject.org/en/latest/" target="_blank" rel="noopener">Celery</a> task, so it can easily be submitted to the job queue. Or it could be something else - anything is fine, as long as it is consistent.</p>
<h3>The service checklist</h3>
<p>There is an other, orthogonal perspective on the data science workflow, which I find useful. Namely, rather than speaking about it in terms of a pipeline of <em>processes</em>, we may instead discuss the key <em>services </em>that data science projects typically rely upon. This way you may describe your particular (or desired) setup by specifying how exactly should each of the following 9 key services be provided:</p>
<div id="attachment_2312" style="width: 460px" class="wp-caption aligncenter"><a href="http://fouryears.eu/wp-content/uploads/2018/11/services-1.png"><img class="wp-image-2312" src="http://fouryears.eu/wp-content/uploads/2018/11/services-300x133.png" alt="Data science services" width="450" height="199" srcset="https://fouryears.eu/wp-content/uploads/2018/11/services-300x133.png 300w, https://fouryears.eu/wp-content/uploads/2018/11/services-768x340.png 768w, https://fouryears.eu/wp-content/uploads/2018/11/services-1024x454.png 1024w" sizes="(max-width: 450px) 100vw, 450px" /></a><p class="wp-caption-text">Data science services</p></div>
<ol>
<li><strong>File storage</strong>. Your project must have a home. Often this home must be shared by the team. Is it a folder on a network drive? Is it a working folder of a Git repository? How do you organize its contents?</li>
<li><strong>Data services</strong>. How do you store and access your <em>data</em>? "Data" here includes your source data, intermediate results, access to third-party datasets, metadata, models and reports - essentially everything which is read by or written by a computer. Yes, keeping a bunch of HDF5 files is also an example of a "data service".</li>
<li><strong>Versioning</strong>. Code, data, models, reports and documentation - everything should be kept under some form of version control. Git for code? Quilt for data? DVC for models? Dropbox for reports? Wiki for documentation? Once we're at it, do not forget to set up regular back ups for everything.</li>
<li><strong>Metadata and documentation</strong>. How do you document your project or subprojects? Do you maintain any machine-readable metadata about your features, scripts, datasets or models?</li>
<li><strong>Interactive computing</strong>. Interactive computing is how most of the hard work is done in data science. Do you use JupyterLab, RStudio, ROOT, Octave or Matlab? Do you set up a cluster for interactive parallel computing (e.g. ipyparallel or dask)?</li>
<li><strong>Job queue &amp; scheduler</strong>. How do you run your code? Do you use a job processing queue? Do you have the capability (or the need) to schedule regular maintenance tasks?</li>
<li><strong>Computation graph</strong>. How do you describe the computation graph and establish reproducibility? Makefiles? DVC? Airflow?</li>
<li><strong>Experiment manager</strong>. How do you collect, view and analyze model training progress and results? ModelDB? Hyperdash? FloydHub?</li>
<li><strong>Monitoring dashboard</strong>. How do you collect and track the performance of the model in production? Metabase? Tableau? PowerBI? Grafana?</li>
</ol>
<h3>The tools</h3>
<p>To conclude and summarize the exposition, here is a small <a href="https://docs.google.com/spreadsheets/d/1ZHLzlMdBEHDorCUEeeiSx3WAgfLGw34pC08C9T0IMkE/edit?usp=sharing" target="_blank" rel="noopener">spreadsheet</a>, listing the tools mentioned in this post (as well as some extra ones I added or will add later on), categorizing them according to which stages of the data science workflow (in the terms defined in this post) they aim to support. Disclaimer - I did try out most, but not all of them. In particular, my understanding of the capabilities of the non-free solutions in the list is so far only based on their online demos or descriptions on the site.</p>
]]></content:encoded>
			<wfw:commentRss>https://fouryears.eu/2018/11/29/the-data-science-workflow/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>What is a SIM Card?</title>
		<link>https://fouryears.eu/2018/06/28/what-is-a-sim-card/</link>
		<comments>https://fouryears.eu/2018/06/28/what-is-a-sim-card/#respond</comments>
		<pubDate>Thu, 28 Jun 2018 20:56:22 +0000</pubDate>
		<dc:creator><![CDATA[Konstantin]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Computer science]]></category>
		<category><![CDATA[Electronics]]></category>
		<category><![CDATA[Explanation]]></category>
		<category><![CDATA[How to]]></category>
		<category><![CDATA[Procrastination]]></category>
		<category><![CDATA[Smart cards]]></category>
		<category><![CDATA[Software development]]></category>

		<guid isPermaLink="false">http://fouryears.eu/?p=2134</guid>
		<description><![CDATA[I had to replace my SIM card today. The old one was mini-sized and I needed a micro-sized one. My previous SIM card stayed with me for about 20 years or so. I have gone through high school, university, and traveled around to countless places with it. I changed my phone several times, starting from [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>I had to replace my SIM card today. The old one was <a href="https://en.wikipedia.org/wiki/Subscriber_identity_module#Mini-SIM" target="_blank" rel="noopener">mini-sized</a> and I needed a <a href="https://en.wikipedia.org/wiki/Subscriber_identity_module#Micro-SIM" target="_blank" rel="noopener">micro-sized</a> one. My previous SIM card stayed with me for about 20 years or so. I have gone through high school, university, and traveled around to countless places with it. I changed my phone several times, starting from an older, bulkier Nokia through several candy-bar Nokias and a couple of smartphones. My personal computer evolved from a 200MHz Pentium with 16MB RAM to a 2.8GHz quad-core with 32GB RAM. But my SIM card always stayed the same. Let me dedicate this post to the memory of this glorious tiny computer - the most long-lasting and reliable piece of computing equipment I have ever used.</p>
<div id="attachment_2135" style="width: 230px" class="wp-caption aligncenter"><img class="wp-image-2135" src="http://fouryears.eu/wp-content/uploads/2018/06/RLE-300x191.jpg" alt="My retiring SIM card. The mobile operator changed its name from Radiolinja to Elisa in 2000, but the SIM stayed." width="220" height="140" srcset="https://fouryears.eu/wp-content/uploads/2018/06/RLE-300x191.jpg 300w, https://fouryears.eu/wp-content/uploads/2018/06/RLE.jpg 500w" sizes="(max-width: 220px) 100vw, 220px" /><p class="wp-caption-text">My retiring SIM card. The mobile operator changed its name from Radiolinja to Elisa in 2000, but the SIM card did not care.</p></div>
<h3>What is a SIM card?</h3>
<p>Your SIM card is just <a href="https://en.wikipedia.org/wiki/Universal_integrated_circuit_card" target="_blank" rel="noopener">another</a> <a href="https://en.wikipedia.org/wiki/Smart_card" target="_blank" rel="noopener">smart card</a>, technologically <a href="https://en.wikipedia.org/wiki/ISO/IEC_7816" target="_blank" rel="noopener">similar</a> to any of the <a href="https://en.wikipedia.org/wiki/EMV" target="_blank" rel="noopener">"chip-and-pin" bank cards</a> in your wallet, your <a href="https://en.wikipedia.org/wiki/Estonian_ID_card" target="_blank" rel="noopener">ID card</a> and even your <a href="https://en.wikipedia.org/wiki/Contactless_smart_card" target="_blank" rel="noopener">contactless public transport tickets</a> (which communicate using the same set of protocols, just wirelessly). Note, however, that banks started to switch from magnetic to "chip-and-pin" credit cards and countries began issuing <a href="https://smartid.ee/countries-available-smart-card-identifications-methods/" target="_blank" rel="noopener">smart-card-based identity documents</a> only about 15 years ago or so. SIM cards, on the other hand, had been in wide use much earlier, long before the term "smart card" became a popularized buzzword.</p>
<p>All smart cards are built to serve one primary purpose - <em>identification</em>. The card stores a secret key (which is impossible to retrieve without disassembling the card using ultra-high-tech lab equipment), and provides a way to execute the following <a href="https://en.wikipedia.org/wiki/Challenge%E2%80%93response_authentication" target="_blank" rel="noopener">challenge-response protocol</a>: you send a random string (<em>a challenge</em>) to the card. The card encrypts the string using the stored key and returns the <em>response</em>. The correctness of this response can now be verified by a second party (e.g. the mobile operator, the bank or a <a href="https://github.com/konstantint/eid-webauth-samples" target="_blank" rel="noopener">website using ID-card authentication</a>). The actual details of the computation differ among the various smart cards. For example, some use symmetric while other - asymmetric cryptography. Some cards provide additional services, such as creating digital signatures or storing information on the card. None the less, <em>identification</em> is always the core function. This explains the name of the SIM card: <a href="https://en.wikipedia.org/wiki/Subscriber_identity_module" target="_blank" rel="noopener">Subscriber <i>Identity </i>Module</a>. It is a module, which <em>identifies</em> you (the subscriber) to the network provider.</p>
<h3>What is inside a SIM card?</h3>
<p>The best way to understand what is inside a SIM (or any other type of smart-) card is to connect it to your computer and "talk" to it. Many laptops have integrated smart-card readers nowadays, so if you find a suitable frame for your nano/micro/mini-SIM, you may simply put it into the reader just as you would do with an ID or a bank card.</p>
<div id="attachment_2137" style="width: 310px" class="wp-caption aligncenter"><img class="size-medium wp-image-2137" src="http://fouryears.eu/wp-content/uploads/2018/06/Frame-300x202.jpg" alt="Mini-SIM in a frame" width="300" height="202" srcset="https://fouryears.eu/wp-content/uploads/2018/06/Frame-300x202.jpg 300w, https://fouryears.eu/wp-content/uploads/2018/06/Frame.jpg 500w" sizes="(max-width: 300px) 100vw, 300px" /><p class="wp-caption-text">Old mini-SIM in a frame from a newer card</p></div>
<p>Note, though, that if your frame is flimsy and not fitting perfectly (as is mine on the photo above) you run the risk of losing the chip somewhere in the depths of the card reader slot before you even manage to slide it in completely. Hence, in my case, a tiny <a href="https://www.euronics.ee/t-en/73598/computers-and-tablets/id-card-reader-usb-id/4748001003724" target="_blank" rel="noopener">cross-style USB card reader</a> was a more convenient and reliable option:</p>
<div id="attachment_2138" style="width: 310px" class="wp-caption aligncenter"><img class="size-medium wp-image-2138" src="http://fouryears.eu/wp-content/uploads/2018/06/FrameReader-300x175.jpg" alt="SIM in a frame in a reader" width="300" height="175" srcset="https://fouryears.eu/wp-content/uploads/2018/06/FrameReader-300x175.jpg 300w, https://fouryears.eu/wp-content/uploads/2018/06/FrameReader.jpg 500w" sizes="(max-width: 300px) 100vw, 300px" /><p class="wp-caption-text">SIM in a frame in a reader</p></div>
<p>Plug it in, wait until the system recognizes the device, and we are ready to talk. There are many tools for "talking" to the card on a low level, but one of the best for the purposes of educational exploration is, in my opinion, the program called <a href="http://pannetrat.com/Cardpeek/" target="_blank" rel="noopener">CardPeek</a>. Let us fire it up. At start up it asks for the card reader to use (note that you can use it to explore both contact and contactless cards, if you have the necessary reader):</p>
<div id="attachment_2139" style="width: 310px" class="wp-caption aligncenter"><img class="size-medium wp-image-2139" src="http://fouryears.eu/wp-content/uploads/2018/06/SelectCardReader-300x218.png" alt="Select reader screen" width="300" height="218" srcset="https://fouryears.eu/wp-content/uploads/2018/06/SelectCardReader-300x218.png 300w, https://fouryears.eu/wp-content/uploads/2018/06/SelectCardReader.png 599w" sizes="(max-width: 300px) 100vw, 300px" /><p class="wp-caption-text">Select reader screen</p></div>
<p>We can now click "Analyzer → GSM SIM", provide the PIN, wait a bit, and have the program extract a wealth of information stored on the card:</p>
<div id="attachment_2141" style="width: 310px" class="wp-caption aligncenter"><img class="size-medium wp-image-2141" src="http://fouryears.eu/wp-content/uploads/2018/06/CardData-300x233.png" alt="SIM card analyzed" width="300" height="233" srcset="https://fouryears.eu/wp-content/uploads/2018/06/CardData-300x233.png 300w, https://fouryears.eu/wp-content/uploads/2018/06/CardData.png 723w" sizes="(max-width: 300px) 100vw, 300px" /><p class="wp-caption-text">SIM card analyzed</p></div>
<p>Fun, right? Let us now see where all this data came from and what it actually means.</p>
<h3>How does it work?</h3>
<p>At the hardware level, a smart card has a very simple connector with six active pins, which are designed for sending and receiving data to and from the card:</p>
<div id="attachment_2142" style="width: 510px" class="wp-caption aligncenter"><img class="wp-image-2142" src="http://fouryears.eu/wp-content/uploads/2018/06/SmartCardInterface.png" alt="Smart card connectors" width="500" height="417" srcset="https://fouryears.eu/wp-content/uploads/2018/06/SmartCardInterface.png 821w, https://fouryears.eu/wp-content/uploads/2018/06/SmartCardInterface-300x250.png 300w, https://fouryears.eu/wp-content/uploads/2018/06/SmartCardInterface-768x641.png 768w" sizes="(max-width: 500px) 100vw, 500px" /><p class="wp-caption-text">Smart card connectors (<a href="http://ww1.microchip.com/downloads/en/AppNotes/01370A.pdf" target="_blank" rel="noopener">source</a>)</p></div>
<p>Four pins (VCC, GND, CLK, I/O) are used for the basic half-duplex synchronous serial communication. RST is the reset pin, and VPP is used for supplying the higher voltage when (re)programming the card. When you first connect to the card, the protocol requires to zero the "reset" pin, to which the card will reply by sending a fixed sequence of bytes, identifying the card and its capabilities. It is known as the card's <a href="https://www.eftlab.co.uk/index.php/site-map/our-articles/169-demystifying-atr-answer-to-reset" target="_blank" rel="noopener">ATR ("Answer To Reset")</a> string. You can see this string listed as the "cold ATR" entry in the screenshot named "SIM card analyzed" above.</p>
<p>Besides providing pre-packaged analysis scripts for various kinds of smart cards, CardPeek allows to send custom commands to the card by using its <a href="http://downloads.pannetrat.com/get/11a444b7a7840b2aadcc/cardpeek_ref.en.pdf" target="_blank" rel="noopener">Lua scripting interface</a> directly. This is what the "Command" text input at the bottom of the screen is for. Let us now switch to the "logs" tab and try sending some commands. First of all, we need to establish the connection to the card. This is done via the <code>card.connect()</code> call:</p>
<div id="attachment_2145" style="width: 410px" class="wp-caption aligncenter"><img class="wp-image-2145" src="http://fouryears.eu/wp-content/uploads/2018/06/CardConnect.png" alt="card.connect()" width="400" height="189" srcset="https://fouryears.eu/wp-content/uploads/2018/06/CardConnect.png 654w, https://fouryears.eu/wp-content/uploads/2018/06/CardConnect-300x142.png 300w" sizes="(max-width: 400px) 100vw, 400px" /><p class="wp-caption-text">card.connect() example</p></div>
<p>The ATR string is received by the reader when the connection is first established. We can obtain it via <code>card.last_atr()</code> and print out to the log window in hex-encoded form using <code>log.print</code> and <code>bytes:format</code> (the documentation for all these APIs is available <a href="http://downloads.pannetrat.com/get/11a444b7a7840b2aadcc/cardpeek_ref.en.pdf" target="_blank" rel="noopener">here</a>):</p>
<div id="attachment_2147" style="width: 410px" class="wp-caption aligncenter"><img class="wp-image-2147" src="http://fouryears.eu/wp-content/uploads/2018/06/CardATR.png" alt="ATR" width="400" height="189" srcset="https://fouryears.eu/wp-content/uploads/2018/06/CardATR.png 654w, https://fouryears.eu/wp-content/uploads/2018/06/CardATR-300x142.png 300w" sizes="(max-width: 400px) 100vw, 400px" /><p class="wp-caption-text">ATR example</p></div>
<p>As we see, the ATR for my card happens to be <code>3BBA9400401447473352533731365320</code> (in hex).  If you <a href="https://www.eftlab.co.uk/index.php/site-map/knowledge-base/171-atr-list-full" target="_blank" rel="noopener">search the web</a>, you will find that this particular ATR is known to be  a signature of the Elisa SIM cards. It is not a random string, though, and every byte <a href="https://mobileforensics.files.wordpress.com/2007/03/sim-card-protocols.pdf" target="_blank" rel="noopener">has a meaning</a>. In particular:</p>
<ul>
<li><code>3B</code> is the "initial byte", and this particular value identifies the smart card <a href="https://www.eftlab.co.uk/index.php/site-map/our-articles/169-demystifying-atr-answer-to-reset" target="_blank" rel="noopener">as a SIM card</a>.</li>
<li><code>BA</code> is the "format" byte. Its first four bits (1011) tell us that we have to expect fields TA1, TB1 and TD1 to follow. The last four bits denote the number 10 - the number of "historical bytes" at the end of the ATR.</li>
<li><code>94</code> is the field TA1, specifying the clock rate of the serial protocol</li>
<li><code>00</code> is the field TB1, specifying the programming voltage (apparently, the card is not re-programmable)</li>
<li><code>40</code> tells us that we have to read out another byte field TC2 (this is in the left-side part of the byte, <code>4</code> ) and that the card uses <a href="http://www.gorferay.com/the-t-0-transmission-protocol/" target="_blank" rel="noopener">T=0 protocol</a> (this is in the right-side part, <code>0</code>).</li>
<li><code>14</code> is the promised TC2 field (not sure what it is meant for),</li>
<li>the last 10 bytes are the "historical bytes", providing card manufacturer-specific information.</li>
</ul>
<h3>Greeting your Card</h3>
<p>Now that we are connected, we can send <a href="https://is.muni.cz/th/324546/fi_b/text.pdf" target="_blank" rel="noopener">various commands</a> to the card. Let us proceed by example. The first command we might want to send is "VERIFY CHV", which is essentially greeting the card by providing our PIN1 code ("CHV" stands for "Card Holder Verification").</p>
<div id="attachment_2148" style="width: 410px" class="wp-caption aligncenter"><img class="wp-image-2148" src="http://fouryears.eu/wp-content/uploads/2018/06/VERIFYCHV.png" alt="VERIFY CHV" width="400" height="128" srcset="https://fouryears.eu/wp-content/uploads/2018/06/VERIFYCHV.png 589w, https://fouryears.eu/wp-content/uploads/2018/06/VERIFYCHV-300x96.png 300w" sizes="(max-width: 400px) 100vw, 400px" /><p class="wp-caption-text">VERIFY CHV (<a href="https://is.muni.cz/th/324546/fi_b/text.pdf" target="_blank" rel="noopener">source</a>)</p></div>
<p>Every smart card command starts with a two-byte identifier (for example, <code>A0 20</code> is the (hex) identifier of the VERIFY CHV command). It is followed by two parameter bytes P1 and P2. For example, the parameter P1 for VERIFY CHV is always 0, and P2 must indicate the number of the PIN we are submitting (i.e. 1 for PIN1, 2 for PIN2). Next comes P3 - a byte, specifying the length of the data which follows. For VERIFY CHV the data is the provided PIN itself, and it is always 8 bytes long. If the PIN is shorter than 8 bytes, it must be padded by bytes <code>FF</code>. The PIN itself is encoded in plain ASCII (i.e. 1234 would be <span style="color: #222222; font-family: monospace;"><span style="background-color: #e9ebec;">31 32 33 34</span></span>).</p>
<p>Now, supposing my PIN1 is, in fact "1234", I can authenticate myself to the card via CardPeek as follows:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="lua">sw, resp = card.send(bytes.new(8, "A0 20 00 01 08 31 32 33 34 FF FF FF FF"))</pre>
<p>Here, <code>card.send</code> is the sending command and <code>bytes.new(8, ...)</code> constructs an array of 8-bit bytes from a hex string (<a href="http://downloads.pannetrat.com/get/11a444b7a7840b2aadcc/cardpeek_ref.en.pdf" target="_blank" rel="noopener">see CardPeek reference</a>).</p>
<p>The <code>sw</code> and <code>resp</code> are the two components of the T=0 protocol response. For VERIFY CHV we only care that <code>sw</code> is equal to <code>9000</code>, which means "OK". Note how this is printed in the log.</p>
<div id="attachment_2151" style="width: 410px" class="wp-caption aligncenter"><img class="wp-image-2151" src="http://fouryears.eu/wp-content/uploads/2018/06/CardPIN.png" alt="VERIFY CHV" width="400" height="189" srcset="https://fouryears.eu/wp-content/uploads/2018/06/CardPIN.png 654w, https://fouryears.eu/wp-content/uploads/2018/06/CardPIN-300x142.png 300w" sizes="(max-width: 400px) 100vw, 400px" /><p class="wp-caption-text">VERIFY CHV example</p></div>
<p>Beware that if you do not receive a <code>9000</code> response, it means that the card denied your PIN for some reason. Trying to submit a wrong PIN three times in a row will block the card.</p>
<h3>Reading the Data</h3>
<p>Now that we have identified ourselves to the card, let us try to read the data stored on it. The data on the card is organized in a hierarchy of files. It is this exact hierarchy that you can observe as the output of the "Analyzer" script. The root file is called "MF", it has the ICCID, TELECOM and GSM sub-files, which, in turn, have a number of predefined sub-files themselves, etc. The names are just conventions, the card itself uses two-byte identifiers for each file. For example, "MF" is actually <code>3F 00</code>, "TELECOM" is <code>7F 10</code>, etc. While the card is connected you can navigate around the file structure just like you would do in a normal operating system using the <code>cd</code> command, except that in smart-card lingo the corresponding command is called <code>SELECT</code>.</p>
<p>The binary form of the <code>SELECT</code> command is <code>A0 A4 00 00 02 {x} {y}</code>, where <code>{x} {y}</code> is the file identifier. Just like before, <code>A0 A4</code> is the command code, <code>00 00</code> are the ignored P1 and P2 parameters, and <code>02</code> tells us that exactly two bytes must follow.</p>
<p>Consequently, if we wanted to select the file "MF (3f 00)/TELECOM (7F 10)/ADN (6F 3A)", which contains the address book records, we could achieve it by sending the following sequence of commands via CardPeek:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="lua">card.send(bytes.new(8, "A0 A4 00 00 02 3F 00"))
card.send(bytes.new(8, "A0 A4 00 00 02 7F 10"))
card.send(bytes.new(8, "A0 A4 00 00 02 6F 3A"))</pre>
<p>Selecting files is a common task, and CardPeek provides a convenient shorthand: <code>card.select("#7f10")</code>. For some cards (not mine, unfortunately) it should also be possible to do things like <code>card.select("/7f10/6f3a")</code>.</p>
<p>Once we have selected the ADN ("Abbreviated Dialing Numbers") file, we may read out the individual phone numbers from it using the READ RECORD command. The procedure is complicated by the fact that READ RECORD needs to be provided the "record size" as one of its parameters, which, in turn, must be taken from the response data of the last SELECT command, and this must be obtained via the GET RESPONSE command. The complete example would therefore be:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="lua">-- Select /TELECOM/ADN
card.select("#7F10")
sw, resp = card.select("#6F3A")

-- Read file metadata
GET_RESPONSE = "A0 C0 00 00"
sw, resp = card.send(bytes.new(8, GET_RESPONSE, bit.AND(sw, 0xff)))

-- Read out first record in the file
sw, resp = card.read_record('.', 1, resp:get(14))

-- Print the record to the log
log.print(log.INFO, resp:format("%P"))</pre>
<div id="attachment_2154" style="width: 410px" class="wp-caption aligncenter"><img class="wp-image-2154" src="http://fouryears.eu/wp-content/uploads/2018/06/CardPhone.png" alt="Reading out a phone record" width="400" height="210" srcset="https://fouryears.eu/wp-content/uploads/2018/06/CardPhone.png 654w, https://fouryears.eu/wp-content/uploads/2018/06/CardPhone-300x158.png 300w" sizes="(max-width: 400px) 100vw, 400px" /><p class="wp-caption-text">Reading out a phone record</p></div>
<p>Note that instead of printing the output to the log via <code>log.print</code> you could also show a message box:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="lua">ui.readline(resp:format("%P"))</pre>
<p>or append a new node to the tree in the "card view" tab:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="lua">nodes.append(nodes.root(), {classname="block", 
                            label="Name", 
                            val=resp:format("%P"), 
                            size=#resp})
</pre>
<p>In fact, at this point you should go and read the script <code>$HOME/.cardpeek/scripts/gsm (beta).lua</code>. It contains the code we ran in the beginning of this post to analyze the card. The script simply sends the relevant commands to the card and appends all responses as nodes to the tree.</p>
<h3>Authentication</h3>
<p>While data storage is a useful capability of a SIM card, its main purpose is subscriber authentication. Thus, our acquaintance with the SIM card would be incomplete without checking the corresponding function out as well. It is quite simple:</p>
<div id="attachment_2155" style="width: 410px" class="wp-caption aligncenter"><img class="wp-image-2155" src="http://fouryears.eu/wp-content/uploads/2018/06/RunAlgo.png" alt="RUN GSM ALGORITHM" width="400" height="210" srcset="https://fouryears.eu/wp-content/uploads/2018/06/RunAlgo.png 668w, https://fouryears.eu/wp-content/uploads/2018/06/RunAlgo-300x158.png 300w" sizes="(max-width: 400px) 100vw, 400px" /><p class="wp-caption-text">RUN GSM ALGORITHM (<a href="https://is.muni.cz/th/324546/fi_b/text.pdf" target="_blank" rel="noopener">source</a>)</p></div>
<p>That is, the process is the following: we send the byte sequence <code>A0 88 00 00 10</code>, followed by a 16 byte-long challenge string (which is normally given by the mobile operator when the phone joins the network). The SIM card responds with 12 bytes, of which the first 4 we should send back to the mobile operator for verification, and use the remaining 8 as a cipher key.</p>
<p>Before we can use the RUN GSM ALGORITHM command we need to verify our PIN1 (already done) and <code>SELECT</code> the GSM (<code>7F 20</code>) file:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="lua">RUN_GSM_ALGO = "A0 88 00 00 10"
GET_RESPONSE = "A0 C0 00 00"
DF_GSM = "#7F20"

CHALLENGE_STRING = "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00"

-- Select GSM file
card.select(DF_GSM)

-- Send the challenge data
sw, resp = card.send(bytes.new(8, RUN_GSM_ALGO, CHALLENGE_STRING))

-- Read 12-byte-long response
sw, RESPONSE_STRING = card.send(bytes.new(8, GET_RESPONSE, 12))
log.print(log.INFO, RESPONSE_STRING:format("%D"))</pre>
<div id="attachment_2157" style="width: 411px" class="wp-caption aligncenter"><img class="wp-image-2157" src="http://fouryears.eu/wp-content/uploads/2018/06/CardGSMResponse.png" alt="RUN GSM ALGORITHM example" width="401" height="190" srcset="https://fouryears.eu/wp-content/uploads/2018/06/CardGSMResponse.png 654w, https://fouryears.eu/wp-content/uploads/2018/06/CardGSMResponse-300x142.png 300w" sizes="(max-width: 401px) 100vw, 401px" /><p class="wp-caption-text">RUN GSM ALGORITHM example</p></div>
<p>And that is it. I hope you learned to understand and appreciate your SIM card a bit more today.</p>
]]></content:encoded>
			<wfw:commentRss>https://fouryears.eu/2018/06/28/what-is-a-sim-card/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>OpenVPN over HTTPS (in 5 minutes)</title>
		<link>https://fouryears.eu/2018/04/18/openvpn-over-https-in-5-minutes/</link>
		<comments>https://fouryears.eu/2018/04/18/openvpn-over-https-in-5-minutes/#respond</comments>
		<pubDate>Tue, 17 Apr 2018 23:25:50 +0000</pubDate>
		<dc:creator><![CDATA[Konstantin]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Cryptography]]></category>
		<category><![CDATA[How to]]></category>
		<category><![CDATA[Internet]]></category>
		<category><![CDATA[IT]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Tutorial]]></category>
		<category><![CDATA[Windows]]></category>

		<guid isPermaLink="false">http://fouryears.eu/?p=2098</guid>
		<description><![CDATA[Every once in a while I happen to find myself in a public network, where all access besides HTTP and HTTPS is blocked by the firewall. This is extremely inconvenient, as I routinely need to access SSH, VPN or other ports besides HTTP(S). Over time I have developed a reasonably fast and simple way of [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Every once in a while I happen to find myself in a public network, where all access besides HTTP and HTTPS is blocked by the firewall. This is extremely inconvenient, as I routinely need to access SSH, VPN or other ports besides HTTP(S). Over time I have developed a reasonably fast and simple way of overcoming the restriction whenever I need it. Let me document it here.</p>
<h3>Google Cloud Shell</h3>
<p>There are probably hundreds of cloud providers nowadays, each of them trying to outcompete the others by offering better, cheaper, faster, or more diverse set of services. One killer feature of the <a href="https://cloud.google.com/" target="_blank" rel="noopener">Google Cloud platform</a> is its <em>cloud shell, </em>which gives you command-line access to a tiny Linux virtual machine directly from their webpage for free:</p>
<div id="attachment_2099" style="width: 399px" class="wp-caption aligncenter"><img class="size-full wp-image-2099" src="http://fouryears.eu/wp-content/uploads/2018/04/cloudshell.png" alt="Once you are logged into Google Cloud platform you may open the shell here" width="389" height="185" srcset="https://fouryears.eu/wp-content/uploads/2018/04/cloudshell.png 389w, https://fouryears.eu/wp-content/uploads/2018/04/cloudshell-300x143.png 300w" sizes="(max-width: 389px) 100vw, 389px" /><p class="wp-caption-text">Once you are logged into Google Cloud platform you may open the shell window via this button</p></div>
<p>Even if you do not have any serious use for a cloud provider, the cloud shell is one good reason to get an account at the Google Cloud platform. Because whenever I find myself locked out of SSH behind a paranoid firewall, I can still SSH into any of my servers via the cloud shell. This works even when your access is limited to an HTTP proxy server.</p>
<p>Once upon a time there was a great service named <a href="http://koding.com" target="_blank" rel="noopener">koding.com</a>, which also provided free access to a Linux console via HTTP. Unfortunately, they have changed their pricing model since then and do not seem to have any similar free offerings anymore. If you know any alternative services that offer a web-based shell access to a Linux VM for free, do post them in the comments.</p>
<h3>OpenVPN via HTTPS</h3>
<p>Sometimes SSH access offered by the cloud shell is not enough. For example, I would often need to access the company's VPN server. It runs on port 1194 and in a properly paranoid network this port is, of course, also blocked. The way to sneak through this restriction is the following.</p>
<ol>
<li>Launch a server in the cloud, running an OpenVPN service on port 443 (which corresponds to HTTPS). Even the most paranoid firewalls would typically let HTTPS traffic through, because otherwise they would block most of the web for their users.</li>
<li>Connect to that VPN server and tunnel all traffic through it to the outside world.</li>
<li>Now we are free to connect anywhere we please. In particular, we may open a VPN tunnel to the company's server from within that "outer" VPN tunnel.</li>
<li>At this point I would sometimes SSH into a machine behind the company's VPN and never cease to be amused by the concept of having a SSH tunnel within a VPN tunnel within another VPN tunnel.</li>
</ol>
<p>Let us now go through all these steps in detail.</p>
<h4>Setting up an OpenVPN server</h4>
<p>We start by launching a machine in the cloud. You are free to choose any cloud provider here, but as we are using Google's cloud shell already anyway (we are working behind a paranoid firewall already, remember), it makes sense to launch the server from Google's cloud as well. This can be as simple as copy-pasting the following command into the same cloud shell prompt:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="shell">gcloud compute instances create openvpn-server --zone=europe-west3-a --machine-type=f1-micro --tags=https-server --image=ubuntu-1604-xenial-v20180405 --image-project=ubuntu-os-cloud --boot-disk-size=10GB --boot-disk-type=pd-standard --boot-disk-device-name=openvpn-server</pre>
<p>(obviously, detailed documentation of Google cloud functionality is way beyond the scope of this blog post. All the necessary <a href="https://cloud.google.com/compute/docs/instances/create-start-instance" target="_blank" rel="noopener">references and tutorials</a> are rather easy to find, though).  You may play with some of the settings passed to the command above, however the choice of the <code>ubuntu-1604-***</code> image is important, because the script from the next part was only ever tested on that Linux version. The chosen machine type (<code>f1-micro</code>) is the cheapest and should cost around 5 euros per month (if you keep it open constantly), or a matter of cents, if you only use it for some hours.</p>
<div id="attachment_2103" style="width: 310px" class="wp-caption aligncenter"><img class="wp-image-2103 size-medium" src="http://fouryears.eu/wp-content/uploads/2018/04/cloudshell-createinstances-300x175.png" alt="Launching a machine in the cloud" width="300" height="175" srcset="https://fouryears.eu/wp-content/uploads/2018/04/cloudshell-createinstances-300x175.png 300w, https://fouryears.eu/wp-content/uploads/2018/04/cloudshell-createinstances.png 725w" sizes="(max-width: 300px) 100vw, 300px" /><p class="wp-caption-text">Launching a machine in the cloud</p></div>
<p>Once the machine is up, we SSH into it by typing:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="shell">gcloud compute ssh openvpn-server</pre>
<p>Here we'll need to install and configure the OpenVPN server. This may be a fairly lengthy process of following step-by-step instructions from, for example, <a href="https://www.digitalocean.com/community/tutorials/how-to-set-up-an-openvpn-server-on-ubuntu-16-04" target="_blank" rel="noopener">this well-written tutorial</a>. Luckily, I've gone through this already and wrote down all the steps down into a <a href="https://gist.github.com/konstantint/08ab09202b68e4e3542622e99d21a82e" target="_blank" rel="noopener">replayable script</a>, which seems to work fine so far with the chosen Linux image. Of course, there's no guarantee it will continue working forever (some rather loose configuration editing is hard-coded there). However, as we have just launched a throwaway virtual server, the worst that can happen is the need to throw that server away if it breaks. (<em>Do not run the script blindly on a machine you care about, though</em>). So let's just download and run it:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="shell">curl -s https://gist.githubusercontent.com/konstantint/08ab09202b68e4e3542622e99d21a82e/raw/1a3ee68008d5b565565ebb8c126ae68a8cebe549/ovpn_setup.sh | bash -s</pre>
<p>Once completed, the script prints the filename "<code>/home/&lt;username&gt;/client-configs/files/client1.ovpn</code>". This is the name of the file which we need to transfer back to our computer. A clumsy, yet fast and straightforward way is to simply copy-paste its contents from the shell into a local text file:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="shell">cat /home/your_username/client-configs/files/client1.ovpn</pre>
<p>We then select all the output starting from the first lines of the file</p>
<pre class="EnlighterJSRAW" data-enlighter-language="null">client
dev tun
proto tcp
...</pre>
<p>all the way down to</p>
<pre class="EnlighterJSRAW" data-enlighter-language="null">...
-----END OpenVPN Static key V1-----
&lt;/tls-auth&gt;</pre>
<p>(holding "shift", scrolling and clicking the mouse helps).</p>
<p>We then create a new file (on the local machine), name it <code>client1.ovpn</code> (for example), paste the copied text and save. That's it, we have successfully set up an OpenVPN server running on port 443. Type <span style="color: #222222; font-family: monospace;"><span style="background-color: #e9ebec;">exit</span></span> in the cloud shell to log out of the server as we don't need to configure anything there.</p>
<h4>Setting up an OpenVPN client</h4>
<p>Next we must set up an OpenVPN client on the local computer. I am using a Windows laptop, hence the instructions are Windows-specific, although the logic for Linux or Mac should be rather similar. First, install OpenVPN. The nicest way to do it in Windows is via <a href="https://chocolatey.org/" target="_blank" rel="noopener">Chocolatey</a>. Open <code>cmd.exe</code> with administrative privileges and:</p>
<p>1. Install Chocolatey, if you still don't have it (trust me, it's a good piece of software to have):</p>
<pre class="EnlighterJSRAW" data-enlighter-language="shell">@"%SystemRoot%\System32\WindowsPowerShell\v1.0\powershell.exe" -NoProfile -InputFormat None -ExecutionPolicy Bypass -Command "iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1'))" &amp;&amp; SET "PATH=%PATH%;%ALLUSERSPROFILE%\chocolatey\bin"</pre>
<p>2. Now install OpenVPN (if you still don't have it):</p>
<pre class="EnlighterJSRAW" data-enlighter-language="shell">choco install -y openvpn</pre>
<p>3. Launch OpenVPN GUI (Windows button + type "OpenV" + Enter), right-click on the newly appeared tray icon, select "Import File..." and choose the <code>client1.ovpn</code> file we created:</p>
<div id="attachment_2107" style="width: 250px" class="wp-caption aligncenter"><img class="size-full wp-image-2107" src="http://fouryears.eu/wp-content/uploads/2018/04/import-file.png" alt="Import OVPN file" width="240" height="132" /><p class="wp-caption-text">Import OVPN file</p></div>
<p>4. Once you've done it, the OpenVPN tray menu will offer you a "Connect" option (or a "client1" submenu with a "Connect" option if you have other connections configured already). Click it, observe the connection dialog, wait until the tray icon becomes green, and congratulations, all your traffic is now tunneled through port 443 of the cloud machine you launched some minutes ago.</p>
<div id="attachment_2108" style="width: 86px" class="wp-caption aligncenter"><img class="wp-image-2108" src="http://fouryears.eu/wp-content/uploads/2018/04/ovpn-green.png" alt="OpenVPN client connected" width="76" height="92" /><p class="wp-caption-text">Connected</p></div>
<p>You may verify the effect by <a href="https://www.google.com/search?q=my+ip" target="_blank" rel="noopener">googling the words "my ip"</a>. You are now also free to connect to any ports or services you need.</p>
<h4>Tunnel in a Tunnel</h4>
<p>As I mentioned in the beginning, having freed myself from the firewalls of a paranoid network administrator, I would then sometimes need to connect to a corporate or a university VPN. This happens to be surprisingly easy (this part is, however, Windows specific - I am not sure how an equivalent action should look like on Linux or Mac, although I'm sure it should be possible).</p>
<ol>
<li>OpenVPN uses a virtual <a href="https://en.wikipedia.org/wiki/TUN/TAP" target="_blank" rel="noopener">network tunnel adapter</a> to forward traffic. Initially it only installs one such adapter, but if you want to run a tunnel within a tunnel you will need to add a second one. This is done by simply running <code>C:\Program Files\TAP-Windows\bin\addtap.bat</code> with administrator privileges. It only needs to be done once, of course (unless you need to run a tunnel within a tunnel within a tunnel - then you need to add a a third TAP adapter by running <code>addtap.bat</code> again).</li>
<li>Now running a VPN within a VPN is simply a matter of asking OpenVPN to "Connect" to VPNs in the appropriate order. As we are already connected to <code>client1</code>, we simply connect to another profile without disconnecting the first one - this will happily forward a tunnel through an existing tunnel. Fun, right?</li>
</ol>
<div id="attachment_2109" style="width: 219px" class="wp-caption aligncenter"><img class="size-full wp-image-2109" src="http://fouryears.eu/wp-content/uploads/2018/04/inception.png" alt="VPN via VPN" width="209" height="224" /><p class="wp-caption-text">VPN via VPN</p></div>
<h4>Cleaning Up</h4>
<p>If you only needed the VPN temporarily, do not forget to destroy the cloud machine before going home - otherwise you'll have to pay the unnecessary costs of keeping a server up. Destroying servers is simple. Just go back to the cloud shell where we launched the server and run:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="shell">gcloud compute instances delete openvpn-server</pre>
<p>That's it. You are back at the mercy of the firewalls.</p>
]]></content:encoded>
			<wfw:commentRss>https://fouryears.eu/2018/04/18/openvpn-over-https-in-5-minutes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Program for Adding 10 Numbers</title>
		<link>https://fouryears.eu/2018/03/17/a-program-for-adding-10-numbers/</link>
		<comments>https://fouryears.eu/2018/03/17/a-program-for-adding-10-numbers/#comments</comments>
		<pubDate>Fri, 16 Mar 2018 22:07:58 +0000</pubDate>
		<dc:creator><![CDATA[Konstantin]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[Fun]]></category>
		<category><![CDATA[Humor]]></category>
		<category><![CDATA[Procrastination]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Software development]]></category>

		<guid isPermaLink="false">http://fouryears.eu/?p=2074</guid>
		<description><![CDATA[I have randomly stumbled upon a Quora question "Can you write a program for adding 10 numbers" yesterday. The existing answers competed in geeky humor and code golf, so I could not help adding another take on the problem. Can you write a program for adding 10 numbers? The question offers a great chance to illustrate [&#8230;]]]></description>
				<content:encoded><![CDATA[<blockquote><p>I have randomly stumbled upon a <a href="https://www.quora.com/Can-you-write-a-program-for-adding-10-numbers" target="_blank" rel="noopener">Quora question</a> "Can you write a program for adding 10 numbers" yesterday. The existing answers competed in geeky humor and code golf, so I could not help adding <a href="https://www.quora.com/Can-you-write-a-program-for-adding-10-numbers/answer/Konstantin-Tretyakov" target="_blank" rel="noopener">another take</a> on the problem.</p></blockquote>
<h3>Can you write a program for adding 10 numbers?</h3>
<p class="ui_qtext_para">The question offers a great chance to illustrate how to properly develop software solutions to real-life problems such as this one.</p>
<p class="ui_qtext_para">First things first - let us analyze the requirements posed by the customer. They are rather vague, as usual. It is not clear what “numbers” we need to add, where and how should these “numbers” come from, what is really meant under “adding”, what should we do with the result, what platform the software is supposed to be running on, what are the service guarantees, how many users are expected, etc.</p>
<p class="ui_qtext_para">Of course, we do not want to discover that we misunderstood some of the requirements <span class="qlink_container"><a class="external_link" href="https://stackoverflow.com/questions/4130051/software-development-costs-pyramid/4130091" target="_blank" rel="noopener nofollow" data-qt-tooltip="stackoverflow.com">late in the development cycle</a></span>, as this could potentially require us to re-do all of the work. To avoid such unpleasant surprises we should be planning for a general, solid, enterprise-grade solution to the problem. After a short meeting of the technical committee we decided to pick C# as the implementation platform. It is OS-independent and has many powerful features which should cover any possible future needs. For example, if the customer would decide to switch to a cluster-based, parallel implementation later along the way, we’d quickly have this base covered. Java could also be a nice alternative, but, <span class="qlink_container"><a class="external_link" href="https://insights.stackoverflow.com/survey/2018/#technology-what-languages-are-associated-with-the-highest-salaries-worldwide" target="_blank" rel="noopener nofollow" data-qt-tooltip="stackoverflow.com" data-tooltip="attached">according to the recent developer surveys, C# development pays more</a></span>.</p>
<h3>The Architecture</h3>
<p class="ui_qtext_para">Let us start by modeling the problem <span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Abstraction_(software_engineering)" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">on a higher level</a></span>. The customer obviously needs to process (“add”) some data (“10 numbers”). Without getting into too much detail, this task can be modeled as follows:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="csharp">interface IInputProvider {}
interface IOutput {}
interface ISolution {
    IOutput add10(IInputProvider input);    
}</pre>
<p class="ui_qtext_para">Note how we avoid specifying the actual sources of input and output yet. Indeed, we really don’t know where the “10 numbers” may be coming from in the future - these could be read from standard input, sent from the Internet, delivered by homing pigeons, or teleported via holographic technology of the future - all these options are easily supported by simply implementing <code class="prettyprint inline prettyprinted"><span class="typ">IInputProvider</span></code> appropriately.</p>
<p class="ui_qtext_para">Of course, we need to do something about the output once we obtain it, even though the customer forgot to mention this part of the problem. This means we will also have to implement the following interface:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="csharp">interface IOutputConsumer {
    void consumeOutput(IOutput output);
}</pre>
<p class="ui_qtext_para">And that is it - our general solution architecture! Let us start implementing it now.</p>
<h3>The Configuration</h3>
<p class="ui_qtext_para">The architecture we work with is completely abstract. An actual solution would need to provide implementations for the <code class="prettyprint inline prettyprinted"><span class="typ">IInputProvider</span></code>, <code class="prettyprint inline prettyprinted"><span class="typ">IOutputConsumer</span></code> and <code class="prettyprint inline prettyprinted"><span class="typ">ISolution</span></code> interfaces. How do we specify which classes are implementing these interfaces? There are many possibilities - we could load this information from a database, for example, and create a dedicated administrative interface for managing the settings. For reasons of brevity, we’ll illustrate a simplistic XML-based <span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Factory_method_pattern" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">factory method pattern</a></span>.</p>
<p class="ui_qtext_para">Namely, we shall describe the necessary implementations in the XML file <code class="prettyprint inline prettyprinted"><span class="pln">config</span><span class="pun">.</span><span class="pln">xml</span></code> as follows:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="xml">&lt;Config&gt;
    &lt;InputProvider class="Enterprise.NumberSequenceProvider"/&gt;
    &lt;OutputConsumer class="Enterprise.PeanoNumberPrinter"/&gt;
    &lt;Solution class="Enterprise.TenNumbersAddingSolution"/&gt;
&lt;/Config&gt;</pre>
<p class="ui_qtext_para">A special <code class="prettyprint inline prettyprinted"><span class="typ">SolutionFactory</span></code> class can now load this configuration and create the necessary object instances. Here’s a prototype implementation:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="csharp">class SolutionFactory {
    private XDocument cfg;
    public SolutionFactory(string configFile) {
        cfg = XDocument.Load(configFile);
    }
    public IInputProvider GetInputProvider() {
        return Instantiate&lt;IInputProvider&gt;("InputProvider");
    }
    public IOutputConsumer GetOutputConsumer() {
        return Instantiate&lt;IOutputConsumer&gt;("OutputConsumer");
    }
    public ISolution GetSolution() {
        return Instantiate&lt;ISolution&gt;("Solution");
    }
    private T Instantiate&lt;T&gt;(string elementName) {
        var typeName = cfg.Root.Element(elementName)
                               .Attribute("class").Value;
        return (T)Activator.CreateInstance(Type.GetType(typeName));
    }
}</pre>
<p class="ui_qtext_para">Of course, in a real implementation we would also worry about specifying the XML Schema for our configuration file, and make sure it is possible to override the (currently hard-coded) “config.xml” file name with an arbitrary URI using command-line parameters or environment variables. In many real-life enterprise solutions in Java, for example, even the choice of the XML parsing library would need to be configured and initialized using <span class="qlink_container"><a class="external_link" href="https://docs.oracle.com/javase/7/docs/api/javax/xml/parsers/DocumentBuilderFactory.html" target="_blank" rel="noopener" data-qt-tooltip="oracle.com">its own factory pattern</a></span>. I omit many of such (otherwise crucial) details for brevity here.</p>
<p class="ui_qtext_para">I am also omitting the <span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Unit_testing" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">unit-tests</a></span>, which, of course, <span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Test-driven_development" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">should be covering every single method</a></span> we are implementing.</p>
<h3>The Application</h3>
<p class="ui_qtext_para">Now that we have specified the architecture and implemented the configuration logic, let us put it all together into a working application. Thanks to our flexible design, the main application code is extremely short and concise:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="csharp">class Program {
    static void Main(string[] args) {
        var sf = new SolutionFactory("config.xml");
        var ip = sf.GetInputProvider();
        var oc = sf.GetOutputConsumer();
        var sol = sf.GetSolution();
        var op = sol.add10(ip);
        oc.consumeOutput(op);
    }
}</pre>
<p class="ui_qtext_para">Amazing, right? Well, it does not really work yet, of course, because we still need to implement the core interfaces. However, at this point we may conclude the work of the senior architect and assign the remaining tasks of filling in the blanks to the the main engineering team.</p>
<h3>The Inputs and Outputs</h3>
<p class="ui_qtext_para">Now that we have set up the higher-level architecture, we may think a bit more specifically about the algorithm we plan to implement. Recall that we need to “add 10 numbers”. We don’t really know what these “numbers” should be - they could be real numbers, complex numbers, Roman numerals or whatnot, so we have to be careful and not rush into making strict assumptions yet. Let’s just say that a “number” is something that can be <span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Group_(mathematics)" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">added to another number</a></span>:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="csharp">interface INumber {
    INumber add(INumber other);
}</pre>
<p class="ui_qtext_para">We’ll leave the implementation of this interface to our mathematicians on the team later on.</p>
<p class="ui_qtext_para">At this step we can also probably make the assumption that our <code class="prettyprint inline prettyprinted"><span class="typ">IInputProvider</span></code>implementation should somehow give access to ten different instances of an <code class="prettyprint inline prettyprinted"><span class="typ">INumber</span></code>. We don’t know how these instances are provided - in the worst case each of them may be obtained using a completely different method and at completely different times. Consequently, one possible template for an <code class="prettyprint inline prettyprinted"><span class="typ">IInputProvider</span></code> could be the following:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="csharp">interface ITenNumbersProvider: IInputProvider {
    INumber GetNumber1();
    INumber GetNumber2();
    INumber GetNumber3();
    INumber GetNumber4();
    INumber GetNumber5();
    INumber GetNumber6();
    INumber GetNumber7();
    INumber GetNumber8();
    INumber GetNumber9();
    INumber GetNumber10();
}</pre>
<p class="ui_qtext_para">Note how, by avoiding the use of array indexing, we force the compiler to <span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Static_program_analysis" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">require</a> </span>that any implementation of our <code class="prettyprint inline prettyprinted"><span class="typ">ITenNumbersProvider</span></code> interface indeed provides exactly ten numbers. For brevity, however, let us <span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Code_refactoring" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">refactor</a></span> this design a bit:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="csharp">enum NumberOfANumber {
    ONE, TWO, THREE, FOUR, FIVE, SIX, SEVEN, EIGHT, NINE, TEN
}
interface ITenNumbersProvider: IInputProvider {
    INumber GetNumber(NumberOfANumber noan);
}</pre>
<p class="ui_qtext_para">By listing the identities of our “numbers” in an <code class="prettyprint inline prettyprinted"><span class="kwd">enum</span></code> we still get some level of compile-time safety, although it is not as strong any more, because <code class="prettyprint inline prettyprinted"><span class="kwd">enum</span></code> is, internally, just an integer. However, we god rid of unnecessary repetitions, which is <span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Don%27t_repeat_yourself" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">a good thing</a></span>. Refactoring is an important aspect of enterprise software development, you see.</p>
<p class="ui_qtext_para">The senior architect looked at the proposed interface at one of our regular <span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Stand-up_meeting#Software_development" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">daily stand-ups</a></span>, and was concerned with the chosen design. “Your interface assumes you can provide immediate access to any of the ten numbers”, he said. But what if the numbers cannot be provided simultaneously and will be arriving at unpredictable points in time? If this were the case, an <span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Observer_pattern" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">event-driven</a></span> design would be much more appropriate:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="csharp">delegate void NumberHandler(NumberOfANumber id, INumber n);
 
interface IAsynchronousInputProvider: IInputProvider {
    void AddNumberListener(NumberHandler handler);
}</pre>
<p class="ui_qtext_para">The adding subsystem would then simply <span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Publish-subscribe" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">subscribe</a></span> to receive events about the incoming numbers and handle them as they come in.</p>
<p class="ui_qtext_para">“This is all good and nice”, responded the mathematician, “but for efficient implementation of the addition algorithm we might need to have all ten numbers available at the same time”. “Ah, <span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Design_Patterns" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">software design 101</a></span>”, says the senior architect. We simply install an <span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Adapter_pattern" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">adapter class</a></span>. It would pool the incoming data until we have all of it, thus converting the <code class="prettyprint inline prettyprinted"><span class="typ">IAsynchronousInputProvider</span></code>, used for feeding the data, into an <code class="prettyprint inline prettyprinted"><span class="typ">ITenNumbersProvider</span></code>, needed by the mathematician:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="csharp">class SyncronizationAdapter: ITenNumbersProvider {
   private Dictionary&lt;NumberOfANumber, INumber&gt; nums;
   private ManualResetEvent allDataAvailableEvent;
 
   public SynchronizationAdapter(IAsynchronousInputProvider ainput){
       nums = new Dictionary&lt;NumberOfANumber, INumber&gt;();
       allDataAvailableEvent = new ManualResetEvent(false);
       ainput.AddNumberListener(this.HandleIncomingNumber);
   }
   private void HandleIncomingNumber(NumberOfANumber id, INumber n){
       nums[id] = n;
       if (Enum.GetValues(typeof(NumberOfANumber))
               .Cast&lt;NumberOfANumber&gt;()
               .All(k =&gt; nums.ContainsKey(k)))
            allDataAvailableEvent.Set();
   }
   public INumber GetNumber(NumberOfANumber noan) {
       allDataAvailableEvent.WaitOne();
       return nums[noan];
   }
}</pre>
<p class="ui_qtext_para">Now the mathematician can work on his addition logic without having to know anything about the way the numbers are coming in. <span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Decomposition_(computer_science)" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">Convenient</a></span>, isn’t it?</p>
<p class="ui_qtext_para">Note that we are still only providing the input <span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Interface_(computing)" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">interface specification</a></span> (along with an adapter) here. The actual implementation has to wait until our mathematicians come up with an implementation of <code class="prettyprint inline prettyprinted"><span class="typ">INumber</span></code> and the data engineers decide on how to obtain ten of these in the most optimal way.</p>
<p class="ui_qtext_para">But what about <code class="prettyprint inline prettyprinted"><span class="typ">IOutput</span></code>? Let us assume that we expect to output a single number. This means that <code class="prettyprint inline prettyprinted"><span class="typ">INumber</span></code> must itself <span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Inheritance_(object-oriented_programming)" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">already be an instance</a></span> of <code class="prettyprint inline prettyprinted"><span class="typ">IOutput</span></code>:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="csharp">interface INumber: IOutput {
   INumber add(INumber other);
}</pre>
<p class="ui_qtext_para">No need to implement anything, we just add an interface tag to <code class="prettyprint inline prettyprinted"><span class="typ">INumber</span></code>! See how <span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Object-oriented_design" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">object-oriented design techniques</a></span> allow us to save development time!</p>
<h3>The Order of Addition</h3>
<p class="ui_qtext_para">OK, so we now have a concept of an <code class="prettyprint inline prettyprinted"><span class="typ">INumber</span></code> which has a (binary) addition operation defined, an <code class="prettyprint inline prettyprinted"><span class="typ">ITenNumbersProvider</span></code> which can provide ten <code class="prettyprint inline prettyprinted"><span class="typ">INumber</span></code> instances (conveniently abstracting away the <code class="prettyprint inline prettyprinted"><span class="typ">IAsynchrhonousInputProvider</span></code> which actually obtains the numbers), and our goal is to add them up to get an <code class="prettyprint inline prettyprinted"><span class="typ">IOutput</span></code> which is itself an <code class="prettyprint inline prettyprinted"><span class="typ">INumber</span></code>. Sounds easy, right? <span class="qlink_container"><a class="external_link" href="http://www.phys.uconn.edu/~rozman/Courses/P2200_11F/downloads/sum-howto.pdf" target="_blank" rel="noopener nofollow" data-qt-tooltip="uconn.edu">Not so fast</a></span>! How exactly are we going to add these numbers? After all, maybe in some cases adding ((a+b)+c)+d)… can be less efficient or precise than (a+(b+(c+(d…. Or maybe the optimal addition strategy is to start from the middle and then add numbers in some order? There do exist <span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Kahan_summation_algorithm" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">nontrivial ways to add up numbers</a></span>, you know. To accommodate for any possible options in the future (so that we wouldn’t have to rewrite the code unnecessarily), we should design our solution in a way that would let us switch our addition strategy easily, should we discover a better algorithm. One way to do it is by abstracting the implementation behind the following interface:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="csharp">interface IAdditionStrategy {
   INumber fold(Func&lt;NumberOfANumber, INumber&gt; elements,
                Func&lt;INumber, INumber, INumber&gt; op); 
}</pre>
<p class="ui_qtext_para">You see, it is essentially a <span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Higher-order_function" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">functor</a></span>, which gets a way to access our set of numbers (via an accessor function) along with a binary operator “op”, and “<span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Fold_(higher-order_function)" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">folds</a></span>” this operator along the number set in any way it deems necessary. This particular piece was designed by Harry, who is a huge fan of <span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Functional_programming" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">functional programming</a></span>. He was somewhat disappointed when we decided not to implement everything in <span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Haskell_(programming_language)" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">Haskell</a></span>. Now he can show how everyone was wrong. Indeed, the <code class="prettyprint inline prettyprinted"><span class="typ">IAdditionStrategy</span></code> <i>is </i>a core element of our design, after all, and it happens to look like a fold-functor which takes functions as inputs! “I told you we had to go with Haskell!”, says Harry! It would allow us to implement all of our core functionality with a much <span class="qlink_container"><a class="external_link" href="https://wiki.haskell.org/Polymorphism" target="_blank" rel="noopener" data-qt-tooltip="haskell.org">higher level</a></span> of <span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Polymorphism_(computer_science)" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">polymorphism</a></span> than that of a simplistic C# interface!</p>
<h3>The Solution Logic</h3>
<p class="ui_qtext_para">So, if we are provided with the ten numbers via <code class="prettyprint inline prettyprinted"><span class="typ">ITenNumbersProvider</span></code> and an addition strategy via <code class="prettyprint inline prettyprinted"><span class="typ">IAdditionStrategy</span></code>, the implementation of the solution becomes a very simple matter:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="csharp">class TenNumbersAddingSolution: ISolution {
   private IAdditionStrategy strategy;
   public TenNumbersAddingSolution() {
       strategy = ...
   }
   public IOutput add10(IInputProvider input) {
       var tenNumbers = new SynchronizationAdapter(
                      (IAsynchronousInputProvider)input);
       return strategy.fold(i =&gt; tenNumbers.GetNumber(i), 
                            (x,y) =&gt; x.add(y));
   }
}</pre>
<p class="ui_qtext_para">We still need to specify where to take the implementation of the <code class="prettyprint inline prettyprinted"><span class="typ">IAdditionStrategy</span></code> from, though. This would be a good place to refactor our code by introducing a <span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Dependency_injection" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">dependency injection</a></span> configuration framework such as the <span class="qlink_container"><a class="external_link" href="https://autofac.org/" target="_blank" rel="noopener nofollow" data-qt-tooltip="autofac.org">Autofac library</a></span>. However, to keep this text as short as possible, I am forced to omit this step. Let us simply add the “Strategy” field to our current <code class="prettyprint inline prettyprinted"><span class="pln">config</span><span class="pun">.</span><span class="pln">xml</span></code> as follows:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="xml">&lt;Config&gt;
    ...
    &lt;Solution class="Enterprise.TenNumbersAddingSolution"&gt;
        &lt;Strategy class="Enterprise.AdditionStrategy"/&gt;
    &lt;/Solution&gt;
&lt;/Config&gt;</pre>
<p class="ui_qtext_para">We could now load this configuration setting from the solution class:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="csharp">    ...
    public TenNumbersAddingSolution() {
        var cfg = XDocument.Load("config.xml");
        var typeName = cfg.Root
               .Element("Solution")
               .Element("Strategy")
               .Attribute("class").Value;
        strategy = (IAdditionStrategy)Activator
               .CreateInstance(Type.GetType(typeName));
    }
    ...</pre>
<p class="ui_qtext_para">And voilà, we have our solution logic in place. We still need to implement <code class="prettyprint inline prettyprinted"><span class="typ">INumber</span></code>, <code class="prettyprint inline prettyprinted"><span class="typ">IAdditionStrategy</span></code>, <code class="prettyprint inline prettyprinted"><span class="typ">ITenNumbersProvider</span> </code>and <code class="prettyprint inline prettyprinted"><span class="typ">IOutputConsumer</span></code>, though. These are the lowest-level tasks that will force us to make the most specific decisions and thus determine the actual shape of our final product. These will be done by the most expert engineers and mathematicians, who understand how things <i>actually </i>work inside.</p>
<h3>The Numbers</h3>
<p class="ui_qtext_para">How should we implement our numbers? As this was not specified, we should probably start with the simplest possible option. One of the most basic number systems from the mathematician’s point of view is that of <span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Natural_number#Peano_axioms" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">Peano natural numbers</a></span>. It is also quite simple to implement, so let’s go for it:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="csharp">class PeanoInteger: INumber {
    public PeanoInteger Prev { get; private set; }
    public PeanoInteger(PeanoInteger prev) { Prev = prev; }
    public INumber add(INumber b) {
        if (b == null) return this;
        else return new PeanoInteger(this)
                .add(((PeanoInteger)b).Prev);
    }
}</pre>
<p class="ui_qtext_para">Let us have <code class="prettyprint inline prettyprinted"><span class="typ">IOutputConsumer</span></code> print out the given Peano integer as a sequence of “1”s to the console:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="csharp">class PeanoNumberPrinter: IOutputConsumer {
    public void consumeOutput(IOutput p) {
        for (var x = (PeanoInteger)p; x != null; x = x.Prev)
             Console.Write("1");
        Console.WriteLine();
    }
}</pre>
<p class="ui_qtext_para">Finally, our prototype <code class="prettyprint inline prettyprinted"><span class="typ">IAdditionStrategy</span></code> will be adding the numbers left to right. We shall leave the option of considering other strategies for later <span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Iterative_and_incremental_development" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">development iterations</a></span>.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="csharp">class AdditionStrategy: IAdditionStrategy {
    public INumber fold(Func&lt;NumberOfANumber, INumber&gt; elements,
                        Func&lt;INumber, INumber, INumber&gt; op) {
       return Enum.GetValues(typeof(NumberOfANumber))
                  .Cast&lt;NumberOfANumber&gt;()
                  .Select(elements).Aggregate(op);
    }
}</pre>
<p class="ui_qtext_para">Take a moment to contemplate the beautiful abstraction of this functional method once again. Harry’s work, no doubt!</p>
<h3>The Input Provider</h3>
<p class="ui_qtext_para">The only remaining piece of the puzzle is the <i>source</i> of the numbers, i.e. the <code class="prettyprint inline prettyprinted"><span class="typ">IAsynchronousInputProvider</span></code> interface. Its implementation is a fairly arbitrary choice at this point - most probably the customer will want to customize it later, but for the purposes of our <span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Minimum_viable_product" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">MVP</a></span> we shall implement a simple sequential asynchronous generator of Peano numbers {1, 2, 3, …, 10}:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="csharp">class NumberSequenceProvider: IAsynchronousInputProvider {
    private event NumberHandler handler;
    private ManualResetEvent handlerAvailable;
 
    public NumberSequenceProvider() {
        handlerAvailable = new ManualResetEvent(false);
        new Thread(ProduceNumbers).Start();
    }
    public void AddNumberListener(NumberHandler nh) {
        handler += nh;
        handlerAvailable.Set();
    }
    private void ProduceNumbers() {
        handlerAvailable.WaitOne();
        PeanoInteger pi = null;
        foreach (var v in Enum.GetValues(typeof(NumberOfANumber))
                              .Cast&lt;NumberOfANumber&gt;()) {
                pi = new PeanoInteger(pi);
                handler(v, pi);
        }
    }
}</pre>
<p class="ui_qtext_para">Note that we have to be careful to not start publishing the inputs before the number processing subsystem attaches to the input producer. To achieve that we rely on the <span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Event_(synchronization_primitive)" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">event semaphore</a></span> synchronization primitive. At this point we can clearly see the benefit of choosing a powerful, enterprise-grade platform from the start! Semaphores would look much clumsier in Haskell, don’t you think, Harry? <i>(Harry <a href="https://hackage.haskell.org/package/concurrent-extra" target="_blank" rel="noopener">disagrees</a>)</i></p>
<p class="ui_qtext_para">So here we are - we have a solid, enterprise-grade, asynchronous, configurable implementation for an abstractly defined addition of abstractly defined numbers, using an abstract input-output mechanism.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="null">$&gt; dotnet run
1111111111111111111111111111111111111111111111111111111</pre>
<p class="ui_qtext_para">We do need some more months to ensure full test coverage, update our numerous <span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Unified_Modeling_Language" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">UML diagrams</a></span>, write documentation for users and API docs for developers, work on packaging and installers for various platforms, arrange marketing and sales for the project (logo, website, Facebook page, customer relations, all that, you know), and attract investors. Investors could then propose to <span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Lean_startup#Pivot" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">pivot</a></span> the product into a <span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Blockchain" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">blockchain</a></span>-based, distributed <span class="qlink_container"><a class="external_link" href="https://en.wikipedia.org/wiki/Initial_coin_offering" target="_blank" rel="noopener nofollow" data-qt-tooltip="wikipedia.org">solution</a></span>. Luckily, thanks to our rock solid design abstractions, this would all boil down to reimplementing just a few of the lower-level interfaces!</p>
<p class="ui_qtext_para">Software engineering <a href="https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpriseEdition" target="_blank" rel="noopener">is fun</a>, isn’t it?</p>
<p class="ui_qtext_para"><i>The source code for the developed solution is available </i><span class="qlink_container"><a class="external_link" href="https://gist.github.com/konstantint/f8c45b64342c56a6f86a77f945ac8b67" target="_blank" rel="noopener" data-qt-tooltip="github.com"><i>here</i></a></span><i>.</i></p>
]]></content:encoded>
			<wfw:commentRss>https://fouryears.eu/2018/03/17/a-program-for-adding-10-numbers/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>The Mystery of Early Stopping</title>
		<link>https://fouryears.eu/2017/12/06/the-mystery-of-early-stopping/</link>
		<comments>https://fouryears.eu/2017/12/06/the-mystery-of-early-stopping/#comments</comments>
		<pubDate>Tue, 05 Dec 2017 23:38:46 +0000</pubDate>
		<dc:creator><![CDATA[Konstantin]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Data analysis]]></category>
		<category><![CDATA[Explanation]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[Project]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Theory]]></category>
		<category><![CDATA[Unclear]]></category>

		<guid isPermaLink="false">http://fouryears.eu/?p=2025</guid>
		<description><![CDATA[Early stopping is a technique that is very often used when training neural networks, as well as with some other iterative machine learning algorithms. The idea is quite intuitive - let us measure the performance of our model on a separate validation dataset during the training iterations. We may then observe that, despite constant score improvements on [&#8230;]]]></description>
				<content:encoded><![CDATA[<p><a href="https://en.wikipedia.org/wiki/Early_stopping" target="_blank" rel="noopener">Early stopping</a> is a technique that is very often used when training neural networks, as well as with some other iterative machine learning algorithms. The idea is quite intuitive - let us measure the performance of our model on a separate <em>validation </em>dataset during the training iterations. We may then observe that, despite constant score improvements on the training data, the model's performance on the validation dataset would only improve during the first stage of training, reach an optimum at some point and then turn to getting worse with further iterations.</p>
<div id="attachment_2027" style="width: 463px" class="wp-caption aligncenter"><img class="size-full wp-image-2027" src="http://fouryears.eu/wp-content/uploads/2017/12/early_stopping.png" alt="The early stopping principle" width="453" height="284" srcset="https://fouryears.eu/wp-content/uploads/2017/12/early_stopping.png 453w, https://fouryears.eu/wp-content/uploads/2017/12/early_stopping-300x188.png 300w" sizes="(max-width: 453px) 100vw, 453px" /><p class="wp-caption-text">The early stopping principle</p></div>
<p>It thus seems reasonable to stop training at the point when the minimal validation error is achieved. Training the model any further only leads to <a href="https://en.wikipedia.org/wiki/Overfitting" target="_blank" rel="noopener">overfitting</a>. Right? The reasoning sounds solid and, indeed, early stopping is often claimed to improve generalization in practice. Most people seem to take the benefit of the technique for granted. In this post I would like to introduce some skepticism into this view or at least illustrate that things are not necessarily as obvious as they may seem from the diagram with the two lines above.</p>
<h3>How does Early Stopping Work?</h3>
<p>To get a better feeling of what early stopping actually <em>does</em>, let us examine its application to a very simple "machine learning model" - the estimation of the mean. Namely, suppose we are given a sample of 50 points <img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-ecf030f940f517787145a85f7a90f283_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#92;&#109;&#97;&#116;&#104;&#98;&#102;&#123;&#120;&#125;&#95;&#105;" title="Rendered by QuickLaTeX.com" height="11" width="16" style="vertical-align: -3px;"/> from a normal distribution with unit covariance and we need to estimate the mean <img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-098fe4fc91886b0f8da407e07e59a15f_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#92;&#109;&#97;&#116;&#104;&#98;&#102;&#123;&#119;&#125;" title="Rendered by QuickLaTeX.com" height="9" width="15" style="vertical-align: 0px;"/> of this distribution.</p>
<div id="attachment_2033" style="width: 350px" class="wp-caption aligncenter"><img class="wp-image-2033 size-full" src="http://fouryears.eu/wp-content/uploads/2017/12/1_sample-1.png" alt="Sample" width="340" height="329" srcset="https://fouryears.eu/wp-content/uploads/2017/12/1_sample-1.png 340w, https://fouryears.eu/wp-content/uploads/2017/12/1_sample-1-300x290.png 300w" sizes="(max-width: 340px) 100vw, 340px" /><p class="wp-caption-text">Sample</p></div>
<p>The maximum likelihood estimate of <img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-098fe4fc91886b0f8da407e07e59a15f_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#92;&#109;&#97;&#116;&#104;&#98;&#102;&#123;&#119;&#125;" title="Rendered by QuickLaTeX.com" height="9" width="15" style="vertical-align: 0px;"/> can be found as the point which has the smallest sum of squared distances to all the points in the sample. In other words, "model fitting" boils down to finding the minimum of the following objective function:</p>
<p class="ql-center-displayed-equation" style="line-height: 54px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-4a4ff04b2699f9033f5d09897e4dfc5e_l3.png" height="54" width="200" class="ql-img-displayed-equation quicklatex-auto-format" alt="&#92;&#091;&#102;&#95;&#92;&#109;&#97;&#116;&#104;&#114;&#109;&#123;&#116;&#114;&#97;&#105;&#110;&#125;&#40;&#92;&#109;&#97;&#116;&#104;&#114;&#109;&#123;&#119;&#125;&#41;&#32;&#58;&#61;&#32;&#92;&#115;&#117;&#109;&#95;&#123;&#105;&#61;&#49;&#125;&#94;&#123;&#53;&#48;&#125;&#32;&#92;&#86;&#101;&#114;&#116;&#32;&#92;&#109;&#97;&#116;&#104;&#98;&#102;&#123;&#120;&#125;&#95;&#105;&#32;&#45;&#32;&#92;&#109;&#97;&#116;&#104;&#98;&#102;&#123;&#119;&#125;&#92;&#86;&#101;&#114;&#116;&#94;&#50;&#92;&#093;" title="Rendered by QuickLaTeX.com"/></p>
<p>As our estimate is based on a finite sample, it, of course, won't necessarily be exactly equal to the true mean of the distribution, which I chose in this particular example to be exactly (0,0):</p>
<div id="attachment_2034" style="width: 504px" class="wp-caption aligncenter"><img class="wp-image-2034 size-full" src="http://fouryears.eu/wp-content/uploads/2017/12/2_estimate.png" alt="Sample mean as a minimum of the objective function" width="494" height="301" srcset="https://fouryears.eu/wp-content/uploads/2017/12/2_estimate.png 494w, https://fouryears.eu/wp-content/uploads/2017/12/2_estimate-300x183.png 300w" sizes="(max-width: 494px) 100vw, 494px" /><p class="wp-caption-text">Sample mean as a minimum of the objective function</p></div>
<p>The circles in the illustration above are the contours of the objective function, which, as you might guess, is a <a href="https://en.wikipedia.org/wiki/Paraboloid" target="_blank" rel="noopener">paraboloid</a> bowl. The red dot marks its bottom and is thus the solution to our optimization problem, i.e. the estimate of the mean we are looking for. We may find this solution in various ways. For example, a natural closed-form analytical solution is simply the mean of the training set. For our purposes, however, we will be using the <a href="https://en.wikipedia.org/wiki/Gradient_descent" target="_blank" rel="noopener">gradient descent</a> iterative optimization algorithm. It is also quite straightforward: start with any point (we'll pick (-0.5, 0) for concreteness' sake) and descend in small steps downwards until we reach the bottom of the bowl:</p>
<div id="attachment_2038" style="width: 504px" class="wp-caption aligncenter"><img class="wp-image-2038 size-full" src="http://fouryears.eu/wp-content/uploads/2017/12/3_descent-1.png" alt="Gradient descent" width="494" height="301" srcset="https://fouryears.eu/wp-content/uploads/2017/12/3_descent-1.png 494w, https://fouryears.eu/wp-content/uploads/2017/12/3_descent-1-300x183.png 300w" sizes="(max-width: 494px) 100vw, 494px" /><p class="wp-caption-text">Gradient descent</p></div>
<p>Let us now introduce <em>early stopping</em> into the fitting process. We will split our 50 points randomly into two separate sets: 40 points will be used to fit the model and 10 will form the early stopping <em>validation set</em>. Thus, technically, we now have two different objective functions to deal with:</p>
<p class="ql-center-displayed-equation" style="line-height: 54px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-5adb94ca761b724a00c38b830a6bd831_l3.png" height="54" width="184" class="ql-img-displayed-equation quicklatex-auto-format" alt="&#92;&#091;&#102;&#95;&#92;&#109;&#97;&#116;&#104;&#114;&#109;&#123;&#102;&#105;&#116;&#125;&#40;&#92;&#109;&#97;&#116;&#104;&#114;&#109;&#123;&#119;&#125;&#41;&#32;&#58;&#61;&#32;&#92;&#115;&#117;&#109;&#95;&#123;&#105;&#61;&#49;&#125;&#94;&#123;&#52;&#48;&#125;&#32;&#92;&#86;&#101;&#114;&#116;&#32;&#92;&#109;&#97;&#116;&#104;&#98;&#102;&#123;&#120;&#125;&#95;&#105;&#32;&#45;&#32;&#92;&#109;&#97;&#116;&#104;&#98;&#102;&#123;&#119;&#125;&#92;&#86;&#101;&#114;&#116;&#94;&#50;&#92;&#093;" title="Rendered by QuickLaTeX.com"/></p>
<p>and</p>
<p class="ql-center-displayed-equation" style="line-height: 54px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-8ac10bff72d2e3925e207369119b4537_l3.png" height="54" width="205" class="ql-img-displayed-equation quicklatex-auto-format" alt="&#92;&#091;&#102;&#95;&#92;&#109;&#97;&#116;&#104;&#114;&#109;&#123;&#115;&#116;&#111;&#112;&#125;&#40;&#92;&#109;&#97;&#116;&#104;&#114;&#109;&#123;&#119;&#125;&#41;&#32;&#58;&#61;&#32;&#92;&#115;&#117;&#109;&#95;&#123;&#105;&#61;&#52;&#49;&#125;&#94;&#123;&#53;&#48;&#125;&#32;&#92;&#86;&#101;&#114;&#116;&#32;&#92;&#109;&#97;&#116;&#104;&#98;&#102;&#123;&#120;&#125;&#95;&#105;&#32;&#45;&#32;&#92;&#109;&#97;&#116;&#104;&#98;&#102;&#123;&#119;&#125;&#92;&#86;&#101;&#114;&#116;&#94;&#50;&#46;&#92;&#093;" title="Rendered by QuickLaTeX.com"/></p>
<p>Each of those defines its own "paraboloid bowl", both slightly different from the original one (because those are different subsets of data):</p>
<div id="attachment_2040" style="width: 504px" class="wp-caption aligncenter"><img class="wp-image-2040 size-full" src="http://fouryears.eu/wp-content/uploads/2017/12/4_twobowls.png" alt="Fitting and early stopping objectives" width="494" height="301" srcset="https://fouryears.eu/wp-content/uploads/2017/12/4_twobowls.png 494w, https://fouryears.eu/wp-content/uploads/2017/12/4_twobowls-300x183.png 300w" sizes="(max-width: 494px) 100vw, 494px" /><p class="wp-caption-text">Fitting and early stopping objectives</p></div>
<p>As our algorithm descends towards the red point, we will be tracking the value of <img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-b885b7f2118ee8eaead948e04e2020b4_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#102;&#95;&#92;&#109;&#97;&#116;&#104;&#114;&#109;&#123;&#115;&#116;&#111;&#112;&#125;" title="Rendered by QuickLaTeX.com" height="18" width="34" style="vertical-align: -6px;"/> at each step along the way:</p>
<div id="attachment_2041" style="width: 524px" class="wp-caption aligncenter"><img class="size-full wp-image-2041" src="http://fouryears.eu/wp-content/uploads/2017/12/5_descent2.png" alt="Gradient descent with validation" width="514" height="268" srcset="https://fouryears.eu/wp-content/uploads/2017/12/5_descent2.png 514w, https://fouryears.eu/wp-content/uploads/2017/12/5_descent2-300x156.png 300w" sizes="(max-width: 514px) 100vw, 514px" /><p class="wp-caption-text">Gradient descent with validation</p></div>
<p>With a bit of imagination you should see on the image above, how the validation error decreases as the yellow trajectory approaches the purple dot and then starts to increase after some point midway. The spot where the validation error achieves the minimum (and thus the result of the early stopping algorithm) is shown by the green dot on the figure below:</p>
<div id="attachment_2042" style="width: 524px" class="wp-caption aligncenter"><img class="size-full wp-image-2042" src="http://fouryears.eu/wp-content/uploads/2017/12/6_earlystop.png" alt="Early stopping" width="514" height="268" srcset="https://fouryears.eu/wp-content/uploads/2017/12/6_earlystop.png 514w, https://fouryears.eu/wp-content/uploads/2017/12/6_earlystop-300x156.png 300w" sizes="(max-width: 514px) 100vw, 514px" /><p class="wp-caption-text">Early stopping</p></div>
<p>In a sense, the validation function now acts as a kind of a "guardian", preventing the optimization from converging towards the bottom of our main objective. The algorithm is forced to settle on a model, which is neither an optimum of <img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-8a30217906290ec44159bffa9d64bc43_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#102;&#95;&#92;&#109;&#97;&#116;&#104;&#114;&#109;&#123;&#102;&#105;&#116;&#125;" title="Rendered by QuickLaTeX.com" height="16" width="22" style="vertical-align: -4px;"/> nor of <img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-b885b7f2118ee8eaead948e04e2020b4_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#102;&#95;&#92;&#109;&#97;&#116;&#104;&#114;&#109;&#123;&#115;&#116;&#111;&#112;&#125;" title="Rendered by QuickLaTeX.com" height="18" width="34" style="vertical-align: -6px;"/>. Moreover, both <img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-8a30217906290ec44159bffa9d64bc43_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#102;&#95;&#92;&#109;&#97;&#116;&#104;&#114;&#109;&#123;&#102;&#105;&#116;&#125;" title="Rendered by QuickLaTeX.com" height="16" width="22" style="vertical-align: -4px;"/> and <img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-b885b7f2118ee8eaead948e04e2020b4_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#102;&#95;&#92;&#109;&#97;&#116;&#104;&#114;&#109;&#123;&#115;&#116;&#111;&#112;&#125;" title="Rendered by QuickLaTeX.com" height="18" width="34" style="vertical-align: -6px;"/> use <em>less</em> data than <img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-e811c70c105b38127b77f33d78f08b09_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#102;&#95;&#92;&#109;&#97;&#116;&#104;&#114;&#109;&#123;&#116;&#114;&#97;&#105;&#110;&#125;" title="Rendered by QuickLaTeX.com" height="16" width="38" style="vertical-align: -4px;"/>, and are thus inherently a worse representation of the problem altogether.</p>
<p>So, by applying early stopping we effectively reduced our training set size, used an even less reliable dataset to abort training, and settled on a solution which is not an optimum of anything at all. Sounds rather stupid, doesn't it?</p>
<p>Indeed, observe the distribution of the estimates found with (blue) and without (red) early stopping in repeated experiments (each time with a new random dataset):</p>
<div id="attachment_2044" style="width: 126px" class="wp-caption aligncenter"><img class="size-full wp-image-2044" src="http://fouryears.eu/wp-content/uploads/2017/12/output_32_0.png" alt="Solutions found with and without early stopping" width="116" height="329" srcset="https://fouryears.eu/wp-content/uploads/2017/12/output_32_0.png 116w, https://fouryears.eu/wp-content/uploads/2017/12/output_32_0-106x300.png 106w" sizes="(max-width: 116px) 100vw, 116px" /><p class="wp-caption-text">Solutions found with and without early stopping</p></div>
<p>As we see, early stopping greatly increases the variance of the estimate and adds a small bias towards our optimization starting point.</p>
<p>Finally, let us see how the quality of the fit depends on the size of the validation set:</p>
<div id="attachment_2045" style="width: 511px" class="wp-caption aligncenter"><img class="size-full wp-image-2045" src="http://fouryears.eu/wp-content/uploads/2017/12/8_setsize.png" alt="Fit quality vs validation set size" width="501" height="261" srcset="https://fouryears.eu/wp-content/uploads/2017/12/8_setsize.png 501w, https://fouryears.eu/wp-content/uploads/2017/12/8_setsize-300x156.png 300w" sizes="(max-width: 501px) 100vw, 501px" /><p class="wp-caption-text">Fit quality vs validation set size</p></div>
<p>Here the <em>y</em> axis shows the squared distance of the estimated point to the true value (0,0), smaller is better (the dashed line is the expected distance of a randomly picked point from the data).  The <em>x</em> axis shows all possible sizes of the validation set. We see that using no early stopping at all (<em>x=0</em>) results in the best expected fit. If we do decide to use early stopping, then for best results we should split the data approximately equally into training and validation sets. Interestingly, there do not seem to be much difference in whether we pick 30%, 50% or 70% of data for the validation set - the validation set seems to play just as much role in the final estimate as the training data.</p>
<h3>Early Stopping with Non-convex Objectives</h3>
<p>The experiment above seems to demonstrate that early stopping should be almost certainly useless (if not harmful) for fitting simple convex models. However, it is never used with such models in practice. Instead, it is most often applied to the training of multilayer neural networks. Could it be the case that the method somehow becomes useful when the objective is highly non-convex? Let us run a small experiment, measuring the benefits of early stopping for fitting a convolutional neural-network on the MNIST dataset. For simplicity, I took the <a href="https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py" target="_blank" rel="noopener">standard example from the Keras codebase</a>, and <a href="https://gist.github.com/konstantint/6e5a8ba99da300f1a7c6984c893dec50" target="_blank" rel="noopener">modified it slightly</a>. Here is the result we get when training the the most basic model:</p>
<div id="attachment_2068" style="width: 406px" class="wp-caption aligncenter"><img class="size-full wp-image-2068" src="http://fouryears.eu/wp-content/uploads/2017/12/bare.png" alt="MNIST - Basic" width="396" height="275" srcset="https://fouryears.eu/wp-content/uploads/2017/12/bare.png 396w, https://fouryears.eu/wp-content/uploads/2017/12/bare-300x208.png 300w" sizes="(max-width: 396px) 100vw, 396px" /><p class="wp-caption-text">MNIST - Basic</p></div>
<p>The <em>y</em> axis depicts log-loss on the 10k MNIST test set, the <em>x</em> axis shows the proportion of the 60k MNIST training set set aside for early stopping. Ignoring small random measurement noise, we may observe that using early stopping with about 10% of the training data does seem to convey a benefit. Thus, contrary to our previous primitive example, when the objective is complex, early stopping does work as a regularization method. Why and how does it work here? Here's one intuition I find believable (<a href="https://www.ncbi.nlm.nih.gov/pubmed/18255701" target="_blank" rel="noopener">there</a> <a href="https://papers.nips.cc/paper/816-optimal-stopping-and-effective-machine-complexity-in-learning.pdf" target="_blank" rel="noopener">are</a> <a href="https://arxiv.org/abs/1306.3574" target="_blank" rel="noopener">alternative</a> possible explanations and measurements, none of which I find too convincing or clear, though): stopping the training early prevents the algorithm from walking too far away from the initial parameter values. This limits the overall space of models and is vaguely analogous to <em>suppressing the norm of the parameter vector. </em>In other words, early stopping resembles an ad-hoc version of <img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-7fdb6278163e808f17e5366754e3b25a_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#92;&#101;&#108;&#108;&#95;&#112;" title="Rendered by QuickLaTeX.com" height="19" width="14" style="vertical-align: -6px;"/> regularization.</p>
<p>Indeed, observe how the use of early stopping affects the results of fitting the same model with a small <img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-dd583ea6096b89172262dd03722f6731_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#92;&#101;&#108;&#108;&#95;&#50;" title="Rendered by QuickLaTeX.com" height="16" width="14" style="vertical-align: -3px;"/>-penalty added to the objective:</p>
<div id="attachment_2069" style="width: 406px" class="wp-caption aligncenter"><img class="size-full wp-image-2069" src="http://fouryears.eu/wp-content/uploads/2017/12/l2.png" alt="MNIST - L2" width="396" height="275" srcset="https://fouryears.eu/wp-content/uploads/2017/12/l2.png 396w, https://fouryears.eu/wp-content/uploads/2017/12/l2-300x208.png 300w" sizes="(max-width: 396px) 100vw, 396px" /><p class="wp-caption-text">MNIST - L2</p></div>
<p>All of the benefits of early stopping are gone now, and the baseline (non-early-stopped, <img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-dd583ea6096b89172262dd03722f6731_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#92;&#101;&#108;&#108;&#95;&#50;" title="Rendered by QuickLaTeX.com" height="16" width="14" style="vertical-align: -3px;"/>-regularized) model is actually better overall than it was before. Let us now try an even more heavily regularized model by adding dropout (instead of the <img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-dd583ea6096b89172262dd03722f6731_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#92;&#101;&#108;&#108;&#95;&#50;" title="Rendered by QuickLaTeX.com" height="16" width="14" style="vertical-align: -3px;"/> penalty), as is customary for deep neural networks. We can observe an even cleaner result:</p>
<div id="attachment_2070" style="width: 406px" class="wp-caption aligncenter"><img class="size-full wp-image-2070" src="http://fouryears.eu/wp-content/uploads/2017/12/dropout2.png" alt="MNIST - Dropout" width="396" height="275" srcset="https://fouryears.eu/wp-content/uploads/2017/12/dropout2.png 396w, https://fouryears.eu/wp-content/uploads/2017/12/dropout2-300x208.png 300w" sizes="(max-width: 396px) 100vw, 396px" /><p class="wp-caption-text">MNIST - Dropout</p></div>
<p>Early stopping is again not useful at all, and the overall model is better than all of our previous attempts.</p>
<h3>Conclusion: Do We Need Early Stopping?</h3>
<p>Given the reasoning and the anecdotal experimental evidence above, I personally tend to think that beliefs in the usefulness of early stopping (in the context of neural network training) may be well overrated. Even if it may improve generalization for some nonlinear models, you would most probably achieve the same effect more reliably using other regularization techniques, such as dropout or a simple <img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-dd583ea6096b89172262dd03722f6731_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#92;&#101;&#108;&#108;&#95;&#50;" title="Rendered by QuickLaTeX.com" height="16" width="14" style="vertical-align: -3px;"/> penalty.</p>
<p>Note, though, that there is a difference between early stopping in the context of neural networks and, say, boosting models. In the latter case early stopping is actually more explicitly limiting the complexity of the final model and, I suspect, might have a much more meaningful effect. At least we can't directly carry over the experimental examples and results in this blog post to that case.</p>
<p>Also note, that no matter whether early stopping helps or harms the generalization of the trained model, it is still a useful heuristic as to <em>when</em> to stop a lengthy training process automatically if we simply need results that are good enough.</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>https://fouryears.eu/2017/12/06/the-mystery-of-early-stopping/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Sorting in Linear Time</title>
		<link>https://fouryears.eu/2017/07/25/sorting-in-linear-time/</link>
		<comments>https://fouryears.eu/2017/07/25/sorting-in-linear-time/#comments</comments>
		<pubDate>Tue, 25 Jul 2017 13:56:05 +0000</pubDate>
		<dc:creator><![CDATA[Konstantin]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Computer science]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Theory]]></category>

		<guid isPermaLink="false">http://fouryears.eu/?p=1974</guid>
		<description><![CDATA[Every student of computer science, who has managed to keep even a tiny shred of attention at their algorithms course, should know that sorting numbers is a task that requires at least time in general. There are some special cases, such as sorting small integers, where you can use counting sort or radix sort to beat [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Every student of computer science, who has managed to keep even a tiny shred of attention at their algorithms course, should know that sorting <img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-b170995d512c659d8668b4e42e1fef6b_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#110;" title="Rendered by QuickLaTeX.com" height="8" width="11" style="vertical-align: 0px;"/> numbers is a task that requires at least <img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-68b5f813b631bbb8523c7582a9be1fc0_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#92;&#79;&#109;&#101;&#103;&#97;&#40;&#110;&#32;&#92;&#108;&#111;&#103;&#32;&#110;&#41;" title="Rendered by QuickLaTeX.com" height="18" width="76" style="vertical-align: -4px;"/> time in general. There are some special cases, such as sorting small integers, where you can use <a href="https://en.wikipedia.org/wiki/Counting_sort" target="_blank" rel="noopener">counting sort</a> or <a href="https://en.wikipedia.org/wiki/Radix_sort" target="_blank" rel="noopener">radix sort</a> to beat this baseline, but as long as your numbers are hypothetically arbitrarily large, you are stuck with the <img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-68b5f813b631bbb8523c7582a9be1fc0_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#92;&#79;&#109;&#101;&#103;&#97;&#40;&#110;&#32;&#92;&#108;&#111;&#103;&#32;&#110;&#41;" title="Rendered by QuickLaTeX.com" height="18" width="76" style="vertical-align: -4px;"/> lower bound. Right?</p>
<p>Well, not really. One thing that many algorithms courses tend to skim over rather briefly is the discussion of the choice of the <em><a href="https://en.wikipedia.org/wiki/Model_of_computation" target="_blank" rel="noopener">computation model</a>,</em> under which the algorithm of interest is supposed to run. In particular, the <img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-68b5f813b631bbb8523c7582a9be1fc0_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#92;&#79;&#109;&#101;&#103;&#97;&#40;&#110;&#32;&#92;&#108;&#111;&#103;&#32;&#110;&#41;" title="Rendered by QuickLaTeX.com" height="18" width="76" style="vertical-align: -4px;"/> bound for sorting holds for the <em>comparison-only</em> model of computation — the abstract situation where the algorithm may only perform pairwise comparisons of the numbers to be sorted. No arithmetic, bit-shifts or anything else your typical processor is normally trained to do is allowed. This is, obviously, not a very realistic model for a modern computer.</p>
<p>Let us thus consider a different computation model instead, which allows our computer to perform any of the basic arithmetic or bitwise operations on numbers in constant time. In addition, to be especially abstract, let us also assume that our computer is capable of handling numbers of arbitrary size. This is the so-called <em>unit-cost <a href="https://en.wikipedia.org/wiki/Random-access_machine" target="_blank" rel="noopener">RAM</a> model</em>.</p>
<p>It turns out that in this case one can sort arbitrarily large numbers <em>in linear time</em>. The method for achieving this (presented in the work of <a href="http://www-wjp.cs.uni-saarland.de/publikationen/PS80.pdf" target="_blank" rel="noopener">W. Paul and J. Simon</a>, not to be confused with <a href="https://www.youtube.com/watch?v=s3Wp_C0AHKk" target="_blank" rel="noopener">Paul Simon</a>) is completely impractical, yet quite insightful and amusing (in the geeky sense). Let me illustrate it here.</p>
<h3>Paul-and-Simon Sorting</h3>
<p>The easiest way to show an algorithm is to step it through an example. Let us therefore consider the example task of sorting the following array of three numbers:</p>
<pre>a = [5, 3, 9]</pre>
<p>Representing the same numbers in binary:</p>
<pre>[101, 11, 1001]</pre>
<p>Our algorithm starts with a linear pass to find the bit-width of the largest number in the array. In our case the largest number is 9 and has 4 bits:</p>
<pre>bits = max([ceil(log2(x)) <b>for</b> x <b>in</b> a])     # bits = 4
n = <b>len</b>(a)                                 # n = 3</pre>
<p>Next the algorithm will create a <img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-0d8cf5082d1bb5d5dec1570c1aa40dde_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#40;&#52;&#43;&#49;&#41;&#92;&#99;&#100;&#111;&#116;&#32;&#51;&#94;&#50;&#32;&#61;&#32;&#52;&#53;" title="Rendered by QuickLaTeX.com" height="19" width="122" style="vertical-align: -4px;"/>-bit number <code>A</code> of the following binary form:</p>
<pre> 1 {5} 1 {5} 1 {5} 1 {3} 1 {3} 1 {3} 1 {9} 1 {9} 1 {9}</pre>
<p>where <code>{9}</code>, <code>{3}</code> and <code>{5}</code> denote the 4-bit representations of the corresponding numbers. In simple terms, we need to pack each array element repeated <img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-b170995d512c659d8668b4e42e1fef6b_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#110;" title="Rendered by QuickLaTeX.com" height="8" width="11" style="vertical-align: 0px;"/> times together into a single number. It can be computed in linear time using, for example, the following code:</p>
<pre>temp, A = 0, 0
<b>for</b> x <b>in</b> a:
    temp = (temp&lt;&lt;(n*(bits+1))) + (1&lt;&lt;bits) + x
<b>for</b> i <b>in</b> range(n):
    A = (A&lt;&lt;(bits+1)) + temp</pre>
<p>The result is 23834505373497, namely:</p>
<pre>1<u>0101</u>1<u>0101</u>1<u>0101</u>1<u>0011</u>1<u>0011</u>1<u>0011</u>1<u>1001</u>1<u>1001</u>1<u>1001</u></pre>
<p>Next, we need to compute another 45-bit number <code>B</code>, which will also pack all the elements of the array <img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-b170995d512c659d8668b4e42e1fef6b_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#110;" title="Rendered by QuickLaTeX.com" height="8" width="11" style="vertical-align: 0px;"/> times, however this time they will be separated by 0-bits and interleaved as follows:</p>
<pre> 0 {5} 0 {3} 0 {9} 0 {5} 0 {3} 0 {9} 0 {5} 0 {3} 0 {9}</pre>
<p>This again can be done in linear time:</p>
<pre>temp, B = 0, 0
<b>for</b> x <b>in</b> a:
    temp = (temp&lt;&lt;(bits+1)) + x
<b>for</b> i <b>in</b> range(n):
    B = (B&lt;&lt;(n*(bits+1))) + temp</pre>
<p>The result is 5610472248425, namely:</p>
<pre>0<u>0101</u>0<u>0011</u>0<u>1001</u>0<u>0101</u>0<u>0011</u>0<u>1001</u>0<u>0101</u>0<u>0011</u>0<u>1001</u></pre>
<p>Finally, here comes the magic trick: we subtract <code>B</code> from <code>A</code>. Observe how with this single operation we now actually perform <i>all pairwise subtractions</i> of the numbers in the array:</p>
<pre>A = 1 {5} 1 {5} 1 {5} 1 {3} 1 {3} 1 {3} 1 {9} 1 {9} 1 {9} 
B = 0 {5} 0 {3} 0 {9} 0 {5} 0 {3} 0 {9} 0 {5} 0 {3} 0 {9}</pre>
<p>Consider what happens to the bits separating all the pairs. If the number on top is greater or equal to the number on the bottom of the pair, the corresponding separating bit on the left will not be carried in the subtraction, and the corresponding bit of the result will be 1. However, whenever the number on the top is less than the number on the bottom, the resulting bit will be zeroed out due to carrying:</p>
<pre>A   = 1 {5} 1 {5} 1 { 5} 1 { 3} 1 {3} 1 { 3} 1 {9} 1 {9} 1 {9} 
B   = 0 {5} 0 {3} 0 { 9} 0 { 5} 0 {3} 0 { 9} 0 {5} 0 {3} 0 {9}
A-B = 1 {0} 1 {2} 0 {12} 0 {14} 1 {0} 0 {10} 1 {4} 1 {6} 1 {0}</pre>
<p>The same in binary (highlighted groups correspond to repetitions of the original array elements in the number <code>A</code>):</p>
<pre>A   = 1 <u>0101</u> 1 <u>0101</u> 1 <u>0101</u>|1 <u>0011</u> 1 <u>0011</u> 1 <u>0011</u>|1 <u>1001</u> 1 <u>1001</u> 1 <u>1001</u>
B   = 0 <u>0101</u> 0 <u>0011</u> 0 <u>1001</u>|0 <u>0101</u> 0 <u>0011</u> 0 <u>1001</u>|0 <u>0101</u> 0 <u>0011</u> 0 <u>1001</u>
A-B = 1 <u>0000</u> 1 <u>0010</u> 0 <u>1100</u>|0 <u>1110</u> 1 <u>0000</u> 0 <u>1010</u>|1 <u>0100</u> 1 <u>0110</u> 1 <u>0000</u>
</pre>
<p>Each "separator" bit of <code>A-B</code> is effectively the result of a comparison of every array element with every other. Let us now extract these bits using a bitwise <code>AND</code> and sum them within each group. It takes another couple of linear passes:</p>
<pre>x = A-B &gt;&gt; bits
mask, result = 0, 0
<b>for</b> i <b>in</b> range(n):
    mask = (mask&lt;&lt;(n*(bits+1))) + 1
<b>for</b> i <b>in</b> range(n):
    result += x &amp; mask
    x = x &gt;&gt; (bits+1)</pre>
<p>The <code>result</code> is now the following number:</p>
<pre>result = 10|000000000000001|000000000000011</pre>
<p>It is a packed binary representation of the array <code>r = [2, 1, 3]</code>. The number 2 here tells us that there are two elements in <code>a</code>, which are less or equal than <code>a[0]=5</code>. Similarly, the number 1 says that there is only one element less or equal than <code>a[1]=3</code>, and the number 3 means there are three elements less or equal than <code>a[2]=9</code>. In other words, this is an array of <i>ranks</i>, which tells us how the original array elements should be rearranged into sorted order:</p>
<pre>r = [result &gt;&gt; (n*(bits+1)*(n-i-1)) &amp; ((1&lt;&lt;(n*(bits+1)))-1) 
                                          <b>for</b> i <b>in</b> range(n)]
a_sorted = [None]*n
<b>for</b> i <b>in</b> range(n):
    a_sorted[r[i]-1] = a[i]
</pre>
<p>And voilà, the sorted array! As presented above, the method would only work for arrays consisting of distinct non-negative integers. However, with some modifications it can be adapted to arbitrary arrays of integers or floats. This is left as an exercise to the reader.</p>
<h3>The General Implications</h3>
<p>There are several things one can learn from the "Paul-and-Simon sort". Firstly, it shows the immense power of the unit-cost RAM computational model. By packing arbitrary amounts of data into a single register of unlimited size, we may force our imaginary computer to perform enormously complex parallel computations in a single step. Indeed, it is known that <a href="https://en.wikipedia.org/wiki/PSPACE-complete" target="_blank" rel="noopener">PSPACE-complete</a> problems <a href="http://dl.acm.org/citation.cfm?doid=800076.802470" target="_blank" rel="noopener">can be solved in polynomial time</a> in the unlimited-precision RAM model. This, however, assumes that the machine can do arbitrary arithmetic operations. If you limit it to only additions, subtractions and multiplications (but not divisions or bit-shifts), you still cannot sort integers faster than <img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-68b5f813b631bbb8523c7582a9be1fc0_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#92;&#79;&#109;&#101;&#103;&#97;&#40;&#110;&#32;&#92;&#108;&#111;&#103;&#32;&#110;&#41;" title="Rendered by QuickLaTeX.com" height="18" width="76" style="vertical-align: -4px;"/> even using infinitely-sized registers (this is the main result of the <a href="http://www-wjp.cs.uni-saarland.de/publikationen/PS80.pdf" target="_blank" rel="noopener">Paul and Simon's article</a> that inspired this post). Not obvious, is it?</p>
<p><a href="http://fouryears.eu/wp-content/uploads/2017/07/execution_time.png"><img class="alignright wp-image-2008" title=" " src="http://fouryears.eu/wp-content/uploads/2017/07/execution_time-300x192.png" alt=" " width="250" height="160" srcset="https://fouryears.eu/wp-content/uploads/2017/07/execution_time-300x192.png 300w, https://fouryears.eu/wp-content/uploads/2017/07/execution_time.png 476w" sizes="(max-width: 250px) 100vw, 250px" /></a></p>
<p>Of course, real computers can usually only perform constant-time operations on registers of a fixed size. This is formalized in the <img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-dfee5c980777976ae8cf6541893fb572_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#119;" title="Rendered by QuickLaTeX.com" height="8" width="13" style="vertical-align: 0px;"/>-bit <a href="http://opendatastructures.org/versions/edition-0.1e/ods-cpp/1_3_Model_Computation.html" target="_blank" rel="noopener">word-RAM model</a>, and in this model the "Paul and Simon sort" degrades from a <img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-31c2cdca867987ac2a6e27b1434ae748_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#79;&#40;&#110;&#41;" title="Rendered by QuickLaTeX.com" height="18" width="38" style="vertical-align: -4px;"/> into a <img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-daf92780c3fa82ddabbbd7c99e62a656_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#79;&#40;&#110;&#94;&#51;&#41;" title="Rendered by QuickLaTeX.com" height="19" width="45" style="vertical-align: -4px;"/> algorithm (with <img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-dbb3e59178c358dc56c0702e5601d005_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#79;&#40;&#110;&#94;&#50;&#41;" title="Rendered by QuickLaTeX.com" height="19" width="45" style="vertical-align: -4px;"/> memory consumption). This is a nice illustration of how the same algorithm can have different complexity based on the chosen execution model.</p>
<p>The third thing that the "Paul and Simon sort" highlights very clearly is the power of arithmetic operations on packed values and bitstrings. In fact, this idea has been applied to derive <a href="http://dl.acm.org/citation.cfm?id=271556" target="_blank" rel="noopener">practically usable</a> <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.136.6493&amp;rank=1" target="_blank" rel="noopener">integer sorting algorithms</a> with <a href="http://ieeexplore.ieee.org/document/1181890/?reload=true" target="_blank" rel="noopener">nearly-linear complexity</a>. The latter paper by Han &amp; Thorup expresses the idea quite well:</p>
<p><a href="http://ieeexplore.ieee.org/document/1181890/"><img class="aligncenter wp-image-2011 size-full" src="http://fouryears.eu/wp-content/uploads/2017/07/hanthorup.png" alt="Excerpt from Han &amp; Thorup, &quot;Integer Sorting in O(n sqrt(log log n)) Expected Time and Linear Space&quot;." width="408" height="771" srcset="https://fouryears.eu/wp-content/uploads/2017/07/hanthorup.png 408w, https://fouryears.eu/wp-content/uploads/2017/07/hanthorup-159x300.png 159w" sizes="(max-width: 408px) 100vw, 408px" /></a></p>
<p>In case you need the full code of the step-by-step explanation presented above, <a href="https://gist.github.com/konstantint/0c7266a41690917f16ca6fe6d4308ab2" target="_blank" rel="noopener">here it is</a>.</p>
]]></content:encoded>
			<wfw:commentRss>https://fouryears.eu/2017/07/25/sorting-in-linear-time/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>The Blockchain Consensus Problem</title>
		<link>https://fouryears.eu/2017/07/09/the-blockchain-consensus-problem/</link>
		<comments>https://fouryears.eu/2017/07/09/the-blockchain-consensus-problem/#respond</comments>
		<pubDate>Sun, 09 Jul 2017 03:12:32 +0000</pubDate>
		<dc:creator><![CDATA[Konstantin]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Bitcoin]]></category>
		<category><![CDATA[Blockchain]]></category>
		<category><![CDATA[Computer science]]></category>
		<category><![CDATA[Cryptography]]></category>
		<category><![CDATA[Distributed computing]]></category>
		<category><![CDATA[Explanation]]></category>

		<guid isPermaLink="false">http://fouryears.eu/?p=1897</guid>
		<description><![CDATA[The Dark Side of the Bitcoin Recall that Bitcoin is a currency, i.e. it is a technology, which aims to provide a store of value along with a payment medium. With all due respect to its steadily growing adoption, it would be fair to note that it is not very good at fulfilling either of these two functions currently. [&#8230;]]]></description>
				<content:encoded><![CDATA[<h3>The Dark Side of the Bitcoin</h3>
<p>Recall that <a href="http://fouryears.eu/2017/03/29/blockchain-in-simple-terms/" target="_blank" rel="noopener noreferrer">Bitcoin</a> is a <a href="http://fouryears.eu/2008/09/25/the-meaning-of-money/" target="_blank" rel="noopener noreferrer"><em>currency</em></a>, i.e. it is a technology, which aims to provide a <a href="https://en.wikipedia.org/wiki/Store_of_value" target="_blank" rel="noopener noreferrer"><em>store of value</em></a> along with a <em><a href="https://en.wikipedia.org/wiki/Medium_of_exchange" target="_blank" rel="noopener noreferrer">payment medium</a></em>. With all due respect to its steadily growing adoption, it would be fair to note that it is not very good at fulfilling either of these two functions currently. Firstly, it is not a very reliable store of value due to extreme volatility in the price. Secondly, and most importantly, it is a mediocre payment medium because it is slow and expensive.</p>
<p>A typical transfer costs <a href="https://bitcoinfees.21.co/" target="_blank" rel="noopener noreferrer">around &#36;2</a> nowadays and takes about an hour for a full confirmation (or longer, if you pay a smaller fee). When you need to transfer a million dollars, this looks like a reasonable deal. When you buy a chocolate bar at a grocery store (something one probably does more often than transferring a million), it is unacceptable. Any plain old bank's payment card would offer a faster and cheaper solution, which is ironic, given that Bitcoin was meant to be all friendly, distributed and free (as in freedom) while banks are, as we all know, evil empires hungry for our money, flesh and souls.</p>
<p>The irony does not end here. The evil banks typically provide some useful services in exchange for the fees they collect, such as an online self-service portal, 24h support personnel, cash handling and ATMs, some security guarantees, interests on deposits, etc. The friendly Bitcoin offers nothing of this kind. What is Bitcoin wasting our money on then? Electricity, mainly! The <a href="http://fouryears.eu/2017/03/29/blockchain-in-simple-terms/" target="_blank" rel="noopener noreferrer">Proof of Work</a> (PoW) algorithm employed in the Bitcoin's blockchain requires the computation of <em>quintillions</em> of random, meaningless hashes to "confirm" payments. The "miner" nodes, running the Bitcoin's network are collectively performing more than 5 000 000 000 000 000 000 (five quintillion or five <em>exa-</em>) hash computations every second, continuously consuming as much electricity as the whole country of <a href="https://hothardware.com/news/ethereum-and-bitcoin-energy-consumption-surpass-entire-countries-power-budgets" target="_blank" rel="noopener noreferrer">Turkmenistan</a>. The situation is even worse if you consider that Bitcoin is just one of many other "coins" built upon the PoW algorithm (Ethereum and Litecoin being the two other prominent examples), and their overall power consumption is only growing with each day.</p>
<p>Just think of it: most of the &#36;2 fee a Bitcoin user needs to pay for a transaction will neither end up as someone's wage nor make a return on investment in someone's pocket. Instead, it will burn up in fossil fuels which generate power for the "miners", wasting precious resources of our planet, contributing to global warming and pushing poor polar bears faster towards extinction. Is all this mayhem at least a "necessary evil"? Sadly, it is not.</p>
<h3>The Unnecessary Evil</h3>
<p>Formally speaking, Proof of Work is an algorithm for achieving <a href="https://en.wikipedia.org/wiki/Consensus_(computer_science)" target="_blank" rel="noopener noreferrer">consensus</a> among a distributed set of nodes which collectively maintain a common blockchain. Is it the only such algorithm? Of course not! Many alternative methods exist, most of them (if not all) are both faster and less energy-hungry. In fact, the only valuable property of PoW is its ingenious simplicity<em>. </em>In terms of implementation it may very well be among the simplest distributed blockchain consensus algorithms ever to be invented.</p>
<p>It is natural that a successful pioneering technology (such as the Bitcoin) is originally built from simple blocks. Progress comes in small steps and you cannot innovate on all fronts at once, after all. There must come a time, however, when the limitations of the initially chosen basic blocks become apparent and the technology gets upgraded to something more efficient. With more than &#36;1 <em>billion</em> dollars in electricity bills paid by Bitcoin users last year for the inefficiency of PoW, Bitcoin has long surpassed this turning point, in my opinion.</p>
<p>Unfortunately, due to its pioneering status, enormous inertia, ongoing hype and the high stakes involved, Bitcoin continues to roll on its old wooden proof-of-work wheels with no improvement in sight, somewhy still being perceived as the leader in the brave new world of cryptocurrencies.</p>
<p>Are nearly-instant and nearly-free payment along with energy efficiency too much to ask from a real "currency of the future"? I do not think so. In fact, Bitcoin could be such a currency, if only it could switch from the evil Proof of Work to a different, fast and eco-friendly consensus algorithm.</p>
<p>Which algorithm could it be? Let me offer you an overview of some of the current options I am personally aware of, so you could decide for yourself.</p>
<h3>The Eco-Friendly Blockchain Consensus</h3>
<p>Consider a network of many nodes, which needs to maintain a common state for a chain of blocks. There seem to be roughly three general categories of algorithms which the nodes could employ for their purpose: <em>Proof of Authority (PoA)</em>, <em>Nakamoto Consensus</em>, and <em>Byzantine Fault Tolerance (BFT)</em>. Let us consider them in order.</p>
<h4>Proof of Authority</h4>
<p>Perhaps the most straightforward solution would be to nominate a fixed subset of nodes as "authoritative", and let any of them append new blocks by signing them cryptographically. To avoid conflicting updates, nodes may agree on a predefined round-robin signing order, honestly randomize their waiting intervals, or use some kind of a deterministic lottery for selecting the signer for next block, etc.</p>
<p>As this approach relies on a fixed subset of (reasonably) trusted nodes, it does not look robust and secure enough for a proper worldwide distributed blockchain. For example, in the limit case of a single trusted party it is equivalent to using a single service provider such as a bank. None the less, it is a convenient baseline and an important primitive, actually applicable to a wide range of real-life blockchain deployments. By relying on a set of well-behaving parties, a PoA blockchain actually sidesteps most of the complexities of a real distributed algorithm, and can thus be made to perform much faster than any of the "truly distributed" algorithms.</p>
<p>The Ethereum software provides <a href="https://github.com/ethereum/EIPs/issues/225" target="_blank" rel="noopener noreferrer">an implementation of this approach</a> for those who want to run private chains. <a href="https://en.wikipedia.org/wiki/Peercoin" target="_blank" rel="noopener noreferrer">PeerCoin</a> relies on the PoA principle by having "checkpoint blocks" signed regularly by a trusted authority. Finally, the <em>Delegated Proof of Stake</em> algorithm makes PoA work on a larger scale by relying on voting. It is probably one of the most interesting practical implementations of the idea.</p>
<p><b>Delegated Proof of Stake</b></p>
<p><a href="https://bitshares.org/technology/delegated-proof-of-stake-consensus/" target="_blank" rel="noopener noreferrer">Delegated Proof of Stake</a> (DPoS) is a consensus algorithm implemented in <a href="http://docs.bitshares.org/" target="_blank" rel="noopener noreferrer">Graphene</a>-based blockchains (<a href="https://bitshares.org/" target="_blank" rel="noopener noreferrer">BitShares</a>, <a href="https://steem.io/" target="_blank" rel="noopener noreferrer">Steem</a>, <a href="https://eos.io/" target="_blank" rel="noopener noreferrer">EOS</a>). It is a variant of Proof of Authority, where the small set of authoritative <em>delegate</em> nodes is elected by voting. When electing the delegates, each node can cast the number of votes, proportional to their account value (or "stakeholder share"), thus "delegating their stake in the network". The elected authorities then participate in a simple and fast round-robin block confirmation with each node given a two second window for confirming the next block.</p>
<p>The security of DPoS hinges on the assumption that the nodes with the most <em>stake</em> in the system should generally manage to elect a set of reasonable authorities, and in case of errors, the misbehaving authorities will not cause too much trouble and will be quickly voted out. At the same time, being internally a PoA implementation, the DPoS-based blockchains are by an order of magnitude faster in terms of transaction throughput than any other currently running public blockchains. Notably, they can also naturally support <a href="https://steem.io/SteemWhitePaper.pdf" target="_blank" rel="noopener noreferrer">fee-less transactions</a>.</p>
<h4>Nakamoto Consensus</h4>
<p>Consider the variation of PoA, where there are no pre-selected trusted nodes (i.e. all nodes may participate in the algorithm). Each time a new block needs to be added to the chain, let us pick the node who will gain the right to add it according to some deterministic "lottery" system. The consensus can then be achieved by simply verifying that the resulting blockchain is conforming to the lottery rules at all times, and the conflicting chains are resolved by always preferring the "harder" chain (according to some notion of "hardness").</p>
<p>For example, the infamous Proof-of-Work is an example of such a method. The "lottery" here is based on the ability of a node to find a suitable nonce value. The "hardness" is simply the length of the chain. Such "lottery" methods are sometimes referred to as "Nakamoto consensus algorithms". In terms of efficiency, Nakamoto consensus algorithms are among the slowest consensus algorithms.</p>
<p>Several alternatives to the "PoW lottery" have been proposed. Let us review some of them.</p>
<p><b>Proof of Stake</b></p>
<p><a href="https://github.com/ethereum/wiki/wiki/Proof-of-Stake-FAQ" target="_blank" rel="noopener noreferrer">Proof of Stake</a> (PoS), first implemented in the <a href="https://bravenewcoin.com/assets/Whitepapers/NxtWhitepaper-v122-rev4.pdf" target="_blank" rel="noopener noreferrer">Nxt cryptocurrency</a>, is a Nakamoto consensus technique, where the nodes with a greater balance on their account are given a higher chance to "win the lottery" and sign the next block. The actual technique used in Nxt is the following: before signing a block every node obtains a pseudo-random "lottery ticket number" <img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-ede05c264bba0eda080918aaa09c4658_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#120;" title="Rendered by QuickLaTeX.com" height="8" width="10" style="vertical-align: 0px;"/> by hashing the last block data with its own identifier. If this number is smaller than</p>
<p class="ql-center-displayed-equation" style="line-height: 18px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-1b685682ff93b14ab8bb39843ed2769f_l3.png" height="18" width="351" class="ql-img-displayed-equation quicklatex-auto-format" alt="&#92;&#91;&#92;&#97;&#108;&#112;&#104;&#97;&#32;&#92;&#99;&#100;&#111;&#116;&#32;&#92;&#116;&#101;&#120;&#116;&#123;&#40;&#97;&#99;&#99;&#111;&#117;&#110;&#116;&#32;&#98;&#97;&#108;&#97;&#110;&#99;&#101;&#41;&#125;&#92;&#99;&#100;&#111;&#116;&#32;&#92;&#116;&#101;&#120;&#116;&#123;&#40;&#116;&#105;&#109;&#101;&#32;&#115;&#105;&#110;&#99;&#101;&#32;&#108;&#97;&#115;&#116;&#32;&#98;&#108;&#111;&#99;&#107;&#41;&#125;&#44;&#92;&#93;" title="Rendered by QuickLaTeX.com"/></p>
<p>(where <img src="https://fouryears.eu/wp-content/ql-cache/quicklatex.com-8f0b6b1a01f8fcc2f95be0364c090397_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#92;&#97;&#108;&#112;&#104;&#97;" title="Rendered by QuickLaTeX.com" height="8" width="11" style="vertical-align: 0px;"/> is a block-specific constant), the node gets the right to sign the next block. The higher the node's balance, the higher is the probability it will get a chance to sign. The rationale is that nodes with larger balances have more at stake, are more motivated to behave honestly, and thus need to be given more opportunities to participate in generating the blockchain.</p>
<p>Proof of Stake is typically considered as the primary alternative to Proof of Work without all the wasteful computation, and it should, in principle, be possible to transition the whole blockchain from the latter to the former. In fact, this is what may probably <a href="http://www.coindesk.com/ethereums-big-switch-the-new-roadmap-to-proof-of-stake/" target="_blank" rel="noopener noreferrer">happen to Ethereum</a> eventually.</p>
<p><b>Proof of Space</b></p>
<p>In <a href="https://eprint.iacr.org/2015/528.pdf">Proof of Space</a> (PoSpace), a consensus mechanism implemented in <a href="http://burstcoin.biz/" target="_blank" rel="noopener noreferrer">Burstcoin</a>, the "miners" must first pre-generate a set of "lottery ticket numbers" in a particular manner for themselves, save these numbers on a hard drive and commit the hash (the Merkle tree root) of this complete ticket set to the blockchain. Then, similarly to Proof of Stake, by hashing the last block's data, a miner deterministically picks one of his own "lottery tickets" for the next block. If the value of this ticket, discounted by the number of tickets in possession, is small enough, the miner gets the right to sign the block. The more tickets a miner generates and stores, the better are his chances. When signing the block, the miner must present a couple of special hashes which he can only know if he constantly stores his complete set of tickets (or fully recomputes a large part of it every time, which is impractical). Consequently, instead of spending energy on the "mining" process, the nodes must constantly dedicate a certain amount of disk space to the algorithm.</p>
<p>Although it is probably among the less widely known methods, from both technical and practical standpoint, it is one of the most interesting techniques, in my opinion. Note how it combines the properties of PoS (speed and energy efficiency) with those of PoW (ownership of a real-world resource as a proxy for decentralization).</p>
<p><b>Proof of Burn</b></p>
<p>The idea behind <a href="http://www.slimcoin.club/whitepaper.pdf" target="_blank" rel="noopener noreferrer">Proof of Burn</a> is to allow the nodes to generate their "lottery ticket numbers" by irretrievably transferring some coins to a nonexistent address and taking the hash of the resulting transaction. The resulting hash, scaled by the amount of coins burned, can then be used to gain the right to sign blocks just like in other Nakamoto lottery systems. The act of wasting coins is meant to be a virtual analogue of spending electricity on PoW mining, without actually spending it. Blockchains based purely on Proof of Burn do not seem to exist at the moment. However, the technique can  be used alongside PoW, PoS or other approaches.</p>
<p><b>Proof of Elapsed Time</b></p>
<p>Presumably, some Intel processors have specialized instructions for emitting signed tokens, which prove that a given process called a particular function a certain period of time ago. The <a href="https://intelledger.github.io/introduction.html" target="_blank" rel="noopener noreferrer">Hyperledger project</a> proposes to build a consensus algorithm around those. Each "miner" will gain the right to sign a block after it waits for a certain period of time. The token which proves that the miner did in fact wait the allotted time, would act as a winning lottery ticket. I do not see how this method could work outside of the trusted Intel-only environment or how is it better than a trivialized Proof of Stake (not sure I even understood the idea correcty), but I could not help mentioning it here for completeness' sake.</p>
<p><b>Hybrid Nakamoto Consensus Systems</b></p>
<p>Some systems interleave PoW and PoS confirmations, or add PoA signatures from time to time to lock the chain or speed-up block confirmations. In fact, it is not too hard to invent nearly arbitrary combinations of delegation, voting, payments, authorities and lotteries.</p>
<h4>Byzantine Fault Tolerance</h4>
<p>The <a href="http://pmg.csail.mit.edu/papers/osdi99.pdf" target="_blank" rel="noopener noreferrer">Practical Byzantine Fault Tolerance</a> (PBFT) algorithm offers an alternative solution to the consensus problem. Here the blockchain state is tracked by a set of "bookkeeping" nodes, which constantly broadcast all changes among themselves and consider a change reliably replicated when it is signed and confirmed by given quorum (e.g. 2/3) of the bookkeepers. The algorithms of this type can be shown to be reliable if no more than a third of the nodes are dishonest. The <a href="http://www.the-blockchain.com/docs/Ripple%20Consensus%20Whitepaper.pdf" target="_blank" rel="noopener noreferrer">Ripple</a>, <a href="https://www.stellar.org/papers/stellar-consensus-protocol.pdf" target="_blank" rel="noopener noreferrer">Stellar</a> and <a href="https://www.antshares.org/Files/A8A0E2.pdf" target="_blank" rel="noopener noreferrer">Antshares</a> are examples of blockchains based on such techniques. This algorithm allows much higher transaction throughputs than Nakamoto consensus (PoW, PoS, PoSpace), yet it still lags behind the speed of PoA or DPoS.</p>
]]></content:encoded>
			<wfw:commentRss>https://fouryears.eu/2017/07/09/the-blockchain-consensus-problem/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
