<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>(into blog (filter tech? thoughts))</title><link href="http://blog.mishkovskyi.net/" rel="alternate"></link><link href="http://blog.mishkovskyi.net/feeds/all.atom.xml" rel="self"></link><id>http://blog.mishkovskyi.net/</id><updated>2015-10-29T17:02:00+01:00</updated><entry><title>Clojure, numbers, despair</title><link href="http://blog.mishkovskyi.net/posts/2015/Oct/29/clojure-numbers-despair" rel="alternate"></link><updated>2015-10-29T17:02:00+01:00</updated><author><name>mishok13</name></author><id>tag:blog.mishkovskyi.net,2015-10-29:posts/2015/Oct/29/clojure-numbers-despair</id><summary type="html">&lt;p&gt;&lt;em&gt;Warning: this is a very angry post, but most points in here are
valid despite the tone.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Once upon a time, a high level language was developed. It's beginnings
were humble and the developers focused on things that
mattered. Numbers were not things that mattered. Numbers were used,
but how they were used mattered very little.&lt;/p&gt;
&lt;h1&gt;So...&lt;/h1&gt;
&lt;p&gt;The language is Clojure. And numbers in Clojure are this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;;; Auto-promotion is cool
user&amp;gt; (type (inc Integer/MAX_VALUE))
java.lang.Long
;; Except it doesn&amp;#39;t always work!
user&amp;gt; (type (inc Long/MAX_VALUE))
ArithmeticException integer overflow  clojure.lang.Numbers.throwIntOverflow (Numbers.java:1501)
&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;or this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="p"&gt;;;&lt;/span&gt; &lt;span class="nx"&gt;Because&lt;/span&gt; &lt;span class="nx"&gt;three&lt;/span&gt; &lt;span class="nx"&gt;ways&lt;/span&gt; &lt;span class="nx"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;parsing&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt; &lt;span class="nx"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;number&lt;/span&gt; &lt;span class="nx"&gt;is&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="nx"&gt;Good&lt;/span&gt; &lt;span class="nx"&gt;Thing&lt;/span&gt;&lt;span class="err"&gt;™&lt;/span&gt;
&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;Double&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;parseDouble&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;1.2&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;Double&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;valueOf&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;1.2&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;read&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;string&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;1.2&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;;;&lt;/span&gt; &lt;span class="nx"&gt;Because&lt;/span&gt; &lt;span class="nx"&gt;having&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;different&lt;/span&gt; &lt;span class="nx"&gt;types&lt;/span&gt; &lt;span class="nx"&gt;based&lt;/span&gt; &lt;span class="nx"&gt;on&lt;/span&gt; &lt;span class="nx"&gt;parameters&lt;/span&gt; &lt;span class="nx"&gt;is&lt;/span&gt; &lt;span class="nx"&gt;an&lt;/span&gt; &lt;span class="nx"&gt;Even&lt;/span&gt; &lt;span class="nx"&gt;Better&lt;/span&gt; &lt;span class="nx"&gt;Thing&lt;/span&gt;&lt;span class="err"&gt;™&lt;/span&gt;
&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/ 3 2)) (type (/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;span class="kc"&gt;false&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;


&lt;h1&gt;Do you think that girl was pretty?&lt;/h1&gt;
&lt;p&gt;There's no way to put it lightly: I hate Clojure number types. Java
keeps leaking into it and no-one cares. To add the insult to the
injury, on top of what you have in JVM, Clojure adds two more ways of
representing numbers and then builds a huge pile of logic on top of
that. Let's quickly cover what types one may find in a typical Clojure
application:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;clojure.lang.BigInt&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;clojure.lang.Ratio&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;java.lang.Number&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;java.lang.Integer&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;java.lang.Long&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;java.math.BigInteger&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;java.math.BigDecimal&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;java.lang.Float&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;java.lang.Double&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Not surprisingly, most of these are just Java types. However, two more
types are added:
&lt;a href="https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/BigInt.java"&gt;&lt;code&gt;BigInt&lt;/code&gt;&lt;/a&gt;
and
&lt;a href="https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/Ratio.java"&gt;&lt;code&gt;Ratio&lt;/code&gt;&lt;/a&gt;. Both
are weird. I'd like to focus a bit on &lt;code&gt;Ratio&lt;/code&gt;. &lt;code&gt;Ratio&lt;/code&gt; can be created
by integer division, but only in case the division can not produce an
integer:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;;; Aight
user&amp;gt; (type (/ 1 2))
clojure.lang.Ratio
;; Not really expecting this
user&amp;gt; (type (/ 1 1))
java.lang.Long
;; Yeah, well, WAIT WHAT
user&amp;gt; (type (/ 1N 1M))
java.math.BigDecimal
&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;We can also just call the &lt;code&gt;Ratio&lt;/code&gt; constructor (and fail miserably in
some cases):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;;; Cool
user&amp;gt; (type 1/2)
clojure.lang.Ratio
;; Eh?
user&amp;gt; (clojure.lang.Ratio. 1 1)
ClassCastException java.lang.Long cannot be cast to java.math.BigInteger  user/eval21314 (form-init5235971328632709373.clj:1)
;; Ah!
user&amp;gt; (clojure.lang.Ratio. (biginteger 1) (biginteger 1))
1/1
&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;The proper way is to &lt;em&gt;coerce the parameters to
java.math.BigInteger&lt;/em&gt;. Why? Historical reasons: &lt;code&gt;clojure.lang.Ratio&lt;/code&gt;
only accepts &lt;code&gt;java.math.BigInteger&lt;/code&gt; because back when it was written
Clojure didn't have &lt;code&gt;clojure.lang.BigInt&lt;/code&gt; type &lt;em&gt;and&lt;/em&gt; no-one touched
the code since quite literally&lt;sup id="fnref:1"&gt;&lt;a class="footnote-ref" href="#fn:1" rel="footnote"&gt;1&lt;/a&gt;&lt;/sup&gt;
&lt;a href="https://github.com/clojure/clojure/commits/master/src/jvm/clojure/lang/Ratio.java"&gt;forever&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The fun train doesn't stop here. For example, we may want to create a
ratio with a denominator of 0. Let's try the usual way:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;;; Good
user&amp;gt; 1/0
ArithmeticException Divide by zero  clojure.lang.Numbers.divide (Numbers.java:158)
;; Consistent!
user&amp;gt; (/ 1 0)
ArithmeticException Divide by zero  clojure.lang.Numbers.divide (Numbers.java:158)
&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;Bummer. But then again it might make sense, after all a &lt;code&gt;Ratio&lt;/code&gt; with a
denominator value &lt;code&gt;0&lt;/code&gt; may result in some weird math occurring. But we
haven't tried &lt;em&gt;all&lt;/em&gt; the available constructors yet, so let's do that:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;;; I hate this :/
user&amp;gt; (clojure.lang.Ratio. (biginteger 1) (biginteger 0))
1/0
&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;WAIT WHAT.&lt;/p&gt;
&lt;p&gt;Combining &lt;code&gt;java.math.BigInteger&lt;/code&gt; with &lt;code&gt;clojure.lang.Ratio&lt;/code&gt; is even
more fun, especially when it comes to corner cases:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;;; Alright makes sense
user&amp;gt; (.denominator (* 7919/7920 (/ 1 Long/MAX_VALUE)))
73049106531889824391440
user&amp;gt; (class (.denominator (* 7919/7920 (/ 1 Long/MAX_VALUE))))
java.math.BigInteger
;; WAIT BUT WHY
user&amp;gt; (/ 7919 (* 7919/7920 (/ 1 Long/MAX_VALUE)))
73049106531889824391440N
user&amp;gt; (class (/ 7919 (* 7919/7920 (/ 1 Long/MAX_VALUE))))
clojure.lang.BigInt
&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;The result type differs while logically you performed the &lt;em&gt;exact same
computation&lt;/em&gt;. And don't forget that those types are not always
cooperating nicely, so you introduce more corner cases. Oh boy!&lt;/p&gt;
&lt;h1&gt;Who wears Cheetah?&lt;/h1&gt;
&lt;p&gt;Leaking abstractions is not cool. Clojure tries to present leaking
abstractions as a feature. This is doubly not cool.&lt;/p&gt;
&lt;p&gt;Number type promotion is not cool if there's no clear way to demote
type. It's doubly not cool in Clojure, because there's no clear
documentation on how and when promotion works. Existing documentation
is
&lt;a href="http://clojure.org/data_structures#Data%20Structures-Numbers"&gt;lacking at best&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Consistency is great. Clojure is not great at consistency though and
sometimes it feels like the "the principle of least astonishment" is
being pro-actively broken by Clojure's design in the numbers domain.&lt;/p&gt;
&lt;p&gt;Here's an incomplete and perhaps redundant list of things that I find
annoying, surprising or outright stupid in Clojure:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Arithmetic overflows everywhere! Multiplying &lt;code&gt;java.lang.Integer&lt;/code&gt;
  will never cause overflow, however &lt;code&gt;java.lang.Long&lt;/code&gt; will fail to be
  autopromoted. To be fair, this behavior is right there in the
  docstring for &lt;code&gt;*&lt;/code&gt; but then again, who reads docstsring for
  multiplication? There's also &lt;code&gt;*'&lt;/code&gt;, &lt;code&gt;+'&lt;/code&gt; and &lt;code&gt;-'&lt;/code&gt;, all of which
  auto-promote the result, but what are the chances you ever even knew
  about them?&lt;/li&gt;
&lt;li&gt;&lt;code&gt;clojure.lang.Ratio&lt;/code&gt; uses &lt;code&gt;java.math.BigInteger&lt;/code&gt; and &lt;em&gt;not&lt;/em&gt; &lt;code&gt;clojure.lang.BigInt&lt;/code&gt;
  for numerator and denominator. Why? Because when &lt;code&gt;Ratio&lt;/code&gt; was created
  (back in 2010) &lt;code&gt;clojure.lang.BigInt&lt;/code&gt; simply didn't exist and when it
  was finally created, &lt;code&gt;Ratio&lt;/code&gt; was not updated to represent the
  change. Bonus points for figuring out why &lt;code&gt;clojure.lang.BigInt&lt;/code&gt; was
  created in the first place.&lt;/li&gt;
&lt;li&gt;Floats and doubles are... Well, the same floats and doubles as in
  Java. There's no attempt to hide them away. So, things like infinity
  and NaN are there, but they're not really supported by Clojure. How
  does one check if the number is NaN or Infinity in Clojure? You use
  &lt;code&gt;java.lang.Float&lt;/code&gt; or &lt;code&gt;java.lang.Double&lt;/code&gt; classes for that,
  specifically static methods such as &lt;code&gt;isNaN&lt;/code&gt;, &lt;code&gt;isFinite&lt;/code&gt;, etc. Hardly
  a portable solution.&lt;/li&gt;
&lt;li&gt;Documentation is bad. Like, terribad. We're talking about a language
  with 8 years of development history, with strong backing from
  commercial companies, with successful commercial and open-source
  products written in the languge and yet we see very little focus on
  documenting things, even essential things, numbers being one of
  them.&lt;/li&gt;
&lt;li&gt;Unsigned math is not supported. There's nothing in Java, thus
  there's nothing in Clojure. Make what you want out of it.&lt;/li&gt;
&lt;li&gt;Bit operations do not belong in core namespace. It's clutter, most
  programs don't need them. More than that they're simply broken. More
  on that in a few bits.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;*unchecked-math*&lt;/code&gt; is one big can of worms and can quite literally
  screw up your library performance or even behavior when someone
  using your library sets said dynamic var.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So, bit operations. Clojure really lets you down here and you as a
programmer would have to extremely careful to avoid the common
pitfalls. Most recovering C and C++ addicts would say that bit shift
to the left by one bit is equal to multiplying by 2. Clojure says
NO. Unless you multiply by a different kind of two:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;user&amp;gt; (bit-shift-left Long/MAX_VALUE 1)
-2
user&amp;gt; (* 2 Long/MAX_VALUE)
ArithmeticException integer overflow  clojure.lang.Numbers.throwIntOverflow (Numbers.java:1501)
user&amp;gt; (* 2N Long/MAX_VALUE)
18446744073709551614N
&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;The first behavior is a result of lacking proper unsigned, modular
number type. The exception in the second is the result of "protecting"
the users from overflowing, instead of promoting the type (as
expected). And then the third one does the right thing. Or maybe a
wrong thing, but in any case you would expect all 3 functions to do
&lt;em&gt;the same thing&lt;/em&gt;. What's worse is that there are plenty similar
examples. Predictability is important, people!&lt;/p&gt;
&lt;h1&gt;I wanna look tan&lt;/h1&gt;
&lt;p&gt;Even though no-one asked me, I'll try to imagine a better world of
Clojure math. First off, the number types. There should be only two
ways to represent numbers in Clojure: integers and reals. Integers
should be signed and unbounded. Integer division always produces reals
WITHOUT EXCEPTIONS. Integers can be promoted to reals, but reals can
never be demoted to integers. Reals can follow the same approach as
&lt;a href="http://docs.oracle.com/javase/8/docs/api/java/math/BigDecimal.html"&gt;&lt;code&gt;java.lang.BigDecimal&lt;/code&gt;&lt;/a&gt;,
Python's &lt;a href="https://docs.python.org/2/library/decimal.html"&gt;&lt;code&gt;decimal&lt;/code&gt;&lt;/a&gt;
module or &lt;a href="http://www.mpfr.org/mpfr-current/mpfr.html"&gt;&lt;code&gt;MPFR&lt;/code&gt;&lt;/a&gt;. Now,
obviously you'll immediately find a problem with this approach, namely
that you need a proper context for all decimal operations. I say,
default to large, and I mean LARGE precision. As in, precision that
doesn't even make sense anymore, like 2^20. Let people control the
precision through a context. Leave only basic math operations in core
namespace: addition, subtraction, multiplication and division. Define
those operations clearly, make sure that division &lt;em&gt;always&lt;/em&gt; produces
reals and truncate where need be.&lt;/p&gt;
&lt;p&gt;Then, introduces &lt;code&gt;math&lt;/code&gt; namespace. Put modular math operations in
&lt;code&gt;math.modular&lt;/code&gt;. &lt;code&gt;math.binary&lt;/code&gt; for binary math, bit
shifting. &lt;code&gt;math.real&lt;/code&gt; containing functions and macroses helping with
handling real context, rounding, etc. &lt;code&gt;math.ratio&lt;/code&gt; for, well
Ratio. &lt;code&gt;math.float&lt;/code&gt; for IEEE 754-2008 floating point
numbers. &lt;code&gt;math.platform.jvm&lt;/code&gt; and &lt;code&gt;math.platform.js&lt;/code&gt; for exposing
platform-specific numbers.&lt;/p&gt;
&lt;p&gt;But most importantly, write documentation. Everything has to be
documented extensively and clearly, without exceptions. Great code and
great design is only half the battle, clear documentation is the
other.&lt;/p&gt;
&lt;p&gt;As far as negative impact of said change, I can only think of
performance. But only a small minority of Clojure users type-hints
everything or uses Zachary Tellman's
&lt;a href="https://github.com/ztellman/primitive-math"&gt;&lt;code&gt;primitive-math&lt;/code&gt;&lt;/a&gt;. Everyone
else? They get to enjoy the math setup that has very questionable
decision baked into it without worrying much about the performance.&lt;/p&gt;
&lt;h1&gt;Let me take a selfie&lt;/h1&gt;
&lt;p&gt;I tend to complain. A lot. The math in Clojure is just one of my
complaint targets. However, it's a valid target. The math is neglected
in Clojure, I see no attention being paid to it by core developers,
there's no organized effort to make it better, there has been zero
calls to community to ask for improvement ideas. And what's worse is
that &lt;em&gt;this&lt;/em&gt; math is completely ingrained into Clojure core namespace
and you can't replace it easily. There's no way to fix the numbers in
Clojure from the outside.&lt;/p&gt;
&lt;p&gt;I could take on this, spend plenty of time writing the proposal
pushing it to Clojure core, writing the code afterwards, push for
solution, but what are the chances that it's ever going to get
accepted? The upfront cost of this work is tremendous and there's very
little chance that such work would ever end up in Clojure core.&lt;/p&gt;
&lt;div class="footnote"&gt;
&lt;hr /&gt;
&lt;ol&gt;
&lt;li id="fn:1"&gt;
&lt;p&gt;Where "literally forever" is used in terms of Internet age.&amp;#160;&lt;a class="footnote-backref" href="#fnref:1" rev="footnote" title="Jump back to footnote 1 in the text"&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</summary><category term="clojure"></category></entry><entry><title>I've befriended `use-package` and `init.el`</title><link href="http://blog.mishkovskyi.net/posts/2015/Sep/28/ive-befriended-use-package-and-initel" rel="alternate"></link><updated>2015-09-28T12:00:00+02:00</updated><author><name>mishok13</name></author><id>tag:blog.mishkovskyi.net,2015-09-28:posts/2015/Sep/28/ive-befriended-use-package-and-initel</id><summary type="html">&lt;p&gt;I could never relate to the concept of Emacs bankruptcy. It's an
unnecessary waste of time and resources; not only you throw out years
of carefully written code, but you also have to go through everything
and write it from scratch. I've tried it a few times, but those few
attempts ended with me arriving at the state I was previously at.&lt;/p&gt;
&lt;p&gt;The problem with Emacs bankruptcy is that it's a concept that comes
from deep past, when every Emacs configuration was a collection of
random bits of ELisp code, combined together without any
organizational effort. Since there was no notion of packages or
libraries besides the built-in ones Emacs user would often resort to
collecting snippets of ELisp code in their &lt;code&gt;.emacs&lt;/code&gt; file without
trying to organize them in any way. I'd call it
&lt;a href="https://www.youtube.com/watch?v=90Ps2L46dUs"&gt;James Wood's method&lt;/a&gt; of
Emacs configuration. The end result of this process was always the
same -- unmanagebale multi-thousand lines long configuration with
varying level of code quality and no consistency whatsoever.&lt;/p&gt;
&lt;p&gt;Nowadays you have the tools to organize your configuration without
paying the ultimate price of clutter. &lt;code&gt;package.el&lt;/code&gt; has introduced
proper package management to Emacs and &lt;code&gt;el-get&lt;/code&gt; has shown that package
management doesn't have to be complicated. I've used a combination of
the latter and some hand crafted stuff for the past few years, then
switched to
&lt;a href="http://www.emacswiki.org/emacs/DotEmacsModular"&gt;"modular setup"&lt;/a&gt; with
&lt;code&gt;package.el&lt;/code&gt; for package management, however I've recently switched to
&lt;code&gt;use-package&lt;/code&gt; and that significantly simplified my configuration.&lt;/p&gt;
&lt;p&gt;Previous versions of my configuration have gone through several waves
of major improvements, but I was never quite satisfied with it. This
is where &lt;code&gt;use-package&lt;/code&gt; comes into play. I've managed to untangle some
mess into pretty manageable code. I still don't understand how the
underlying mechanisms work, what lazy loading does and how in hell
does &lt;code&gt;:init&lt;/code&gt; and &lt;code&gt;:config&lt;/code&gt; differ between each other, but I can
navigate my config in a much simpler way. Obviously, the current setup
can be improved, since it mostly consists of modules which don't have
to be modules. In fact, all of my configuration can now neetly fit
into single &lt;code&gt;init.el&lt;/code&gt; and it will hardly take more than a few hundred
lines of code. This is a dramatic improvement over what I had before
and enables very quick refactoring and gives a much cleaner overview
over setup of each bit of ELisp I have installed.&lt;/p&gt;
&lt;p&gt;As a bonus, &lt;code&gt;use-package&lt;/code&gt; can be used to decrease the load time of
Emacs. This doesn't really concern me, since most of the time I run
only one copy of Emacs for prolonged periods of time. Case in point,
&lt;code&gt;M-x emacs-uptime&lt;/code&gt; shows &lt;code&gt;4 days, 18 hours, 30 minutes, 31 seconds&lt;/code&gt;
and that's simply because I've rebooted my laptop recently.&lt;/p&gt;
&lt;p&gt;Overall I'm really happy with &lt;code&gt;use-package&lt;/code&gt;, however things could
still be improved. For example, as a Helm user I'm often finding
myself setting up various bits of Helm functionality
&lt;a href="https://github.com/mishok13/emacsen/blob/f5c01905290ff9465f61f6e7449200f614704cf5/lib/mishok-editing.el#L64"&gt;all&lt;/a&gt;
&lt;a href="https://github.com/mishok13/emacsen/blob/f5c01905290ff9465f61f6e7449200f614704cf5/lib/mishok-editing.el#L96"&gt;over&lt;/a&gt;
&lt;a href="https://github.com/mishok13/emacsen/blob/f5c01905290ff9465f61f6e7449200f614704cf5/lib/mishok-commands.el#L6"&gt;the place&lt;/a&gt;. Hardly
an ideal situation but it seems that most &lt;code&gt;use-package&lt;/code&gt; users deal
with it by simply ignoring the issue. I'm not a big fan of ignoring
issues like these, but I guess there's no way to avoid this. :)&lt;/p&gt;</summary><category term="emacs"></category></entry><entry><title>Implementing BK-tree in Clojure</title><link href="http://blog.mishkovskyi.net/posts/2015/Jul/31/implementing-bk-tree-in-clojure" rel="alternate"></link><updated>2015-07-31T17:00:00+02:00</updated><author><name>mishok13</name></author><id>tag:blog.mishkovskyi.net,2015-07-31:posts/2015/Jul/31/implementing-bk-tree-in-clojure</id><summary type="html">&lt;p&gt;As part of returning back to blogging, or at least attempting to, I've
decided to implement a data structure that can be quite useful but yet
can be considered somewhat obscure: BK-tree. Named so after its
inventors, WA Burkhard and RM Keller it can provide significant
performance boost to applications in need of fuzzy string search.&lt;/p&gt;
&lt;h1&gt;What is BK-tree?&lt;/h1&gt;
&lt;p&gt;An underlying of BK-tree is relatively simple, however just by
skimming the depths of the Internet I've seen a lot of people
struggling with understanding it.&lt;/p&gt;
&lt;p&gt;Most common use case for BK-tree is spell-checking applications,
however most spell-checkers out there use simpler and, to
be frank, often more efficient approaches. Nevertheless, we'd&lt;/p&gt;
&lt;p&gt;Before we go further, we need to understand how spell-checking is
commonly used. I'd split spell-checking in modern applications into 3
parts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Check whether the word exists in the dictionary&lt;/li&gt;
&lt;li&gt;Find possible fixes for the misspelled word&lt;/li&gt;
&lt;li&gt;Order suggestions based on some sort of heuristic&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;While BK-tree can be used to address the first point, it's often
faster and easier to use set/bag implementation to achieve close to
constant time for checks. The second point can be tackled variety of
ways, but for the purpose of the article we'll consider only the
simplest ones:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;naively in linear time by scanning all words in the dictionary and
  calculating edit distance&lt;/li&gt;
&lt;li&gt;by generating all possible edits of the word being checked or by
  &lt;a href="as Peter Norvig succinctly
  shows in his article on the matter"&gt;http://norvig.com/spell-correct.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;BK-tree (duh)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With this in mind, let's write some code.&lt;/p&gt;
&lt;h1&gt;Implementing BK-tree&lt;/h1&gt;
&lt;p&gt;Since BK-tree is an index, we have to separate steps -- building said
index and querying it. I've decided to illustrate it instead of trying
to use algorithm description, to make it easier. Well, at least I
found it easier to understand it by using drawings.&lt;/p&gt;
&lt;h2&gt;Building the tree&lt;/h2&gt;
&lt;p&gt;Building BK-tree is pretty easy, our algorithm is basically this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Calculate Levenstein distance between root of the tree and the word
being indexed&lt;/li&gt;
&lt;li&gt;If there is a free slot for resulting distance, insert new node with
the word being indexed&lt;/li&gt;
&lt;li&gt;Otherwise, take the node under said slot and continue recursively&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let's imagine that our dictionary consists of words &lt;code&gt;squirrel&lt;/code&gt;,
&lt;code&gt;square&lt;/code&gt;, &lt;code&gt;shard&lt;/code&gt; and &lt;code&gt;circus&lt;/code&gt;. We start our process by setting
&lt;code&gt;squirrel&lt;/code&gt; at the root of the tree:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Setting the root" src="http://blog.mishkovskyi.net/images/bktree/build0.png" /&gt;&lt;/p&gt;
&lt;p&gt;As this is the root, no calculation had to be done. Next in queue is
the word &lt;code&gt;shard&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Adding first word" src="http://blog.mishkovskyi.net/images/bktree/build1.png" /&gt;&lt;/p&gt;
&lt;p&gt;Since the Levenstein distance between &lt;code&gt;square&lt;/code&gt; and &lt;code&gt;shard&lt;/code&gt; is 6 we
check whether there's any child node with the same distance and since
there's not we simply insert &lt;code&gt;shard&lt;/code&gt; as a child node of &lt;code&gt;shard&lt;/code&gt;. Next
up &lt;code&gt;square&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Adding second word" src="http://blog.mishkovskyi.net/images/bktree/build2.png" /&gt;&lt;/p&gt;
&lt;p&gt;Again, we calculate Levenstein distance (which is 3 this time) and
happily insert new child node. Now to the next word:&lt;/p&gt;
&lt;p&gt;&lt;img alt="A WILD CONFLICT APPEARS" src="http://blog.mishkovskyi.net/images/bktree/build3.png" /&gt;&lt;/p&gt;
&lt;p&gt;Oops, this time we can't just insert the child node, since there's
already a child node with a Levenstein distance of 6. Following the
algorithm we simply try enter the word under the child node with the
same Levenstein distance, which happens to be &lt;code&gt;shard&lt;/code&gt; node. In the end
we end up with something that looks like this:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Look how the tree has grown" src="http://blog.mishkovskyi.net/images/bktree/build4.png" /&gt;&lt;/p&gt;
&lt;p&gt;That should show the basic principle of building the tree.&lt;/p&gt;
&lt;h2&gt;Querying the tree&lt;/h2&gt;
&lt;h3&gt;It's a bad, bad, bad, bad math!&lt;/h3&gt;
&lt;p&gt;Querying the tree is straightforward, however the efficiency is
guaranteed due to a simple property of Levenstein distance, namely the
&lt;a href="https://en.wikipedia.org/wiki/Triangle_inequality"&gt;triangle inequality&lt;/a&gt;. Warning,
what follows is math which may be a bit off. So, given the root of the
tree &lt;span class="math"&gt;\(R\)&lt;/span&gt;, children &lt;span class="math"&gt;\(C_i\)&lt;/span&gt;, where &lt;span class="math"&gt;\(i = Levenstein(R, C_i)\)&lt;/span&gt; and the query word
&lt;span class="math"&gt;\(W\)&lt;/span&gt; we can say that
&lt;/p&gt;
&lt;div class="math"&gt;$$Levenstein(W, R) + Levenstein(W, C_i) \ge Levenstein(R, C_i)$$&lt;/div&gt;
&lt;p&gt;
and consecutively
&lt;/p&gt;
&lt;div class="math"&gt;$$Levenstein(W, R) - Levenstein(W, C_i) \le Levenstein(R, C_i)$$&lt;/div&gt;
&lt;p&gt;Since &lt;span class="math"&gt;\(i = Levenstein(R, C_i)\)&lt;/span&gt; and &lt;span class="math"&gt;\(Levenstein(W, C_i) \le d_{max}\)&lt;/span&gt;
where &lt;span class="math"&gt;\(d_{max}\)&lt;/span&gt; is our maximum tolerable distance between query word
and dictionary words, we end with a simple filter, where we only need
to recursively query children nodes
&lt;span class="math"&gt;\(C_i, \forall i \in [Levenstein(W, R) - d_{max}, Levenstein(W, R) + d_{max}]\)&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;FIXME Make an illustration based on circles&lt;/p&gt;
&lt;h3&gt;Illustrating the query&lt;/h3&gt;
&lt;p&gt;Given this overly complicated and most definitely flawed explanation
of the math behind BK-tree, let's illustrate how this works in real
world. For the purpose of illustration I've created a pretty simple
tree.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Just the tree" src="http://blog.mishkovskyi.net/images/bktree/query0.png" /&gt;&lt;/p&gt;
&lt;p&gt;We'd like to find all words in the tree that are within Levenstein
distance of at most 1 of the word "brine". First step is to match the
root, which in our tree is "trine".&lt;/p&gt;
&lt;p&gt;&lt;img alt="Querying root" src="http://blog.mishkovskyi.net/images/bktree/query1.png" /&gt;&lt;/p&gt;
&lt;p&gt;Since the Levenstein distance between those words is 1, we got our
first match. Now, the trick part which makes BK-tree efficient comes
into play: from all children of this tree we only need to query
children within distance range of &lt;span class="math"&gt;\([0..2]\)&lt;/span&gt;. While in this simple
example it seems like a small improvement, in real-world scenarios
this would effectively lead to cutting off significant amount of
branches.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Highlight matching children nodes" src="http://blog.mishkovskyi.net/images/bktree/query2.png" /&gt;&lt;/p&gt;
&lt;p&gt;We then continue using the same approach recursively, until we either
terminate at the leaf node or when all edges fail the triangle
inequality.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Show matched values" src="http://blog.mishkovskyi.net/images/bktree/query3.png" /&gt;&lt;/p&gt;
&lt;p&gt;Voila! Basically, searching in BK-tree is a BFS with small filter on
top of it.&lt;/p&gt;
&lt;h1&gt;Performance&lt;/h1&gt;
&lt;p&gt;Theory and pictures is all nice and dandy, however I couldn't just
write about BK-tree without
&lt;a href="https://github.com/mishok13/bktree.clj"&gt;actually implementing it&lt;/a&gt;. It's
a pretty bare implementation, with very little though put into
optimizations and such, thus the performance can not be directly
compared to other existing approaches without embarrassing the author
of this article. This of course didn't stop me from doing exactly
that. For the performance test I've used a standard dictionary found
in OS X, under &lt;code&gt;/usr/share/dict/words&lt;/code&gt;. For no reason other than to
entertain myself I've decided to look into word length distribution of
said dictionary, so I proudly present to you the graphical
representation of my research:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Word length bar chart" src="http://blog.mishkovskyi.net/images/bktree/word-length.png" /&gt;&lt;/p&gt;
&lt;h2&gt;Benchmark&lt;/h2&gt;
&lt;p&gt;Armed with that information I've went through and benchmarked my
implementation of BK-tree and compared it to using linear dictionary
check and Peter Norvig's approach. The methodology was pretty simple:
each benchmark was executed with the same list of misspelled words
with only the maximum edit distance changing. Below you can see the
results of said test:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Benchmark results" src="{filename}/images/bktree/benchmark.png" /&gt;&lt;/p&gt;
&lt;p&gt;One glaring omission is Norvig's spellchecker for edit distance
of 3. This is due to the fact that for longer words the number of all
misspellings at an edit distance of 3 is humongous (we are talking
order of tens of millions).&lt;/p&gt;
&lt;p&gt;The actual results show that while BK-tree is in fact faster when
compared to brute force approach, it loses significantly to Norvig's
approach for edit distances of 1 and 2. The latter is also helped by
extremely tuned implementations of
&lt;code&gt;clojure.lang.PersistentHashSet&lt;/code&gt;. Obviously, proper real world spell
checkers use more sophisticated solutions, such as
&lt;a href="https://en.wikipedia.org/wiki/String_searching_algorithm"&gt;suffix tries, phonetic algorithms, bitap, etc&lt;/a&gt;.&lt;/p&gt;
&lt;h1&gt;Conclusion&lt;/h1&gt;
&lt;p&gt;BK-tree is an interesting application of seemingly unrelated concept
(triangle inequality) to a well-established problem
space. Implementing BK-tree is easy, however I'm still on the fence
regarding practicality of using BK-trees. Some smart people before me
&lt;a href="https://issues.apache.org/jira/browse/LUCENE-2230"&gt;considered using BK-tree&lt;/a&gt;
in Lucene but ultimately decided that it is simply not worth it. I
still liked writing it. :)&lt;/p&gt;
&lt;script type="text/javascript"&gt;if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
    var align = "center",
        indent = "0em",
        linebreak = "false";

    if (false) {
        align = (screen.width &lt; 768) ? "left" : align;
        indent = (screen.width &lt; 768) ? "0em" : indent;
        linebreak = (screen.width &lt; 768) ? 'true' : linebreak;
    }
    
    var mathjaxscript = document.createElement('script');
    mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
    mathjaxscript.type = 'text/javascript';
    mathjaxscript.src = '//cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML';
    mathjaxscript[(window.opera ? "innerHTML" : "text")] =
        "MathJax.Hub.Config({" +
        "    config: ['MMLorHTML.js']," +
        "    TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'AMS' } }," +
        "    jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
        "    extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
        "    displayAlign: '"+ align +"'," +
        "    displayIndent: '"+ indent +"'," +
        "    showMathMenu: true," +
        "    tex2jax: { " +
        "        inlineMath: [ ['\\\\(','\\\\)'] ], " +
        "        displayMath: [ ['$$','$$'] ]," +
        "        processEscapes: true," +
        "        preview: 'TeX'," +
        "    }, " +
        "    'HTML-CSS': { " +
        "        styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," +
        "        linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
        "    }, " +
        "}); " +
        "if ('default' !== 'default') {" +
            "MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
                "var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
                "VARIANT['normal'].fonts.unshift('MathJax_default');" +
                "VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
                "VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
                "VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
            "});" +
            "MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
                "var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
                "VARIANT['normal'].fonts.unshift('MathJax_default');" +
                "VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
                "VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
                "VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
            "});" +
        "}";
    (document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}
&lt;/script&gt;</summary><category term="clojure"></category><category term="datastructure"></category></entry></feed>