<?xml version="1.0" encoding="utf-8"?><feed xml:lang="en" xmlns="http://www.w3.org/2005/Atom"><title>Recent notes from b.93z.org</title><link href="http://b.93z.org/notes/" rel="alternate"></link><link href="http://b.93z.org/notes/feed/" rel="self"></link><id>http://b.93z.org/notes/</id><updated>2017-08-01T00:00:00+00:00</updated><author><name>b.93z.org</name></author><subtitle>This blog is about things I encounter while doing web and non-web software development.</subtitle><entry><title>Running CPython under Memcheck</title><link href="http://b.93z.org/notes/running-cpython-under-memcheck/" rel="alternate"></link><updated>2017-08-01T00:00:00+00:00</updated><id>http://b.93z.org/notes/running-cpython-under-memcheck/</id><summary type="html">&lt;p&gt;Recently I’ve been working on code that uses &lt;a href="https://docs.python.org/3.6/library/ctypes.html"&gt;&lt;code&gt;ctypes&lt;/code&gt;&lt;/a&gt;. Running CPython under Memcheck (part of &lt;a href="http://valgrind.org"&gt;Valgrind&lt;/a&gt;) is known to have some quirks. Nevertheless, it is still useful for my particular case: finding memory leaks in a program that uses &lt;code&gt;ctypes&lt;/code&gt; to call &lt;a href="http://man7.org/linux/man-pages/man3/malloc.3.html"&gt;&lt;code&gt;malloc&lt;/code&gt;&lt;/a&gt; but sometimes does not call &lt;a href="http://man7.org/linux/man-pages/man3/free.3.html"&gt;&lt;code&gt;free&lt;/code&gt;&lt;/a&gt;. Here’s a contrived example: a Python script (&lt;code&gt;test.py&lt;/code&gt;) that can deliberately leak memory:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;ctypes&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;argparse&lt;/span&gt;


&lt;span class="n"&gt;libc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctypes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CDLL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;libc.so.6&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;allocate_and_maybe_free&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;must_free&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;libc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;malloc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctypes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;c_size_t&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1234&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;must_free&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;libc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;free&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="vm"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;__main__&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;argparse&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ArgumentParser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;-f&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;--no-free&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;store_true&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parse_args&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;allocate_and_maybe_free&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;no_free&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;When the &lt;code&gt;--no-free&lt;/code&gt; command-line argument is passed, &lt;code&gt;args.no_free&lt;/code&gt; becomes &lt;code&gt;True&lt;/code&gt;, so &lt;code&gt;must_free&lt;/code&gt; (&lt;code&gt;not args.no_free&lt;/code&gt;) in &lt;code&gt;allocate_and_maybe_free&lt;/code&gt; is &lt;code&gt;False&lt;/code&gt;, and &lt;code&gt;libc.free(mem)&lt;/code&gt; is not called.&lt;/p&gt;&lt;p&gt;To use Valgrind (and its tools, such as Memcheck) with CPython (3.6 in my case), a debug-enabled build of the latter is needed:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt; ./configure --prefix /home/user/build --with-pydebug --with-valgrind
&lt;span class="gp"&gt;$&lt;/span&gt; make install
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Three options are passed above:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;code&gt;--prefix&lt;/code&gt; is the usual installation prefix&lt;/li&gt;&lt;li&gt;&lt;code&gt;--with-pydebug&lt;/code&gt; is the &lt;a href="https://github.com/python/cpython/blob/6f446bee4f6ac0c61bb2c3386a0149fd36855793/Misc/SpecialBuilds.txt#L4"&gt;recommended&lt;/a&gt; option for building a debug-enabled interpreter&lt;/li&gt;&lt;li&gt;&lt;a href="https://bugs.python.org/issue2422"&gt;&lt;code&gt;--with-valgrind&lt;/code&gt;&lt;/a&gt; makes the interpreter automatically disable the pymalloc memory allocator when running under Valgrind&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;After the build and installation finish successfully, the &lt;code&gt;python3&lt;/code&gt; binary is in the &lt;code&gt;/home/user/build/bin/&lt;/code&gt; directory (due to &lt;code&gt;--prefix /home/user/build&lt;/code&gt;). Now let’s see what happens when &lt;code&gt;test.py&lt;/code&gt; allocates memory and then correctly frees it:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt; valgrind --leak-check&lt;span class="o"&gt;=&lt;/span&gt;full --show-possibly-lost&lt;span class="o"&gt;=&lt;/span&gt;no --show-reachable&lt;span class="o"&gt;=&lt;/span&gt;no ./build/bin/python3 test.py
&lt;span class="go"&gt;...&lt;/span&gt;
&lt;span class="go"&gt;==...== LEAK SUMMARY:&lt;/span&gt;
&lt;span class="go"&gt;==...==    definitely lost: 0 bytes in 0 blocks&lt;/span&gt;
&lt;span class="go"&gt;==...==    indirectly lost: 0 bytes in 0 blocks&lt;/span&gt;
&lt;span class="go"&gt;==...==      possibly lost: 1,993,410 bytes in 9,066 blocks&lt;/span&gt;
&lt;span class="go"&gt;==...==    still reachable: 6,179 bytes in 18 blocks&lt;/span&gt;
&lt;span class="go"&gt;==...==         suppressed: 0 bytes in 0 blocks&lt;/span&gt;
&lt;span class="go"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Memcheck reports no “definitely lost” memory. There are “possibly lost” and “still reachable” blocks, but those come from CPython itself. If a deliberate memory leak is introduced by passing &lt;code&gt;--no-free&lt;/code&gt; to &lt;code&gt;test.py&lt;/code&gt;, however, Memcheck reports “definitely lost” memory:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt; valgrind --leak-check&lt;span class="o"&gt;=&lt;/span&gt;full --show-possibly-lost&lt;span class="o"&gt;=&lt;/span&gt;no --show-reachable&lt;span class="o"&gt;=&lt;/span&gt;no ./build/bin/python3 test.py --no-free
&lt;span class="go"&gt;...&lt;/span&gt;
&lt;span class="go"&gt;==...== 1,234 bytes in 1 blocks are definitely lost in loss record 22 of 34&lt;/span&gt;
&lt;span class="go"&gt;==...==    at 0x...: malloc (in /usr/lib/valgrind/vgpreload_memcheck-....so)&lt;/span&gt;
&lt;span class="go"&gt;...&lt;/span&gt;
&lt;span class="go"&gt;==...==    by 0x...: ffi_call (in /usr/lib/.../libffi...)&lt;/span&gt;
&lt;span class="go"&gt;...&lt;/span&gt;
&lt;span class="go"&gt;==...==    by 0x...: _PyFunction_FastCall (ceval.c:4891)&lt;/span&gt;
&lt;span class="go"&gt;==...==&lt;/span&gt;
&lt;span class="go"&gt;==...== LEAK SUMMARY:&lt;/span&gt;
&lt;span class="go"&gt;==...==    definitely lost: 1,234 bytes in 1 blocks&lt;/span&gt;
&lt;span class="go"&gt;==...==    indirectly lost: 0 bytes in 0 blocks&lt;/span&gt;
&lt;span class="go"&gt;==...==      possibly lost: 1,993,277 bytes in 9,065 blocks&lt;/span&gt;
&lt;span class="go"&gt;==...==    still reachable: 6,179 bytes in 18 blocks&lt;/span&gt;
&lt;span class="go"&gt;==...==         suppressed: 0 bytes in 0 blocks&lt;/span&gt;
&lt;span class="go"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;As you can see, the “definitely lost” memory is the 1,234 bytes allocated by &lt;code&gt;libc.malloc(ctypes.c_size_t(1234))&lt;/code&gt;, which are not freed because of &lt;code&gt;--no-free&lt;/code&gt;. So Memcheck can help catch memory leaks in Python programs running under CPython.&lt;/p&gt;</summary><category term="cpython"></category><category term="python-3"></category></entry><entry><title>How to redirect to primary domain</title><link href="http://b.93z.org/notes/how-to-redirect-to-primary-domain/" rel="alternate"></link><updated>2017-07-11T00:00:00+00:00</updated><id>http://b.93z.org/notes/how-to-redirect-to-primary-domain/</id><summary type="html">&lt;p&gt;Quite often I encounter redirects like this in Nginx configuration:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="k"&gt;server&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;listen&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;server_name&lt;/span&gt; &lt;span class="s"&gt;www.example.org&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;rewrite&lt;/span&gt; &lt;span class="s"&gt;^(.*)&lt;/span&gt;$ &lt;span class="nv"&gt;$scheme://example.org$uri&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I see three problems with this approach. Let’s fix them one by one.&lt;/p&gt;&lt;p&gt;First, using a regular expression for such a simple case is unwarranted; the &lt;a href="http://nginx.org/en/docs/http/ngx_http_rewrite_module.html#return"&gt;&lt;code&gt;return&lt;/code&gt;&lt;/a&gt; directive is good enough:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="k"&gt;server&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;listen&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;server_name&lt;/span&gt; &lt;span class="s"&gt;www.example.org&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;302&lt;/span&gt; &lt;span class="nv"&gt;$scheme://example.org$uri&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Second, in such cases we usually want a permanent redirect. The &lt;a href="http://nginx.org/en/docs/http/ngx_http_rewrite_module.html#rewrite"&gt;&lt;code&gt;rewrite&lt;/code&gt;&lt;/a&gt; directive as used in the first snippet (&lt;code&gt;rewrite ^(.*)$ $scheme://example.org$uri;&lt;/code&gt;) generates &lt;code&gt;302 Found&lt;/code&gt;, and &lt;code&gt;return 302&lt;/code&gt; above preserves that status. Let’s generate &lt;code&gt;301 Moved Permanently&lt;/code&gt; instead:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="k"&gt;server&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;listen&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;server_name&lt;/span&gt; &lt;span class="s"&gt;www.example.org&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;301&lt;/span&gt; &lt;span class="nv"&gt;$scheme://example.org$uri&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Finally, &lt;a href="http://nginx.org/en/docs/http/ngx_http_core_module.html#var_uri"&gt;&lt;code&gt;$uri&lt;/code&gt;&lt;/a&gt; is not necessarily original URI. For redirects it may be better to use “full original request URI (with arguments)”—&lt;a href="http://nginx.org/en/docs/http/ngx_http_core_module.html#var_request_uri"&gt;&lt;code&gt;$request_uri&lt;/code&gt;&lt;/a&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="k"&gt;server&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;listen&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;server_name&lt;/span&gt; &lt;span class="s"&gt;www.example.org&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;301&lt;/span&gt; &lt;span class="nv"&gt;$scheme://example.org$request_uri&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;For example, my static blog engine (which powers this site) &lt;a href="https://github.com/PavloKapyshin/paka.vx1/blob/68a2ebbfe23fa153822d625010eadc2ed320b205/paka/vx1/nginx.py#L41"&gt;generates&lt;/a&gt; configuration with a permanent redirect and &lt;code&gt;$request_uri&lt;/code&gt;, as recommended here.&lt;/p&gt;</summary><category term="nginx"></category></entry><entry><title>Unit-testing usage examples in README.rst</title><link href="http://b.93z.org/notes/unit-testing-usage-examples-in-readme.rst/" rel="alternate"></link><updated>2017-01-27T00:00:00+00:00</updated><id>http://b.93z.org/notes/unit-testing-usage-examples-in-readme.rst/</id><summary type="html">&lt;p&gt;Automatic testing of code examples in a README is often neglected. I’d like to share some experience I gained while integrating doctests from &lt;code&gt;README.rst&lt;/code&gt; of &lt;a href="https://github.com/PavloKapyshin/paka.cmark"&gt;paka.cmark&lt;/a&gt; into existing &lt;code&gt;unittest&lt;/code&gt;-based test discovery.&lt;/p&gt;&lt;p&gt;Let’s say there are the following files:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;somelib-project/
    README.rst
    somelib/
        ...
    tests/
        test_readme.py
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;To run the tests from the &lt;code&gt;tests/&lt;/code&gt; directory, the following command is executed from inside the &lt;code&gt;somelib-project/&lt;/code&gt; directory:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt; python3 -m unittest discover --start-directory tests/
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Inside &lt;code&gt;test_readme.py&lt;/code&gt;, &lt;a href="https://docs.python.org/3.5/library/doctest.html"&gt;&lt;code&gt;doctest&lt;/code&gt;&lt;/a&gt; examples are parsed from &lt;code&gt;README.rst&lt;/code&gt; and then exposed via the &lt;a href="https://docs.python.org/3.5/library/unittest.html#load-tests-protocol"&gt;&lt;code&gt;load_tests&lt;/code&gt; protocol&lt;/a&gt; (by default, &lt;code&gt;DocFileSuite&lt;/code&gt; resolves &lt;code&gt;../README.rst&lt;/code&gt; relative to the directory of the calling module, i.e. &lt;code&gt;tests/&lt;/code&gt;):&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;doctest&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;unittest&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_tests&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tests&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;suite&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doctest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DocFileSuite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;../README.rst&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;tests&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;addTests&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;suite&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tests&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This may be enough in general. Now for the specifics: the tests and code of &lt;code&gt;paka.cmark&lt;/code&gt; have to support both Python 2.7 and 3.5 from a single source. As &lt;code&gt;paka.cmark.to_html&lt;/code&gt; accepts and returns &lt;code&gt;unicode&lt;/code&gt; in 2.7 and &lt;code&gt;str&lt;/code&gt; in 3, relying on &lt;code&gt;repr&lt;/code&gt; in doctests will fail for one version of the language and succeed for the other. That is, omitting the “u” string prefix in the expected output causes a failure under 2.7:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;cmark&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_html&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;u&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Hello,&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;*World*!&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="go"&gt;&amp;#39;&amp;lt;p&amp;gt;Hello, &amp;lt;em&amp;gt;World&amp;lt;/em&amp;gt;!&amp;lt;/p&amp;gt;\n&amp;#39;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="go"&gt;Failed example:&lt;/span&gt;
&lt;span class="go"&gt;    cmark.to_html(u&amp;quot;Hello,\n*World*!&amp;quot;)&lt;/span&gt;
&lt;span class="go"&gt;Expected:&lt;/span&gt;
&lt;span class="go"&gt;    &amp;#39;&amp;lt;p&amp;gt;Hello, &amp;lt;em&amp;gt;World&amp;lt;/em&amp;gt;!&amp;lt;/p&amp;gt;\n&amp;#39;&lt;/span&gt;
&lt;span class="go"&gt;Got:&lt;/span&gt;
&lt;span class="go"&gt;    u&amp;#39;&amp;lt;p&amp;gt;Hello, &amp;lt;em&amp;gt;World&amp;lt;/em&amp;gt;!&amp;lt;/p&amp;gt;\n&amp;#39;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;And &lt;em&gt;using&lt;/em&gt; the “u” string prefix results in a failure under 3.5 (while the test passes under 2.7):&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;cmark&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_html&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;u&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Hello,&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;*World*!&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="go"&gt;u&amp;#39;&amp;lt;p&amp;gt;Hello, &amp;lt;em&amp;gt;World&amp;lt;/em&amp;gt;!&amp;lt;/p&amp;gt;\n&amp;#39;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="go"&gt;Failed example:&lt;/span&gt;
&lt;span class="go"&gt;    cmark.to_html(u&amp;quot;Hello,\n*World*!&amp;quot;)&lt;/span&gt;
&lt;span class="go"&gt;Expected:&lt;/span&gt;
&lt;span class="go"&gt;    u&amp;#39;&amp;lt;p&amp;gt;Hello, &amp;lt;em&amp;gt;World&amp;lt;/em&amp;gt;!&amp;lt;/p&amp;gt;\n&amp;#39;&lt;/span&gt;
&lt;span class="go"&gt;Got:&lt;/span&gt;
&lt;span class="go"&gt;    &amp;#39;&amp;lt;p&amp;gt;Hello, &amp;lt;em&amp;gt;World&amp;lt;/em&amp;gt;!&amp;lt;/p&amp;gt;\n&amp;#39;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I use a “hack” and wrap each call with &lt;code&gt;print&lt;/code&gt;, so that the output no longer depends on &lt;code&gt;repr&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cmark&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_html&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;u&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Hello,&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;*World*!&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="go"&gt;&amp;lt;p&amp;gt;Hello, &amp;lt;em&amp;gt;World&amp;lt;/em&amp;gt;!&amp;lt;/p&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;But just wrapping them is not enough, as the resulting HTML ends with a newline (and &lt;code&gt;print&lt;/code&gt; adds another one):&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="go"&gt;Failed example:&lt;/span&gt;
&lt;span class="go"&gt;    print(cmark.to_html(u&amp;quot;Hello,\n*World*!&amp;quot;))&lt;/span&gt;
&lt;span class="go"&gt;Expected:&lt;/span&gt;
&lt;span class="go"&gt;    &amp;lt;p&amp;gt;Hello, &amp;lt;em&amp;gt;World&amp;lt;/em&amp;gt;!&amp;lt;/p&amp;gt;&lt;/span&gt;
&lt;span class="go"&gt;Got:&lt;/span&gt;
&lt;span class="go"&gt;    &amp;lt;p&amp;gt;Hello, &amp;lt;em&amp;gt;World&amp;lt;/em&amp;gt;!&amp;lt;/p&amp;gt;&lt;/span&gt;
&lt;span class="go"&gt;    &amp;lt;BLANKLINE&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;With &lt;a href="https://docs.python.org/3.5/library/doctest.html#doctest.NORMALIZE_WHITESPACE"&gt;doctest directive&lt;/a&gt; that problem is solved:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cmark&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_html&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;u&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Hello,&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;*World*!&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# doctest: +NORMALIZE_WHITESPACE&lt;/span&gt;
&lt;span class="go"&gt;&amp;lt;p&amp;gt;Hello, &amp;lt;em&amp;gt;World&amp;lt;/em&amp;gt;!&amp;lt;/p&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;If you don’t want to add such directives to each example individually, you can use the &lt;code&gt;optionflags&lt;/code&gt; argument of &lt;code&gt;doctest.DocFileSuite&lt;/code&gt; to pass &lt;code&gt;doctest.NORMALIZE_WHITESPACE&lt;/code&gt; (and thus apply that option to all examples):&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;doctest&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;unittest&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_tests&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tests&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;suite&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doctest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DocFileSuite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;../README.rst&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;optionflags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;doctest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NORMALIZE_WHITESPACE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;tests&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;addTests&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;suite&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tests&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;</summary><category term="cpython"></category><category term="pypy"></category><category term="python-2"></category><category term="python-3"></category><category term="testing"></category></entry><entry><title>Adding hard line breaks to paka.cmark</title><link href="http://b.93z.org/notes/adding-hard-line-breaks-to-paka.cmark/" rel="alternate"></link><updated>2017-01-26T00:00:00+00:00</updated><id>http://b.93z.org/notes/adding-hard-line-breaks-to-paka.cmark/</id><summary type="html">&lt;p&gt;Recently I’ve been working on support for &lt;code&gt;CMARK_OPT_HARDBREAKS&lt;/code&gt; in &lt;a href="https://github.com/PavloKapyshin/paka.cmark"&gt;&lt;code&gt;paka.cmark&lt;/code&gt;&lt;/a&gt;, and I’ve been thinking about design of API for &lt;code&gt;paka.cmark.to_html&lt;/code&gt;. In particular, how to conveniently allow library user to specify hard breaks via &lt;code&gt;breaks&lt;/code&gt; keyword argument without breaking compatibility with existing interface that accepts &lt;code&gt;True&lt;/code&gt; (to render “softbreak” elements as line breaks) or &lt;code&gt;False&lt;/code&gt; (to render “softbreak” elements as spaces) for &lt;code&gt;breaks&lt;/code&gt;.&lt;/p&gt;&lt;p&gt;There are three option specification cases related to line breaks and HTML in &lt;a href="https://github.com/jgm/cmark"&gt;&lt;code&gt;cmark&lt;/code&gt;&lt;/a&gt; (C library):&lt;/p&gt;&lt;ol&gt;&lt;li&gt;&lt;p&gt;When no line break-related options are specified, “softbreaks” are rendered as line breaks.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;When &lt;code&gt;CMARK_OPT_NOBREAKS&lt;/code&gt; is specified, “softbreaks” are rendered as spaces.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Finally, on &lt;code&gt;CMARK_OPT_HARDBREAKS&lt;/code&gt;, “softbreaks” are rendered as &lt;code&gt;&amp;lt;br&amp;gt;&lt;/code&gt;s.&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;In &lt;code&gt;paka.cmark&lt;/code&gt;, first case is handled by &lt;code&gt;breaks=True&lt;/code&gt;, and the second 
case by &lt;code&gt;breaks=False&lt;/code&gt;. So the question is, what’s the best way (in this case) to handle third one. I don’t want to get rid of booleans, they are natural for “yes” and “no”, so let’s see how range of possible argument values can be augmented.&lt;/p&gt;&lt;h2&gt;Enum&lt;/h2&gt;&lt;p&gt;In addition to &lt;code&gt;to_html&lt;/code&gt; user will have to import &lt;code&gt;Breaks&lt;/code&gt; class (probably a subclass of &lt;code&gt;enum.Enum&lt;/code&gt;):&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;paka.cmark&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Breaks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;to_html&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;to_html&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;breaks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Breaks&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;soft&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;to_html&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;breaks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Breaks&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;no&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;to_html&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;breaks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Breaks&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hard&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Though constants (e.g. &lt;code&gt;HARD_BREAKS&lt;/code&gt;, &lt;code&gt;NO_BREAKS&lt;/code&gt;) could be used instead, they would still have to be imported, so there would be no real benefit compared to an enumeration.&lt;/p&gt;&lt;h2&gt;String&lt;/h2&gt;&lt;p&gt;Like the “enum” variant, but there is no need to import anything except &lt;code&gt;to_html&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;paka.cmark&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;to_html&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;to_html&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;breaks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;soft&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;to_html&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;breaks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;no&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;to_html&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;breaks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;hard&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;h2&gt;Combined approach&lt;/h2&gt;&lt;p&gt;There will be a &lt;code&gt;Breaks&lt;/code&gt; class with two members (plain strings, so each compares equal to the corresponding string value):&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Breaks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;soft&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;soft&amp;quot;&lt;/span&gt;
    &lt;span class="n"&gt;hard&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;hard&amp;quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;code&gt;Breaks.no&lt;/code&gt; will not exist (there is no such member of &lt;code&gt;Breaks&lt;/code&gt;), as &lt;code&gt;False&lt;/code&gt; is good enough; no need to reinvent it :) Then &lt;code&gt;to_html&lt;/code&gt; will handle &lt;code&gt;breaks&lt;/code&gt; so that no line-break-related &lt;code&gt;cmark&lt;/code&gt; options are set when &lt;code&gt;Breaks.soft&lt;/code&gt; (or &lt;code&gt;True&lt;/code&gt;) is used:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="n"&gt;opts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lib&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CMARK_OPT_DEFAULT&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;breaks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;breaks&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;hard&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;opts&lt;/span&gt; &lt;span class="o"&gt;|=&lt;/span&gt; &lt;span class="n"&gt;lib&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CMARK_OPT_HARDBREAKS&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;opts&lt;/span&gt; &lt;span class="o"&gt;|=&lt;/span&gt; &lt;span class="n"&gt;lib&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CMARK_OPT_NOBREAKS&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The “combined” variant allows enough flexibility, giving library users the freedom to choose the most suitable way to use &lt;code&gt;paka.cmark&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;paka.cmark&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;to_html&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;to_html&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;breaks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;hard&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;paka.cmark&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Breaks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;to_html&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;to_html&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;breaks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Breaks&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hard&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;So this is, roughly speaking, what I have &lt;a href="https://github.com/PavloKapyshin/paka.cmark/commit/985204a3e78776f1648bea26b736d8b959ed8e4b"&gt;implemented&lt;/a&gt;.&lt;/p&gt;</summary><category term="api-design"></category><category term="cpython"></category><category term="pypy"></category><category term="python-2"></category><category term="python-3"></category></entry><entry><title>Automatic HTML escaping in Mako</title><link href="http://b.93z.org/notes/automatic-html-escaping-in-mako/" rel="alternate"></link><updated>2017-01-02T00:00:00+00:00</updated><id>http://b.93z.org/notes/automatic-html-escaping-in-mako/</id><summary type="html">&lt;p&gt;Some developers I know think that the &lt;a href="http://makotemplates.org/"&gt;Mako&lt;/a&gt; templating library is “less secure” than, for example, Jinja2 (and, therefore, “must not be used”). Actually, Mako is as “secure” as you configure it to be.&lt;/p&gt;&lt;p&gt;Although Mako does not HTML-escape rendered expressions by default, it can be configured to do so. When you create a lookup (an instance of &lt;code&gt;mako.lookup.TemplateLookup&lt;/code&gt;), you can pass arguments such as a list of directories to search for templates, the encoding of templates, etc. Usually it looks like this:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;mako.lookup&lt;/span&gt;


&lt;span class="n"&gt;lookup&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mako&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lookup&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TemplateLookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;directories&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;templates&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
    &lt;span class="n"&gt;input_encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;utf-8&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;filesystem_checks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;strict_undefined&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;So, how do we make a lookup like the one above escape rendered expressions? In addition to the aforementioned arguments, &lt;code&gt;TemplateLookup&lt;/code&gt; accepts &lt;code&gt;default_filters&lt;/code&gt;, a list of filters applied to every expression by default. Mako’s built-in &lt;code&gt;h&lt;/code&gt; filter does HTML escaping:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;mako.lookup&lt;/span&gt;


&lt;span class="n"&gt;lookup&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mako&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lookup&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TemplateLookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;directories&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;templates&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
    &lt;span class="n"&gt;input_encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;utf-8&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;filesystem_checks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;strict_undefined&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;default_filters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;h&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;For example, &lt;code&gt;pyramid_mako&lt;/code&gt; uses &lt;code&gt;default_filters=[&amp;quot;h&amp;quot;]&lt;/code&gt; by default, so if you use the Pyramid framework, you may have one less thing to worry about.&lt;/p&gt;&lt;p&gt;Why not &lt;code&gt;default_filters=[&amp;quot;str&amp;quot;, &amp;quot;h&amp;quot;]&lt;/code&gt;? It may seem to work, but not for objects with an &lt;code&gt;__html__&lt;/code&gt; method: these will be treated as text that needs escaping, not as HTML that should be rendered as-is. Default filters are applied from left to right, so &lt;code&gt;${form.somefield()}&lt;/code&gt; will be transformed, roughly speaking, into &lt;code&gt;h(str(form.somefield()))&lt;/code&gt;, which is basically &lt;code&gt;h(form.somefield().__str__())&lt;/code&gt;. By the time &lt;code&gt;h&lt;/code&gt; sees the value, &lt;code&gt;str&lt;/code&gt; has already turned it into a plain string, so &lt;code&gt;form.somefield().__html__&lt;/code&gt; will never be called if &lt;code&gt;default_filters&lt;/code&gt; are defined as &lt;code&gt;[&amp;quot;str&amp;quot;, &amp;quot;h&amp;quot;]&lt;/code&gt;.&lt;/p&gt;</summary><category term="mako"></category><category term="pyramid"></category><category term="python-3"></category></entry><entry><title>How to create .tar.gz reproducibly (putting it together)</title><link href="http://b.93z.org/notes/how-to-create-.tar.gz-reproducibly/" rel="alternate"></link><updated>2016-11-12T00:00:00+00:00</updated><id>http://b.93z.org/notes/how-to-create-.tar.gz-reproducibly/</id><summary type="html">&lt;p&gt;Although I recommend reading the &lt;a href="/notes/how-to-create-.tar-reproducibly/"&gt;&lt;code&gt;.tar&lt;/code&gt;&lt;/a&gt; and &lt;a href="/notes/how-to-create-.gz-reproducibly/"&gt;&lt;code&gt;.gz&lt;/code&gt;&lt;/a&gt; notes, which discuss the internals and features of these file formats and the relevant modules from the CPython standard library, you are free to skip them if you are in a rush and just need the code, because code is all we’ll do here :)&lt;/p&gt;&lt;p&gt;Here we’ll make a script (&lt;code&gt;mktgz.py&lt;/code&gt;) that takes paths to files and directories and creates a compressed &lt;code&gt;.tar.gz&lt;/code&gt; archive.&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="ch"&gt;#!/usr/bin/env python3&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;gzip&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;tarfile&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;argparse&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tgz&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;top_paths&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mtime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reltop&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;relpath&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;reltop&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_get_to_add&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;top_path&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;top_paths&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isdir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;top_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;dirpath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dirnames&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filenames&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;walk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="n"&gt;top_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;topdown&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                    &lt;span class="n"&gt;dirnames&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                    &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dirpath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;relpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dirpath&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
                    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filenames&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                        &lt;span class="n"&gt;filepath&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dirpath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;relpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;top_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;relpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;top_path&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;wb&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;_dest_file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;gzip&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GzipFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;fileobj&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;_dest_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;w&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mtime&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;mtime&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;dest_file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tarfile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fileobj&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dest_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;w&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;archive&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arcname&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;_get_to_add&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;-&amp;gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arcname&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;tinfo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;archive&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gettarinfo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arcname&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;arcname&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;mtime&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="n"&gt;tinfo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mtime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mtime&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tinfo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isreg&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
                        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;rb&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                            &lt;span class="n"&gt;archive&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;addfile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tinfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fileobj&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="n"&gt;archive&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;addfile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tinfo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="vm"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;__main__&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;argparse&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ArgumentParser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;paths&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;+&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;--reltop&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getcwd&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;--mtime&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;--filename&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;--verbose&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;store_true&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;dest&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parse_args&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;tgz&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;paths&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mtime&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mtime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dest&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reltop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reltop&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;A few quick notes about the code. First, the same &lt;code&gt;mtime&lt;/code&gt; value is used for both the “tar” and the “gzip” parts. Second, &lt;a href="https://docs.python.org/3.5/library/gzip.html#gzip.GzipFile"&gt;&lt;code&gt;gzip.GzipFile&lt;/code&gt;&lt;/a&gt; is used instead of &lt;a href="https://docs.python.org/3.5/library/gzip.html#gzip.open"&gt;&lt;code&gt;gzip.open&lt;/code&gt;&lt;/a&gt; because the latter does not accept the &lt;code&gt;filename&lt;/code&gt; and &lt;code&gt;mtime&lt;/code&gt; arguments. Third, top-level paths, directory names and file names are all sorted. As a result, the order of entries in the archive depends neither on the order of command-line arguments nor on the order in which the file system lists directory contents. Oh, and if you are puzzled by &lt;code&gt;reltop&lt;/code&gt;, please read the &lt;a href="/notes/how-to-create-.tar-reproducibly/"&gt;note about tar&lt;/a&gt;, where its purpose is explained.&lt;/p&gt;&lt;p&gt;Example usage:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt; ./mktgz.py dir1/ file2 dir2/ file1 out.tar.gz --mtime &lt;span class="m"&gt;0&lt;/span&gt; --filename &lt;span class="s1"&gt;&amp;#39;&amp;#39;&lt;/span&gt; --verbose
&lt;span class="go"&gt;dir1/ -&amp;gt; dir1&lt;/span&gt;
&lt;span class="go"&gt;dir1/1 -&amp;gt; dir1/1&lt;/span&gt;
&lt;span class="go"&gt;dir1/1/a -&amp;gt; dir1/1/a&lt;/span&gt;
&lt;span class="go"&gt;dir1/1/b -&amp;gt; dir1/1/b&lt;/span&gt;
&lt;span class="go"&gt;dir1/1/c -&amp;gt; dir1/1/c&lt;/span&gt;
&lt;span class="go"&gt;dir1/1/d -&amp;gt; dir1/1/d&lt;/span&gt;
&lt;span class="go"&gt;dir1/2 -&amp;gt; dir1/2&lt;/span&gt;
&lt;span class="go"&gt;dir1/2/aa -&amp;gt; dir1/2/aa&lt;/span&gt;
&lt;span class="go"&gt;dir1/2/bb -&amp;gt; dir1/2/bb&lt;/span&gt;
&lt;span class="go"&gt;dir1/2/cc -&amp;gt; dir1/2/cc&lt;/span&gt;
&lt;span class="go"&gt;dir1/2/dd -&amp;gt; dir1/2/dd&lt;/span&gt;
&lt;span class="go"&gt;dir1/3 -&amp;gt; dir1/3&lt;/span&gt;
&lt;span class="go"&gt;dir1/3/aaa -&amp;gt; dir1/3/aaa&lt;/span&gt;
&lt;span class="go"&gt;dir1/3/bbb -&amp;gt; dir1/3/bbb&lt;/span&gt;
&lt;span class="go"&gt;dir1/3/ccc -&amp;gt; dir1/3/ccc&lt;/span&gt;
&lt;span class="go"&gt;dir1/3/ddd -&amp;gt; dir1/3/ddd&lt;/span&gt;
&lt;span class="go"&gt;dir2/ -&amp;gt; dir2&lt;/span&gt;
&lt;span class="go"&gt;file1 -&amp;gt; file1&lt;/span&gt;
&lt;span class="go"&gt;file2 -&amp;gt; file2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;As expected, &lt;code&gt;out.tar.gz&lt;/code&gt; is a gzip file:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt; file out.tar.gz
&lt;span class="go"&gt;out.tar.gz: gzip compressed data, max compression&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;And &lt;code&gt;tar&lt;/code&gt; also works with &lt;code&gt;out.tar.gz&lt;/code&gt; as expected:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt; tar tvf out.tar.gz
&lt;span class="go"&gt;drwxrwxr-x u/u             0 1970-01-01 03:00 dir1/&lt;/span&gt;
&lt;span class="go"&gt;drwxrwxr-x u/u             0 1970-01-01 03:00 dir1/1/&lt;/span&gt;
&lt;span class="go"&gt;-rw-rw-r-- u/u             5 1970-01-01 03:00 dir1/1/a&lt;/span&gt;
&lt;span class="go"&gt;-rw-rw-r-- u/u             5 1970-01-01 03:00 dir1/1/b&lt;/span&gt;
&lt;span class="go"&gt;-rw-rw-r-- u/u             5 1970-01-01 03:00 dir1/1/c&lt;/span&gt;
&lt;span class="go"&gt;-rw-rw-r-- u/u             5 1970-01-01 03:00 dir1/1/d&lt;/span&gt;
&lt;span class="go"&gt;drwxrwxr-x u/u             0 1970-01-01 03:00 dir1/2/&lt;/span&gt;
&lt;span class="go"&gt;-rw-rw-r-- u/u             5 1970-01-01 03:00 dir1/2/aa&lt;/span&gt;
&lt;span class="go"&gt;-rw-rw-r-- u/u             5 1970-01-01 03:00 dir1/2/bb&lt;/span&gt;
&lt;span class="go"&gt;-rw-rw-r-- u/u             5 1970-01-01 03:00 dir1/2/cc&lt;/span&gt;
&lt;span class="go"&gt;-rw-rw-r-- u/u             5 1970-01-01 03:00 dir1/2/dd&lt;/span&gt;
&lt;span class="go"&gt;drwxrwxr-x u/u             0 1970-01-01 03:00 dir1/3/&lt;/span&gt;
&lt;span class="go"&gt;-rw-rw-r-- u/u             5 1970-01-01 03:00 dir1/3/aaa&lt;/span&gt;
&lt;span class="go"&gt;-rw-rw-r-- u/u             5 1970-01-01 03:00 dir1/3/bbb&lt;/span&gt;
&lt;span class="go"&gt;-rw-rw-r-- u/u             5 1970-01-01 03:00 dir1/3/ccc&lt;/span&gt;
&lt;span class="go"&gt;-rw-rw-r-- u/u             7 1970-01-01 03:00 dir1/3/ddd&lt;/span&gt;
&lt;span class="go"&gt;drwxrwxr-x u/u             0 1970-01-01 03:00 dir2/&lt;/span&gt;
&lt;span class="go"&gt;-rw-rw-r-- u/u            10 1970-01-01 03:00 file1&lt;/span&gt;
&lt;span class="go"&gt;-rw-rw-r-- u/u            17 1970-01-01 03:00 file2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;</summary><category term="cpython"></category><category term="python-3"></category><category term="reproducibility"></category></entry><entry><title>How to create .gz reproducibly</title><link href="http://b.93z.org/notes/how-to-create-.gz-reproducibly/" rel="alternate"></link><updated>2016-11-11T00:00:00+00:00</updated><id>http://b.93z.org/notes/how-to-create-.gz-reproducibly/</id><summary type="html">&lt;p&gt;Essentially, a gzip file consists of “members”, each represented by the following diagram:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;+---+---+--+---+---+---+---+---+---+--+     +=========================================+
|___|___|__|FLG|     MTIME     |___|OS| ... |...original file name, zero-terminated...|
+---+---+--+---+---+---+---+---+---+--+     +=========================================+
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Member parts not relevant to our discussion are left unnamed and undescribed. For the specifics, see the &lt;a href="https://tools.ietf.org/html/rfc1952"&gt;GZIP file format specification version 4.3 (RFC 1952)&lt;/a&gt; and the &lt;a href="https://hg.python.org/cpython/file/b8233c779ff7/Lib/gzip.py"&gt;&lt;code&gt;gzip&lt;/code&gt; module implementation&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;&lt;code&gt;FLG&lt;/code&gt; is a flag byte that may indicate the presence of the original file name. Python’s &lt;a href="https://docs.python.org/3.5/library/gzip.html"&gt;&lt;code&gt;gzip&lt;/code&gt; module&lt;/a&gt; (more precisely, &lt;a href="https://hg.python.org/cpython/file/b8233c779ff7/Lib/gzip.py#l235"&gt;&lt;code&gt;GzipFile._write_gzip_header&lt;/code&gt;&lt;/a&gt;) sets the &lt;code&gt;FNAME&lt;/code&gt; flag and writes the zero-terminated name only if the name is not empty:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;fname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;flags&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FNAME&lt;/span&gt;
&lt;span class="c1"&gt;# ...&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;fname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fileobj&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fname&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sa"&gt;b&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&lt;/span&gt;&lt;span class="se"&gt;\000&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;code&gt;MTIME&lt;/code&gt; is “the most recent modification time of the original file being compressed”, and “if the compressed data did not come from a file, &lt;code&gt;MTIME&lt;/code&gt; is set to the time at which compression started”. In Python, the optional &lt;code&gt;mtime&lt;/code&gt; argument (a POSIX timestamp) of the &lt;a href="https://hg.python.org/cpython/file/b8233c779ff7/Lib/gzip.py#l123"&gt;&lt;code&gt;GzipFile&lt;/code&gt; constructor&lt;/a&gt; is &lt;a href="https://hg.python.org/cpython/file/b8233c779ff7/Lib/gzip.py#l187"&gt;stored in &lt;code&gt;self._write_mtime&lt;/code&gt;&lt;/a&gt;. Then &lt;code&gt;self._write_mtime&lt;/code&gt; is &lt;a href="https://hg.python.org/cpython/file/b8233c779ff7/Lib/gzip.py#l238"&gt;used in the &lt;code&gt;_write_gzip_header&lt;/code&gt;&lt;/a&gt; method, which falls back to the current time if no &lt;code&gt;mtime&lt;/code&gt; was given:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="n"&gt;mtime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_write_mtime&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;mtime&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;mtime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Finally, &lt;code&gt;OS&lt;/code&gt; identifies the type of file system on which compression took place, and in Python’s &lt;code&gt;gzip&lt;/code&gt; it is &lt;a href="https://hg.python.org/cpython/file/b8233c779ff7/Lib/gzip.py#l243"&gt;set to 255&lt;/a&gt;, which means “Unknown”:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fileobj&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;b&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&lt;/span&gt;&lt;span class="se"&gt;\377&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;If you are wondering why 255: &lt;code&gt;\377&lt;/code&gt; is an octal escape, so &lt;code&gt;b'\377'&lt;/code&gt; is the byte &lt;code&gt;0xff&lt;/code&gt;, which is indeed 255:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="sa"&gt;b&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&lt;/span&gt;&lt;span class="se"&gt;\377&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&lt;/span&gt;
&lt;span class="go"&gt;b&amp;#39;\xff&amp;#39;&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="mh"&gt;0xff&lt;/span&gt;
&lt;span class="go"&gt;255&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I guess the “Unknown” value for &lt;code&gt;OS&lt;/code&gt; is hardcoded in the &lt;code&gt;gzip&lt;/code&gt; library to ensure portability.&lt;/p&gt;&lt;p&gt;So, when you create a gzip file, two factors influence the result: the original &lt;code&gt;filename&lt;/code&gt; and the timestamp (&lt;code&gt;mtime&lt;/code&gt;) stored in each member of the file. The file system type (&lt;code&gt;OS&lt;/code&gt;) is fixed by Python’s &lt;code&gt;gzip&lt;/code&gt; library, so there is no need to worry about it.&lt;/p&gt;&lt;h2&gt;Implementation&lt;/h2&gt;&lt;p&gt;Our goal is to build a script (&lt;code&gt;mkgz.py&lt;/code&gt;) that compresses the file whose path is given as the first command-line argument into the gzip file given as the second.&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="ch"&gt;#!/usr/bin/env python3&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;gzip&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;shutil&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;argparse&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;gz&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mtime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dest&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;rb&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;src_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;wb&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;_dest_file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;gzip&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GzipFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;fileobj&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;_dest_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;w&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mtime&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;mtime&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;dest_file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;shutil&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;copyfileobj&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;src_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dest_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="vm"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;__main__&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;argparse&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ArgumentParser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;path&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;--mtime&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;--filename&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;dest&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parse_args&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;gz&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mtime&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mtime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dest&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;So, given a tar archive &lt;code&gt;out1.tar&lt;/code&gt;, we could run &lt;code&gt;./mkgz.py out1.tar out1.tar.gz&lt;/code&gt; to create the gzip file &lt;code&gt;out1.tar.gz&lt;/code&gt;. However, it would contain a timestamp and a filename: with no &lt;code&gt;--filename&lt;/code&gt;, &lt;code&gt;GzipFile&lt;/code&gt; takes the name of the destination file object and drops its &lt;code&gt;.gz&lt;/code&gt; suffix, and with no &lt;code&gt;--mtime&lt;/code&gt;, it uses the current time. Here is the output of &lt;code&gt;file out1.tar.gz&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;out1.tar.gz: gzip compressed data, was &amp;quot;out1.tar&amp;quot;, last modified: Fri Nov 11 00:00:07 2016, max compression
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;As you may have noticed, &lt;code&gt;mkgz.py&lt;/code&gt; accepts &lt;code&gt;--mtime&lt;/code&gt; and &lt;code&gt;--filename&lt;/code&gt; command-line arguments, which are passed to the &lt;a href="https://docs.python.org/3.5/library/gzip.html#gzip.GzipFile"&gt;&lt;code&gt;gzip.GzipFile&lt;/code&gt;&lt;/a&gt; constructor. Let’s use them:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt; ./mkgz.py out2.tar out2.tar.gz --mtime &lt;span class="m"&gt;1&lt;/span&gt;.23 --filename &lt;span class="s1"&gt;&amp;#39;&amp;#39;&lt;/span&gt;
&lt;span class="gp"&gt;$&lt;/span&gt; file out2.tar.gz
&lt;span class="go"&gt;out2.tar.gz: gzip compressed data, last modified: Thu Jan  1 00:00:01 1970, max compression&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;As we can see, the timestamp (“last modified”) is the one we passed as &lt;code&gt;--mtime 1.23&lt;/code&gt;, truncated to whole seconds because the gzip header stores &lt;code&gt;MTIME&lt;/code&gt; as an integer. Since &lt;code&gt;filename&lt;/code&gt; is an empty string, no original filename is recorded.&lt;/p&gt;&lt;p&gt;Let’s make two more gzip files that differ only in the filename:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt; ./mkgz.py out3.tar out3.tar.gz --mtime &lt;span class="m"&gt;1&lt;/span&gt;.23
&lt;span class="gp"&gt;$&lt;/span&gt; ./mkgz.py out3.tar out4.tar.gz --mtime &lt;span class="m"&gt;1&lt;/span&gt;.23 --filename &lt;span class="s1"&gt;&amp;#39;&amp;#39;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;If you compare the output of &lt;code&gt;hd out3.tar.gz | sed 2q&lt;/code&gt; and &lt;code&gt;hd out4.tar.gz | sed 2q&lt;/code&gt;, you’ll see that &lt;code&gt;out3.tar.gz&lt;/code&gt; indeed contains the filename &lt;code&gt;out3.tar&lt;/code&gt; and has its fourth byte (the flags byte) set to &lt;code&gt;08&lt;/code&gt; (&lt;code&gt;FNAME&lt;/code&gt;). In &lt;code&gt;out4.tar.gz&lt;/code&gt;, there is no filename and the flags byte is &lt;code&gt;00&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="nl"&gt;00000000&lt;/span&gt;  &lt;span class="mh"&gt;1f&lt;/span&gt; &lt;span class="mh"&gt;8b&lt;/span&gt; &lt;span class="mh"&gt;08&lt;/span&gt; &lt;span class="mh"&gt;08&lt;/span&gt; &lt;span class="mh"&gt;01&lt;/span&gt; &lt;span class="mh"&gt;00&lt;/span&gt; &lt;span class="mh"&gt;00&lt;/span&gt; &lt;span class="mh"&gt;00&lt;/span&gt;  &lt;span class="mh"&gt;02&lt;/span&gt; &lt;span class="mh"&gt;ff&lt;/span&gt; &lt;span class="mh"&gt;6f&lt;/span&gt; &lt;span class="mh"&gt;75&lt;/span&gt; &lt;span class="mh"&gt;74&lt;/span&gt; &lt;span class="mh"&gt;33&lt;/span&gt; &lt;span class="mh"&gt;2e&lt;/span&gt; &lt;span class="mh"&gt;74&lt;/span&gt;  &lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="s"&gt;..........out3.t&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt;
&lt;span class="nl"&gt;00000010&lt;/span&gt;  &lt;span class="mh"&gt;61&lt;/span&gt; &lt;span class="mh"&gt;72&lt;/span&gt; &lt;span class="mh"&gt;00&lt;/span&gt; &lt;span class="mh"&gt;ed&lt;/span&gt; &lt;span class="mh"&gt;9b&lt;/span&gt; &lt;span class="mh"&gt;41&lt;/span&gt; &lt;span class="mh"&gt;6e&lt;/span&gt; &lt;span class="mh"&gt;c2&lt;/span&gt;  &lt;span class="mh"&gt;30&lt;/span&gt; &lt;span class="mh"&gt;10&lt;/span&gt; &lt;span class="mh"&gt;45&lt;/span&gt; &lt;span class="mh"&gt;bd&lt;/span&gt; &lt;span class="mh"&gt;f6&lt;/span&gt; &lt;span class="mh"&gt;29&lt;/span&gt; &lt;span class="mh"&gt;72&lt;/span&gt; &lt;span class="mh"&gt;83&lt;/span&gt;  &lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="s"&gt;ar...An.0.E..)r.&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="nl"&gt;00000000&lt;/span&gt;  &lt;span class="mh"&gt;1f&lt;/span&gt; &lt;span class="mh"&gt;8b&lt;/span&gt; &lt;span class="mh"&gt;08&lt;/span&gt; &lt;span class="mh"&gt;00&lt;/span&gt; &lt;span class="mh"&gt;01&lt;/span&gt; &lt;span class="mh"&gt;00&lt;/span&gt; &lt;span class="mh"&gt;00&lt;/span&gt; &lt;span class="mh"&gt;00&lt;/span&gt;  &lt;span class="mh"&gt;02&lt;/span&gt; &lt;span class="mh"&gt;ff&lt;/span&gt; &lt;span class="mh"&gt;ed&lt;/span&gt; &lt;span class="mh"&gt;9b&lt;/span&gt; &lt;span class="mh"&gt;41&lt;/span&gt; &lt;span class="mh"&gt;6e&lt;/span&gt; &lt;span class="mh"&gt;c2&lt;/span&gt; &lt;span class="mh"&gt;30&lt;/span&gt;  &lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="s"&gt;............An.0&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt;
&lt;span class="nl"&gt;00000010&lt;/span&gt;  &lt;span class="mh"&gt;10&lt;/span&gt; &lt;span class="mh"&gt;45&lt;/span&gt; &lt;span class="mh"&gt;bd&lt;/span&gt; &lt;span class="mh"&gt;f6&lt;/span&gt; &lt;span class="mh"&gt;29&lt;/span&gt; &lt;span class="mh"&gt;72&lt;/span&gt; &lt;span class="mh"&gt;83&lt;/span&gt; &lt;span class="mh"&gt;7a&lt;/span&gt;  &lt;span class="mh"&gt;6c&lt;/span&gt; &lt;span class="mh"&gt;8f&lt;/span&gt; &lt;span class="mh"&gt;7d&lt;/span&gt; &lt;span class="mh"&gt;9e&lt;/span&gt; &lt;span class="mh"&gt;24&lt;/span&gt; &lt;span class="mh"&gt;a6&lt;/span&gt; &lt;span class="mh"&gt;2a&lt;/span&gt; &lt;span class="mh"&gt;52&lt;/span&gt;  &lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="s"&gt;.E..)r.zl.}.$.*R&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Now that we have looked at &lt;a href="/notes/how-to-create-.tar-reproducibly/"&gt;tar&lt;/a&gt; and gzip separately, we can proceed to &lt;a href="/notes/how-to-create-.tar.gz-reproducibly/"&gt;combine them&lt;/a&gt;.&lt;/p&gt;</summary><category term="cpython"></category><category term="python-3"></category><category term="reproducibility"></category></entry><entry><title>How to create .tar reproducibly</title><link href="http://b.93z.org/notes/how-to-create-.tar-reproducibly/" rel="alternate"></link><updated>2016-11-10T00:00:00+00:00</updated><id>http://b.93z.org/notes/how-to-create-.tar-reproducibly/</id><summary type="html">&lt;p&gt;Given that in &lt;a href="https://docs.python.org/3.5/library/tarfile.html"&gt;&lt;code&gt;tarfile&lt;/code&gt;&lt;/a&gt; the default for the &lt;code&gt;format&lt;/code&gt; argument of &lt;code&gt;tarfile.TarFile&lt;/code&gt; is &lt;a href="https://hg.python.org/cpython/file/b8233c779ff7/Lib/tarfile.py#l106"&gt;&lt;code&gt;GNU_FORMAT&lt;/code&gt;&lt;/a&gt; (in CPython 3.5; since 3.8 the default is &lt;code&gt;PAX_FORMAT&lt;/code&gt;), I will use GNU &lt;code&gt;tar&lt;/code&gt; rather than &lt;a href="http://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html"&gt;POSIX&lt;/a&gt;, but in the aspects I care about in this note they are very similar.&lt;/p&gt;&lt;p&gt;A tar archive is a sequence of “file entries”. Each entry consists of a header (metadata) and contents (data). The header contains an &lt;code&gt;mtime&lt;/code&gt; field, which records the file’s modification time at the moment it was archived. In Python’s &lt;code&gt;tarfile&lt;/code&gt;, archive entries are represented by instances of &lt;a href="https://hg.python.org/cpython/file/b8233c779ff7/Lib/tarfile.py#l720"&gt;&lt;code&gt;tarfile.TarInfo&lt;/code&gt;&lt;/a&gt;, and the &lt;code&gt;mtime&lt;/code&gt; field of an entry header is exposed as the public &lt;code&gt;mtime&lt;/code&gt; attribute of a &lt;code&gt;tarfile.TarInfo&lt;/code&gt; instance. 
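&lt;/p&gt;&lt;p&gt;To make the role of &lt;code&gt;TarInfo.mtime&lt;/code&gt; concrete, here is a minimal sketch that is not part of the original note. The helper name &lt;code&gt;tar_bytes&lt;/code&gt; and the sample entries are made up for illustration. The sketch builds small archives in memory with &lt;code&gt;TarFile.addfile&lt;/code&gt;, sets &lt;code&gt;mtime&lt;/code&gt; explicitly, and compares the resulting bytes. It assumes that &lt;code&gt;TarInfo&lt;/code&gt; objects created directly (rather than via &lt;code&gt;gettarinfo&lt;/code&gt;) keep their default owner and permission fields. It also passes &lt;code&gt;format=tarfile.GNU_FORMAT&lt;/code&gt; explicitly, so the result does not depend on the Python version’s default format.&lt;/p&gt;&lt;pre&gt;&lt;code&gt;import io
import tarfile


def tar_bytes(entries, mtime):
    """Build an uncompressed tar archive in memory from (arcname, data) pairs."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as archive:
        for arcname, data in entries:
            # A TarInfo created directly keeps its defaults (uid=0, gid=0,
            # mode=0o644), so only name, size, mtime and order vary here.
            tinfo = tarfile.TarInfo(name=arcname)
            tinfo.size = len(data)
            tinfo.mtime = mtime
            archive.addfile(tinfo, fileobj=io.BytesIO(data))
    # TarFile does not close a file object it was given, so buf is still usable.
    return buf.getvalue()


entries = [("a.txt", b"alpha\n"), ("b.txt", b"beta\n")]
same = tar_bytes(entries, mtime=0) == tar_bytes(entries, mtime=0)
other_mtime = tar_bytes(entries, mtime=0) == tar_bytes(entries, mtime=1)
other_order = tar_bytes(entries, mtime=0) == tar_bytes(entries[::-1], mtime=0)
print(same, other_mtime, other_order)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Running it should print &lt;code&gt;True False False&lt;/code&gt;. Archives with the same entries, order and &lt;code&gt;mtime&lt;/code&gt; are byte-identical, while changing only the &lt;code&gt;mtime&lt;/code&gt; or only the order changes the bytes.&lt;/p&gt;&lt;p&gt;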
That is all you need to know about the tar archive format for this note; for a non-simplified description, consult, for example, the &lt;a href="https://www.gnu.org/software/tar/manual/html_node/Standard.html"&gt;relevant section&lt;/a&gt; of the GNU &lt;code&gt;tar&lt;/code&gt; manual.&lt;/p&gt;&lt;p&gt;So, when you make a tar archive, there are two ways to influence the end result:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;add files in a particular order;&lt;/li&gt;&lt;li&gt;set the added files’ modification time (&lt;code&gt;mtime&lt;/code&gt;).&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;If, for your purposes, two files with identical contents but different &lt;code&gt;mtime&lt;/code&gt; values count as different, you don’t have to worry about &lt;code&gt;mtime&lt;/code&gt; (though you still want to add files in a fixed order). Otherwise, you’ll have to set it to some fixed value to make it irrelevant.&lt;/p&gt;&lt;p&gt;I am considering a realistic scenario, so it should be possible to add not just regular files but also directories (recursively). Since file order still matters in that case, I will not use &lt;a href="https://docs.python.org/3.5/library/tarfile.html#tarfile.TarFile.add"&gt;&lt;code&gt;TarFile.add&lt;/code&gt;&lt;/a&gt;. It &lt;a href="https://hg.python.org/cpython/file/b8233c779ff7/Lib/tarfile.py#l1950"&gt;uses&lt;/a&gt; &lt;a href="https://docs.python.org/3.5/library/os.html#os.listdir"&gt;&lt;code&gt;os.listdir&lt;/code&gt;&lt;/a&gt;, which returns directory entries in arbitrary order, and &lt;code&gt;TarFile.add&lt;/code&gt; neither sorts them nor lets us do so. This describes CPython 3.5; since 3.7, &lt;code&gt;TarFile.add&lt;/code&gt; adds entries in sorted order when recursing:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;listdir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Instead, to recurse into directories I’ll use &lt;a href="https://docs.python.org/3.5/library/os.html#os.walk"&gt;&lt;code&gt;os.walk&lt;/code&gt;&lt;/a&gt;. It uses &lt;a href="https://docs.python.org/3.5/library/os.html#os.scandir"&gt;&lt;code&gt;os.scandir&lt;/code&gt;&lt;/a&gt; internally (&lt;code&gt;os.listdir&lt;/code&gt; in CPython &amp;lt; 3.5), which does not guarantee any ordering either. Even so, it is &lt;a href="https://docs.python.org/3.5/library/os.html#os.walk"&gt;possible&lt;/a&gt; to make &lt;code&gt;walk&lt;/code&gt; visit directories in a particular order:&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;When &lt;code&gt;topdown&lt;/code&gt; is &lt;code&gt;True&lt;/code&gt;, the caller can modify the &lt;code&gt;dirnames&lt;/code&gt; list in-place (perhaps using &lt;code&gt;del&lt;/code&gt; or slice assignment), and &lt;code&gt;walk()&lt;/code&gt; will only recurse into the subdirectories whose names remain in &lt;code&gt;dirnames&lt;/code&gt;; this can be used to prune the search, &lt;strong&gt;impose a specific order of visiting&lt;/strong&gt;, or even to inform &lt;code&gt;walk()&lt;/code&gt; about directories the caller creates or renames before it resumes &lt;code&gt;walk()&lt;/code&gt; again. 
Modifying &lt;code&gt;dirnames&lt;/code&gt; when &lt;code&gt;topdown&lt;/code&gt; is &lt;code&gt;False&lt;/code&gt; has no effect on the behavior of the walk, because in bottom-up mode the directories in &lt;code&gt;dirnames&lt;/code&gt; are generated before &lt;code&gt;dirpath&lt;/code&gt; itself is generated.&lt;/p&gt;&lt;/blockquote&gt;&lt;p&gt;Thus, code that always traverses all directories and files in &lt;code&gt;top_dir&lt;/code&gt; in the same order looks like this:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;dirpath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dirnames&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filenames&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;walk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;top_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;topdown&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;dirnames&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="c1"&gt;# ...&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filenames&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# ...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;To add file entries (regular files, directories, etc.) to the archive, I’ll use &lt;a href="https://docs.python.org/3.5/library/tarfile.html#tarfile.TarFile.addfile"&gt;&lt;code&gt;TarFile.addfile&lt;/code&gt;&lt;/a&gt; instead of &lt;code&gt;TarFile.add&lt;/code&gt;, first building each entry’s header with &lt;code&gt;TarFile.gettarinfo&lt;/code&gt; so that its &lt;code&gt;mtime&lt;/code&gt; can be overridden. In general, the &lt;code&gt;tarfile&lt;/code&gt;-related code will look like this:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tarfile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;out.tar&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;w&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;archive&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arcname&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;tinfo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;archive&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gettarinfo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arcname&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;arcname&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;tinfo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mtime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;123123.1&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tinfo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isreg&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;rb&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;archive&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;addfile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tinfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fileobj&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;archive&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;addfile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tinfo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Before getting to the actual implementation, let’s discuss one more concept: arcnames. An arcname is the name of a file inside the archive, and it can actually be a path: it can be &lt;code&gt;control.txt&lt;/code&gt; as well as &lt;code&gt;somedir/1.txt&lt;/code&gt; or &lt;code&gt;/home/user/some/file&lt;/code&gt;. For our purposes, absolute paths in the archive are undesirable. We therefore have to rewrite them so that arcnames do not include the parents of the “top dirs” we intend to add to the archive. The script below does this with &lt;code&gt;os.path.relpath&lt;/code&gt; relative to a chosen directory.&lt;/p&gt;&lt;h2&gt;Implementation&lt;/h2&gt;&lt;p&gt;The goal is to build a script (&lt;code&gt;mktar.py&lt;/code&gt;) that creates a tar archive containing the files and directories whose paths are passed as command-line arguments.&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="ch"&gt;#!/usr/bin/env python3&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;tarfile&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;argparse&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;top_paths&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mtime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reltop&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;relpath&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;reltop&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_get_to_add&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;top_path&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;top_paths&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isdir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;top_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;dirpath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dirnames&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filenames&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;walk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="n"&gt;top_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;topdown&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                    &lt;span class="n"&gt;dirnames&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                    &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dirpath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;relpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dirpath&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
                    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filenames&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                        &lt;span class="n"&gt;filepath&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dirpath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;relpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;top_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;relpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;top_path&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tarfile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;w&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;archive&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arcname&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;_get_to_add&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;-&amp;gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arcname&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;tinfo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;archive&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gettarinfo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arcname&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;arcname&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;mtime&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;tinfo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mtime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mtime&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tinfo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isreg&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
                &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;rb&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;archive&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;addfile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tinfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fileobj&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;archive&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;addfile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tinfo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="vm"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;__main__&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;argparse&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ArgumentParser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;paths&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;+&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;--reltop&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getcwd&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;--mtime&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;--verbose&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;store_true&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;dest&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parse_args&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;tar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;paths&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mtime&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mtime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dest&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reltop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reltop&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;For example, to create the archive &lt;code&gt;out.tar&lt;/code&gt; with the &lt;code&gt;mtime&lt;/code&gt; of every member set to &lt;code&gt;123&lt;/code&gt;, you’d run &lt;code&gt;./mktar.py dir1/ file1 dir2/ file2 out.tar --mtime 123&lt;/code&gt;. If, say, &lt;code&gt;dir{1,2}&lt;/code&gt; and &lt;code&gt;file{1,2}&lt;/code&gt; are in &lt;code&gt;reproducible_tar/&lt;/code&gt; and you want that directory to be the “root” inside the archive, you can pass &lt;code&gt;--reltop ../&lt;/code&gt; to change the “relative top directory” (&lt;code&gt;reltop&lt;/code&gt;). Arcnames are then built like this (the command is &lt;code&gt;./mktar.py dir1/ file1 dir2/ file2 out.tar --mtime 123 --reltop ../ --verbose&lt;/code&gt;, run from inside &lt;code&gt;reproducible_tar/&lt;/code&gt;):&lt;/p&gt;&lt;pre&gt;&lt;code&gt;dir1/ -&amp;gt; reproducible_tar/dir1
dir1/1 -&amp;gt; reproducible_tar/dir1/1
dir1/1/a -&amp;gt; reproducible_tar/dir1/1/a
dir1/1/b -&amp;gt; reproducible_tar/dir1/1/b
dir1/1/c -&amp;gt; reproducible_tar/dir1/1/c
dir1/1/d -&amp;gt; reproducible_tar/dir1/1/d
dir1/2 -&amp;gt; reproducible_tar/dir1/2
dir1/2/aa -&amp;gt; reproducible_tar/dir1/2/aa
dir1/2/bb -&amp;gt; reproducible_tar/dir1/2/bb
dir1/2/cc -&amp;gt; reproducible_tar/dir1/2/cc
dir1/2/dd -&amp;gt; reproducible_tar/dir1/2/dd
dir1/3 -&amp;gt; reproducible_tar/dir1/3
dir1/3/aaa -&amp;gt; reproducible_tar/dir1/3/aaa
dir1/3/bbb -&amp;gt; reproducible_tar/dir1/3/bbb
dir1/3/ccc -&amp;gt; reproducible_tar/dir1/3/ccc
dir1/3/ddd -&amp;gt; reproducible_tar/dir1/3/ddd
dir2/ -&amp;gt; reproducible_tar/dir2
file1 -&amp;gt; reproducible_tar/file1
file2 -&amp;gt; reproducible_tar/file2
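&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;A quick way to check that the output really is reproducible is to build the same tree twice, changing file timestamps in between, and compare checksums. The following is a minimal standalone sketch of such a check, not &lt;code&gt;mktar.py&lt;/code&gt; itself: it assumes the same idea (a deterministic traversal order plus a forced &lt;code&gt;mtime&lt;/code&gt;), and &lt;code&gt;build_tar&lt;/code&gt; and &lt;code&gt;sha256_of_build&lt;/code&gt; are hypothetical helper names. It only compares two builds made by the same user on the same machine, because owner and group fields are still taken from the file system.&lt;/p&gt;&lt;pre&gt;&lt;code&gt;import hashlib
import io
import os
import tarfile
import tempfile


def build_tar(top, mtime):
    """Hypothetical helper: tar the tree under top into memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as archive:
        for dirpath, dirnames, filenames in os.walk(top):
            dirnames.sort()  # os.walk descends into subdirectories in this order
            for name in sorted(dirnames + filenames):
                path = os.path.join(dirpath, name)
                arcname = os.path.relpath(path, top)
                tinfo = archive.gettarinfo(path, arcname=arcname)
                tinfo.mtime = mtime  # ignore the file system's timestamps
                if tinfo.isreg():
                    with open(path, "rb") as file:
                        archive.addfile(tinfo, fileobj=file)
                else:
                    archive.addfile(tinfo)
    return buf.getvalue()


def sha256_of_build(top, mtime=123):
    return hashlib.sha256(build_tar(top, mtime)).hexdigest()


with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "dir1", "1"))
    with open(os.path.join(top, "dir1", "1", "a"), "w") as f:
        f.write("a\n")
    first = sha256_of_build(top)
    # Change the file's timestamps: the archive bytes should not change
    os.utime(os.path.join(top, "dir1", "1", "a"), (0, 0))
    second = sha256_of_build(top)

print(first == second)  # True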
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Now we have a working implementation of reproducible tar building, and we can &lt;a href="/notes/how-to-create-.gz-reproducibly/"&gt;look at gzip&lt;/a&gt;.&lt;/p&gt;</summary><category term="cpython"></category><category term="python-3"></category><category term="reproducibility"></category></entry><entry><title>Reproducible builds and django.utils.feedgenerator</title><link href="http://b.93z.org/notes/reproducible-builds-and-django.utils.feedgenerator/" rel="alternate"></link><updated>2016-06-20T00:00:00+00:00</updated><id>http://b.93z.org/notes/reproducible-builds-and-django.utils.feedgenerator/</id><summary type="html">&lt;p&gt;Recently I’ve been working towards making builds of this blog reproducible. My goal was to be able to use plain &lt;code&gt;diff&lt;/code&gt; to spot differences between the resulting files.&lt;/p&gt;&lt;p&gt;But there was a problem. &lt;a href="https://tools.ietf.org/html/rfc4287"&gt;Atom 1.0&lt;/a&gt; feeds (e.g., &lt;a href="/notes/feed/"&gt;/notes/feed/&lt;/a&gt;, &lt;a href="/tags/django/feed/"&gt;/tags/django/feed/&lt;/a&gt;) are generated with my fork of &lt;code&gt;django.utils.feedgenerator&lt;/code&gt;. Both the original and the fork generate XML with &lt;code&gt;SimplerXMLGenerator&lt;/code&gt;, a subclass of &lt;code&gt;xml.sax.saxutils.XMLGenerator&lt;/code&gt; (an implementation of the &lt;a href="https://docs.python.org/3.4/library/xml.sax.handler.html#xml.sax.handler.ContentHandler"&gt;&lt;code&gt;ContentHandler&lt;/code&gt;&lt;/a&gt; interface), and both pass elements’ attributes to &lt;a href="https://docs.python.org/3.4/library/xml.sax.handler.html#xml.sax.handler.ContentHandler.startElement"&gt;&lt;code&gt;startElement&lt;/code&gt;&lt;/a&gt; and &lt;a href="https://docs.python.org/3.4/library/xml.sax.handler.html#xml.sax.handler.ContentHandler.startElementNS"&gt;&lt;code&gt;startElementNS&lt;/code&gt;&lt;/a&gt; as regular &lt;code&gt;dict&lt;/code&gt;s. 
Before Python 3.6, a &lt;code&gt;dict&lt;/code&gt;’s iteration order depended on the hashes of its keys, and with string hash randomization (enabled by default since Python 3.3) that order could change from one run to the next. As a result, attributes of XML elements appeared in random order in the generated feeds: the feeds changed textually between builds while staying semantically the same. This is inconvenient for my use case (I’d rather not resort to specialized XML comparison tools), but according to the &lt;a href="https://www.w3.org/TR/2008/REC-xml-20081126/#sec-starttags"&gt;specification&lt;/a&gt; such behavior is valid:&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;Note that the order of attribute specifications in a start-tag or empty-element tag is not significant.&lt;/p&gt;&lt;/blockquote&gt;&lt;p&gt;So I decided to take advantage of that: if the order does not matter, any fixed order will do. Both &lt;code&gt;startElement&lt;/code&gt; and &lt;code&gt;startElementNS&lt;/code&gt; only assume that the &lt;code&gt;attrs&lt;/code&gt; argument behaves like a mapping (see lines &lt;a href="https://hg.python.org/cpython/file/dfc57c66a670/Lib/xml/sax/saxutils.py#l170"&gt;170&lt;/a&gt; and &lt;a href="https://hg.python.org/cpython/file/dfc57c66a670/Lib/xml/sax/saxutils.py#l195"&gt;195&lt;/a&gt; of &lt;code&gt;Lib/xml/sax/saxutils.py&lt;/code&gt;). &lt;a href="https://docs.python.org/3.4/library/collections.html#collections.OrderedDict"&gt;&lt;code&gt;OrderedDict&lt;/code&gt;&lt;/a&gt; is a mapping too, so &lt;code&gt;attrs&lt;/code&gt; (the attributes of an XML element) can be passed as an &lt;code&gt;OrderedDict&lt;/code&gt; to keep them in a chosen order. The least intrusive change (it’s a fork, after all) is to override these two methods and sort &lt;code&gt;attrs&lt;/code&gt; there (&lt;code&gt;_order_attrs&lt;/code&gt;) before passing them on to the superclass implementation (&lt;code&gt;XMLGenerator&lt;/code&gt;):&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;operator&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;collections&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;xml.sax.saxutils&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;XMLGenerator&lt;/span&gt;


&lt;span class="n"&gt;_order_attrs_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;operator&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;itemgetter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_order_attrs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;collections&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OrderedDict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;_order_attrs_key&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SimplerXMLGenerator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;XMLGenerator&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;startElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;super&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;startElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_order_attrs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;startElementNS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;qname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;super&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;startElementNS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;qname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_order_attrs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# ...here goes the rest (already present in feedgenerator)&lt;/span&gt;
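&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The effect can be checked without the rest of feedgenerator. Below is a small standalone sketch that calls the stdlib &lt;code&gt;XMLGenerator&lt;/code&gt; directly; &lt;code&gt;sort_attrs&lt;/code&gt; and &lt;code&gt;render&lt;/code&gt; are hypothetical helpers (&lt;code&gt;sort_attrs&lt;/code&gt; mirrors &lt;code&gt;_order_attrs&lt;/code&gt; above), and the element name and URL are only example values. Two dicts holding the same attributes in different insertion orders serialize to identical text once the attributes are sorted.&lt;/p&gt;&lt;pre&gt;&lt;code&gt;import collections
import io
import operator
from xml.sax.saxutils import XMLGenerator


def sort_attrs(attrs):
    # Same idea as _order_attrs: sort (name, value) pairs by name
    return collections.OrderedDict(
        sorted(attrs.items(), key=operator.itemgetter(0)))


def render(attrs):
    out = io.StringIO()
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startElement("link", sort_attrs(attrs))
    gen.endElement("link")
    return out.getvalue()


first = render({"rel": "alternate", "href": "http://b.93z.org/notes/"})
second = render({"href": "http://b.93z.org/notes/", "rel": "alternate"})
print(first == second)  # True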
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;That’s what I did. Now my blog engine no longer reorders attributes in feeds at random, and the feeds are still perfectly valid XML :).&lt;/p&gt;</summary><category term="django"></category><category term="python-3"></category><category term="reproducibility"></category><category term="xml"></category></entry><entry><title>How to determine file where particular function is defined</title><link href="http://b.93z.org/notes/how-to-determine-file-where-particular-function-is-defined/" rel="alternate"></link><updated>2016-04-05T00:00:00+00:00</updated><id>http://b.93z.org/notes/how-to-determine-file-where-particular-function-is-defined/</id><summary type="html">&lt;p&gt;...at run time, given that all you have is a function object:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="go"&gt;&amp;lt;class &amp;#39;function&amp;#39;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Knowing where it is defined can save you some time during a debugging session, say, when you are not sure that the function you are dealing with is imported from the “correct” location. Easy as it is, such a sanity check can help.&lt;/p&gt;&lt;p&gt;The most obvious option is the &lt;code&gt;co_filename&lt;/code&gt; attribute of the function’s code object:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="vm"&gt;__code__&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;co_filename&lt;/span&gt;
&lt;span class="go"&gt;&amp;#39;/home/user/.../something.py&amp;#39;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This also works with methods:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SomeClass&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="gp"&gt;... &lt;/span&gt;    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;some_method&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="gp"&gt;... &lt;/span&gt;        &lt;span class="k"&gt;pass&lt;/span&gt;
&lt;span class="gp"&gt;... &lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SomeClass&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;some_method&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="vm"&gt;__code__&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;co_filename&lt;/span&gt;
&lt;span class="go"&gt;&amp;#39;&amp;lt;stdin&amp;gt;&amp;#39;&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;some_method&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="vm"&gt;__func__&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="vm"&gt;__code__&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;co_filename&lt;/span&gt;
&lt;span class="go"&gt;&amp;#39;&amp;lt;stdin&amp;gt;&amp;#39;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Usually it’s better to use &lt;code&gt;inspect.getsourcefile&lt;/code&gt;. It relies on &lt;code&gt;inspect.getfile&lt;/code&gt;, which does something similar to the above, but consistently across many different types of objects:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;inspect&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;inspect&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getsourcefile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="go"&gt;&amp;#39;/home/user/.../something.py&amp;#39;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;But if &lt;code&gt;func&lt;/code&gt; is defined in a CPython extension module (a shared library that uses the C API), &lt;code&gt;inspect.getsourcefile&lt;/code&gt; fails with &lt;code&gt;TypeError&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;inspect&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getsourcefile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="gt"&gt;Traceback (most recent call last):&lt;/span&gt;
  File &lt;span class="nb"&gt;&amp;quot;&amp;lt;stdin&amp;gt;&amp;quot;&lt;/span&gt;, line &lt;span class="m"&gt;1&lt;/span&gt;, in &lt;span class="n"&gt;&amp;lt;module&amp;gt;&lt;/span&gt;
  File &lt;span class="nb"&gt;&amp;quot;/home/user/.../lib/python3.4/inspect.py&amp;quot;&lt;/span&gt;, line &lt;span class="m"&gt;571&lt;/span&gt;, in &lt;span class="n"&gt;getsourcefile&lt;/span&gt;
    &lt;span class="n"&gt;filename&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;getfile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  File &lt;span class="nb"&gt;&amp;quot;/home/user/.../lib/python3.4/inspect.py&amp;quot;&lt;/span&gt;, line &lt;span class="m"&gt;536&lt;/span&gt;, in &lt;span class="n"&gt;getfile&lt;/span&gt;
    &lt;span class="s1"&gt;&amp;#39;function, traceback, frame, or code object&amp;#39;&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="gr"&gt;TypeError&lt;/span&gt;: &lt;span class="n"&gt;&amp;lt;built-in function func&amp;gt; is not a module, class, method, function, traceback, frame, or code object&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;In that case it is still possible to get the path of the &lt;code&gt;.so&lt;/code&gt; file:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;inspect&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getfile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inspect&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getmodule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="go"&gt;&amp;#39;/home/user/.../lib/python3.4/site-packages/something.cpython-34m.so&amp;#39;&lt;/span&gt;
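&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Both approaches can be combined into a small helper for debugging sessions. This is a minimal sketch with a hypothetical name, &lt;code&gt;defined_in&lt;/code&gt;: it tries &lt;code&gt;inspect.getsourcefile&lt;/code&gt; first and falls back to the file of the defining module. One caveat: modules compiled into the interpreter itself (such as &lt;code&gt;builtins&lt;/code&gt;) have no file at all, so &lt;code&gt;inspect.getfile&lt;/code&gt; raises &lt;code&gt;TypeError&lt;/code&gt; for them too, and the sketch returns &lt;code&gt;None&lt;/code&gt; in that case.&lt;/p&gt;&lt;pre&gt;&lt;code&gt;import inspect


def defined_in(obj):
    """Hypothetical helper: best-effort path of the file defining obj."""
    try:
        # Python-level functions, methods, classes, modules, ...
        path = inspect.getsourcefile(obj)
    except TypeError:
        path = None
    if path:
        return path
    # Built-in functions: fall back to the module they belong to
    module = inspect.getmodule(obj)
    if module is None:
        return None
    try:
        # For a C extension module this is the shared library (.so) path
        return inspect.getfile(module)
    except TypeError:
        # Modules compiled into the interpreter (e.g. builtins) have no file
        return None


print(defined_in(len))  # None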
&lt;/code&gt;&lt;/pre&gt;</summary><category term="cpython"></category><category term="python-3"></category></entry></feed>