<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Entrepreneur Being</title>
	<atom:link href="http://stasosphere.com/entrepreneur-being/feed/" rel="self" type="application/rss+xml" />
	<link>https://stasosphere.com/entrepreneur-being/</link>
	<description>What can be done without a safety net</description>
	<lastBuildDate>Tue, 19 Aug 2025 01:37:31 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	
	<item>
		<title>A Deep Investigation into MMAP Not Leaking Memory</title>
		<link>https://stasosphere.com/entrepreneur-being/301-mmap-memory-leak-investigation/</link>
					<comments>https://stasosphere.com/entrepreneur-being/301-mmap-memory-leak-investigation/#comments</comments>
		
		<dc:creator><![CDATA[stas]]></dc:creator>
		<pubDate>Thu, 29 Sep 2022 03:52:28 +0000</pubDate>
				<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Software Engineering]]></category>
		<category><![CDATA[datasets]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[mmap]]></category>
		<category><![CDATA[pyarrow]]></category>
		<category><![CDATA[python]]></category>
		<guid isPermaLink="false">https://stasosphere.com/entrepreneur-being/?p=301</guid>

					<description><![CDATA[A step-by-step demonstration of mmap not leaking memory even though it appears to be leaking memory like there is no tomorrow. ]]></description>
										<content:encoded><![CDATA[
<p>This write up demonstrates that while <code>mmap</code>&#8217;ed file IO looks like it&#8217;s leaking memory, it actually is not.</p>
<p>HuggingFace&#8217;s <a href="https://github.com/huggingface/datasets">datasets</a> project uses MMAP to make datasets available to multiple processes in an efficient way. This is very important since typically a machine learning training program will use a Dataloader which may use multiple workers, or alternatively the same dataset is simply accessed by multiple processes.</p>
<p>An <a href="https://github.com/huggingface/datasets/issues/4883">issue was posted</a> that suggested that a <code>datasets</code>-based program leaks memory with each iteration. This triggered extensive research, which established that MMAP doesn&#8217;t leak memory and brought a much deeper understanding of the different components used under the hood of <code>datasets</code>.</p>
<p>If you&#8217;d like to gain a deeper understanding of why and how, please read on.<br /><span id="more-301"></span><br /></p>
<h2 id="emulating-a-computer-with-just-1gb-of-memory">Emulating a computer with just 1GB of memory</h2>
<p>Since we don&#8217;t want to crash our computer while debugging memory issues, we are going to emulate a computer with just 1GB of memory and no swap memory. Unless a computer is protected against programs using more memory than it has, it will usually start <a href="https://en.wikipedia.org/wiki/Thrashing_(computer_science)">thrashing</a> and eventually crash.</p>
<p>To accomplish that we are going to start a cgroups-controlled shell. Any program started from that shell will be killed as soon as it consumes more than 1GB of memory, and it gets no swap memory either:</p>
<pre>$ systemd-run --user --scope -<span class="hljs-selector-tag">p</span> MemoryHigh=<span class="hljs-number">1</span>G -<span class="hljs-selector-tag">p</span> MemoryMax=<span class="hljs-number">1</span>G -<span class="hljs-selector-tag">p</span> MemorySwapMax=<span class="hljs-number">0</span>G --setenv=<span class="hljs-string">"MEMLIMIT=1GB"</span> bash
</pre>
<p>I&#8217;m setting the <code>MEMLIMIT=1GB</code> environment variable so that at any moment I can check if I&#8217;m in the right shell by printing:</p>
<pre>$ <span class="hljs-built_in">echo</span> <span class="hljs-variable">$MEMLIMIT</span>
1GB</pre>
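<p>The environment variable only tells us which shell we are in, not that the limits took effect. As a rough sketch, assuming a cgroup v2 (unified hierarchy) system mounted at <code>/sys/fs/cgroup</code>, a program can look up the limits of its own cgroup. The helper names <code>cgroup_v2_path</code> and <code>read_limit</code> are made up for this illustration:</p>
<pre>import os


def cgroup_v2_path(cgroup_text):
    """Return the cgroup v2 path from the contents of /proc/PID/cgroup, or None."""
    for line in cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        hierarchy_id, controllers, path = parts
        # the cgroup v2 entry has hierarchy id 0 and an empty controller list
        if hierarchy_id == "0" and controllers == "":
            return path
    return None


def read_limit(path, name, root="/sys/fs/cgroup"):
    """Read one limit file, e.g. memory.max, for the given cgroup path."""
    with open(os.path.join(root, path.lstrip("/"), name)) as fh:
        return fh.read().strip()


if __name__ == "__main__":
    try:
        with open("/proc/self/cgroup") as fh:
            path = cgroup_v2_path(fh.read())
        for name in ("memory.high", "memory.max", "memory.swap.max"):
            print(f"{name:16} {read_limit(path, name)}")
    except (OSError, AttributeError) as e:
        # not on cgroup v2, or this cgroup has no such limit files
        print(f"could not read cgroup v2 limits: {e}")</pre>
<p>Run from the limited shell, we would expect <code>memory.max</code> to show the 1GB limit in bytes and <code>memory.swap.max</code> to show 0, while an unlimited shell typically reports <code>max</code>.</p>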
<p>Let&#8217;s validate that this shell allows a program to allocate under 1GB of RSS RAM, but will kill it if it tries to allocate more than that:</p>
<pre># <span class="hljs-number">7</span> * <span class="hljs-number">128</span>M chars
<span class="hljs-string">$ </span>python -c <span class="hljs-comment">"import sys, os, psutil; a='a'*7*2**27; print(f'{psutil.Process(os.getpid()).memory_info().rss &gt;&gt; 20}MB');"</span>
<span class="hljs-number">908</span>MB

# <span class="hljs-number">8</span> * <span class="hljs-number">128</span>M chars
<span class="hljs-string">$ </span>python -c <span class="hljs-comment">"import sys, os, psutil; a='a'*8*2**27; print(f'{psutil.Process(os.getpid()).memory_info().rss &gt;&gt; 20}MB');"</span>
<span class="hljs-type">Killed</span>
</pre>
<p>So we can see that an allocation of a bit under 1GB works, but a program that asks for more than 1GB of resident memory gets killed.</p>
<p>In the rest of this write up let&#8217;s use shell A, which is unlimited (or rather limited to the actual available memory on your computer), and shell B, where a program started from it can only allocate 1GB of resident memory.</p>
<p>Sidenote: Linux memory management and reporting is super-complicated and one could probably easily write a whole book about it. Resident Set Size (RSS) is typically the easiest to use to measure the approximate actual memory usage by the program. It doesn&#8217;t tell you the whole truth, but most of the time it&#8217;s good enough to detect memory leaks. Therefore in this write up this is the metric we are going to use.</p>
<h2 id="simple-io-debug-program">Simple IO debug program</h2>
<p>Now let&#8217;s write a simple debug program that will create a file with a few very large lines and then read them sequentially using normal IO, but if we set <code>--mmap</code> it&#8217;ll switch to the memory-mapped API via the <code>mmap</code> module.</p>
<p>Additionally, if the <code>--accumulate</code> flag is passed the program will accumulate the lines it reads into a single string.</p>
<pre>$ cat mmap-no-leak-debug.py
<span class="hljs-keyword">import</span> gc
<span class="hljs-keyword">import</span> mmap
<span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> psutil
<span class="hljs-keyword">import</span> sys

PATH = <span class="hljs-string">"./tmp.txt"</span>
<span class="hljs-comment"># create a large data file with a few long lines</span>
<span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> os.path.exists(PATH):
    <span class="hljs-keyword">with</span> open(PATH, <span class="hljs-string">"w"</span>) <span class="hljs-keyword">as</span> fh:
        s = <span class="hljs-string">'a'</span>* <span class="hljs-number">2</span>**<span class="hljs-number">27</span> + <span class="hljs-string">"\n"</span> <span class="hljs-comment"># 128MB</span>
        <span class="hljs-comment"># write ~2GB file</span>
        <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">16</span>):
            fh.write(s)

proc = psutil.Process(os.getpid())
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">mem_read</span><span class="hljs-params">()</span>:</span>
    gc.collect()
    <span class="hljs-keyword">return</span> proc.memory_info().rss / <span class="hljs-number">2</span>**<span class="hljs-number">20</span>

print(f<span class="hljs-string">"{'idx':&gt;4} {'RSS':&gt;10}   {'Δ RSS':&gt;12}   {'Δ accumulated':&gt;10}"</span>)

content = <span class="hljs-string">''</span>
mem_after = mem_before_acc = mem_after_acc = mem_before = proc.memory_info().rss / <span class="hljs-number">2</span>**<span class="hljs-number">20</span>
print(f<span class="hljs-string">"{0:4d} {mem_after:10.2f}MB {mem_after - 0:10.2f}MB {0:10.2f}MB"</span>)

mmap_mode = <span class="hljs-string">"--mmap"</span> <span class="hljs-keyword">in</span> sys.argv

<span class="hljs-keyword">with</span> open(PATH, <span class="hljs-string">"r"</span>) <span class="hljs-keyword">as</span> fh:

    <span class="hljs-keyword">if</span> mmap_mode:
        mm = mmap.mmap(fh.fileno(), <span class="hljs-number">0</span>, access=mmap.ACCESS_READ)

    idx = <span class="hljs-number">0</span>
    <span class="hljs-keyword">while</span> <span class="hljs-keyword">True</span>:
        idx += <span class="hljs-number">1</span>
        mem_before = mem_read()
        line = mm.readline() <span class="hljs-keyword">if</span> mmap_mode <span class="hljs-keyword">else</span> fh.readline()
        <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> line:
            <span class="hljs-keyword">break</span>

        <span class="hljs-keyword">if</span> <span class="hljs-string">"--accumulate"</span> <span class="hljs-keyword">in</span> sys.argv:
            mem_before_acc = mem_read()
            content += str(line)
            mem_after_acc = mem_read()

        mem_after = mem_read()

        print(f<span class="hljs-string">"{idx:4d} {mem_after:10.2f}MB {mem_after - mem_before:10.2f}MB {mem_after_acc - mem_before_acc:10.2f}MB"</span>)</pre>
<p>The four output columns are:</p>
<pre> <span class="hljs-attribute">idx</span>        RSS          Δ RSS   Δ accumulated</pre>
<ol>
<li>the line number (starting from 1)</li>
<li>the total RSS reported at the end of each iteration</li>
<li>the RSS delta of each iteration</li>
<li>the accumulated buffer delta</li>
</ol>
<p>As you can see, we force Python&#8217;s garbage collection via <code>gc.collect()</code> before taking RSS (Resident Set Size) measurements. This is a crucial step when debugging memory usage, and leaks in particular, especially if you delete some objects and want to make sure that the memory is actually freed, since Python&#8217;s garbage collection mechanism is not immediate.</p>
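<p>To illustrate why the explicit <code>gc.collect()</code> call matters, here is a minimal sketch that is not part of the debug program; the <code>Node</code> class and the <code>make_cycle</code> helper are made up for the illustration. CPython frees most objects right away through reference counting, but objects that reference each other in a cycle linger until the cyclic collector runs:</p>
<pre>import gc
import weakref


class Node:
    """Hypothetical helper object that can point at another Node."""


def make_cycle():
    # Two objects referencing each other: their refcounts never drop to zero.
    a, b = Node(), Node()
    a.other, b.other = b, a
    return weakref.ref(a)  # a weak reference lets us observe the lifetime of a


gc.disable()                      # rule out an automatic collection mid-demo
ref = make_cycle()
alive_before = ref() is not None  # the cycle is unreachable but not freed yet
gc.collect()                      # force the cyclic collector to run
alive_after = ref() is not None
gc.enable()

print(f"alive before gc.collect(): {alive_before}")
print(f"alive after  gc.collect(): {alive_after}")</pre>
<p>Without the forced collection, an RSS reading taken between the two checks would still include the memory held by the unreachable cycle.</p>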
<h2 id="normal-io-diagnostics">Normal IO diagnostics</h2>
<p>First, let&#8217;s run normal IO without accumulating any strings and simply discarding those.</p>
<pre>shell A $ python mmap-no-leak-debug.py
 idx        RSS          Δ RSS   Δ accumulated
   <span class="hljs-number">0</span>      <span class="hljs-number">12.37</span>MB      <span class="hljs-number">12.37</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">1</span>     <span class="hljs-number">269.66</span>MB     <span class="hljs-number">257.29</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">2</span>     <span class="hljs-number">269.68</span>MB       <span class="hljs-number">0.02</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">3</span>     <span class="hljs-number">269.68</span>MB       <span class="hljs-number">0.00</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">4</span>     <span class="hljs-number">269.69</span>MB       <span class="hljs-number">0.01</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">5</span>     <span class="hljs-number">269.69</span>MB       <span class="hljs-number">0.00</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">6</span>     <span class="hljs-number">269.70</span>MB       <span class="hljs-number">0.01</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">7</span>     <span class="hljs-number">269.70</span>MB       <span class="hljs-number">0.00</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">8</span>     <span class="hljs-number">269.70</span>MB       <span class="hljs-number">0.01</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">9</span>     <span class="hljs-number">269.70</span>MB       <span class="hljs-number">0.00</span>MB       <span class="hljs-number">0.00</span>MB
  <span class="hljs-number">10</span>     <span class="hljs-number">269.71</span>MB       <span class="hljs-number">0.01</span>MB       <span class="hljs-number">0.00</span>MB
  <span class="hljs-number">11</span>     <span class="hljs-number">269.71</span>MB       <span class="hljs-number">0.00</span>MB       <span class="hljs-number">0.00</span>MB
  <span class="hljs-number">12</span>     <span class="hljs-number">269.71</span>MB       <span class="hljs-number">0.00</span>MB       <span class="hljs-number">0.00</span>MB
  <span class="hljs-number">13</span>     <span class="hljs-number">269.71</span>MB       <span class="hljs-number">0.00</span>MB       <span class="hljs-number">0.00</span>MB
  <span class="hljs-number">14</span>     <span class="hljs-number">269.71</span>MB       <span class="hljs-number">0.00</span>MB       <span class="hljs-number">0.00</span>MB
  <span class="hljs-number">15</span>     <span class="hljs-number">269.71</span>MB       <span class="hljs-number">0.00</span>MB       <span class="hljs-number">0.00</span>MB
  <span class="hljs-number">16</span>     <span class="hljs-number">145.96</span>MB    <span class="hljs-number">-123.75</span>MB       <span class="hljs-number">0.00</span>MB</pre>
<p>In a loop, we read a 128MB line and discard it.</p>
<p>We can see the memory is low and steady, with some fluctuations when Python decides to release some memory. The program allocates more than 128MB due to a new line character in the string &#8211; this is a peculiar Python behavior.</p>
<p>The bottom line is that the program doesn&#8217;t appear to be leaking any memory.</p>
<h2 id="mmap-ed-io-diagnostics">MMAP&#8217;ed IO diagnostics</h2>
<p>Now let&#8217;s do the exact same operation but this time using <code>mmap</code>&#8217;s IO:</p>
<pre>shell A $ python mmap-no-leak-debug.py --mmap
 idx        RSS          Δ RSS   Δ accumulated
   <span class="hljs-number">0</span>      <span class="hljs-number">12.39</span>MB      <span class="hljs-number">12.39</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">1</span>     <span class="hljs-number">268.25</span>MB     <span class="hljs-number">255.87</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">2</span>     <span class="hljs-number">396.47</span>MB     <span class="hljs-number">128.22</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">3</span>     <span class="hljs-number">524.47</span>MB     <span class="hljs-number">128.00</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">4</span>     <span class="hljs-number">652.47</span>MB     <span class="hljs-number">128.00</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">5</span>     <span class="hljs-number">780.47</span>MB     <span class="hljs-number">128.00</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">6</span>     <span class="hljs-number">908.47</span>MB     <span class="hljs-number">128.00</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">7</span>    <span class="hljs-number">1036.47</span>MB     <span class="hljs-number">128.00</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">8</span>    <span class="hljs-number">1164.47</span>MB     <span class="hljs-number">128.00</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">9</span>    <span class="hljs-number">1292.47</span>MB     <span class="hljs-number">128.00</span>MB       <span class="hljs-number">0.00</span>MB
  <span class="hljs-number">10</span>    <span class="hljs-number">1420.47</span>MB     <span class="hljs-number">128.00</span>MB       <span class="hljs-number">0.00</span>MB
  <span class="hljs-number">11</span>    <span class="hljs-number">1548.47</span>MB     <span class="hljs-number">128.00</span>MB       <span class="hljs-number">0.00</span>MB
  <span class="hljs-number">12</span>    <span class="hljs-number">1676.47</span>MB     <span class="hljs-number">128.00</span>MB       <span class="hljs-number">0.00</span>MB
  <span class="hljs-number">13</span>    <span class="hljs-number">1804.47</span>MB     <span class="hljs-number">128.00</span>MB       <span class="hljs-number">0.00</span>MB
  <span class="hljs-number">14</span>    <span class="hljs-number">1932.47</span>MB     <span class="hljs-number">128.00</span>MB       <span class="hljs-number">0.00</span>MB
  <span class="hljs-number">15</span>    <span class="hljs-number">2060.47</span>MB     <span class="hljs-number">128.00</span>MB       <span class="hljs-number">0.00</span>MB
  <span class="hljs-number">16</span>    <span class="hljs-number">2188.47</span>MB     <span class="hljs-number">128.00</span>MB       <span class="hljs-number">0.00</span>MB</pre>
<p>Whoa! It looks like there is a major leak here. On each iteration the program keeps on growing by 128MB despite us discarding the read data. What&#8217;s going on?</p>
<p>The theoretical explanation is simple &#8211; MMAP was designed to make IO faster and shareable by multiple processes &#8211; so if there is a lot of available RAM, the MMAP API will use as much of it as it can, and in order to speed things up it won&#8217;t normally release it back to the OS. For example, if you have two programs reading the same sections from the same MMAP&#8217;ed file, only the first program will incur the delay of copying the data from disk to RAM. The other program will read it directly from RAM. Since MMAP doesn&#8217;t know which sections will be accessed next, it simply keeps everything it has read in memory if there is enough of it.</p>
<p>You might say that this is a terrible design. But wait, it only keeps the data in memory if nobody else wants that memory, and it immediately releases the unused memory back to the operating system as soon as such demand arises.</p>
<h2 id="proof-that-there-is-no-leak">Proof that there is no leak</h2>
<p>To show that the memory does get released as soon as it&#8217;s needed let&#8217;s re-run this same program in shell B, where only 1GB of memory is allowed to be allocated.</p>
<pre>shell B $ systemd-run --user --scope -p MemoryHigh=<span class="hljs-number">1</span>G -p MemoryMax=<span class="hljs-number">1</span>G -p MemorySwapMax=<span class="hljs-number">0</span>G --setenv=<span class="hljs-string">"MEMLIMIT=1GB"</span> bash
shell B $ python mmap-no-leak-debug.py --mmap
 idx        RSS          Δ RSS   Δ accumulated
   <span class="hljs-number">0</span>      <span class="hljs-number">12.48</span>MB      <span class="hljs-number">12.48</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">1</span>     <span class="hljs-number">268.51</span>MB     <span class="hljs-number">256.03</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">2</span>     <span class="hljs-number">396.73</span>MB     <span class="hljs-number">128.22</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">3</span>     <span class="hljs-number">524.73</span>MB     <span class="hljs-number">128.00</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">4</span>     <span class="hljs-number">652.73</span>MB     <span class="hljs-number">128.00</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">5</span>     <span class="hljs-number">780.73</span>MB     <span class="hljs-number">128.00</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">6</span>     <span class="hljs-number">908.73</span>MB     <span class="hljs-number">128.00</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">7</span>    <span class="hljs-number">1036.73</span>MB     <span class="hljs-number">128.00</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">8</span>    <span class="hljs-number">1164.73</span>MB     <span class="hljs-number">128.00</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">9</span>    <span class="hljs-number">1292.73</span>MB     <span class="hljs-number">128.00</span>MB       <span class="hljs-number">0.00</span>MB
  <span class="hljs-number">10</span>    <span class="hljs-number">1420.73</span>MB     <span class="hljs-number">128.00</span>MB       <span class="hljs-number">0.00</span>MB
  <span class="hljs-number">11</span>    <span class="hljs-number">1548.73</span>MB     <span class="hljs-number">128.00</span>MB       <span class="hljs-number">0.00</span>MB
  <span class="hljs-number">12</span>    <span class="hljs-number">1676.73</span>MB     <span class="hljs-number">128.00</span>MB       <span class="hljs-number">0.00</span>MB
  <span class="hljs-number">13</span>    <span class="hljs-number">1804.73</span>MB     <span class="hljs-number">128.00</span>MB       <span class="hljs-number">0.00</span>MB
  <span class="hljs-number">14</span>    <span class="hljs-number">1932.73</span>MB     <span class="hljs-number">128.00</span>MB       <span class="hljs-number">0.00</span>MB
  <span class="hljs-number">15</span>    <span class="hljs-number">2060.73</span>MB     <span class="hljs-number">128.00</span>MB       <span class="hljs-number">0.00</span>MB
  <span class="hljs-number">16</span>    <span class="hljs-number">2188.69</span>MB     <span class="hljs-number">127.95</span>MB       <span class="hljs-number">0.00</span>MB</pre>
<p>Surprise! It appears that the program managed to allocate &gt;2GB of memory, even though we double-checked that it should have been killed as soon as it reached 1GB RSS, since we limited the shell to allow only &lt;1GB of memory allocation!</p>
<p>We will understand better shortly what&#8217;s going on, but it&#8217;s clear that cgroups, which controls the memory usage, knows that while the MMAP&#8217;ed memory is added to the RSS counter of the program, the program itself isn&#8217;t using most of this memory!</p>
<p>Interim observation: we can&#8217;t rely on RSS memory stats to diagnose memory leaks when MMAP is used.</p>
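<p>One way to look behind the single RSS number is sketched below. It assumes Linux 4.5 or newer, where <code>/proc/PID/status</code> splits resident memory into <code>RssAnon</code> (the program&#8217;s own allocations), <code>RssFile</code> (file-backed pages, which is where MMAP&#8217;ed data is counted) and <code>RssShmem</code>. The function names are made up for this illustration and are not part of the debug program:</p>
<pre>import os


def parse_rss_breakdown(status_text):
    """Return {field: MB} for the RSS-related fields of a /proc/PID/status dump."""
    wanted = ("VmRSS", "RssAnon", "RssFile", "RssShmem")
    out = {}
    for line in status_text.splitlines():
        name, _, rest = line.partition(":")
        if name in wanted:
            out[name] = int(rest.split()[0]) / 1024  # the kernel reports kB
    return out


def rss_breakdown(pid="self"):
    """Read and parse the status file of the given process (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_rss_breakdown(fh.read())


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    for name, mb in rss_breakdown().items():
        print(f"{name:9} {mb:10.2f}MB")</pre>
<p>If the explanation above holds, calling <code>rss_breakdown()</code> at the end of the <code>--mmap</code> run should show most of the RSS under <code>RssFile</code>, while <code>RssAnon</code> stays small.</p>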
<h2 id="let-s-create-memory-pressure">Let&#8217;s create memory pressure</h2>
<p>This is where our <code>--accumulate</code> flag comes in. It&#8217;s going to help us to see that RSS is &#8220;misreporting&#8221; the actual memory used by the program.</p>
<p>First we run it with normal IO:</p>
<pre>shell A $ python mmap-no-leak-debug.py --accumulate
 idx        RSS          Δ RSS   Δ accumulated
   <span class="hljs-number">0</span>      <span class="hljs-number">12.30</span>MB      <span class="hljs-number">12.30</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">1</span>     <span class="hljs-number">269.60</span>MB     <span class="hljs-number">257.29</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">2</span>     <span class="hljs-number">525.49</span>MB     <span class="hljs-number">255.89</span>MB     <span class="hljs-number">127.93</span>MB
   <span class="hljs-number">3</span>     <span class="hljs-number">653.49</span>MB     <span class="hljs-number">128.00</span>MB     <span class="hljs-number">127.87</span>MB
   <span class="hljs-number">4</span>     <span class="hljs-number">781.50</span>MB     <span class="hljs-number">128.01</span>MB     <span class="hljs-number">127.87</span>MB
   <span class="hljs-number">5</span>     <span class="hljs-number">909.50</span>MB     <span class="hljs-number">128.00</span>MB     <span class="hljs-number">127.87</span>MB
   <span class="hljs-number">6</span>    <span class="hljs-number">1037.51</span>MB     <span class="hljs-number">128.01</span>MB     <span class="hljs-number">127.87</span>MB
   <span class="hljs-number">7</span>    <span class="hljs-number">1165.51</span>MB     <span class="hljs-number">128.00</span>MB     <span class="hljs-number">127.87</span>MB
   <span class="hljs-number">8</span>    <span class="hljs-number">1293.52</span>MB     <span class="hljs-number">128.01</span>MB     <span class="hljs-number">127.87</span>MB
   <span class="hljs-number">9</span>    <span class="hljs-number">1421.52</span>MB     <span class="hljs-number">128.00</span>MB     <span class="hljs-number">127.87</span>MB
  <span class="hljs-number">10</span>    <span class="hljs-number">1549.53</span>MB     <span class="hljs-number">128.01</span>MB     <span class="hljs-number">127.87</span>MB
  <span class="hljs-number">11</span>    <span class="hljs-number">1677.53</span>MB     <span class="hljs-number">128.00</span>MB     <span class="hljs-number">127.87</span>MB
  <span class="hljs-number">12</span>    <span class="hljs-number">1805.53</span>MB     <span class="hljs-number">128.00</span>MB     <span class="hljs-number">127.87</span>MB
  <span class="hljs-number">13</span>    <span class="hljs-number">1933.53</span>MB     <span class="hljs-number">128.00</span>MB     <span class="hljs-number">127.87</span>MB
  <span class="hljs-number">14</span>    <span class="hljs-number">2061.53</span>MB     <span class="hljs-number">128.00</span>MB     <span class="hljs-number">127.87</span>MB
  <span class="hljs-number">15</span>    <span class="hljs-number">2189.53</span>MB     <span class="hljs-number">128.00</span>MB     <span class="hljs-number">127.87</span>MB
  <span class="hljs-number">16</span>    <span class="hljs-number">2193.78</span>MB       <span class="hljs-number">4.25</span>MB     <span class="hljs-number">127.87</span>MB</pre>
<p>Here RSS correctly reports about <code>128*16 = 2048</code>MB, plus some more for the other bits of the program, so the ballpark matches.</p>
<p>Now let&#8217;s activate MMAP and re-run:</p>
<pre>shell A $ python mmap-no-leak-debug.py --mmap --accumulate
 idx        RSS          Δ RSS   Δ accumulated
   <span class="hljs-number">0</span>      <span class="hljs-number">12.37</span>MB      <span class="hljs-number">12.37</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">1</span>     <span class="hljs-number">396.39</span>MB     <span class="hljs-number">384.02</span>MB     <span class="hljs-number">128.13</span>MB
   <span class="hljs-number">2</span>     <span class="hljs-number">652.48</span>MB     <span class="hljs-number">256.09</span>MB     <span class="hljs-number">128.00</span>MB
   <span class="hljs-number">3</span>     <span class="hljs-number">908.48</span>MB     <span class="hljs-number">256.00</span>MB     <span class="hljs-number">128.00</span>MB
   <span class="hljs-number">4</span>    <span class="hljs-number">1164.48</span>MB     <span class="hljs-number">256.00</span>MB     <span class="hljs-number">128.00</span>MB
   <span class="hljs-number">5</span>    <span class="hljs-number">1420.48</span>MB     <span class="hljs-number">256.00</span>MB     <span class="hljs-number">128.00</span>MB
   <span class="hljs-number">6</span>    <span class="hljs-number">1676.48</span>MB     <span class="hljs-number">256.00</span>MB     <span class="hljs-number">128.00</span>MB
   <span class="hljs-number">7</span>    <span class="hljs-number">1932.48</span>MB     <span class="hljs-number">256.00</span>MB     <span class="hljs-number">128.00</span>MB
   <span class="hljs-number">8</span>    <span class="hljs-number">2188.48</span>MB     <span class="hljs-number">256.00</span>MB     <span class="hljs-number">128.00</span>MB
   <span class="hljs-number">9</span>    <span class="hljs-number">2444.48</span>MB     <span class="hljs-number">256.00</span>MB     <span class="hljs-number">128.00</span>MB
  <span class="hljs-number">10</span>    <span class="hljs-number">2700.48</span>MB     <span class="hljs-number">256.00</span>MB     <span class="hljs-number">128.00</span>MB
  <span class="hljs-number">11</span>    <span class="hljs-number">2956.48</span>MB     <span class="hljs-number">256.00</span>MB     <span class="hljs-number">128.00</span>MB
  <span class="hljs-number">12</span>    <span class="hljs-number">3212.48</span>MB     <span class="hljs-number">256.00</span>MB     <span class="hljs-number">128.00</span>MB
  <span class="hljs-number">13</span>    <span class="hljs-number">3468.48</span>MB     <span class="hljs-number">256.00</span>MB     <span class="hljs-number">128.00</span>MB
  <span class="hljs-number">14</span>    <span class="hljs-number">3724.48</span>MB     <span class="hljs-number">256.00</span>MB     <span class="hljs-number">128.00</span>MB
  <span class="hljs-number">15</span>    <span class="hljs-number">3980.48</span>MB     <span class="hljs-number">256.00</span>MB     <span class="hljs-number">128.00</span>MB
  <span class="hljs-number">16</span>    <span class="hljs-number">4236.46</span>MB     <span class="hljs-number">255.98</span>MB     <span class="hljs-number">128.00</span>MB</pre>
<p>Here we can see that RSS reports twice as much memory as the program actually uses.</p>
<p>And now let&#8217;s create pressure using our 1GB-limited shell B and use normal IO with accumulation:</p>
<pre>shell B $ systemd-run --user --scope -p MemoryHigh=<span class="hljs-number">1</span>G -p MemoryMax=<span class="hljs-number">1</span>G -p MemorySwapMax=<span class="hljs-number">0</span>G --setenv=<span class="hljs-string">"MEMLIMIT=1GB"</span> bash
shell B $ python mmap-no-leak-debug.py --accumulate
 idx        RSS          Δ RSS   Δ accumulated
   <span class="hljs-number">0</span>      <span class="hljs-number">12.38</span>MB      <span class="hljs-number">12.38</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">1</span>     <span class="hljs-number">269.41</span>MB     <span class="hljs-number">257.04</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">2</span>     <span class="hljs-number">525.55</span>MB     <span class="hljs-number">256.14</span>MB     <span class="hljs-number">127.93</span>MB
   <span class="hljs-number">3</span>     <span class="hljs-number">653.55</span>MB     <span class="hljs-number">128.00</span>MB     <span class="hljs-number">127.87</span>MB
   <span class="hljs-number">4</span>     <span class="hljs-number">781.56</span>MB     <span class="hljs-number">128.01</span>MB     <span class="hljs-number">127.87</span>MB
   <span class="hljs-number">5</span>     <span class="hljs-number">909.56</span>MB     <span class="hljs-number">128.00</span>MB     <span class="hljs-number">127.87</span>MB
Killed</pre>
<p>As you can easily see, the program gets killed once it reaches 1GB of RSS. It managed to complete 5 iterations, ending at about 909MB. On iteration 6 it needs another 128MB for the accumulated string, which in the shell A run brought RSS to about 1037MB, so it crosses 1GB and gets killed before finishing iteration 6.</p>
<p>It might also be useful to compare this with the same run in shell A. Up to the point where the program gets killed, the RSS reported in shell B is nearly identical to that of shell A.</p>
<p>Now let&#8217;s run the MMAPed version:</p>
<pre>shell B $ systemd-run --user --scope -p MemoryHigh=<span class="hljs-number">1</span>G -p MemoryMax=<span class="hljs-number">1</span>G -p MemorySwapMax=<span class="hljs-number">0</span>G --setenv=<span class="hljs-string">"MEMLIMIT=1GB"</span> bash
shell B $ python mmap-no-leak-debug.py --mmap --accumulate
 idx        RSS          Δ RSS   Δ accumulated
   <span class="hljs-number">0</span>      <span class="hljs-number">12.51</span>MB      <span class="hljs-number">12.51</span>MB       <span class="hljs-number">0.00</span>MB
   <span class="hljs-number">1</span>     <span class="hljs-number">396.52</span>MB     <span class="hljs-number">384.00</span>MB     <span class="hljs-number">128.13</span>MB
   <span class="hljs-number">2</span>     <span class="hljs-number">652.60</span>MB     <span class="hljs-number">256.08</span>MB     <span class="hljs-number">128.00</span>MB
   <span class="hljs-number">3</span>     <span class="hljs-number">908.60</span>MB     <span class="hljs-number">256.00</span>MB     <span class="hljs-number">128.00</span>MB
   <span class="hljs-number">4</span>    <span class="hljs-number">1164.60</span>MB     <span class="hljs-number">256.00</span>MB     <span class="hljs-number">128.00</span>MB
   <span class="hljs-number">5</span>    <span class="hljs-number">1420.60</span>MB     <span class="hljs-number">256.00</span>MB     <span class="hljs-number">128.00</span>MB
Killed
</pre>
<p>You can see that while the RSS numbers are bigger than those of the normal IO run, the program gets killed in exactly the same iteration as when it was run without MMAP, which tells us that the actual memory usage with normal IO and mmap&#8217;ed IO is very similar, and quite possibly identical.</p>
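<p>One way to see where the extra RSS comes from is to split it by page type. The sketch below is not part of the original scripts: it assumes a Linux kernel recent enough to expose the <code>RssAnon</code> and <code>RssFile</code> counters in <code>/proc/&lt;pid&gt;/status</code>, where resident file-backed pages (which is where pages of an mmap&#8217;ed file show up) are counted separately from anonymous memory such as the heap. The function name is hypothetical.</p>

```python
from pathlib import Path


def rss_breakdown_kb(status_text: str) -> dict:
    """Extract the resident-memory counters (in kB) from /proc/<pid>/status text."""
    wanted = ("VmRSS", "RssAnon", "RssFile", "RssShmem")
    result = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in wanted:
            # lines look like "RssFile:     123456 kB"
            result[key] = int(value.split()[0])
    return result


if __name__ == "__main__":
    status = Path("/proc/self/status")
    if status.exists():  # Linux only; silently does nothing elsewhere
        for name, kb in rss_breakdown_kb(status.read_text()).items():
            print(f"{name:>8} {kb / 1024:10.2f}MB")
```

<p>If most of the growth lands in <code>RssFile</code> rather than <code>RssAnon</code>, that would be consistent with the pattern shown above.</p>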
<h2 id="enter-huggingface-datasets">What about PyArrow?</h2>
<p>Originally this whole research started from this <a href="https://github.com/huggingface/datasets/issues/4883">Issue</a> in the <a href="https://github.com/huggingface/datasets"><code>datasets</code></a> repo. It looked like a dataset loaded via <code>pyarrow</code> leaked on every iteration.</p>
<p><a href="https://github.com/lhoestq">Quentin Lhoest</a> reduced it to <a href="https://github.com/huggingface/datasets/issues/4883#issuecomment-1242034985">a simple <code>pyarrow</code> program</a>:</p>
<pre>$ cat mmap-no-leak-<span class="hljs-built_in">debug</span>-pyarrow.py
<span class="hljs-keyword">import</span> psutil
<span class="hljs-keyword">import</span> <span class="hljs-built_in">os</span>
<span class="hljs-keyword">import</span> gc
<span class="hljs-keyword">import</span> pyarrow as pa

ARROW_PATH = <span class="hljs-string">"tmp.arrow"</span>

<span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> <span class="hljs-built_in">os</span>.path.exists(ARROW_PATH):
    arr = pa.array([b<span class="hljs-string">"a"</span> * (<span class="hljs-number">200</span> * <span class="hljs-number">1024</span>)] * <span class="hljs-number">1000</span>)  # ~<span class="hljs-number">200</span>MB
    <span class="hljs-built_in">table</span> = pa.<span class="hljs-built_in">table</span>({<span class="hljs-string">"a"</span>: arr})

    with open(ARROW_PATH, <span class="hljs-string">"wb"</span>) as <span class="hljs-name">f</span>:
        writer = pa.RecordBatchStreamWriter(f, schema=<span class="hljs-built_in">table</span>.schema)
        writer.write_table(<span class="hljs-built_in">table</span>)
        writer.close()

def memory_mapped_arrow_table_from_file(<span class="hljs-name">filename</span>: str) -&gt; pa.<span class="hljs-name">Table</span>:
    memory_mapped_stream = pa.memory_map(filename)
    opened_stream = pa.ipc.open_stream(memory_mapped_stream)
    pa_table = opened_stream.read_all()
    <span class="hljs-keyword">return</span> pa_table


<span class="hljs-built_in">table</span> = memory_mapped_arrow_table_from_file(ARROW_PATH)
arr = <span class="hljs-built_in">table</span>[<span class="hljs-number">0</span>]

<span class="hljs-built_in">print</span>(f<span class="hljs-string">"{'idx':&gt;8} {'RSS':&gt;10} {'Δ RSS':&gt;15}"</span>)

mem_before = psutil.Process(<span class="hljs-built_in">os</span>.getpid()).memory_info().rss / (<span class="hljs-number">1024</span> * <span class="hljs-number">1024</span>)
<span class="hljs-keyword">for</span> idx, x <span class="hljs-keyword">in</span> enumerate(arr):
    <span class="hljs-keyword">if</span> idx % <span class="hljs-number">100</span> == <span class="hljs-number">0</span>:
        gc.collect()
        mem_after = psutil.Process(<span class="hljs-built_in">os</span>.getpid()).memory_info().rss / (<span class="hljs-number">1024</span> * <span class="hljs-number">1024</span>)
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"{idx:4d}  {mem_after:12.4f}MB {mem_after - mem_before:12.4f}MB"</span>)
</pre>
<p>which when run produced the familiar leak-like pattern:</p>
<pre>$ python mmap-no-leak-debug-pyarrow.py
     idx        RSS           Δ RSS
   <span class="hljs-number">0</span>       <span class="hljs-number">51.3164</span>MB       <span class="hljs-number">2.5430</span>MB
 <span class="hljs-number">100</span>       <span class="hljs-number">69.9805</span>MB      <span class="hljs-number">21.2070</span>MB
 <span class="hljs-number">200</span>       <span class="hljs-number">90.6055</span>MB      <span class="hljs-number">41.8320</span>MB
 <span class="hljs-number">300</span>      <span class="hljs-number">107.1055</span>MB      <span class="hljs-number">58.3320</span>MB
 <span class="hljs-number">400</span>      <span class="hljs-number">127.7305</span>MB      <span class="hljs-number">78.9570</span>MB
 <span class="hljs-number">500</span>      <span class="hljs-number">148.3555</span>MB      <span class="hljs-number">99.5820</span>MB
 <span class="hljs-number">600</span>      <span class="hljs-number">164.8555</span>MB     <span class="hljs-number">116.0820</span>MB
 <span class="hljs-number">700</span>      <span class="hljs-number">185.4805</span>MB     <span class="hljs-number">136.7070</span>MB
 <span class="hljs-number">800</span>      <span class="hljs-number">206.1055</span>MB     <span class="hljs-number">157.3320</span>MB
 <span class="hljs-number">900</span>      <span class="hljs-number">226.7305</span>MB     <span class="hljs-number">177.9570</span>MB
</pre>
<p>But if we run it from a shell that is only allowed 100MB of allocated memory:</p>
<pre>$ systemd-run --user --scope -p MemoryHigh=<span class="hljs-number">0.1</span>G -p MemoryMax=<span class="hljs-number">0.1</span>G -p MemorySwapMax=<span class="hljs-number">0</span>G --setenv=<span class="hljs-string">"MEMLIMIT=0.1GB"</span> bash
$ python mmap-no-leak-debug-pyarrow.py
     idx        RSS           Δ RSS
   <span class="hljs-number">0</span>       <span class="hljs-number">51.2852</span>MB       <span class="hljs-number">2.4609</span>MB
 <span class="hljs-number">100</span>       <span class="hljs-number">70.4102</span>MB      <span class="hljs-number">21.5859</span>MB
 <span class="hljs-number">200</span>       <span class="hljs-number">86.9102</span>MB      <span class="hljs-number">38.0859</span>MB
 <span class="hljs-number">300</span>      <span class="hljs-number">107.5352</span>MB      <span class="hljs-number">58.7109</span>MB
 <span class="hljs-number">400</span>      <span class="hljs-number">128.1602</span>MB      <span class="hljs-number">79.3359</span>MB
 <span class="hljs-number">500</span>      <span class="hljs-number">148.7852</span>MB      <span class="hljs-number">99.9609</span>MB
 <span class="hljs-number">600</span>      <span class="hljs-number">165.2852</span>MB     <span class="hljs-number">116.4609</span>MB
 <span class="hljs-number">700</span>      <span class="hljs-number">185.9102</span>MB     <span class="hljs-number">137.0859</span>MB
 <span class="hljs-number">800</span>      <span class="hljs-number">206.5352</span>MB     <span class="hljs-number">157.7109</span>MB
 <span class="hljs-number">900</span>      <span class="hljs-number">227.1602</span>MB     <span class="hljs-number">178.3359</span>MB</pre>
<p>So it reports ~200MB of RSS in a shell that is limited to 100MB, yet it runs just fine without getting killed.</p>
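<p>To look at what the kernel is actually charging against the limit, you can read the cgroup&#8217;s own counters from inside the limited shell. This is a sketch under two assumptions that are not in the original setup: that cgroup v2 is in use and that it is mounted at <code>/sys/fs/cgroup</code>. The helper names are hypothetical; <code>memory.current</code> and <code>memory.max</code> are the cgroup v2 files for current usage and the hard limit.</p>

```python
from pathlib import Path

CGROUP_ROOT = Path("/sys/fs/cgroup")  # assumption: cgroup v2 mounted at the usual place


def cgroup_v2_dir(proc_cgroup_text: str, root: Path = CGROUP_ROOT):
    """Return the cgroup v2 directory of a process, or None if not found.

    In /proc/<pid>/cgroup the v2 entry has the form "0::/some/path".
    """
    for line in proc_cgroup_text.splitlines():
        hierarchy_id, _, rest = line.partition(":")
        controllers, _, path = rest.partition(":")
        if hierarchy_id == "0" and controllers == "":
            return root / path.lstrip("/")
    return None


def read_counter(path: Path):
    """Read a cgroup counter file; returns an int, or the string "max"."""
    text = path.read_text().strip()
    return text if text == "max" else int(text)


if __name__ == "__main__":
    proc_file = Path("/proc/self/cgroup")
    cg_dir = cgroup_v2_dir(proc_file.read_text()) if proc_file.exists() else None
    if cg_dir is not None and (cg_dir / "memory.current").exists():
        print("memory.current:", read_counter(cg_dir / "memory.current"))
        print("memory.max:    ", read_counter(cg_dir / "memory.max"))
```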
<p>There is no leak here.</p>
<h2 id="what-about-datasets-">What about HuggingFace datasets?</h2>
<p>In another <a href="https://github.com/huggingface/datasets/issues/4528">Issue</a> a very similar datasets-iterator-is-leaking report was submitted.</p>
<p>So let&#8217;s use a similar <code>datasets</code> reproduction example here but we will use a larger dataset.</p>
<pre>$ cat mmap-no-leak-debug-datasets.py
<span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> load_dataset
<span class="hljs-keyword">import</span> gc
<span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> psutil
<span class="hljs-keyword">import</span> sys

keep_in_memory = <span class="hljs-keyword">True</span> <span class="hljs-keyword">if</span> <span class="hljs-string">"--in-mem"</span> <span class="hljs-keyword">in</span> sys.argv <span class="hljs-keyword">else</span> <span class="hljs-keyword">False</span>

proc = psutil.Process(os.getpid())
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">mem_read</span><span class="hljs-params">()</span>:</span>
    gc.collect()
    <span class="hljs-keyword">return</span> proc.memory_info().rss / <span class="hljs-number">2</span>**<span class="hljs-number">20</span>

dataset = load_dataset(<span class="hljs-string">"wmt19"</span>, <span class="hljs-string">'cs-en'</span>, keep_in_memory=keep_in_memory, streaming=<span class="hljs-keyword">False</span>)[<span class="hljs-string">'train'</span>]
print(f<span class="hljs-string">"Dataset len={len(dataset)}"</span>)

print(f<span class="hljs-string">"{'idx':&gt;8} {'RSS':&gt;10} {'Δ RSS':&gt;15}"</span>)
step = <span class="hljs-number">1</span>_000_000
mem_start = <span class="hljs-number">0</span>
<span class="hljs-keyword">for</span> idx, i <span class="hljs-keyword">in</span> enumerate(range(<span class="hljs-number">0</span>, len(dataset), step)):
    <span class="hljs-keyword">if</span> idx == <span class="hljs-number">4</span>: <span class="hljs-comment"># skip the first few iterations while things get set up</span>
        mem_start = mem_read()
    mem_before = mem_read()
    x = dataset[i:i+step]
    mem_after = mem_read()
    print(f<span class="hljs-string">"{idx:8d} {mem_after:12.4f}MB {mem_after - mem_before:12.4f}MB "</span>)
mem_end = mem_read()

print(f<span class="hljs-string">"Total diff: {mem_end - mem_start:12.4f}MB "</span>)</pre>
<p>Let&#8217;s run it in a normal shell first:</p>
<pre>$ python mmap-no-leak-debug-datasets.py
Dataset len=7270695
     idx        RSS           Δ RSS
      <span class="hljs-number"> 0 </span>    775.7773MB     609.9805MB
      <span class="hljs-number"> 1 </span>    849.6016MB      73.8242MB
      <span class="hljs-number"> 2 </span>    876.1445MB      26.5430MB
      <span class="hljs-number"> 3 </span>    941.3477MB      65.2031MB
      <span class="hljs-number"> 4 </span>    984.9570MB      43.6094MB
      <span class="hljs-number"> 5 </span>   1053.6445MB      68.6875MB
      <span class="hljs-number"> 6 </span>   1164.2852MB     110.6406MB
      <span class="hljs-number"> 7 </span>   1252.5312MB      88.2461MB
      <span class="hljs-number"> 8 </span>   1368.6523MB     116.1211MB
      <span class="hljs-number"> 9 </span>   1445.7266MB      77.0742MB
     <span class="hljs-number"> 10 </span>   1564.5195MB     118.7930MB
     <span class="hljs-number"> 11 </span>   1678.7500MB     114.2305MB
     <span class="hljs-number"> 12 </span>   1729.9844MB      51.2344MB
     <span class="hljs-number"> 13 </span>   1866.1953MB     136.2109MB
Total diff:    1700.3984MB</pre>
<p>You can see that the middle column, the total RSS in MBs, keeps growing. The last column shows by how much it has grown during a single iteration of the script (one <code>step</code> of 1M items).</p>
<p>And now let&#8217;s run in a 1GB limited shell:</p>
<pre>$ systemd-run --user --scope -p MemoryHigh=1G -p MemoryMax=1G -p MemorySwapMax=0G --setenv="MEMLIMIT=1GB" bash
$ python mmap-no-leak-debug-datasets.py
Dataset len=7270695
     idx        RSS           Δ RSS
      <span class="hljs-number"> 0 </span>    775.8516MB     610.1797MB
      <span class="hljs-number"> 1 </span>    849.5820MB      73.7305MB
      <span class="hljs-number"> 2 </span>    876.1328MB      26.5508MB
      <span class="hljs-number"> 3 </span>    941.3281MB      65.1953MB
      <span class="hljs-number"> 4 </span>    984.9375MB      43.6094MB
      <span class="hljs-number"> 5 </span>   1053.6328MB      68.6953MB
      <span class="hljs-number"> 6 </span>   1164.0273MB     110.3945MB
      <span class="hljs-number"> 7 </span>   1252.5273MB      88.5000MB
      <span class="hljs-number"> 8 </span>   1368.3906MB     115.8633MB
      <span class="hljs-number"> 9 </span>   1445.7188MB      77.3281MB
     <span class="hljs-number"> 10 </span>   1564.2656MB     118.5469MB
     <span class="hljs-number"> 11 </span>   1678.7383MB     114.4727MB
     <span class="hljs-number"> 12 </span>   1729.7227MB      50.9844MB
     <span class="hljs-number"> 13 </span>   1866.1875MB     136.4648MB
Total diff:    1700.5156MB</pre>
<p>No problem at all.</p>
<p>So we now know there is no leak there; the OS simply includes in RSS memory that will be released as soon as it&#8217;s needed.</p>
<h2 id="debbuging-real-leak-while-using-mmap">How to debug real memory leaks while using MMAP</h2>
<p>So how does one debug an actual memory leak that might be elsewhere in the code while using MMAP?</p>
<p>Well, you have to disable MMAP for the duration of your debug session and then re-enable it when you want high performance.</p>
<p>As you have seen at the beginning of this article switching from <code>mmap</code> to normal IO is very simple to do.</p>
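<p>As a reminder of how small that switch can be, here is a minimal sketch using only the standard library. It is not the script from the beginning of the article; <code>read_all</code> and its <code>use_mmap</code> flag are hypothetical names chosen for illustration.</p>

```python
import mmap
import os
import tempfile


def read_all(path: str, use_mmap: bool) -> bytes:
    """Return the file contents, via mmap or via normal buffered IO."""
    with open(path, "rb") as f:
        if use_mmap:
            # length 0 maps the whole file; the file must not be empty
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                return mm[:]
        return f.read()


# tiny self-check on a throwaway file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 4096)
try:
    same = read_all(tmp.name, use_mmap=True) == read_all(tmp.name, use_mmap=False)
    print(f"same bytes either way: {same}")
finally:
    os.remove(tmp.name)
```

<p>Keeping the choice behind a single flag means the rest of the code does not change between the debugging run and the high-performance run.</p>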
<p>In the case of <code>datasets</code> you&#8217;d turn MMAP functionality off with <code>keep_in_memory=True</code> as in:</p>
<pre><code><span class="hljs-attr">dataset</span> = load_dataset(<span class="hljs-string">"wmt19"</span>, <span class="hljs-string">'cs-en'</span>, keep_in_memory=<span class="hljs-literal">True</span>, streaming=<span class="hljs-literal">False</span>)[<span class="hljs-string">'train'</span>]
</code></pre>
<p>This loads the dataset in RAM, and now you should be able to debug your potential leak.</p>
<p>Let&#8217;s test after modifying our last program:</p>
<pre>- dataset = load_dataset(<span class="hljs-string">"wmt19"</span>, <span class="hljs-symbol">'cs</span>-en', keep_in_memory=<span class="hljs-literal">False</span>, streaming=<span class="hljs-literal">False</span>)[<span class="hljs-symbol">'train</span>']
+ dataset = load_dataset(<span class="hljs-string">"wmt19"</span>, <span class="hljs-symbol">'cs</span>-en', keep_in_memory=<span class="hljs-literal">True</span>, streaming=<span class="hljs-literal">False</span>)[<span class="hljs-symbol">'train</span>']</pre>
<p>Now in the normal unlimited shell we run:</p>
<pre>$ python mmap-no-leak-debug-datasets.py --in-mem
Dataset len=7270695
     idx        RSS           Δ RSS
      <span class="hljs-number"> 0 </span>   1849.5391MB     469.5781MB
      <span class="hljs-number"> 1 </span>   1833.0391MB     -16.5000MB
      <span class="hljs-number"> 2 </span>   1803.4609MB     -29.5781MB
      <span class="hljs-number"> 3 </span>   1811.5312MB       8.0703MB
      <span class="hljs-number"> 4 </span>   1803.9531MB      -7.5781MB
      <span class="hljs-number"> 5 </span>   1811.7734MB       7.8203MB
      <span class="hljs-number"> 6 </span>   1836.0391MB      24.2656MB
      <span class="hljs-number"> 7 </span>   1839.5938MB       3.5547MB
      <span class="hljs-number"> 8 </span>   1855.9688MB      16.3750MB
      <span class="hljs-number"> 9 </span>   1850.5430MB      -5.4258MB
     <span class="hljs-number"> 10 </span>   1865.3398MB      14.7969MB
     <span class="hljs-number"> 11 </span>   1876.2461MB      10.9062MB
     <span class="hljs-number"> 12 </span>   1853.0469MB     -23.1992MB
     <span class="hljs-number"> 13 </span>   1881.4453MB      28.3984MB
Total diff:     501.4844MB</pre>
<p>The RSS memory is more stable now, but it still fluctuates because the records differ in size, and a dataset can be too large to load into memory.</p>
<h2 id="using-synthetic-mmap-disabled-dataset-to-debug-memory-leaks">Using synthetic MMAP-disabled dataset to debug memory leaks</h2>
<p>Therefore the easiest approach is to create a synthetic dataset of desired length with all records being the same. That way the data is no longer a factor in the memory usage patterns as it&#8217;s always the same.</p>
<pre>$ <span class="hljs-keyword">cat</span> <span class="hljs-keyword">ds</span>-synthetic-<span class="hljs-keyword">no</span>-mmap.<span class="hljs-keyword">py</span>
from datasets import load_from_disk, Dataset
import gc
import sys
import os
import psutil

proc = psutil.Process(os.<span class="hljs-built_in">getpid</span>())
def mem_read():
    gc.collect()
    <span class="hljs-keyword">return</span> proc.memory_info().rss / <span class="hljs-number">2</span>**<span class="hljs-number">20</span>

DS_PATH = <span class="hljs-string">"synthetic-ds"</span>
<span class="hljs-keyword">if</span> not os.path.<span class="hljs-built_in">exists</span>(DS_PATH):
    records = <span class="hljs-number">1</span>_000_000
    <span class="hljs-keyword">print</span>(<span class="hljs-string">"Creating a synthetic dataset"</span>)
    row = dict(foo=[dict(<span class="hljs-keyword">a</span>=<span class="hljs-string">'a'</span>*<span class="hljs-number">500</span>, <span class="hljs-keyword">b</span>=<span class="hljs-string">'b'</span>*<span class="hljs-number">1000</span>)])
    <span class="hljs-keyword">ds</span> = Dataset.from_dict({<span class="hljs-keyword">k</span>: [v] * records <span class="hljs-keyword">for</span> <span class="hljs-keyword">k</span>, v in row.<span class="hljs-built_in">items</span>()})
    <span class="hljs-keyword">ds</span>.save_to_disk(DS_PATH)
    <span class="hljs-keyword">print</span>(<span class="hljs-string">"Done. Please restart the program"</span>)
    sys.<span class="hljs-keyword">exit</span>()

dataset = load_from_disk(DS_PATH, keep_in_memory=True)
<span class="hljs-keyword">print</span>(<span class="hljs-keyword">f</span><span class="hljs-string">"Dataset len={len(dataset)}"</span>)

<span class="hljs-keyword">print</span>(<span class="hljs-keyword">f</span><span class="hljs-string">"{'idx':&gt;8} {'RSS':&gt;10} {'Δ RSS':&gt;15}"</span>)
mem_start = <span class="hljs-number">0</span>
step = <span class="hljs-number">50</span>_000
warmup_iterations = <span class="hljs-number">4</span>
<span class="hljs-keyword">for</span> idx, i in enumerate(<span class="hljs-built_in">range</span>(<span class="hljs-number">0</span>, <span class="hljs-built_in">len</span>(dataset), step)):
    <span class="hljs-keyword">if</span> idx == warmup_iteration<span class="hljs-variable">s:</span> # skip the <span class="hljs-keyword">first</span> few iterations <span class="hljs-keyword">while</span> things <span class="hljs-built_in">get</span> <span class="hljs-keyword">set</span> <span class="hljs-keyword">up</span>
        mem_start = mem_read()
    mem_before = mem_read()
    _ = dataset[i:i+step]
    mem_after = mem_read()
    <span class="hljs-keyword">print</span>(<span class="hljs-keyword">f</span><span class="hljs-string">"{i:8d} {mem_after:12.4f}MB {mem_after - mem_before:12.4f}MB"</span>)
mem_end = mem_read()

<span class="hljs-keyword">print</span>(<span class="hljs-keyword">f</span><span class="hljs-string">"Total diff: {mem_end - mem_start:12.4f}MB (after {warmup_iterations} warmup iterations)"</span>)
</pre>
<p>We run this program once to create the dataset, and then the second time to profile its memory usage:</p>
<pre>$ python ds-synthetic-no-mmap<span class="hljs-selector-class">.py</span>
Creating <span class="hljs-selector-tag">a</span> synthetic dataset
Done. Please restart the program</pre>
<pre>$ python ds-synthetic-no-mmap.py
Dataset len=1000000
     idx        RSS           Δ RSS
      <span class="hljs-number"> 0 </span>   1649.6055MB      95.1992MB
  <span class="hljs-number"> 50000 </span>   1728.4961MB      78.8906MB
 <span class="hljs-number"> 100000 </span>   1728.7109MB       0.2148MB
 <span class="hljs-number"> 150000 </span>   1729.2539MB       0.5430MB
 <span class="hljs-number"> 200000 </span>   1729.0039MB      -0.2500MB
 <span class="hljs-number"> 250000 </span>   1729.5039MB       0.5000MB
 <span class="hljs-number"> 300000 </span>   1729.2539MB      -0.2500MB
 <span class="hljs-number"> 350000 </span>   1729.7539MB       0.5000MB
 <span class="hljs-number"> 400000 </span>   1729.5039MB      -0.2500MB
 <span class="hljs-number"> 450000 </span>   1730.0039MB       0.5000MB
 <span class="hljs-number"> 500000 </span>   1729.7539MB      -0.2500MB
 <span class="hljs-number"> 550000 </span>   1730.2539MB       0.5000MB
 <span class="hljs-number"> 600000 </span>   1730.0039MB      -0.2500MB
 <span class="hljs-number"> 650000 </span>   1730.5039MB       0.5000MB
 <span class="hljs-number"> 700000 </span>   1730.2539MB      -0.2500MB
 <span class="hljs-number"> 750000 </span>   1730.7539MB       0.5000MB
 <span class="hljs-number"> 800000 </span>   1730.5039MB      -0.2500MB
 <span class="hljs-number"> 850000 </span>   1731.0039MB       0.5000MB
 <span class="hljs-number"> 900000 </span>   1730.7539MB      -0.2500MB
 <span class="hljs-number"> 950000 </span>   1731.2539MB       0.5000MB
Total diff:       2.0000MB (after<span class="hljs-number"> 4 </span>warmup iterations)
</pre>
<p>This is much better. There are still tiny fluctuations due to Python, and as you can see in the code I skipped the first few iterations while things are being set up.</p>
<p>But otherwise you can now easily debug the rest of your code for memory leaks, since <code>datasets</code> is in non-MMAP mode and the record size doesn&#8217;t fluctuate.</p>
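<p>The same warmup-then-measure pattern can be applied to the code under suspicion. The sketch below swaps RSS for Python&#8217;s built-in <code>tracemalloc</code>, which is a different technique from the one used in this article: it only sees allocations made through Python&#8217;s allocators, so memory allocated natively by libraries may not show up. All function names here are hypothetical.</p>

```python
import gc
import tracemalloc


def traced_growth(fn, iterations=20, warmup_iterations=4):
    """Return bytes of traced Python allocations gained after the warmup."""
    tracemalloc.start()
    try:
        start = 0
        for idx in range(iterations):
            if idx == warmup_iterations:  # skip the first few iterations
                gc.collect()
                start = tracemalloc.get_traced_memory()[0]
            fn()
        gc.collect()
        return tracemalloc.get_traced_memory()[0] - start
    finally:
        tracemalloc.stop()


leaked = []  # deliberately never cleared


def leaky():
    leaked.append(bytearray(100_000))


def clean():
    _ = bytearray(100_000)


print(f"leaky: {traced_growth(leaky):>10} bytes")
print(f"clean: {traced_growth(clean):>10} bytes")
```

<p>A function that keeps growing after the warmup, like <code>leaky</code> here, is the kind of candidate worth a closer look.</p>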
<p>Of course, do not forget to flip <code>load_from_disk(..., keep_in_memory=True)</code> to <code>False</code> when the debugging process is over so that you get back the speedup provided by MMAP.</p>
<p>I wrote these notes mainly for myself to ensure I have a good understanding of this complex use-case. And I hope you have gained some understanding from it as well.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://stasosphere.com/entrepreneur-being/301-mmap-memory-leak-investigation/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>Getting NVIDIA A100-80GB PCIe to work on a consumer motherboard with custom water cooling</title>
		<link>https://stasosphere.com/entrepreneur-being/262-getting-nvidia-a100-80gb-pcie-to-work-on-a-consumer-motherboard-with-custom-water-cooling/</link>
					<comments>https://stasosphere.com/entrepreneur-being/262-getting-nvidia-a100-80gb-pcie-to-work-on-a-consumer-motherboard-with-custom-water-cooling/#comments</comments>
		
		<dc:creator><![CDATA[stas]]></dc:creator>
		<pubDate>Thu, 14 Apr 2022 18:58:59 +0000</pubDate>
				<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[a100]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[nvidia]]></category>
		<category><![CDATA[water cooling]]></category>
		<guid isPermaLink="false">https://stasosphere.com/entrepreneur-being/?p=262</guid>

					<description><![CDATA[For the last few months I have been trying to get A100 80GB PCIe to work on my desktop computer. The first stage was to get the card recognized by BIOS and then the OS which took quite some figuring out. The second stage was to get a custom water cooling solution, since A100 runs [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p>For the last few months I have been trying to get <a href="https://www.nvidia.com/en-us/data-center/a100/">A100 80GB PCIe</a> to work on my desktop computer.<br /><br /><img fetchpriority="high" decoding="async" class="alignnone size-medium wp-image-276" src="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/nvidia-a100-pcie-300x169.jpg" alt="" width="300" height="169" srcset="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/nvidia-a100-pcie-300x169.jpg 300w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/nvidia-a100-pcie.jpg 630w" sizes="(max-width: 300px) 100vw, 300px" /></p>
<p>The first stage was to get the card recognized by BIOS and then the OS which took quite some figuring out.</p>
<p>The second stage was to get a custom water cooling solution, since A100 runs really hot and it&#8217;s very difficult to cool with just fans and typically requires a server-level cooling hardware and a server room with air conditioning.</p>
<p>But after a few months of trials and tribulations I have a working solution, which I&#8217;ll share with you in this post.<span id="more-262"></span></p>
<h2>Getting A100 recognized by BIOS</h2>
<p>A100 PCIe is a headless card designed for server use &#8211; it has no sockets to plug a monitor in.</p>
<p>I first made <a href="https://forums.developer.nvidia.com/t/a100-pcie-isnt-recognized-by-bios/198546">a post at NVIDIA forums</a> asking for help, and another user found that the only way to make A100 work is to have another NVIDIA GPU to run the monitor from, and that it&#8217;s currently impossible to use iGPU (the built-in GPU available with most Intel CPUs on consumer-level motherboards).</p>
<p>To benefit from PCIe-4 I purchased <a href="https://amzn.to/3uGQOpK">ROG Maximus XIII Hero (z590)</a> as my original MB was z390 / PCIe-3.</p>
<p>I first tried using iGPU (CPU Graphics), and A100 led to the system not POSTing (d4 &#8211; PCI resource allocation error. Out of Resources).</p>
<p>Since one of the forum users kindly shared that <a href="https://forums.developer.nvidia.com/t/a100-pcie-isnt-recognized-by-bios/198546/12">the current solution is to use a 2nd card</a>, I added an old NVIDIA PCIe card and plugged the monitor into it. Now it POSTed and booted just fine, but A100 still wasn&#8217;t visible in nvidia-smi.</p>
<p>I also tried changing the order of the cards (A100 2nd) &#8211; but there was no change in the outcome.</p>
<p>I then experimented with various BIOS configurations until I found one that worked. Here it is:</p>
<pre><code class="hljs yaml"><span class="hljs-attr">Advanced:</span>

  <span class="hljs-string">Advanced</span> <span class="hljs-string">System</span> <span class="hljs-string">Agent</span> <span class="hljs-string">(SA)</span> <span class="hljs-string">configuration</span>

    <span class="hljs-attr">Graphics Configuration:</span>
      <span class="hljs-attr">Primary Display:</span> <span class="hljs-string">Auto</span> <span class="hljs-string">(probably</span> <span class="hljs-string">could</span> <span class="hljs-string">be</span> <span class="hljs-string">set</span> <span class="hljs-string">to</span> <span class="hljs-string">PEG)</span>
      <span class="hljs-attr">IGPU Multi-Monitor:</span> <span class="hljs-string">Disabled</span>

    <span class="hljs-attr">Memory Configuration:</span>
      <span class="hljs-attr">Memory Remap:</span> <span class="hljs-string">Enabled</span> <span class="hljs-string">(above</span> <span class="hljs-string">4GB)</span>

    <span class="hljs-string">PCI</span> <span class="hljs-string">Subsystem</span> <span class="hljs-string">Settings</span>
      <span class="hljs-attr">Above 4G Decoding:</span> <span class="hljs-string">Enabled</span>
      <span class="hljs-attr">Resize Bar:</span> <span class="hljs-string">Enabled</span>
      <span class="hljs-attr">SR-IOV Support:</span> <span class="hljs-string">Enabled</span>
</code></pre>
<p>and the reason it wasn’t working originally is that by default it had <code>SR-IOV Support: Disabled</code>.</p>
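<p>After each BIOS change it helps to have a quick check of whether the card shows up. Here is a minimal sketch that asks <code>nvidia-smi</code> for the GPU names; the helper names are hypothetical, and matching on the substring <code>A100</code> is an assumption about how the driver reports the card.</p>

```python
import shutil
import subprocess


def visible_gpu_names() -> list:
    """Return GPU names reported by nvidia-smi, or [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    proc = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        return []
    return [line.strip() for line in proc.stdout.splitlines() if line.strip()]


def has_a100(names) -> bool:
    # assumption: the card's reported name contains "A100"
    return any("A100" in name for name in names)


names = visible_gpu_names()
print(names, "-> A100 visible:", has_a100(names))
```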
<p>As it&#8217;s possible that other motherboards may or may not work please read this <a href="https://forums.developer.nvidia.com/t/a100-pcie-isnt-recognized-by-bios/198546">thread</a> where Scott Ellis explains which BIOS settings the server motherboards normally need to detect A100, and which may or may not be present in the consumer motherboard. Another user shared that they got it to work on <a href="https://amzn.to/3rtAhnb">ASUS ROG STRIX Z690-G GAMING WIFI</a>, so chances are that the recent motherboards support it. If you find others that work please don&#8217;t hesitate to share the name in the comments.</p>
<h2>Water Cooling A100</h2>
<p>Fans proved to be very inadequate for A100 cooling. The huge radiator it came with did well for the first few minutes and then it&#8217;d remain really hot &#8211; definitely not something the weak PC fans could handle. Perhaps it&#8217;d work if I were to blast them at 100% speed, but it&#8217;d be very loud and wouldn&#8217;t work in summer.</p>
<p>I started researching water cooling.</p>
<p>I first ordered the water block from EKWB &#8211; they unfortunately didn&#8217;t label their product correctly &#8211; only in small print did it say that it was for A100 40GB <strong>which doesn&#8217;t fit A100 80GB PCB</strong>. So I wasted a lot of time and lost money on dealing with the wrong product. I asked them to fix the label to state that it&#8217;s a 40GB block, but they refused to do it. And as of this writing they have no plans to make an 80GB version.<br /><br />Here is the 40GB water block from EKWB:</p>
<p><a href="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/ekwb-a100-40gb-scaled.jpg"><img decoding="async" class="alignleft wp-image-265 size-large" src="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/ekwb-a100-40gb-1024x422.jpg" alt="" width="540" height="223" srcset="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/ekwb-a100-40gb-1024x422.jpg 1024w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/ekwb-a100-40gb-300x124.jpg 300w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/ekwb-a100-40gb-768x316.jpg 768w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/ekwb-a100-40gb-1536x633.jpg 1536w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/ekwb-a100-40gb-2048x844.jpg 2048w" sizes="(max-width: 540px) 100vw, 540px" /></a><br /><br /></p>
<p> </p>
<p> </p>
<p> </p>
<p> </p>
<p>But A100 80GB added a metal frame around the main chip as can be seen below. It wasn&#8217;t there in the 40GB version of the PCB.</p>
<p><a href="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-80gb-chip-scaled.jpg"><img decoding="async" class="alignnone wp-image-275 size-large" src="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-80gb-chip-1024x768.jpg" alt="" width="540" height="405" srcset="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-80gb-chip-1024x768.jpg 1024w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-80gb-chip-300x225.jpg 300w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-80gb-chip-768x576.jpg 768w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-80gb-chip-1536x1152.jpg 1536w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-80gb-chip-2048x1536.jpg 2048w" sizes="(max-width: 540px) 100vw, 540px" /></a><br /><br />Then I ordered <a href="https://www.bykski.com/page133?product_id=5393">Bykski&#8217;s A100 80GB water block</a> via aliexpress. The ordering experience wasn&#8217;t great as I paid top dollar for quick shipping (DHL) and the vendor abused that by submitting a shipment label but not shipping the product for weeks! As soon as I was able to start a dispute and request a full refund I did that, and immediately the product was shipped. So be careful spending extra money on shipment since Aliexpress allows its vendors to do what they please and doesn&#8217;t enforce anything.<br /><br />Here is the water block (left) and the PCB (right):</p>
<p><a href="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/Bykski-water-block-a100-pcb-scaled.jpg"><img loading="lazy" decoding="async" class="alignnone wp-image-266 size-large" src="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/Bykski-water-block-a100-pcb-919x1024.jpg" alt="" width="540" height="602" srcset="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/Bykski-water-block-a100-pcb-919x1024.jpg 919w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/Bykski-water-block-a100-pcb-269x300.jpg 269w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/Bykski-water-block-a100-pcb-768x856.jpg 768w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/Bykski-water-block-a100-pcb-1379x1536.jpg 1379w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/Bykski-water-block-a100-pcb-1838x2048.jpg 1838w" sizes="auto, (max-width: 540px) 100vw, 540px" /></a></p>
<p>Taking apart the A100 was mainly a matter of removing all the screws. The key to removing the panel is to remove the 2 screws around the power plug.</p>
<p>The first step is to clean the old thermal paste off the main chip using alcohol and a coffee filter, so that no residue is left.<br /><a href="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-80gb-pcb-scaled.jpg"><img loading="lazy" decoding="async" class="alignnone wp-image-273 size-medium" src="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-80gb-pcb-150x300.jpg" alt="" width="150" height="300" srcset="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-80gb-pcb-150x300.jpg 150w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-80gb-pcb-513x1024.jpg 513w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-80gb-pcb-768x1534.jpg 768w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-80gb-pcb-769x1536.jpg 769w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-80gb-pcb-1025x2048.jpg 1025w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-80gb-pcb-scaled.jpg 1281w" sizes="auto, (max-width: 150px) 100vw, 150px" /></a><br />Then thermal pads need to be applied. The instructions on Bykski&#8217;s product page are negligently incomplete. Luckily I still had the original radiator to use as a model for where to apply the thermal pads:</p>
<p><a href="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-orig-radiator-full-scaled.jpg"><img loading="lazy" decoding="async" class="alignnone wp-image-272 size-medium" src="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-orig-radiator-full-300x187.jpg" alt="" width="300" height="187" srcset="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-orig-radiator-full-300x187.jpg 300w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-orig-radiator-full-1024x638.jpg 1024w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-orig-radiator-full-768x479.jpg 768w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-orig-radiator-full-1536x957.jpg 1536w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-orig-radiator-full-2048x1277.jpg 2048w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a></p>
<p>You can see I matched them (actually I now see I missed one of them on the left! ouch)</p>
<p><a href="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-pcb-added-thermal-pads-scaled-e1649960182405.jpg"><img loading="lazy" decoding="async" class="alignnone wp-image-270 size-medium" src="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-pcb-added-thermal-pads-scaled-e1649960182405-300x176.jpg" alt="" width="300" height="176" srcset="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-pcb-added-thermal-pads-scaled-e1649960182405-300x176.jpg 300w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-pcb-added-thermal-pads-scaled-e1649960182405-1024x602.jpg 1024w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-pcb-added-thermal-pads-scaled-e1649960182405-768x451.jpg 768w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-pcb-added-thermal-pads-scaled-e1649960182405-1536x902.jpg 1536w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-pcb-added-thermal-pads-scaled-e1649960182405-2048x1203.jpg 2048w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a></p>
<p>Now it&#8217;s time to put the water block on and tighten the 4 screws around the main chip:</p>
<p> </p>
<p><a href="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-pcb-mounted-on-water-block-scaled.jpg"><img loading="lazy" decoding="async" class="alignnone wp-image-269 size-medium" src="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-pcb-mounted-on-water-block-300x156.jpg" alt="" width="300" height="156" srcset="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-pcb-mounted-on-water-block-300x156.jpg 300w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-pcb-mounted-on-water-block-1024x531.jpg 1024w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-pcb-mounted-on-water-block-768x398.jpg 768w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-pcb-mounted-on-water-block-1536x797.jpg 1536w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-pcb-mounted-on-water-block-2048x1062.jpg 2048w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a></p>
<p>Bykski slacked again and didn&#8217;t provide a proper mounting bracket that would fit their product. Their instructions allude to using the bracket from the original A100 radiator. As you can see it had 4 screws mounting the main radiator body and 2 more on the side you can&#8217;t see on the picture:<br /><br /><a href="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-orig-radiator-mounting-bracket-scaled.jpg"><img loading="lazy" decoding="async" class="alignnone wp-image-271 size-medium" src="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-orig-radiator-mounting-bracket-300x149.jpg" alt="" width="300" height="149" srcset="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-orig-radiator-mounting-bracket-300x149.jpg 300w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-orig-radiator-mounting-bracket-1024x509.jpg 1024w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-orig-radiator-mounting-bracket-768x382.jpg 768w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-orig-radiator-mounting-bracket-1536x763.jpg 1536w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-orig-radiator-mounting-bracket-2048x1018.jpg 2048w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a></p>
<p>so when you try to mount it on Bykski block you end up with it mostly hanging in the air and attached to the backplate with 2 super tiny screws. This is very poor engineering.</p>
<p><a href="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/reusing-a100-bracket-scaled.jpg"><img loading="lazy" decoding="async" class="alignnone wp-image-264 size-medium" src="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/reusing-a100-bracket-300x185.jpg" alt="" width="300" height="185" srcset="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/reusing-a100-bracket-300x185.jpg 300w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/reusing-a100-bracket-1024x633.jpg 1024w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/reusing-a100-bracket-768x475.jpg 768w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/reusing-a100-bracket-1536x949.jpg 1536w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/reusing-a100-bracket-2048x1266.jpg 2048w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a></p>
<p>The acrylic block is super-heavy and I&#8217;m very concerned that this can break the PCIe socket or the A100 PCB.<br /><br />(If Bykski engineers read this, please provide your own custom mounting bracket that gets screwed onto the acrylic block and perhaps the backplate. Otherwise your product can easily break the motherboard socket, since the original mounting bracket that comes with the A100 was not designed for your water block.)<br /><br />Finally I put it all together with a radiator and a reservoir-pump combo from EKWB and had it run for 24h to test for leaks.</p>
<p><a href="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-water-cooling-ensemble-IMG_20220412_161955090-scaled.jpg"><img loading="lazy" decoding="async" class="alignnone wp-image-268 size-medium" src="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-water-cooling-ensemble-IMG_20220412_161955090-216x300.jpg" alt="" width="216" height="300" srcset="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-water-cooling-ensemble-IMG_20220412_161955090-216x300.jpg 216w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-water-cooling-ensemble-IMG_20220412_161955090-738x1024.jpg 738w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-water-cooling-ensemble-IMG_20220412_161955090-768x1065.jpg 768w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-water-cooling-ensemble-IMG_20220412_161955090-1107x1536.jpg 1107w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-water-cooling-ensemble-IMG_20220412_161955090-1477x2048.jpg 1477w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/a100-water-cooling-ensemble-IMG_20220412_161955090-scaled.jpg 1846w" sizes="auto, (max-width: 216px) 100vw, 216px" /></a></p>
<p>This was my first time doing a custom water cooling solution so it wasn&#8217;t easy. As I couldn&#8217;t afford to have my desktop not working for 24h, I did the whole water cooling ensemble outside and simply plugged it into an old computer while testing for leaks. And the next day I mounted the ensemble in the target PC while keeping the 3 parts interconnected, which was a bit of a challenge but I made it work.</p>
<p>I purchased the huge <a href="https://amzn.to/3M0bdMm">7000D AIRFLOW Full-Tower ATX PC Case from Corsair</a> and even then I had a hard time fitting this huge <a href="https://www.ekwb.com/shop/ek-coolstream-ce-420-triple">EK-CoolStream CE 420</a> radiator together with the smaller AIO radiator I was already using for the CPU. I have multiple HDs and the <a href="https://amzn.to/3KLOxPU">Corsair HX1200</a> power supply, which is extra long, so I barely had any space to place all those parts around.<br /><br />Finally, I had to turn <a href="https://askubuntu.com/a/1099963/803807">my headless solution for igpu and nvidia cards off</a>, so that it could switch to the NVIDIA driver, while having the monitor plugged into an old GTX-1080, which was perfect as it is low power and doesn&#8217;t overheat as easily as the Ampere GPUs. Then I booted my Kubuntu.<br /><br />And voila, with some serious load on the card it keeps a nice ~30C &#8211; Amazing!</p>
<p><a href="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/nvidia-smi-a100-gtx-1080.png"><img loading="lazy" decoding="async" class="alignnone wp-image-274 size-full" src="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/nvidia-smi-a100-gtx-1080.png" alt="" width="760" height="492" srcset="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/nvidia-smi-a100-gtx-1080.png 760w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2022/04/nvidia-smi-a100-gtx-1080-300x194.png 300w" sizes="auto, (max-width: 760px) 100vw, 760px" /></a></p>
<p>So despite Bykski not thinking it fully through, their water block works well at the moment.<br />And yes, the A100 80GB is 10x larger than the GTX-1080 (80GB vs. 8GB of memory).</p>
<h2>Things to figure out</h2>
<ol>
<li>Setting up software to trigger PWM and water pump adjustments based on the temperature reported by <code>nvidia-smi</code> &#8211; at the moment I have just set a normal pump speed in the BIOS and am using the CPU&#8217;s AIO to drive the speed of the fans &#8211; typically when the GPU is churning the CPU is almost always busy as well.</li>
<li>Of course, figuring out how to get rid of the 2nd NVIDIA card and use the iGPU instead. That would lower the electric bill and generate less heat. If you discover a solution please share in the comments. Thank you!</li>
</ol>
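<p>For the first item, here is a minimal Python sketch of the temperature-to-fan-speed part, not a finished solution. It assumes <code>nvidia-smi</code> is on the <code>PATH</code>. The fan curve values and the function names are hypothetical, and actually writing the PWM value is left out since that depends on the motherboard or fan controller.</p>

```python
import subprocess

# Hypothetical fan curve: (temperature in C, PWM duty in %). Tune for your own loop.
FAN_CURVE = [(30, 30), (45, 50), (60, 80), (70, 100)]


def parse_temps(output):
    """Parse nvidia-smi CSV output (one temperature per line) into ints."""
    return [int(line) for line in output.splitlines() if line.strip()]


def read_gpu_temps():
    """Ask nvidia-smi for the temperature of every GPU, in degrees C."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_temps(result.stdout)


def duty_for_temp(temp_c, curve=FAN_CURVE):
    """Linearly interpolate the PWM duty cycle (%) for a given temperature."""
    first_temp, first_duty = curve[0]
    if temp_c <= first_temp:
        return first_duty
    for (t0, d0), (t1, d1) in zip(curve, curve[1:]):
        if temp_c <= t1:
            return d0 + (d1 - d0) * (temp_c - t0) / (t1 - t0)
    last_temp, last_duty = curve[-1]
    return last_duty  # hotter than the last point of the curve: full speed
```

<p>A control loop would then call <code>duty_for_temp(max(read_gpu_temps()))</code> every few seconds and hand the result to whatever interface drives the fans and the pump.</p>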
<h2>Notes</h2>
<p>This post is focused on the specifics of getting A100 80GB PCIe working in a PC, and I&#8217;m not an expert in water cooling, so besides sharing how I installed the water block itself, I trust you can find the details on the best way to do the water cooling elsewhere.</p>
<p>Huge thanks to the person on the NVIDIA forums who <a href="https://forums.developer.nvidia.com/t/a100-pcie-isnt-recognized-by-bios/198546/12">discovered the workaround by using a 2nd card to recognize A100</a>.</p>
<h2>Other related gpu-center-at-home tinkering projects</h2>
<p>&#8211; <a href="https://leikoe.github.io/posts/automotive-gpu-maxxxing">A 4x A100-SXM setup at home</a> by Leo Paille</p>
<p> </p>
]]></content:encoded>
					
					<wfw:commentRss>https://stasosphere.com/entrepreneur-being/262-getting-nvidia-a100-80gb-pcie-to-work-on-a-consumer-motherboard-with-custom-water-cooling/feed/</wfw:commentRss>
			<slash:comments>10</slash:comments>
		
		
			</item>
		<item>
		<title>Play a Game to Understand The Learning in Machine Learning</title>
		<link>https://stasosphere.com/entrepreneur-being/167-understand-machine-learning-play-game/</link>
					<comments>https://stasosphere.com/entrepreneur-being/167-understand-machine-learning-play-game/#respond</comments>
		
		<dc:creator><![CDATA[stas]]></dc:creator>
		<pubDate>Wed, 29 Jan 2020 05:19:40 +0000</pubDate>
				<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[foundation]]></category>
		<category><![CDATA[game]]></category>
		<category><![CDATA[machine learning]]></category>
		<guid isPermaLink="false">https://stasosphere.com/entrepreneur-being/?p=167</guid>

					<description><![CDATA[This is an introduction article to Machine Learning, and its various concepts, that even a high-school student should be able to understand. We are going to play with blocks and through that game understand how Machine Learning works. The big news nowadays is Machine Learning (ML). But what is it? The machine part is obvious [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p>This is an introductory article to Machine Learning, and its various concepts, that even a high-school student should be able to understand. We are going to play with blocks and through that game understand how Machine Learning works.</p>



<p>The big news nowadays is Machine Learning (ML). But what is it? The <em>machine</em> part is obvious &#8211; it&#8217;s the computer that does the learning. But the <em>learning</em> part may appear tricky, even though in reality it&#8217;s a very simple thing at its foundation. Let&#8217;s gain an intuitive understanding of the learning in ML.</p>



<h2 class="wp-block-heading">Let&#8217;s play a game</h2>



<p>I don&#8217;t know if you have ever played a computer game where you have a ball at a starting position, and you have to direct it to a target with the help of blocks, springs, rotating gears, etc. The original game was named <em><a href="https://en.wikipedia.org/wiki/The_Incredible_Machine_(series)">The Incredible Machine</a></em>. It usually relies on the laws of physics and you need to re-arrange those helper components to make the ball move from let&#8217;s say the left top corner of the screen to the right bottom of the screen where some sort of a target is located. The way you play the game is by experimenting with the block arrangement, so that they will do the work of guiding the ball to its target. </p>



<p>Here is a possible initial game board setup:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="481" height="292" src="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/direct_the_ball_1.png" alt="" class="wp-image-202" srcset="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/direct_the_ball_1.png 481w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/direct_the_ball_1-300x182.png 300w" sizes="auto, (max-width: 481px) 100vw, 481px" /></figure>



<p><span id="more-167"></span></p>



<p>The purple ball needs to hit the purple hexagon for you to win. </p>



<p>Normally, with the help of gravity, the ball just falls straight down. But  since we have the helper blocks, we will use them to guide the ball so that once it starts falling it&#8217;ll have a trajectory to hit the target.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="481" height="292" src="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/direct_the_ball_2.png" alt="" class="wp-image-203" srcset="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/direct_the_ball_2.png 481w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/direct_the_ball_2-300x182.png 300w" sizes="auto, (max-width: 481px) 100vw, 481px" /></figure>



<p>Now, I hope that it will hit the target. </p>



<p>But no, I slightly misplaced the &#8220;plank&#8221;-shaped block and the ball will get stuck in the gap. </p>



<p>OK, let&#8217;s move the bar a bit to the left:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="481" height="292" src="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/direct_the_ball_3.png" alt="" class="wp-image-206" srcset="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/direct_the_ball_3.png 481w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/direct_the_ball_3-300x182.png 300w" sizes="auto, (max-width: 481px) 100vw, 481px" /></figure>



<p>Hooray, this time, the ball should hit the target.</p>



<p>Next, the initial ball position is moved a bit to the left.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="481" height="292" src="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/direct_the_ball_4.png" alt="" class="wp-image-205" srcset="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/direct_the_ball_4.png 481w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/direct_the_ball_4-300x182.png 300w" sizes="auto, (max-width: 481px) 100vw, 481px" /></figure>



<p>And our helper block arrangement is no longer helping to guide the ball to the target. The ball again falls straight down.</p>



<p>OK, let&#8217;s rearrange some blocks:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="481" height="292" src="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/direct_the_ball_5.png" alt="" class="wp-image-204" srcset="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/direct_the_ball_5.png 481w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/direct_the_ball_5-300x182.png 300w" sizes="auto, (max-width: 481px) 100vw, 481px" /></figure>



<p>And success! Now this starting position of the ball is covered too. </p>



<p>But the previous one (the ball starting from a different position) will no longer work. So, we have more trial and error re-arrangements to do.</p>



<p>After some lengthy experimentation we find the following arrangement:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="481" height="292" src="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/direct_the_ball_6.png" alt="" class="wp-image-207" srcset="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/direct_the_ball_6.png 481w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/direct_the_ball_6-300x182.png 300w" sizes="auto, (max-width: 481px) 100vw, 481px" /></figure>



<p>And, voila, almost any starting position of the ball will now lead it to the target correctly.</p>



<p>Let&#8217;s revisit the stages of the game we have just played. </p>



<ol class="wp-block-list"><li>The first time I tried to arrange the blocks, I under-shot and missed the target. (You can&#8217;t tell from the picture how heavy the ball is or how much friction the helper blocks generate, so while it looks obvious in my drawing, in the actual game it could be quite challenging to find the right setup).</li><li>Then I started experimenting by moving the helper blocks a bit and re-running the ball, continually making small corrections until the target was reached.</li><li>Then a new situation was given and through trial and error I made it work.</li><li>Then I tried to arrange the blocks so that both situations would work.</li><li>Finally, I tried to generalize so that numerous possible situations would work.</li></ol>



<h2 class="wp-block-heading">Machine Learning</h2>



<p>If you understood how the game was played, you already understand the <em>learning</em> part of Machine Learning. This is how the ML model learns. It usually starts with a random arrangement of the helper blocks (<strong>hidden weights</strong>) and tests whether it hits the target (e.g. a specific category). If it didn&#8217;t succeed, it tries to make small adjustments, while measuring whether the ball is coming closer to the target or not (often using the method of <strong>gradient descent</strong>). Over time, the adjustments become better and better, until it actually hits the target. It is possible for the model to overshoot the target, in which case it turns around and takes a few small steps in the opposite direction until the target is hit. This whole process is called <strong>training</strong>.</p>
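<p>To make the &#8220;small adjustments&#8221; idea concrete, here is a minimal Python sketch of gradient descent with a single helper block. It is a toy under stated assumptions, not how a real ML library does it: the function name <code>train</code>, the squared-error measure and the default values are all chosen just for illustration.</p>

```python
def train(target, weight=0.0, learning_rate=0.1, steps=100):
    """Nudge a single 'block position' (weight) until the ball lands on target."""
    for _ in range(steps):
        error = weight - target             # how far the ball landed from the target
        gradient = 2 * error                # slope of the squared error (error ** 2)
        weight -= learning_rate * gradient  # small step in the direction that shrinks the error
    return weight
```

<p>With a small <code>learning_rate</code> the weight creeps toward the target from one side. With a larger one, such as <code>0.9</code>, each step overshoots and the weight bounces back and forth around the target while still getting closer, which is the &#8220;turning around&#8221; described above.</p>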



<p>The state where the ball hasn&#8217;t yet made it to the target for most starting positions is called <strong>underfitting</strong>, since the model can clearly still improve.</p>



<p>The more different positions we train the model with, the better it will generalize for positions it hasn&#8217;t yet seen. Assuming that we have an infinite number of helper blocks of different shapes we can probably find a solution for almost every situation.</p>



<p>If we try to fit perfectly all the situations we have seen so far, we may end up with a perfect outcome for every seen situation, but if a new situation is given it may not work. This is called <strong>overfitting</strong>.</p>



<p>And the final stage of the game was to make our setup <strong>generalize</strong>, so that it could fit almost any unseen situation. In practice, generalization is an ongoing process, rather than a final stage. For example, by removing some of the helper blocks at random during training, we can force our block arrangement to be more robust to unseen situations. This is called <strong>dropout</strong> in ML lingo.</p>
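<p>As a rough illustration of dropout, here is a minimal Python sketch. The function name and the list-of-numbers representation of the helper blocks are hypothetical. It uses the common &#8220;inverted dropout&#8221; variant, where the surviving values are scaled up so that their average stays about the same, and it assumes <code>drop_probability</code> is below 1.</p>

```python
import random


def dropout(blocks, drop_probability=0.5, rng=random):
    """Return a copy of blocks where each one is removed (set to 0.0) at random."""
    keep = 1.0 - drop_probability
    # Survivors are divided by `keep` so the expected total stays unchanged.
    return [b / keep if rng.random() < keep else 0.0 for b in blocks]
```

<p>Each training run gets a different random subset of blocks, so the arrangement cannot rely too heavily on any single block.</p>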



<p>Usually, we don&#8217;t have an infinite number of helper blocks and so we have to make do with what we have. So the number of block arrangement possibilities is limited. And, say, after 10 different ball position starting points, our model knows how to send the ball to its destination 90% of the time, which is already quite excellent. </p>



<p>Now you no longer need to spend hours moving the blocks around, since the computer will do it for you. This is no longer fun, since it&#8217;s the computer that&#8217;s now playing the game for you. But for the sake of better understanding of how Machine Learning works we will allow it to do it for us this time.</p>



<h2 class="wp-block-heading">Deep Learning</h2>



<p>Next comes the concept of <strong>Deep Learning</strong>, where you give the ML model a lot more helper blocks, allowing it to build much more elaborate setups, often so elaborate that we don&#8217;t even understand how they work. The process is then repeated with hundreds or thousands of different starting combinations, and the model often learns to find the correct solution close to 100% of the time.</p>



<p>The <strong>deep</strong> part just indicates that instead of a simple basic setup with a few blocks, we now have a much more complex setup that has a much higher capacity for flexibility in arranging things and generalizing. For example, for the sake of our mechanical universe example, imagine that we re-arrange our helper blocks so that they represent a concave surface like a bowl or a wine glass, and the target is at the bottom of it. If this were possible, no matter where the ball was released from, it would always end up at the bottom of that concave surface where our target is situated.</p>



<p>If you&#8217;re ready to imagine even more complex situations, consider multiple dimensions: if it looks impossible to make the ball move from, say, the bottom of the board to the top, we use a fourth dimension to sneak it in. In reality deep learning uses hundreds and thousands of dimensions to solve very tricky setups. We can hardly visualize the 4th dimension, so we just have to trust that it works, relying primarily on math.</p>



<p>(note: If you would like to stretch your mind in a relatively easy and fun way to start grasping multiple dimensions read the book <a href="https://amzn.to/2Gz7O7v">Flatland</a> or watch the <a href="https://www.imdb.com/title/tt0814106/">movie</a> based on this book.)</p>



<p>Further, to make things even more efficient, instead of trying to figure out one ball situation at a time, the model works on <strong>batches</strong> of dozens or hundreds of such situations at once. This is possible thanks to the availability of specialized hardware (called GPU or TPU), which was designed to process huge amounts of data in parallel at an incredible speed. Not only does that hardware make things run much faster, it also finds the best generalized block arrangement in fewer steps, as compared to doing it one situation at a time.</p>
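<p>Here is a minimal Python sketch of what working in batches can look like, again with a single helper block. The function names and default values are hypothetical, and the loop runs one batch after another on a CPU, whereas a GPU would process all the situations inside a batch in parallel.</p>

```python
def batches(situations, batch_size):
    """Split the list of situations into consecutive batches."""
    return [situations[i:i + batch_size] for i in range(0, len(situations), batch_size)]


def train_on_batches(targets, weight=0.0, learning_rate=0.1, batch_size=4, epochs=200):
    """Adjust the weight once per batch, using the average correction for that batch."""
    for _ in range(epochs):
        for batch in batches(targets, batch_size):
            # Average the squared-error slope over every situation in the batch.
            gradient = sum(2 * (weight - t) for t in batch) / len(batch)
            weight -= learning_rate * gradient
    return weight
```

<p>Because each adjustment is based on the average over a whole batch, a single unusual situation pulls the block arrangement around much less than it would if every situation were handled on its own.</p>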



<p>While the foundations of ML are very simple, for it to work successfully and in a timely manner we need:</p>



<ol class="wp-block-list"><li>Either a huge amount of data to train it on &#8211; think real estate prices for the last 5 years over a huge territory, which is often needed for <strong>supervised learning</strong>, where the data helps the model to improve. Or the model can be trained using <strong>reinforcement learning</strong>, like in our ball-to-target game example, where through trial and error it learns how to do better over time.</li><li>Very powerful hardware, usually specially designed for heavy matrix processing, to support deep learning. Imagine how much more complicated it would be for you to play this game if you had to accomplish the same not with a few helper blocks, but with millions of blocks.</li></ol>



<p>If these two requirements are satisfied then Deep Learning is possible. The remaining difficult part is usually to build or choose an architecture that solves a problem at hand. </p>



<h2 class="wp-block-heading">Transfer Learning</h2>



<p>Now, if you were to start playing the arrange-the-blocks game from scratch you&#8217;d probably have to invest as much work as I did to make it deal with multiple starting situations. What if I were to save my work and share it with you? Then you won&#8217;t have to start from scratch and will be able to continue working on more complex situations, saving yourself a huge amount of time and computer resources. </p>



<p>This brings us to <strong>Transfer Learning</strong>, where an individual or an organization invests a huge amount of their time and computer resources (i.e. money) to build an ML model, which can then be shared with others.</p>



<p>This shared model is then <strong>fine-tuned</strong> to a specific type of data. For example, imagine you want to replace the falling ball with some kind of polygon that can roll like a ball, just not as well. You will still benefit from the pre-trained model, and you will only need to make small adjustments to the block arrangement by retraining the model that was shared with you for your specific needs.</p>



<p>Often, transfer learning can provide huge savings of time and money. </p>



<h2 class="wp-block-heading">Outcome Explainability</h2>



<p>Deep Learning models often figure out very subtle correlations between input signals that we humans are either unlikely to notice or lack the capacity to notice. But since ML models are forced to generalize in order to give correct answers, at times the answers are given for totally wrong or unexplainable reasons.</p>



<p>There is an urban legend that once upon a time a certain military force tried to build an ML model to detect camouflaged tanks in satellite imagery. The model was trained with a handful of photos that included tanks (<strong>positive examples</strong>) and about the same number with no tanks in them (<strong>negative examples</strong>). After training on these photos, the model was able to classify new photos correctly, except instead of learning to spot hidden tanks, it learned to tell whether it was a cloudy day or not, since it so happened that all the camouflaged tank photos were taken on cloudy days, and all the non-tank photos were taken on non-cloudy days.</p>



<p>I found an <a href="https://www.jefftk.com/p/detecting-tanks">article</a> that attempts to find out whether there is any truth to this legend. I will let you discover it for yourself, but the article concludes with:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p>So I think it&#8217;s very likely, though not certain, that this didn&#8217;t actually happen.</p></blockquote>



<p>Regardless, there are plenty of research papers out there that do indicate real findings of this type, and this is a big problem, since if we don&#8217;t know how a model makes its decisions, we will be unable to use it reliably. If some kitten photos get miscategorized, it&#8217;s probably not a big deal, but if a self-driving car miscategorizes an obstacle and we can&#8217;t figure out why, or a person gets put in jail due to an ML model mistake, that would be a big problem.</p>



<h2 class="wp-block-heading">Conclusion</h2>



<p>I hope this little playful introduction helped you to gain an insight into how the very complicated field of Machine Learning is based on rather simple things. The complexity is in the details of finding the right solution for the right situation, knowing how to process data, how to debug problems in the code, optimize the code to work faster, etc. The devil is in the detail. But I trust you now have an intuitive understanding of how the gears of Machine Learning work and in particular the <strong>Learning</strong> part of it.</p>


]]></content:encoded>
					
					<wfw:commentRss>https://stasosphere.com/entrepreneur-being/167-understand-machine-learning-play-game/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Stop Making Money for Social Networks!</title>
		<link>https://stasosphere.com/entrepreneur-being/152-stop-making-money-social-networks/</link>
					<comments>https://stasosphere.com/entrepreneur-being/152-stop-making-money-social-networks/#respond</comments>
		
		<dc:creator><![CDATA[stas]]></dc:creator>
		<pubDate>Sat, 25 Jan 2020 02:25:41 +0000</pubDate>
				<category><![CDATA[Marketing]]></category>
		<category><![CDATA[black hole]]></category>
		<category><![CDATA[content]]></category>
		<category><![CDATA[control]]></category>
		<category><![CDATA[social networks]]></category>
		<guid isPermaLink="false">https://stasosphere.com/entrepreneur-being/?p=152</guid>

					<description><![CDATA[Many businesses, especially, independent entrepreneurs spend a lot of time and money on social networks in hope that the effort will lead to a steady stream of customers. Unfortunately, this doesn&#8217;t work, because those who run social networks have a conflicting interest with yours. Welcome to the Social Network Black Hole Social networks primarily make [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p>Many businesses, especially independent entrepreneurs, spend a lot of time and money on social networks in the hope that the effort will lead to a steady stream of customers.</p>



<p>Unfortunately, this doesn&#8217;t work, because the interests of those who run social networks conflict with yours.</p>



<h2 class="wp-block-heading"><strong>Welcome to the Social Network Black Hole</strong></h2>



<p>Social networks primarily make income from advertising. In order to be able to sell advertisement placements, the social networks need a lot of eyeballs and that requires minions to generate new content all the time. Who are those minions? Well, it&#8217;s most of us&#8230;</p>



<p><span id="more-152"></span></p>



<p>Have you ever tried to find a post written by you or someone else from a few weeks earlier and you just couldn&#8217;t find it? Well, it&#8217;s because each piece of user-generated content has a very short life-span on social networks. It lives slightly longer if it generates a controversy and/or gets re-shared by many. But even that content disappears in a relatively short period of time. This is by design.</p>



<p>Even though older content usually stays just as solid and relevant for years to come (a delicious shakshuka recipe, anybody?), if all your followers have already seen that post, they won&#8217;t be excited to look at it and re-share it again, or start a new controversy, and therefore it will generate no fresh eyeballs to display ads to. As such, the old content is pushed down to make space for fresh, potentially exciting content that generates ad revenue for the social network owners.</p>



<p>Now, I hope you realize that your hard work makes money for the social networks. To which you may retort: but they send me customers (if you&#8217;re trying to use social networks to create a customer flow, that is). I&#8217;d partially agree, and also tell you that this is an extremely inefficient way of accomplishing your goals. Let&#8217;s explore why this is the case.</p>



<p>If, tomorrow, you get sick and stop posting on your favorite social network, how long will it take before your customer flow from that network comes to a full halt? Usually, after 7 to 10 days of you &#8220;not contributing&#8221;, as far as the social network algorithms are concerned, you never existed, i.e. nobody will see any of the hundreds or thousands of earlier posts you worked so hard to create.</p>



<p>It would be a different story if the social networks shared their profits with the content creators, but most don&#8217;t. <strong>The networks collect all the profits and you work for free</strong>.</p>



<p>Is there a way out? Yes, there is. You need to take the control back. You need to own all the content that you generate and have it hosted on your own website. You can pay $5/month or even get a free blog/website &#8211; just search for those and you will find hundreds of great offers. For example, many use free blogs like <a href="https://wordpress.com/">WordPress</a> and <a href="https://www.blogger.com">Blogger</a>, but there are many other similar and even better options out there.</p>



<p>Some 10-15 years ago, having your own website required either hardcore tech expertise or paying someone a ton to build and maintain it. These days, even my grandma could probably learn how to write her own blog. If you can create a Word document, you can, with the same ease, write a blog. You can always create a free blog and experiment with it &#8211; it will take 1h of your time to get started &#8211; and chances are, you will find out that it&#8217;s a very easy task. It won&#8217;t take longer than, say, switching from an older MS Word to the Windows 10 version of the same application and figuring out the new interface.</p>



<p>I recommend investing the extra $10-15/year and getting your own domain. That way you can move from one hosting company to another and all your existing links will still work. If, however, you use a free domain like mygreatblog.wordpress.com, and then you decide you want to use a different platform, and now have mygreatblog.blogspot.com, the old links to the original website will no longer work and the search engines will have to discover your website anew, meaning you will lose all that free traffic and you will need to start acquiring it from scratch.</p>



<p>If this article made you think and you decided to take control over your writing, start creating useful content, one paragraph or article at a time. Every time you create new content on <strong>your</strong> website, do post a link to it on your social network. The difference is that you retain control over how, when and to whom your content is shown. And here are some other great benefits:</p>



<ol class="wp-block-list"><li>Now, your content, instead of having a life-span of days, will have a life-span of years.</li><li>Little by little, search engines will start sending you free traffic. And this will grow exponentially over time.</li><li>You can also buy ads on other sites to send more traffic to your own content.</li><li>If the content is interesting and unique, other sites and even social networks will start sending free traffic to you. Wikipedia-style reference-type writing will guarantee a ton of traffic.</li><li>You can make landing pages if you are into selling goods and services.</li><li>You can publish content that otherwise wouldn&#8217;t be allowed on social networks.</li><li>You own and control the comments. For example, you can delete comments you don&#8217;t approve of and edit comments with bad grammar.</li><li>You can revisit and enhance your content over time.</li><li>You can even make money from your content by selling ad space on your own website. This gives you passive income.</li></ol>



<p>And while you may not be able to live <a href="https://amzn.to/30XGuJB">The 4-Hour Workweek</a>, you should be able <strong>to take a break from producing</strong> at any time, and, for example, take a month off and go on a long, well-deserved and badly needed vacation. Your website/blog will continue working for you 24/7, despite you being on some remote island with no Internet connection.</p>



<p>I was recently browsing Google Analytics for some of my really old sites and I discovered that some articles I published 20 years ago are still being read today. For example, in January 2000 I published a summary, <a href="https://stason.org/articles/money/investing/everything_you_ever_wanted_to_know_about_employment_and_stock_options_plan.html">Employment and Stock Options Plan Explained</a>, and 20 years later it still gets readers.</p>



<p>Note that this is a long-term investment. You just need a bit of patience, as it will take some time for your new website/blog to be discovered by search engines. Meanwhile, sending traffic to your website via social networks is quick and easy, and there are even tools that can do this automatically for you. But you no longer need to slave for a company that gives you very little in return and forgets about you as soon as you stop giving.</p>



<p>Another important subject to explore is measuring outcomes. As <a href="https://amzn.to/36mFRKz">Dan Kennedy</a>, the most insightful practical marketing guru that I have ever encountered, says: if you can&#8217;t measure the outcome of something you&#8217;re putting an effort into, do not do it. Because if you do, you have no idea whether your effort/money amounted to anything good, and you have no way of making improvements, since you don&#8217;t have a baseline to compare with. You could be getting your customers anyway, and most of your social network effort could be wasted. I think this is the case for many entrepreneurs who try to get the social networks to work for them, but who can&#8217;t measure the real impact of their efforts. But this is a big topic, and I will explore it in another post.</p>



<h2 class="wp-block-heading">Recommended Reading</h2>



<p>I have just finished reading Cal Newport&#8217;s excellent books &#8220;<a href="https://amzn.to/37Phki7">Deep Work</a>&#8221; and &#8220;<a href="https://amzn.to/38Rd2rH">Digital Minimalism</a>&#8221;. Both are somewhat related to what has been discussed here, and the latter book in particular provides simple practical strategies for overcoming social media addiction. Even though I have already been &#8220;very minimal&#8221; on social networks, I have just learned some crucial patterns that I was missing before. For example, I&#8217;m going to follow Cal&#8217;s recommendation to completely stop upvoting/downvoting &#8211; I can see now that it has been causing some anxiety for me, as I had to find an internal justification for downvoting, which didn&#8217;t feel good at times and felt like I was engaging in cowardly behavior. I also felt bad when someone downvoted my contribution. Completely eradicating the participation in the up/downvoting process removes all that anxiety and saves a ton of time. I still know what I like and what I dislike, but I don&#8217;t have to make an anonymous statement about it. Yay! Thank you, Cal!</p>
]]></content:encoded>
					
					<wfw:commentRss>https://stasosphere.com/entrepreneur-being/152-stop-making-money-social-networks/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>How Chinese companies generate real-fake reviews on Amazon</title>
		<link>https://stasosphere.com/entrepreneur-being/147-chinese-companies-fake-reviews-amazon/</link>
					<comments>https://stasosphere.com/entrepreneur-being/147-chinese-companies-fake-reviews-amazon/#respond</comments>
		
		<dc:creator><![CDATA[stas]]></dc:creator>
		<pubDate>Fri, 24 Jan 2020 21:00:44 +0000</pubDate>
				<category><![CDATA[Marketing]]></category>
		<category><![CDATA[amazon]]></category>
		<category><![CDATA[fake]]></category>
		<category><![CDATA[reviews]]></category>
		<guid isPermaLink="false">https://stasosphere.com/entrepreneur-being/?p=147</guid>

					<description><![CDATA[You&#8217;re probably aware that most social networks are loaded with fake likes and reviews, and it&#8217;s quite cheap to buy those if you&#8217;d like your profile boosted. In the past an entrepreneur would arrange a network of participants who for a small monetary token would like and/or write fake reviews. These days, with extremely powerful [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p>You&#8217;re probably aware that most social networks are loaded with fake likes and reviews, and it&#8217;s quite cheap to buy those if you&#8217;d like your profile boosted. In the past an entrepreneur would arrange a network of participants who for a small monetary token would like and/or write fake reviews.</p>
<p>These days, with extremely powerful realistic text-generating machine learning engines like <a href="https://en.wikipedia.org/wiki/GPT2">GPT-2</a>, this technique becomes more and more automated and no longer requires networks of humans to create fake things. It&#8217;s the domain of AI agents these days.</p>
<p>But in the domain of online stores these machinations are still tricky to do, since unlike online profiles that have no physical boundaries, real products are bound by hard coins and thus are much more difficult to game. But, of course, they are being gamed.</p>
<p><strong>The name of the game is generating multiple positive reviews by buying them with freebies.</strong></p>
<p><span id="more-147"></span></p>
<p>While in the past a merchant could pay anybody to go and write reviews on sites like Amazon and all was good, recently <a href="/entrepreneur-being/138-machine-learning-hallucinations-amazon/">machine learning started being deployed to ignore those reviews</a>, and instead count only reviews with the &#8216;Verified Purchase&#8217; tag, i.e. only reviewers who spend their hard-earned cash are considered real. Surely, the ML model also validates that the verified purchaser has a history of normal purchases and wasn&#8217;t created just to game the system, even if the money was spent for real (create a new account, send it some credit via a gift card, buy the product, forget about the account).</p>
<p>Having recently purchased a few hardware products on Amazon, I discovered that the Chinese merchants found a way to beat the system and bypass the AI gatekeeper. Once you buy, say, some <a href="https://amzn.to/37vVOji">grow lights</a>, the product will arrive with a variety of incentives, such as:</p>
<ol>
<li>write a 5-star review and receive a free gift</li>
<li>join the testers and receive free products in exchange for reviews</li>
</ol>
<p>Who doesn&#8217;t want a free gift? So a lot of those 5-star reviews were &#8220;bought&#8221; with a free gift bribe. Notice that Amazon ships those products and it knows that the incentive card is there. Don&#8217;t you find it strange?</p>
<p>Then, there is the free-testers program, where you get freebies in exchange for reviews. In order to fly below the fake review detector, the customer is told to purchase the product as if they wanted it in the first place, then post a review, then email the company with the review and the receipt, and they refund the full cost via PayPal. Now, again, when you get the whole product for free, will you leave a bad review? So here, again, we get a bunch of fake reviews. Do you think Amazon doesn&#8217;t know about those?</p>
<p>I&#8217;m probably being unfair here, insinuating that it&#8217;s only the Chinese companies that do that. To clarify, I&#8217;m only sharing my direct experience and it came through Chinese companies so far.</p>
<p>I had another incident on Amazon, where I purchased a really badly engineered product and it fell apart after a few uses, so I left a negative review, listing its shortcomings. Next, the vendor of that product started hassling me over email, trying to bribe me to change my rating from 2 to 4. This makes no sense to me. If the vendor were to fix the problems in the product and send me a better one, I would have been more than happy to give them a glowing review, but paying me off to tell lies is just lame. Though I&#8217;m sure that it did work on some other reviewers.</p>
<p>Bottom line, a lot of reviews on Amazon and other sites are fake and they are here to stay. Let&#8217;s see how Amazon&#8217;s <a href="/entrepreneur-being/138-machine-learning-hallucinations-amazon/">attempts at using machine learning to discount fake reviews work out</a>. This is a very exciting time. And, surely, the merchants will have to find adversarial ways to beat the machine.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://stasosphere.com/entrepreneur-being/147-chinese-companies-fake-reviews-amazon/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Authors of Aging Tech Books on Amazon Will Have Their Ratings Dropped</title>
		<link>https://stasosphere.com/entrepreneur-being/138-machine-learning-hallucinations-amazon/</link>
					<comments>https://stasosphere.com/entrepreneur-being/138-machine-learning-hallucinations-amazon/#comments</comments>
		
		<dc:creator><![CDATA[stas]]></dc:creator>
		<pubDate>Fri, 24 Jan 2020 19:51:38 +0000</pubDate>
				<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Marketing]]></category>
		<category><![CDATA[authors]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[manipulations]]></category>
		<category><![CDATA[ratings]]></category>
		<category><![CDATA[reviews]]></category>
		<guid isPermaLink="false">https://stasosphere.com/entrepreneur-being/?p=138</guid>

					<description><![CDATA[TL;DR: Amazon has implemented a new strategy of giving old reviews a significantly lower weight than the newer ones. And should anybody post a single bad review as of recent, which would be normal for outdated tech books, that review will pretty much define the quality of the book (== terrible). Gaming reviews has been [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p><strong>TL;DR</strong>: Amazon has implemented a new strategy of giving old reviews a significantly lower weight than newer ones. And should anybody post a single recent bad review, which would be normal for outdated tech books, that review will pretty much define the quality of the book (== terrible).</p>



<p>Gaming reviews has been a big issue for any marketplace, and in particular for Amazon. Amazon does a pretty so-so job of weeding out fake reviews, as it is <a href="/entrepreneur-being/147-chinese-companies-fake-reviews-amazon/">gamed freely by some Chinese vendors in an inventive way that appears legit</a>. But it has recently deployed a new strategy for how it weighs each given review. Reviews with no &#8216;Verified Purchase&#8217; tag get less weight than those with the tag, and older reviews get progressively less weight.</p>



<p><span id="more-138"></span></p>



<p>(note: I have no direct source to confirm that this is so. I derived this by looking at multiple listings, <a href="https://www.reddit.com/r/MachineLearning/comments/euciix/discussion_machine_learning_hallucinations_at/">asking for help</a> from <a href="https://forums.fast.ai/t/ml-unfairness-here-and-now/62368/">others</a> and doing the math, which after several days of investigation led to what appears to be a good understanding of what is probably going on. So please don&#8217;t take this as a definitive answer.)</p>



<h2 class="wp-block-heading">Let&#8217;s Destroy my Book&#8217;s Reputation</h2>



<p>My <a href="https://amzn.to/2Gwj0lC">first book</a> was published in 2003. Now long forgotten by most, the Apache/mod_perl technology became outdated around 2010, and moreover, this particular book was written about the first generation of this technology, which was superseded by the second generation in 2004. As you can tell, this book is not relevant for today&#8217;s tech world, unless you happen to maintain a 15-year-old system that refuses to quit.</p>



<p>Until recently, that book had an almost excellent total rating of 4.4, with 3 5-star reviews and 4 4-star reviews, which I think reflects reality well. It&#8217;s an excellent work of 900 pages that took me 3.5 years of hard work to create. Once I finished the writing, I discovered that writing technology books is foolish, because they get outdated before they hit the presses. It&#8217;s better now that books are digital, as compared to dead trees. But even with digital books, constantly updating the book to match APIs that change at a furious speed, and validating that the code still works, is an insanely time-wasting effort for an author.</p>



<p>Now, I know all those reviews on my book were honest and I have never solicited them. Only one of those reviews has a Verified Purchase tag, but many of my books were purchased at the O&#8217;Reilly Open Source Convention &#8211; I know that, because for 7 years I taught workshops at that conference and signed many of the books that were purchased there. Powell&#8217;s Books was usually the book vendor there.</p>



<p>Someone recently rated my book at 1-star (no review). I can&#8217;t imagine who in their right mind would buy a 17-year-old book on a long dead technology and expect something good out of it. No wonder they must have been enraged when they started reading it and as a result left an I-hate-it 1-star rating. It could have been done out of pure malice too. Who knows.</p>



<p>You might say, what&#8217;s the big deal? Indeed, it shouldn&#8217;t be a big deal, since no single review should define a book&#8217;s quality &#8211; but, unfortunately, it is a big deal, because after that rating was added, the total rating of the book went from 4.4 to 2.9.</p>



<p>Now, you might say, no way. But, unfortunately, numbers don&#8217;t lie:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1019" height="965" src="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/practical_mod_perl-2.9.jpg" alt="" class="wp-image-189" srcset="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/practical_mod_perl-2.9.jpg 1019w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/practical_mod_perl-2.9-300x284.jpg 300w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/practical_mod_perl-2.9-768x727.jpg 768w" sizes="auto, (max-width: 1019px) 100vw, 1019px" /></figure>



<p>So we have 7 positive ratings and 1 negative rating, and you can see the system gave a whopping 48% weight to the single negative rating! Don&#8217;t you find it just odd?</p>
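


<p>To make the arithmetic concrete, here is a minimal Python sketch. It assumes the seven older ratings keep equal weights among themselves, which is my simplification and not something Amazon has confirmed; the helper name <code>implied_weight</code> is made up for this illustration. It compares a plain average with the weight the single 1-star rating would need in order to produce the displayed 2.9:</p>

```python
# Ratings before the new 1-star rating: three 5-star and four 4-star.
old_ratings = [5, 5, 5, 4, 4, 4, 4]
new_rating = 1

# Average of the old ratings, and the plain (unweighted) average of all 8.
old_mean = sum(old_ratings) / len(old_ratings)
plain_mean = (sum(old_ratings) + new_rating) / (len(old_ratings) + 1)


def implied_weight(displayed, old_mean, new_rating):
    """Hypothetical helper: solve for the weight w of the new rating in
    w * new_rating + (1 - w) * old_mean == displayed.
    Assumes the old ratings are weighted equally among themselves."""
    return (old_mean - displayed) / (old_mean - new_rating)


print(f"old mean:   {old_mean:.2f}")
print(f"plain mean: {plain_mean:.2f}")
print(f"implied weight for 2.9: {implied_weight(2.9, old_mean, new_rating):.0%}")
```

<p>Under these assumptions a plain average of all 8 ratings would be 4.0, while a displayed 2.9 requires the single 1-star rating to carry roughly 45% of the total weight, which is in the same ballpark as the 48% shown in the screenshot. The remaining gap presumably comes from rounding and from the older ratings not being weighted equally.</p>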



<p>Of course, the book has no chance to balance this machinated wrong-doing. Other than through a straightforward manipulation, where I ask someone to buy the book and rate it 5 stars and then refund them the money, nobody in their sane mind is going to do it of their own will, because the book is 17 years old and is no longer relevant.</p>



<p>You might ask: isn&#8217;t it a good thing that books that are no longer relevant get a bad rating? It&#8217;s a difficult question. It depends on who is looking:</p>



<ul class="wp-block-list"><li>To a potential buyer of this specific old book it&#8217;s a good rating &#8211; signaling don&#8217;t buy.</li><li>To an author who worked hard to create the book this is a really bad rating, as it doesn&#8217;t reflect the quality of their work.</li><li>To someone who might be looking at the author&#8217;s page before buying his/her other books this is a bad rating, since who wants to take the risk of buying new books if the author&#8217;s other books have a bad rating? And this is clearly a misrepresentation of the truth.</li><li>To a potential publisher who might be considering giving this author a contract for a new book this is a bad rating.</li></ul>



<p>Moreover, depending on whether you&#8217;re logged in or not, Amazon is likely to show you a different rating. For example, my book shows 3.1/5 when I&#8217;m logged out.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="361" height="232" src="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/screenshot_14.png" alt="" class="wp-image-190" srcset="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/screenshot_14.png 361w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/screenshot_14-300x193.png 300w" sizes="auto, (max-width: 361px) 100vw, 361px" /></figure>



<p>Note how the 1-star rating went from 48% in weight to 41%. Why do I get a different rating depending on whether I&#8217;m logged in or not?</p>



<h2>International Reviews</h2>



<p>Another thing you may notice is that Amazon started incorporating International reviews and ratings into the total ratings. This too is very confusing for some books, since you may get a totally different rating depending on whether you&#8217;re logged in or not. For example, this <a href="https://amzn.to/2RQf5oT">book</a> shows a rating of 5/5 if you aren&#8217;t logged in (and sometimes even if you are logged in):</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="216" src="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/amazon-review-0-1024x216.png" alt="" class="wp-image-139" srcset="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/amazon-review-0-1024x216.png 1024w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/amazon-review-0-300x63.png 300w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/amazon-review-0-768x162.png 768w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/amazon-review-0.png 1182w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>But when I log in and I go to the book&#8217;s <a href="https://amzn.to/2TPjmeS">page</a>, it drops from 5/5 to 3/5:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="834" height="272" src="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/amazon-review-1.png" alt="" class="wp-image-140" srcset="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/amazon-review-1.png 834w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/amazon-review-1-300x98.png 300w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2020/01/amazon-review-1-768x250.png 768w" sizes="auto, (max-width: 834px) 100vw, 834px" /></figure>



<p>Here, despite 9 5-star reviews, the book is rated at 3/5. When I log in, a single 2-star International review is displayed with a crazy, disproportionate weight of 68%, which brings a book with 9 5-star reviews down to 3.0. Can you make sense of it? Do you read the fantastic reviews and buy the book, or do you run like hell because Amazon thinks it&#8217;s a terrible book &#8211; based on a single recent Verified Purchase international review?</p>
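


<p>The numbers on this listing can be checked with a few lines of Python. This sketch assumes that the 32% of the weight not given to the 2-star review is shared by the 5-star reviews, which is an assumption on my part since the page doesn&#8217;t show that split; the variable names are made up for this illustration:</p>

```python
# Share of the total weight per star value on the logged-in page.
# 68% for the single 2-star review is taken from the listing; giving the
# remaining 32% to the 5-star reviews is an assumption.
shares = {2: 0.68, 5: 0.32}

# Weighted rating: sum of each star value times its share of the weight.
weighted = sum(star * share for star, share in shares.items())

# Plain average of nine 5-star reviews and one 2-star review.
plain = (9 * 5 + 1 * 2) / 10

print(f"weighted rating: {weighted:.2f}")
print(f"plain average:   {plain:.2f}")
```

<p>The weighted rating comes out at about 2.96, which rounds to the displayed 3.0, whereas a plain average of the same ten reviews would be 4.7.</p>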



<p>Have I mentioned that none of those 9 5-star reviews are &#8216;Verified Purchase&#8217;? Bad signal for sure &#8211; we don&#8217;t trust reviews from those who didn&#8217;t pay us money, Amazon communicates. </p>



<p>Note that this example is not a technical book, so something else must be going on.</p>



<h2 class="wp-block-heading">Customer Service Blames Everything on Machine Learning</h2>



<p>I contacted Amazon customer service, asking for help to understand what&#8217;s going on and got a reply:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p>The overall star rating for a product is determined by a machine-learned model that considers factors such as the age of the review, helpful votes by customers, and whether the reviews are from verified purchasers. Similar machine-learned factors help determine a review&#8217;s ranking in the list of reviews. The system continues to learn which reviews are most helpful to customers and improves the experience over time. Any changes that customers may currently experience in the review ranking or star ratings is expected as we continue to fine-tune our algorithms.</p></blockquote>



<p>This doesn&#8217;t help me, as a customer, to understand what&#8217;s going on.</p>



<h2 class="wp-block-heading">Please Remove my Books from Amazon</h2>



<p>One thing is for sure: as an author, I don&#8217;t want my book to be misrepresented. I&#8217;m going to ask O&#8217;Reilly Media, Inc., the publisher, to pull my book off Amazon, and I&#8217;m going to try to find a way for Amazon to remove all books published by me and never sell any of my books again. Amazon <a href="/entrepreneur-being/88-how-amazon-discriminates-against-small-publishers/">banned me from self-publishing a few years ago</a> because of some internal conflict with a 3rd party vendor, which had nothing to do with me, and without giving me any reason for why I suddenly wasn&#8217;t a kosher writer, while continuing to sell my other books. Now this situation should hopefully remove any income coming from my works to Amazon. Surely, I will <a href="/experience-life/my-books/">sell fewer books without Amazon</a>, but I prefer that to Amazon transmitting falsified information about my works.</p>



<h2 class="wp-block-heading">Recommendations for Tech Book Authors on Amazon</h2>



<p>To conclude, here is what you need to know as an author to prevent Amazon from burying your works over time:</p>



<ul class="wp-block-list"><li>Your older reviews get less and less valid with time &#8211; you need to find a way to get recent reviews (hint, hint)</li><li>When you (hint, hint) get them, make sure they are posted by those who purchased the book on Amazon. Should they buy it elsewhere, too bad &#8211; they will have to buy another copy on Amazon. Basically, Amazon forces you to game its system, while making money for Amazon.</li><li>If your book is aging, which in particular impacts technology authors, but I&#8217;m sure it affects other categories as well, its reputation is going to get destroyed by recent negative reviews, which you can&#8217;t avoid. Because the relevancy of your work is diminishing, the 1-star reviews will prevail. The only recourse here at the moment to keep your reputation unblemished, I think, is to find a way to remove your book from Amazon. I haven&#8217;t investigated other marketplaces to tell whether it&#8217;s the same situation over there, but I&#8217;m sure all markets will follow Amazon&#8217;s lead. So, it&#8217;s best to remove the aged book altogether and not allow automated systems to misrepresent the child of your hard work.</li></ul>



<h2 class="wp-block-heading">Recommendations for Marketplace Designers</h2>



<p>I doubt Amazon would care for my recommendations, but you never know, someone on their dev team might read them and at least consider them.</p>



<p>After contemplating the current situation here are some recommendations that I came up with:</p>



<ul class="wp-block-list"><li>When you introduce weights for different signals, visible to users, make sure the outcome makes sense. E.g. the <a href="https://amzn.to/313U9Pp">second book</a> presented in this article has a rating of 3.0 and it displays 9 5-star reviews and nothing else &#8211; weird, no?</li><li>Weigh ratings w/o reviews at a lower weight than ratings with reviews. It&#8217;s too easy to submit a number on the rating scale without giving it much thought, usually resulting in less than reality-reflecting ratings (in both good and bad directions). It takes time and thinking to write a review, so usually ratings bundled with reviews are of a higher fidelity.</li><li>Weigh reviews with real customer names higher than Anonymous reviews. It&#8217;s easy to write a negative review when one intends malice (competitor/hater/etc.). If you read listings you&#8217;d find few positive reviews by Anonymous. Surely, some people don&#8217;t feel safe writing a totally valid negative review with their name on it for fear of persecution. So this recommendation is highly debatable.</li><li>Apply all the weighing smarts to products with dozens and hundreds of reviews, where the truth may prevail (unless it was gamed), but be careful with niche products/books with under 10-15 reviews. Any automated weight re-balancing of products with very few reviews can lead to a disastrous outcome for the creator. A different, carefully thought out algorithm or ML model should be applied there.</li><li>Give authors of technology books (and whatever similar domains fit here) a chance to indicate when a book is outdated, to prevent disappointed users and the almost guaranteed negative reviews/ratings that follow. The book should probably still be available for some years to support those who maintain old systems. But, provided the author has been given a way to signal this situation, it should be made loud and clear to the customers that they are buying an outdated manual before they hit the &#8216;Pay&#8217; button.</li><li>Separate second-hand market ratings from the normal books. Very often I see a 1-star post from a second-hand book buyer: &#8220;I received a damaged copy&#8221;. What does this rating have to do with the quality of the contents of the book? This is very damaging to the total book rating. Why do you do nothing about this? Perhaps the rating form can be adjusted to indicate whether the user provides feedback on the physical quality of the book as compared to the content.</li></ul>



<p>I honestly don&#8217;t know what to do about the &#8216;Verified Purchase&#8217; reviews outweighing the reviews w/o that tag. On the one hand, it&#8217;s easier to add fake reviews w/o purchasing the product. On the other hand, it&#8217;s just as easy to make people buy a book/product, write a review and then refund them the cost via PayPal (a common <a href="/entrepreneur-being/147-chinese-companies-fake-reviews-amazon/">current way to game the system</a>). And as shared earlier in this article, this forces more sales for Amazon, and discourages competition. I trust Amazon has a good insight on this one.</p>



<p>This situation took several days to sort out and I&#8217;m glad the picture is finally clear. </p>



<p>The automated future is looking great for the masses. Niche creators, watch out!</p>
]]></content:encoded>
					
					<wfw:commentRss>https://stasosphere.com/entrepreneur-being/138-machine-learning-hallucinations-amazon/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>Online Scammers Are Getting More Creative</title>
		<link>https://stasosphere.com/entrepreneur-being/100-online-scammers-are-getting-more-creative/</link>
					<comments>https://stasosphere.com/entrepreneur-being/100-online-scammers-are-getting-more-creative/#respond</comments>
		
		<dc:creator><![CDATA[stas]]></dc:creator>
		<pubDate>Sat, 23 Sep 2017 02:20:28 +0000</pubDate>
				<category><![CDATA[General]]></category>
		<category><![CDATA[scam]]></category>
		<guid isPermaLink="false">http://stasosphere.com/entrepreneur-being/?p=100</guid>

					<description><![CDATA[My email address has been online since 1999, so I get a lot of spam emails, a big chunk of which is scam emails. Most of such scam emails you are well familiar with &#8211; the kind someone is asking you to help them transfer $5M inheritance to some poor widow. Obviously I never reply [&#8230;]]]></description>
										<content:encoded><![CDATA[<p><a href="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2017/09/Online-Scammers-Are-Getting-More-Creative.png"><img loading="lazy" decoding="async" class="size-medium wp-image-102 alignright" src="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2017/09/Online-Scammers-Are-Getting-More-Creative-300x300.png" alt="" width="300" height="300" srcset="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2017/09/Online-Scammers-Are-Getting-More-Creative-300x300.png 300w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2017/09/Online-Scammers-Are-Getting-More-Creative-150x150.png 150w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2017/09/Online-Scammers-Are-Getting-More-Creative-768x768.png 768w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2017/09/Online-Scammers-Are-Getting-More-Creative.png 800w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a>My email address has been online since 1999, so I get a lot of spam, a big chunk of which is scam emails. You are well familiar with most of these &#8211; the kind where someone asks you to help them transfer a $5M inheritance to some poor widow. Obviously I never reply to any of those.</p>
<p>Today one of the scammers almost got me.</p>
<p>I do <a href="https://stasosphere.com/healing/">healing work</a> amongst other things, and today I received what seemed to be a normal contact from a potential client, with a slightly odd side to it.</p>
</p>
<blockquote>
<p><b>Hello, i&#8217;m kylie , i would like to know if you are a Wellness Coach or Reiki Healing and if you accept credit card for payment?</b></p>
</p>
</blockquote>
<p>I follow up: yes, sure. Next I get:</p>
<p><span id="more-100"></span></p>
<blockquote>
<div dir="ltr">
<div><strong>the treatment is for 4 ladies in my family, pls get back to me with the total estimate for 2 times in a week 4 (Females) for 4 weeks and i want the session to be 1 hour, and i will like to know the total cost for the service</strong></div>
<div></div>
<div><strong>would prefer you treat them on</strong></div>
<div><strong>Depression</strong></div>
<div><strong>weight loss</strong></div>
<div><strong>reducing pain and discomfort?</strong></div>
<div><strong>nutrition</strong></div>
<div><strong>stress management and Anger management</strong></div>
<div><strong>relationships</strong></div>
<div></div>
</div>
</blockquote>
<div>I find it somewhat of an odd request, so I ask for a clarification, and I get:</div>
<div></div>
<blockquote>
<div><strong>both and I want you to know that I am diagnosed of lung cancer and i&#8217;m currently in the hospital right now due to my health issue and i want to secure this appointment for my family as a surprise to console them and keep them busy while am undergoing my surgery so i hope you can handle them very well? &#8230; could kindly get back to me with the total cost for 4 ladies times.2 sessions per week, times 4 weeks which is 32 sessions</strong></div>
<div></div>
</blockquote>
<div>I compassionately reply that it&#8217;s a nice gesture, send the cost for that number of sessions &#8211; a few thousand dollars worth of services and then get a follow up:</div>
<div></div>
<blockquote>
<div><strong>am okay with the price ,i will be making the payment now to secure the appointment and please i will like you to help me with little favor ?</strong></div>
<div></div>
</blockquote>
<div>Until now I thought it was an odd request, but this reply instantly puts me on guard. A friend was visiting me and I shared with her that someone is most likely trying to scam me. I reply &#8216;Yes, what&#8217;s the favor&#8217; and I receive the punch line:</div>
<div></div>
<blockquote>
<div><strong>here is the favor i need from you&#8230;</strong><br />
<strong>I want you to charge my credit card for the sum of $4750,once the money clear you are to keep $2,100 as payment for the services and help me send $2500 to my private Limo driver, via cash deposit and the money is for refreshment,transportation and accommodation fees he will also be their guardian for the entire weeks of the coaching</strong><br />
<strong>and am making this as a huge surprise to console them for my upcoming surgery operation, and i will like you to hold the remaining $150 as a tips for handling this for me,Thanks</strong></div>
<div></div>
</blockquote>
<div>Yeah, right. I almost participated in the first scam. Beware!</div>
<div></div>
<div>Re-reading the scammer&#8217;s messages now, many of the warning signs look obvious, but in retrospect it&#8217;s always easier to spot faults&#8230;</p>
<p>And of course it&#8217;s sad people abuse people&#8217;s pain &#8211; but what to do, that&#8217;s where they get us.</p>
</div>
<div></div>
<div>Also please make sure to read: <a title="Permanent Link to How To Reduce the Risk Of Getting Scammed and Owned Online" href="https://stasosphere.com/experience-life/2017/11/scam-owned-online-risk-reduction/" rel="bookmark">How To Reduce the Risk Of Getting Scammed and Owned Online</a>.</div>
]]></content:encoded>
					
					<wfw:commentRss>https://stasosphere.com/entrepreneur-being/100-online-scammers-are-getting-more-creative/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>How Amazon Discriminates Against Small Publishers</title>
		<link>https://stasosphere.com/entrepreneur-being/88-how-amazon-discriminates-against-small-publishers/</link>
					<comments>https://stasosphere.com/entrepreneur-being/88-how-amazon-discriminates-against-small-publishers/#comments</comments>
		
		<dc:creator><![CDATA[admin]]></dc:creator>
		<pubDate>Sat, 10 Jun 2017 19:28:04 +0000</pubDate>
				<category><![CDATA[General]]></category>
		<category><![CDATA[amazon]]></category>
		<category><![CDATA[publishing]]></category>
		<guid isPermaLink="false">http://stasosphere.com/entrepreneur-being/?p=88</guid>

					<description><![CDATA[It&#8217;s well known that Amazon is loved by online shoppers, mainly due to its amazing customer support. Amazon is now a lot more than a big online shop, as they annexed a bunch of other businesses to it. One of which is a publishing business. This department unfortunately can&#8217;t be called amazing. Here is my [&#8230;]]]></description>
										<content:encoded><![CDATA[<p><a href="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2017/06/Amazon-KDP-Discriminates-Against-Small-Publishers.png"><img loading="lazy" decoding="async" class="size-medium wp-image-91 alignright" src="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2017/06/Amazon-KDP-Discriminates-Against-Small-Publishers-300x300.png" alt="" width="300" height="300" srcset="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2017/06/Amazon-KDP-Discriminates-Against-Small-Publishers-300x300.png 300w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2017/06/Amazon-KDP-Discriminates-Against-Small-Publishers-150x150.png 150w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2017/06/Amazon-KDP-Discriminates-Against-Small-Publishers.png 619w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a>It&#8217;s well known that Amazon is loved by online shoppers, mainly due to its amazing customer support. Amazon is now a lot more than a big online shop, as they annexed a bunch of other businesses to it. One of which is a publishing business. This department unfortunately can&#8217;t be called amazing. Here is my little story to showcase what&#8217;s happening behind the scenes and how injustice is done to <em>the long tail.</em></p>
<p>As of this writing I have published 4 books: the first with O&#8217;Reilly and Associates, the second with Onyx Neon Press, and the most recent two I self-published via Amazon KDP (Kindle Direct Publishing).</p>
<p>Publishing via KDP was OK, they still have a long way to go to make things smooth, the main gift to us was to make it possible to publish books without needing to convince a mainstream publisher to want to publish them. When things work, they work, when there are problems, usually after stumbling through a few clueless support people, eventually you hit someone who knows their business and things get resolved. All was good.</p>
<p><span id="more-88"></span></p>
<p>Recently I started studying the fascinating science of Reichian Therapy, and stumbled upon a manuscript by Jack Willis called &#8220;Reichian Therapy &#8211; The Technique, For Home Use&#8221;.</p>
<p>He released his manuscript for free as a pdf on his website in 2008. <strong>The manuscript allows anybody to republish it for free or for profit</strong>. I quote from page 2 of the <a href="http://wayback.archive.org/web/20130329073908/http://reichiantherapy.net/book%20in%20pdf/Reich%20home%20Book.pdf">pdf</a>:</p>
<blockquote>
<p>Copyright (C) 2007 by Jack Willis.<br />
Edition 3 posted June 2008<br />
<strong>Authorization to reprint: This work may be copied, distrib-</strong><br />
<strong> uted, published and republished by any individual or entity for</strong><br />
<strong> free or as a commercial venture without payment of royalty fees</strong><br />
<strong> to the author</strong>. The book may be reformatted as need for publica-<br />
tion. Pictures may be substituted with identical pictures using<br />
different models. The only copyright restriction is that the text<br />
may not be changed for copied, distributed, published or repub-<br />
lished copies or editions. If reformatted for publication, the<br />
table of contents and the index may be re-calculated or omitted<br />
at the discretion of the person or entity doing the reformatting<br />
for publication.</p>
</blockquote>
<p>Jack Willis passed away in 2010 and his site is now gone, but luckily the Wayback Machine did it again and has a copy of the site here: <a href="http://wayback.archive.org/web/20130329073908/http://reichiantherapy.net/">http://reichiantherapy.net</a>.</p>
<p>I wanted to make the text readable on an ebook reader, so I OCRed the pdf, fixed spelling errors and typography issues, and spent a lot of time reformatting it for digital book use: merging paragraphs, moving images and footnotes to where they belonged in the text, etc.</p>
<p>After that I decided that others would probably benefit from this text in its digital format as well, and decided to upload it to my KDP bookshelf. This was something new, so I read through the materials KDP supplied on the issue of copyright. It seemed to me that, based on the authorization from the author quoted above, I was doing the right thing, so I made a nice book cover for it and uploaded the manuscript.</p>
<p>At the time of my upload there was only a print version of the book available, from a large publisher, &#8220;New Falcon Publications&#8221;, which they published on August 21, 2013, and which they sell for $40. They renamed the book to: &#8220;<a href="http://amzn.to/2sprFk5">Reichian Therapy: A Practical Guide for Home Use</a>&#8221;.</p>
<p>After several days of waiting I received an email from KDP support:</p>
<blockquote>
<p>Hello,</p>
<p>Thank you for publishing with Amazon. Copyright is important to us &#8211; we want to make sure that no author or other copyright holder has his or her books sold by anyone else. To publish your book, please respond with documentation confirming your publishing rights within four days:</p>
<p>Reichian Therapy: The Technique, for Home Use by Willis, Jack (AUTHOR) (ID:9084681)</p>
<p>Acceptable documentation includes:</p>
<p>&#8211; A contract or statement from the author or publisher verifying you retain publishing rights<br />
&#8211; An e-mail from the address listed on the official author or agent&#8217;s website<br />
&#8211; For authors using a pseudonym, copyright registration or statement of pseudonym use</p>
<p>If you publish books for which you do not hold the publishing rights, your account may be terminated.</p>
</blockquote>
<p>I supplied the documentation emailing them the pdf released by the author and quoting the authorization I pasted earlier in this article.</p>
<p>A day later I received another email from KDP support:</p>
<blockquote>
<p>Hello,</p>
<p>Thank you for the information you provided regarding the following book(s):</p>
<p>Reichian Therapy: The Technique, for Home Use (ID:9084681)</p>
<p><strong>Prior to your submission, we received a notice and takedown for a book that matches to yours, from a third party claiming that the distribution of the book above was not properly authorized due to copyright infringement.</strong></p>
<p>We don&#8217;t involve ourselves in third party disputes and because we have not received any communication from the involved parties that the matter has been resolved, we have determined that we will not be making the book(s) available for sale on Amazon at this time.</p>
<p>We appreciate your understanding in this matter.</p>
</blockquote>
<p>As a result of that, the entry for that book on my Amazon KDP bookshelf had a big BLOCKED on it (unfortunately no snapshot).</p>
<p>After a bit of thinking I realized that the conflict they referred to was probably with &#8220;New Falcon Publications&#8221;, who obviously don&#8217;t want competition, yet won&#8217;t make the effort to release a Kindle version of the book. Why? Let me guess &#8211; probably because $40 is much better than the maximum of $9.99 one can charge for a Kindle version of any book. My only guess is that whoever published the digital version before me received a takedown notice from &#8220;New Falcon Publications&#8221;, because if you follow the money trail, there is nobody else who currently profits from this book.</p>
<p>I replied to KDP support, saying that I understood the situation and asking them to remove that blocked entry from my bookshelf.</p>
<p>The same day I received a reply:</p>
<blockquote>
<p>Hello,</p>
<p>At this time, it is not possible to completely remove an unpublished book from the Bookshelf. I&#8217;m sorry if this may cause you any inconvenience.</p>
</blockquote>
<p>OK, I said, weird, but what to do. I moved on.</p>
<p>Having done the work, I didn&#8217;t want it to go wasted, so I made it available for free on my website under <a href="https://chestofbooks.com/health/psychology/Jack-Willis-Reichian-Therapy/">Reichian Therapy</a>.</p>
<p>Two weeks passed and I got a new email from KDP:</p>
<blockquote>
<p>Subject: Your Amazon KDP Account</p>
<p>Hello,</p>
<p><strong>Due to copyright infringement</strong> and Content Guideline violations with respect to books you have submitted through your account, <strong>specifically, the submission of content for which you did not have the necessary rights, we are terminating your account</strong> and your Agreement effective immediately. Here are some examples of the book(s) that you submitted through your account that fall into this category:</p>
<p>Reichian Therapy: The Technique, for Home Use, by Willis, Jack (Title ID: 9084681)</p>
<p>As part of the termination process, we will close your KDP account and the related CreateSpace account (if any) and remove the books you have uploaded through our channels from sale on Amazon. Note that you are no longer eligible to receive unpaid royalties for sales that occurred prior to this termination.</p>
<p>Additionally, as per our Terms and Conditions, you are not permitted to open any new KDP and/or CreateSpace accounts.</p>
<p>If you have any questions, please email us at content-review@amazon.com.</p>
<p>Best Regards,</p>
<p>Claudia Z.</p>
</blockquote>
<p>Say what?! I sent a reply asking for a clarification, since it was very odd that they decided that I infringed on a copyright when I had sent them the manuscript&#8217;s authorization that allows anybody to republish it. A few days later I received not a clarification, but a different version of the same:</p>
<blockquote>
<p>Subject: Your Amazon KDP Account</p>
<p>Hello,</p>
<p>Thank you for the email concerning the status of your account.</p>
<p>After reviewing your response, we have reevaluated the Content Guideline violations relating to the books in your account.</p>
<p><strong>We found that you have uploaded material through your account for which you do not have the necessary rights.</strong></p>
<p>As a result, we are upholding our previous decision to terminate your KDP account and remove all your books from Amazon.</p>
<p>If you have any questions regarding this issue, please email us at content-review@amazon.com.</p>
<p>If you would like to review our Content Guidelines, please visit: https://kdp.amazon.com/self-publishing/help?topicId=A2TOZW0SV7IR1U</p>
<p>Best Regards,</p>
<p>Luca F</p>
</blockquote>
<p>I hope you can see the obvious.</p>
<p>I next called KDP, trying to talk to someone who could explain what the real problem was. You can&#8217;t call KDP directly, so you call normal Amazon customer support at 1-866-216-1072 and press 0 to talk to an agent. This is general customer service; tell them you want to talk to KDP, and they will check your identity and transfer you to KDP. Ideally, have a recent amazon.com order number at hand if you have one, otherwise it&#8217;s going to take an hour or so for the usually incompetent agent somewhere in the 3rd world to find and validate your account. One time the agent claimed that I was shopping with Amazon Mexico, when I&#8217;m from Canada and have never used Amazon Mexico in my life.</p>
<p>After the very inefficient process of validation I was redirected to someone who listened to my story and told me that my account got closed due to a ransomware virus on my computer. He was very sure of that. It took me a few moments of bewildered surprise to connect the dots and realize I was talking to the wrong person: I hadn&#8217;t been redirected to KDP support, but was talking to yet another clueless person at tech support.</p>
<p>An hour later with a bunch more validations I reached someone at KDP. It immediately felt that I reached a higher place, since the tone of voice on the other side was intelligent and caring for a change. I was relieved. After telling my situation I was redirected to yet another person, who listened to me, said that there is no way to contact the content review team and that he (Chris) will inquire for me and get back to me in 2 business days. I said thank you very much and waited.</p>
<p>One odd thing Chris told me was that if the content review team doesn&#8217;t solve the problem, I should just open another account and they will transfer all the book reviews from the old account to the new one. This is despite the content review team threatening me in every email that I&#8217;m not allowed to open another account with KDP and that if I do they will never pay me a cent. Hello, Amazon?! Can you decide on your policies and not provide misleading information?</p>
<p>A week passed by, and nobody contacted me. So I contacted KDP again, again going through many agents before reaching the right people, patiently waiting for the strangely untrained agents who again and again asked me to validate my address and often were not able to do so. Yet, eventually I reached KDP again and, after telling my story twice, I was redirected to someone senior (Karthis), who again asked me to tell the story, said it made sense, said again that he couldn&#8217;t talk to anybody at content review, and promised to prioritize this issue for them to get back to me within 2 business days.</p>
<p>Another week passed by with no reply from anybody. I called KDP again, [fast forward the painful reaching them process], and after several bounces again reached Karthis, who promised the same thing. I objected and said that nothing will be different, and can you please talk to someone in person and tell me the real reason for the termination. The answer was the same, he will prioritize it even more.</p>
<p>Luckily this time I did receive a reply the next day, which gave me no new information, other than repeating its old threats:</p>
<blockquote>
<p>Hello,</p>
<p>We&#8217;ve reviewed the information you provided and we are upholding our previous decision to terminate your account and remove all your books from sale on Amazon.</p>
<p>As a result, we will not be reinstating your account.</p>
<p>Please note that, per our Terms and Conditions, you are not permitted to open new accounts and will not receive future royalty payments from additional accounts created.</p>
<p>Best Regards,</p>
<p>John M</p>
</blockquote>
<p>Obviously nobody wants to tell me the real reason for the false allegation.</p>
<p><strong>Amazon, wake up! We authors make money for you! You wouldn&#8217;t exist if it weren&#8217;t for people giving you stuff to sell. We appreciate the platform and the convenience, but you have just shut down an author who wrote honest books, for some kind of mysterious conspiracy BS that you don&#8217;t even have the guts to disclose.</strong></p>
<p>Also notice despite the &#8220;<strong>we are upholding our previous decision to terminate your account and remove all your books from sale on Amazon</strong>&#8221; that Amazon did not remove all my books from sale on Amazon.com, they only removed the books that I self-published. The two other books that I have authored, Amazon continues selling because they have been published by big publishers:</p>
<ol>
<li><a href="http://amzn.to/2s8Yx0a">Practical mod_perl</a></li>
<li><a href="http://amzn.to/2sfiKSV">mod_perl 2 User&#8217;s Guide</a></li>
</ol>
<p>So Amazon still sells books by a criminal, which they alleged me to be. This is so inconsistent and illogical.</p>
<p>The bottom line is that my effort to make a copyright-free digital work that is no longer available accessible to the public backfired, with Amazon shutting down my account and banning me from ever publishing books with it. Very nice job, Amazon.</p>
<p>I don&#8217;t really care about having my account reinstated, since I will never work with Amazon KDP again, by my own choice. I don&#8217;t work with people or companies who lack integrity, play ambiguous games and display so much incompetence. It&#8217;s just sad that the company won&#8217;t honestly communicate with the authors it relies on to make its profit.</p>
<p>Good bye, Amazon KDP.</p>
<p>p.s. I patiently waited for almost 3 weeks hoping this gets resolved, but clearly the one thing KDP wouldn&#8217;t do is to admit its mistake and reinstate my account. Meanwhile my books have been unavailable for sale. I need to start looking for a new publisher. If you have a good experience with a different publisher, please kindly share. My <a href="https://stasosphere.com/experience-life/my-books/">recent 2 books</a> are in the self-help/health category.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://stasosphere.com/entrepreneur-being/88-how-amazon-discriminates-against-small-publishers/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>How To Get Hired &#8211; an Infallible Scheme Invented by Mark Twain</title>
		<link>https://stasosphere.com/entrepreneur-being/60-how-to-get-hired-scheme-by-mark-twain/</link>
					<comments>https://stasosphere.com/entrepreneur-being/60-how-to-get-hired-scheme-by-mark-twain/#comments</comments>
		
		<dc:creator><![CDATA[stas]]></dc:creator>
		<pubDate>Wed, 16 Dec 2015 20:29:46 +0000</pubDate>
				<category><![CDATA[General]]></category>
		<guid isPermaLink="false">http://stasosphere.com/entrepreneur-being/?p=60</guid>

					<description><![CDATA[Mark Twain invented an infallible scheme for anyone to get hired. He describes his scheme and how he got three men to get a job using this scheme in the following text. The approach worked more than 100 years ago and chances are that it&#8217;ll still work just the same today as human nature hasn&#8217;t [&#8230;]]]></description>
										<content:encoded><![CDATA[<p><a href="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2015/12/young-mark-twain.jpg" rel="attachment wp-att-61"><img loading="lazy" decoding="async" class="size-medium wp-image-61 alignleft" src="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2015/12/young-mark-twain-255x300.jpg" alt="young mark twain" width="255" height="300" srcset="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2015/12/young-mark-twain-255x300.jpg 255w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2015/12/young-mark-twain-768x904.jpg 768w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2015/12/young-mark-twain-870x1024.jpg 870w, https://stasosphere.com/entrepreneur-being/wp-content/uploads/2015/12/young-mark-twain.jpg 1000w" sizes="auto, (max-width: 255px) 100vw, 255px" /></a>Mark Twain invented an infallible scheme for anyone to get hired. He describes his scheme and how he got three men to get a job using this scheme in the following text. The approach worked more than 100 years ago and chances are that it&#8217;ll still work just the same today as human nature hasn&#8217;t really changed that much in such a &#8220;short&#8221; period of time.</p>
<hr width="50%" />
<p>Higbie was the first person to profit by my great and infallible scheme for finding work for the unemployed. I have tried that scheme, now and then, for forty-four years. So far as I am aware it has always succeeded, and it is one of my high prides that I invented it, and that in basing it upon what I conceived to be a fact of human nature I estimated that fact of human nature accurately.</p>
<p>Higbie and I were living in a cotton-domestic lean-to at the base of a mountain. It was very cramped quarters, with barely room for us and the stove &#8211; wretched quarters indeed, for every now and then, between eight in the morning and eight in the evening, the thermometer would make an excursion of fifty degrees. We had a silver-mining claim under the edge of a hill half a mile away, in partnership with Bob Howland and Horatio Phillips, and we used to go there every morning carrying with us our luncheon, and remain all day picking and blasting in our shaft, hoping, despairing, hoping again, and gradually but surely running out of funds. At last, when we were clear out and still had struck nothing, we saw that we must find some other way of earning a living. I secured a place in a near-by quartz mill to screen sand with a long-handled shovel. I hate a long-handled shovel. I never could learn to swing it properly. As often as any other way the sand didn&#8217;t reach the screen at all, but went over my head and down my back, inside of my clothes. It was the most detestable work I have ever engaged in, but it paid ten dollars a week and board &#8211; and the board was worth while, because it consisted not only of bacon, beans, coffee, bread and molasses, but we had stewed dried apples every day in the week just the same as if it were Sunday. But this palatial life, this gross and luxurious life, had to come to an end, and there were two sufficient reasons for it. On my side, I could not endure the heavy labor; and on the Company&#8217;s side, they did not feel justified in paying me to shovel sand down my back; so I was discharged just at the moment that I was going to resign.</p>
<p><span id="more-60"></span>If Higbie had taken that job all would have been well and everybody satisfied, for his great frame would have been competent. He was muscled like a giant. He could handle a long-handled shovel like an emperor, and he could work patiently and contentedly twelve hours on a stretch without ever hastening his pulse or his breath. Meantime, he had found nothing to do, and was somewhat discouraged. He said, with an outburst of pathetic longing, &#8220;If I could only get a job at the Pioneer!&#8221;</p>
<p>I said &#8220;What kind of a job do you want at the Pioneer?&#8221;</p>
<p>He said &#8220;Why, laborer. They get five dollars a day.&#8221;</p>
<p>I said &#8220;If that&#8217;s all you want I can arrange it for you.&#8221;</p>
<p>Higbie was astonished. He said &#8220;Do you mean to say that you know the foreman there and could get me a job and yet have never said anything about it?&#8221;</p>
<p>&#8220;No&#8221; I said, &#8220;I don&#8217;t know the foreman.&#8221;</p>
<p>&#8220;Well&#8221; he said, &#8220;who is it you know? How is it you can get me the job?&#8221;</p>
<p>&#8220;Why,&#8221; I said, &#8220;that&#8217;s perfectly simple. If you will do as I tell you to do, and don&#8217;t try to improve on my instructions, you shall have the job before night.&#8221;</p>
<p>He said eagerly &#8220;I&#8217;ll obey the instructions, I don&#8217;t care what they are.&#8221;</p>
<p>&#8220;Well,&#8221; I said, &#8220;go there and say that you want work as a laborer; that you are tired of being idle; that you are not used to being idle, and can&#8217;t stand it; that you just merely want the refreshment of work, and require nothing in return.&#8221;</p>
<p>He said &#8220;Nothing?&#8221;</p>
<p>I said, &#8220;That&#8217;s it &#8211; nothing.&#8221;</p>
<p>&#8220;No wages at all?&#8221;</p>
<p>&#8220;No, no wages at all.&#8221;</p>
<p>&#8220;Not even board?&#8221;</p>
<p>&#8220;No, not even board. You are to work for nothing. Make them understand that &#8211; that you are perfectly willing to work for nothing. When they look at that figure of yours that foreman will understand that he has drawn a prize. You&#8217;ll get the job.&#8221;</p>
<p>Higbie said indignantly, &#8220;Yes, a hell of a job.&#8221;</p>
<p>I said, &#8220;You said you were going to do it, and now you are already criticising. You have said you would obey my instructions. You are always as good as your word. Clear out, now, and get the job.&#8221;</p>
<p>He said he would.</p>
<p>I was pretty anxious to know what was going to happen &#8211; more anxious than I would have wanted him to find out. I preferred to seem entirely confident of the strength of my scheme, and I made good show of that confidence. But really I was very anxious. Yet I believed that I knew enough of human nature to know that a man like Higbie would not be flung out of that place without reflection when he was offering those muscles of his for nothing. The hours dragged along and he didn&#8217;t return. I began to feel better and better. I began to accumulate confidence. At sundown he did at last arrive and I had the joy of knowing that my invention had been a fine inspiration and was successful.</p>
<p>He said the foreman was so astonished at first that he didn&#8217;t know how to take hold of the proposition, but that he soon recovered and was evidently very glad that he was able to accommodate Higbie and furnish him the refreshment he was pining for.</p>
<p>Higbie said &#8220;How long is this to go on?&#8221;</p>
<p>I said &#8220;The terms are that you are to stay right there; do your work just as if you were getting the going wages for it. You are never to make any complaint; you are never to indicate that you would like to have wages or board. This will go on one, two, three, four, five, six days, according to the make of that foreman. Some foremen would break down under the strain in a couple of days. There are others who would last a week. It would be difficult to find one who could stand out a whole fortnight without getting ashamed of himself and offering you wages. Now let&#8217;s suppose that this is a fortnight-foreman. In that case you will not be there a fortnight. Because the men will spread it around that the very ablest laborer in this camp is so fond of work that he is willing and glad to do it without pay. You will be regarded as the latest curiosity. Men will come from the other mills to have a look at you. You could charge admission and get it, but you mustn&#8217;t do that. Stick to your colors. When the foremen of the other mills cast their eyes upon this bulk of yours and perceive that you are worth two ordinary men they&#8217;ll offer you half a man&#8217;s wages. You are not to accept until you report to your foreman. Give him an opportunity to offer you the same. If he doesn&#8217;t do it then you are free to take up with that other man&#8217;s offer. Higbie, you&#8217;ll be foreman of a mine or a mill inside of three weeks, and at the best wages going.&#8221;</p>
<p>It turned out just so-and after that I led an easy life, with nothing to do, for it did not occur to me to take my own medicine. I didn&#8217;t want a job as long as Higbie had one. One was enough for so small a family-and so during many succeeding weeks I was a gentleman of leisure, with books and newspapers to read and stewed dried apples every day for dinner the same as Sunday, and I wanted no better career than this in this life. Higbie supported me handsomely, never once complained of it, never once suggested that I go out and try for a job at no wages and keep myself.</p>
<p>That would be in 1862. I parted from Higbie about the end of &#8217;62 or possibly it could have been the beginning of &#8217;63 and went to Virginia City, for I had been invited to come there and take William H. Wright&#8217;s place as sole reporter on the Territorial Enterprise and do Wright&#8217;s work for three months while he crossed the plains to Iowa to visit his family. However I have told all about this in &#8220;Roughing It.&#8221;</p>
<p>I have never seen Higbie since, in all these forty-four years.</p>
<p>Shortly after my marriage, in 1870, I received a letter from a young man in St. Louis who was possibly a distant relative of mine-I don&#8217;t remember now about that-but his letter said that he was anxious and ambitious to become a journalist-and would I send him a letter of introduction to some St. Louis newspaper and make an effort to get him a place as a reporter-It was the first time I had had an opportunity to make a new trial of my great scheme. I wrote him and said I would get him a place on any newspaper in St. Louis; he could choose the one he preferred, but he must promise me to faithfully follow out the instructions which I should give him. He replied that he would follow out those instructions to the letter and with enthusiasm. His letter was overflowing with gratitude-premature gratitude. He asked for the instructions. I sent them. I said he must not use a letter of introduction from me or from any one else. He must go to the newspaper of his choice and say that he was idle, and weary of being idle, and wanted work-that he was pining for work, longing for work-that he didn&#8217;t care for wages, didn&#8217;t want wages, but would support himself-he wanted work, nothing but work, and not work of a particular kind, but any kind of work they would give him to do. He would sweep out the editorial rooms; he would keep the ink-stands full, and the mucilage bottles, he would run errands, he would make himself useful in every way he could.</p>
<p>I suspected that my scheme would not work with everybody-that some people would scorn to labor for nothing, and would think it matter for self-contempt; also that many persons would think me a fool to suggest such a project; also that many persons would not have character enough to go into the scheme in a determined way and test it. I was interested to know what kind of a candidate this one was, but of course I had to wait some time to find out. I had told him he must never ask for wages; he must never be beguiled into making that mistake; that sooner or later an offer of wages would come from somewhere, and in that case he must go straight to his employer and give him the opportunity to offer him the like wages, in which case he must stay where he was-that as long as he was in anybody&#8217;s employ he must never ask for an advance of wages; that would always come from somewhere else if he proved his worthiness.</p>
<p>The scheme worked again. That young fellow chose his paper, and during the first few days he did the sweeping out and other humble work; and kept his mouth shut. After that the staff began to take notice of him. They saw that they could employ him in lots of ways that saved time and effort for them at no expense. They found that he was alert and willing. They began presently to widen his usefulness. Then he ventured to risk another detail of my instructions; I had told him not to be in a hurry about it, but to make his popularity secure first. He took up that detail now. When he was on his road between office and home, and when he was out on errands, he kept his eyes open and whenever he saw anything that could be useful in the local columns he wrote it out, then went over it and abolished adjectives, went over it again and extinguished other surplusages, and finally when he got it boiled down to the plain facts with the ruffles and other embroideries all gone, he laid it on the city editor&#8217;s desk. He scored several successes, and saw his stuff go into the paper unpruned. Presently the city editor when short of help sent him out on an assignment. He did his best with it, and with good results. This happened with more and more frequency. It brought him into contact with all the reporters of all the newspapers. He made friends with them and presently one of them told him of a berth that was vacant, and that he could get it and the wages too. He said he must see his own employers first about it. In strict accordance with my instructions he carried the offer to his own employers, and the thing happened which was to be expected. They said they could pay that wage as well as any other newspaper-stay where he was.</p>
<p>This young man wrote me two or three times a year and he always had something freshly encouraging to report about my scheme. Now and then he would be offered a raise by another newspaper. He carried the news to his own paper; his own paper stood the raise every time and he remained there. Finally he got an offer which his employers could not meet and then they parted. This offer was a salary of three thousand a year, to be managing editor on a daily in a Southern city of considerable importance, and it was a large wage for that day and region. He held that post three years. After that I never heard of him any more.</p>
<p>About 1886 my nephew, Samuel E. Moffett, a youth in the twenties, lost his inherited property and found himself obliged to hunt for something to do by way of making a living. He was an extraordinary young fellow in several ways. A nervous malady had early unfitted him for attending school in any regular way, and he had come up without a school education-but this was no great harm for him, for he had a prodigious memory and a powerful thirst for knowledge. At twelve years he had picked up, through reading and listening, a large and varied treasury of knowledge, and I remember one exhibition of it which was very offensive to me. He was visiting in our house and I was trying to build a game out of historical facts drawn from all the ages. I had put in a good deal of labor on this game, and it was hard labor, for the facts were not in my head. I had to dig them painfully out of the books. The boy looked over my work, found that my facts were not accurate and the game, as it stood, not usable. Then he sat down and built the whole game out of his memory. To me it was a wonderful performance, and I was deeply offended.</p>
<p>As I have said, he wrote me from San Francisco in his early twenties, and said he wanted to become a journalist, and would I send him some letters of introduction to the newspaper editors of that city-I wrote back and put him strictly under those same old instructions. I sent him no letter of introduction and forbade him to use one furnished by anybody else. He followed the instructions strictly. He went to work in the Examiner, a property of William R. Hearst. He cleaned out the editorial rooms and carried on the customary drudgeries required by my scheme. In a little while he was on the editorial staff at a good salary. After two or three years the salary was raised to a very good figure indeed. After another year or two he handed in his resignation-for in the meantime he had married and was living in Oakland, or one of those suburbs, and did not like the travel to and fro between the newspaper and his home in the late hours of the night and the morning. Then he was told to stay in Oakland, write his editorials there and send them over, and the large salary was continued. By and by he was brought to New York to serve on Mr. Hearst&#8217;s New York paper, and when he finally resigned from that employment he had been in Mr. Hearst&#8217;s employ sixteen years without a break. Then he became an editorial writer on the New York World with the privilege of living out of town and sending his matter in. His wage was eight thousand dollars a year. A couple of years ago Collier&#8217;s Weekly offered him an easy berth and one which was particularly desirable in his case, since it would deal mainly with historical matters, past and present and that was an industry which he liked. The salary was to be ten thousand dollars. He came to me for advice, and I told him to accept, which he did. When Mr. Pulitzer found that he was gone from the World he was not pleased with his managing editor for letting him go, but his managing editor was not to blame. 
He didn&#8217;t know that Moffett was going until he received his resignation. Pulitzer offered Moffett a billet for twenty years, this term to be secured in such a way that it could not be endangered by Pulitzer&#8217;s death, and to this offer was added the extraordinary proposition that Moffett could name his own salary. But of course Moffett remains with Collier, his agreement with Collier&#8217;s having been already arrived at satisfactorily to both parties.</p>
<hr width="50%" />
<p>Written by Mark Twain on Tuesday, March 27, 1906 in <a href="http://www.amazon.com/dp/B00413QAFG/?tag=theultimatlearna" rel="nofollow">Autobiography of Mark Twain, Vol. 1</a>, pages 447-451.</p>
<p>The amazing <a href="http://www.amazon.com/dp/B00413QAFG/?tag=theultimatlearna" rel="nofollow">Autobiography of Mark Twain</a> was published 100 years after his death. Samuel Langhorne Clemens (aka Mark Twain) willed it so, in order that he could write freely and without censorship, and be sure that his honest and direct account of the matters at hand would not &#8220;hurt&#8221; anybody still alive. This style of delivery, unusually unconstrained for that era, makes it a very delightful read. The full text is also freely available at <a href="http://www.marktwainproject.org" rel="nofollow">marktwainproject.org</a>.</p>
<p>If you have used this strategy or know someone who did, please share your story with the rest of the readers. Thank you.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://stasosphere.com/entrepreneur-being/60-how-to-get-hired-scheme-by-mark-twain/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>How to Get Your Business Cards for Free</title>
		<link>https://stasosphere.com/entrepreneur-being/34-how-to-get-your-business-cards-for-free/</link>
					<comments>https://stasosphere.com/entrepreneur-being/34-how-to-get-your-business-cards-for-free/#comments</comments>
		
		<dc:creator><![CDATA[admin]]></dc:creator>
		<pubDate>Fri, 11 Nov 2011 08:09:34 +0000</pubDate>
				<category><![CDATA[Advertising]]></category>
		<guid isPermaLink="false">http://stasosphere.com/entrepreneur-being/?p=34</guid>

					<description><![CDATA[Update: this is an old article &#8211; VistaPrint no longer gives away free business cards. Running a small business can be tough, especially in the current economic conditions where clients aren&#8217;t easy to come by, so keeping your expenses to a minimum is a sure way to improve your bottom line. When it [&#8230;]]]></description>
										<content:encoded><![CDATA[<p><strong>Update: this is an old article &#8211; VistaPrint no longer gives away free business cards.</strong></p>
<p>Running a small business can be tough, especially in the current economic conditions where clients aren&#8217;t easy to come by, so keeping your expenses to a minimum is a sure way to improve your bottom line. Promotional materials are one place where it is easy to spend a lot of money, and business cards are the most common way to promote one&#8217;s business. It&#8217;s the magic handshake protocol most entrepreneurs are so familiar with:</p>
<p>&#8211; &#8220;What do you do?&#8221;</p>
<p>&#8211; &#8220;I&#8217;m a Registered Massage Therapist. What about you?&#8221;</p>
<p>&#8211; &#8220;I&#8217;m a Reiki Master, and here is my business card. Can I have your card?&#8221;</p>
<p>&#8211; &#8220;Of course, here it is.&#8221;</p>
<p>The Internet has been around for a relatively long time by now, yet business cards aren&#8217;t a thing of the past. It will probably be quite some time before people stop carrying them around and leaving a few here and there on bulletin boards in cafés and community places.</p>
<p>Business cards have been around for centuries (in China since the 15th century, in Europe since the 17th century). So it&#8217;s not surprising that these days there are many shops that will print your business cards in various shapes and forms, using different fancy kinds of paper and color. But they can be quite expensive, especially if someone designs them from scratch, since you&#8217;re paying for their time, their rent costs, etc.</p>
<p><span id="more-34"></span></p>
<p>I&#8217;d like to introduce you to a different way of getting your cards done, and for free (paying only for shipping). I have been using VistaPrint for probably at least 5 years now. About once a year I get their package of 250 free business cards, and each time I pay only for shipping. You design your own card using their online card design software, which is not fancy but does what you need: it lets you put the essential information on the business card while making it look good. Here is one of the recent cards I&#8217;ve designed using the VistaPrint service.</p>
<div align="center"><img loading="lazy" decoding="async" class="aligncenter wp-image-35 size-full" style="align: center; vertical-align: middle;" title="vista-print-business-card" src="https://stasosphere.com/entrepreneur-being/wp-content/uploads/2011/11/vista-print-business-card.png" alt="vista-print-business-card" width="299" height="177" /></div>
<p>They have a limited choice of backgrounds included in the free business card package, and for a small extra cost you can add your own or browse through hundreds of other background designs. It&#8217;s tempting to spend a little extra, but I have always been able to just choose one I liked from the free ones and go with it.</p>
<p>Once you&#8217;re happy with your card&#8217;s design, the checkout process starts. If you&#8217;ve ever been to Ikea, where they drag you through the whole shop to show you all of their wares, the process is similar here. They will try to upsell you 10 other things before you reach the payment page: magnets, mugs, stickers, etc. Unless you really <strong>need</strong> any of those things, simply press &#8220;Next&#8221; until you get to the checkout page, pay and be done with it. The only cost here is shipping, which usually comes to about $9. It looks like they ship pretty much anywhere in the world.</p>
<p>One important thing to mention is that they will offer to leave the back of the card blank for something like $5 extra, instead of printing a little one-line text ad for the VistaPrint site there. I don&#8217;t pay the $5, because I actually like having that ad there: I tell everybody I meet about the great deal VistaPrint gives to people and show them the back of the card so they won&#8217;t forget the URL of the site. I think VistaPrint should pay me to have that ad on the back of the card, rather than me paying them to remove it, since I actively help them promote their business. If you really don&#8217;t want the little ad, I&#8217;d suggest putting a calendar for the coming year on the back. People are more likely to carry your card around, as small year calendars are always handy, even in the digital age.</p>
<p>Once you have paid, the package should arrive within a few weeks. I have ordered from them probably about 10 times so far and never had any quality or delivery problems.</p>
<p>As I have only ever used one free business card printing company, I don&#8217;t know of any others I can recommend. If you have recommendations for other print shops that give away free business cards, let me know and I&#8217;ll add links to those companies as well. Thank you.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://stasosphere.com/entrepreneur-being/34-how-to-get-your-business-cards-for-free/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
	</channel>
</rss>
