<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>coding.vision</title>
    <description>Dan's Programming Notebook</description>
    <link>https://codingvision.net/</link>
    <atom:link href="https://codingvision.net/feed.xml" rel="self" type="application/rss+xml"/>
    <pubDate>Wed, 10 Feb 2021 08:31:26 +0000</pubDate>
    <lastBuildDate>Wed, 10 Feb 2021 08:31:26 +0000</lastBuildDate>
    <generator>Jekyll v3.9.0</generator>
    
      <item>
        <title>Build Tesseract 5 in Conda Environment</title>
        <description>&lt;p&gt;Here’s a short guide to building &lt;strong&gt;Tesseract 5&lt;/strong&gt; from source (master branch on GitHub).&lt;/p&gt;

&lt;p&gt;I’m writing this mainly because, at the time of writing, conda only offers Tesseract packages up to version 4.1.1. The other reason is that the cluster I’m compiling Tesseract on runs CentOS 7 and permits only inside-environment changes, so I can’t install packages with yum.&lt;/p&gt;

&lt;h5 id=&quot;in-this-guide-im-using-gccg-version-620-it-is-recommended-to-use-recent-versions-when-compiling-tesseract-5-for-example-the-build-fails-with-gccg-485&quot;&gt;In this guide I’m using &lt;strong&gt;gcc/g++&lt;/strong&gt; version &lt;strong&gt;6.2.0&lt;/strong&gt;; it is recommended to use recent versions when compiling Tesseract 5. For example, the build fails with gcc/g++ 4.8.5.&lt;/h5&gt;

&lt;h2 id=&quot;building-steps&quot;&gt;Building Steps&lt;/h2&gt;

&lt;ol&gt;
  &lt;li&gt;Create your &lt;strong&gt;conda environment&lt;/strong&gt; and activate it:
    &lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;conda create &lt;span class=&quot;nt&quot;&gt;--name&lt;/span&gt; tess-build 
conda activate tess-build
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;Install the following dependencies. You’ll need at least leptonica 1.74 for this to work - I’m using 1.78.0.
    &lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;conda &lt;span class=&quot;nb&quot;&gt;install&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-c&lt;/span&gt; conda-forge automake
conda &lt;span class=&quot;nb&quot;&gt;install&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-c&lt;/span&gt; conda-forge libtool
conda &lt;span class=&quot;nb&quot;&gt;install&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-c&lt;/span&gt; conda-forge pkgconfig
conda &lt;span class=&quot;nb&quot;&gt;install&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-c&lt;/span&gt; conda-forge leptonica
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;Clone the latest Tesseract version from the master branch and navigate into the directory:
    &lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;git clone https://github.com/tesseract-ocr/tesseract.git
&lt;span class=&quot;nb&quot;&gt;cd &lt;/span&gt;tesseract
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;Run the following scripts to prepare the build:
    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;./autogen.sh
./configure
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;Conda might not include the path to its libraries in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LD_LIBRARY_PATH&lt;/code&gt; environment variable. I had to add it manually; otherwise, the build fails during linking:
    &lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;nb&quot;&gt;export &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;LD_LIBRARY_PATH&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$LD_LIBRARY_PATH&lt;/span&gt;:~/.conda/envs/tess-build/lib
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;Run the makefile:
    &lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;make
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;Set the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TESSDATA_PREFIX&lt;/code&gt; environment variable so Tesseract knows where to look for language packs; also, download the &lt;strong&gt;eng&lt;/strong&gt; (default) language pack into &lt;strong&gt;tessdata&lt;/strong&gt;:
    &lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;nb&quot;&gt;export &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;TESSDATA_PREFIX&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$HOME&lt;/span&gt;/tesseract/tessdata
wget https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata &lt;span class=&quot;nt&quot;&gt;-P&lt;/span&gt; tessdata/
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;See if it works:
    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;(tess-build) [dan.sporici@hpsl-wn02 tesseract]$ ./tesseract -v
tesseract 5.0.0-alpha-781-gb19e3
leptonica-1.78.0
libgif 5.2.1 : libjpeg 9d : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 1.0.2 : libopenjp2 2.3.1
Found AVX
Found SSE
Found OpenMP 201511
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;(tess-build) [dan.sporici@hpsl-wn02 tesseract]$ ./tesseract --list-langs
List of available languages (1):
eng
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;possible-leptonica-linking-issue&quot;&gt;Possible Leptonica Linking Issue&lt;/h2&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;/usr/bin/ld: warning: libpng16.so.16, needed by /.conda/envs/tess-build/lib/liblept.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libjpeg.so.9, needed by /.conda/envs/tess-build/lib/liblept.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libgif.so.7, needed by /.conda/envs/tess-build/lib/liblept.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libwebp.so.7, needed by /.conda/envs/tess-build/lib/liblept.so, not found (try using -rpath or -rpath-link)
/.conda/envs/tess-build/lib/liblept.so: undefined reference to `png_create_read_struct@PNG16_0'
/.conda/envs/tess-build/lib/liblept.so: undefined reference to `DGifOpen'
/.conda/envs/tess-build/lib/liblept.so: undefined reference to `png_get_PLTE@PNG16_0'
/.conda/envs/tess-build/lib/liblept.so: undefined reference to `jpeg_std_error@LIBJPEG_9.0' 
/.conda/envs/tess-build/lib/liblept.so: undefined reference to `png_write_image@PNG16_0'
/.conda/envs/tess-build/lib/liblept.so: undefined reference to `EGifPutScreenDesc'
/.conda/envs/tess-build/lib/liblept.so: undefined reference to `EGifPutComment'
/.conda/envs/tess-build/lib/liblept.so: undefined reference to `WebPEncodeRGBA'
[...]
/.conda/envs/tess-build/lib/liblept.so: undefined reference to `png_init_io@PNG16_0'
collect2: error: ld returned 1 exit status
make[2]: *** [tesseract] Error 1
make[2]: Leaving directory `/tesseract'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/tesseract'
make: *** [all] Error 2
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This happens because the libraries in question (&lt;strong&gt;libpng16.so&lt;/strong&gt;, &lt;strong&gt;libjpeg.so&lt;/strong&gt;, &lt;strong&gt;libgif.so&lt;/strong&gt;, &lt;strong&gt;libwebp.so&lt;/strong&gt;) are not found in the directories included in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LD_LIBRARY_PATH&lt;/code&gt;.
If step 5 doesn’t work (although it should), you might be able to work around this by modifying the &lt;strong&gt;Makefile&lt;/strong&gt; and adding the libraries yourself after &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-llept&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&quot;language-make highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;nv&quot;&gt;LEPTONICA_LIBS&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-L&lt;/span&gt;/.conda/envs/tess-build/lib &lt;span class=&quot;nt&quot;&gt;-llept&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-lz&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-lpng16&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-ljpeg&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-lgif&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-lwebp&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If you follow this approach, you need to copy the libraries to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tesseract/.libs&lt;/code&gt;; otherwise you’ll get:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;(tess-build) [dan.sporici@hpsl-wn02 tesseract]$ ./tesseract
/tesseract/.libs/lt-tesseract: error while loading shared libraries: liblept.so.5: cannot open shared object file: No such file or directory
/tesseract/.libs/lt-tesseract: error while loading shared libraries: libpng16.so.16: cannot open shared object file: No such file or directory
/tesseract/.libs/lt-tesseract: error while loading shared libraries: libjpeg.so.9: cannot open shared object file: No such file or directory
/tesseract/.libs/lt-tesseract: error while loading shared libraries: libgif.so.7: cannot open shared object file: No such file or directory 
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;That’s all; I hope this helps.&lt;/p&gt;
</description>
        <pubDate>Tue, 15 Sep 2020 21:45:05 +0000</pubDate>
        <link>https://codingvision.net/build-tesseract-5-in-conda-environment</link>
        <guid isPermaLink="true">https://codingvision.net/build-tesseract-5-in-conda-environment</guid>
        
        <category>tesseract</category>
        
        <category>conda</category>
        
        <category>ocr</category>
        
        
      </item>
    
      <item>
        <title>PyTorch CRNN: Seq2Seq Digits Recognition w/ CTC</title>
        <description>&lt;p&gt;This article discusses handwritten character recognition (&lt;strong&gt;OCR&lt;/strong&gt;) in images using &lt;em&gt;sequence-to-sequence&lt;/em&gt; (&lt;strong&gt;seq2seq&lt;/strong&gt;) mapping performed by a &lt;em&gt;Convolutional Recurrent Neural Network&lt;/em&gt; (&lt;strong&gt;CRNN&lt;/strong&gt;) trained with &lt;em&gt;Connectionist Temporal Classification&lt;/em&gt; (&lt;strong&gt;CTC&lt;/strong&gt;) loss. The aforementioned approach is employed in multiple modern OCR engines for handwritten text (e.g., &lt;a href=&quot;https://arxiv.org/pdf/1902.10525.pdf&quot; rel=&quot;nofollow&quot;&gt;Google’s Keyboard App&lt;/a&gt; - convolutions are replaced with Bezier interpolations) or typed text (e.g., &lt;a href=&quot;https://github.com/tesseract-ocr/docs/blob/master/das_tutorial2016/6ModernizationEfforts.pdf&quot; rel=&quot;nofollow&quot;&gt;Tesseract 4’s CRNN Based Recognition Module&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;For the sake of simplicity, the example I’ll be presenting performs only digit recognition but can be easily extended to accommodate more classes of characters.&lt;/p&gt;

&lt;h5 id=&quot;the-overall-source-code-for-this-project-is-quite-long-so-im-providing-a-google-colab-document-that-includes-a-working-example&quot;&gt;The overall source code for this project is quite long so I’m providing a &lt;a href=&quot;https://colab.research.google.com/drive/1VRyObLgslpzeB33xITPdm_3E2cAxLuX3?usp=sharing&quot; rel=&quot;nofollow&quot;&gt;Google Colab&lt;/a&gt; document that includes a working example.&lt;/h5&gt;

&lt;h2 id=&quot;previous-inadequacies-and-justification&quot;&gt;Previous Inadequacies and Justification&lt;/h2&gt;

&lt;blockquote&gt;
  &lt;p&gt;“Why not simply segment characters in the image and recognize them one by one?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;While this approach is indeed more straightforward and has been used in older OCR engines, it has its caveats, especially for handwritten text. Imperfections in the written characters can cause segmentation errors, so the recognizer ends up classifying invalid glyphs or symbols. Consider the following images for clarification:&lt;/p&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/pytorch-crnn-seq2seq-digits-recognition/fragmented-characters.png&quot; alt=&quot;A fragmented '5' is segmented as 2 different characters that are later passed to the recognition module. &quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;A fragmented ‘5’ is segmented as 2 different characters that are later passed to the recognition module.&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/pytorch-crnn-seq2seq-digits-recognition/merged-characters.png&quot; alt=&quot;The first 2 digits are 'merged' together and considered a single character by both segmentation mechanism and OCR engine.&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;The first 2 digits are ‘merged’ together and considered a single character by both segmentation mechanism and OCR engine.&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;Although the MNIST problem is considered solved, implying that reliable classifiers can be built to recognize individual digits, correct segmentation remains an open problem in realistic scenarios. Splitting or merging glyphs to form valid digits is a difficult challenge and requires additional knowledge to be embedded into the segmentation module.&lt;/p&gt;

&lt;h2 id=&quot;seq2seq-classifications&quot;&gt;Seq2Seq Classifications&lt;/h2&gt;

&lt;p&gt;In this context, the main advantage brought by a &lt;strong&gt;seq2seq&lt;/strong&gt; classifier is that it diminishes the impact of erroneous segmentations and takes advantage of the ability of a neural network to generalize. It only requires a valid segmentation of the word or text line in question.&lt;/p&gt;

&lt;p&gt;Consider the following simplistic model that has a &lt;strong&gt;sliding window&lt;/strong&gt; or &lt;strong&gt;mask&lt;/strong&gt; (no convolutions), of size &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(1, img_height)&lt;/code&gt;. Each set of pixels covered by the sliding window is fed into a neural network made out of neurons with &lt;strong&gt;memory&lt;/strong&gt; (e.g., &lt;strong&gt;GRU&lt;/strong&gt; or &lt;strong&gt;LSTM&lt;/strong&gt;); the job of the neural network is to take a sequence of such stripes and output recognized digits. Take a look at the following figure:&lt;/p&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/pytorch-crnn-seq2seq-digits-recognition/one-digit-rnn.png&quot; alt=&quot;The RNN learns to recognize the digit '5' only by seeing stripes of width equal to 1 of the digit in question - think of it as a time series; by combining information from previous and current inputs, the RNN can determine the correct class.&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;The RNN learns to recognize the digit ‘5’ only by seeing stripes of width equal to 1 of the digit in question - think of it as a time series; by combining information from previous and current inputs, the RNN can determine the correct class.&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;Multiple digits will be included in a single sequence, because we’re feeding the network an image which contains more than one digit. It is up to the neural network to determine, during the training phase, how many stripes to take into account when classifying a digit (i.e., how much to memorize). The image below illustrates how an RNN should ‘group’ stripes together in order to recognize each digit in the sequence.&lt;/p&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/pytorch-crnn-seq2seq-digits-recognition/rnn-ctc-ocr.png&quot; alt=&quot;The RNN receives sequences of 'vertical' arrays of pixels (stripes) covered by the sliding window of width equal to 1; once trained, the RNN will be able to memorize that certain sequences of arrays (here in colors) form specific digits and properly separate multiple digits (i.e., 'change the colors') even though they are merged in the given image.&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;The RNN receives sequences of ‘vertical’ arrays of pixels (stripes) covered by the sliding window of width equal to 1; once trained, the RNN will be able to memorize that certain sequences of arrays (here in colors) form specific digits and properly separate multiple digits (i.e., ‘change the colors’) even though they are merged in the given image.&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;Using this method, it is possible to train a neural network by simply saying that the image above contains the number ‘&lt;strong&gt;55207&lt;/strong&gt;’, without further information (e.g., alignment, delimiters, bounding boxes, etc.).&lt;/p&gt;

&lt;h2 id=&quot;ctc-and-duplicates-removal&quot;&gt;CTC and Duplicates Removal&lt;/h2&gt;

&lt;p&gt;CTC loss is most commonly employed to train seq2seq RNNs. It works by &lt;strong&gt;summing&lt;/strong&gt; the &lt;strong&gt;probabilities for all possible alignments&lt;/strong&gt;; the &lt;strong&gt;probability of an alignment&lt;/strong&gt; is determined by &lt;strong&gt;multiplying&lt;/strong&gt; the probabilities of having specific digits in certain slots. An alignment can be seen as a plausible sequence of recognized digits.&lt;/p&gt;

&lt;p&gt;Going back to the ‘&lt;strong&gt;55207&lt;/strong&gt;’ example, we can express the probability of the alignment \(A_{55207}\) as follows:&lt;/p&gt;

\[P(A_{55207}) = P(A_1 = 5) \cdot P(A_2 = 5) \cdot P(A_3 = 2) \cdot P(A_4 = 0) \cdot P(A_5 = 7)\]

&lt;p&gt;To properly remove duplicates and also correctly handle numbers that contain repeating digits, the &lt;strong&gt;blank&lt;/strong&gt; class is introduced, with the following rules:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;2 (or more) &lt;strong&gt;repeating digits&lt;/strong&gt; are &lt;strong&gt;collapsed&lt;/strong&gt; into a single instance of that digit unless separated by &lt;strong&gt;blank&lt;/strong&gt; - this compensates for the fact that the RNN performs a classification for each stripe that represents a part of a digit (thus producing duplicates)&lt;/li&gt;
  &lt;li&gt;multiple &lt;strong&gt;consecutive blanks&lt;/strong&gt; are &lt;strong&gt;collapsed&lt;/strong&gt; into one blank - this compensates for the spacing before, after or between the digits&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Given these aspects, there are multiple alignments that, once collapsed, lead to the correct alignment (‘&lt;strong&gt;55207&lt;/strong&gt;’).&lt;/p&gt;
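The two collapse rules can be sketched as a short helper function (illustrative only, not part of any library; ‘-’ stands for the blank class):

```python
def ctc_collapse(alignment, blank='-'):
    """Collapse a raw CTC alignment: merge repeated symbols (rule 1),
    then drop blanks (rule 2)."""
    out = []
    prev = None
    for sym in alignment:
        # skip repeats of the previous symbol; a blank in between
        # resets `prev`, so '5-5' correctly keeps both fives
        if sym != prev:
            if sym != blank:
                out.append(sym)
        prev = sym
    return ''.join(out)

print(ctc_collapse('55-55222--07'))  # '55207'
```

Running the helper on the alignment from the example above recovers the intended sequence.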

&lt;p&gt;For example:
&lt;strong&gt;55-55222--07&lt;/strong&gt;, once collapsed, leads to ‘&lt;strong&gt;55207&lt;/strong&gt;’ and suggests the correct sequence even though it has a different structure because of additional duplicates and blanks (marked as ‘&lt;strong&gt;-&lt;/strong&gt;’ here). The probability of this alignment (\(A_{55-55222--07}\)) is computed as previously shown, but it also includes the probabilities of the blank class:&lt;/p&gt;

\[P(A_{55-55222--07}) = P(A_1 = 5) \cdot P(A_2 = 5) \cdot P(A_3 = -) \cdot P(A_4 = 5) \cdot P(A_5 = 5) \cdot P(A_6 = 2) \cdot P(A_7 = 2) \cdot P(A_8 = 2) \cdot P(A_9 = -) \cdot P(A_{10} = -) \cdot P(A_{11} = 0) \cdot P(A_{12} = 7)\]

&lt;p&gt;Finally, the CTC probability of a sequence is calculated, as previously mentioned, by summing the probabilities for all different alignments:&lt;/p&gt;

\[P(S_{55207}) = \sum_{A \in Alignments(55207)}{P(A)}\]

&lt;p&gt;When training, the neural network attempts to maximize this probability for the sequence provided as ground truth.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;decoding&lt;/strong&gt; method is used to recover the text from a set of digit probabilities; a naive approach is to pick, for &lt;strong&gt;each slot&lt;/strong&gt; in the &lt;strong&gt;alignment&lt;/strong&gt;, the digit with the &lt;strong&gt;highest probability&lt;/strong&gt; and then collapse the result. This approach is easier to implement and might be enough for this example, although &lt;strong&gt;beam search&lt;/strong&gt; (which keeps the N most probable partial sequences at each step, instead of only one) is employed for such tasks in larger projects.&lt;/p&gt;
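The naive best-path decoding can be sketched in plain Python; the probability rows below are made up for illustration (11 classes: digits 0-9 plus a blank at index 10):

```python
def greedy_ctc_decode(probs, blank=10):
    """Best-path CTC decoding: argmax class per time step,
    collapse repeats, then drop blanks."""
    path = [max(range(len(p)), key=p.__getitem__) for p in probs]
    out, prev = [], None
    for c in path:
        if c != prev and c != blank:
            out.append(c)
        prev = c
    return out

# toy per-slot probability rows peaked at a chosen class
def one_hotish(c):
    row = [0.01] * 11
    row[c] = 0.9
    return row

probs = [one_hotish(c) for c in [5, 5, 10, 5, 2, 10]]
print(greedy_ctc_decode(probs))  # [5, 5, 2]
```

Note that the blank between the two runs of fives is what lets the repeated digit survive the collapse.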

&lt;h2 id=&quot;including-convolutional-layers&quot;&gt;Including Convolutional Layers&lt;/h2&gt;

&lt;p&gt;Implementing convolutions in the previously described model simply implies that raw pixel information is replaced, in the input of the RNN, with higher level features. In PyTorch, the output of the convolution layers must be reshaped to the time sequence format &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(batch_size, sequence_length, gru_input_size)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In the current project, the output of the convolution part has the following shape: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(batch_size, num_channels, convolved_img_height, convolved_img_width)&lt;/code&gt;. I’m permuting the tensor to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(batch_size, convolved_img_width, convolved_img_height, num_channels)&lt;/code&gt; and then reshaping the last 2 dimensions into one, which becomes &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gru_input_size&lt;/code&gt;.&lt;/p&gt;
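The permute-and-reshape bookkeeping can be illustrated with NumPy on hypothetical sizes (in the actual PyTorch code this would be a `permute` followed by a `reshape`/`view`):

```python
import numpy as np

batch_size, num_channels, conv_h, conv_w = 4, 32, 5, 50  # hypothetical sizes
x = np.zeros((batch_size, num_channels, conv_h, conv_w))

# (batch, channels, H, W) -> (batch, W, H, channels): width becomes the time axis
x = x.transpose(0, 3, 2, 1)

# merge the last two dims: each time step is one column of features
gru_input_size = conv_h * num_channels
x = x.reshape(batch_size, conv_w, gru_input_size)

print(x.shape)  # (4, 50, 160)
```

Each of the 50 time steps now carries a 160-dimensional feature vector, which is exactly what the GRU expects as input.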

&lt;h2 id=&quot;dataset-generation&quot;&gt;Dataset Generation&lt;/h2&gt;

&lt;p&gt;To avoid additional steps such as image preprocessing, segmentation, and class balancing, I picked a friendlier dataset: &lt;strong&gt;EMNIST&lt;/strong&gt; for digits. The following helper script randomly picks digits from the dataset, applies affine augmentations, and concatenates them into sequences of a given length.&lt;/p&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/pytorch-crnn-seq2seq-digits-recognition/dataset-example.png&quot; alt=&quot;Dataset example for the seq2seq CRNN - Input and Ground Truth&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;Dataset example for the seq2seq CRNN - Input and Ground Truth&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h2 id=&quot;crnn-model&quot;&gt;CRNN Model&lt;/h2&gt;

&lt;p&gt;A LeNet-5 based convolution model is employed, with the following modifications:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;5x5 filters are replaced with 2 consecutive 3x3 filters&lt;/li&gt;
  &lt;li&gt;max-pooling is replaced with strided convolutions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The resulting higher-level features are fed into a &lt;strong&gt;Bi-GRU&lt;/strong&gt; RNN with a final &lt;strong&gt;linear&lt;/strong&gt; layer which has &lt;strong&gt;10&lt;/strong&gt; + 1 possible outputs ([0-9] digits + blank). I’ve chosen &lt;strong&gt;GRU&lt;/strong&gt; over &lt;strong&gt;LSTM&lt;/strong&gt; since it gave similar results but required fewer resources. A &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;log_softmax&lt;/code&gt; activation function is used in the final layer since the loss function (PyTorch’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CTCLoss&lt;/code&gt;) requires a logarithmized version of the output; this also provides better numerical properties, as it heavily penalizes incorrect classifications.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;CRNN&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Module&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;__init__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;nb&quot;&gt;super&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CRNN&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;__init__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;num_classes&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image_H&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;28&lt;/span&gt;

        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;conv1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Conv2d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kernel_size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;in1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;InstanceNorm2d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;conv2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Conv2d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kernel_size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;in2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;InstanceNorm2d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;conv3&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Conv2d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kernel_size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stride&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;in3&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;InstanceNorm2d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;conv4&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Conv2d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kernel_size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;in4&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;InstanceNorm2d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;conv5&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Conv2d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kernel_size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;in5&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;InstanceNorm2d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;conv6&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Conv2d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kernel_size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stride&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;in6&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;InstanceNorm2d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;postconv_height&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;postconv_width&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;31&lt;/span&gt;

        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gru_input_size&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;postconv_height&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;64&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gru_hidden_size&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;128&lt;/span&gt; 
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gru_num_layers&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gru_h&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gru_cell&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;

        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gru&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;GRU&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gru_input_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gru_hidden_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gru_num_layers&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;batch_first&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bidirectional&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fc&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Linear&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gru_hidden_size&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;num_classes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;forward&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;batch_size&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;conv1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; 
        &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;leaky_relu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;in1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;conv2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; 
        &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;leaky_relu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;in2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;conv3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;leaky_relu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;in3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;conv4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;leaky_relu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;in4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;conv5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;leaky_relu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;in5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;conv6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;leaky_relu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;in6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;permute&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; 
        &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;reshape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;batch_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gru_input_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gru_h&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gru&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gru_h&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gru_h&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gru_h&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;detach&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;stack&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;log_softmax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]))&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])])&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;reset_hidden&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;batch_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;h&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;zeros&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gru_num_layers&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;batch_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gru_hidden_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gru_h&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Variable&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;h&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;crnn&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CRNN&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;criterion&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CTCLoss&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;blank&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;reduction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'mean'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;zero_infinity&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;optimizer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;optim&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Adam&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;crnn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.001&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; 
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When performing backpropagation, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CTCLoss&lt;/code&gt; method will take the following parameters:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;log_probabilities&lt;/code&gt; - this is the output from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;log_softmax&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;targets&lt;/code&gt; - a tensor which contains the expected sequence of digits&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;input_lengths&lt;/code&gt; - the length of the input sequence after it is processed by the convolutional layers (i.e. the post-convolution width)&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;target_lengths&lt;/code&gt; - the length of the target sequence&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The last two parameters (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;input_lengths&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;target_lengths&lt;/code&gt;) instruct the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CTCLoss&lt;/code&gt; function to ignore additional padding (in case you added padding to the images or the target sequences to fit them into a batch).&lt;/p&gt;
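&lt;p&gt;As a minimal sketch, a call to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CTCLoss&lt;/code&gt; with these four arguments can look like the following (the shapes are illustrative assumptions, using the blank index 10 and the 31-timestep post-convolution width from the model above):&lt;/p&gt;

```python
import torch
import torch.nn as nn

# Hypothetical shapes: T=31 timesteps (post-conv width),
# N=2 batch samples, C=11 classes (10 digits + blank at index 10).
T, N, C = 31, 2, 11
criterion = nn.CTCLoss(blank=10, reduction='mean', zero_infinity=True)

log_probs = torch.randn(T, N, C).log_softmax(2)         # (T, N, C) log-probabilities
targets = torch.randint(0, 10, (N, 5))                  # each target: a 5-digit sequence
input_lengths = torch.full((N,), T, dtype=torch.long)   # full post-conv width, no padding
target_lengths = torch.full((N,), 5, dtype=torch.long)  # true length of each target

loss = criterion(log_probs, targets, input_lengths, target_lengths)
```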

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;log_probabilities&lt;/code&gt; will look like a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(T, C)&lt;/code&gt;-shaped tensor (T = number of timesteps, C = number of classes) and specifies, for each timestep, the probability of it belonging to each class. This tensor is decoded into text using a &lt;strong&gt;best path&lt;/strong&gt; (greedy) approach: for each timestep, this algorithm picks the class with the maximum probability while also collapsing multiple occurrences of the same character into one (unless they’re separated by a blank).&lt;/p&gt;
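&lt;p&gt;A best-path decoder can be sketched in a few lines (the function name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;best_path_decode&lt;/code&gt; is mine, not from the notebook; the blank index 10 matches the model above):&lt;/p&gt;

```python
import torch

def best_path_decode(log_probs, blank=10):
    # log_probs: a (T, C) tensor for a single sample
    best = log_probs.argmax(dim=1).tolist()  # most likely class per timestep
    decoded, prev = [], None
    for c in best:
        if c != blank and c != prev:  # collapse repeats, drop blanks
            decoded.append(c)
        prev = c
    return decoded

# Example: timesteps predicting 1, 1, blank, 1, 2 decode to [1, 1, 2]
probs = torch.full((5, 11), -10.0)
for t, c in enumerate([1, 1, 10, 1, 2]):
    probs[t, c] = 0.0
print(best_path_decode(probs))  # [1, 1, 2]
```

<p>Note how the blank between the second and third timesteps keeps the repeated 1s from collapsing into a single digit.</p>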

&lt;p&gt;In my implementation, I’ve used &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y_pred.permute(1, 0, 2)&lt;/code&gt; to reorder the CRNN’s output so it matches the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CTCLoss&lt;/code&gt;’s desired input format.&lt;/p&gt;

&lt;p&gt;Another aspect you should pay attention to is resetting the &lt;strong&gt;hidden state&lt;/strong&gt; of the GRU layers (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;crnn.reset_hidden(batch_size)&lt;/code&gt;) before recognizing any new sequence; in my experience this provided better results.&lt;/p&gt;
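&lt;p&gt;The per-batch order of operations can be summarized as follows (a minimal sketch: a random tensor stands in for the CRNN’s output, and the shapes are assumptions based on the batch size and post-convolution width above):&lt;/p&gt;

```python
import torch

# Hypothetical shapes: N=4 samples, T=31 timesteps, C=11 classes.
N, T, C = 4, 31, 11

# 1. reset the GRU hidden state before each new batch,
#    i.e. crnn.reset_hidden(N) in the model above
# 2. run the forward pass; here a dummy tensor stands in for crnn(batch)
y_pred = torch.randn(N, T, C).log_softmax(2)   # (batch, time, classes)

# 3. reorder to the (T, N, C) layout that CTCLoss expects
log_probs = y_pred.permute(1, 0, 2)
assert log_probs.shape == (T, N, C)
```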

&lt;p&gt;Feel free to check the code on my Google colab (link above) for further details.&lt;/p&gt;

&lt;h2 id=&quot;results&quot;&gt;Results&lt;/h2&gt;

&lt;p&gt;I’ve tested the model using 10,000 generated sequences: 8,000 for training and 2,000 for testing. Below are the plots for training and testing loss, and also the evolution of &lt;strong&gt;precision&lt;/strong&gt; - I’m assuming the dataset is approximately balanced. A &lt;em&gt;true positive&lt;/em&gt; (&lt;strong&gt;TP&lt;/strong&gt;) is counted only when the recognized sequence entirely matches the ground truth. The results are not ideal but I think the current model represents a decent starting point for larger projects.&lt;/p&gt;
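&lt;p&gt;Under this exact-match definition, precision reduces to the fraction of sequences decoded perfectly (the helper name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sequence_precision&lt;/code&gt; and the toy sequences below are illustrative, not from the notebook):&lt;/p&gt;

```python
# A sequence counts as a true positive only when the decoded digits
# match the ground truth exactly.
def sequence_precision(predictions, targets):
    matches = sum(p == t for p, t in zip(predictions, targets))
    return matches / len(targets)

# Toy illustration with hypothetical decoded sequences:
preds = [[1, 2], [3, 4], [5, 6]]
truth = [[1, 2], [3, 0], [5, 6]]
print(sequence_precision(preds, truth))  # 2/3
```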

&lt;p&gt;The CRNN exhibits some overfitting, but the results are acceptable considering its purpose.&lt;/p&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/pytorch-crnn-seq2seq-digits-recognition/loss-plot.png&quot; alt=&quot;Loss Evolution after 6 epochs&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;Loss Evolution after 6 epochs&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/pytorch-crnn-seq2seq-digits-recognition/precision-plot.png&quot; alt=&quot;Precision Evolution after 6 epochs&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;Precision Evolution after 6 epochs&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;After 6 epochs, the CRNN successfully recognizes &lt;strong&gt;7567&lt;/strong&gt; out of &lt;strong&gt;8000&lt;/strong&gt; sequences in the training set and &lt;strong&gt;1776&lt;/strong&gt; out of &lt;strong&gt;2000&lt;/strong&gt; from the testing set.&lt;/p&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://towardsdatascience.com/intuitively-understanding-connectionist-temporal-classification-3797e43a86c&quot; rel=&quot;nofollow&quot;&gt;An Intuitive Explanation of Connectionist Temporal Classification&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://actamachina.com/notebooks/2019/03/28/captcha.html&quot; rel=&quot;nofollow&quot;&gt;Solving CAPTCHA&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        <pubDate>Thu, 30 Jul 2020 21:45:05 +0000</pubDate>
        <link>https://codingvision.net/pytorch-crnn-seq2seq-digits-recognition-ctc</link>
        <guid isPermaLink="true">https://codingvision.net/pytorch-crnn-seq2seq-digits-recognition-ctc</guid>
        
        <category>pytorch</category>
        
        <category>ocr</category>
        
        <category>ctc</category>
        
        <category>python</category>
        
        <category>conv-neural-network</category>
        
        
      </item>
    
      <item>
        <title>Improving Tesseract 4's OCR Accuracy through Image Preprocessing</title>
        <description>&lt;p&gt;In this work I took a look at Tesseract 4’s performance at recognizing characters from a challenging dataset and proposed a minimalistic convolution-based approach for input image preprocessing that can boost the character-level &lt;strong&gt;accuracy&lt;/strong&gt; from &lt;strong&gt;13.4%&lt;/strong&gt; to &lt;strong&gt;61.6%&lt;/strong&gt; (+359% relative change), and the &lt;strong&gt;F1 score&lt;/strong&gt; from &lt;strong&gt;16.3%&lt;/strong&gt; to &lt;strong&gt;72.9%&lt;/strong&gt; (+347% relative change) on the aforementioned dataset. The convolution kernels are determined using reinforcement learning; moreover, to simulate the lack of ground truth in realistic scenarios, the &lt;strong&gt;training set&lt;/strong&gt; consists of only &lt;strong&gt;30&lt;/strong&gt; images while the &lt;strong&gt;testing set&lt;/strong&gt; includes &lt;strong&gt;10,000&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The dataset in question is called &lt;a href=&quot;https://pero.fit.vutbr.cz/brno_mobile_ocr_dataset&quot; rel=&quot;nofollow&quot;&gt;Brno Mobile&lt;/a&gt;, and contains color photographs of typed text taken with handheld devices. Factors such as blurriness, low resolution, poor contrast, and uneven brightness make the images challenging for an OCR engine.&lt;/p&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/improving-tesseract-4-ocr-accuracy-through-image-preprocessing/dataset-sample.webp&quot; alt=&quot;Resized image from the Brno dataset which contains text that was not recognized by Tesseract 4 during the evaluation (an empty string was returned)&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;Resized image from the Brno dataset which contains text that was not recognized by Tesseract 4 during the evaluation (an empty string was returned)&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;During this experiment, the &lt;em&gt;out of the box&lt;/em&gt; version of Tesseract 4 was used, which implies:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;no retraining of the OCR engine&lt;/li&gt;
  &lt;li&gt;no lexicon / dictionary augmentations&lt;/li&gt;
  &lt;li&gt;no hints about the language used in the dataset&lt;/li&gt;
  &lt;li&gt;no hints about segmentation methods; default (automatic) segmentation is used&lt;/li&gt;
  &lt;li&gt;default settings for the recognition engine (LSTM + Tesseract)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;problem-analysis&quot;&gt;Problem Analysis&lt;/h2&gt;

&lt;p&gt;Tesseract 4 has demonstrated strong performance when tested on favorable datasets, achieving a good balance between precision and recall. Presumably, such evaluations are performed on images that resemble scanned documents or book pages (with or without additional preprocessing), in which camera-caused distortions are minimal. Tests on the Brno dataset led to much worse performance, which will be discussed later in the article.&lt;/p&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/improving-tesseract-4-ocr-accuracy-through-image-preprocessing/tesseract-stats.webp&quot; alt=&quot;Tesseract 4's performance when evaluated using the Google Books Dataset - taken from [DAS 2016](https://github.com/tesseract-ocr/docs/tree/master/das_tutorial2016){:rel='nofollow'}&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;Tesseract 4’s performance when evaluated using the Google Books Dataset - taken from &lt;a href=&quot;https://github.com/tesseract-ocr/docs/tree/master/das_tutorial2016&quot; rel=&quot;nofollow&quot;&gt;DAS 2016&lt;/a&gt;&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;In the above figure, a high &lt;strong&gt;precision&lt;/strong&gt; indicates a favorable &lt;em&gt;True-Positives&lt;/em&gt; to &lt;em&gt;False-Positives&lt;/em&gt; ratio, thus revealing proper differentiation between characters (i.e. a relatively small number of misclassifications). Despite this, almost no improvement in &lt;strong&gt;recall&lt;/strong&gt; can be observed when switching from the &lt;strong&gt;base&lt;/strong&gt; classification method to the &lt;em&gt;Long Short-Term Memory&lt;/em&gt; (&lt;strong&gt;LSTM&lt;/strong&gt;) based &lt;em&gt;Convolutional Recurrent Neural Network&lt;/em&gt; (&lt;strong&gt;CRNN&lt;/strong&gt;) for &lt;em&gt;sequence to sequence&lt;/em&gt; mapping.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“Despite being designed over 20 years ago, the current Tesseract classifier is incredibly difficult to beat with so-called modern methods.” - Ray Smith, author of Tesseract&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I assume that further training on different fonts might not provide significant improvements, and neither will a different classifier model. &lt;em&gt;Is there a chance that the classifier doesn’t receive the correct input?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It was pointed out in a previous article that &lt;a href=&quot;https://codingvision.net/ai/evaluating-the-robustness-of-ocr-systems&quot;&gt;Tesseract is not robust to noise&lt;/a&gt;; certain &lt;em&gt;salt-and-pepper&lt;/em&gt; noise patterns disrupt the character recognition process, leading to large segments of text being completely ignored by the OCR engine - the infamous &lt;strong&gt;empty string&lt;/strong&gt;. From empirical observations, these errors seem to occur either for a whole word or sentence or not at all, suggesting a weakness in the segmentation methodology.&lt;/p&gt;

&lt;p&gt;Whether similar behavior also appears on images with more natural distortions remained an open question - hence this experiment.&lt;/p&gt;

&lt;h2 id=&quot;black-box-considerations&quot;&gt;Black-box Considerations&lt;/h2&gt;

&lt;p&gt;Since analyzing Tesseract’s segmentation methods is a daunting task, I opted for an adaptive, external image-correction approach. To avoid diving into Tesseract 4’s source code, the OCR engine is treated as a black box; in this case, an unsupervised learning method must be employed. This ensures easier transitions to other OCR engines, as it doesn’t rely on concrete implementations but only on outputs - at the cost of processing power and optimality.&lt;/p&gt;

&lt;h2 id=&quot;proposed-solution&quot;&gt;Proposed Solution&lt;/h2&gt;
&lt;p&gt;The solution consists of directly preprocessing images before they are fed to Tesseract 4. An adaptive preprocessing operation is required in order to properly compensate for any image features that cause problems in the segmentation process. In other words, an input image must be adapted so that it complies with Tesseract 4’s preferences and maximizes the chance of producing the correct output, preferably without performing down-sampling.&lt;/p&gt;

&lt;p&gt;I chose a convolution-based approach for flexibility and speed; other articles tend to perform more rigid image adjustments (such as global changes in brightness, fixed-constant conversion to grayscale, histogram equalization, etc.), while I preferred an approach that can learn to highlight or mask regions of the image according to various features. For this, the kernels are optimized through reinforcement learning with an actor-critic model. To be more specific, it relies on &lt;em&gt;Twin Delayed Deep Deterministic Policy Gradient&lt;/em&gt; (&lt;strong&gt;TD3&lt;/strong&gt; for short) to discover features which minimize the &lt;em&gt;Levenshtein distance&lt;/em&gt; between the &lt;strong&gt;recognized text&lt;/strong&gt; and the &lt;strong&gt;ground truth&lt;/strong&gt;. I won’t dive into the implementation details of TD3 here, as that would be somewhat out of scope, but think of it as a method of optimizing the following formula:&lt;/p&gt;

\[\max_{K_1,K_2,K_3,K_4,K_5}\sum_{i=1}^{N}{-Levenshtein(OCR(Image_i * K_1 * K_2 * K_3 * K_4 * K_5),Text_i)}\]

&lt;p&gt;Where \(K_j\) is a kernel, and \(&amp;lt;Image_i, Text_i&amp;gt;\) is a tuple from the training set.&lt;/p&gt;
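
&lt;p&gt;The reward term in the objective above can be sketched in plain Python - a minimal sketch, with a hypothetical &lt;code&gt;reward&lt;/code&gt; helper standing in for the full OCR-in-the-loop setup:&lt;/p&gt;

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution
        prev = curr
    return prev[-1]

def reward(recognized_text, ground_truth):
    """Negative edit distance: maximized when the OCR output equals the label."""
    return -levenshtein(recognized_text, ground_truth)
```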

&lt;h5 id=&quot;a-short-simpler-proof-of-concept-of-the-convolutional-preprocessor-is-presented-in-this-google-colab-it-uses-a-different-architecture-than-the-final-one-and-has-the-purpose-of-verifying-if-the-idea-of-using-convolutions-is-feasible-and-offers-good-results-a-comparison-is-presented-between-original-and-preprocessed-images-including-recognized-texts-for-each-sample&quot;&gt;A short (simpler) proof of concept of the convolutional preprocessor is presented in &lt;a href=&quot;https://colab.research.google.com/drive/1l0qT2S3tkY4WHTRbkVK_J5jATPg0t41-?usp=sharing&quot; rel=&quot;nofollow&quot;&gt;this Google Colab&lt;/a&gt;. It uses a different architecture than the final one and has the purpose of verifying if the idea of using convolutions is feasible and offers good results. A comparison is presented between original and preprocessed images including recognized texts for each sample.&lt;/h5&gt;

&lt;p&gt;The final model is illustrated below, with &lt;strong&gt;ReLU&lt;/strong&gt; activations after each convolution to capture nonlinearities and prevent negative pixel values.&lt;/p&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/improving-tesseract-4-ocr-accuracy-through-image-preprocessing/convolutional-preprocessor.webp&quot; alt=&quot;Architecture of the Convolutional Preprocessor used to adapt images for Tesseract 4&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;Architecture of the Convolutional Preprocessor used to adapt images for Tesseract 4&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;To properly compensate for image coloring and reduce the number of channels (&lt;span style=&quot;color:red&quot;&gt;R&lt;/span&gt;, &lt;span style=&quot;color:green&quot;&gt;G&lt;/span&gt;, &lt;span style=&quot;color:blue&quot;&gt;B&lt;/span&gt;) to one, 1x1 convolutions are used. This limits overfitting to some extent while also ensuring a grayscale output. Further convolutions are applied only to the grayscale image.&lt;/p&gt;
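
&lt;p&gt;A 1x1 convolution across channels is just a learned per-pixel weighted sum - a minimal sketch in plain Python, with made-up weights:&lt;/p&gt;

```python
def conv1x1_to_grayscale(image, w_r, w_g, w_b):
    """Apply a 1x1 convolution across channels: each RGB pixel is
    collapsed into a single value via a learned weighted sum.

    `image` is a nested list of (r, g, b) tuples; the three weights
    are the trainable parameters of the 1x1 kernel.
    """
    return [[w_r * r + w_g * g + w_b * b for (r, g, b) in row]
            for row in image]

# a 1x2 "image": with weights (0.5, 0.25, 0.25) each RGB pixel
# becomes a single grayscale intensity
gray = conv1x1_to_grayscale([[(100, 50, 30), (0, 0, 0)]], 0.5, 0.25, 0.25)
```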

&lt;p&gt;&lt;em&gt;Symmetry constraints&lt;/em&gt; are additionally enforced for each 3x3 kernel in order to minimize the number of trainable parameters and avoid overfitting: for a 3x3 kernel, only 6 of the 9 values must be determined, while the rest are generated through &lt;em&gt;mirroring&lt;/em&gt;. Below are the values I obtained for the five kernels (bold to emphasize symmetry):&lt;/p&gt;

&lt;table class=&quot;data-table&quot;&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;#1&lt;/th&gt;
      &lt;th&gt;#2&lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt;#3&lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;span style=&quot;color:red&quot;&gt;&lt;strong&gt;0.7&lt;/strong&gt;&lt;/span&gt;&lt;/td&gt;
      &lt;td&gt;0.2573&lt;/td&gt;
      &lt;td&gt;-0.3&lt;/td&gt;
      &lt;td&gt;0.3&lt;/td&gt;
      &lt;td&gt;0.3&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;-0.2996&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;0.3&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;span style=&quot;color:green&quot;&gt;&lt;strong&gt;1.3&lt;/strong&gt;&lt;/span&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;0.3&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;1.3&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;-0.295&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;0.3&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;1.2949&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;0.3&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;span style=&quot;color:blue&quot;&gt;&lt;strong&gt;1.3&lt;/strong&gt;&lt;/span&gt;&lt;/td&gt;
      &lt;td&gt;0.2573&lt;/td&gt;
      &lt;td&gt;-0.3&lt;/td&gt;
      &lt;td&gt;0.3&lt;/td&gt;
      &lt;td&gt;-0.2802&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;0.2922&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;-0.2802&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;table class=&quot;data-table&quot;&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;#4&lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt;#5&lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;-0.2793&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;0.2395&lt;/td&gt;
      &lt;td&gt;0.2885&lt;/td&gt;
      &lt;td&gt;-0.294&lt;/td&gt;
      &lt;td&gt;-0.2905&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;-0.2939&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;0.2395&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;0.7119&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;0.3&lt;/td&gt;
      &lt;td&gt;0.3&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;1.162&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;-0.2905&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;0.2885&lt;/td&gt;
      &lt;td&gt;0.3&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;-0.2828&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;-0.2328&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;0.3&lt;/td&gt;
      &lt;td&gt;-0.294&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
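
&lt;p&gt;The mirroring idea can be sketched for one of the symmetry axes - a minimal sketch illustrating top-bottom mirroring, as in kernel #2 above (other kernels mirror along different axes):&lt;/p&gt;

```python
def mirror_kernel(top_row, middle_row):
    """Build a 3x3 kernel with top-bottom symmetry: the bottom row
    mirrors the top one, so only 6 of the 9 values are trainable.
    """
    return [list(top_row), list(middle_row), list(top_row)]

# kernel #2 from the table above: 6 free parameters, 3 mirrored
k2 = mirror_kernel((0.2573, -0.3, 0.3), (0.3, 1.3, -0.295))
```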

&lt;h2 id=&quot;preprocessing-results&quot;&gt;Preprocessing Results&lt;/h2&gt;

&lt;p&gt;I extracted the image from each convolution layer and clamped its values to the &lt;em&gt;0-255&lt;/em&gt; interval to properly view each transformation:&lt;/p&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/improving-tesseract-4-ocr-accuracy-through-image-preprocessing/transformations.webp&quot; alt=&quot;Transformations of an image as it passes through the convolutional preprocessor, viewed from left (original) to right (final sample); observe the removal of incomplete characters from the upper-left region&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;Transformations of an image as it passes through the convolutional preprocessor, viewed from left (original) to right (final sample); observe the removal of incomplete characters from the upper-left region&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h2 id=&quot;comparison&quot;&gt;Comparison&lt;/h2&gt;

&lt;p&gt;I used 10,000 images from the testing set to evaluate the current methodology and compiled the following graphs. The differences between original and preprocessed samples are illustrated with three metrics of interest: &lt;em&gt;Character Error Rate&lt;/em&gt; (&lt;strong&gt;CER&lt;/strong&gt;), &lt;em&gt;Word Error Rate&lt;/em&gt; (&lt;strong&gt;WER&lt;/strong&gt;) and &lt;em&gt;Longest Common Subsequence Error&lt;/em&gt; (&lt;strong&gt;LCSE&lt;/strong&gt;). In this article, &lt;strong&gt;LCSE&lt;/strong&gt; is computed as follows:&lt;/p&gt;

\[LCSE(Text_1,Text_2 )=|Text_1 |-|LCS(Text_1,Text_2 )|+|Text_2 |-|LCS(Text_1,Text_2 )|\]
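
&lt;p&gt;The LCSE formula above can be computed directly from an LCS length - a minimal sketch in plain Python:&lt;/p&gt;

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two strings,
    via the standard dynamic-programming recurrence."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if ca == cb
                        else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def lcse(text1, text2):
    """Longest Common Subsequence Error, as defined above: counts the
    characters in either text that are not part of the common subsequence."""
    common = lcs_length(text1, text2)
    return (len(text1) - common) + (len(text2) - common)
```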

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/improving-tesseract-4-ocr-accuracy-through-image-preprocessing/results-comparison.webp&quot; alt=&quot;&amp;lt;span style='color:green'&amp;gt;Preprocessed&amp;lt;/span&amp;gt; vs &amp;lt;span style='color:red'&amp;gt;Original&amp;lt;/span&amp;gt; Images from the testing set; lower is better for each metric; dashed lines represent first degree approximations using least squares regression for the ease of interpretation&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;&lt;span style=&quot;color:green&quot;&gt;Preprocessed&lt;/span&gt; vs &lt;span style=&quot;color:red&quot;&gt;Original&lt;/span&gt; Images from the testing set; lower is better for each metric; dashed lines represent first degree approximations using least squares regression for the ease of interpretation&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;Additionally, I plotted everything as histograms to properly see the error distributions. For &lt;strong&gt;CER&lt;/strong&gt; and &lt;strong&gt;WER&lt;/strong&gt;, the spikes around &lt;strong&gt;1&lt;/strong&gt; (100%) suggest that the aforementioned segmentation problem (at block-of-text level) produces the most frequent error: &lt;strong&gt;empty strings&lt;/strong&gt; are returned, so all characters are wrong. In certain situations, the &lt;strong&gt;WER&lt;/strong&gt; is larger than &lt;strong&gt;1&lt;/strong&gt; because the preprocessing step introduces artifacts near the border of the image, leading to the recognition of non-existent characters. In the &lt;strong&gt;LCSE&lt;/strong&gt; plot, a distribution shift can be seen from the original approximately Gaussian shape, with its peak (mode) near the average number of characters in an image (&lt;strong&gt;56.95&lt;/strong&gt;), to a more favorable shape with overall lower error rates.&lt;/p&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/improving-tesseract-4-ocr-accuracy-through-image-preprocessing/results-distributions.webp&quot; alt=&quot;&amp;lt;span style='color:green'&amp;gt;Preprocessed&amp;lt;/span&amp;gt; vs &amp;lt;span style='color:red'&amp;gt;Original&amp;lt;/span&amp;gt; Images from the testing set; comparison of distributions of errors&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;&lt;span style=&quot;color:green&quot;&gt;Preprocessed&lt;/span&gt; vs &lt;span style=&quot;color:red&quot;&gt;Original&lt;/span&gt; Images from the testing set; comparison of distributions of errors&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
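
&lt;p&gt;The WER-above-1 effect mentioned above follows from how WER is commonly defined - a minimal sketch of one common definition (word-level edit distance over reference length; the exact computation used in the evaluation may differ):&lt;/p&gt;

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference word count.

    Because insertions count as errors, a hypothesis with spurious
    extra words (e.g. border artifacts recognized as characters) can
    push the WER above 1.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # dynamic-programming edit distance over words
    prev = list(range(len(hyp) + 1))
    for i, rw in enumerate(ref, 1):
        curr = [i]
        for j, hw in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (rw != hw)))
        prev = curr
    return prev[-1] / len(ref)

# two substituted words plus two inserted words, against a
# two-word reference: 4 errors / 2 words = WER of 2.0
wer = word_error_rate("stop sign", "slop sgn xx yy")
```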

&lt;p&gt;A numeric comparison is presented below:&lt;/p&gt;

&lt;table class=&quot;data-table&quot;&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Metric&lt;/th&gt;
      &lt;th&gt;Original (Avg.)&lt;/th&gt;
      &lt;th&gt;Preprocessed (Avg.)&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;CER&lt;/td&gt;
      &lt;td&gt;0.866&lt;/td&gt;
      &lt;td&gt;0.384&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;WER&lt;/td&gt;
      &lt;td&gt;0.903&lt;/td&gt;
      &lt;td&gt;0.593&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;LCSE&lt;/td&gt;
      &lt;td&gt;48.834&lt;/td&gt;
      &lt;td&gt;24.987&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Precision&lt;/td&gt;
      &lt;td&gt;0.155&lt;/td&gt;
      &lt;td&gt;0.725&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Recall&lt;/td&gt;
      &lt;td&gt;0.172&lt;/td&gt;
      &lt;td&gt;0.734&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;F1 Score&lt;/td&gt;
      &lt;td&gt;0.163&lt;/td&gt;
      &lt;td&gt;0.729&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h2 id=&quot;takeaways&quot;&gt;Takeaways&lt;/h2&gt;

&lt;p&gt;Significant improvements can be observed through this preprocessing operation. Moreover, the majority of errors probably do not occur in the &lt;em&gt;sequence to sequence&lt;/em&gt; classifier (having all recognized characters erroneous would contradict the previous performance analysis); a page-segmentation issue when automatic mode is used seems more plausible. It is shown that an array of convolutions is sufficient, in this case, to decrease error rates substantially.&lt;/p&gt;

&lt;p&gt;The OCR performance on the preprocessed images is better overall, but not good enough to be reliable - a 38% character error rate is still a large setback. I’m fairly confident that better recognition results can be obtained with more fine-tuning, a more complex architecture for the convolutional preprocessor, and a more diverse training set. However, the current implementation is already very slow to train, which makes me question whether the entire methodology is feasible from this point of view.&lt;/p&gt;

&lt;h2 id=&quot;cite&quot;&gt;Cite&lt;/h2&gt;

&lt;p&gt;If you found this relevant to your work, you can cite the article using:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;@article{sporici2020improving,
  title={Improving the Accuracy of Tesseract 4.0 OCR Engine Using Convolution-Based Preprocessing},
  author={Sporici, Dan and Cușnir, Elena and Boiangiu, Costin-Anton},
  journal={Symmetry},
  volume={12},
  number={5},
  pages={715},
  year={2020},
  publisher={Multidisciplinary Digital Publishing Institute}
}
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

</description>
        <pubDate>Sun, 07 Jun 2020 21:45:05 +0000</pubDate>
        <link>https://codingvision.net/improving-tesseract-4-ocr-accuracy-through-image-preprocessing</link>
        <guid isPermaLink="true">https://codingvision.net/improving-tesseract-4-ocr-accuracy-through-image-preprocessing</guid>
        
        <category>ocr</category>
        
        <category>pytorch</category>
        
        <category>python</category>
        
        <category>research</category>
        
        <category>tesseract</category>
        
        <category>conv-neural-network</category>
        
        <category>reinforcement-learning</category>
        
        <category>unsupervised-learning</category>
        
        
      </item>
    
      <item>
        <title>PyTorch Iterative FGVM: Targeted Adversarial Samples for Traffic-Sign Recognition</title>
        <description>&lt;p&gt;Inspired by the progress of driverless cars and by the fact that this subject is not thoroughly discussed I decided to give it a shot at creating smooth &lt;strong&gt;targeted&lt;/strong&gt; adversarial samples that are interpreted as legit traffic signs with a high confidence by a PyTorch Convolutional Neural Network (&lt;strong&gt;CNN&lt;/strong&gt;) classifier trained on the &lt;a href=&quot;http://benchmark.ini.rub.de/?section=gtsrb&amp;amp;subsection=dataset&quot; rel=&quot;nofollow&quot;&gt;GTSRB&lt;/a&gt; dataset.&lt;/p&gt;

&lt;p&gt;I’ll be using the &lt;em&gt;Fast Gradient Value Method&lt;/em&gt; (&lt;strong&gt;FGVM&lt;/strong&gt;) in an iterative manner, which is also called the &lt;em&gt;Basic Iterative Method&lt;/em&gt; (BIM). I noticed that most articles only present PyTorch code for the non-targeted &lt;em&gt;Fast Gradient Sign Method&lt;/em&gt; (&lt;strong&gt;FGSM&lt;/strong&gt;), which performs well at evading classifiers but is, in my opinion, somewhat limited.&lt;/p&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/pytorch-iterative-fgvm-targeted-adversarial-samples-traffic-sign-recognition/fgvm-gtsrb-adversarial-sample.png&quot; alt=&quot;Smooth targeted adversarial sample generated using the current implementation, being misclassified as a 'Stop' sign.&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;Smooth targeted adversarial sample generated using the current implementation, being misclassified as a ‘Stop’ sign.&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h5 id=&quot;ill-try-to-discuss-in-this-article-only-the-important-aspects-of-this-problem-however-i-also-prepared-a-google-colab-notebook-which-includes-complete-source-code-and-results&quot;&gt;I’ll try to discuss in this article only the important aspects of this problem. However, I also prepared a &lt;a href=&quot;https://colab.research.google.com/drive/1CndPD5ZsW022qO1xgEAWbmcXJwkJKBAX&quot; rel=&quot;nofollow&quot;&gt;Google Colab Notebook&lt;/a&gt; which includes complete source code and results.&lt;/h5&gt;
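
&lt;p&gt;The core iterative update behind targeted FGVM/BIM can be illustrated on a toy differentiable model - a minimal sketch in plain Python, where a one-parameter logistic model stands in for the CNN and all names and values are made up for illustration:&lt;/p&gt;

```python
import math

def targeted_bim_step(x, w, b, target, alpha):
    """One targeted FGVM/BIM step on a toy logistic 'classifier':
    move the input along the negative gradient of the loss so the
    model's output drifts toward the target label."""
    p = 1.0 / (1.0 + math.exp(-(w * x + b)))   # forward pass
    grad = (p - target) * w                    # d(cross-entropy)/dx
    # gradient *value* step (FGVM); FGSM would use sign(grad) instead
    return x - alpha * grad

# drive a sample initially classified as 0 toward target class 1
x, w, b = -2.0, 1.5, 0.0
for _ in range(100):
    x = targeted_bim_step(x, w, b, target=1.0, alpha=0.5)
p = 1.0 / (1.0 + math.exp(-(w * x + b)))       # final confidence
```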

&lt;h2 id=&quot;targeted-network&quot;&gt;Targeted Network&lt;/h2&gt;

&lt;p&gt;For this experiment, I’ve constructed a basic &lt;strong&gt;LeNet5&lt;/strong&gt;-inspired CNN in PyTorch. It performs two 5x5 convolutions on 32x32 grayscale images, separated by max-pooling. The dataset is slightly unbalanced, but this was compensated for during the training process.&lt;/p&gt;
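
&lt;p&gt;The flattened feature size feeding the first fully-connected layer can be traced with quick arithmetic - a minimal sketch, assuming valid (no-padding, stride-1) convolutions and 2x2 max-pooling:&lt;/p&gt;

```python
def conv_out(size, kernel):
    """Spatial size after a valid (no-padding, stride-1) convolution."""
    return size - kernel + 1

size = 32                  # grayscale input is 32x32
size = conv_out(size, 5)   # conv1 (5x5) -> 28x28
size = size // 2           # 2x2 max-pool -> 14x14
size = conv_out(size, 5)   # conv2 (5x5) -> 10x10
size = size // 2           # 2x2 max-pool -> 5x5
flat = 64 * size * size    # 64 channels -> 1600 inputs to fc1
```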

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/pytorch-iterative-fgvm-targeted-adversarial-samples-traffic-sign-recognition/gtsrb-results.png&quot; alt=&quot;Results of the Traffic-Sign Recognition CNN on the GTSRB Test Dataset&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;Results of the Traffic-Sign Recognition CNN on the GTSRB Test Dataset&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;This network is represented using the following PyTorch snippet:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;LeNet&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Module&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;__init__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;num_classes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;47&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;affine&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;

      &lt;span class=&quot;nb&quot;&gt;super&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;__init__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
      &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;conv1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Conv2d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
      &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;in1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;InstanceNorm2d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;affine&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;affine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

      &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;conv2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Conv2d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
      &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;in2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;InstanceNorm2d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;affine&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;affine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
      
      &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fc1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Linear&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;64&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;256&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
      &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fc2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Linear&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;256&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;128&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
      &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fc3&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Linear&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;128&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;num_classes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;


  &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;forward&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;relu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;in1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;conv1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;max_pool2d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

      &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;relu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;in2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;conv2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;max_pool2d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
      
      &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;view&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
      
      &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;relu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fc1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;relu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fc2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fc3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

      &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For the sake of simplicity, the architecture is not optimal; achieving state-of-the-art traffic-sign recognition is, in any case, beyond the scope of this article. Evaluation results on the GTSRB test set are as follows:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Accuracy:&lt;/strong&gt; ~95%&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Precision:&lt;/strong&gt; ~93%&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Recall:&lt;/strong&gt; ~93%&lt;/li&gt;
&lt;/ul&gt;
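&lt;p&gt;As a reference for how such figures can be obtained, here is a minimal sketch of computing accuracy and macro-averaged precision/recall from predicted labels. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y_true&lt;/code&gt;/&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y_pred&lt;/code&gt; arrays below are illustrative placeholders, not actual GTSRB predictions:&lt;/p&gt;

```python
import numpy as np

def macro_precision_recall(y_true, y_pred, num_classes):
    """Macro-averaged precision and recall over all classes."""
    precisions, recalls = [], []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return np.mean(precisions), np.mean(recalls)

# placeholder labels, standing in for test-set predictions
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 1, 2, 1, 1, 0])

accuracy = np.mean(y_true == y_pred)
precision, recall = macro_precision_recall(y_true, y_pred, 3)
```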

&lt;h2 id=&quot;targeted-adversarial-samples-with-iterative-fgvm&quot;&gt;Targeted Adversarial Samples with Iterative FGVM&lt;/h2&gt;

&lt;p&gt;When &lt;strong&gt;training&lt;/strong&gt; a neural network, the focus is on optimizing the parameters (i.e. weights) in order to minimize the &lt;strong&gt;loss&lt;/strong&gt; (e.g. Mean Squared Error, Cross Entropy) between the &lt;strong&gt;current output&lt;/strong&gt; and the &lt;strong&gt;desired output&lt;/strong&gt;, while the inputs remain fixed. This is done through &lt;a href=&quot;https://codingvision.net/numerical-methods/gradient-descent-simply-explained-with-example&quot;&gt;gradient descent&lt;/a&gt;. As an example, if a neural network models the function below, the \(w\) (weight) and \(b\) (bias) variables are adjusted during training.&lt;/p&gt;

\[f(x) = w \cdot x + b\]

&lt;p&gt;In targeted &lt;strong&gt;FGVM&lt;/strong&gt;, \(w\) and \(b\) are fixed and the input \(x\) is adjusted through &lt;strong&gt;gradient descent&lt;/strong&gt; (with gradients computed w.r.t. the input rather than the parameters). Usually this implies minimizing the error between the &lt;strong&gt;targeted adversarial output&lt;/strong&gt; and the &lt;strong&gt;current output&lt;/strong&gt; - basically shifting the current output towards the targeted one.&lt;/p&gt;
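&lt;p&gt;The idea can be sketched on the toy function \(f(x) = w \cdot x + b\) itself: freeze \(w\) and \(b\), hand the &lt;em&gt;input&lt;/em&gt; to the optimizer, and minimize the error between the current and the targeted output. The constants below are arbitrary illustrative values:&lt;/p&gt;

```python
import torch

# frozen "model" parameters - these are NOT optimized
w = torch.tensor(2.0)
b = torch.tensor(1.0)

# the input is the variable being optimized
x = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.Adam([x], lr=0.1)

target = torch.tensor([5.0])  # desired (targeted) output

for _ in range(500):
    optimizer.zero_grad()
    out = w * x + b
    loss = (out - target).pow(2).mean()  # MSE between current and targeted output
    loss.backward()                      # gradient flows to x, not to w or b
    optimizer.step()

# x converges towards (5 - 1) / 2 = 2
```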

&lt;p&gt;Moreover, when the input is in image-format, additional constraints must be addressed:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;images (inputs) must be clamped between 0 and 1 (float representation)&lt;/li&gt;
  &lt;li&gt;images must be smooth in order to mitigate basic noise filtering mechanisms&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;pytorch-generating-adversarial-samples&quot;&gt;PyTorch: Generating Adversarial Samples&lt;/h2&gt;

&lt;p&gt;The code I ended up with is posted below; further implementation details will also be presented.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;targeted_adversarial_class&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;INV_TRAFFIC_SIGNS_LABELS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'stop'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]])&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;adversarial_sample&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rand&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;requires_grad_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; 

&lt;span class=&quot;c1&quot;&gt;# optimizer for the adversarial sample
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;adversarial_optimizer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;optim&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Adam&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;adversarial_sample&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;1e-3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;

  &lt;span class=&quot;n&quot;&gt;adversarial_optimizer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;zero_grad&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

  &lt;span class=&quot;n&quot;&gt;prediction&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;net&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;adversarial_sample&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  
  &lt;span class=&quot;c1&quot;&gt;# classification loss + 0.05 * image smoothing loss
&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;loss&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CrossEntropyLoss&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prediction&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;targeted_adversarial_class&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; \
          &lt;span class=&quot;mf&quot;&gt;0.05&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;functional&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;conv2d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;functional&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pad&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;adversarial_sample&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'reflect'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FloatTensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([[[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span 
class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]]]).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;view&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;**&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;
  

  &lt;span class=&quot;c1&quot;&gt;# this is the predicted class number
&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;predicted_class&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;argmax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prediction&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;detach&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;numpy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;axis&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

  &lt;span class=&quot;c1&quot;&gt;# updates gradient and backpropagates errors to the input
&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;loss&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;backward&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;adversarial_optimizer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;step&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

  &lt;span class=&quot;c1&quot;&gt;# ensuring that the image is valid
&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;adversarial_sample&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;clamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;adversarial_sample&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;500&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;imshow&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;adversarial_sample&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;view&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cmap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'gray'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;show&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'Predicted:'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TRAFFIC_SIGNS_LABELS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;predicted_class&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]])&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'Loss:'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;loss&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The current CNN is trained on 32x32 grayscale images, so it makes sense to start with an adversarial sample of the same size, consisting of random noise distributed over a single channel. It is also required to indicate through &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;requires_grad_()&lt;/code&gt; that Autograd should compute gradients for this variable so the optimizer can update it.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;adversarial_sample&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rand&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;requires_grad_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; 
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Next, an optimizer is created that, instead of tweaking weights, will tweak the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;adversarial_sample&lt;/code&gt; defined above:&lt;/p&gt;
&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;adversarial_optimizer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;optim&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Adam&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;adversarial_sample&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;1e-3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The loss function is defined using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;torch.nn.CrossEntropyLoss()&lt;/code&gt; - which is the same criterion used for training. In this example, I’ll try to create a sample that is classified as a stop sign (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;targeted_adversarial_class&lt;/code&gt;).&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;targeted_adversarial_class&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;INV_TRAFFIC_SIGNS_LABELS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'stop'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]])&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;prediction&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;net&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;adversarial_sample&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# classification loss
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;loss&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CrossEntropyLoss&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prediction&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;targeted_adversarial_class&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This loss function does well in generating adversarial images, but the results have a &lt;strong&gt;noisy&lt;/strong&gt; aspect (e.g., strong contrasts between small groups of pixels) and might look suspicious. Since this noise can easily be removed using basic filtering, &lt;strong&gt;smooth&lt;/strong&gt; images are preferable.&lt;/p&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/pytorch-iterative-fgvm-targeted-adversarial-samples-traffic-sign-recognition/fgvm-noisy-sample.png&quot; alt=&quot;Using only the `CrossEntropyLoss()` will most likely generate noisy adversarial samples&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;Using only the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CrossEntropyLoss()&lt;/code&gt; will most likely generate noisy adversarial samples&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
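&lt;p&gt;To illustrate why such noise is fragile, a simple box blur already flattens most of the high-frequency content. The snippet below runs on a random tensor standing in for a noisy adversarial sample; the drop in per-pixel variance is a rough proxy for how much of the noise a basic filter would wipe out:&lt;/p&gt;

```python
import torch
import torch.nn.functional as F

# stand-in for a noisy adversarial sample (1x1x32x32, values in [0, 1])
noisy = torch.rand(1, 1, 32, 32)

# 3x3 box blur - the kind of cheap filter a detection pipeline might apply
kernel = torch.full((1, 1, 3, 3), 1.0 / 9.0)
blurred = F.conv2d(F.pad(noisy, (1, 1, 1, 1), 'reflect'), kernel)

# averaging neighbors suppresses pixel-level contrasts: variance drops sharply
print(noisy.var().item(), blurred.var().item())
```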

&lt;p&gt;Defining a smooth-image constraint can be done by minimizing the &lt;strong&gt;Mean Squared Error&lt;/strong&gt; between &lt;strong&gt;adjacent&lt;/strong&gt; pixels. Think of it as applying an edge-detection filter and attempting to minimize the overall response. However, this has an impact on the efficiency of the generated sample, as it adds dependencies between pixels. To limit this loss of freedom, only the adjacent pixels on the bottom-right side are taken into account.
The following 3x3 &lt;strong&gt;convolution&lt;/strong&gt; kernel is used to determine the color difference between a pixel and its 3 bottom-right neighbors:&lt;/p&gt;

&lt;table class=&quot;data-table&quot;&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;K&lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;-3&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;In PyTorch, I implemented the aforementioned method using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;torch.nn.functional.conv2d()&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;torch.nn.functional.pad()&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;c1&quot;&gt;# image smoothing loss
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;loss&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;functional&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;conv2d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;functional&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pad&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;adversarial_sample&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'reflect'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FloatTensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([[[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span 
class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]]]).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;view&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;**&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Finally, the image is clamped so that it remains a valid float tensor:&lt;/p&gt;
&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;adversarial_sample&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;clamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;adversarial_sample&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Multiple iterations are required in order to properly optimize the input.&lt;/p&gt;

&lt;h2 id=&quot;conclusions&quot;&gt;Conclusions&lt;/h2&gt;

&lt;p&gt;FGVM proves reliable in crafting smooth targeted adversarial samples for basic CNN-based classifiers. However, several additional problems need to be addressed before it becomes a feasible attack. The crafted sample must still be picked up by the segmentation algorithm as a possible traffic sign during the detection phase. Next, the adversarial sample’s efficacy should not be impacted by small affine transformations (e.g., being shifted 3 pixels to the left) - this might be fixed through data augmentation. Additionally, factors such as brightness, contrast or various camera properties can still reduce the success rate of an adversarial sample.&lt;/p&gt;

&lt;p&gt;Finally, samples which are more resistant to uniformly distributed noise can be obtained by removing the image smoothing constraint.&lt;/p&gt;
</description>
        <pubDate>Thu, 30 Apr 2020 21:45:05 +0000</pubDate>
        <link>https://codingvision.net/iterative-fgvm-targeted-adversarial-samples-traffic-sign-recognition</link>
        <guid isPermaLink="true">https://codingvision.net/iterative-fgvm-targeted-adversarial-samples-traffic-sign-recognition</guid>
        
        <category>pytorch</category>
        
        <category>python</category>
        
        <category>adversarial-machine-learning</category>
        
        <category>conv-neural-network</category>
        
        
      </item>
    
      <item>
        <title>RSA: Encrypt in .NET &amp; Decrypt in Python</title>
        <description>&lt;p&gt;So… one of my current projects required the following actions: asymmetrically &lt;strong&gt;encrypt&lt;/strong&gt; a string in &lt;strong&gt;.NET&lt;/strong&gt; using a public key and &lt;strong&gt;decrypt&lt;/strong&gt; it in a &lt;strong&gt;python&lt;/strong&gt; script using a private key.&lt;/p&gt;

&lt;p&gt;The problem that I’ve encountered was that, apparently, I couldn’t achieve compatibility between the two exposed classes: &lt;a href=&quot;https://docs.microsoft.com/en-us/dotnet/api/system.security.cryptography.rsacryptoserviceprovider?view=netframework-4.8&quot; rel=&quot;nofollow&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RSACryptoServiceProvider&lt;/code&gt;&lt;/a&gt; and &lt;a href=&quot;https://pycryptodome.readthedocs.io/en/latest/src/cipher/pkcs1_v1_5.html&quot; rel=&quot;nofollow&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PKCS1_v1_5&lt;/code&gt;&lt;/a&gt;. To be more specific, the python script couldn’t decrypt the ciphertext even though proper configurations were made and the provided keys were compatible. Additionally, separate encryption-decryption actions worked inside .NET and python but not in-between them.&lt;/p&gt;

&lt;p&gt;I wasn’t able to find too much information about this specific problem in the &lt;a href=&quot;https://docs.microsoft.com/en-us/dotnet/api/system.security.cryptography.rsaparameters?view=netframework-4.8&quot; rel=&quot;nofollow&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RSAParameters&lt;/code&gt;&lt;/a&gt; documentation, hence this post.&lt;/p&gt;

&lt;h2 id=&quot;solution&quot;&gt;Solution&lt;/h2&gt;

&lt;p&gt;Alright, the issue seems to be caused by a difference in &lt;strong&gt;endianness&lt;/strong&gt; between the two classes, when the RSA parameters are provided. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PKCS1_v1_5&lt;/code&gt; uses &lt;strong&gt;little endian&lt;/strong&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RSACryptoServiceProvider&lt;/code&gt; prefers &lt;strong&gt;big endian&lt;/strong&gt;. In my case, this made the encryption method use a different key than the one I thought I specified. It was also more fun to debug because the PKCS padding is randomized, so every run produced a different ciphertext.&lt;/p&gt;

&lt;p&gt;I fixed this by &lt;strong&gt;base64&lt;/strong&gt;-encoding the &lt;strong&gt;exponent&lt;/strong&gt; and &lt;strong&gt;modulus&lt;/strong&gt; in &lt;strong&gt;big-endian&lt;/strong&gt; format (in python) and then loading them with &lt;a href=&quot;https://docs.microsoft.com/en-us/dotnet/api/system.security.cryptography.rsa.fromxmlstring?view=netframework-4.8&quot; rel=&quot;nofollow&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RSACryptoServiceProvider.FromXmlString()&lt;/code&gt;&lt;/a&gt; (in .NET).&lt;/p&gt;
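&lt;p&gt;As a quick standalone illustration of the endianness pitfall (the integer below is a made-up toy value, not tied to either library), the same integer serializes to two different byte strings depending on byte order, so a consumer expecting the other order silently ends up with a different key parameter:&lt;/p&gt;

```python
import base64

n = 0x12345678
big = n.to_bytes(4, 'big')        # b'\x12\x34\x56\x78'
little = n.to_bytes(4, 'little')  # b'\x78\x56\x34\x12' - same int, reversed bytes

# The two serializations are byte-reversals of each other...
assert big == little[::-1]

# ...so their base64 encodings (the form exchanged between the two
# languages) do not match at all:
print(base64.b64encode(big).decode())     # EjRWeA==
print(base64.b64encode(little).decode())  # eFY0Eg==
```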

&lt;h2 id=&quot;working-example&quot;&gt;Working Example&lt;/h2&gt;

&lt;p&gt;I hardcoded the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(N, E, D)&lt;/code&gt; parameters for a private key in python and exported the &lt;strong&gt;exponent&lt;/strong&gt; and &lt;strong&gt;modulus&lt;/strong&gt; to be used later for encryption.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;c1&quot;&gt;# custom base64 encoding
&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;b64_enc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;to_bytes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'big'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;base64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b64encode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# fixed a set of keys for testing purposes
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;N&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;26004126751443262055682011081007404548850063543219588539086190001742195632834884763548378850634989264309169823030784372770378521274048211537270851954737597964394738860810397764157069391719551179298507244962912383723776384386127059976543327113777072990654810746825378287761304202032439750301912045623786736128233730798303406858144431081065384988539277630625160727011582345942687126935423502995613920211095965452425548919926951203151483590222152446516520421379279591807660810550784744188433550335950652666201439521115515355539373928576162221297645781251953236644092963307595988040539993067709240004782161131243282208593&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;E&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;65537&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;D&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;844954574014654722486150458473919587206863455991060222377955072839922571984098861772377020041002939383041291761051853484512886782322743892284027026528735139923685801975918062144627908962369108081178131103781404720078456605432924519279933702927938064507063482999903002331319671303661755165294744970869186178561527578261522199503340027952798084625109041630166309505066404215223685733585467434168146932177924040219720383860880583466676764286302300281603021045351842170755190359364339936360197909582974922675680101321863304283607829144759777189360340512230537108705852116021758740440195445732631657876008160876867027543&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# construct pair of keys
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;private_key&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RSA&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;construct&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;N&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;E&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;D&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;public_key&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;private_key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;publickey&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# base64-encode parameters in big-endian format
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;EXP&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b64_enc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;public_key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MODULUS&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b64_enc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;public_key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;256&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'EXP:'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;EXP&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'MODULUS:'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MODULUS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# Output:
# EXP: b'AQAB' MODULUS: b'zf4LgceVPvjMLz/pp8exH58AeBrhjLe0k4FRmd59I0k4sH6oug6Z9RfY4FvEFcssBwH1cmWF5/Zen8xbRVRyUnzer6b6cKmlzHFYf0LlbovvYMkW5pdhRcTHK2ijByGtmVgU/CEKEQTy3elpU7ZsHE8D6T1M7L2gmGAxvgldUMRu4l8BPuRyht1a9dA9b6005atpdlkCSc3emXSfyBOBwNE0UicVTVncn9SBjP7bTBGgOKshYnYsqh4BD0I7AU3xdoAsZVWudECX/zVa7uUOk1ooVYjMEyfBngrEDXrmIkAlVruUuj/eWiYwT2vXqByQgDfDvat5IS4i3ywiHAWXUQ=='
&lt;/span&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In &lt;strong&gt;.NET&lt;/strong&gt; (I used &lt;strong&gt;C#&lt;/strong&gt;), the code looks something like this:&lt;/p&gt;
&lt;div class=&quot;language-csharp highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;System&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;System.Security.Cryptography&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;System.Text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;RSACryptoApp&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// parameters from the python script (public key)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;readonly&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;EXP&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;AQAB&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;readonly&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MODULUS&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;zf4LgceVPvjMLz/pp8exH58AeBrhjLe0k4FRmd59I0k4sH6oug6Z9RfY4FvEFcssBwH1cmWF5/Zen8xbRVRyUnzer6b6cKmlzHFYf0LlbovvYMkW5pdhRcTHK2ijByGtmVgU/CEKEQTy3elpU7ZsHE8D6T1M7L2gmGAxvgldUMRu4l8BPuRyht1a9dA9b6005atpdlkCSc3emXSfyBOBwNE0UicVTVncn9SBjP7bTBGgOKshYnYsqh4BD0I7AU3xdoAsZVWudECX/zVa7uUOk1ooVYjMEyfBngrEDXrmIkAlVruUuj/eWiYwT2vXqByQgDfDvat5IS4i3ywiHAWXUQ==&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
       &lt;span class=&quot;n&quot;&gt;RSACryptoServiceProvider&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;csp&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;RSACryptoServiceProvider&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;2048&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
       &lt;span class=&quot;n&quot;&gt;csp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;FromXmlString&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&amp;lt;RSAKeyValue&amp;gt;&amp;lt;Exponent&amp;gt;&quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;EXP&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;&amp;lt;/Exponent&amp;gt;&amp;lt;Modulus&amp;gt;&quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MODULUS&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;&amp;lt;/Modulus&amp;gt;&amp;lt;/RSAKeyValue&amp;gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

       &lt;span class=&quot;c1&quot;&gt;// encrypting a string for testing purposes&lt;/span&gt;
       &lt;span class=&quot;kt&quot;&gt;byte&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;plainText&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Encoding&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ASCII&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;GetBytes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Hello from .NET&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
       &lt;span class=&quot;kt&quot;&gt;byte&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cipherText&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;csp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;Encrypt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;plainText&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

       &lt;span class=&quot;n&quot;&gt;Console&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;WriteLine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Encrypted: &quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Convert&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;ToBase64String&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cipherText&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;

       &lt;span class=&quot;c1&quot;&gt;// Output:&lt;/span&gt;
       &lt;span class=&quot;c1&quot;&gt;// Encrypted: F/agXpfSrs7HSXZz+jVq5no/xyQDXuOiVAG/MOY7WzSlp14vMOTM8TshFiWtegB3+2BZCMOEPLQFFFbxusuCFOYGGJ8yRaV7q985z/UDJVXvbX5ANYqrirobR+c868mY4V33loAt2ZFNXwr+Ubk11my1aJgHmoBem/6yPfoRd9GrZaSQnbJRSa3EDtP+8pXETkF9B98E7KvElrsRTLXEXSBygmeKsyENo5DDcARW+lVVsQuP8wUEGnth9SX4oG8i++gmQKkrv0ep6yFrn05xZJKgpOfRiTTo/Bkh7FxNP2wo7utzhtYkNnvtXaJPWAvqXg93KmNPqg1IsN4P1Swb8w==&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Back to the &lt;strong&gt;python&lt;/strong&gt; script:&lt;/p&gt;
&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
12
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;cipher&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PKCS1_v1_5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;new&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;private_key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;random_generator&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Random&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;new&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;read&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;sentinel&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;random_generator&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;cipher_text&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'F/agXpfSrs7HSXZz+jVq5no/xyQDXuOiVAG/MOY7WzSlp14vMOTM8TshFiWtegB3+2BZCMOEPLQFFFbxusuCFOYGGJ8yRaV7q985z/UDJVXvbX5ANYqrirobR+c868mY4V33loAt2ZFNXwr+Ubk11my1aJgHmoBem/6yPfoRd9GrZaSQnbJRSa3EDtP+8pXETkF9B98E7KvElrsRTLXEXSBygmeKsyENo5DDcARW+lVVsQuP8wUEGnth9SX4oG8i++gmQKkrv0ep6yFrn05xZJKgpOfRiTTo/Bkh7FxNP2wo7utzhtYkNnvtXaJPWAvqXg93KmNPqg1IsN4P1Swb8w=='&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;plain_text&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cipher&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;decrypt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;base64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b64decode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cipher_text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;encode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'ASCII'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sentinel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'Decrypted:'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;plain_text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;decode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'ASCII'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# Output:
# Decrypted: Hello from .NET
&lt;/span&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

</description>
        <pubDate>Mon, 06 Apr 2020 21:45:05 +0000</pubDate>
        <link>https://codingvision.net/rsa-encrypt-in-net-decrypt-in-python</link>
        <guid isPermaLink="true">https://codingvision.net/rsa-encrypt-in-net-decrypt-in-python</guid>
        
        <category>c-sharp</category>
        
        <category>python</category>
        
        <category>rsa</category>
        
        <category>encryption</category>
        
        
      </item>
    
      <item>
        <title>Avoid a Mistake: Correctly Calculate Multiclass Accuracy</title>
        <description>&lt;p&gt;Today I held a short laboratory which tackled different metrics used in evaluating classifiers. One of the tasks required that, given the performances of 2 classifiers as &lt;strong&gt;confusion matrices&lt;/strong&gt;, the students will calculate the &lt;strong&gt;accuracy&lt;/strong&gt; of the 2 models. One model was a &lt;strong&gt;binary classifier&lt;/strong&gt; and the other was a &lt;strong&gt;multiclass classifier&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Many students resorted to googling for an &lt;strong&gt;accuracy formula&lt;/strong&gt; which returned the following function:&lt;/p&gt;

\[{\color{Red}{ACC = \frac{TP + TN}{TP + TN + FP +FN}}}\]

&lt;p&gt;Then, they calculated a &lt;strong&gt;‘per-class’ accuracy&lt;/strong&gt; (for class \(i\), they had \(ACC_i\)) and &lt;strong&gt;macro-averaged&lt;/strong&gt; the results like below:&lt;/p&gt;

\[ACC = \frac{\sum_{i=1}^{i=N}{ACC_i}}{N}\]

&lt;p&gt;To their surprise, the resulting accuracy for the &lt;strong&gt;multiclass classifier&lt;/strong&gt; was &lt;strong&gt;erroneous&lt;/strong&gt; and highly different (when compared to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;accuracy_score()&lt;/code&gt; from &lt;strong&gt;sklearn&lt;/strong&gt;). However, the accuracy of the &lt;strong&gt;binary classifier&lt;/strong&gt; was correct.&lt;/p&gt;

&lt;p&gt;As there wasn’t much time available, I told them to use the following &lt;strong&gt;accuracy formula&lt;/strong&gt; to match the results of &lt;strong&gt;sklearn&lt;/strong&gt;, and that I’d send an explanation later:&lt;/p&gt;

\[{\color{Green}{ACC = \frac{\sum_{i=1}^{i=N}{TP_i}}{\sum_{i = 1}^{i=N}{(TP_i + FP_i)}}}}\]

&lt;p&gt;Some of you might recognize this as &lt;strong&gt;micro-averaged precision&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The purpose of this article is to serve as a list of DO’s and DONT’s so we can avoid such mistakes in the future.&lt;/p&gt;

&lt;h2 id=&quot;what-was-wrong&quot;&gt;What was wrong?&lt;/h2&gt;

&lt;p&gt;Basically, you’re prone to get invalid results if you &lt;strong&gt;average&lt;/strong&gt; accuracy values in an attempt to obtain the &lt;strong&gt;global accuracy&lt;/strong&gt;. But… even if you directly calculate the &lt;strong&gt;global accuracy&lt;/strong&gt; using the &lt;span style=&quot;color:red&quot;&gt;above formula&lt;/span&gt;, you’d get skewed values.&lt;/p&gt;

&lt;p&gt;Take a look at the following classifier, described using a &lt;strong&gt;confusion matrix&lt;/strong&gt;:&lt;/p&gt;

&lt;table class=&quot;data-table&quot;&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;\&lt;/th&gt;
      &lt;th&gt;Class #0&lt;/th&gt;
      &lt;th&gt;Class #1&lt;/th&gt;
      &lt;th&gt;Class #2&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Class #0&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;100&lt;/td&gt;
      &lt;td&gt;100&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Class #1&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;100&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;100&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Class #2&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;100&lt;/td&gt;
      &lt;td&gt;100&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;You’ll notice that \(TP = 0\) thus the classifier is doing a really bad job.&lt;/p&gt;

&lt;p&gt;If we follow the students’ approach and calculate the &lt;strong&gt;‘per-class’ accuracy&lt;/strong&gt; (let’s say &lt;strong&gt;Class #0&lt;/strong&gt;), we have:&lt;/p&gt;

\[TP_0 = 0, TN_0 = 200, FP_0 = 200, FN_0 = 200\]

\[\color{Red}{ACC_0 = \frac{0 + 200}{0+200+200+200} = 0.333(3)}\]

&lt;p&gt;This already looks suspicious. You’ll get the same results for the other 2 classes, so… on average, \(\color{Red}{ACC = 0.333(3)}\).
This is definitely wrong.&lt;/p&gt;

&lt;p&gt;If you directly compute &lt;strong&gt;global accuracy&lt;/strong&gt; using the &lt;span style=&quot;color:red&quot;&gt;same formula&lt;/span&gt; (summing all \(TP's\), \(TN's\), …), you get the same result because of the symmetry. This happens mainly because of the \(TN\) in the numerator which grows faster than any other term. In other words, as the number of classes grows, this error grows as well; a similar model, but with &lt;strong&gt;4 classes&lt;/strong&gt;, gets a &lt;strong&gt;0.5&lt;/strong&gt; accuracy.&lt;/p&gt;
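&lt;p&gt;The discrepancy can be reproduced with a few lines of plain Python on the confusion matrix above (variable and helper names are mine, for illustration):&lt;/p&gt;

```python
# Confusion matrix from the article: rows = true class, cols = predicted class.
cm = [[0, 100, 100],
      [100, 0, 100],
      [100, 100, 0]]

n = len(cm)
total = sum(sum(row) for row in cm)  # 600 samples

def per_class_acc(i):
    # the red formula: (TP + TN) / (TP + TN + FP + FN)
    tp = cm[i][i]
    fn = sum(cm[i]) - tp                       # rest of row i
    fp = sum(cm[r][i] for r in range(n)) - tp  # rest of column i
    tn = total - tp - fn - fp
    return (tp + tn) / total

# Macro-averaging the red formula: 0.333..., despite zero correct predictions.
macro_acc = sum(per_class_acc(i) for i in range(n)) / n

# True (global) accuracy: correct predictions over all samples = 0.0
global_acc = sum(cm[i][i] for i in range(n)) / total
```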

&lt;p&gt;Using the &lt;span style=&quot;color:green&quot;&gt;second formula&lt;/span&gt;, the &lt;strong&gt;global accuracy&lt;/strong&gt; becomes:&lt;/p&gt;

\[\color{Green}{ACC = \frac{0+0+0}{(0+200) + (0+200) + (0 + 200)} = 0}\]

&lt;p&gt;This indeed yields the correct result. Moreover, it matches the output of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;accuracy_score()&lt;/code&gt; from &lt;strong&gt;sklearn&lt;/strong&gt; on more diverse confusion matrices as well.&lt;/p&gt;

&lt;h5 id=&quot;if-you-compute-per-class-accuracies-using-the-second-formula-and-average-the-values-youre-basically-getting-a-macro-averaged-precision-point-is-thats-not-accuracy---so-dont-do-that&quot;&gt;If you compute &lt;strong&gt;‘per class’ accuracies&lt;/strong&gt; using the &lt;span style=&quot;color:green&quot;&gt;second formula&lt;/span&gt; and average the values, you’re basically getting a &lt;strong&gt;macro-averaged precision&lt;/strong&gt;. Point is, that’s not &lt;strong&gt;accuracy&lt;/strong&gt; - so don’t do that.&lt;/h5&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;I’d recommend avoiding:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;the idea of calculating a &lt;strong&gt;global accuracy&lt;/strong&gt; by averaging &lt;strong&gt;‘per-class’ accuracies&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;the &lt;span style=&quot;color:red&quot;&gt;red formula&lt;/span&gt;, which includes \(TN\), since the &lt;span style=&quot;color:green&quot;&gt;other one&lt;/span&gt; returns correct values for any number of classes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Overall, you can compute &lt;strong&gt;precision&lt;/strong&gt;, &lt;strong&gt;recall&lt;/strong&gt;, &lt;strong&gt;F1&lt;/strong&gt; in a ‘per-class’ manner. But I’m not so sure it also works with the &lt;strong&gt;accuracy&lt;/strong&gt;.&lt;/p&gt;

</description>
        <pubDate>Tue, 10 Dec 2019 21:45:05 +0000</pubDate>
        <link>https://codingvision.net/avoid-a-mistake-correctly-calculate-multiclass-accuracy</link>
        <guid isPermaLink="true">https://codingvision.net/avoid-a-mistake-correctly-calculate-multiclass-accuracy</guid>
        
        <category>sklearn</category>
        
        <category>python</category>
        
        <category>metric</category>
        
        
      </item>
    
      <item>
        <title>C# Predict the Random Number Generator of .NET</title>
        <description>&lt;p&gt;This post targets to underline the &lt;strong&gt;predictability&lt;/strong&gt; of the random… or better said &lt;strong&gt;pseudo-random number generator&lt;/strong&gt; (PRNG) exposed by the &lt;strong&gt;.NET&lt;/strong&gt; framework (aka the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Random()&lt;/code&gt; class), under certain assumptions. Because of the nature of the implementation, &lt;strong&gt;100% accuracy&lt;/strong&gt; can be obtained with a fairly simple idea and a rather short code snippet.&lt;/p&gt;

&lt;h5 id=&quot;the-presented-method-definitely-isnt-something-new-in-the-domain-of-cryptography-however-the-purpose-of-the-article-is-to-bring-awareness-about-this-specific-weakness&quot;&gt;The presented method definitely isn’t something new in the domain of cryptography, however the purpose of the article is to bring awareness about this specific weakness.&lt;/h5&gt;

&lt;p&gt;The following scenario is considered:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;no access&lt;/strong&gt; to the &lt;strong&gt;process’s memory&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;must work for &lt;strong&gt;any chosen seed&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;a limited set of generated &lt;strong&gt;random numbers&lt;/strong&gt; is &lt;strong&gt;visible&lt;/strong&gt; to the attacker&lt;/li&gt;
  &lt;li&gt;we focus on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Random.NextDouble()&lt;/code&gt; as there is no data loss caused by &lt;strong&gt;int casting&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’ll be presenting a short summary of the algorithm used by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Random()&lt;/code&gt; and how we can predict the random numbers. If you feel like going directly to the code, scroll down to the bottom of the article.&lt;/p&gt;

&lt;h2 id=&quot;the-random-class&quot;&gt;The Random class&lt;/h2&gt;

&lt;p&gt;While many pseudo-random implementations (e.g., libc’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rand()&lt;/code&gt;) rely on a &lt;a href=&quot;https://en.wikipedia.org/wiki/Linear_congruential_generator&quot; rel=&quot;nofollow&quot;&gt;Linear Congruential Generator (LCG)&lt;/a&gt; which generates each number in the sequence by taking into account the previous one, I discovered that &lt;strong&gt;.NET&lt;/strong&gt;’s &lt;strong&gt;random number generator&lt;/strong&gt; uses a different approach.&lt;/p&gt;

&lt;p&gt;By looking at the implementation of the &lt;a href=&quot;https://referencesource.microsoft.com/#mscorlib/system/random.cs&quot; rel=&quot;nofollow&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Random()&lt;/code&gt;&lt;/a&gt; class, one can easily observe that pseudo-random number generation is based on a &lt;a href=&quot;https://rosettacode.org/wiki/Subtractive_generator&quot; rel=&quot;nofollow&quot;&gt;Subtractive Generator&lt;/a&gt;, which permits the user to specify a custom seed or use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Environment.TickCount&lt;/code&gt; (system’s uptime in milliseconds) as default.&lt;/p&gt;

&lt;p&gt;The core of the pseudo-random generator is the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;InternalSample()&lt;/code&gt; (line #100) method which constructs the sequence of numbers. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Random.NextDouble()&lt;/code&gt; will actually call the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Sample()&lt;/code&gt; method which returns the value of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;InternalSample()&lt;/code&gt; divided by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Int32.MaxValue&lt;/code&gt;, as this is claimed to improve the distribution of random numbers.
Without going into much detail regarding the included gimmicks, we can describe the generator as follows:&lt;/p&gt;

\[R_i = R_i - R_j, j=i+21\]

\[R_i = \left\{\begin{matrix}
R_i - 1, if (R_i = Int32.Max)\\ 
R_i, else
\end{matrix}\right.\]

\[R_i = \left\{\begin{matrix}
R_i + Int32.Max, if (R_i &amp;lt; 0)\\ 
R_i, if (R_i \geqslant 0)
\end{matrix}\right.\]

\[retVal = \frac{R_i}{Int32.Max}\]

&lt;p&gt;where \(R_i\) contributes to describing the state of the algorithm and \(retVal\) is, obviously, the returned value.&lt;/p&gt;

&lt;p&gt;To store the state of the pseudo-random number generator, a &lt;strong&gt;circular array&lt;/strong&gt; of &lt;strong&gt;56 ints&lt;/strong&gt; is employed - this means \(i\) and \(j\) will get re-initialized to &lt;strong&gt;1&lt;/strong&gt; whenever they exceed the length of the array - however the &lt;strong&gt;offset&lt;/strong&gt; of &lt;strong&gt;21&lt;/strong&gt; remains constant.&lt;/p&gt;
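&lt;p&gt;The update rules above can be condensed into a short sketch (Python here for brevity; the names and structure are mine and mirror the formulas rather than the actual .NET source):&lt;/p&gt;

```python
INT32_MAX = 2**31 - 1  # corresponds to Int32.MaxValue

def subtractive_step(state, i, j):
    """One subtractive-generator update: R_i = R_i - R_j, kept in [0, Int32.MaxValue)."""
    r = state[i] - state[j]
    if r == INT32_MAX:  # keep the value strictly below Int32.MaxValue
        r -= 1
    if r < 0:           # wrap negative differences back into range
        r += INT32_MAX
    state[i] = r
    return r / INT32_MAX  # the value NextDouble() would hand back

def advance(i):
    # indices wrap inside the 56-slot circular array (1..55); the offset of 21 stays fixed
    i += 1
    return 1 if i == 56 else i
```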

&lt;h2 id=&quot;predicting-random-numbers&quot;&gt;Predicting Random Numbers&lt;/h2&gt;

&lt;p&gt;In my opinion, it seems rather difficult to determine the starting state of the algorithm without knowing the seed. But… we notice that the algorithm outputs pseudo-random numbers which directly expose the values of its state array.&lt;/p&gt;

&lt;p&gt;In other words, if we have access to a randomly generated number \(retVal\), we can compute \(R_i\), and \(R_i\) is used to generate future states &amp;amp; numbers in the sequence. However, we will need values for \(i = 1, \dots, 55\) in order to cover the whole state array.&lt;/p&gt;

&lt;h5 id=&quot;if-we-manage-to-leak-a-continuous-set-of-55-generated-numbers-we-have-enough-information-to-describe-and-construct-a-new-generator-by-providing-a-circular-array-of-states-which-will-output-the-same-numbers-as-the-original-but-can-be-used-as-a-predictor&quot;&gt;If we manage to leak a continuous set of &lt;strong&gt;55&lt;/strong&gt; generated numbers, we have enough information to describe and construct a new generator (by providing a circular array of states) which will output the same numbers as the original but can be used as a predictor.&lt;/h5&gt;

&lt;p&gt;In my implementation, I’m using the following trick to simplify things: I don’t convert the leaked \(retVal\) back to \(R_i\) (by multiplying with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Int32.MaxValue&lt;/code&gt;) because I’d have to divide it again to compare the results. So I’m working directly with differences of leaked values (instead of differences of \(R_i\)’s) – I hope it makes sense.&lt;/p&gt;

&lt;p&gt;Here’s the code I used, it should help clear things up.&lt;/p&gt;

&lt;div class=&quot;language-csharp highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;k&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Program&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;cm&quot;&gt;/* predicts random numbers, given 2 state descriptors */&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;double&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;computeDiffAndOffset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;double&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;double&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
		&lt;span class=&quot;kt&quot;&gt;double&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;diff&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r1&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
		
		&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;diff&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
			&lt;span class=&quot;n&quot;&gt;diff&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;-=&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;/(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;double&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Int32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MaxValue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
		&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;diff&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
			&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;diff&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
		&lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;
			&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;diff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
	&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
	
	&lt;span class=&quot;k&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
	&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
		&lt;span class=&quot;cm&quot;&gt;/* this we break */&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;Random&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Random&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
		
		&lt;span class=&quot;cm&quot;&gt;/* describes the state of the subtractive generator */&lt;/span&gt;
		&lt;span class=&quot;kt&quot;&gt;double&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SeedArray&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;double&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;56&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;
		
		&lt;span class=&quot;cm&quot;&gt;/* leaking the state by observing the first 55 random numbers */&lt;/span&gt;
		&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;56&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;++)&lt;/span&gt;
			&lt;span class=&quot;n&quot;&gt;SeedArray&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;NextDouble&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
		
		&lt;span class=&quot;cm&quot;&gt;/* the offset is known from the original implementation */&lt;/span&gt;
		&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;offset&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;21&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
		
		&lt;span class=&quot;cm&quot;&gt;/* from the theory part: i = index1, j = index2 */&lt;/span&gt;
		&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;index1&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;index2&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;index1&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;offset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
		
		&lt;span class=&quot;cm&quot;&gt;/* running a few tests */&lt;/span&gt;
		&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;++)&lt;/span&gt;
		&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
			&lt;span class=&quot;cm&quot;&gt;/* handling the circular array limits */&lt;/span&gt;
			&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;index1&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;56&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
				&lt;span class=&quot;n&quot;&gt;index1&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
			
			&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;index2&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;56&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
				&lt;span class=&quot;n&quot;&gt;index2&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
			
			&lt;span class=&quot;cm&quot;&gt;/* this is the predicted random number */&lt;/span&gt;
			&lt;span class=&quot;kt&quot;&gt;double&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;predictedValue&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;computeDiffAndOffset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SeedArray&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;index1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SeedArray&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;index2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]);&lt;/span&gt;

			&lt;span class=&quot;cm&quot;&gt;/* this is the correct random number */&lt;/span&gt;
			&lt;span class=&quot;kt&quot;&gt;double&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;correctRandom&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;NextDouble&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
			
			&lt;span class=&quot;cm&quot;&gt;/* we compare them as doubles */&lt;/span&gt;
			&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Math&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;Abs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;predictedValue&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;correctRandom&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;0.00001&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
				&lt;span class=&quot;k&quot;&gt;throw&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Exception&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;Format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Failed at {0} vs {1}&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;predictedValue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;correctRandom&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
			
			&lt;span class=&quot;cm&quot;&gt;/* printing the results */&lt;/span&gt;
			&lt;span class=&quot;n&quot;&gt;Console&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;WriteLine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Predicted: &quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;predictedValue&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot; | Correct: &quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;correctRandom&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

			&lt;span class=&quot;cm&quot;&gt;/* updating the state of the generator */&lt;/span&gt;
			&lt;span class=&quot;n&quot;&gt;SeedArray&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;index1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;predictedValue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
			
			&lt;span class=&quot;n&quot;&gt;index1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;++;&lt;/span&gt;
			&lt;span class=&quot;n&quot;&gt;index2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;++;&lt;/span&gt;
		&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
	&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You should get something like this when running it (well, different numbers because you’ll have a different seed - but you get the point). Tested it on &lt;strong&gt;.NET 4.7.2&lt;/strong&gt;.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;Predicted: 0.562743733899083 | Correct: 0.562743733899083
Predicted: 0.0782367256834342 | Correct: 0.0782367256834343
Predicted: 0.48149561019684 | Correct: 0.48149561019684
Predicted: 0.768610569075034 | Correct: 0.768610569075034
Predicted: 0.288163338456379 | Correct: 0.288163338456379
Predicted: 0.652038850659523 | Correct: 0.652038850659523
Predicted: 0.331446861071254 | Correct: 0.331446861071255
Predicted: 0.573066327056413 | Correct: 0.573066327056413
[...]
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;conclusions&quot;&gt;Conclusions&lt;/h2&gt;

&lt;p&gt;Definitely don’t use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Random()&lt;/code&gt; for cryptographic functions. Bad idea.
However, limiting the information provided to the adversary (i.e. hiding the randomly generated numbers) would greatly diminish the effectiveness of this attack.&lt;/p&gt;

&lt;p&gt;Not much else to be said. It’s my first take at breaking something which is not an LCG - it might not be state-of-the-art level (performance-wise) but I hope you found this informative.&lt;/p&gt;
</description>
        <pubDate>Fri, 06 Dec 2019 21:45:05 +0000</pubDate>
        <link>https://codingvision.net/c-predict-random-number-generator-net</link>
        <guid isPermaLink="true">https://codingvision.net/c-predict-random-number-generator-net</guid>
        
        <category>c-sharp</category>
        
        <category>prng</category>
        
        <category>exploit</category>
        
        
      </item>
    
      <item>
        <title>Evaluating the Robustness of OCR Systems</title>
<description>&lt;p&gt;In this article, I’m going to discuss my Bachelor’s degree final project, which is about evaluating the robustness of &lt;strong&gt;OCR systems&lt;/strong&gt; (such as &lt;strong&gt;Tesseract&lt;/strong&gt; or &lt;strong&gt;Google’s Cloud Vision&lt;/strong&gt;) when adversarial samples are presented as inputs. It’s somewhere in-between &lt;strong&gt;fuzzing&lt;/strong&gt; and &lt;strong&gt;adversarial sample crafting&lt;/strong&gt;, on a black box, the main objective being the creation of &lt;strong&gt;OCR-proof&lt;/strong&gt; images with minimal amounts of noise.&lt;/p&gt;

&lt;p&gt;It’s an old project that I recently presented at an &lt;a href=&quot;https://spritz.math.unipd.it/events/2019/PIU2019/PagesOutput/SSS/index.html&quot; rel=&quot;nofollow&quot;&gt;International Security Summer School&lt;/a&gt; hosted by the University of Padua. I decided to also publish it here mainly because of the positive feedback received when presented at the summer school.&lt;/p&gt;

&lt;p&gt;I’ll try to focus on methodology and results, which I consider being of interest, without diving into implementation details.&lt;/p&gt;

&lt;h5 id=&quot;i-published-this-1-year-ago---not-sure-if-it-still-works-as-described-here-hopefully-it-does-but-im-pretty-sure-google-made-changes-to-the-vision-engine-since-then&quot;&gt;I published this ~1 year ago - not sure if it still works as described here. Hopefully it does, but I’m pretty sure Google made changes to the Vision engine since then.&lt;/h5&gt;

&lt;h2 id=&quot;motivation&quot;&gt;Motivation&lt;/h2&gt;

&lt;p&gt;Let’s start with what I considered to be plausible use cases for this project and what problems it would be able to solve.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Confidentiality&lt;/strong&gt; of text included in images? – It is no surprise to us that large services (that’s you, Google) will scan hosted images for texts in order to improve classification or extract user information. We might want some of that information to remain private.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Smart &lt;strong&gt;CAPTCHA&lt;/strong&gt;? – This aims to improve the efficiency of CAPTCHAs by creating images which are easier to read by humans, thus reducing the discomfort, while also rendering OCR-based bots ineffective.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Defense against &lt;strong&gt;content generators&lt;/strong&gt;? – This could serve as a defense mechanism against programs which scan documents and republish content (sometimes using different names) in order to gain undeserved merits.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;challenges&quot;&gt;Challenges&lt;/h2&gt;

&lt;p&gt;Now, let’s focus on the different constraints and challenges:&lt;/p&gt;

&lt;h3 id=&quot;1-complex--closed-source-architecture&quot;&gt;1. Complex / closed-source architecture&lt;/h3&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/evaluating-the-robustness-of-ocr-systems/tess-pipeline.png&quot; alt=&quot;Tesseract's pipeline as [presented at DAS 2016](https://github.com/tesseract-ocr/docs/blob/master/das_tutorial2016/2ArchitectureAndDataStructures.pdf){:rel='nofollow'}&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;Tesseract’s pipeline as &lt;a href=&quot;https://github.com/tesseract-ocr/docs/blob/master/das_tutorial2016/2ArchitectureAndDataStructures.pdf&quot; rel=&quot;nofollow&quot;&gt;presented at DAS 2016&lt;/a&gt;&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;Modern OCR systems are more complex than basic convolutional neural networks as they need to perform multiple actions (e.g.: deskewing, layout detection, text row segmentation), therefore finding ways to correctly compute gradients is a daunting task. Moreover, many of them do not provide access to the source code, making it difficult to use techniques such as &lt;strong&gt;FGSM&lt;/strong&gt; or &lt;strong&gt;GAN&lt;/strong&gt;s.&lt;/p&gt;

&lt;h3 id=&quot;2-binarization&quot;&gt;2. Binarization&lt;/h3&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/evaluating-the-robustness-of-ocr-systems/binarization.png&quot; alt=&quot;Result of the binarization procedure, using an adaptive threshold&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;Result of the binarization procedure, using an adaptive threshold&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;An OCR system usually applies a binarization procedure (e.g.: &lt;strong&gt;Otsu&lt;/strong&gt;’s method) to the image before running it through the main classifier in order to separate the text from the background, the ideal output being pure black text on a clean white background.&lt;/p&gt;

&lt;p&gt;This proves troublesome because it prevents the sample generator from altering pixels by small amounts: for example, turning a black pixel into a grayish one will be reverted by the binarization process, thus generating no feedback.&lt;/p&gt;
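&lt;p&gt;As a tiny illustration (a sketch with a fixed global threshold – real systems, e.g. Otsu’s method, pick it adaptively), a small grayish perturbation simply vanishes after thresholding:&lt;/p&gt;

```python
def binarize(pixels, threshold=128):
    # global thresholding: darker than `threshold` becomes text (0), the rest background (255)
    return [0 if p < threshold else 255 for p in pixels]

clean  = [0, 0, 255, 255]   # black text on a white background
nudged = [40, 0, 255, 255]  # first pixel pushed towards gray
assert binarize(clean) == binarize(nudged)  # the perturbation is erased: no feedback
```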

&lt;h3 id=&quot;3-adaptive-classification&quot;&gt;3. Adaptive classification&lt;/h3&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/evaluating-the-robustness-of-ocr-systems/adaptive-classifier.png&quot; alt=&quot;Tesseract's adaptive classifier incorrectly recognizes an 'h' as a 'b', in the first image. In the second sample, Tesseract observes a correct 'h' character (confidence is larger than a threshold) adjusts the classifier's configuration and correctly classifies the first 'h'&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;Tesseract’s adaptive classifier incorrectly recognizes an ‘h’ as a ‘b’, in the first image. In the second sample, Tesseract observes a correct ‘h’ character (confidence is larger than a threshold) adjusts the classifier’s configuration and correctly classifies the first ‘h’&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;This is specific to Tesseract, which is rather dated nowadays - still very popular, though. Modern classifiers might be using this method, too. It consists of performing two passes over the same input image. In the first pass, characters which can be recognized with a certain confidence are selected and used as temporary training data. In the second pass, the OCR attempts to classify the characters which were not recognized in the first iteration, using what it previously learned.&lt;/p&gt;

&lt;p&gt;Considering this, having an adversarial generator which alters one character at a time might not work as expected since that character might appear later in the image.&lt;/p&gt;

&lt;h3 id=&quot;4-lower-entropy&quot;&gt;4. Lower entropy&lt;/h3&gt;

&lt;p&gt;This refers to the fact that the input data is rather ‘limited’ for an OCR system when compared to… let’s say object recognition. As an example, images which contain 3D objects have larger variance than those which contain characters since the characters have a rather fixed shape and format. This should make it more difficult to create adversarial samples for character classifiers without applying distortions.&lt;/p&gt;

&lt;p&gt;A direct consequence is that it greatly restricts the amount of noise that can be added to an image so that the readability is preserved.&lt;/p&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/evaluating-the-robustness-of-ocr-systems/noise-readability.png&quot; alt=&quot;Applying noise in an image usually decreases readability, which is not what we want here&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;Applying noise in an image usually decreases readability, which is not what we want here&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h3 id=&quot;5-dictionaries&quot;&gt;5. Dictionaries&lt;/h3&gt;

&lt;p&gt;OCR systems will attempt to improve their accuracy by employing dictionaries with predefined words. Altering a single character in a word (i.e.: the incremental approach) might not be effective in this case.&lt;/p&gt;

&lt;h2 id=&quot;targeted-ocr-systems&quot;&gt;Targeted OCR Systems&lt;/h2&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/evaluating-the-robustness-of-ocr-systems/tesseract-gocr.png&quot; alt=&quot;Tested locally on Tesseract 4.0 and remotely on Google's Cloud Vision OCR&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;Tested locally on Tesseract 4.0 and remotely on Google’s Cloud Vision OCR&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;For this project, I used &lt;strong&gt;Tesseract 4.0&lt;/strong&gt; for prototyping and testing, as it had no timing restrictions and allowed me to run a fast, parallel model with high throughput so I could test if the implementation works as expected. Later, I moved to &lt;strong&gt;Google’s Cloud Vision OCR&lt;/strong&gt; and tried some ‘remote’ fuzzing through the API.&lt;/p&gt;

&lt;h2 id=&quot;methodology&quot;&gt;Methodology&lt;/h2&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/evaluating-the-robustness-of-ocr-systems/architecture.png&quot; alt=&quot;A rather simplified view of the flow; a feedback-based adversarial samples generator (in image: obfuscator) alters inputs in order to maximize the error of the OCR system&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;A rather simplified view of the flow; a feedback-based adversarial samples generator (in image: obfuscator) alters inputs in order to maximize the error of the OCR system&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;In order to be able to cover even black-box cases, I used a &lt;strong&gt;genetic algorithm&lt;/strong&gt; guided by the feedback of the targeted OCR system. Since the confidence of the classifier alone is not a good metric for this problem, a score function based on the &lt;a href=&quot;https://en.wikipedia.org/wiki/Levenshtein_distance&quot; rel=&quot;nofollow&quot;&gt;Levenshtein distance&lt;/a&gt; and the &lt;strong&gt;amount of noise&lt;/strong&gt; is employed.&lt;/p&gt;
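&lt;p&gt;A minimal sketch of such a score function (the weighting factor and the exact mixing are placeholders of mine, not the project’s actual parameters): it rewards OCR errors while penalizing the amount of noise spent to cause them.&lt;/p&gt;

```python
def levenshtein(a, b):
    # classic dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]

def score(ocr_output, true_text, noise_pixels, total_pixels, lam=5.0):
    # higher is better for the attacker: many recognition errors, little noise
    return levenshtein(ocr_output, true_text) - lam * noise_pixels / total_pixels
```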

&lt;p&gt;One of the main problems here was the size of the search space which was partially solved by identifying regions of interest in the image and focusing only on these. Also, lots of parameter tuning…&lt;/p&gt;

&lt;h2 id=&quot;noise-properties&quot;&gt;Noise properties&lt;/h2&gt;

&lt;p&gt;Given the constraints, the following properties of the noise model must be matched:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;high contrast&lt;/strong&gt; – so it bypasses the binarization process and generates feedback&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;low density&lt;/strong&gt; – in order to maintain readability by exploiting the natural &lt;strong&gt;low-pass filtering&lt;/strong&gt; capability of human vision&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Applying &lt;strong&gt;salt-and-pepper&lt;/strong&gt; noise in a smart manner will, hopefully, satisfy the constraints.&lt;/p&gt;
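&lt;p&gt;Such noise could be applied along these lines (a sketch; the density and the seeding are arbitrary choices of mine):&lt;/p&gt;

```python
import random

def salt_and_pepper(pixels, density=0.02, seed=0):
    # flip a sparse, random subset of pixels to pure black or pure white:
    # high contrast survives binarization, low density preserves readability
    rng = random.Random(seed)
    out = list(pixels)
    for idx in range(len(out)):
        if rng.random() < density:
            out[idx] = rng.choice((0, 255))
    return out
```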

&lt;h2 id=&quot;working-modes&quot;&gt;Working modes&lt;/h2&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/evaluating-the-robustness-of-ocr-systems/modes.png&quot; alt=&quot;Different working modes for small and large characters, in order to preserve readability. Both managed to entirely hide the given text when tested on Tesseract 4.0&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;Different working modes for small and large characters, in order to preserve readability. Both managed to entirely hide the given text when tested on Tesseract 4.0&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;Initially, the algorithm worked using only the &lt;strong&gt;overtext&lt;/strong&gt; mode, which applied noise inside the rectangle containing the characters. However, this method is not the best choice for texts written with smaller characters, mainly because there are fewer pixels that can be altered, so even minimal amounts of noise drastically lower readability. For this special case, the noise is instead inserted in-between the text rows (&lt;strong&gt;artifacts&lt;/strong&gt; mode) in order to preserve the original characters. Both methods presented similar success rates in hiding texts from the targeted OCR system.&lt;/p&gt;

&lt;p&gt;Just for fun, here’s what happens if the score function is inverted, which translates as “generate an image with as much noise as possible, but which can be read by OCR software”. Weird, but it’s still recognized…&lt;/p&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/evaluating-the-robustness-of-ocr-systems/inverted-function.png&quot; alt=&quot;Tesseract recognized the original text with **no errors**. How about you?&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;Tesseract recognized the original text with &lt;strong&gt;no errors&lt;/strong&gt;. How about you?&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h2 id=&quot;results-on-tesseract&quot;&gt;Results on Tesseract&lt;/h2&gt;

&lt;p&gt;Promising results were achieved while testing against Tesseract 4.0. The following figure presents an early (non-final) sample in which the word “&lt;strong&gt;Random&lt;/strong&gt;” is not recognized by Tesseract:&lt;/p&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/evaluating-the-robustness-of-ocr-systems/tess-results-ui.png&quot; alt=&quot;The first word is successfully hidden from the OCR system&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;The first word is successfully hidden from the OCR system&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h2 id=&quot;tests-on-googles-cloud-vision-platform&quot;&gt;Tests on Google’s Cloud Vision Platform&lt;/h2&gt;

&lt;p&gt;This is where things get interesting.&lt;/p&gt;

&lt;h5 id=&quot;the-implemented-score-function-can-be-maximized-in-2-ways-hiding-characters-or-tricking-the-ocr-engine-into-adding-characters-which-shouldnt-be-there&quot;&gt;The implemented score function can be maximized in 2 ways: hiding characters or tricking the OCR engine into adding characters which shouldn’t be there.&lt;/h5&gt;

&lt;p&gt;One of the samples managed to create a &lt;strong&gt;loop&lt;/strong&gt; in the recognition process of &lt;strong&gt;Google’s Cloud Vision OCR&lt;/strong&gt;, causing the same text to be recognized multiple times. No &lt;strong&gt;DoS&lt;/strong&gt; resulted (or none that I’m aware of); I’m still not sure whether the loop persisted - it either ran for a small number of iterations, failed (timed out?), or load balancers compensated for it by using different instances.&lt;/p&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/evaluating-the-robustness-of-ocr-systems/cloud_ocr_bug.png&quot; alt=&quot;Possible loop in the recognition process: the same text gets recognized multiple times. The bottom-left and the top-right corners are 'merged' into an oblique text row so the recognition process is sent back to already processed text.&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;Possible loop in the recognition process: the same text gets recognized multiple times. The bottom-left and the top-right corners are ‘merged’ into an oblique text row so the recognition process is sent back to already processed text.&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;Let’s take a closer look at the sample: below, you can see how the adversarial sample was interpreted by Google’s Cloud Vision OCR system. The image was submitted directly to the Cloud Vision platform via the &lt;a href=&quot;https://cloud.google.com/vision/&quot; rel=&quot;nofollow&quot;&gt;“Try the API”&lt;/a&gt; option so, at the moment of testing, the results could be easily reproduced.&lt;/p&gt;
&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/evaluating-the-robustness-of-ocr-systems/cloud_ocr_bug2.png&quot; alt=&quot;Rectangles returned by Cloud Vision indicate that additional text rows are 'created' during the recognition thus creating a loop&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;Rectangles returned by Cloud Vision indicate that additional text rows are ‘created’ during the recognition thus creating a loop&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;There is also the ‘boring’ case, where the characters are simply hidden:&lt;/p&gt;
&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/evaluating-the-robustness-of-ocr-systems/cloud-ocr-artifacts.png&quot; alt=&quot;Once again, using the artifacts mode on a small text since larger texts are way easier to hide&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;Once again, using the artifacts mode on a small text since larger texts are way easier to hide&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h2 id=&quot;conclusions&quot;&gt;Conclusions&lt;/h2&gt;

&lt;p&gt;It works, but the project reached its objective and is no longer in development.
It seems difficult to create samples that work for all OCR systems (&lt;strong&gt;generalization&lt;/strong&gt;).&lt;/p&gt;

&lt;p&gt;Also, the samples are vulnerable to changes at the &lt;strong&gt;preprocessing&lt;/strong&gt; stage in the OCR pipeline such as:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;noise filtering (e.g.: median filters)&lt;/li&gt;
  &lt;li&gt;compression techniques (e.g.: Fourier compression)&lt;/li&gt;
  &lt;li&gt;downscaling-&amp;gt;upscaling (e.g.: Autoencoders)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, we can conclude that, using this approach, it is more challenging to mask small characters without making the text difficult to read. I compiled the following graph, which compares the images generated by the algorithm (below &lt;strong&gt;7%&lt;/strong&gt; noise density) with a set of images that contain random noise (&lt;strong&gt;15%&lt;/strong&gt; noise density). The two sets contain different images with character sizes of 12, 21, 36 and 50. The random noise set contains 62 samples for each size - average values were used.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Noise efficiency&lt;/strong&gt; is computed by taking into account the &lt;strong&gt;Levenshtein distance&lt;/strong&gt; and the total &lt;strong&gt;amount of noise&lt;/strong&gt; in the image.&lt;/p&gt;
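&lt;p&gt;One plausible way to express such a metric - this is my own illustrative formulation, not necessarily the exact one used for the graph - is recognition damage obtained per unit of added noise:&lt;/p&gt;

```c
/* Hypothetical noise-efficiency metric: edit distance between the
   ground truth and the OCR output, normalized by the number of
   noisy pixels that were added to obtain it. */
double noise_efficiency(int edit_distance, int noisy_pixels)
{
    if (noisy_pixels == 0)
        return 0.0; /* no noise added, nothing to credit */
    return (double)edit_distance / (double)noisy_pixels;
}
```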

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/evaluating-the-robustness-of-ocr-systems/noise-eff-cloudocr.png&quot; alt=&quot;As characters get smaller, the efficiency of the noise added by the algorithm decreases - the random noise samples behave in an opposite manner.&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;As characters get smaller, the efficiency of the noise added by the algorithm decreases - the random noise samples behave in an opposite manner.&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h2 id=&quot;interesting-todos&quot;&gt;Interesting TODOs&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Extracting templates from samples and training a generator?&lt;/li&gt;
  &lt;li&gt;Directly exploiting the row segmentation feature?&lt;/li&gt;
  &lt;li&gt;Attacking Otsu’s binarization method?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Maybe someday…&lt;/p&gt;

&lt;h2 id=&quot;cite&quot;&gt;Cite&lt;/h2&gt;

&lt;p&gt;Should you find this relevant to your work, you can cite the article using:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;@inproceedings{sporici2018evaluation,
  title={An Evaluation of OCR Systems Against Adversarial Machine Learning},
  author={Sporici, Dan and Chiroiu, Mihai and Cioc{\^\i}rlan, Dan},
  booktitle={International Conference on Security for Information Technology and Communications},
  pages={126--141},
  year={2018},
  organization={Springer}
}
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

</description>
        <pubDate>Sat, 07 Sep 2019 21:45:05 +0000</pubDate>
        <link>https://codingvision.net/evaluating-the-robustness-of-ocr-systems</link>
        <guid isPermaLink="true">https://codingvision.net/evaluating-the-robustness-of-ocr-systems</guid>
        
        <category>research</category>
        
        <category>genetic-algorithm</category>
        
        <category>ocr</category>
        
        <category>tesseract</category>
        
        <category>adversarial-machine-learning</category>
        
        
      </item>
    
      <item>
        <title>Hot Patching C/C++ Functions with Intel Pin</title>
        <description>&lt;p&gt;5 years ago, I said in one of my articles that I would return, one day, with a method of &lt;strong&gt;hot patching&lt;/strong&gt; functions inside live processes. So… I guess this is that day.&lt;/p&gt;

&lt;p&gt;What we’ll try to achieve here is to &lt;strong&gt;replace&lt;/strong&gt;, from outside, a function inside a &lt;strong&gt;running executable&lt;/strong&gt;, without stopping/freezing the process (or crashing it…).&lt;/p&gt;

&lt;p&gt;In my opinion, applying hot patches is quite a daunting task, if implemented from scratch, since:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;it requires access to a different process’ memory (most operating systems are fans of &lt;strong&gt;process isolation&lt;/strong&gt;)&lt;/li&gt;
  &lt;li&gt;has software compatibility constraints (&lt;strong&gt;Windows&lt;/strong&gt; binaries vs &lt;strong&gt;Linux&lt;/strong&gt; binaries)&lt;/li&gt;
  &lt;li&gt;has architecture compatibility constraints (&lt;strong&gt;32bit&lt;/strong&gt; vs &lt;strong&gt;64bit&lt;/strong&gt;)&lt;/li&gt;
  &lt;li&gt;it implies working with machine code and brings certain issues to the table&lt;/li&gt;
  &lt;li&gt;it has only a didactic purpose - probably no one would actually use a ‘from-scratch’ method since there are tools that do this better&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Considering these, I guess it is better to use something that was actually written for this task rather than coding something manually.
Therefore, we’ll be looking at a way to do this with &lt;a href=&quot;https://software.intel.com/en-us/articles/pin-a-dynamic-binary-instrumentation-tool&quot; rel=&quot;nofollow&quot;&gt;Intel Pin&lt;/a&gt;. I stumbled upon this tool while working on a completely different project, but it seems to be quite versatile. Basically, it is described as a &lt;strong&gt;Dynamic Binary Instrumentation Tool&lt;/strong&gt;; however, we’ll be using it to facilitate the procedure of writing code to another process’ memory.&lt;/p&gt;

&lt;h2 id=&quot;initial-preparations&quot;&gt;Initial Preparations&lt;/h2&gt;

&lt;p&gt;Start by &lt;a href=&quot;https://software.intel.com/en-us/articles/pin-a-binary-instrumentation-tool-downloads&quot; rel=&quot;nofollow&quot;&gt;downloading Intel Pin&lt;/a&gt; and extract it somewhere in your workspace.&lt;/p&gt;

&lt;h5 id=&quot;im-doing-this-tutorial-on-ubuntu-x86_64-but-im-expecting-the-code-to-be-highly-similar-on-windows-or-other-operating-systems&quot;&gt;I’m doing this tutorial on Ubuntu x86_64, but I’m expecting the code to be highly similar on Windows or other operating systems.&lt;/h5&gt;

&lt;p&gt;Now, I imagine this turns out to be useful for endpoints that provide remote services to clients - i.e.: a server receives some sort of input and is expected to also return something. Let’s say that someone discovered that a service is vulnerable to certain inputs - so it can be compromised by the first attacker who submits a specially crafted request. We’ll consider that taking the service down, compiling, deploying and launching a new instance is not a desirable solution, so hot patching is wanted until a new version is ready.&lt;/p&gt;

&lt;p&gt;I’ll use the following &lt;strong&gt;dummy&lt;/strong&gt; C program to illustrate the aforementioned model - to keep it simple, I’m reading inputs from &lt;strong&gt;stdin&lt;/strong&gt; (instead of a tcp stream / network).&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;cp&quot;&gt;#include &amp;lt;stdio.h&amp;gt;
&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// TODO: hot patch this method&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;read_input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Tell me your name:&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    
    &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;11&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;scanf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;%s&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// this looks bad&lt;/span&gt;
    
    &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Hello, %s!&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// not gonna end too soon&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;read_input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
    
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Some of you probably noticed that the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;read_input()&lt;/code&gt; function is not very well written since it’s reading inputs using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;scanf(&quot;%s&quot;, name);&lt;/code&gt; and thus enabling an attacker to hijack the program’s execution using &lt;strong&gt;buffer overflow&lt;/strong&gt;.&lt;/p&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/hot-patching-functions-with-intel-pin/buffer_overflow.png&quot; alt=&quot;Scanf() reading exceeds the limits of the allocated buffer&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;Scanf() reading exceeds the limits of the allocated buffer&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;We intend to patch this vulnerability by “replacing” the vulnerable reading function (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;read_input()&lt;/code&gt;) with another one that we know is actually safe. I’m using quotes there because it will act more like a re-routing procedure - the code of the original (vulnerable) function will still be in the process’ memory, but all the calls will be forwarded to the new (patched) method.&lt;/p&gt;

&lt;p&gt;I hope it makes sense for now.&lt;/p&gt;
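&lt;p&gt;The re-routing idea can be illustrated in plain C with a function pointer - this is only a conceptual analogy, not how Pin does it (Pin overwrites the original function’s entry with a jump rather than going through a pointer):&lt;/p&gt;

```c
#include <string.h>

/* Conceptual illustration of re-routing: all calls go through a
   pointer, so "patching" means swapping the pointer to the new code.
   The old function's body stays in memory but is never reached. */
void greet_old(char *out) { strcpy(out, "hello (vulnerable)"); }
void greet_new(char *out) { strcpy(out, "hello (patched)"); }

/* current routing target - initially the vulnerable version */
void (*greet)(char *) = greet_old;

/* re-route every future call to the patched version */
void hot_patch(void) { greet = greet_new; }
```

&lt;p&gt;After &lt;code&gt;hot_patch()&lt;/code&gt; runs, every call site that goes through &lt;code&gt;greet&lt;/code&gt; reaches the new code - which is the effect we want to obtain, from outside, on a live process.&lt;/p&gt;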

&lt;h2 id=&quot;projects-structure&quot;&gt;Project’s Structure&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Intel Pin&lt;/strong&gt; works by performing actions, indicated in &lt;strong&gt;tools&lt;/strong&gt;, to targeted &lt;strong&gt;binaries&lt;/strong&gt; or &lt;strong&gt;processes&lt;/strong&gt;. As an example, you may have a tool that says &lt;em&gt;‘increase a counter each time you find a RET instruction’&lt;/em&gt; that you can attach to an executable and get the value of the counter at a certain time.&lt;/p&gt;

&lt;p&gt;It offers a directory with examples of &lt;strong&gt;tools&lt;/strong&gt;, which can be found at &lt;strong&gt;pin/source/tools/&lt;/strong&gt;. In order to avoid updating makefile dependencies, we’ll work here, so continue by creating a new directory (mine’s named &lt;strong&gt;Hotpatch&lt;/strong&gt;) - this is where the coding happens.&lt;/p&gt;

&lt;p&gt;Also, copy a &lt;strong&gt;makefile&lt;/strong&gt; to your new directory, if you don’t feel like writing one:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;nb&quot;&gt;cp&lt;/span&gt; ../SimpleExamples/makefile &lt;span class=&quot;nb&quot;&gt;.&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And use the following as your &lt;strong&gt;makefile.rules&lt;/strong&gt; file:&lt;/p&gt;

&lt;div class=&quot;language-make highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;nv&quot;&gt;TEST_TOOL_ROOTS&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:=&lt;/span&gt; hotpatch &lt;span class=&quot;c&quot;&gt;# for hotpatch.cpp&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;SANITY_SUBSET&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$(TEST_TOOL_ROOTS)&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$(TEST_ROOTS)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Finally, create a file named &lt;strong&gt;hotpatch.cpp&lt;/strong&gt; with some dummy code and run the &lt;strong&gt;make&lt;/strong&gt; command. If everything works fine, you should end up with something like this…&lt;/p&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/hot-patching-functions-with-intel-pin/directory_structure.png&quot; alt=&quot;Directory structure for the Hotpatch tool&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;Directory structure for the Hotpatch tool&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h2 id=&quot;coding-the-hot-patcher&quot;&gt;Coding the Hot Patcher&lt;/h2&gt;

&lt;p&gt;The whole idea revolves around registering a &lt;strong&gt;callback&lt;/strong&gt; which is called every time the binary loads an image (see &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IMG_AddInstrumentFunction()&lt;/code&gt;). Since the method is defined in the running program itself, we’re interested in the moment the process loads its own image. In this callback, we look for the method that we want to &lt;strong&gt;hot patch&lt;/strong&gt; (replace) - in my example, it’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;read_input()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You can list the functions that are present in a binary using:&lt;/p&gt;
&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;nm targeted_binary_name
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The process of replacing a function (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RTN_ReplaceSignatureProbed()&lt;/code&gt;) is based on &lt;strong&gt;probes&lt;/strong&gt; - as you can tell by the name - which, according to &lt;strong&gt;Intel&lt;/strong&gt;’s claims, incur less overhead and are less intrusive. Under the hood, &lt;strong&gt;Intel Pin&lt;/strong&gt; will overwrite the original function’s instructions with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JMP&lt;/code&gt; that points to the replacement function. It is up to you to call the original function, if needed.&lt;/p&gt;

&lt;p&gt;Without further ado, the code I ended up with:&lt;/p&gt;

&lt;div class=&quot;language-cpp highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;cp&quot;&gt;#include &quot;pin.H&quot;
#include &amp;lt;iostream&amp;gt;
#include &amp;lt;stdio.h&amp;gt;
&lt;/span&gt;

&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;target_routine_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;read_input&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;


&lt;span class=&quot;c1&quot;&gt;// replacement routine's code (i.e. patched read_input)&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;read_input_patched&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;original_routine_ptr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;return_address&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Tell me your name:&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    
    &lt;span class=&quot;c1&quot;&gt;// 5 stars stdin reading method&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;12&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;fgets&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stdin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;strcspn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\r\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;// discard rest of the data from stdin&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'\n'&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;EOF&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Hello, %s!&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;


&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;loaded_image_callback&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IMG&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;current_image&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// look for the routine in the loaded image&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;RTN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;current_routine&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RTN_FindByName&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;current_image&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;target_routine_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    

    &lt;span class=&quot;c1&quot;&gt;// stop if the routine was not found in this image&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RTN_Valid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;current_routine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;// skip routines which are unsafe for replacement&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RTN_IsSafeForProbedReplacement&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;current_routine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cerr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Skipping unsafe routine &quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;target_routine_name&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot; in image &quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IMG_Name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;current_image&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;endl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;// replacement routine's prototype: returns void, default calling standard, name, takes no arguments &lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;PROTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;replacement_prototype&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PROTO_Allocate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PIN_PARG&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CALLINGSTD_DEFAULT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;target_routine_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PIN_PARG_END&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;());&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;// replaces the original routine with a jump to the new one &lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;RTN_ReplaceSignatureProbed&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;current_routine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
                               &lt;span class=&quot;n&quot;&gt;AFUNPTR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;read_input_patched&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
                               &lt;span class=&quot;n&quot;&gt;IARG_PROTOTYPE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
                               &lt;span class=&quot;n&quot;&gt;replacement_prototype&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                               &lt;span class=&quot;n&quot;&gt;IARG_ORIG_FUNCPTR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                               &lt;span class=&quot;n&quot;&gt;IARG_FUNCARG_ENTRYPOINT_VALUE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                               &lt;span class=&quot;n&quot;&gt;IARG_RETURN_IP&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                               &lt;span class=&quot;n&quot;&gt;IARG_END&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;PROTO_Free&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;replacement_prototype&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cout&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Successfully replaced &quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;target_routine_name&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot; from image &quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IMG_Name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;current_image&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;endl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;


&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;argc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;argv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[])&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;PIN_InitSymbols&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PIN_Init&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;argc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;argv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cerr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Failed to initialize PIN.&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;endl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; 
        &lt;span class=&quot;n&quot;&gt;exit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;EXIT_FAILURE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;// registers a callback for the &quot;load image&quot; action&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;IMG_AddInstrumentFunction&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;loaded_image_callback&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    
    &lt;span class=&quot;c1&quot;&gt;// runs the program in probe mode&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;PIN_StartProgramProbed&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
    
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;EXIT_SUCCESS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;After running &lt;strong&gt;make&lt;/strong&gt;, use a command like the following one to attach &lt;strong&gt;Intel Pin&lt;/strong&gt; to a running instance of the targeted process.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;nb&quot;&gt;sudo&lt;/span&gt; ../../../pin &lt;span class=&quot;nt&quot;&gt;-pid&lt;/span&gt; &lt;span class=&quot;si&quot;&gt;$(&lt;/span&gt;pidof targeted_binary_name&lt;span class=&quot;si&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-t&lt;/span&gt; obj-intel64/hotpatch.so
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;results-and-conclusions&quot;&gt;Results and Conclusions&lt;/h2&gt;

&lt;p&gt;Aaand it seems to be working:&lt;/p&gt;
&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/hot-patching-functions-with-intel-pin/hot_patched_process.png&quot; alt=&quot;Testing the Hot Patched version against Buffer Overflow&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;Testing the Hot Patched version against Buffer Overflow&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;To conclude, I’m pretty sure &lt;strong&gt;Intel Pin&lt;/strong&gt; is capable of far more complex things than what I’m presenting here, which I’d call examples-level (it’s actually inspired by an example). To me, it seems rather strange that it’s not a more popular tool - and no, I’m not paid by Intel to endorse it.&lt;/p&gt;

&lt;p&gt;However, I hope this article manages to provide support and solutions/ideas to those who are looking into &lt;strong&gt;hot patching&lt;/strong&gt; methods and who, like me, had never heard of &lt;strong&gt;Intel Pin&lt;/strong&gt; before.&lt;/p&gt;

</description>
        <pubDate>Tue, 20 Aug 2019 21:45:05 +0000</pubDate>
        <link>https://codingvision.net/hot-patching-functions-with-intel-pin</link>
        <guid isPermaLink="true">https://codingvision.net/hot-patching-functions-with-intel-pin</guid>
        
        <category>intel-pin</category>
        
        <category>hot-patch</category>
        
        <category>cpp</category>
        
        <category>buffer-overflow</category>
        
        
      </item>
    
      <item>
        <title>Gradient Descent Simply Explained (with Example)</title>
        <description>&lt;p&gt;So… I’ll try to explain here the concept of &lt;strong&gt;gradient descent&lt;/strong&gt; as simply as possible in order to provide some insight into what’s happening from a mathematical perspective and why the formula works. I’ll try to keep it short and split this into 2 &lt;em&gt;chapters&lt;/em&gt;: &lt;strong&gt;theory&lt;/strong&gt; and &lt;strong&gt;example&lt;/strong&gt; - take it as an ELI5 linear regression tutorial.&lt;/p&gt;

&lt;p&gt;Feel free to skip the mathy stuff and jump directly to the &lt;strong&gt;example&lt;/strong&gt; if you feel that it might be easier to understand.&lt;/p&gt;

&lt;h2 id=&quot;theory-and-formula&quot;&gt;Theory and Formula&lt;/h2&gt;

&lt;p&gt;For the sake of simplicity, we’ll work in the &lt;strong&gt;1D&lt;/strong&gt; space: we’ll optimize a function that has only one &lt;strong&gt;coefficient&lt;/strong&gt; so it is easier to plot and comprehend.
The function can look like this:&lt;/p&gt;

\[f(x) = w \cdot x + 2\]

&lt;p&gt;where we have to determine the value of \(w\) such that the function successfully matches / approximates a set of known points.&lt;/p&gt;

&lt;p&gt;Since our interest is to find the best coefficient, we’ll consider \(w\) as a &lt;strong&gt;variable&lt;/strong&gt; in our formulas and while computing the derivatives; \(x\) will be treated as a &lt;strong&gt;constant&lt;/strong&gt;. In other words, we don’t compute the &lt;strong&gt;derivative&lt;/strong&gt; with respect to \(x\) since we don’t want to find values for it - we already have a set of inputs for the function, we’re not allowed to change them.&lt;/p&gt;

&lt;p&gt;To properly grasp the gradient descent, as an optimization method, you need to know the following mathematical fact:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The &lt;strong&gt;derivative&lt;/strong&gt; of a function is &lt;span style=&quot;color:green&quot;&gt;positive&lt;/span&gt; when the function &lt;span style=&quot;color:green&quot;&gt;increases&lt;/span&gt; and is &lt;span style=&quot;color:red&quot;&gt;negative&lt;/span&gt; when the function &lt;span style=&quot;color:red&quot;&gt;decreases&lt;/span&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And writing this mathematically…&lt;/p&gt;

\[\frac{\mathrm{d} }{\mathrm{d} w}f(w) {\color{Green}&amp;gt; 0} \rightarrow  f(w) {\color{Green}\nearrow }\]

\[\frac{\mathrm{d} }{\mathrm{d} w}f(w) {\color{Red}&amp;lt; 0} \rightarrow  f(w) {\color{Red}\searrow }\]

&lt;p&gt;This is happening because the derivative can be seen as the slope of a function’s plot at a given point. I won’t go into details here, but check out the graph below - it should help.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Why is this important?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Because, as you probably know already, &lt;strong&gt;gradient descent&lt;/strong&gt; attempts to &lt;span style=&quot;color:red&quot;&gt;minimize&lt;/span&gt; the &lt;strong&gt;error function&lt;/strong&gt; (aka cost function).&lt;/p&gt;
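&lt;p&gt;Before moving on, here’s a tiny sanity check of that idea in code (the function and step size below are my own toy choices, not part of the example that follows): repeatedly stepping &lt;em&gt;against&lt;/em&gt; the sign of the derivative moves us toward the minimum.&lt;/p&gt;

```python
# Toy check: f(w) = (w - 3)^2 has its minimum at w = 3.
# Stepping against the derivative's sign should move w toward 3.
def f(w):
    return (w - 3) ** 2

def df(w):                     # analytic derivative of f
    return 2 * (w - 3)

w = 5.0                        # right of the minimum, so df(w) is positive
for _ in range(20):
    w = w - 0.1 * df(w)        # move opposite to the slope

print(round(w, 3))             # ends up close to 3
```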

&lt;p&gt;Now, assuming we use the &lt;strong&gt;MSE&lt;/strong&gt; (Mean Squared Error) function, we have something that looks like this:&lt;/p&gt;

\[\hat{y_i} = f(x_i)\]

\[MSE = \frac{1}{n} \cdot \sum_{i=1}^{i=n}{(y_i - \hat{y_i})^2}\]

&lt;p&gt;Where: \(y_i\) is the correct value, \(\hat{y_i}\) is the current (computed) value and \(n\) is the number of points we’re using to compute the \(MSE\).&lt;/p&gt;
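&lt;p&gt;As a quick sanity check, the formula above is a one-liner in code (the helper name is my own):&lt;/p&gt;

```python
# Direct transcription of the MSE formula above (helper name is my own)
def mse(y_true, y_pred):
    n = len(y_true)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n

# e.g. targets 5 and 7 versus predictions 19 and 28:
print(mse([5, 7], [19, 28]))   # 318.5
```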

&lt;h5 id=&quot;the-mse-is-always-positive-since-its-a-sum-of-squared-values-and-therefore-has-a-known-minimum-which-is-0---so-it-can-be-minimized-using-the-aforementioned-method&quot;&gt;The &lt;strong&gt;MSE&lt;/strong&gt; is &lt;strong&gt;always positive&lt;/strong&gt; (since it’s a sum of squared values) and therefore has a &lt;strong&gt;known minimum&lt;/strong&gt;, which is &lt;strong&gt;0&lt;/strong&gt; - so it can be &lt;span style=&quot;color:red&quot;&gt;minimized&lt;/span&gt; using the aforementioned method.&lt;/h5&gt;

&lt;p&gt;Take a look at the plot below: the &lt;strong&gt;sign&lt;/strong&gt; of the &lt;strong&gt;slope&lt;/strong&gt; provides useful information about where the &lt;strong&gt;minimum&lt;/strong&gt; of the function is. We can use the value of the &lt;strong&gt;slope&lt;/strong&gt; (the derivative) to adjust the value of the coefficient &lt;strong&gt;w&lt;/strong&gt; (i.e.: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;w = w - slope&lt;/code&gt;).&lt;/p&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=&quot; data-echo=&quot;/imgs/posts/gradient-descent-simply-explained-with-example/mse-slope-plot.png&quot; alt=&quot;The sign of the slope can be used to locate the function's minimum value.&quot; /&gt;
  &lt;figcaption&gt;&lt;p&gt;The sign of the slope can be used to locate the function’s minimum value.&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;Time to compute the derivative. Before that, I must warn you: it’s quite a &lt;em&gt;long&lt;/em&gt; formula but I tried to do it step by step. Behold!&lt;/p&gt;

\[\frac{\mathrm{d}}{\mathrm{d} w}MSE = \frac{\mathrm{d}}{\mathrm{d} w} (\frac{1}{n} \cdot \sum_{i=1}^{i=n}{(y_i - \hat{y_i})^2}) =\]

\[= \frac{1}{n} \cdot \frac{\mathrm{d}}{\mathrm{d} w} (\sum_{i=1}^{i=n}{(y_i - \hat{y_i})^2}) =\]

\[= \frac{1}{n} \cdot \sum_{i=1}^{i=n}{\frac{\mathrm{d}}{\mathrm{d} w}((y_i - \hat{y_i})^2}) =\]

\[= \frac{2}{n} \cdot \sum_{i=1}^{i=n}{(y_i - \hat{y_i}) \cdot (-1) \cdot \frac{\mathrm{d} \hat{y_i}}{\mathrm{d} w}}\]

&lt;p&gt;Phew. 
From here, you’d have to replace \(\frac{\mathrm{d} \hat{y_i}}{\mathrm{d} w}\) with the derivative of the function you chose to optimize. For \(\hat{y_i} = w \cdot x_i + 2\), we get:&lt;/p&gt;

\[= \frac{2}{n} \cdot \sum_{i=1}^{i=n}{(y_i - \hat{y_i}) \cdot (-1) \cdot x_i}\]

&lt;p&gt;And that’s about it. You can now update the values of your coefficient \(w\) using the following formula:&lt;/p&gt;

\[w = w - learning\_rate \cdot \frac{\mathrm{d }}{\mathrm{d} w}MSE(w)\]
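&lt;p&gt;Putting the derivative and the update rule together for the 1D model \(f(x) = w \cdot x + 2\), one update step could be sketched like this (the data points and learning rate below are made up for illustration):&lt;/p&gt;

```python
# One gradient-descent step for f(x) = w*x + 2
# (data points and learning rate are made up for illustration)
xs = [1.0, 2.0]
ys = [4.0, 6.0]                # generated with w = 2, i.e. f(x) = 2*x + 2
w, lr, n = 0.0, 0.1, 2

# dMSE/dw = (2/n) * sum((y_i - y_hat_i) * (-1) * x_i)
grad = (2 / n) * sum((y - (w * x + 2)) * (-1) * x for x, y in zip(xs, ys))
w = w - lr * grad              # w moves from 0.0 toward the true value 2
print(w)                       # 1.0 after this first step
```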

&lt;h2 id=&quot;example&quot;&gt;Example&lt;/h2&gt;

&lt;p&gt;We’ll do the example in a &lt;strong&gt;2D&lt;/strong&gt; space, in order to represent a basic &lt;strong&gt;linear regression&lt;/strong&gt; (a &lt;strong&gt;Perceptron&lt;/strong&gt; without an activation function). 
Given the function below:&lt;/p&gt;

\[f(x) = w_1 \cdot x + w_2\]

&lt;p&gt;we have to find \(w_1\) and \(w_2\), using &lt;strong&gt;gradient descent&lt;/strong&gt;, so it approximates the following set of points:&lt;/p&gt;

\[f(1) = 5, f(2) = 7\]

&lt;p&gt;We start by writing the &lt;strong&gt;MSE&lt;/strong&gt;:&lt;/p&gt;

\[MSE = \frac{1}{n} \cdot \sum_{i=1}^{i=2}{(y_i - (w_1 \cdot x_i + w_2))^2}\]

&lt;p&gt;And then the differentiation part. Since there are &lt;strong&gt;2 coefficients&lt;/strong&gt;, we compute &lt;strong&gt;partial derivatives&lt;/strong&gt; - each one corresponds to its coefficient.&lt;/p&gt;

&lt;p&gt;For \(w_1\):&lt;/p&gt;

\[\frac{\partial}{\partial w_1} (\frac{1}{n} \cdot \sum_{i=1}^{i=2}{(y_i - (w_1 \cdot x_i + w_2))^2}) =\]

\[= \frac{1}{n} \cdot \sum_{i=1}^{i=2}{\frac{\partial}{\partial w_1}(y_i - (w_1 \cdot x_i + w_2))^2} =\]

\[= \frac{1}{n} \cdot 2 \cdot \sum_{i=1}^{i=2}{(y_i - (w_1 \cdot x_i + w_2)) \cdot (-1) \cdot x_i} =\]

\[= -\frac{2}{n} \cdot \sum_{i=1}^{i=2}{(y_i - (w_1 \cdot x_i + w_2)) \cdot x_i}\]

&lt;p&gt;For \(w_2\):&lt;/p&gt;

\[\frac{\partial}{\partial w_2} (\frac{1}{n} \cdot \sum_{i=1}^{i=2}{(y_i - (w_1 \cdot x_i + w_2))^2}) =\]

\[= -\frac{2}{n} \cdot \sum_{i=1}^{i=2}{(y_i - (w_1 \cdot x_i + w_2))}\]

&lt;p&gt;Now, we pick some &lt;strong&gt;random&lt;/strong&gt; values for our coefficients. Let’s say \(w_1 = 9\) and \(w_2 = 10\).&lt;/p&gt;

&lt;p&gt;We compute:&lt;/p&gt;

\[f(1) = 9 \cdot 1 + 10 = 19, f(2) = 9 \cdot 2 + 10 = 28\]

&lt;p&gt;Obviously, these are not the outputs we’re looking for, so we’ll continue by adjusting the coefficients (we’ll consider a &lt;strong&gt;0.15&lt;/strong&gt; learning rate):&lt;/p&gt;

\[w_1 = w_1 - learning\_rate \cdot \frac{\partial}{\partial w_1} MSE =\]

\[= 9 + 0.15 \cdot \frac{2}{2} \cdot \sum_{i=1}^{i=2}{(y_i - (w_1 \cdot x_i + w_2)) \cdot x_i} =\]

\[= 9 + 0.15 \cdot ((5 - (9 \cdot 1 + 10)) \cdot 1 + (7 - (9 \cdot 2 + 10)) \cdot 2) =\]

\[= 9 - 0.15 \cdot 56 = 0.6\]

\[w_2 = w_2 - learning\_rate \cdot \frac{\partial}{\partial w_2} MSE =\]

\[= 10 + 0.15 \cdot \frac{2}{2} \cdot \sum_{i=1}^{i=2}{(y_i - (w_1 \cdot x_i + w_2))} =\]

\[= 10 + 0.15 \cdot ((5 - (9 \cdot 1 + 10)) + (7 - (9 \cdot 2 + 10))) =\]

\[= 10 - 0.15 \cdot 35 = 4.75\]

&lt;p&gt;Recalculating the outputs of our function, we observe that they are somewhat closer to our expected values.&lt;/p&gt;

\[f(1) = 0.6 \cdot 1 + 4.75 = 5.35, f(2) = 0.6 \cdot 2 + 4.75 = 5.95\]

&lt;p&gt;Running a second step of optimization:&lt;/p&gt;

\[w_1 = 0.6 + 0.15 \cdot ((5 - (0.6 \cdot 1 + 4.75)) \cdot 1 + (7 - (0.6 \cdot 2 + 4.75)) \cdot 2) =\]

\[= 0.6 + 0.15 \cdot 1.75 = 0.86\]

\[w_2 = 4.75 + 0.15 \cdot ((5 - (0.6 \cdot 1 + 4.75)) + (7 - (0.6 \cdot 2 + 4.75))) =\]

\[= 4.75 + 0.15 \cdot 0.7 = 4.85\]

&lt;p&gt;Now, this is going to take multiple iterations in order to converge and we’re not going to do everything by hand.
Writing this formula as a Python script yields the following results:&lt;/p&gt;
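&lt;p&gt;For reference, a minimal sketch of such a script (assuming the same starting values \(w_1 = 9\), \(w_2 = 10\) and the same 0.15 learning rate) might look like this:&lt;/p&gt;

```python
# Minimal gradient descent for f(x) = w1*x + w2 on the points f(1) = 5, f(2) = 7,
# starting from w1 = 9, w2 = 10 with a 0.15 learning rate (as in the example above)
xs = [1.0, 2.0]
ys = [5.0, 7.0]
w1, w2 = 9.0, 10.0
lr = 0.15
n = len(xs)

for step in range(1, 201):
    preds = [w1 * x + w2 for x in xs]
    mse = sum((y - p) ** 2 for y, p in zip(ys, preds)) / n
    print(f"{step}: w1 = {w1:.3f}, w2 = {w2:.3f}, MSE: {mse}")
    print(f"   f(1) = {preds[0]:.3f}, f(2) = {preds[1]:.3f}")
    # partial derivatives of the MSE, as derived above
    g1 = -(2 / n) * sum((y - p) * x for x, y, p in zip(xs, ys, preds))
    g2 = -(2 / n) * sum(y - p for y, p in zip(ys, preds))
    w1 = w1 - lr * g1
    w2 = w2 - lr * g2
```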

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;rouge-gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;rouge-code&quot;&gt;&lt;pre&gt;1: w1 = 9.000, w2 = 10.000, MSE: 318.5 
   f(1) = 19.000, f(2) = 28.000
------------------------------------------------
2: w1 = 0.600, w2 = 4.750, MSE: 0.6125 
   f(1) = 5.350, f(2) = 5.950
------------------------------------------------
3: w1 = 0.862, w2 = 4.855, MSE: 0.345603125 
   f(1) = 5.718, f(2) = 6.580
------------------------------------------------
4: w1 = 0.881, w2 = 4.810, MSE: 0.330451789063 
   f(1) = 5.691, f(2) = 6.572
------------------------------------------------
5: w1 = 0.906, w2 = 4.771, MSE: 0.316146225664 
   f(1) = 5.676, f(2) = 6.582
------------------------------------------------
6: w1 = 0.929, w2 = 4.732, MSE: 0.302460106908 
   f(1) = 5.662, f(2) = 6.591
------------------------------------------------
7: w1 = 0.953, w2 = 4.694, MSE: 0.289366466781 
   f(1) = 5.647, f(2) = 6.600
------------------------------------------------
8: w1 = 0.976, w2 = 4.657, MSE: 0.276839656487 
   f(1) = 5.633, f(2) = 6.609
------------------------------------------------
9: w1 = 0.998, w2 = 4.621, MSE: 0.264855137696 
   f(1) = 5.619, f(2) = 6.617


[...]


------------------------------------------------
195: w1 = 1.984, w2 = 3.026, MSE: 7.04866766459e-05 
     f(1) = 5.010, f(2) = 6.994
------------------------------------------------
196: w1 = 1.984, w2 = 3.026, MSE: 6.74352752985e-05 
     f(1) = 5.010, f(2) = 6.994
------------------------------------------------
197: w1 = 1.984, w2 = 3.025, MSE: 6.45159705491e-05 
     f(1) = 5.010, f(2) = 6.994
------------------------------------------------
198: w1 = 1.985, w2 = 3.025, MSE: 6.17230438739e-05 
     f(1) = 5.009, f(2) = 6.994
------------------------------------------------
199: w1 = 1.985, w2 = 3.024, MSE: 5.90510243065e-05 
     f(1) = 5.009, f(2) = 6.994
------------------------------------------------
200: w1 = 1.985, w2 = 3.024, MSE: 5.64946777215e-05 
     f(1) = 5.009, f(2) = 6.994
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It converges to \(w_1 = 2\) and \(w_2 = 3\) which are, indeed, the coefficients we were looking for.&lt;/p&gt;

&lt;p&gt;In practice, I recommend experimenting with &lt;strong&gt;smaller&lt;/strong&gt; learning rates and more iterations - large learning rates can lead to &lt;strong&gt;divergence&lt;/strong&gt; (the coefficients stray from their correct values and tend to plus or minus infinity).&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;I guess this is all. Reading it now, I think it might take more than 5 minutes but… it’s still a short article compared to others that discuss the same subject :))&lt;/p&gt;

&lt;p&gt;I hope this proves useful as a starting point and you’ve got something out of it. &lt;strong&gt;Backpropagation&lt;/strong&gt; of errors in &lt;strong&gt;neural networks&lt;/strong&gt; works in a similar fashion, although the number of dimensions is way larger than what was presented here. Aaand it contains some additional features in order to handle &lt;strong&gt;non-convex&lt;/strong&gt; functions (and avoid getting stuck in &lt;strong&gt;local minima&lt;/strong&gt;). Maybe in another article we’ll take a look at those, too.&lt;/p&gt;
</description>
        <pubDate>Mon, 12 Aug 2019 21:45:05 +0000</pubDate>
        <link>https://codingvision.net/gradient-descent-simply-explained-with-example</link>
        <guid isPermaLink="true">https://codingvision.net/gradient-descent-simply-explained-with-example</guid>
        
        <category>algorithm</category>
        
        <category>optimization</category>
        
        <category>gradient-descent</category>
        
        
      </item>
    
  </channel>
</rss>
