<?xml version="1.0" encoding="utf-8"?>
<!-- generator="Joomla! - Open Source Content Management" -->
<?xml-stylesheet href="/plugins/system/jce/css/content.css?aa754b1f19c7df490be4b958cf085e7c" type="text/css"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>bnosac :: open analytical helpers - bnosac :: open analytical helpers</title>
		<description><![CDATA[]]></description>
		<link>http://www.bnosac.be/index.php/blog</link>
		<lastBuildDate>Tue, 09 Jun 2026 13:21:31 +0000</lastBuildDate>
		<generator>Joomla! - Open Source Content Management</generator>
		<atom:link rel="self" type="application/rss+xml" href="http://www.bnosac.be/index.php/blog?format=feed&amp;type=rss"/>
		<language>en-gb</language>
		<managingEditor>info@bnosac.be (bnosac :: open analytical helpers)</managingEditor>
		<item>
			<title>audio transcription with whisper from R</title>
			<link>http://www.bnosac.be/index.php/blog/105-audio-transcription-with-whisper-from-r</link>
			<guid isPermaLink="true">http://www.bnosac.be/index.php/blog/105-audio-transcription-with-whisper-from-r</guid>
			<description><![CDATA[<p>Last week, OpenAI released version 2 of a neural net called <a href="https://github.com/openai/whisper">Whisper</a>, which approaches human-level robustness and accuracy on speech recognition. From R you can now directly call a C/C++ inference engine which allows you to transcribe .wav audio files.</p>
<p><img src="http://www.bnosac.be/images/bnosac/blog/logo-audio-whisper-x100.png" alt="logo audio whisper x100" width="114" height="100" style="margin: 15px; float: left;" /></p>
<p>To make this easy to do in R, <a href="https://www.bnosac.be">BNOSAC</a> created an R wrapper around the <a href="https://github.com/ggerganov/whisper.cpp">whisper.cpp</a> code. This R package is available at&nbsp;<a href="https://github.com/bnosac/audio.whisper">https://github.com/bnosac/audio.whisper</a>&nbsp;and can be installed as follows.&nbsp;</p>
<pre>remotes::install_github("bnosac/audio.whisper")</pre>
<p>The following code shows how you can transcribe an example 16-bit wav file with a <a href="https://github.com/bnosac/audio.whisper/blob/master/inst/samples/jfk.wav">fragment of a speech by JFK available here</a>.&nbsp;</p>
<p>
<video controls="controls" width="100%" height="80"><source src="https://user-images.githubusercontent.com/1991296/199337465-dbee4b5e-9aeb-48a3-b1c6-323ac4db5b2c.mp4" type="video/mp4" /> Your browser does not support the video tag. </video>
</p>
<pre>library(audio.whisper)
model &lt;- whisper("tiny")
path  &lt;- system.file(package = "audio.whisper", "samples", "jfk.wav")
trans &lt;- predict(model, newdata = path, language = "en", n_threads = 2)
trans
$n_segments
[1] 1

$data
 segment         from           to                                                                                                       text
       1 00:00:00.000 00:00:11.000  And so my fellow Americans ask not what your country can do for you ask what you can do for your country.

$tokens
 segment      token token_prob
       1        And  0.7476438
       1         so  0.9042299
       1         my  0.6872202
       1     fellow  0.9984470
       1  Americans  0.9589157
       1        ask  0.2573057
       1        not  0.7678108
       1       what  0.6542882
       1       your  0.9386917
       1    country  0.9854987
       1        can  0.9813995
       1         do  0.9937403
       1        for  0.9791515
       1        you  0.9925495
       1        ask  0.3058807
       1       what  0.8303462
       1        you  0.9735528
       1        can  0.9711444
       1         do  0.9616748
       1        for  0.9778513
       1       your  0.9604713
       1    country  0.9923630
       1          .  0.4983074</pre>
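<p>The token probabilities can help to spot words the model was unsure about. The following is a minimal sketch in base R, assuming a data.frame shaped like the tokens element above (rebuilt here by hand from the first rows of that output so that the snippet runs on its own). The 0.5 cut-off and the variable names are arbitrary choices for illustration, not part of the audio.whisper package.</p>
<pre># Token-level output in the shape of trans$tokens (values copied from the output above)
tokens &lt;- data.frame(
  segment    = 1L,
  token      = c("And", "so", "my", "fellow", "Americans", "ask", "not"),
  token_prob = c(0.7476438, 0.9042299, 0.6872202, 0.9984470, 0.9589157, 0.2573057, 0.7678108),
  stringsAsFactors = FALSE
)
# Hypothetical cut-off: tokens below it are flagged for manual review
threshold &lt;- 0.5
uncertain &lt;- tokens[tokens$token_prob &lt; threshold, ]
uncertain
# Average token probability per segment as a rough confidence score
aggregate(token_prob ~ segment, data = tokens, FUN = mean)</pre>
<p>On a real transcription you would replace the hand-built data.frame by trans$tokens.</p>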
<p>Another example based on a Micro Machines commercial from the 1980s.</p>
<p>
<video controls="controls" width="100%" height="80"><source src="https://user-images.githubusercontent.com/1991296/199337504-cc8fd233-0cb7-4920-95f9-4227de3570aa.mp4" type="video/mp4" /> Your browser does not support the video tag. </video>
</p>
<p>I've always wanted to get the transcription of the performances of Francis E. Dec available on&nbsp;<a href="https://www.ubu.com/sound/dec.html">UbuWeb Sound - Francis E. Dec</a> like this performance: <a href="https://www.ubu.com/media/sound/dec_francis/Dec-Francis-E_rant1.mp3">https://www.ubu.com/media/sound/dec_francis/Dec-Francis-E_rant1.mp3</a>. This is how you can now do that from R.</p>
<pre>library(av)
download.file(url = "https://www.ubu.com/media/sound/dec_francis/Dec-Francis-E_rant1.mp3", <br />              destfile = "rant1.mp3", mode = "wb")
av_audio_convert("rant1.mp3", output = "output.wav", format = "wav", sample_rate = 16000)<br /><br />trans &lt;- predict(model, newdata = "output.wav", language = "en", 
                 duration = 30 * 1000, offset = 7 * 1000, 
                 token_timestamps = TRUE)
trans
$n_segments
[1] 11

$data
segment         from           to                                                             text
      1 00:00:07.000 00:00:09.000                                             Look at the picture.
      2 00:00:09.000 00:00:11.000                                                   See the skull.
      3 00:00:11.000 00:00:13.000                                        The part of bone removed.
      4 00:00:13.000 00:00:16.000                     The master race Frankenstein radio controls.
      5 00:00:16.000 00:00:18.000                           The brain thoughts broadcasting radio.
      6 00:00:18.000 00:00:21.000        The eyesight television. The Frankenstein earphone radio.
      7 00:00:21.000 00:00:25.000  The threshold brain wash radio. The latest new skull reforming.
      8 00:00:25.000 00:00:28.000                            To contain all Frankenstein controls.
      9 00:00:28.000 00:00:31.000                     Even in thin skulls of white pedigree males.
     10 00:00:31.000 00:00:34.000                                   Visible Frankenstein controls.
     11 00:00:34.000 00:00:37.000            The synthetic nerve radio, directional and an alloop.

$tokens
segment         token token_prob   token_from     token_to
      1          Look  0.4281234 00:00:07.290 00:00:07.420
      1            at  0.9485379 00:00:07.420 00:00:07.620
      1           the  0.9758387 00:00:07.620 00:00:07.940
      1       picture  0.9734664 00:00:08.150 00:00:08.580
      1             .  0.9688568 00:00:08.680 00:00:08.910
      2           See  0.9847929 00:00:09.000 00:00:09.420
      2           the  0.7588121 00:00:09.420 00:00:09.840
      2         skull  0.9989663 00:00:09.840 00:00:10.310
      2             .  0.9548351 00:00:10.550 00:00:11.000
      3           The  0.9914295 00:00:11.000 00:00:11.170
      3          part  0.9789217 00:00:11.560 00:00:11.600
      3            of  0.9958754 00:00:11.600 00:00:11.770
      3          bone  0.9759618 00:00:11.770 00:00:12.030
      3       removed  0.9956936 00:00:12.190 00:00:12.710
      3             .  0.9965582 00:00:12.710 00:00:12.940<br />...</pre>
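<p>Judging from the output above, offset and duration are passed in milliseconds (an offset of 7 * 1000 makes the first segment start at 00:00:07), while the from and to columns are character timestamps. If you need segment lengths in seconds, a small base R helper along these lines could do the conversion. Note that to_seconds is a hypothetical helper written for this example, not a function of the package, and the data.frame is rebuilt by hand from the first 3 segments shown above.</p>
<pre># Convert "HH:MM:SS.mmm" timestamps (the format shown in the output above) to seconds
to_seconds &lt;- function(x) {
  parts &lt;- strsplit(x, ":", fixed = TRUE)
  sapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)))
}
segments &lt;- data.frame(
  segment = 1:3,
  from    = c("00:00:07.000", "00:00:09.000", "00:00:11.000"),
  to      = c("00:00:09.000", "00:00:11.000", "00:00:13.000"),
  stringsAsFactors = FALSE
)
# Length of each segment in seconds
segments$duration_sec &lt;- to_seconds(segments$to) - to_seconds(segments$from)
segments</pre>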
<p>We may put it on CRAN in the near future; for now it is only available at <a href="https://github.com/bnosac/audio.whisper">https://github.com/bnosac/audio.whisper</a>.</p>
<p><a href="http://www.bnosac.be/index.php/contact/get-in-touch?id=0">Get in touch</a> if you are interested in this and&nbsp;let us know what you plan to use it for.&nbsp;</p>]]></description>
			<author>info@bnosac.be (Super User)</author>
			<category>Blog</category>
			<pubDate>Thu, 15 Dec 2022 22:38:06 +0000</pubDate>
		</item>
		<item>
			<title>Image Annotation</title>
			<link>http://www.bnosac.be/index.php/blog/104-image-annotation</link>
			<guid isPermaLink="true">http://www.bnosac.be/index.php/blog/104-image-annotation</guid>
			<description><![CDATA[<p>This week, I uploaded a newer version of the <a href="https://cran.r-project.org/web/packages/recogito/index.html">R package recogito</a> to CRAN.</p>
<p>The <a href="https://cran.r-project.org/web/packages/recogito/index.html">recogito R package</a> provides tools to manipulate and annotate images and text in shiny. It is an htmlwidgets R wrapper around the excellent <a href="https://github.com/recogito/recogito-js">recogito-js</a> and <a href="https://github.com/recogito/annotorious">annotorious</a> JavaScript libraries, as well as their integration with <a href="https://openseadragon.github.io/">openseadragon</a>.<br />You can use the package to set up shiny apps which</p>
<ul>
<li><strong>annotate areas of interest (rectangles / polygons) in images</strong> with specific labels</li>
<li><strong>annotate text using tags and relations</strong> between these tags (for entity labelling / entity linking).</li>
</ul>
<p>The video below shows the image manipulation functionality in action in a shiny app which allows you to align image areas with chunks of transcribed handwritten texts.</p>
<p>
<video controls="controls" width="100%" height="600"><source src="https://user-images.githubusercontent.com/1710810/181257966-2e08027c-b039-413f-a310-75bee2d6088c.mp4" type="video/mp4" /> Your browser does not support the video tag. </video>
</p>
<p>Although the package was originally designed to extract information from handwritten text documents from the 18th-19th century, you can probably use it in other domains as well.<br />To get started, install the package from CRAN and read the <a href="https://cran.r-project.org/web/packages/recogito/readme/README.html">README</a>.</p>
<pre>install.packages("recogito")</pre>
<p>The following code shows an example app which displays an image from a URL and allows you to annotate areas of interest. Enjoy.</p>
<pre>library(shiny)<br />library(recogito)<br />url &lt;- "<a href="https://upload.wikimedia.org/wikipedia/commons/a/a0/Pamphlet_dutch_tulipomania_1637.jpg">https://upload.wikimedia.org/wikipedia/commons/a/a0/Pamphlet_dutch_tulipomania_1637.jpg</a>"<br />ui &lt;- fluidPage(openseadragonOutput(outputId = "anno", height = "700px"),<br />                tags$h3("Results"),<br />                verbatimTextOutput(outputId = "annotation_result"))<br />server &lt;- function(input, output) {<br /> current_img &lt;- reactiveValues(url = url)<br /> output$anno &lt;- renderOpenSeaDragon({<br />   annotorious(inputId = "results", src = current_img$url, tags = c("IMAGE", "TEXT"), type = "openseadragon")<br /> })<br /> output$annotation_result &lt;- renderPrint({<br />   read_annotorious(input$results)<br /> })<br />}<br />shinyApp(ui, server)<br /><br /></pre>
<p><img src="http://www.bnosac.be/images/bnosac/blog/recogito-example.png" alt="recogito example" width="1157" height="953" style="display: block; margin-left: auto; margin-right: auto;" /></p>]]></description>
			<author>info@bnosac.be (Super User)</author>
			<category>Blog</category>
			<pubDate>Mon, 08 Aug 2022 11:24:17 +0000</pubDate>
		</item>
		<item>
			<title>doc2vec in R</title>
			<link>http://www.bnosac.be/index.php/blog/103-doc2vec-in-r</link>
			<guid isPermaLink="true">http://www.bnosac.be/index.php/blog/103-doc2vec-in-r</guid>
			<description><![CDATA[<p>Learn how to apply doc2vec in R on your text in this pdf presentation available at <a href="http://www.bnosac.be/index.php/blog/103-doc2vec-in-R">https://www.bnosac.be/index.php/blog/103-doc2vec-in-R</a>. We focus on our R package doc2vec, available at&nbsp;<a href="https://github.com/bnosac/doc2vec">https://github.com/bnosac/doc2vec</a>.</p>
<p><img src="http://www.bnosac.be/images/bnosac/blog/doc2vec-github.png" alt="doc2vec github" style="margin-top: 10px; margin-right: auto; margin-bottom: 10px; display: block;" />You can view the presentation below.</p>
<p style="text-align: justify;"><span style="color: #808000;"><strong>NEW, since 2020, you can now access courses Text Mining with R and Advanced R programming online through our online school,&nbsp;</strong><a href="http://www.bnosac.be/index.php/contact/get-in-touch" style="color: #808000;">let us know here</a><strong>&nbsp;if you want to obtain access.</strong></span></p>
<p>{aridoc engine="pdfjs" width="100%" height="550"}images/bnosac/blog/R_doc2vec.pdf{/aridoc}</p>
<ul>
<li><strong>In case you are reading this at a blog aggregator, head to&nbsp;<a href="http://www.bnosac.be/index.php/blog/103-doc2vec-in-R">https://www.bnosac.be/index.php/blog/103-doc2vec-in-R</a>&nbsp;directly to see the presentation.&nbsp;</strong></li>
<li>If you are reading this, you might as well be interested in applying word2vec in R: head to&nbsp;<a href="http://www.bnosac.be/index.php/blog/100-word2vec-in-r">http://www.bnosac.be/index.php/blog/100-word2vec-in-r</a>&nbsp;for details.</li>
</ul>
<p>Enjoy.</p>]]></description>
			<author>info@bnosac.be (Super User)</author>
			<category>Blog</category>
			<pubDate>Tue, 15 Dec 2020 11:51:39 +0000</pubDate>
		</item>
		<item>
			<title>udpipe R package updated</title>
			<link>http://www.bnosac.be/index.php/blog/102-udpipe-r-package-updated</link>
			<guid isPermaLink="true">http://www.bnosac.be/index.php/blog/102-udpipe-r-package-updated</guid>
			<description><![CDATA[<p>An update of the udpipe R package (<a href="https://bnosac.github.io/udpipe/en">https://bnosac.github.io/udpipe/en</a>) landed safely on CRAN last week. Originally the udpipe R package was put on CRAN in 2017 wrapping the UDPipe (v1.2 C++) tokeniser/lemmatiser/parts of speech tagger and dependency parser. It now has many more functionalities next to just providing this parser.</p>
<p>The current release (0.8.4-1 on CRAN: <a href="https://cran.r-project.org/package=udpipe">https://cran.r-project.org/package=udpipe</a>) makes sure default models which are used are the ones trained on version 2.5 of universal dependencies. Other features of the release are detailed in the <a href="https://github.com/bnosac/udpipe/blob/master/NEWS.md">NEWS item</a>. This is what dependency parsing looks like on some sample text.</p>
<pre>library(udpipe)<br />x &lt;- udpipe("The package provides a dependency parsers built on data from universaldependencies.org", "english")<br />View(x)<br />library(ggraph)<br />library(ggplot2)<br />library(igraph)<br />library(textplot)<br />plt &lt;- textplot_dependencyparser(x, size = 4, title = "udpipe R package - dependency parsing")<br />plt</pre>
<p><img src="http://www.bnosac.be/images/bnosac/blog/udpipe-parser-plot.png" alt="udpipe parser plot" width="700" height="357" style="display: block; margin: 10px auto;" /></p>
<p style="text-align: justify;">Over the years, the toolkit has also incorporated many functionalities for commonly used data manipulations on texts which are enriched with the output of the parser, namely functionalities and algorithms for collocations, token co-occurrence, document term matrix handling, term frequency inverse document frequency calculations, information retrieval metrics, handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns), sentiment scoring and semantic similarity analysis.</p>
<h3>Many add-on R packages</h3>
<p>The udpipe package is loosely coupled with other NLP R packages which <a href="http://www.bnosac.be">BNOSAC</a> released in the last 4 years on CRAN. Loosely coupled means that none of the packages have hard dependencies on one another, making them easy to install and maintain and allowing you to use only the packages and tools that you want.</p>
<blockquote>
<p>Here is a short list of loosely coupled packages by <a href="http://www.bnosac.be">BNOSAC</a> which contain functions and documentation where the udpipe package is used as a preprocessing step.</p>
</blockquote>
<p>- <a href="https://CRAN.R-project.org/package=BTM">BTM</a>: Biterm Topic Modelling <br />- <a href="https://CRAN.R-project.org/package=crfsuite">crfsuite</a>: Build named entity recognition models using conditional random fields<br />- <a href="https://CRAN.R-project.org/package=nametagger">nametagger</a>: Build named entity recognition models using Markov models<br />- torch.ner: Named Entity Recognition using torch<br />- <a href="https://CRAN.R-project.org/package=word2vec">word2vec</a>: Training and applying the word2vec algorithm<br />- <a href="https://CRAN.R-project.org/package=ruimtehol">ruimtehol</a>: Text embedding techniques using Starspace<br />- <a href="https://CRAN.R-project.org/package=textrank">textrank</a>: Text summarisation and keyword detection using textrank<br />- brown: Brown word clustering on texts<br />- <a href="https://CRAN.R-project.org/package=sentencepiece">sentencepiece</a>: Byte Pair Encoding and Unigram tokenisation using sentencepiece<br />- <a href="https://CRAN.R-project.org/package=tokenizers.bpe">tokenizers.bpe</a>: Byte Pair Encoding tokenisation using YouTokenToMe<br />- <a href="https://CRAN.R-project.org/package=text.alignment">text.alignment</a>: Find text similarities using Smith-Waterman <br />- <a href="https://CRAN.R-project.org/package=textplot">textplot</a>: Visualise complex relations in texts</p>
<p><img src="http://www.bnosac.be/images/bnosac/blog/textplot-example.png" alt="textplot example" width="500" height="261" style="display: block; margin-left: auto; margin-right: auto;" /></p>
<h3>Model building example</h3>
<p>To showcase the loose integration, let's use the udpipe package alongside the word2vec package to build a udpipe model by ourselves on the German GSD treebank which is described at <a href="https://universaldependencies.org/treebanks/de_gsd/index.html">https://universaldependencies.org/treebanks/de_gsd/index.html</a> and contains a set of CC BY-SA licensed annotated texts from news articles, wiki entries and reviews.<br />More information at <a href="https://universaldependencies.org">https://universaldependencies.org</a>.</p>
<h4>Download the treebank.</h4>
<pre>library(utils)<br />settings &lt;- list()<br />settings$ud.train&nbsp;&nbsp;&nbsp; &lt;- "<a href="https://raw.githubusercontent.com/UniversalDependencies/UD_German-GSD/r2.6/de_gsd-ud-train.conllu">https://raw.githubusercontent.com/UniversalDependencies/UD_German-GSD/r2.6/de_gsd-ud-train.conllu</a>"<br />settings$ud.dev&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;- "<a href="https://raw.githubusercontent.com/UniversalDependencies/UD_German-GSD/r2.6/de_gsd-ud-dev.conllu">https://raw.githubusercontent.com/UniversalDependencies/UD_German-GSD/r2.6/de_gsd-ud-dev.conllu</a>"<br />settings$ud.test&nbsp;&nbsp;&nbsp;&nbsp; &lt;- "<a href="https://raw.githubusercontent.com/UniversalDependencies/UD_German-GSD/r2.6/de_gsd-ud-test.conllu">https://raw.githubusercontent.com/UniversalDependencies/UD_German-GSD/r2.6/de_gsd-ud-test.conllu</a>"<br />## Download the conllu files<br />download.file(url = settings$ud.train, destfile = "train.conllu")<br />download.file(url = settings$ud.dev,&nbsp;&nbsp; destfile = "dev.conllu")<br />download.file(url = settings$ud.test,&nbsp; destfile = "test.conllu")</pre>
<h4>Build a word2vec model using our R package <a href="https://cran.r-project.org/package=word2vec">word2vec</a></h4>
<ul>
<li>Create word vectors on the downloaded training dataset, as these are used for training the dependency parser</li>
<li>Save the word vectors to disk</li>
<li>Briefly inspect the word2vec model by showing similarities to some German words</li>
</ul>
<pre>library(udpipe)<br />library(word2vec)<br />txt &lt;- udpipe_read_conllu("train.conllu")<br />txt &lt;- paste.data.frame(txt, term = "token", group = c("doc_id", "paragraph_id", "sentence_id"), collapse = " ")<br />txt &lt;- txt$token<br />w2v &lt;- word2vec(txt, type = "skip-gram", dim = 50, window = 10, min_count = 2, negative = 5, iter = 15, threads = 1)<br />write.word2vec(w2v, file = "wordvectors.vec", type = "txt", encoding = "UTF-8")<br />predict(w2v, c("gut", "freundlich"), type = "nearest", top = 20)</pre>
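<p>Nearest-neighbour lookups on word vectors are commonly based on cosine similarity. Whether predict() ranks words in exactly this way is not stated here, so take the following as a sketch of the general idea in base R only. The embedding values, the word "schlecht" and the function cosine_to_all are made up for this illustration.</p>
<pre># Toy embedding matrix: one row per word (values are made up for illustration)
emb &lt;- rbind(gut        = c(0.9, 0.1, 0.3),
             schlecht   = c(-0.8, 0.2, 0.1),
             freundlich = c(0.7, 0.3, 0.4))
# Cosine similarity between one vector and every row of a matrix
cosine_to_all &lt;- function(v, m) {
  as.vector(m %*% v) / (sqrt(rowSums(m^2)) * sqrt(sum(v^2)))
}
sims &lt;- cosine_to_all(emb["gut", ], emb)
names(sims) &lt;- rownames(emb)
# Most similar words first; a word is always most similar to itself
sort(sims, decreasing = TRUE)</pre>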
<h4>And train the model</h4>
<ul>
<li>Using the hyperparameters for the tokeniser, parts of speech tagger &amp; lemmatizer and the dependency parser as shown here: <a href="https://github.com/bnosac/udpipe/tree/master/inst/models-ud-2.5">https://github.com/bnosac/udpipe/tree/master/inst/models-ud-2.5</a></li>
<li>Note that model training <span style="text-decoration: underline;">takes a while (8 hours up to 3 days)</span> depending on the size of the treebank and your hyperparameter settings. This example was run on a Windows laptop with a 1.7 GHz i5 CPU, so no GPU is needed, which keeps this model building process accessible for anyone with a simple PC.</li>
</ul>
<pre>print(Sys.time())<br />m &lt;- udpipe_train(file = "de_gsd-ud-2.6-20200924.udpipe", <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; files_conllu_training = "train.conllu", <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; files_conllu_holdout&nbsp; = "dev.conllu",<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; annotation_tokenizer = list(dimension = 64, epochs = 100, segment_size=200, initialization_range = 0.1, <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; batch_size = 50, learning_rate = 0.002, learning_rate_final=0, dropout = 0.1, <br />                                              early_stopping = 1),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; annotation_tagger = list(models = 2, <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; templates_1 = "lemmatizer", <br />                                           guesser_suffix_rules_1 = 8, guesser_enrich_dictionary_1 = 4, <br />                                           guesser_prefixes_max_1 = 4, <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; use_lemma_1 = 1,provide_lemma_1 = 1, use_xpostag_1 = 0, provide_xpostag_1 = 0, <br 
/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; use_feats_1 = 0, provide_feats_1 = 0, prune_features_1 = 1, <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; templates_2 = "tagger", <br />                                           guesser_suffix_rules_2 = 8, guesser_enrich_dictionary_2 = 4, <br />                                           guesser_prefixes_max_2 = 0, <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; use_lemma_2 = 1, provide_lemma_2 = 0, use_xpostag_2 = 1, provide_xpostag_2 = 1, <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; use_feats_2 = 1, provide_feats_2 = 1, prune_features_2 = 1),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; annotation_parser = list(iterations = 30, <br />                                           embedding_upostag = 20, embedding_feats = 20, embedding_xpostag = 0, <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; embedding_form = 50, 
embedding_form_file = "wordvectors.vec", <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; embedding_lemma = 0, embedding_deprel = 20, learning_rate = 0.01, <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; learning_rate_final = 0.001, l2 = 0.5, hidden_layer = 200, <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; batch_size = 10, transition_system = "projective", transition_oracle = "dynamic", <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; structured_interval = 8))<br />print(Sys.time())</pre>
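<p>The two print(Sys.time()) calls only show the start and end time. If you prefer to get the elapsed time directly, something along these lines should work. Sys.sleep() is a stand-in for the udpipe_train() call, and the variable names are made up for this sketch.</p>
<pre># Record the start time, run the long job, then report the elapsed time
started  &lt;- Sys.time()
Sys.sleep(0.1)   # placeholder for the udpipe_train() call shown above
finished &lt;- Sys.time()
# difftime lets you pick the unit, which is handy for runs of hours or days
elapsed_hours &lt;- as.numeric(difftime(finished, started, units = "hours"))
cat("Training took", round(elapsed_hours, 4), "hours\n")</pre>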
<p>You can see the logs of this run <a href="https://cran.r-project.org/web/packages/udpipe/vignettes/udpipe-train.html#Example_on_UD_26_on_German_GSD">here</a>. Now that your model is ready, you can use it on your own terms and start annotating your text.</p>
<pre>model &lt;- udpipe_load_model("de_gsd-ud-2.6-20200924.udpipe")<br />texts &lt;- data.frame(doc_id = c("doc1", "doc2"), text = c("Die Wissenschaft ist das beste, was wir haben.", "Von dort war Kraftstoff in das Erdreich gesickert."), stringsAsFactors = FALSE)<br />anno  &lt;- udpipe(texts, model, trace = 10)<br />View(anno)</pre>
<p><img src="http://www.bnosac.be/images/bnosac/blog/udpipe-parser-table.png" alt="udpipe parser table" style="display: block; margin-left: auto; margin-right: auto;" /></p>
<p>Enjoy!</p>
<blockquote>
<p>Thanks to&nbsp;Slav Petrov, Wolfgang Seeker, Ryan McDonald, Joakim Nivre, Daniel Zeman, Adriane Boyd for creating and distributing the&nbsp;<a href="https://github.com/UniversalDependencies/UD_German-GSD">UD_German-GSD treebank</a> and to the UDPipe authors in particular Milan Straka.</p>
</blockquote>]]></description>
			<author>info@bnosac.be (Super User)</author>
			<category>Blog</category>
			<pubDate>Mon, 12 Oct 2020 19:18:49 +0000</pubDate>
		</item>
		<item>
			<title>finding contour lines</title>
			<link>http://www.bnosac.be/index.php/blog/101-finding-contour-lines</link>
			<guid isPermaLink="true">http://www.bnosac.be/index.php/blog/101-finding-contour-lines</guid>
			<description><![CDATA[<p>Finally, the R package you all have been waiting for has arrived - <a href="https://cran.r-project.org/package=image.ContourDetector">image.ContourDetector</a> developed at <a href="https://github.com/bnosac/image">https://github.com/bnosac/image</a>. It detects contour lines in images using the 'Unsupervised Smooth Contour Detection' algorithm available at <a href="http://www.ipol.im/pub/art/2016/175/">http://www.ipol.im/pub/art/2016/175</a>.</p>
<p>Have you always wanted to be able to draw like you are in art school? Let me show how to quickly do this.</p>
<p><img src="http://www.bnosac.be/images/bnosac/blog/example-contourlines.png" alt="example contourlines" style="margin: 10px auto; display: block;" /></p>
<p>If you want to reproduce this, the following snippets show how. The steps are as follows.</p>
<h3>1. Install the packages from CRAN</h3>
<pre>install.packages("image.ContourDetector")<br />install.packages("magick")<br />install.packages("sp")</pre>
<h3>2. Get an image, put it into grey scale, pass the pixels to the function and off you go.</h3>
<pre>library(magick)<br />library(image.ContourDetector)<br />library(sp)<br />img &lt;- image_read("<a href="https://cdn.mos.cms.futurecdn.net/9sUwFGNJvviJks7jNQ7AWc-1200-80.jpg">https://cdn.mos.cms.futurecdn.net/9sUwFGNJvviJks7jNQ7AWc-1200-80.jpg</a>")<br />mat &lt;- image_data(img, channels = "gray")<br />mat &lt;- as.integer(mat, transpose = TRUE)<br />mat &lt;- drop(mat)<br />contourlines &lt;- image_contour_detector(mat)<br />plt &lt;- plot(contourlines)<br />class(plt)<br /><br /><img src="http://www.bnosac.be/images/bnosac/blog/example-contourlines-linesonly.png" alt="example contourlines linesonly" width="700" height="403" style="margin: 10px auto; display: block;" /></pre>
<h3>3.&nbsp;If you want to have the same image as shown at the top of the article:</h3>
<p>Put the 3 images (original, combined, contour lines only) together in 1 plot using the excellent <a href="https://cran.r-project.org/web/packages/magick/vignettes/intro.html">magick R package</a>:</p>
<pre>plt &lt;- image_graph(width = image_info(img)$width, height = image_info(img)$height)<br />plot(contourlines)<br />dev.off()</pre>
<pre>plt_combined &lt;- image_graph(width = image_info(img)$width, height = image_info(img)$height)<br />plot(img)<br />plot(contourlines, add = TRUE, col = "red", lwd = 5)<br />dev.off()</pre>
<pre>combi &lt;- image_append(c(img, plt_combined, plt))<br />combi<br />image_write(combi, "example-contourlines.png", format = "png")</pre>]]></description>
			<author>info@bnosac.be (Super User)</author>
			<category>Blog</category>
			<pubDate>Mon, 03 Aug 2020 19:01:35 +0000</pubDate>
		</item>
		<item>
			<title>word2vec in R</title>
			<link>http://www.bnosac.be/index.php/blog/100-word2vec-in-r</link>
			<guid isPermaLink="true">http://www.bnosac.be/index.php/blog/100-word2vec-in-r</guid>
			<description><![CDATA[<p>Learn how to apply word2vec in R on your text in this pdf presentation available at&nbsp;<a href="http://www.bnosac.be/index.php/blog/100-word2vec-in-R">https://www.bnosac.be/index.php/blog/100-word2vec-in-R</a>. We focus on our R package word2vec, available at <a href="https://github.com/bnosac/word2vec">https://github.com/bnosac/word2vec</a>.</p>
<p><img src="http://www.bnosac.be/images/bnosac/blog/word2vec-github.png" alt="word2vec github" style="margin: 10px auto; display: block;" />You can view the presentation below.</p>
<p style="text-align: justify;"><span style="color: #808000;"><strong>NEW, since 2020, you can now access courses Text Mining with R and Advanced R programming online through our online school,&nbsp;</strong><a href="http://www.bnosac.be/index.php/contact/get-in-touch" style="color: #808000;">let us know here</a><strong>&nbsp;if you want to obtain access.</strong></span></p>
<p>{aridoc engine="pdfjs" width="100%" height="550"}images/bnosac/blog/R_word2vec.pdf{/aridoc}</p>
<ul>
<li><strong>In case you are reading this at a blog aggregator, head to <a href="http://www.bnosac.be/index.php/blog/100-word2vec-in-R">https://www.bnosac.be/index.php/blog/100-word2vec-in-R</a> directly to see the presentation. </strong></li>
<li>If you are reading this, you might as well be interested in applying doc2vec: head to&nbsp;<a href="http://www.bnosac.be/index.php/blog/103-doc2vec-in-r">http://www.bnosac.be/index.php/blog/103-doc2vec-in-r</a>&nbsp;for details.</li>
</ul>
<p><br />Enjoy.</p>]]></description>
			<author>info@bnosac.be (Super User)</author>
			<category>Blog</category>
			<pubDate>Wed, 01 Jul 2020 21:05:30 +0000</pubDate>
		</item>
		<item>
			<title>Text Plots</title>
			<link>http://www.bnosac.be/index.php/blog/99-text-plots</link>
			<guid isPermaLink="true">http://www.bnosac.be/index.php/blog/99-text-plots</guid>
			<description><![CDATA[<p>A few weeks ago, we submitted the R package <a href="https://CRAN.R-project.org/package=textplot">textplot</a>&nbsp;to CRAN, and it was accepted for release last week.&nbsp;The package provides straightforward functions for visualising text, namely&nbsp;</p>
<ul>
<li>text cooccurrences</li>
<li>text clusters (in this case biterm clusters)</li>
<li>dependency parsing results</li>
<li>text correlations and&nbsp;text frequencies</li>
</ul>
<p>Some examples of these plots are shown in the gif.</p>
<p><img src="http://www.bnosac.be/images/bnosac/blog/examples-textplot.gif" width="700" height="420" alt="examples textplot" style="margin: 10px auto; display: block;" /></p>
<p>More details can be found in the pdf presentation shown below.</p>
<p>{aridoc engine="pdfjs" width="100%" height="550"}images/bnosac/blog/textplot-examples.pdf{/aridoc}</p>
<p><br />Enjoy.</p>]]></description>
			<author>info@bnosac.be (Super User)</author>
			<category>Blog</category>
			<pubDate>Sat, 02 May 2020 19:37:38 +0000</pubDate>
		</item>
		<item>
			<title>Biterm topic modelling for short texts</title>
			<link>http://www.bnosac.be/index.php/blog/98-biterm-topic-modelling-for-short-texts</link>
			<guid isPermaLink="true">http://www.bnosac.be/index.php/blog/98-biterm-topic-modelling-for-short-texts</guid>
			<description><![CDATA[<p>A few weeks ago, we published an update of the <a href="https://CRAN.R-project.org/package=BTM">BTM</a> (Biterm Topic Models for text) package on <a href="https://cran.r-project.org">CRAN</a>.</p>
<p><strong>Biterm Topic Models are especially useful if you want to find topics in collections of short texts</strong>. Typical short texts are a Twitter message, a short answer to a survey question, the title of an email or a search query. Traditional topic models like Latent Dirichlet Allocation are less suited to these types of texts, because most of the information sits in short word combinations. The R package <a href="https://CRAN.R-project.org/package=BTM">BTM</a> finds topics in such short texts by <strong>explicitly modelling word-word co-occurrences (biterms) in a short window</strong>.</p>
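<p>To make the notion of a biterm concrete, here is a minimal base R sketch that lists the word pairs occurring within a small window of each other. The helper <code>make_biterms</code> and its <code>window</code> argument are hypothetical names used for illustration only; they are not part of the BTM package, which extracts biterms internally.</p>
<pre>## Hypothetical helper (not part of BTM): list all biterms, i.e. unordered
## pairs of words which are at most `window` tokens apart
make_biterms &lt;- function(tokens, window = 3) {
  n   &lt;- length(tokens)
  out &lt;- list()
  for (i in seq_len(n - 1)) {
    ## pair token i with each of the next `window` tokens
    for (j in seq(i + 1, min(i + window, n))) {
      out[[length(out) + 1]] &lt;- sort(c(tokens[i], tokens[j]))
    }
  }
  do.call(rbind, out)
}
biterm_example &lt;- make_biterms(c("biterm", "topic", "model", "short", "text"), window = 2)
biterm_example</pre>
<p>Each row of the resulting matrix is one biterm. A topic model built on such pairs sees word combinations rather than sparse per-document word counts, which is why it copes better with very short texts.</p>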
<p>The update pushed to CRAN a few weeks ago now allows you to explicitly provide a set of biterms to cluster upon. Let us show an example that clusters a subset of the R package descriptions on CRAN. The resulting cluster visualisation looks like this.</p>
<p><img src="http://www.bnosac.be/images/bnosac/blog/biterm-topic-model-example.png" alt="biterm topic model example" style="margin-top: 5px; margin-right: auto; margin-bottom: 5px; display: block;" /></p>
<p>If you want to reproduce this, the following snippets show how. The steps are as follows.</p>
<h3>1. Get some data of R packages and their description in plain text</h3>
<pre>## Get list of packages in the NLP/Machine Learning Task Views<br />library(ctv)<br />pkgs &lt;- available.views()<br />names(pkgs) &lt;- sapply(pkgs, FUN=function(x) x$name)<br />pkgs &lt;- c(pkgs$NaturalLanguageProcessing$packagelist$name, pkgs$MachineLearning$packagelist$name)<br /><br />## Get package descriptions of these packages<br />library(tools)<br />x &lt;- CRAN_package_db()<br />x &lt;- x[, c("Package", "Title", "Description")]<br />x$doc_id &lt;- x$Package<br />x$text&nbsp;&nbsp; &lt;- tolower(paste(x$Title, x$Description, sep = "\n"))<br />x$text&nbsp;&nbsp; &lt;- gsub("'", "", x$text)<br />x$text&nbsp;&nbsp; &lt;- gsub("&lt;.+&gt;", "", x$text)<br />x        &lt;- subset(x, Package %in% pkgs)</pre>
<h3>2. Use the <a href="https://CRAN.R-project.org/package=udpipe">udpipe R package</a> to perform Parts of Speech tagging on the package title and descriptions and use udpipe as well for extracting cooccurrences of nouns, adjectives and verbs within 3 words distance.</h3>
<pre>library(udpipe)<br />library(data.table)<br />library(stopwords)<br />anno    &lt;- udpipe(x, "english", trace = 10)<br />biterms &lt;- as.data.table(anno)<br />biterms &lt;- biterms[, cooccurrence(x = lemma,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; relevant = upos %in% c("NOUN", "ADJ", "VERB") &amp; <br />                                             nchar(lemma) &gt; 2 &amp; !lemma %in% stopwords("en"),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; skipgram = 3),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; by = list(doc_id)]</pre>
<h3>3. Build the biterm topic model with 9 topics and provide the set of biterms to cluster upon</h3>
<pre>library(BTM)<br />set.seed(123456)<br />traindata &lt;- subset(anno, upos %in% c("NOUN", "ADJ", "VERB") &amp; !lemma %in% stopwords("en") &amp; nchar(lemma) &gt; 2)<br />traindata &lt;- traindata[, c("doc_id", "lemma")]<br />model&nbsp;&nbsp;&nbsp;&nbsp; &lt;- BTM(traindata, biterms = biterms, k = 9, iter = 2000, background = TRUE, trace = 100)</pre>
<h3>4. Visualise the biterm topic clusters using the textplot package available at <a href="https://github.com/bnosac/textplot">https://github.com/bnosac/textplot</a>. This creates the plot shown above.</h3>
<pre>library(textplot)<br />library(ggraph)<br />plot(model, top_n = 10,<br />&nbsp;&nbsp;&nbsp;&nbsp; title = "BTM model", subtitle = "R packages in the NLP/Machine Learning task views",<br />&nbsp;&nbsp;&nbsp;&nbsp; labels = c("Garbage", "Neural Nets / Deep Learning", "Topic modelling", <br />                "Regression/Classification Trees/Forests", "Gradient Descent/Boosting", <br />                "GLM/GAM/Penalised Models", "NLP / Tokenisation",<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; "Text Mining Frameworks / API's", "Variable Selection in High Dimensions"))</pre>
<p>Enjoy!</p>]]></description>
			<author>info@bnosac.be (Super User)</author>
			<category>Blog</category>
			<pubDate>Mon, 13 Apr 2020 15:00:32 +0000</pubDate>
		</item>
		<item>
			<title>Corona in Belgium</title>
			<link>http://www.bnosac.be/index.php/blog/97-corona-in-belgium</link>
			<guid isPermaLink="true">http://www.bnosac.be/index.php/blog/97-corona-in-belgium</guid>
			<description><![CDATA[<p>I lost a few hours this afternoon digging into the Corona virus data, mainly because I read the article at this&nbsp;<a href="https://medium.com/@tomaspueyo/coronavirus-act-today-or-people-will-die-f4d3d9cd99ca">website</a>. It gives a nice view on the issues which can arise when collecting data and on the hidden factors to be aware of, and it also shows Belgium.</p>
<ul>
<li>As a Belgian, I wanted to see how Corona might impact our lives in the coming weeks and, out of curiosity, how we are doing compared to other countries in containing the spread of the Corona virus - especially since we still do not have a government in Belgium after the elections 1 year ago.&nbsp;</li>
<li>In what follows, I'll be showing some graphs using the data available at&nbsp;<a href="https://github.com/CSSEGISandData/COVID-19">https://github.com/CSSEGISandData/COVID-19</a>&nbsp;(it provides up-to-date statistics on Corona cases). If you want to reproduce this, pull the repository and run the R code shown below.</li>
</ul>
<h3>Data</h3>
<p>Let's first check whether the data matches what is shown on our national television.</p>
<pre><code>library(data.table)</code><br /><code>library(lattice)</code><br /><code>x &lt;- list.files("csse_covid_19_data/csse_covid_19_daily_reports/", pattern = ".csv", full.names = TRUE)</code><br /><code>x &lt;- data.frame(file = x, date = substr(basename(x), 1, 10), stringsAsFactors = FALSE)</code><br /><code>x &lt;- split(x$file, x$date)</code><br /><code>x &lt;- lapply(x, fread)</code><br /><code>x &lt;- rbindlist(x, fill = TRUE, idcol = "date")</code><br /><code>x$date &lt;- as.Date(x$date, format = "%m-%d-%Y")</code><br /><code>x &lt;- setnames(x,&nbsp;</code><br /><code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; old = c("date", "Country/Region", "Province/State", "Confirmed", "Deaths", "Recovered"),</code><br /><code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; new = c("date", "region", "subregion", "confirmed", "death", "recovered"))</code><br /><code>x &lt;- subset(x, subregion %in% "Hubei" | <br />               region %in% c("Belgium", "France", "Netherlands", "Spain", "Singapore", "Germany", "Switzerland", "Italy"))</code><br /><code>x$area &lt;- ifelse(x$subregion %in% "Hubei", x$subregion, x$region)</code><br /><code>x &lt;- x[!duplicated(x, by = c("date", "area")), ]</code><br /><code>x &lt;- x[, c("date", "area", "confirmed", "death", "recovered")]<br />subset(x, area %in% "Belgium" &amp; confirmed &gt; 1)<br /></code></pre>
<p><span style="font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif;">Yes, the data from </span><a href="https://github.com/CSSEGISandData/COVID-19" style="font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif;">https://github.com/CSSEGISandData/COVID-19</a><span style="font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif;">&nbsp;&nbsp;</span><span style="font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif;">indeed looks correct: these are the same numbers as reported on Belgian television.&nbsp;</span></p>
<table>
<thead>
<tr><th style="text-align: left;">date</th><th style="text-align: left;">area</th><th style="text-align: right;">confirmed</th><th style="text-align: right;">death</th><th style="text-align: right;">recovered</th></tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;">2020-03-01</td>
<td style="text-align: left;">Belgium</td>
<td style="text-align: right;">2</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
</tr>
<tr>
<td style="text-align: left;">2020-03-02</td>
<td style="text-align: left;">Belgium</td>
<td style="text-align: right;">8</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
</tr>
<tr>
<td style="text-align: left;">2020-03-03</td>
<td style="text-align: left;">Belgium</td>
<td style="text-align: right;">13</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
</tr>
<tr>
<td style="text-align: left;">2020-03-04</td>
<td style="text-align: left;">Belgium</td>
<td style="text-align: right;">23</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
</tr>
<tr>
<td style="text-align: left;">2020-03-05</td>
<td style="text-align: left;">Belgium</td>
<td style="text-align: right;">50</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
</tr>
<tr>
<td style="text-align: left;">2020-03-06</td>
<td style="text-align: left;">Belgium</td>
<td style="text-align: right;">109</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
</tr>
<tr>
<td style="text-align: left;">2020-03-07</td>
<td style="text-align: left;">Belgium</td>
<td style="text-align: right;">169</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
</tr>
<tr>
<td style="text-align: left;">2020-03-08</td>
<td style="text-align: left;">Belgium</td>
<td style="text-align: right;">200</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
</tr>
<tr>
<td style="text-align: left;">2020-03-09</td>
<td style="text-align: left;">Belgium</td>
<td style="text-align: right;">239</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
</tr>
<tr>
<td style="text-align: left;">2020-03-10</td>
<td style="text-align: left;">Belgium</td>
<td style="text-align: right;">267</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
</tr>
<tr>
<td style="text-align: left;">2020-03-11</td>
<td style="text-align: left;">Belgium</td>
<td style="text-align: right;">314</td>
<td style="text-align: right;">3</td>
<td style="text-align: right;">1</td>
</tr>
</tbody>
</table>
<h3>Exponential number of cases of Corona</h3>
<ul>
<li>Now is the outbreak really <strong>exponential</strong>? Let's make some graphs.</li>
</ul>
<p>What is clear from the plots is that infections indeed grow exponentially, except in Singapore, where the government managed to completely isolate the Corona cases. In Belgium and other European countries the government lacked the opportunity to isolate the Corona cases, and we are now in a phase of trying to slow down the outbreak in order to reduce and spread out its impact.</p>
<p><img src="http://www.bnosac.be/images/bnosac/blog/corona1.png" alt="corona1" /></p>
<p>You can reproduce the plot as follows</p>
<pre>trellis.par.set(strip.background = list(col = "lightgrey"))<br />xyplot(confirmed ~ date | area, data = x, type = "b", pch = 20,&nbsp;<br />       scales = list(y = list(relation = "free", rot = 0), x = list(rot = 45, format = "%A %d/%m")),&nbsp;<br />       layout = c(5, 2),&nbsp;main = sprintf("Confirmed cases of Corona\n(last date in this graph is %s)", max(x$date)))</pre>
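<p>As a rough numerical check of the exponential claim, here is a small sketch. If growth is exponential, the log of the confirmed cases is linear in time, so the slope of a straight-line fit estimates the daily growth rate. The counts are copied from the Belgian table shown earlier; the log-linear fit and the names <code>belgium</code>, <code>fit</code>, <code>growth_rate</code> and <code>doubling_time</code> are our own illustration and an assumption about how one could check this, not part of the analysis above.</p>
<pre>## Belgian confirmed cases, copied from the table above
belgium &lt;- data.frame(
  date      = seq(as.Date("2020-03-01"), as.Date("2020-03-11"), by = "day"),
  confirmed = c(2, 8, 13, 23, 50, 109, 169, 200, 239, 267, 314))
belgium$day &lt;- as.integer(belgium$date - min(belgium$date))

## Exponential growth means log(confirmed) is linear in the day number
fit           &lt;- lm(log(confirmed) ~ day, data = belgium)
growth_rate   &lt;- unname(coef(fit)["day"])
## Number of days needed to double the case count under this fit
doubling_time &lt;- log(2) / growth_rate
growth_rate
doubling_time</pre>
<p>With only 11 data points this is an indication rather than a forecast, but a clearly positive slope is what you expect to see if the outbreak grows exponentially.</p>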
<h3>Compare to other countries - onset</h3>
<p>It is clear that the onset of Corona differs between countries. Let's define the onset (day 0) as the first day on which a country had more than 75 confirmed Corona cases. That allows us to compare different countries. In Belgium we passed 75 confirmed cases on Friday 2020-03-06.&nbsp; In the Netherlands that was one day earlier.&nbsp;</p>
<table>
<thead>
<tr><th style="text-align: left;">date</th><th style="text-align: left;">area</th><th style="text-align: right;">confirmed</th></tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;">2020-01-22</td>
<td style="text-align: left;">Hubei</td>
<td style="text-align: right;">444</td>
</tr>
<tr>
<td style="text-align: left;">2020-02-17</td>
<td style="text-align: left;">Singapore</td>
<td style="text-align: right;">77</td>
</tr>
<tr>
<td style="text-align: left;">2020-02-23</td>
<td style="text-align: left;">Italy</td>
<td style="text-align: right;">155</td>
</tr>
<tr>
<td style="text-align: left;">2020-02-29</td>
<td style="text-align: left;">Germany</td>
<td style="text-align: right;">79</td>
</tr>
<tr>
<td style="text-align: left;">2020-02-29</td>
<td style="text-align: left;">France</td>
<td style="text-align: right;">100</td>
</tr>
<tr>
<td style="text-align: left;">2020-03-01</td>
<td style="text-align: left;">Spain</td>
<td style="text-align: right;">84</td>
</tr>
<tr>
<td style="text-align: left;">2020-03-04</td>
<td style="text-align: left;">Switzerland</td>
<td style="text-align: right;">90</td>
</tr>
<tr>
<td style="text-align: left;">2020-03-05</td>
<td style="text-align: left;">Netherlands</td>
<td style="text-align: right;">82</td>
</tr>
<tr>
<td style="text-align: left;">2020-03-06</td>
<td style="text-align: left;">Belgium</td>
<td style="text-align: right;">109</td>
</tr>
</tbody>
</table>
<p>Reproduce as follows:</p>
<pre>x &lt;- x[order(x$date, x$area, decreasing = TRUE), ]<br />x &lt;- x[, days_since_case_onset := as.integer(date - min(date[confirmed &gt; 75])), by = list(area)]<br />x &lt;- x[, newly_confirmed := as.integer(confirmed - shift(confirmed, n = 1, type = "lead")), by = list(area)]<br />onset &lt;- subset(x, days_since_case_onset == 0, select = c("date", "area", "confirmed"))<br />onset[order(onset$date), ]</pre>
<h3>Compare to other countries - what can we expect?</h3>
<ul>
<li>Now are we doing better than other countries in the EU?</li>
</ul>
<p>The following plot shows the log of the number of people diagnosed with Corona since the onset date shown above. It looks like Belgium has learned a bit from the issues in Italy, but it has not yet learned to deal with the virus outbreak the way e.g. Singapore has (a country which learned from the SARS outbreak).</p>
<p>Based on the blue line, <strong>we can expect Belgium to have next week somewhere between roughly 1100 confirmed cases (log(1100) ≈ 7) and, if we follow the trend of France, roughly 3000&nbsp;(log(3000) ≈ 8) patients with Corona</strong>. We hope that it will be the former.</p>
<p><span style="text-align: center;"><img src="http://www.bnosac.be/images/bnosac/blog/corona2.png" alt="corona2" width="800" height="615" style="display: block; margin-left: auto; margin-right: auto;" />&nbsp;</span></p>
<p>Reproduce as follows:</p>
<pre>xyplot(log(confirmed) ~ days_since_case_onset | "Log(confirmed cases) of Corona since onset of sick person nr 75", <br /> groups = area,<br /> data = subset(x, days_since_case_onset &gt;= 0 &amp; <br />                  area %in% c("Hubei", "France", "Belgium", "Singapore", "Netherlands", "Italy")), <br /> xlab = "Days since Corona onset (confirmed case 75)", ylab = "Log of number of confirmed cases",<br /> auto.key = list(space = "right", lines = TRUE),<br /> type = "b", pch = 20, lwd = 2)&nbsp;</pre>
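<p>The plot uses the natural logarithm, so a value read from the y-axis can be converted back to a number of cases with <code>exp()</code>. A small sketch, where the variable names <code>log_levels</code> and <code>cases</code> are just for illustration:</p>
<pre>## Values read from the y-axis of the log plot
log_levels &lt;- c(7, 8)
## Back-transform to the number of confirmed cases
cases &lt;- round(exp(log_levels))
cases
## 1097 2981</pre>
<p>This is where the rough figures of 1100 and 3000 quoted above come from.</p>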
<h3>Compared to the Netherlands</h3>
<ul>
<li>Now, are we doing better than the Netherlands?</li>
</ul>
<p>Currently it looks like we are, but time will tell. Given the trend shown above, I can only hope everyone in Belgium follows the government guidelines as strictly as possible.</p>
<p><img src="http://www.bnosac.be/images/bnosac/blog/corona3.png" alt="corona3" style="display: block; margin-left: auto; margin-right: auto;" /></p>
<p>Reproduce as follows:</p>
<pre>xyplot(newly_confirmed ~ date | "Newly confirmed cases of Corona", groups = area,<br /> data = subset(x, area %in% c("Belgium", "Netherlands") &amp; date &gt; as.Date("2020-03-01")), <br /> xlab = "Date", ylab = "Number of new Corona cases",<br /> scales = list(x = list(rot = 45, format = "%A %d/%m", at = seq(as.Date("2020-03-01"), Sys.Date(), by = "day"))), <br /> auto.key = list(space = "right", lines = TRUE),<br /> type = "b", pch = 20, lwd = 2)</pre>]]></description>
			<author>info@bnosac.be (Super User)</author>
			<category>Blog</category>
			<pubDate>Thu, 12 Mar 2020 14:14:36 +0000</pubDate>
		</item>
	</channel>
</rss>
