<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>R-bloggers</title>
	<atom:link href="https://www.r-bloggers.com/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.r-bloggers.com</link>
	<description>R news and tutorials contributed by hundreds of R bloggers</description>
	<lastBuildDate>Fri, 22 May 2026 13:34:48 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=5.5.18</generator>

<image>
	<url>https://i0.wp.com/www.r-bloggers.com/wp-content/uploads/2016/08/cropped-R_single_01-200.png?fit=32%2C32&#038;ssl=1</url>
	<title>R-bloggers</title>
	<link>https://www.r-bloggers.com</link>
	<width>32</width>
	<height>32</height>
</image> 
<site xmlns="com-wordpress:feed-additions:1">11524731</site>	<item>
		<title>Repost: ctrlvee: Extract external R code and insert inline</title>
		<link>https://www.r-bloggers.com/2026/05/repost-ctrlvee-extract-external-r-code-and-insert-inline/</link>
		
		<dc:creator><![CDATA[Stephen Turner]]></dc:creator>
		<pubDate>Fri, 22 May 2026 13:34:48 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">http://www.r-bloggers.com/?guid=51c6de9f15cc0593fa890ee28b39bc84</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> Reposted from the original at https://blog.stephenturner.us/p/ctrlvee-extract-external-r-code-insert-inline-positron-rstudio-addin. Ever find yourself looking through a pkgdown page or a Quarto<br />
book, copying and pasting code chunks from your brow...</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/repost-ctrlvee-extract-external-r-code-and-insert-inline/">Repost: ctrlvee: Extract external R code and insert inline</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="http://gettinggeneticsdone.blogspot.com/2026/05/ctrlvee-extract-external-r-code-insert-inline-positron-rstudio-addin.html"> Getting Genetics Done</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issues with the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<p><mark><b>Reposted from the original at <a href="https://blog.stephenturner.us/p/ctrlvee-extract-external-r-code-insert-inline-positron-rstudio-addin" rel="nofollow" target="_blank">https://blog.stephenturner.us/p/ctrlvee-extract-external-r-code-insert-inline-positron-rstudio-addin</a>.</b></mark> </p><p><span>Ever find yourself looking through a pkgdown page or a Quarto 
book, copying and pasting code chunks from your browser into your IDE? I
 do, and it’s a minor annoyance.</span><span data-state="closed" style="min-width: 0px;"><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" href="https://blog.stephenturner.us/p/ctrlvee-extract-external-r-code-insert-inline-positron-rstudio-addin#footnote-1" id="footnote-anchor-1" rel="nofollow" target="_blank">1</a></span></p><p><span>My friend and colleague VP Nagraj published a new R package called </span><strong>ctrlvee</strong><span> that makes this a lot easier.</span></p><ul><li><p><strong><span>CRAN: </span><a href="https://cran.r-project.org/package=ctrlvee" rel="nofollow" target="_blank">https://cran.r-project.org/package=ctrlvee</a></strong></p></li><li><p><strong><span>GitHub: </span><a href="https://github.com/vpnagraj/ctrlvee" rel="nofollow" target="_blank">https://github.com/vpnagraj/ctrlvee</a></strong></p></li></ul><p><span>It
 does one thing. Put your cursor anywhere in an R script in Positron or 
RStudio, call the add-in, provide a URL, and a few milliseconds later 
you’ll have all the code from that page in your editor, separated by 
chunk boundaries (along with some metadata and a note to </span><a href="https://blog.stephenturner.us/p/pick-a-license-not-any-license" rel="nofollow" target="_blank">check the license!</a><span>).</span></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img can-restack" data-component-name="Image2ToDOM" href="https://substackcdn.com/image/fetch/$s_!7WR6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbfbd671-9318-421e-93b9-8ea4f5ef9e9a_1410x782.png" rel="nofollow" target="_blank"><div class="image2-inset"><picture><source type="image/webp"></source><img loading="lazy" alt="" class="sizing-large" src="https://i1.wp.com/substackcdn.com/image/fetch/$s_!7WR6!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbfbd671-9318-421e-93b9-8ea4f5ef9e9a_1410x782.png?resize=450%2C665&#038;ssl=1" width="450" data-recalc-dims="1" /></picture></div></a></figure></div><p><span>The package README provides a demonstration using the “Data Validation and QA” chapter of my </span><em>Data Science Team Training</em><span> book (</span><strong><a href="https://dstt.stephenturner.us/" rel="nofollow" 
target="_blank">dstt.stephenturner.us</a></strong><span>).</span></p><ol><li><p><span>Install the package: </span><code>install.packages(&quot;ctrlvee&quot;)</code></p></li><li><p><span>Run the add-in. In Positron you’ll open the command palette, search for Run RStudio Addin, then </span><em>extract external R code and insert inline</em><span>. You’ll get a modal asking you for a URL. </span></p></li><li><p><span>Paste one in. E.g., </span><strong>https://dstt.stephenturner.us/validation.html</strong></p></li><li><p>The R code from the website appears in your editor <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f680.png" alt="🚀" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p></li></ol><p>Here’s a demo.</p><div class="videoScrollTarget-SzB20Y" data-component-name="VideoEmbedPlayer" id="media-32b2a9b2-d551-43a0-bffc-647ab75b032e"><div class="videoEmbed-_FycLU"><div aria-label="Video player" class="with-preview video-player-with-background video-player-wrapper" role="region"><div class="video-player video-player video-player-with-background videoPlayer-vlcedM" style="padding-bottom: 60%;"><video class="video-P2qgwZ" controls="" crossorigin="anonymous" data-video-id="32b2a9b2-d551-43a0-bffc-647ab75b032e" poster="https://substack-video.s3.amazonaws.com/video_upload/post/197973095/32b2a9b2-d551-43a0-bffc-647ab75b032e/transcoded-00001.png?refresh=Fri May 22 2026 09:32:43 GMT-0400 (Eastern Daylight Time)" preload="metadata"></video><div class="pencraft pc-position-absolute pc-reset buttonContainer-tH3LP9 video-player-button"></div></div></div></div></div><p><span>Here’s what the extracted/inserted code looks like, from </span><a href="https://dstt.stephenturner.us/validation.html" rel="nofollow" 
target="_blank">this source</a><span>.</span></p><pre># -----------------------------------------------------------------
# Chunks fetched by ctrlvee from: https://dstt.stephenturner.us/validation.html
# Strategy: Rendered HTML page
# Date: 2026-05-16 05:14:44
# Chunks: 8
# NOTE: Check the source license before reusing this code.
# -----------------------------------------------------------------

flu &lt;- data.frame(
    week = c(1, 2, 3, 4, 4),
    county = c(&quot;Fairfax&quot;, &quot;Arlington&quot;, NA, &quot;Loudoun&quot;, &quot;Loudoun&quot;),
    disease = c(&quot;Flu&quot;, &quot;Flu&quot;, &quot;Flu&quot;, &quot;Flu&quot;, &quot;Flu&quot;),
    cases = c(23, 41, 18, -5, 12),
    rate = c(2.1, 3.8, 1.6, NA, 1.1)
)

flu

# ---- chunk boundary ----

if (any(flu$cases &lt; 0, na.rm = TRUE)) {
    stop(&quot;Negative case counts detected. Inspect raw data before proceeding.&quot;)
}

# ---- chunk boundary ----

stopifnot(
    &quot;Negative case counts&quot; = all(flu$cases &gt;= 0, na.rm = TRUE),
    &quot;Missing county values&quot; = !anyNA(flu$county),
    &quot;Duplicate records&quot; = !anyDuplicated(flu[, c(&quot;week&quot;, &quot;county&quot;)])
)

# ---- chunk boundary ----

install.packages(&quot;pointblank&quot;)

# ---- chunk boundary ----

library(pointblank)

agent &lt;- create_agent(tbl = flu, label = &quot;Weekly flu surveillance&quot;) |&gt;
    col_vals_gte(
        columns = cases,
        value = 0,
        label = &quot;Case counts must be non-negative&quot;
    ) |&gt;
    col_vals_not_null(
        columns = c(week, county),
        label = &quot;Week and county cannot be missing&quot;
    ) |&gt;
    rows_distinct(
        columns = c(week, county),
        label = &quot;No duplicate week/county records&quot;
    ) |&gt;
    interrogate()

agent

# ---- chunk boundary ----

create_agent(tbl = flu, label = &quot;Weekly flu surveillance — extended&quot;) |&gt;
    col_is_numeric(
        columns = c(cases, rate),
        label = &quot;Case count and rate must be numeric&quot;
    ) |&gt;
    col_vals_in_set(
        columns = disease,
        set = c(&quot;Flu&quot;, &quot;COVID-19&quot;, &quot;RSV&quot;),
        label = &quot;Disease must be from the approved list&quot;
    ) |&gt;
    col_vals_between(
        columns = week,
        left = 1,
        right = 52,
        label = &quot;Week must be between 1 and 52&quot;
    ) |&gt;
    col_vals_gte(
        columns = rate,
        value = 0,
        na_pass = TRUE,
        label = &quot;Rate must be non-negative (NAs allowed)&quot;
    ) |&gt;
    interrogate()

# ---- chunk boundary ----

if (!all_passed(agent)) {
    stop(&quot;Data validation failed. Review the agent report before proceeding.&quot;)
}

# ---- chunk boundary ----

library(readr)
library(pointblank)

flu &lt;- read_csv(&quot;data/flu-2024.csv&quot;)

# Validate immediately after reading
agent &lt;- create_agent(tbl = flu, label = &quot;flu-2024 validation&quot;) |&gt;
    col_vals_gte(columns = cases, value = 0, label = &quot;No negative counts&quot;) |&gt;
    col_vals_not_null(columns = c(week, county), label = &quot;No missing keys&quot;) |&gt;
    rows_distinct(columns = c(week, county), label = &quot;No duplicate records&quot;) |&gt;
    interrogate()

if (!all_passed(agent)) {
    stop(&quot;Validation failed — see agent report above.&quot;)
}</pre><div class="blogger-post-footer">Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution (CC BY) License.</div>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="http://gettinggeneticsdone.blogspot.com/2026/05/ctrlvee-extract-external-r-code-insert-inline-positron-rstudio-addin.html"> Getting Genetics Done</a></strong>.</div>
</div>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">401398</post-id>	</item>
		<item>
		<title>Functions over Idioms &#8211; Writing R in Python with rfuns</title>
		<link>https://www.r-bloggers.com/2026/05/functions-over-idioms-writing-r-in-python-with-rfuns/</link>
		
		<dc:creator><![CDATA[Jonathan Carroll]]></dc:creator>
		<pubDate>Fri, 22 May 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://jcarroll.com.au/2026/05/22/functions-over-idioms-rfuns/</guid>

					<description><![CDATA[<p>If you’ve read any of my past posts you know I like to program in several<br />
different languages, some of which I like more than others. Sometimes a problem<br />
calls for a particular language to be used, and with that comes adjusting one’s<br />
brain to thinking in that ...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/functions-over-idioms-writing-r-in-python-with-rfuns/">Functions over Idioms – Writing R in Python with rfuns</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://jcarroll.com.au/2026/05/22/functions-over-idioms-rfuns/"> rstats on Irregularly Scheduled Programming</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issues with the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
</div>
<p>If you’ve read any of my past posts you know I like to program in several
different languages, some of which I like more than others. Sometimes a problem
calls for a particular language to be used, and with that comes adjusting one’s
brain to thinking in that language and using the appropriate idioms to leverage
that language’s features. But what if I don’t want to?</p>
<div class="float">
<img src="https://i2.wp.com/jcarroll.com.au/2026/05/22/functions-over-idioms-rfuns/images/IDontWantTo.gif?w=450&#038;ssl=1" alt="I don’t want to" data-recalc-dims="1" />
<div class="figcaption">I don’t want to</div>
</div>
<p>The line between R and Python has been heavily blurred over the last few years,
particularly with <a href="https://rstudio.github.io/reticulate/" rel="nofollow" target="_blank">{reticulate}</a> enabling
us to use Python within R code, RStudio rebranding as <a href="https://posit.co/" rel="nofollow" target="_blank">Posit</a>
and taking on a strong Python development effort, releasing
<a href="https://posit.co/products/ide/positron" rel="nofollow" target="_blank">Positron</a> as a multi-language IDE, and
<a href="https://quarto.org/" rel="nofollow" target="_blank">Quarto</a> being a multi-language rethink of R Markdown.</p>
<p>I occasionally <em>need</em> to use Python directly &#8211; an SDK wrapping an API exists and
I don’t particularly want to spend a lot of time writing my own R version,
especially before I know what I want to get out of the endpoints. At this point
I tend to bump up against my muscle-memory from R and try to use functions I’m
familiar with from R, but which don’t actually exist in Python. Now, that might
sometimes be because the pattern I’m trying to encode simply has a different name
in Python; instead of an <code>sapply(x, f)</code></p>
<pre>sapply(c(2, 3, 4, 5), \(x) x ^ 2)
## [1]  4  9 16 25</pre>
<p>I should reach for <code>map</code>, in which case I am reminded that this produces a lazy
iterator that doesn’t show me the results</p>
<pre>map(lambda x: x ** 2, [2, 3, 4, 5])
## &lt;map object at 0x10d7fbee0&gt;</pre>
<p>and so I need to wrap it into a list to get the values out</p>
<pre>list(map(lambda x: x ** 2, [2, 3, 4, 5]))
## [4, 9, 16, 25]</pre>
<p>Or, I could use a list comprehension which <em>isn’t</em> lazy</p>
<pre>[v ** 2 for v in [2, 3, 4, 5]]
## [4, 9, 16, 25]</pre>
<p>That’s the <em>idiom</em> that I should be reaching for. Sure.</p>
<p>Other times there’s a package I need to use and a slightly different way of
approaching the problem. In R I love the <code>table()</code> function for getting
histogram-like counts of the unique values of a vector</p>
<pre>table(c(&quot;b&quot;, &quot;a&quot;, &quot;c&quot;, &quot;a&quot;, &quot;b&quot;, &quot;a&quot;))
## 
## a b c 
## 3 2 1</pre>
<p>which in Python looks like</p>
<pre>from collections import Counter

sorted(Counter([&quot;b&quot;, &quot;a&quot;, &quot;c&quot;, &quot;a&quot;, &quot;b&quot;, &quot;a&quot;]).items())
## [(&#39;a&#39;, 3), (&#39;b&#39;, 2), (&#39;c&#39;, 1)]</pre>
<p>Probably Pythonistas remember that idiom and the package to import and the
<code>.items()</code> extractor and the fact that they maybe want to sort the result. But I
kept coming back to a question I ask myself: <em>what if I don’t want to</em>? Why is
there not a function that wraps this idiom? If there was, why not just call it
“table”? Admittedly, it’s far from the catchiest, most memorable, or most useful
name, but it’s immediately recognisable to an R user (ditto for “sapply”).</p>
<p>One approach I considered here was to just call R from Python. That <em>can</em> be done,
but I doubt I or anyone else wants to deal with that every time we want to iterate
over a list. There’s a package on the Python Package Index (PyPI) which seems to support
this nicely: <a href="https://pypi.org/project/r-functions/" class="uri" rel="nofollow" target="_blank">https://pypi.org/project/r-functions/</a>, but it wraps
individual R files via <code>Rscript</code>. I’m thinking more along the lines of ‘native
Python with an R interface’.</p>
<p>Python is an object-oriented language, but it <em>has</em> functions, so why not make one</p>
<pre>from collections import Counter

def table(x):
    return dict(sorted(Counter(x).items()))

table([&quot;b&quot;, &quot;a&quot;, &quot;c&quot;, &quot;a&quot;, &quot;b&quot;, &quot;a&quot;])
## {&#39;a&#39;: 3, &#39;b&#39;: 2, &#39;c&#39;: 1}

def sapply(x, func):
    return [func(v) for v in x]

sapply([2, 3, 4, 5], lambda x: x ** 2)
## [4, 9, 16, 25]</pre>
<p>and have a nicer function interface to apply these idioms? I thought about this
a bit longer, and realised there are <strong>lots</strong> of functions I use in R that I wish
I could use in Python. An idiom for finding the indices of the elements of a ‘vector’
(list in Python) which are true (<code>TRUE</code> in R, <code>True</code> in Python) is</p>
<pre>[i for i, v in enumerate(x) if v]</pre>
<p>but I just want to call <code>which(x)</code></p>
<pre>which(c(FALSE, FALSE, TRUE, FALSE, TRUE))
## [1] 3 5</pre>
<p>so why not define this</p>
<pre>def which(x):
    return [i for i, v in enumerate(x) if v]
  
which([False, False, True, False, True])
## [2, 4]</pre>
<p>(remembering that Python is 0-indexed).</p>
<p>How far could one take this? Quite a long way!</p>
<p>I thought more about what differences would need to be accounted for, and one that
immediately came to mind was that R is vectorised. If I were to recreate R’s
character counting function <code>nchar(s)</code> as essentially <code>len(s)</code>, I’d need to consider
whether I wanted it to work on a single string or a ‘vector’ of strings.</p>
<p>In R:</p>
<pre>nchar(c(&quot;these&quot;, &quot;all&quot;, &quot;have&quot;, &quot;different&quot;, &quot;lengths&quot;))
## [1] 5 3 4 9 7</pre>
<p>But in Python, <code>len()</code> expects a single value, so it calculates the length of
the list</p>
<pre>len([&quot;these&quot;, &quot;all&quot;, &quot;have&quot;, &quot;different&quot;, &quot;lengths&quot;])
## 5</pre>
<p>The ‘proper’ way to do it is to map over the list</p>
<pre>[len(s) for s in [&quot;these&quot;, &quot;all&quot;, &quot;have&quot;, &quot;different&quot;, &quot;lengths&quot;]]
## [5, 3, 4, 9, 7]</pre>
<p>but again, why do I need to use an idiom for this? What if I just made a decorator
to change a regular function to a vectorised one by applying this list
comprehension internally when it’s passed a list (or a tuple), and which otherwise
just evaluates the function with the argument?</p>
<pre>import functools

def make_vec(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if isinstance(args[0], (list, tuple)):
            return [func(xi, *args[1:], **kwargs) for xi in args[0]]
        return func(*args, **kwargs)
    return wrapper

@make_vec
def my_len(s):
    return len(s)

my_len([&quot;these&quot;, &quot;all&quot;, &quot;have&quot;, &quot;different&quot;, &quot;lengths&quot;])
## [5, 3, 4, 9, 7]</pre>
<p>and I could name it… “nchar”!</p>
<p>The other use-case that came to mind was <a href="https://fosstodon.org/deck/@eliocamp@mastodon.social/116531644286276585" rel="nofollow" target="_blank">Elio venting</a>
(and referencing a post to which <a href="https://jcarroll.com.au/2025/12/05/haskell-is-a-great-language-for-data-science/" rel="nofollow" target="_blank">I also wrote a sort of response</a>)
that they needed to list the files in the current directory</p>
<blockquote>
<p><a href="https://mastodon.social/@eliocamp/116531644254157709" rel="nofollow" target="_blank">Post by @eliocamp@mastodon.social</a></p>
</blockquote>
<p>with the idiom</p>
<pre>import os

[os.path.join(path, f) for f in os.listdir(path)]</pre>
<p>The supplied suggestions included</p>
<pre>from pathlib import Path

list(Path(path).iterdir())</pre>
<p>(just rolls off the tongue, doesn’t it?) which returns a list of <code>PosixPath()</code>
objects and is hardly easy to parse visually.</p>
<p>So, why not have a function?!?</p>
<pre>import os

def list_files(path):
    return [os.path.join(path, f) for f in os.listdir(path)]

path = &quot;path/to/files&quot;

list_files(path)
## [&#39;path/to/files/file1.txt&#39;, &#39;path/to/files/file2.txt&#39;, &#39;path/to/files/file3.csv&#39;]</pre>
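<p>I would have liked to call this <code>list.files()</code> but, since Python reserves the dot for
attribute access (<code>list.files</code> would be read as the attribute <code>files</code> of the built-in <code>list</code>), it can’t be that.</p>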
<p>This then raises the question of “should I support the arguments already in the R
functions?” In this case, should it support a <code>recursive</code> argument? Yes, that
adds complexity, but it’s surely do-able. At this point I reached for some AI
assistance and had Claude help me to implement as many functions as we could think
of, supporting as many common arguments as possible. This involved extending the
decorator to support vectorising other arguments (which also need to be careful
about dots).</p>
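<p>As a rough idea of what supporting <code>recursive</code> could involve, here is a minimal sketch. It is an illustration only, not the code that ended up in the package: the default <code>path</code>, the sorting, and the use of <code>os.walk</code> are all assumptions.</p>

```python
import os

def list_files(path=".", recursive=False):
    """List entries under `path`, optionally descending into subdirectories."""
    if not recursive:
        # Same idiom as before: every directory entry, joined to its parent.
        return sorted(os.path.join(path, f) for f in os.listdir(path))
    found = []
    # os.walk yields (directory, subdirectory names, file names) for each level.
    for root, _dirs, files in os.walk(path):
        found.extend(os.path.join(root, f) for f in files)
    return sorted(found)
```

<p>With <code>recursive=False</code> the result includes subdirectories themselves, while <code>recursive=True</code> returns only the files found at any depth, which is how R’s <code>list.files()</code> behaves by default.</p>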
<p>On testing it out, it looked like we had something viable.</p>
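<p>For the curious, extending the decorator to vectorise over more than the first argument might look something like the sketch below. The name <code>make_vec_all</code>, the R-style recycling of shorter inputs, and the stand-in <code>startsWith</code> body are all assumptions made for illustration; the package may well do this differently.</p>

```python
import functools

def make_vec_all(func):
    """Hypothetical variant of make_vec: loop over every list/tuple positional argument."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        lengths = [len(a) for a in args if isinstance(a, (list, tuple))]
        if not lengths:
            # Only scalars were passed, so call the function once.
            return func(*args, **kwargs)
        if 0 in lengths:
            # Assumption: a zero-length input gives a zero-length result.
            return []

        def pick(a, i):
            # Recycle shorter sequences; pass scalars through unchanged.
            return a[i % len(a)] if isinstance(a, (list, tuple)) else a

        return [func(*(pick(a, i) for a in args), **kwargs) for i in range(max(lengths))]
    return wrapper

@make_vec_all
def startsWith(x, prefix):
    # Stand-in body for illustration only.
    return x.startswith(prefix)

startsWith(["apple", "banana", "avocado"], "a")
## [True, False, True]
startsWith(["apple", "banana"], ["a", "b"])
## [True, True]
```

<p>Keyword arguments are passed through untouched here, so only positional arguments are vectorised in this sketch.</p>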
<p>One last piece I wanted to support, though: the <code>which()</code> example above finds the indices of
the elements of a <em>logical</em> vector which are <code>True</code>, but in order to build that vector
in the first place, I would naturally leverage R’s vectorisation as an array
language. The two steps involved here are to first compute the comparison resulting
in a logical vector, then to use <code>which()</code> to identify the indices of those which are
true</p>
<pre>which(c(&quot;c&quot;, &quot;b&quot;, &quot;a&quot;, &quot;c&quot;, &quot;a&quot;, &quot;b&quot;) == &quot;a&quot;)
## [1] 3 5</pre>
<p>The vectorisation decorator above doesn’t help here, because it’s at the point of
<code>==</code> that we want to vectorise</p>
<pre>[&#39;c&#39;, &#39;b&#39;, &#39;a&#39;, &#39;c&#39;, &#39;a&#39;, &#39;b&#39;] == &#39;a&#39;
## False</pre>
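<p>This is <code>False</code> because Python compares the whole list against the string <code>'a'</code>, rather than comparing element by element.</p>
<p>The appropriate idiom is once again a comprehension, here written as a generator expression passed straight to <code>which()</code></p>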
<pre>which(x == &#39;a&#39; for x in [&#39;c&#39;, &#39;b&#39;, &#39;a&#39;, &#39;c&#39;, &#39;a&#39;, &#39;b&#39;])
## [2, 4]</pre>
<p>The solution I’m fond of is to create a new ‘Vec’ class which wraps binary operators
with a list comprehension, again abstracting away this detail. This means
implementing <code>__eq__</code>, <code>__add__</code>, <code>__and__</code> and lots of other binary operations,
but with that, and a wrapper to create such an object, the comparison operators
can be vectorised</p>
<pre>vals = vec([&#39;c&#39;, &#39;b&#39;, &#39;a&#39;, &#39;c&#39;, &#39;a&#39;, &#39;b&#39;])
which(vals == &#39;a&#39;)
## [2, 4]</pre>
<p>Not pristine, but quite clean, if you ask me.</p>
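<p>To make the idea concrete, a stripped-down version of such a class could look like the sketch below. This is an assumption-laden illustration rather than the package’s code: it subclasses <code>list</code>, implements only the three operators named above, and insists on equal lengths instead of recycling.</p>

```python
import operator

class Vec(list):
    """Hypothetical list subclass whose binary operators work element-wise."""

    def _apply(self, op, other):
        if isinstance(other, (list, tuple)):
            # Element-wise against another sequence; lengths must match in this sketch.
            if len(other) != len(self):
                raise ValueError("lengths differ")
            return Vec(op(a, b) for a, b in zip(self, other))
        # Broadcast a scalar across every element.
        return Vec(op(a, other) for a in self)

    def __eq__(self, other):
        return self._apply(operator.eq, other)

    def __add__(self, other):
        return self._apply(operator.add, other)

    def __and__(self, other):
        return self._apply(operator.and_, other)

def vec(values):
    """Convenience wrapper that builds a Vec from any iterable."""
    return Vec(values)

def which(x):
    return [i for i, v in enumerate(x) if v]

vals = vec(["c", "b", "a", "c", "a", "b"])
which(vals == "a")
## [2, 4]
vec([1, 2, 3]) + 10
## [11, 12, 13]
```

<p>Reversed operands (such as <code>10 + vec([1, 2, 3])</code>) and the remaining comparison and arithmetic operators would each need their own method along the same lines.</p>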
<p>With all these pieces in place, adding implementations for common base R functions
including most arguments and a way to vectorise lists, I wrapped everything up
into a Python package (my first) to learn how to do it.</p>
<p>The workflow isn’t particularly painful, with my biggest complication being
different versions of Python supporting different requirements in <code>pyproject.toml</code>,
and so some GitHub Actions are failing because of that.</p>
<p>As part of building out the implementations I had Claude add tests for each of the
functions with some expected values &#8211; if I <em>do</em> want to improve some of the idioms
internally, I want to ensure I don’t change the values produced. That works for
having any testing at all, but how can I be sure that I’m reproducing what I
would get if I was working in R? One option was to just run all of the test
functions by hand and confirm that the values look similar enough, accounting for
list vs vector and 0 vs 1 indexing. Instead, Claude managed to write an adaptor
for <code>pytest</code> which does the realignment of e.g. <code>list_files</code> to <code>list.files</code>
(and similarly for arguments), realigns the indexing where needed, and runs all
existing tests directly in R via <code>rpy2</code> (skipping over some for which I don’t
have tests yet). I’m disabling automated testing of this because I suspect it
could get flaky dealing with both R <em>and</em> Python on GitHub Actions, but I can
confirm that all the current tests pass.</p>
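<p>The realignment step on its own can be pictured with a few small helpers. Everything below is hypothetical (the helper names, and the blanket rule that an underscore in Python corresponds to a dot in R), and it only builds the text of the R call; actually evaluating it through <code>rpy2</code> is left out of this sketch.</p>

```python
def to_r_name(py_name):
    """Hypothetical rule: underscores in a Python name stand in for dots in R."""
    return py_name.replace("_", ".")

def to_r_index(py_index):
    """Shift 0-based Python positions to 1-based R positions."""
    if isinstance(py_index, (list, tuple)):
        return [i + 1 for i in py_index]
    return py_index + 1

def r_literal(value):
    """Render a simple Python value as R source text."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    if isinstance(value, (list, tuple)):
        return "c(" + ", ".join(r_literal(v) for v in value) + ")"
    return repr(value)

def to_r_call(py_name, *args, **kwargs):
    """Build the text of the equivalent R call; evaluating it in R is not done here."""
    parts = [r_literal(a) for a in args]
    parts += [to_r_name(k) + " = " + r_literal(v) for k, v in kwargs.items()]
    return to_r_name(py_name) + "(" + ", ".join(parts) + ")"

print(to_r_call("list_files", "path/to/files", recursive=True))
## list.files("path/to/files", recursive = TRUE)
print(to_r_index([2, 4]))
## [3, 5]
```

<p>A real adaptor would need exceptions to the underscore rule for any R function or argument whose name genuinely contains an underscore.</p>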
<p>I wanted to have a documentation website similar to what we have via {pkgdown} and
came across <a href="https://github.com/machow/quartodoc" rel="nofollow" target="_blank">quartodoc</a> which is what the
<a href="https://rstudio.github.io/pins-python/" rel="nofollow" target="_blank">Python version of {pins} uses</a>. Getting
that to work required downgrading a specific Python dependency, but was otherwise
painless.</p>
<p>I have a working package locally &#8211; how do I share it? This seemed like the perfect
opportunity to learn what the release process looks like for Python. I have a
handful of packages on CRAN and one on Bioconductor, and the process there is
far from frictionless, with the side-effect that you can place some trust
in the interoperability of packages and in at least minimal (automated) code checking. While
Python is more ‘wild west’ in terms of what can be uploaded, it’s really nice to see
that they do have an <a href="https://test.pypi.org/" rel="nofollow" target="_blank">entirely separate test server</a>
where you can upload your package and see how it looks. I’m reminded of the quote</p>
<blockquote>
<p>Everybody has a testing environment. Some people are lucky enough to have a totally separate environment to run production in.</p>
</blockquote>
<p>Given that it’s not currently possible to run 100% of the CRAN checks locally
(and even some that you <em>can</em> run give a different result from what you’d see on their systems)
this does make me a little jealous. I wonder whether the decrease in load from
rejecting failing submissions would offset supporting a test submission server.</p>
<p>All went well pushing to the test server (via an authentication key) and I managed
to build up the courage to push to the production instance…
<a href="https://pypi.org/project/rfuns/" rel="nofollow" target="_blank">it’s live!</a></p>
<div class="float">
<img src="https://i2.wp.com/jcarroll.com.au/2026/05/22/functions-over-idioms-rfuns/images/rfuns_logo_small.png?w=300&#038;ssl=1" alt="rfuns logo - R functions in Python… are fun" data-recalc-dims="1" />
<div class="figcaption">rfuns logo &#8211; R functions in Python… are fun</div>
</div>
<p>and the <a href="https://jonocarroll.github.io/rfuns/" rel="nofollow" target="_blank">documentation site</a> isn’t too bad,
either (in my opinion).</p>
<p>This means that you can now run</p>
<pre>uv add rfuns</pre>
<p>(or the equivalent in whatever virtual environment management configuration you’re
using, e.g. <code>pip install rfuns</code>) and start using some R functions directly in
Python!</p>
<p>Depending on how you like to manage your imports, you can import everything</p>
<pre>from rfuns import *

which([False, False, True, False, True])
## [2, 4]</pre>
<p>or, if you prefer to namespace</p>
<pre>import rfuns as r

r.which([False, False, True, False, True])
## [2, 4]</pre>
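<p>For comparison, here is a minimal sketch of the kind of inline idiom that a helper like <code>which()</code> replaces. The function below is an illustrative stand-in written for this example, and is not necessarily how rfuns implements it.</p>

```python
def which(x):
    """Return the 0-based positions of the truthy elements of x."""
    return [i for i, value in enumerate(x) if value]


flags = [False, False, True, False, True]
print(which(flags))  # [2, 4]
```

<p>The list comprehension is short, but it has to be recalled and retyped each time, whereas the named function reads the same way it does in R (apart from the 0-based positions).</p>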
<p>The list of functions currently imported, grouped into sections, is:</p>
<div id="strings" class="section level3">
<h3>Strings</h3>
<ul>
<li><code>nchar(x)</code></li>
<li><code>nzchar(x)</code></li>
<li><code>paste(*args, sep=&quot; &quot;, collapse=None)</code></li>
<li><code>paste0(*args, collapse=None)</code></li>
<li><code>grepl(pattern, x, ignore_case=False, fixed=False)</code></li>
<li><code>grep(pattern, x, ignore_case=False, fixed=False, value=False, invert=False)</code></li>
<li><code>gsub(pattern, replacement, x, ignore_case=False, fixed=False)</code></li>
<li><code>sub(pattern, replacement, x, ignore_case=False, fixed=False)</code></li>
<li><code>trimws(x, which=&quot;both&quot;, whitespace=r&quot;[ \t\r\n]&quot;)</code></li>
<li><code>toupper(x)</code></li>
<li><code>tolower(x)</code></li>
<li><code>startsWith(x, prefix)</code></li>
<li><code>endsWith(x, suffix)</code></li>
<li><code>strsplit(x, split, fixed=False)</code></li>
<li><code>substr(x, start, stop)</code></li>
<li><code>chartr(old, new, x)</code></li>
<li><code>formatC(x, digits=6, format=&quot;g&quot;, width=None)</code></li>
</ul>
</div>
<div id="vectors" class="section level3">
<h3>Vectors</h3>
<ul>
<li><code>which(x)</code></li>
<li><code>which_min(x)</code></li>
<li><code>which_max(x)</code></li>
<li><code>diff(x, lag=1)</code></li>
<li><code>cumsum(x)</code></li>
<li><code>cumprod(x)</code></li>
<li><code>cummax(x)</code></li>
<li><code>cummin(x)</code></li>
<li><code>rev(x)</code></li>
<li><code>duplicated(x)</code></li>
<li><code>setdiff(x, y)</code></li>
<li><code>intersect(x, y)</code></li>
<li><code>union(x, y)</code></li>
<li><code>unique(x)</code></li>
<li><code>seq_along(x)</code></li>
<li><code>seq_len(n)</code></li>
<li><code>seq(from_=0, to=None, by=None, length_out=None)</code> (<code>from</code> is a reserved keyword)</li>
<li><code>sign(x)</code></li>
<li><code>r_range(x)</code> (renamed to not conflict with <code>range()</code>)</li>
</ul>
</div>
<div id="math" class="section level3">
<h3>Math</h3>
<ul>
<li><code>sign(x)</code></li>
<li><code>trunc(x)</code></li>
<li><code>ceiling(x)</code></li>
<li><code>floor(x)</code></li>
<li><code>sqrt(x)</code></li>
<li><code>log(x, base=None)</code></li>
<li><code>log2(x)</code></li>
<li><code>log10(x)</code></li>
<li><code>exp(x)</code></li>
<li><code>abs(x)</code></li>
<li><code>var(x, na_rm=False)</code></li>
<li><code>sd(x, na_rm=False)</code></li>
<li><code>mean(x, na_rm=False)</code></li>
<li><code>median(x, na_rm=False)</code></li>
<li><code>quantile(x, probs=None, na_rm=False)</code></li>
<li><code>scale(x, center=True, scale_=True)</code></li>
<li><code>round(x, digits=0)</code></li>
</ul>
</div>
<div id="files" class="section level3">
<h3>Files</h3>
<ul>
<li><code>list_files(path=&quot;.&quot;, pattern=None, all_files=False, full_names=False, recursive=False, ignore_case=False, include_dirs=False, no_dot=False)</code></li>
<li><code>file_exists(path)</code></li>
<li><code>dir_exists(path)</code></li>
<li><code>basename(path)</code></li>
<li><code>dirname(path)</code></li>
<li><code>file_path(*args)</code></li>
</ul>
</div>
<div id="table" class="section level3">
<h3>Table</h3>
<ul>
<li><code>table(x)</code></li>
<li><code>prop_table(x)</code></li>
<li><code>margin_table(x)</code></li>
</ul>
</div>
<div id="functional" class="section level3">
<h3>Functional</h3>
<ul>
<li><code>lapply(x, func)</code></li>
<li><code>sapply(x, func)</code></li>
<li><code>vapply(x, func, expected_type)</code></li>
<li><code>tapply(x, index, func)</code></li>
<li><code>rapply(x, func)</code></li>
<li><code>Filter(func, x)</code></li>
<li><code>Map(func, *args)</code></li>
<li><code>Reduce(func, x, init=None, accumulate=False)</code></li>
</ul>
</div>
<div id="inspect" class="section level3">
<h3>Inspect</h3>
<ul>
<li><code>head(x, n=6)</code></li>
<li><code>tail(x, n=6)</code></li>
<li><code>length(x)</code></li>
<li><code>nrow(x)</code></li>
<li><code>ncol(x)</code></li>
<li><code>dim(x)</code></li>
<li><code>summary(x)</code></li>
<li><code>rstr(x)</code> (renamed to not conflict with <code>str()</code>)</li>
</ul>
</div>
<div id="utils" class="section level3">
<h3>Utils</h3>
<ul>
<li><code>vec(x)</code></li>
</ul>
<p>Some of these are vectorised</p>
<pre>nchar([&quot;these&quot;, &quot;all&quot;, &quot;have&quot;, &quot;different&quot;, &quot;lengths&quot;])
## [5, 3, 4, 9, 7]
grepl(&quot;ar&quot;, [&quot;frog&quot;, &quot;carpet&quot;, &quot;basket&quot;, &quot;dart&quot;])
## [False, True, False, True]
sqrt([36, 81, 9])
## [6.0, 9.0, 3.0]</pre>
<p>while others (approximately, up to 0-indexing) preserve the R behaviour, such as
how <code>seq()</code> works</p>
<pre>seq(5)
## [0, 1, 2, 3, 4]
seq(from_=0, to=10, by=2)
## [0, 2, 4, 6, 8, 10]</pre>
<p>(note that <code>from</code> is a keyword in Python, so the argument here is now <code>from_</code>)
and set operations</p>
<pre>setdiff([5, 2, 4, 1], [2, 1])
## [5, 4]</pre>
<p>whereas this does not preserve order</p>
<pre>set([5, 2, 4, 1]) - set([2, 1])
## {4, 5}</pre>
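<p>If the package isn’t available, one way to get an order-preserving set difference in plain Python is sketched below. This is an illustrative idiom rather than a copy of what rfuns does internally, and it assumes the elements are hashable.</p>

```python
def setdiff(x, y):
    """Elements of x that are not in y, de-duplicated, in order of first appearance."""
    exclude = set(y)  # set membership tests are fast
    # dict.fromkeys keeps insertion order and drops repeated values
    return [value for value in dict.fromkeys(x) if value not in exclude]


print(setdiff([5, 2, 4, 1], [2, 1]))  # [5, 4]
```

<p>Dropping repeats mirrors R’s <code>setdiff()</code>, which also returns unique values.</p>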
<p>Doing <em>all</em> of this myself would have taken quite some time, so I’m grateful to
be able to direct an agent towards accomplishing some of the tedious parts of this
project. I still drove the decision making and made sure to verify outputs, so I
don’t consider this a ‘vibe-coded’ project.</p>
<p>I’m not recommending you use this in production at all &#8211; I’ve taken whatever
idiom I could find (or generate) for the internals of all of these, and haven’t
paid any attention to their performance. The goal was to make it easier for me
to work interactively in a REPL when I’m reaching for particular functions. That
being said, I’ll gladly do my best to understand the Pythonic versions so that
I can better appreciate native Python and use the idioms when
my helper package isn’t available (or is unsuitable). I’d say it’s fair to argue
that R users using Python <em>should</em> learn how to do things in a Pythonic way, but
I also just want to get some small things done occasionally, so I’m happy this
now exists.</p>
<p>If you’re working with non-R colleagues then introducing these abstractions —
while they may make your life simpler in the moment — will probably result in
confusion as you’re hiding away the implementation and giving it a name they
won’t recognise. That’s precisely what functions are for (with helpful names),
of course, but unless this package becomes popular, I’ll bet that the inline
idioms are more welcomed in a codebase.</p>
<p>I’d love to hear what people think about this, although I’m entirely fine with
being the sole user of it. Should I just force my muscle-memory to take on the
Python idioms? Am I going to be punished for <a href="https://ghostbusters.fandom.com/wiki/Cross_the_Streams" rel="nofollow" target="_blank">‘crossing the
streams’</a> of two
incompatible languages? Would this be helpful to you? Are there other
considerations I’ve missed? As always, I can be found on
<a href="https://fosstodon.org/@jonocarroll" rel="nofollow" target="_blank">Mastodon</a> and in the comment section below.</p>
<p>Shoutouts to Elio Campitelli and Michael Sumner for feedback on a draft of this
post.</p>
<br />
<details>
<summary>
<tt>devtools::session_info()</tt>
</summary>
<pre>## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.5.3 (2026-03-11)
##  os       macOS Tahoe 26.3.1
##  system   aarch64, darwin20
##  ui       X11
##  language (EN)
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       Australia/Adelaide
##  date     2026-05-22
##  pandoc   3.6.3 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
##  quarto   1.7.31 @ /usr/local/bin/quarto
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package     * version date (UTC) lib source
##  blogdown      1.23    2026-01-18 [1] CRAN (R 4.5.2)
##  bookdown      0.46    2025-12-05 [1] CRAN (R 4.5.2)
##  bslib         0.10.0  2026-01-26 [1] CRAN (R 4.5.2)
##  cachem        1.1.0   2024-05-16 [1] CRAN (R 4.5.0)
##  cli           3.6.5   2025-04-23 [1] CRAN (R 4.5.0)
##  devtools      2.4.6   2025-10-03 [1] CRAN (R 4.5.0)
##  digest        0.6.39  2025-11-19 [1] CRAN (R 4.5.2)
##  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.5.0)
##  evaluate      1.0.5   2025-08-27 [1] CRAN (R 4.5.0)
##  fastmap       1.2.0   2024-05-15 [1] CRAN (R 4.5.0)
##  fs            1.6.7   2026-03-06 [1] CRAN (R 4.5.2)
##  glue          1.8.1   2026-04-17 [1] CRAN (R 4.5.2)
##  htmltools     0.5.9   2025-12-04 [1] CRAN (R 4.5.2)
##  jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.5.0)
##  jsonlite      2.0.0   2025-03-27 [1] CRAN (R 4.5.0)
##  knitr         1.51    2025-12-20 [1] CRAN (R 4.5.2)
##  lattice       0.22-9  2026-02-09 [1] CRAN (R 4.5.3)
##  lifecycle     1.0.5   2026-01-08 [1] CRAN (R 4.5.2)
##  magrittr      2.0.4   2025-09-12 [1] CRAN (R 4.5.0)
##  Matrix        1.7-4   2025-08-28 [1] CRAN (R 4.5.3)
##  memoise       2.0.1   2021-11-26 [1] CRAN (R 4.5.0)
##  otel          0.2.0   2025-08-29 [1] CRAN (R 4.5.0)
##  pkgbuild      1.4.8   2025-05-26 [1] CRAN (R 4.5.0)
##  pkgload       1.5.0   2026-02-03 [1] CRAN (R 4.5.2)
##  png           0.1-9   2026-03-15 [1] CRAN (R 4.5.2)
##  purrr         1.2.2   2026-04-10 [1] CRAN (R 4.5.2)
##  R6            2.6.1   2025-02-15 [1] CRAN (R 4.5.0)
##  Rcpp          1.1.1   2026-01-10 [1] CRAN (R 4.5.2)
##  remotes       2.5.0   2024-03-17 [1] CRAN (R 4.5.0)
##  reticulate    1.45.0  2026-02-13 [1] CRAN (R 4.5.2)
##  rlang         1.1.7   2026-01-09 [1] CRAN (R 4.5.2)
##  rmarkdown     2.30    2025-09-28 [1] CRAN (R 4.5.0)
##  rstudioapi    0.18.0  2026-01-16 [1] CRAN (R 4.5.2)
##  sass          0.4.10  2025-04-11 [1] CRAN (R 4.5.0)
##  sessioninfo   1.2.3   2025-02-05 [1] CRAN (R 4.5.0)
##  usethis       3.2.1   2025-09-06 [1] CRAN (R 4.5.0)
##  vctrs         0.7.1   2026-01-23 [1] CRAN (R 4.5.2)
##  withr         3.0.2   2024-10-28 [1] CRAN (R 4.5.0)
##  xfun          0.56    2026-01-18 [1] CRAN (R 4.5.2)
##  yaml          2.3.12  2025-12-10 [1] CRAN (R 4.5.2)
## 
##  [1] /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library
## 
## ─ Python configuration ───────────────────────────────────────────────────────
##  python:         /Users/jono/.cache/uv/archive-v0/Q1veGTfRq3GBaNYBXjagV/bin/python
##  libpython:      /Users/jono/.local/share/uv/python/cpython-3.12.12-macos-aarch64-none/lib/libpython3.12.dylib
##  pythonhome:     /Users/jono/.cache/uv/archive-v0/Q1veGTfRq3GBaNYBXjagV:/Users/jono/.cache/uv/archive-v0/Q1veGTfRq3GBaNYBXjagV
##  virtualenv:     /Users/jono/.cache/uv/archive-v0/Q1veGTfRq3GBaNYBXjagV/bin/activate_this.py
##  version:        3.12.12 (main, Oct 28 2025, 11:52:25) [Clang 20.1.4 ]
##  numpy:          /Users/jono/.cache/uv/archive-v0/Q1veGTfRq3GBaNYBXjagV/lib/python3.12/site-packages/numpy
##  numpy_version:  2.4.6
##  
##  NOTE: Python version was forced by VIRTUAL_ENV
## 
## ──────────────────────────────────────────────────────────────────────────────</pre>
</details>
<p><br /></p>
</div>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://jcarroll.com.au/2026/05/22/functions-over-idioms-rfuns/"> rstats on Irregularly Scheduled Programming</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/functions-over-idioms-writing-r-in-python-with-rfuns/">Functions over Idioms – Writing R in Python with rfuns</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">401387</post-id>	</item>
		<item>
		<title>Zero Sum Problems</title>
		<link>https://www.r-bloggers.com/2026/05/zero-sum-problems/</link>
		
		<dc:creator><![CDATA[R on kieranhealy.org]]></dc:creator>
		<pubDate>Thu, 21 May 2026 19:50:40 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://kieranhealy.org/blog/archives/2026/05/21/zero-sum-problems/</guid>

					<description><![CDATA[<p>Over at Daring Fireball, John Gruber makes a passing observation about the Apple Sports app:</p>
<p>    I’ve got some gripes about certain specific aspects of Apple Sports. Like, where does one even start to explain how much is wrong with their zero-sum v...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/zero-sum-problems/">Zero Sum Problems</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://kieranhealy.org/blog/archives/2026/05/21/zero-sum-problems/"> R on kieranhealy.org</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report an issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
</div>
<p>Over at <a href="https://daringfireball.net/" rel="nofollow" target="_blank">Daring Fireball</a>, John Gruber makes a passing observation about the Apple Sports app:</p>



<blockquote>
    <p>I’ve got some gripes about certain specific aspects of Apple Sports. Like, where does one even <em>start</em> to explain how much is wrong with <a href="https://daringfireball.net/misc/2026/05/apple-sports-team-stats-wtf.png" rel="nofollow" target="_blank">their zero-sum visualization of team stats</a>? Has anyone ever even seen a presentation like that before? <a href="https://kieranhealy.org/" rel="nofollow" target="_blank">Anyone</a>?</p>

</blockquote>

<p>That “Anyone” link lands over here. Hi everyone! The team stats image <em>is</em> quite confusing. It’s a summary of a game between the San Antonio Spurs and the Oklahoma City Thunder. I don’t know much about basketball, but I do know a bit about data visualization and in a pleasing coincidence my former student <a href="https://www.linkedin.com/in/joshua-fink" rel="nofollow" target="_blank">Josh Fink</a> is the A-VP of Basketball Data Science for the Spurs. Here is the image that John objected to:</p>
<figure><a href="https://i0.wp.com/kieranhealy.org/blog/archives/2026/05/21/zero-sum-problems/apple-sports-team-stats-wtf.png?ssl=1" rel="nofollow" target="_blank">
    <img src="https://i0.wp.com/kieranhealy.org/blog/archives/2026/05/21/zero-sum-problems/apple-sports-team-stats-wtf.png?w=578&#038;ssl=1"
         alt="Confusing Apple Sports team stats visualization." data-recalc-dims="1"/></a><figcaption>
            <p>I had to look at it for a while as well.</p>
        </figcaption>
</figure>
<p>I just finished driving a very long way up the side of the country, so I’m kind of tired. But even allowing for that, boy, this way of representing things really is quite confusing. Not being an Apple Sports user I had to look at it for a bit to understand what was happening. But, now that it has given me a headache, I can kind of see why whoever designed this ended up in the undoubtedly bad place they did.</p>
<p>Before I get to why I have some sympathy for the designer, <em>why</em> did I find this representation of these numbers so disorienting? It’s not just because I’ve been driving for nine hours. John is right to call the picture a “Zero Sum” representation. The design <em>strongly</em> suggests to the viewer that, within each row, we’re looking at each team’s share of a total. Each pair of black and blue lines seems to be vying for control of its whole row, with the longer line being the “winner” in each case.</p>
<p>This sort of representation would make perfect sense for a measure that really
<em>was</em> zero sum. Take an example from a properly good sport, like rugby. There,
like in basketball, to a first approximation a team either has the ball or it
doesn’t.<sup id="fnref:1"><a href="https://kieranhealy.org/blog/archives/2026/05/21/zero-sum-problems/#fn:1" class="footnote-ref" role="doc-noteref" rel="nofollow" target="_blank">1</a></sup> But there’s no shot clock in rugby, and possession routinely gets
turned over without the game stopping. So, knowing that Team A had 65%
possession is not only informative, it also immediately entails that Team B had
35%. You could show that with a representation like one of the rows above.</p>
<p>Literally none of the measures in the basketball data above are zero-sum in this way. Both teams could shoot 100% from the free throw line, or zero percent. But because the first three measures shown are percentages, they reinforce the zero-sum impression given by the lines. It certainly did that in my case. But then, starting with Assists, the remaining rows are just absolute numbers. When I started looking at the absolute numbers, I got confused a second time by the length of the lines. “Oh so it’s not a share, it’s the value” I thought, but no, the line lengths do correspond to each team’s relative share within its row. But they’re not really <em>shares</em>; they’re just <em>magnitudes</em>. But they have to be shown in a fixed space and we want to make them relatively comparable somehow so … Argh.</p>
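<p>To make the share-versus-magnitude distinction concrete, here is a small sketch of what scaling two numbers into a fixed-width row does. It assumes the bars are drawn in proportion to each value divided by the row total, which is my reading of the picture rather than anything Apple has documented, and the numbers are hypothetical rather than taken from the game.</p>

```python
def bar_shares(team_a, team_b):
    """Split a fixed-width row in proportion to two raw values."""
    total = team_a + team_b
    if total == 0:
        return 0.5, 0.5  # nothing to split: draw both bars at equal length
    return team_a / total, team_b / total


# Hypothetical values for illustration only.
print(bar_shares(65, 35))  # a genuinely zero-sum measure, such as possession
print(bar_shares(90, 80))  # two independent percentages, such as free throws
```

<p>In the first call the bar lengths simply are the measure. In the second, 90% and 80% are redrawn as roughly 53% and 47% of the row, numbers that describe nothing that happened in the game.</p>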
<p>It would be nice if there were One Weird Trick to fully fix this figure. But I’m not sure that there is. For example, at a minimum we could redraw these numbers to reflect the fact that they’re not zero-sum. Keep each measure as a row (i.e. on the y-axis) but have the lines, or columns, be side by side within each category instead of facing off. Like this:</p>
<figure><a href="https://i1.wp.com/kieranhealy.org/blog/archives/2026/05/21/zero-sum-problems/gruber-stats1.png?ssl=1" rel="nofollow" target="_blank">
    <img src="https://i1.wp.com/kieranhealy.org/blog/archives/2026/05/21/zero-sum-problems/gruber-stats1.png?w=578&#038;ssl=1"
         alt="Team Stats side by side for each measure." data-recalc-dims="1"/></a><figcaption>
            <p>Team Stats side by side for each measure.</p>
        </figcaption>
</figure>
<p>This view at least lets you immediately see who “won” each measure. The viewer
can just directly compare the length of the bars in each category. <a href="https://socviz.co/01-look-at-data.html#visual-tasks-and-decoding-graphs" rel="nofollow" target="_blank">People are
really good at doing that
accurately.</a>
In that sense it’s much less confusing than the original. But there’s still a
lot wrong with it. The core problem is that when we draw a graph like this,
we’re usually putting <em>the same kind of thing</em> (e.g. countries, or religious
groups, or sports teams) on the y-axis, and then seeing how different their
scores are on some single measure (e.g. GDP, or number of adherents, or average
points scored per game), which we put on the x-axis. Maybe we use color to break
things out by some third measure as well.<sup id="fnref:2"><a href="https://kieranhealy.org/blog/archives/2026/05/21/zero-sum-problems/#fn:2" class="footnote-ref" role="doc-noteref" rel="nofollow" target="_blank">2</a></sup> In
this case, I’ve just labeled the x-axis as generically as possible. “Value”
covers the range of all the measures. The lowest value is 5, in Largest Lead.
The highest is 88, in Free Throw %. But these numbers are not meaningfully
comparable. The graph encourages us to compare across as well as within
categories. But while within-category comparisons are meaningful, the
between-category ones are not. There were way more Bench Points than Blocks in
the game. But that is not a useful thing to know.</p>
<p>Knowing who won each measure isn’t nothing. It can be informative about how the game went, maybe especially when a team won the game but “lost” on a number of the measures. If you really wanted to lean in to that aspect, you could sort of justify the zero-sum view, and maybe look for a way to sort and order by “how much” a team “won” each category. But again, what’s the right denominator for those measures? For instance, do we care about a team’s share of all Defensive Rebounds in the game? Or do we care about the share of Defensive Rebounds a team won relative to every opportunity it had to make a Defensive Rebound? How meaningful is ordering our rows by those kinds of shares? Even worse, some measures (notably Fouls) are <em>bad</em> to “win”, so we’d have to do something about those.</p>
<figure><a href="https://i2.wp.com/kieranhealy.org/blog/archives/2026/05/21/zero-sum-problems/gruber-stats2.png?ssl=1" rel="nofollow" target="_blank">
    <img src="https://i2.wp.com/kieranhealy.org/blog/archives/2026/05/21/zero-sum-problems/gruber-stats2.png?w=578&#038;ssl=1"
         alt="Team Stats side by side and ordered from absolute highest to lowest, whatever that means." data-recalc-dims="1"/></a><figcaption>
            <p>Team Stats side by side and ordered from absolute highest to lowest, whatever that means.</p>
        </figcaption>
</figure>
<p>Our fundamental problem is that we just have two cases (the teams) and fifteen
different measures, or variables. Each variable, except for the three
percentages, is in effect on its own scale. There’s no direct way to make
comparisons across them. Sure, some of these measures are probably going to be
associated with one another—e.g. Turnovers and Points Off Turnovers—but the
numeric values aren’t directly comparable in general. If you know a lot about
basketball you might have some informative rules of thumb about each one of
these measures, or some of them in combination. But at that point the lines in
this particular graph are not going to be doing any work for you; you’ll just
end up looking directly at the numbers. If we had data on all these measures for
every NBA game for a whole season then we could of course do much more with
them, because then each measure would have a distribution across all games and
across all teams.</p>
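<p>As a rough illustration of what a season of data would buy us, the sketch below standardizes a single game’s value against that measure’s distribution across games. The per-game numbers and the function name are hypothetical, made up for this example, and a z-score is only one of several ways to put different measures on a common footing.</p>

```python
from statistics import mean, stdev


def z_score(value, season_values):
    """How many standard deviations value sits from the season mean."""
    return (value - mean(season_values)) / stdev(season_values)


# Hypothetical per-game totals for one team; not real NBA data.
season_assists = [22, 25, 27, 24, 30, 26, 21]
season_blocks = [4, 6, 5, 3, 7, 5, 5]

# Once standardized, 30 assists and 7 blocks can be compared on one scale.
print(round(z_score(30, season_assists), 2))
print(round(z_score(7, season_blocks), 2))
```

<p>With only one game there is no distribution to standardize against, which is why the raw numbers in the Stats screen resist this kind of comparison.</p>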
<p>As it is, the purpose of the “Stats” screen in Apple Sports is just to summarize
information from a single game. The other thing I could think of to do with the
numbers as a kind of graph is something like this:</p>
<figure><a href="https://i1.wp.com/kieranhealy.org/blog/archives/2026/05/21/zero-sum-problems/gruber-stats3.png?ssl=1" rel="nofollow" target="_blank">
    <img src="https://i1.wp.com/kieranhealy.org/blog/archives/2026/05/21/zero-sum-problems/gruber-stats3.png?w=578&#038;ssl=1"
         alt="A back-to-back column chart." data-recalc-dims="1"/></a><figcaption>
            <p>A back-to-back column chart.</p>
        </figcaption>
</figure>
<p>This is <em>marginally</em> more helpful than the one before just because, again, it
gets rid of the unhelpful zero-sum look of the original. As I hope you can
immediately see, it creates many other difficulties. It also doesn’t do away
with the core problem. That problem is principally one of information design
rather than data visualization. What I mean is that what we’re trying to
organize is, in effect, fifteen pairs of related but fundamentally distinct
numbers. If we had fifteen <em>cases</em> and two <em>variables</em> things would be simple. But
with fifteen variables and two cases … well, this is not the kind of thing you
can make a single effective and non-confusing graph out of. That’s why I kind of
sympathize with the designer. In a constrained space they have to show thirty
numbers (thirty-two, including the score). Lots of information. A straight table
seems like it would be boring. Surely there’s some way to thematically integrate
the numbers in a visually appealing manner that brings out some of the
relationships across the rows. That’s what graphs do; it seems like the right
thing to reach for. But at its heart this information is not a graph. It just
sort of looks like one, and that ends up confusing people.</p>
<div class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1">
<p>Modulo some measurement decisions about how to determine when possession is turned over while the ball is in play. <a href="https://kieranhealy.org/blog/archives/2026/05/21/zero-sum-problems/#fnref:1" class="footnote-backref" role="doc-backlink" rel="nofollow" target="_blank"><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/21a9.png" alt="↩" class="wp-smiley" style="height: 1em; max-height: 1em;" />︎</a></p>
</li>
<li id="fn:2">
<p><a href="https://socviz.co/05-more-on-geoms.html#fig-ch-05-organdata-06" rel="nofollow" target="_blank">Here’s an
example</a> of a graph with a categorical measure on the y-axis, a continuous measure on the x-axis, and an additional categorical feature shown with color. <a href="https://kieranhealy.org/blog/archives/2026/05/21/zero-sum-problems/#fnref:2" class="footnote-backref" role="doc-backlink" rel="nofollow" target="_blank"><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/21a9.png" alt="↩" class="wp-smiley" style="height: 1em; max-height: 1em;" />︎</a></p>
</li>
</ol>
</div>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://kieranhealy.org/blog/archives/2026/05/21/zero-sum-problems/"> R on kieranhealy.org</a></strong>.</div>
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/zero-sum-problems/">Zero Sum Problems</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">401384</post-id>	</item>
		<item>
		<title>Conformalized TabICL: Prediction Intervals for a State-Of-The-Art Tabular Foundation Model in Python and R</title>
		<link>https://www.r-bloggers.com/2026/05/conformalized-tabicl-prediction-intervals-for-a-state-of-the-art-tabular-foundation-model-in-python-and-r/</link>
		
		<dc:creator><![CDATA[T. Moudiki]]></dc:creator>
		<pubDate>Thu, 21 May 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://thierrymoudiki.github.io//blog/2026/05/21/r/python/Conformalized-TabICL-nnetsauce</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> Prediction Intervals for Tabular Regression in Python and R via Conformalized TabICL; comparison with RidgeCV</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/conformalized-tabicl-prediction-intervals-for-a-state-of-the-art-tabular-foundation-model-in-python-and-r/">Conformalized TabICL: Prediction Intervals for a State-Of-The-Art Tabular Foundation Model in Python and R</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://thierrymoudiki.github.io//blog/2026/05/21/r/python/Conformalized-TabICL-nnetsauce"> T. Moudiki's Webpage - R</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report an issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
</div>
<p>A few days ago, I presented <a href="https://thierrymoudiki.github.io/blog/2026/05/17/r/python/conformalized-tabpfn" rel="nofollow" target="_blank">Conformalized TabPFN: Prediction Intervals for a Pretrained Transformer for Tabular Data in Python and R</a>. Today, it’s about <a href="https://github.com/soda-inria/tabicl" rel="nofollow" target="_blank">TabICL</a>, another state-of-the-art tabular foundation model. <code>TabICL</code> requires no token, as you’ll notice in the following Python and R code.</p>

<h1 id="1---python-version">1 &#8211; Python version</h1>

<pre>!pip install tabicl nnetsauce # scikit-learn matplotlib numpy

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import RidgeCV
from sklearn.metrics import mean_squared_error
from tabicl import TabICLRegressor
import nnetsauce as ns
import numpy as np
import matplotlib.pyplot as plt
from time import time

# ── data ───────────────────────────────────────────────────
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# ── base models ────────────────────────────────────────────
models = {
    &quot;TabICL&quot;: TabICLRegressor(),
    &quot;RidgeCV&quot;: RidgeCV(),
}

results = {}
for name, reg in models.items():
    start = time()
    conf = ns.PredictionInterval(reg, level=95)
    conf.fit(X_train, y_train)
    pi = conf.predict(X_test, return_pi=True)
    print(f&quot;{name:10s}  time={time() - start:.1f}s&quot;)

    coverage = np.mean((pi.lower &lt;= y_test) &#038; (pi.upper &gt;= y_test))
    width    = np.mean(pi.upper - pi.lower)
    rmse     = np.sqrt(mean_squared_error(y_test, pi.mean))

    results[name] = {&quot;pi&quot;: pi, &quot;coverage&quot;: coverage,
                     &quot;width&quot;: width, &quot;rmse&quot;: rmse}
    print(f&quot;{name:10s}  RMSE={rmse:.1f}  &quot;
          f&quot;coverage={coverage:.3f}  avg_width={width:.1f}&quot;)

# ── plot side-by-side ──────────────────────────────────────
fig, axes = plt.subplots(1, 2, figsize=(12, 4), sharey=True)
colors = {&quot;TabICL&quot;: &quot;orange&quot;, &quot;RidgeCV&quot;: &quot;steelblue&quot;}
max_idx = 50

for ax, (name, res) in zip(axes, results.items()):
    pi = res[&quot;pi&quot;]
    x  = range(max_idx)
    ax.fill_between(x, pi.lower[:max_idx], pi.upper[:max_idx],
                     alpha=0.35, color=colors[name], label=&quot;95% PI&quot;)
    ax.plot(x, pi.mean[:max_idx], &quot;k--&quot;, lw=1.5, label=&quot;predicted&quot;)
    ax.plot(x, y_test[:max_idx], &quot;k.&quot;, ms=6, alpha=0.4, label=&quot;observed&quot;)
    ax.set_title(
        f&quot;{name}  |  cov={res['coverage']:.3f}  width={res['width']:.1f}&quot;
    )
    ax.legend(fontsize=8)

plt.suptitle(&quot;Conformalized TabICL vs RidgeCV — diabetes dataset&quot;)
plt.tight_layout()
plt.show()

Checkpoint 'tabicl-regressor-v2-20260212.ckpt' not cached.
 Downloading from Hugging Face Hub (jingang/TabICL).

TabICL      time=21.8s
TabICL      RMSE=54.4  coverage=0.955  avg_width=226.1
RidgeCV     time=0.0s
RidgeCV     RMSE=53.9  coverage=0.955  avg_width=211.5
</pre>

<p><img src="https://i2.wp.com/thierrymoudiki.github.io/images/2026-05-21/2026-05-21-Conformalized-TabICL-nnetsauce_3_3.png?w=578&#038;ssl=1" alt="image-title-here" class="img-responsive" data-recalc-dims="1" /></p>

<h1 id="2---r-version">2 &#8211; R version</h1>

<pre> %load_ext rpy2.ipython # in a Colab notebook, use this

%R install.packages(&quot;reticulate&quot;)

%%R  # in Colab/Jupyter with rpy2; remove this line for pure R

library(reticulate)

# pip install tabicl nnetsauce scikit-learn matplotlib numpy

sklearn_ds  &lt;- import(&quot;sklearn.datasets&quot;)
sklearn_ms  &lt;- import(&quot;sklearn.model_selection&quot;)
sklearn_m   &lt;- import(&quot;sklearn.metrics&quot;)
sklearn_lm  &lt;- import(&quot;sklearn.linear_model&quot;)
tabicl      &lt;- import(&quot;tabicl&quot;)
ns          &lt;- import(&quot;nnetsauce&quot;)
np          &lt;- import(&quot;numpy&quot;)
plt         &lt;- import(&quot;matplotlib.pyplot&quot;)

# ── data ───────────────────────────────────────────────────
d       &lt;- sklearn_ds$load_diabetes(return_X_y = TRUE)
X &lt;- d[[1]]; y &lt;- d[[2]]
sp      &lt;- sklearn_ms$train_test_split(X, y,
             test_size = 0.2, random_state = 42L)
X_train &lt;- sp[[1]]; X_test &lt;- sp[[2]]
y_train &lt;- sp[[3]]; y_test &lt;- sp[[4]]

# ── helper: fit + evaluate ─────────────────────────────────
eval_model &lt;- function(reg, name) {
  conf &lt;- ns$PredictionInterval(reg, level = 95L)
  conf$fit(X_train, y_train)
  pi   &lt;- conf$predict(X_test, return_pi = TRUE)

  cov  &lt;- np$mean((pi$lower &lt;= y_test) * (pi$upper &gt;= y_test))
  wid  &lt;- np$mean(pi$upper - pi$lower)
  rmse &lt;- sqrt(sklearn_m$mean_squared_error(y_test, pi$mean))

  cat(sprintf(&quot;%-10s  RMSE=%.1f  coverage=%.3f  avg_width=%.1f\n&quot;,
              name, rmse, cov, wid))
  invisible(pi)
}

# ── run both models ────────────────────────────────────────
pi_tabicl  &lt;- eval_model(tabicl$TabICLRegressor(),  &quot;TabICL&quot;)
pi_ridge   &lt;- eval_model(sklearn_lm$RidgeCV(),       &quot;RidgeCV&quot;)

# ── plot ───────────────────────────────────────────────────
max_idx &lt;- 50L
x_range &lt;- np$array(0:(max_idx - 1))

plot_pi &lt;- function(pi, title, col) {
  x_fill &lt;- np$concatenate(list(x_range, x_range[max_idx:1]))
  y_fill &lt;- np$concatenate(list(
    pi$upper[1:max_idx], pi$lower[max_idx:1]))
  plt$fill(x_fill, y_fill, alpha=0.35, fc=col, ec=&quot;None&quot;, label=&quot;95% PI&quot;)
  plt$plot(x_range, pi$mean[1:max_idx], &quot;k--&quot;, lw=1.5, label=&quot;predicted&quot;)
  plt$plot(x_range, y_test[1:max_idx], &quot;k.&quot;, ms=6L, alpha=0.4, label=&quot;observed&quot;)
  plt$title(title); plt$legend(fontsize=8L)
}

fig &lt;- plt$figure(figsize=c(12, 4))
plt$subplot(1L, 2L, 1L); plot_pi(pi_tabicl, &quot;Conformalized TabICL&quot;, &quot;orange&quot;)
plt$subplot(1L, 2L, 2L); plot_pi(pi_ridge,  &quot;Conformalized RidgeCV&quot;, &quot;steelblue&quot;)
plt$suptitle(&quot;Conformalized TabICL vs RidgeCV — diabetes dataset&quot;)
plt$tight_layout()
plt$show()

    WARNING: The R package &quot;reticulate&quot; only fixed recently
    an issue that caused a segfault when used with rpy2:
    https://github.com/rstudio/reticulate/pull/1188
    Make sure that you use a version of that package that includes
    the fix.
TabICL      RMSE=54.4  coverage=0.955  avg_width=226.1
RidgeCV     RMSE=53.9  coverage=0.955  avg_width=211.5
</pre>

<p><img src="https://i2.wp.com/thierrymoudiki.github.io/images/2026-05-21/2026-05-21-Conformalized-TabICL-nnetsauce_7_1.png?w=578&#038;ssl=1" alt="image-title-here" class="img-responsive" data-recalc-dims="1" /></p>

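<p>This is probably a dataset that’s too <em>easy</em> for a Transformer: RidgeCV reaches a similar RMSE (53.9 vs. 54.4) and the same empirical coverage (0.955) with narrower intervals (211.5 vs. 226.1 on average), in a fraction of the time. Conformalizing simple models helps them, in general, to obtain coverage rates close to the nominal level, as we see for RidgeCV here.</p>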
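<p>To make the idea of conformalizing a simple model concrete, here is a minimal split-conformal sketch in plain Python. It is a generic illustration on synthetic data, not a description of what <code>ns.PredictionInterval</code> does internally; the helper names (<code>fit_line</code>, <code>split_conformal</code>, <code>sample</code>) and the data-generating process are assumptions made for this example.</p>

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def split_conformal(x_train, y_train, x_calib, y_calib, alpha=0.05):
    """Fit on the training half, then calibrate the half-width on held-out residuals."""
    a, b = fit_line(x_train, y_train)
    scores = sorted(abs(y - (a + b * x)) for x, y in zip(x_calib, y_calib))
    n = len(scores)
    # Conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest absolute residual
    rank = min(n, math.ceil((n + 1) * (1 - alpha)))
    q = scores[rank - 1]
    return lambda x: (a + b * x - q, a + b * x + q)

random.seed(42)

def sample(n):
    # Hypothetical data: a noisy straight line
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [2.0 + 0.5 * x + random.gauss(0, 1) for x in xs]
    return xs, ys

x_tr, y_tr = sample(200)
x_ca, y_ca = sample(200)
x_te, y_te = sample(1000)

interval = split_conformal(x_tr, y_tr, x_ca, y_ca, alpha=0.05)
hits = 0
for x, y in zip(x_te, y_te):
    lo, hi = interval(x)
    hits += lo <= y <= hi
coverage = hits / len(x_te)
print(f"empirical coverage: {coverage:.3f}")
```

<p>The base model only supplies point predictions; the interval width comes entirely from the held-out residuals, which is why even a plain linear model can land close to the nominal 95% level.</p>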

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://thierrymoudiki.github.io//blog/2026/05/21/r/python/Conformalized-TabICL-nnetsauce"> T. Moudiki's Webpage - R</a></strong>.</div>
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/conformalized-tabicl-prediction-intervals-for-a-state-of-the-art-tabular-foundation-model-in-python-and-r/">Conformalized TabICL: Prediction Intervals for a State-Of-The-Art Tabular Foundation Model in Python and R</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">401359</post-id>	</item>
		<item>
		<title>The Atlas-Learn Approach to the Manifold Hypothesis</title>
		<link>https://www.r-bloggers.com/2026/05/the-atlas-learn-approach-to-the-manifold-hypothesis/</link>
		
		<dc:creator><![CDATA[R Works]]></dc:creator>
		<pubDate>Wed, 20 May 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://rworks.dev/posts/atlas-learn-sphere/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>The manifold hypothesis, the idea that real-world high-dimensional data concentrates near a low-dimensional curved subspace, is foundational to modern machine learning. Many popular manifold learning methods such as UMAP, t-SNE, Isomap, and diff...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/the-atlas-learn-approach-to-the-manifold-hypothesis/">The Atlas-Learn Approach to the Manifold Hypothesis</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://rworks.dev/posts/atlas-learn-sphere/"> R Works</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
</div>
 





<p>The <strong>manifold hypothesis</strong>, the idea that real-world high-dimensional data concentrates near a low-dimensional curved subspace, is foundational to modern machine learning. Many popular manifold learning methods such as UMAP, t-SNE, Isomap, and diffusion maps do achieve dimensionality reduction by embedding data into a flat Euclidean space <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BR%7D%5ED"/>, but they do not attempt to directly learn the underlying manifold. In contrast, the 2025 paper by <a href="https://arxiv.org/pdf/2510.17772" rel="nofollow" target="_blank">Robinett et al.</a>, <em>Atlas-based manifold representations for interpretable Riemannian machine learning</em>, offers a proof of concept for directly tackling the manifold hypothesis based on fundamental ideas from differential geometry. It provides an algorithm for learning a low dimensional manifold from point data by constructing an atlas of charts. The paper is also notable for the design of an efficient data structure for working with the learned atlas and for the extensive supplementary materials that include a <a href="https://anonymous.4open.science/r/atlas_graph_learning-6DE0" rel="nofollow" target="_blank">GitHub Repository</a> containing several practical Python algorithms for doing calculations on manifolds, and an extraordinary amount of implementation detail.</p>
<p>Reading through Robinett et al., however, requires a fairly deep background in the theory of differential geometry. This post is an attempt to provide an on-ramp to Robinett et al. by discussing the relatively simple example of the two-dimensional sphere, <img src="https://latex.codecogs.com/png.latex?S%5E2"/>, embedded in <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BR%7D%5E3"/>. It implements the <strong>Atlas-Learn</strong> data structures and algorithms in <code>R</code>, uses them to learn <img src="https://latex.codecogs.com/png.latex?S%5E2"/>, and then goes on to validate the Atlas-Learn algorithm for the sphere via three independent methods: 1) use numerical integration along the manifold to trace a great circle on the sphere, 2) recover the radius of curvature of the sphere from the atlas, and 3) verify the Gauss-Bonnet Theorem for the sphere.</p>
<p>The <code>R</code> code was mostly worked out by Claude Sonnet 4.3 in the context of participating in the Posit beta test for its AI Assistant. I found the integration of the AI engine into the RStudio IDE an effective means of communicating with Claude and managing the project workflow.</p>
<section id="atlas-learn-theory-and-algorithm" class="level1">
<h1>Atlas-Learn: Theory and Algorithm</h1>
<p>This section provides some minimal theoretical background for understanding the Atlas-Learn algorithm. A <em>smooth manifold</em> <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BM%7D"/> of intrinsic dimension <img src="https://latex.codecogs.com/png.latex?d"/> embedded in <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BR%7D%5ED"/> can be described by an <strong>atlas</strong> — a finite collection of <em>charts</em> <img src="https://latex.codecogs.com/png.latex?%5C%7B(%5Cvarphi_j,%20%5COmega_j)%5C%7D_%7Bj=1%7D%5E%7Bk%7D"/> such that the open sets <img src="https://latex.codecogs.com/png.latex?U_j%20=%20%5Cvarphi_j%5E%7B-1%7D(%5COmega_j)"/> cover <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BM%7D"/> and each chart map <img src="https://latex.codecogs.com/png.latex?%5Cvarphi_j%20:%20U_j%20%5Cto%20%5Cmathbb%7BR%7D%5Ed"/> is a smooth bijection onto its image.</p>
<p>Normally, the definition of a smooth manifold also requires that any two charts be smoothly compatible, where two charts <img src="https://latex.codecogs.com/png.latex?(%5Cmathcal%7BU%7D_1,%20%5Cphi_1)"/> and <img src="https://latex.codecogs.com/png.latex?(%5Cmathcal%7BU%7D_2,%20%5Cphi_2)"/> are said to be smoothly compatible iff <img src="https://latex.codecogs.com/png.latex?%5Cphi_1(%5Cmathcal%7BU%7D_1%20%5Ccap%20%5Cmathcal%7BU%7D_2)"/> and <img src="https://latex.codecogs.com/png.latex?%5Cphi_2(%5Cmathcal%7BU%7D_1%20%5Ccap%20%5Cmathcal%7BU%7D_2)"/> are both open in <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BR%7D%5Ed"/> and the transition map <img src="https://latex.codecogs.com/png.latex?%5Cphi_%7B21%7D%20=%20%5Cphi_2%20%5Ccirc%20%5Cphi_1%5E%7B-1%7D%20:%20%5Cphi_1(%5Cmathcal%7BU%7D_1%20%5Ccap%20%5Cmathcal%7BU%7D_2)%20%5Cto%20%5Cphi_2(%5Cmathcal%7BU%7D_1%20%5Ccap%20%5Cmathcal%7BU%7D_2)"/> is a diffeomorphism (see, e.g., [2]). Robinett et al. relax the smooth-compatibility requirement and define transition maps <img src="https://latex.codecogs.com/png.latex?%5Cpsi_%7Bij%7D"/> separately from coordinate chart images. They then approximate a differentiable atlas by ensuring that the discrepancy between coordinate charts and transition maps, <img src="https://latex.codecogs.com/png.latex?%5C%7C%5Cphi(%5Cxi)%20-%20%5Cphi(%5Cpsi(%5Cxi))%5C%7C_%7B%5Cmathbb%7BR%7D%5ED%7D"/>, goes to 0 as the number of charts and the number of sampled points go to infinity.</p>
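<p>A classical example of a smoothly compatible atlas, given here only as an illustration and not taken from Robinett et al., is the pair of stereographic projections of the unit sphere. The sketch below (plain Python; all function names are assumptions for this example) checks numerically that the transition map between the two charts is <code>xi / |xi|^2</code>, which is smooth wherever both charts are defined.</p>

```python
import math

def chart_north(p):
    """Stereographic projection from the north pole; defined away from (0, 0, 1)."""
    x, y, z = p
    return (x / (1 - z), y / (1 - z))

def chart_south(p):
    """Stereographic projection from the south pole; defined away from (0, 0, -1)."""
    x, y, z = p
    return (x / (1 + z), y / (1 + z))

def chart_north_inv(xi):
    """Inverse of chart_north: plane coordinates back to the unit sphere."""
    u, v = xi
    s = u * u + v * v
    return (2 * u / (s + 1), 2 * v / (s + 1), (s - 1) / (s + 1))

def transition(xi):
    """chart_south composed with chart_north_inv: xi -> xi / |xi|^2, for xi != 0."""
    u, v = xi
    s = u * u + v * v
    return (u / s, v / s)

xi = (0.3, -1.2)
p = chart_north_inv(xi)                      # a point on the sphere
radius = math.sqrt(sum(c * c for c in p))    # should be 1
direct = chart_south(p)                      # go through the sphere explicitly
via_transition = transition(xi)              # use the closed-form transition map
print(radius, direct, via_transition)
```

<p>Atlas-Learn does not have closed-form charts like these; it has to estimate both the charts and the transitions from data, which is why the compatibility condition is only satisfied approximately.</p>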
<p>In the version of the Atlas-Learn algorithm implemented in this post, the manifold is a surface (<img src="https://latex.codecogs.com/png.latex?d%20=%202"/>) embedded in <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BR%7D%5E3"/>, and both the covering sets and the chart maps are <em>learned</em> from a finite point cloud <img src="https://latex.codecogs.com/png.latex?%5C%7Bx_i%5C%7D_%7Bi=1%7D%5E%7BN%7D%20%5Csubset%20%5Cmathcal%7BM%7D"/>. The algorithm constructs the atlas in four basic steps:</p>
<ol type="1">
<li>The point cloud comprising the data, sampled from the sphere in our case, is partitioned into <em>k</em> clusters with the k-medoids algorithm.</li>
<li>Local PCA is used to find the tangent plane and the normal direction for each cluster.</li>
<li>Quadratic regression is performed to find the curvature coefficients, K.</li>
<li>An ellipsoidal domain enclosing the points of each chart is estimated.</li>
</ol>
</section><section id="step-1-partitioning-via-k-medoids" class="level2">
<h2 class="anchored" data-anchor-id="step-1-partitioning-via-k-medoids">Step 1: Partitioning via k-medoids</h2>
<p>The point cloud is partitioned into <img src="https://latex.codecogs.com/png.latex?k"/> clusters using the <img src="https://latex.codecogs.com/png.latex?k"/>-medoids algorithm (PAM). Unlike <img src="https://latex.codecogs.com/png.latex?k"/>-means, PAM selects actual data points as cluster centers (medoids), which makes the partition robust to outliers and avoids projection artefacts. Each point <img src="https://latex.codecogs.com/png.latex?x_i"/> receives a chart label <img src="https://latex.codecogs.com/png.latex?j(i)%20%5Cin%20%5C%7B1,%5Cldots,k%5C%7D"/>, and the points belonging to chart <img src="https://latex.codecogs.com/png.latex?j"/> together with their centroid are</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cmathcal%7BX%7D_j%20=%20%5C%7B%20x_i%20:%20j(i)%20=%20j%20%5C%7D,%20%5Cqquad%0A%5Cbar%7Bx%7D_j%20=%20%5Cfrac%7B1%7D%7B%7C%5Cmathcal%7BX%7D_j%7C%7D%20%5Csum_%7Bx%20%5Cin%20%5Cmathcal%7BX%7D_j%7D%20x.%0A"/></p>
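<p>The following plain-Python sketch shows the flavor of a k-medoids partition on a tiny point cloud. It uses a simple alternating (Voronoi-iteration) heuristic with a naive initialisation, which is an assumption made for brevity and is not the BUILD/SWAP procedure of PAM used by <code>cluster::pam()</code>; the function names are hypothetical.</p>

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign(points, medoids):
    """Label each point with the position (0..k-1) of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda j: dist(p, points[medoids[j]]))
            for p in points]

def k_medoids(points, k, max_iter=50):
    medoids = list(range(k))            # naive initialisation: the first k points
    for _ in range(max_iter):
        labels = assign(points, medoids)
        updated = []
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            # New medoid: the member with the smallest total distance to its cluster
            updated.append(min(
                members,
                key=lambda c: sum(dist(points[c], points[i]) for i in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(points, medoids)

def on_sphere(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

# Two small caps around the north and south poles of the unit sphere
cloud = [on_sphere(v) for v in [(0, 0, 1), (0.1, 0, 1), (0, 0.1, 1),
                                (0, 0, -1), (0.1, 0, -1), (0, 0.1, -1)]]
medoids, labels = k_medoids(cloud, k=2)
print(medoids, labels)   # → [0, 3] [0, 0, 0, 1, 1, 1]
```

<p>Note that the medoids are indices into the data, i.e. actual sample points, whereas the chart center used later is the cluster centroid, which in general does not lie on the surface.</p>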
</section>
<section id="step-2-local-pca-and-tangent-plane-estimation" class="level2">
<h2 class="anchored" data-anchor-id="step-2-local-pca-and-tangent-plane-estimation">Step 2: Local PCA and tangent-plane estimation</h2>
<p>For each cluster <img src="https://latex.codecogs.com/png.latex?j"/>, the centered data matrix <img src="https://latex.codecogs.com/png.latex?%5Cwidetilde%7BX%7D_j%20=%20%5Cmathcal%7BX%7D_j%20-%20%5Cbar%7Bx%7D_j%20%5Cin%20%5Cmathbb%7BR%7D%5E%7B%7C%5Cmathcal%7BX%7D_j%7C%20%5Ctimes%203%7D"/> is decomposed via the thin SVD:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cwidetilde%7BX%7D_j%20=%20U%20%5CSigma%20V%5E%5Ctop,%20%5Cqquad%20V%20=%20%5Bv_1%20%5Cmid%20v_2%20%5Cmid%20v_3%5D.%0A"/></p>
<p>The first two right singular vectors span the <strong>local tangent plane</strong>:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0AL_j%20=%20%5Bv_1%20%5Cmid%20v_2%5D%20%5Cin%20%5Cmathbb%7BR%7D%5E%7B3%20%5Ctimes%202%7D,%0A"/></p>
<p>while the third singular vector <img src="https://latex.codecogs.com/png.latex?m_j%20=%20v_3%20%5Cin%20%5Cmathbb%7BR%7D%5E3"/> estimates the <strong>local surface normal</strong> (the direction of least variance). Each centered point is then decomposed into tangent and normal components:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Ctau_i%20=%20%5Cwidetilde%7Bx%7D_i%5E%5Ctop%20L_j%20%5Cin%20%5Cmathbb%7BR%7D%5E2,%20%5Cqquad%0A%5Cnu_i%20%20=%20%5Cwidetilde%7Bx%7D_i%5E%5Ctop%20m_j%20%5Cin%20%5Cmathbb%7BR%7D.%0A"/></p>
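<p>As a quick sanity check of the claim that the direction of least variance estimates the surface normal, the sketch below builds a small patch of the unit sphere and recovers that direction in plain Python. To stay dependency-free it runs power iteration on a shifted 3x3 covariance matrix instead of the thin SVD described above; that substitution, the patch, and the function names are assumptions of this example.</p>

```python
import math

def covariance3(points):
    """3x3 sample covariance matrix of a list of 3D points."""
    n = len(points)
    mean = [sum(p[a] for p in points) / n for a in range(3)]
    return [[sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in points) / (n - 1)
             for b in range(3)] for a in range(3)]

def least_variance_direction(cov, n_iter=200):
    """Unit eigenvector of the smallest eigenvalue, via power iteration on trace(C) * I - C."""
    trace = cov[0][0] + cov[1][1] + cov[2][2]
    shifted = [[(trace if a == b else 0.0) - cov[a][b] for b in range(3)]
               for a in range(3)]
    v = [1.0, 1.0, 1.0]
    for _ in range(n_iter):
        w = [sum(shifted[a][b] * v[b] for b in range(3)) for a in range(3)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v

# A small cap of the unit sphere around the x-axis, on a regular grid of angles
patch = []
for i in range(-5, 6):
    for j in range(-5, 6):
        a, b = 0.03 * i, 0.03 * j
        patch.append((math.cos(a) * math.cos(b),
                      math.sin(a) * math.cos(b),
                      math.sin(b)))

normal = least_variance_direction(covariance3(patch))
print([round(c, 4) for c in normal])
```

<p>For this patch the true normal at the center is the radial direction, the x-axis, and the estimated vector should agree with it up to sign. The sign ambiguity is inherent to PCA and also applies to the third singular vector.</p>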
</section>
<section id="step-3-quadratic-chart-map" class="level2">
<h2 class="anchored" data-anchor-id="step-3-quadratic-chart-map">Step 3: Quadratic chart map</h2>
<p>On a smooth surface the normal offset <img src="https://latex.codecogs.com/png.latex?%5Cnu"/> is a smooth function of the tangent coordinates <img src="https://latex.codecogs.com/png.latex?%5Ctau"/>. Atlas-Learn approximates this by a <strong>degree-2 polynomial</strong> (capturing local curvature):</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cnu%20%5Capprox%20K_j%20%5C,%20%5Cphi(%5Ctau),%20%5Cqquad%0A%5Cphi(%5Ctau)%20=%20%5Cbigl%5B%5Ctau_1%5E2,%5C;%20%5Ctau_1%5Ctau_2,%5C;%20%5Ctau_2%5E2%5Cbigr%5D%5E%5Ctop,%0A"/></p>
<p>where <img src="https://latex.codecogs.com/png.latex?K_j%20%5Cin%20%5Cmathbb%7BR%7D%5E%7B1%20%5Ctimes%203%7D"/> is estimated by least squares with a small ridge penalty <img src="https://latex.codecogs.com/png.latex?%5Cvarepsilon%20=%2010%5E%7B-10%7D"/>:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0AK_j%20=%20%5Cboldsymbol%7B%5Cnu%7D%5E%5Ctop%20%5CPhi%0A%20%20%20%20%20%20%5Cbigl(%5CPhi%5E%5Ctop%20%5CPhi%20+%20%5Cvarepsilon%20I_3%5Cbigr)%5E%7B-1%7D,%20%5Cqquad%0A%5CPhi%20=%20%5Cbigl%5B%5Cphi(%5Ctau_1)%20%5C;%5Ccdots%5C;%20%5Cphi(%5Ctau_%7B%7C%5Cmathcal%7BX%7D_j%7C%7D)%5Cbigr%5D%5E%5Ctop%0A%20%20%20%20%20%20%5Cin%20%5Cmathbb%7BR%7D%5E%7B%7C%5Cmathcal%7BX%7D_j%7C%20%5Ctimes%203%7D.%0A"/></p>
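<p>The ridge-regularized fit can be reproduced on a toy patch in a few lines of plain Python. The sketch below makes simplifying assumptions that differ from the full algorithm: the tangent plane is taken at the north pole of the unit sphere (not at a cluster centroid with PCA axes), the points lie on a regular grid, and the 3x3 system is solved with Cramer's rule. For a sphere of radius R the normal offset expands as <code>nu ≈ -(tau1^2 + tau2^2) / (2R)</code>, so the fitted coefficients should be close to <code>[-1/(2R), 0, -1/(2R)]</code>.</p>

```python
import math

def quad_features(tau):
    t1, t2 = tau
    return [t1 * t1, t1 * t2, t2 * t2]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(a, b):
    """Solve the 3x3 linear system a x = b with Cramer's rule."""
    d = det3(a)
    out = []
    for col in range(3):
        replaced = [[b[r] if c == col else a[r][c] for c in range(3)]
                    for r in range(3)]
        out.append(det3(replaced) / d)
    return out

def fit_curvature(taus, nus, eps=1e-10):
    """Solve (Phi^T Phi + eps * I) K^T = Phi^T nu for the three coefficients."""
    rows = [quad_features(t) for t in taus]
    gram = [[sum(r[a] * r[b] for r in rows) + (eps if a == b else 0.0)
             for b in range(3)] for a in range(3)]
    rhs = [sum(r[a] * nu for r, nu in zip(rows, nus)) for a in range(3)]
    return solve3(gram, rhs)

# Patch of the unit sphere near the north pole, in tangent coordinates (x, y)
taus, nus = [], []
for i in range(-5, 6):
    for j in range(-5, 6):
        x, y = 0.03 * i, 0.03 * j
        taus.append((x, y))
        nus.append(math.sqrt(1 - x * x - y * y) - 1)   # height relative to the pole

K = fit_curvature(taus, nus)
radius = -1 / (2 * K[0])    # sign convention: normal pointing away from the center
print(K, radius)
```

<p>The small departure of the estimated radius from 1 comes from the fourth-order terms that a purely quadratic model cannot represent; it shrinks as the patch gets smaller.</p>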
<p>The resulting <strong>inverse chart map</strong> <img src="https://latex.codecogs.com/png.latex?%5Cvarphi_j%5E%7B-1%7D%20:%20%5Cmathbb%7BR%7D%5E2%20%5Cto%20%5Cmathbb%7BR%7D%5E3"/> reconstructs an ambient point from local coordinates <img src="https://latex.codecogs.com/png.latex?%5Cxi%20%5Cin%20%5Cmathbb%7BR%7D%5E2"/>:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cvarphi_j%5E%7B-1%7D(%5Cxi)%20%5C;=%5C;%20%5Cbar%7Bx%7D_j%20+%20L_j%5C,%5Cxi%20+%20m_j%5C,K_j%5C,%5Cphi(%5Cxi).%0A"/></p>
<p>Its <strong>Jacobian</strong> <img src="https://latex.codecogs.com/png.latex?J_j(%5Cxi)%20=%20%5Cpartial%5Cvarphi_j%5E%7B-1%7D/%5Cpartial%5Cxi%20%5Cin%20%5Cmathbb%7BR%7D%5E%7B3%20%5Ctimes%202%7D"/>, required for geodesic integration, is:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0AJ_j(%5Cxi)%20%5C;=%5C;%20L_j%20%5C;+%5C;%20m_j%5C,K_j%5C,%5Cfrac%7B%5Cpartial%5Cphi%7D%7B%5Cpartial%5Cxi%7D(%5Cxi),%20%5Cqquad%0A%5Cfrac%7B%5Cpartial%5Cphi%7D%7B%5Cpartial%5Cxi%7D(%5Cxi)%20=%20%5Cbegin%7Bpmatrix%7D%0A%20%202%5Cxi_1%20&#038;%200%20%20%20%20%20%20%5C%5C%0A%20%20%5Cxi_2%20%20&#038;%20%5Cxi_1%20%20%5C%5C%0A%20%200%20%20%20%20%20%20&#038;%202%5Cxi_2%0A%5Cend%7Bpmatrix%7D.%0A"/></p>
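<p>Because the Jacobian drives the geodesic integration, it is worth checking the formula numerically. The plain-Python sketch below evaluates the inverse chart map and its Jacobian for a hypothetical chart (the exact second-order model of the unit sphere at the north pole, chosen by hand and not learned from data) and compares the analytic Jacobian with central finite differences.</p>

```python
def quad_features(xi):
    return [xi[0] ** 2, xi[0] * xi[1], xi[1] ** 2]

def quad_jacobian(xi):
    """3x2 Jacobian of quad_features; row i is the gradient of the i-th monomial."""
    return [[2 * xi[0], 0.0],
            [xi[1], xi[0]],
            [0.0, 2 * xi[1]]]

def chart_eval(chart, xi):
    """x = mean + L xi + m * (K . phi(xi))."""
    height = sum(k * q for k, q in zip(chart["K"], quad_features(xi)))
    return [chart["mean"][a]
            + chart["L"][a][0] * xi[0] + chart["L"][a][1] * xi[1]
            + chart["m"][a] * height
            for a in range(3)]

def chart_jacobian(chart, xi):
    """J = L + m (K dphi/dxi), a 3x2 matrix."""
    dq = quad_jacobian(xi)
    k_dq = [sum(chart["K"][r] * dq[r][c] for r in range(3)) for c in range(2)]
    return [[chart["L"][a][c] + chart["m"][a] * k_dq[c] for c in range(2)]
            for a in range(3)]

# Hypothetical chart: second-order model of the unit sphere at the north pole
chart = {
    "mean": [0.0, 0.0, 1.0],
    "L": [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
    "m": [0.0, 0.0, 1.0],
    "K": [-0.5, 0.0, -0.5],
}

xi, h = [0.2, -0.1], 1e-6
J = chart_jacobian(chart, xi)

# Central finite differences, one chart coordinate at a time
J_fd = [[0.0, 0.0] for _ in range(3)]
for c in range(2):
    plus = [xi[0] + h * (c == 0), xi[1] + h * (c == 1)]
    minus = [xi[0] - h * (c == 0), xi[1] - h * (c == 1)]
    xp, xm = chart_eval(chart, plus), chart_eval(chart, minus)
    for a in range(3):
        J_fd[a][c] = (xp[a] - xm[a]) / (2 * h)

max_err = max(abs(J[a][c] - J_fd[a][c]) for a in range(3) for c in range(2))
print(max_err)
```

<p>A check of this kind is cheap and catches layout mistakes in the monomial Jacobian, such as rows and columns being swapped, before they silently distort a geodesic.</p>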
</section>
<section id="step-4-ellipsoidal-chart-domains" class="level2">
<h2 class="anchored" data-anchor-id="step-4-ellipsoidal-chart-domains">Step 4: Ellipsoidal chart domains</h2>
<p>Each chart is assigned an <strong>ellipsoidal domain</strong> <img src="https://latex.codecogs.com/png.latex?%5COmega_j%20%5Csubset%20%5Cmathbb%7BR%7D%5E2"/> defined by</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5COmega_j%20=%20%5Cbigl%5C%7B%5C,%20%5Cxi%20%5Cin%20%5Cmathbb%7BR%7D%5E2%20:%20%5Cxi%5E%5Ctop%20A_j%5C,%5Cxi%20%5Cleq%201%20%5C,%5Cbigr%5C%7D,%0A"/></p>
<p>where <img src="https://latex.codecogs.com/png.latex?A_j"/> is a rescaled inverse covariance of the projected points:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0AA_j%20=%20%5Cfrac%7B%5Coperatorname%7BCov%7D(%5Ctau)%5E%7B-1%7D%7D%0A%20%20%20%20%20%20%20%20%20%20%20%7B%5Crho%20%5Ccdot%20%5Cmax_i%20%5Cbigl%5C%7B%5Ctau_i%5E%5Ctop%5Coperatorname%7BCov%7D(%5Ctau)%5E%7B-1%7D%5Ctau_i%5Cbigr%5C%7D%7D,%0A%5Cqquad%20%5Crho%20=%20%5Ctexttt%7Bellipsoid%5C_scale%7D.%0A"/></p>
<p>Setting <img src="https://latex.codecogs.com/png.latex?%5Crho%20%3E%201"/> (default <img src="https://latex.codecogs.com/png.latex?%5Crho%20=%201.1"/>) inflates each domain slightly beyond the convex hull of its own cluster, so that neighboring charts <strong>overlap</strong> and transitions are always possible. On <img src="https://latex.codecogs.com/png.latex?S%5E2"/> specifically, because the sphere is isotropic and the <img src="https://latex.codecogs.com/png.latex?k"/>-medoids partition tends to produce roughly equal-area, near-circular patches, the learned ellipsoids <img src="https://latex.codecogs.com/png.latex?%5COmega_j"/> are close to circles (<img src="https://latex.codecogs.com/png.latex?A_j%20%5Capprox%20%5Clambda%20I_2"/> for some scalar <img src="https://latex.codecogs.com/png.latex?%5Clambda"/>).</p>
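<p>The effect of the scale factor is easy to verify on a handful of tangent coordinates. In the plain-Python sketch below, the five points and the helper names are made up for illustration; the formula for the domain matrix is the one given above.</p>

```python
def cov2(taus):
    """2x2 sample covariance matrix of a list of 2D points."""
    n = len(taus)
    m = [sum(t[a] for t in taus) / n for a in range(2)]
    return [[sum((t[a] - m[a]) * (t[b] - m[b]) for t in taus) / (n - 1)
             for b in range(2)] for a in range(2)]

def inv2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def quad_form(a, xi):
    return sum(xi[r] * a[r][c] * xi[c] for r in range(2) for c in range(2))

def ellipsoid_matrix(taus, rho=1.1):
    """A = Cov(tau)^{-1} / (rho * max_i tau_i^T Cov(tau)^{-1} tau_i)."""
    a_raw = inv2(cov2(taus))
    q_max = max(quad_form(a_raw, t) for t in taus)
    return [[a_raw[r][c] / (rho * q_max) for c in range(2)] for r in range(2)]

# Hypothetical, already centered tangent coordinates of one cluster
taus = [(0.10, 0.02), (-0.08, 0.05), (0.03, -0.09), (-0.05, -0.04), (0.00, 0.06)]
A = ellipsoid_matrix(taus, rho=1.1)
largest = max(quad_form(A, t) for t in taus)
print(largest)   # equals 1 / rho up to rounding
```

<p>With this scaling the outermost cluster point satisfies <code>xi^T A xi = 1 / rho</code>, so every cluster point lies strictly inside the domain when <code>rho &gt; 1</code>, and the semi-axes of the ellipse grow by a factor of <code>sqrt(rho)</code> relative to the case <code>rho = 1</code>.</p>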
<hr />
</section>

<section id="the-atlas-learn-implementation" class="level1">
<h1>The Atlas-Learn Implementation</h1>
<p>Here are the required <code>R</code> packages.</p>
<div class="cell">
<details class="code-fold">
<summary>Required Packages</summary>
<pre>library(tidyverse)
library(cluster)    # pam() for k-medoids partitioning
library(RANN)       # nn2() for fast k-nearest-neighbor queries
library(plotly)     # interactive 3D visualization
library(purrr)      # map() / imap() for list operations</pre>
</details>
</div>
<p>This block of code contains all of the functions for the Atlas-Learn implementation.</p>
<div class="cell">
<details class="code-fold">
<summary>Show the code</summary>
<pre>#| label: atlas-functions

# ===========================================================================
# PART 1: Quadratic feature helpers
# ---------------------------------------------------------------------------
# These functions implement the d=2 specialization of the general quadratic
# feature map. For general d, phi(xi) would have choose(d+1, 2) components.
# For d=2 it has exactly 3: [xi1^2, xi1*xi2, xi2^2].
# ===========================================================================

# Maps xi in R^2 to the three quadratic monomials used to model surface curvature.
# General d would give choose(d+1,2) monomials; for d=2 this is exactly 3.
quad_features &lt;- function(xi) {
  c(xi[1]^2, xi[1] * xi[2], xi[2]^2)
}

# Jacobian of quad_features w.r.t. xi: a 3x2 matrix (d=2 specialization).
# Row i is the gradient of the i-th monomial:
#   d/dxi [xi1^2]     = [2*xi1,   0   ]
#   d/dxi [xi1*xi2]   = [xi2,     xi1 ]
#   d/dxi [xi2^2]     = [0,       2*xi2]
quad_jacobian &lt;- function(xi) {
  # Values are listed row by row (byrow = TRUE), matching the layout above.
  matrix(
    c(2 * xi[1],         0,
          xi[2],     xi[1],
              0, 2 * xi[2]),
    nrow = 3, ncol = 2, byrow = TRUE
  )
}

# ===========================================================================
# PART 2: Core chart operations
# ---------------------------------------------------------------------------
# Each chart stores:
#   mean  : R^3 — chart center (centroid of the cluster)
#   L     : R^{3x2} — orthonormal tangent basis (columns = v1, v2 from SVD)
#   M     : R^{3x1} — unit surface normal (v3 from SVD; scalar normal because D-d=1)
#   K     : R^{1x3} — quadratic curvature coefficients (1 row because D-d=1)
#   ell_A : R^{2x2} — ellipsoidal domain matrix (2x2 because d=2)
# ===========================================================================

# Evaluate the inverse chart map: xi in R^2  -&gt;  point in R^3.
# Formula: x = mean + L*xi + M * (K * phi(xi))
#   - mean + L*xi : linear move along the tangent plane
#   - M*(K*phi)   : quadratic normal correction encoding surface curvature
chart_eval &lt;- function(chart, xi) {
  q &lt;- quad_features(xi)          # 3-vector: [xi1^2, xi1*xi2, xi2^2]
  as.vector(
    chart$mean +
    chart$L %*% xi +              # 3x2 * 2x1 = 3x1 tangent contribution
    chart$M * as.numeric(chart$K %*% q)   # 3x1 * scalar normal correction
  )
}

# Jacobian of the inverse chart map at xi: a 3x2 matrix.
# J(xi) = L + M * (K * dQ(xi))
#   dQ(xi) is the 3x2 Jacobian of the monomial map (quad_jacobian).
#   K * dQ  is 1x3 * 3x2 = 1x2; then M * (K*dQ) is 3x1 * 1x2 = 3x2.
# This is the key object for geodesic integration: it maps tangent vectors
# in R^2 chart coordinates to ambient R^3 velocity vectors.
chart_jacobian &lt;- function(chart, xi) {
  dQ &lt;- quad_jacobian(xi)                   # 3x2 monomial Jacobian
  chart$L + chart$M %*% (chart$K %*% dQ)   # 3x2 total Jacobian J(xi)
}

# Project an ambient R^3 point onto this chart's tangent-plane coordinates (R^2).
# The linear projection xi = L^T * (x - mean) is exact for points on the tangent
# plane; for points on the actual surface it is a first-order approximation.
chart_project &lt;- function(chart, x) {
  as.vector(t(chart$L) %*% (x - chart$mean))
}

# Test whether tangent coords xi lie within the chart's ellipsoidal domain.
# The domain is {xi in R^2 : xi^T * A * xi &lt;= 1}, a 2D Mahalanobis ball.
# This is O(d^2) = O(4) for d=2.
chart_in_domain &lt;- function(chart, xi) {
  as.numeric(t(xi) %*% chart$ell_A %*% xi) &lt;= 1
}

# ===========================================================================
# PART 3: atlas_learn() — the main learning function
# ---------------------------------------------------------------------------
# Input : X (N x 3 matrix of surface points), k (number of charts)
# Output: an S3 object of class &quot;atlas&quot; containing k atlas_chart objects
#
# The algorithm runs four steps per chart:
#   (a) k-medoids partitioning
#   (b) local PCA to find the tangent plane and normal
#   (c) quadratic regression for the curvature coefficients K
#   (d) ellipsoidal domain construction
#
# d=2, D=3 specializations that are hard-coded here:
#   - SVD takes nv=3 (the full 3x3 right-singular-vector matrix)
#   - L = V[,1:2] is 3x2; M = V[,3] is 3x1 (one normal vector, not a matrix)
#   - nu (normal offset) is a scalar per point, not a vector
#   - K is fitted as a 1x3 row vector (one row per normal direction)
#   - ell_A is 2x2 (domain is a 2D ellipse)
# ===========================================================================

atlas_learn &lt;- function(X, k, ellipsoid_scale = 1.1) {
  # X              : N x 3 matrix of surface points
  # k              : number of charts
  # ellipsoid_scale: inflate domains by this factor so adjacent charts overlap

  message(&quot;Fitting k-medoids (k = &quot;, k, &quot;) ...&quot;)
  # PAM (Partitioning Around Medoids) is preferred over k-means for surfaces:
  # medoids are actual data points, making the partition robust to outliers.
  km &lt;- cluster::pam(X, k, diss = FALSE)

  charts &lt;- seq_len(k) |&gt;
    purrr::map(\(j) {

      # --- (a) Extract the cluster and center it ----------------------------
      idx &lt;- which(km$clustering == j)
      Xj  &lt;- X[idx, , drop = FALSE]   # N_j x 3 matrix of cluster points
      m   &lt;- colMeans(Xj)             # chart center (3-vector)
      Xc  &lt;- sweep(Xj, 2, m)         # centered cluster: N_j x 3

      # --- (b) Local PCA: tangent plane and normal -------------------------
      # SVD of the centered cluster reveals local geometry:
      #   - first two right singular vectors (v1, v2) span the tangent plane
      #   - third right singular vector (v3) is the surface normal
      # d=2, D=3: we always take nv=3 because D=3 (fully determined).
      sv  &lt;- svd(Xc, nu = 0, nv = 3)

      # L: 3x2 tangent basis (orthonormal columns). General form: R^{D x d}.
      L   &lt;- sv$v[, 1:2, drop = FALSE]

      # M: 3x1 unit normal. d=2, D=3 specialization: D-d=1, so there is
      # exactly ONE normal direction and M is a column vector, not a matrix.
      M   &lt;- sv$v[, 3, drop = FALSE]

      # Project each centered point into tangent / normal coordinates.
      # tau: N_j x 2 — tangent coordinates (2D because d=2)
      # nu:  N_j x 1 — normal offsets (scalar because D-d=1)
      tau &lt;- Xc %*% L    # N_j x 2
      nu  &lt;- Xc %*% M    # N_j x 1 (scalar per point; would be N_j x (D-d) in general)

      # --- (c) Quadratic curvature regression ------------------------------
      # Fit: nu ~ K * phi(tau)  where phi(tau) = [tau1^2, tau1*tau2, tau2^2]
      # K is 1x3 here (one output because D-d=1; would be (D-d)x3 in general).
      #
      # Design matrix Q: N_j x 3, each row is phi(tau_i)
      Q   &lt;- t(apply(tau, 1, quad_features))   # N_j x 3

      # Ridge-regularized least squares: K = nu^T * Q * (Q^T*Q + eps*I)^{-1}
      # The ridge penalty eps=1e-10 prevents singularity when clusters are
      # nearly collinear in tangent coordinates.
      K   &lt;- t(nu) %*% Q %*% solve(crossprod(Q) + 1e-10 * diag(3))  # 1 x 3

      # --- (d) Ellipsoidal domain ------------------------------------------
      # Domain: {xi in R^2 : xi^T A xi &lt;= 1}, a Mahalanobis ball.
      # A_raw = Cov(tau)^{-1}: inverse covariance of the tangent coordinates.
      # This adapts the domain shape to the data spread in each direction.
      A_raw &lt;- solve(cov(tau))

      # Scale A so the outermost point lands on the boundary, then inflate
      # by ellipsoid_scale (default 1.1) to ensure overlap with neighbors.
      qvals &lt;- apply(tau, 1, \(xi) as.numeric(t(xi) %*% A_raw %*% xi))
      ell_A &lt;- A_raw / (ellipsoid_scale * max(qvals))   # 2x2 positive-definite

      # Pack all chart data into an S3 object
      structure(
        list(mean = m, L = L, M = M, K = K, ell_A = ell_A, idx = idx,
             n_points = length(idx)),
        class = &quot;atlas_chart&quot;
      )
    })

  # Return the full atlas as an S3 object
  structure(
    list(
      charts        = charts,
      k             = k,
      clustering    = km$clustering,
      n_points      = nrow(X),
      ambient_dim   = ncol(X),   # D = 3
      intrinsic_dim = 2L         # d = 2 (hard-coded for surfaces in R^3)
    ),
    class = &quot;atlas&quot;
  )
}

# ===========================================================================
# PART 4: S3 print / summary methods
# ===========================================================================

print.atlas_chart &lt;- function(x, ...) {
  cat(sprintf(
    &quot;&lt;atlas_chart&gt;  %d points | mean [%s] | cond(ell_A) = %.1f\n&quot;,
    x$n_points,
    paste(round(x$mean, 3), collapse = &quot;, &quot;),
    kappa(x$ell_A)
  ))
  invisible(x)
}

# Returns a per-chart summary tibble
summary.atlas &lt;- function(object, ...) {
  purrr::imap_dfr(object$charts, \(ch, i)
    tibble::tibble(
      chart      = i,
      n_points   = ch$n_points,
      mean_norm  = round(sqrt(sum(ch$mean^2)), 4),
      cond_ell_A = round(kappa(ch$ell_A), 1)
    )
  )
}

print.atlas &lt;- function(x, ...) {
  cat(sprintf(
    &quot;&lt;atlas&gt;  k = %d | ambient R^%d | intrinsic R^%d | %d points\n\n&quot;,
    x$k, x$ambient_dim, x$intrinsic_dim, x$n_points
  ))
  print(summary(x))
  invisible(x)
}

# ===========================================================================
# PART 5: Chart lookup
# ===========================================================================

# Return the index of the nearest chart whose domain contains x.
# Falls back to the globally nearest chart center if none qualify.
# Checking only the 6 nearest charts (by Euclidean center distance) is
# sufficient in practice and avoids an O(k) domain test at every step.
find_chart &lt;- function(atlas, x) {
  dists      &lt;- map_dbl(atlas$charts, \(ch) sum((x - ch$mean)^2))
  candidates &lt;- order(dists)[seq_len(min(6L, atlas$k))]
  in_domain  &lt;- purrr::keep(
    candidates,
    \(i) chart_in_domain(atlas$charts[[i]],
                         chart_project(atlas$charts[[i]], x))
  )
  if (length(in_domain) &gt; 0L) in_domain[[1L]] else which.min(dists)
}

# ===========================================================================
# PART 6: Quasi-Euclidean retraction
# ---------------------------------------------------------------------------
# Advances one step from (chart_idx, xi) in the ambient direction tau_r3 (R^3).
#
# The pseudoinverse J^+ = (J^T J)^{-1} J^T is the key d=2, D=3 quantity:
#   - J is 3x2 (D x d)
#   - J^T J is 2x2 (d x d) — the pullback metric g; invertible for full-rank J
#   - J^+ is 2x3 (d x D) — pulls ambient vectors back to chart coordinates
#
# In general, when D &gt; d and J has full column rank, J^+ is the
# minimum-norm left inverse of J.
# ===========================================================================

retract_step &lt;- function(atlas, chart_idx, xi, tau_r3) {
  chart  &lt;- atlas$charts[[chart_idx]]
  J      &lt;- chart_jacobian(chart, xi)        # 3x2 (D x d for d=2, D=3)
  Jp     &lt;- solve(crossprod(J)) %*% t(J)    # 2x3 pseudoinverse: (J^T J)^{-1} J^T
  xi_new &lt;- xi + as.vector(Jp %*% tau_r3)   # advance in chart coordinates

  if (chart_in_domain(chart, xi_new)) {
    # Still inside the same chart: evaluate and return
    return(list(
      chart_idx = chart_idx,
      xi        = xi_new,
      x         = chart_eval(chart, xi_new)
    ))
  }

  # Step crossed a chart boundary: find a new host chart and re-project
  x_cand &lt;- chart_eval(chart, xi_new)       # approximate ambient position
  ci_new &lt;- find_chart(atlas, x_cand)       # nearest chart that contains x_cand
  xi2    &lt;- chart_project(atlas$charts[[ci_new]], x_cand)   # re-project
  list(
    chart_idx = ci_new,
    xi        = xi2,
    x         = chart_eval(atlas$charts[[ci_new]], xi2)
  )
}

# ===========================================================================
# PART 7: Geodesic path integration
# ---------------------------------------------------------------------------
# Traces a geodesic from x_start in direction direction_r3 (ambient R^3).
# Returns a tibble with columns x, y, z, step, chart_idx.
#
# At each step the ambient velocity vector is re-projected into the current
# chart's tangent plane and pushed back to ambient.  This keeps the direction
# of motion consistent across chart boundaries — without it, the fixed tau
# vector drifts after every chart transition and the path wanders off the
# intended geodesic.
# ===========================================================================

atlas_geodesic &lt;- function(atlas, x_start, direction_r3,
                           n_steps = 100L, step_size = 0.02) {
  ci  &lt;- find_chart(atlas, x_start)
  ch  &lt;- atlas$charts[[ci]]
  xi  &lt;- chart_project(ch, x_start)

  # Project the initial direction onto the starting chart's tangent plane,
  # then rescale it to length step_size (constant speed).  This ensures
  # tau_r3 starts in T_x M.
  J      &lt;- chart_jacobian(ch, xi)
  Jp     &lt;- solve(crossprod(J)) %*% t(J)          # 2x3 pseudoinverse
  tau_r3 &lt;- as.vector(J %*% (Jp %*% direction_r3))  # project onto tangent plane
  tau_r3 &lt;- tau_r3 / sqrt(sum(tau_r3^2)) * step_size  # rescale to length step_size

  steps     &lt;- vector(&quot;list&quot;, n_steps + 1L)
  chart_ids &lt;- integer(n_steps + 1L)
  steps[[1L]]     &lt;- chart_eval(ch, xi)
  chart_ids[[1L]] &lt;- ci

  for (i in seq_len(n_steps)) {
    res    &lt;- retract_step(atlas, ci, xi, tau_r3)
    ci     &lt;- res$chart_idx
    xi     &lt;- res$xi
    steps[[i + 1L]]     &lt;- res$x
    chart_ids[[i + 1L]] &lt;- ci

    # Re-project tau_r3 into the current chart's tangent plane and rescale it
    # to length step_size.
    # This is the identity-transport approximation: it holds the direction
    # constant in the current chart's frame rather than applying the
    # Christoffel correction that true parallel transport would require.
    J_new  &lt;- chart_jacobian(atlas$charts[[ci]], xi)
    Jp_new &lt;- solve(crossprod(J_new)) %*% t(J_new)
    tau_r3 &lt;- as.vector(J_new %*% (Jp_new %*% tau_r3))
    tau_r3 &lt;- tau_r3 / sqrt(sum(tau_r3^2)) * step_size
  }

  do.call(rbind, steps) |&gt;
    as_tibble(.name_repair = &quot;minimal&quot;) |&gt;
    set_names(c(&quot;x&quot;, &quot;y&quot;, &quot;z&quot;)) |&gt;
    mutate(step = row_number(), chart_idx = chart_ids)
}

# ===========================================================================
# PART 8: Sphere-specific helpers
# ---------------------------------------------------------------------------
# These are ground-truth functions for S^2 used only for validation;
# they play no role in the Atlas-Learn algorithm itself.
# ===========================================================================

# Great-circle geodesic distance between two unit vectors (in radians).
# d(p, q) = arccos(p . q), clamped to [-1, 1] to avoid NaN from floating point.
sphere_dist &lt;- function(p, q) {
  acos(pmax(pmin(sum(p * q), 1.0), -1.0))
}

# Unit tangent vector at p pointing toward q along the great circle.
# Computed by projecting q onto the tangent plane at p and normalizing.
geodesic_direction_sphere &lt;- function(p, q) {
  v   &lt;- q - sum(p * q) * p   # remove radial component
  nrm &lt;- sqrt(sum(v^2))
  if (nrm &lt; 1e-12) return(NULL)
  v / nrm
}

# Run the Atlas geodesic from p toward q for the exact number of steps
# needed to cover the true great-circle distance, then measure endpoint error.
atlas_endpoint_error &lt;- function(atlas, p, q, step_size = 0.02) {
  d   &lt;- sphere_dist(p, q)
  dir &lt;- geodesic_direction_sphere(p, q)

  if (d &lt; 1e-6 || is.null(dir)) {
    return(tibble(true_dist = d, endpoint_error = 0, n_steps = 0L))
  }

  n_steps  &lt;- max(1L, round(d / step_size))
  path     &lt;- atlas_geodesic(atlas, p, dir,
                              n_steps = n_steps, step_size = step_size)

  endpoint &lt;- as.numeric(path[nrow(path), c(&quot;x&quot;, &quot;y&quot;, &quot;z&quot;)])
  # Re-project endpoint onto S^2 before measuring angle error.
  # This separates geodesic-direction error from off-manifold (radial) drift.
  endpoint &lt;- endpoint / sqrt(sum(endpoint^2))

  tibble(
    true_dist      = d,
    endpoint_error = sphere_dist(endpoint, q),
    n_steps        = as.integer(n_steps)
  )
}</pre>
</details>
</div>
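<p>The pseudoinverse used in <code>retract_step()</code> and <code>atlas_geodesic()</code> can be checked in isolation. The following is a minimal sketch and not part of the Atlas-Learn code: <code>J_demo</code> is an arbitrary full-rank 3×2 matrix chosen for illustration, and every <code>*_demo</code> name is hypothetical. It checks that <code>J^+ J</code> is the 2×2 identity and that <code>J J^+</code> is the orthogonal projector onto the column space of <code>J</code>, which is the property the re-projection of <code>tau_r3</code> relies on.</p>
<pre># Hypothetical full-rank 3x2 Jacobian (illustration only); columns are
# (1, 0, 0.3) and (0, 1, -0.2)
J_demo  &lt;- matrix(c(1, 0, 0.3,
                    0, 1, -0.2), nrow = 3, ncol = 2)
Jp_demo &lt;- solve(crossprod(J_demo)) %*% t(J_demo)   # 2x3 pseudoinverse

# Left-inverse property: J^+ J = I_2
left_inv_err &lt;- max(abs(Jp_demo %*% J_demo - diag(2)))

# J J^+ is the orthogonal projector onto span(J): idempotent and symmetric
P_demo   &lt;- J_demo %*% Jp_demo                       # 3x3
idem_err &lt;- max(abs(P_demo %*% P_demo - P_demo))
sym_err  &lt;- max(abs(P_demo - t(P_demo)))

# A vector orthogonal to both columns of J_demo is sent to zero
n_demo     &lt;- c(-0.3, 0.2, 1)
normal_err &lt;- max(abs(P_demo %*% n_demo))

# All four values should be at rounding-error level
c(left_inv_err, idem_err, sym_err, normal_err)</pre>
<p>Because the projector removes any normal component, applying it after every step keeps the velocity in the current tangent plane, at the cost of ignoring the curvature correction mentioned in the code comments.</p>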
</section>
<section id="construct-the-atlas" class="level1">
<h1>Construct the Atlas</h1>
<p>The code below constructs the sphere point cloud by sampling 2,000 points uniformly from <img src="https://latex.codecogs.com/png.latex?S%5E2"/> using the standard parametrization <img src="https://latex.codecogs.com/png.latex?(%5Ctheta,%20%5Cphi)"/>. The azimuth <code>theta</code> is drawn uniformly on [0, 2π) and <img src="https://latex.codecogs.com/png.latex?%5Ccos%5Cphi"/> is drawn uniformly on <img src="https://latex.codecogs.com/png.latex?%5B-1,1%5D"/>, which avoids the pole-crowding that arises from drawing <img src="https://latex.codecogs.com/png.latex?%5Cphi"/> uniformly.</p>
<div class="cell">
<details class="code-fold">
<summary>Show the code</summary>
<pre>#| label: generate-sphere

set.seed(4471)
N     &lt;- 2000L
theta &lt;- runif(N, 0, 2 * pi)
phi   &lt;- acos(runif(N, -1, 1))   # uniform area measure on S^2

sphere_pts &lt;- tibble(
  x = sin(phi) * cos(theta),
  y = sin(phi) * sin(theta),
  z = cos(phi)
)

X_sphere &lt;- as.matrix(sphere_pts)</pre>
</details>
</div>
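<p>A quick sanity check on the sampler, offered as a sketch rather than as part of the original analysis. Under the uniform area measure on the sphere, the z-coordinate is uniform on [-1, 1], so the two polar caps with |z| &gt; 0.9 should hold about 10% of the points. Drawing <code>phi</code> uniformly on [0, π] instead puts a fraction 2·acos(0.9)/π, roughly 0.287, in the same caps. The seed, the sample size, and the <code>*_chk</code>, <code>z_*</code>, and <code>frac_*</code> names are illustrative choices.</p>
<pre>set.seed(1)                       # arbitrary seed for this check
n_chk &lt;- 20000L

# Area-uniform sampling: cos(phi) uniform on [-1, 1]
z_area  &lt;- cos(acos(runif(n_chk, -1, 1)))
# Naive sampling: phi itself uniform on [0, pi]
z_naive &lt;- cos(runif(n_chk, 0, pi))

# Fraction of points in the polar caps |z| &gt; 0.9
frac_area  &lt;- mean(abs(z_area)  &gt; 0.9)
frac_naive &lt;- mean(abs(z_naive) &gt; 0.9)
c(area = frac_area, naive = frac_naive)</pre>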
<p>This block of code fits an atlas with <img src="https://latex.codecogs.com/png.latex?k%20=%2025"/> charts to the sampled sphere. The fit is cached to <code>atlas_s2_cache.rds</code> so that this time-consuming step is not repeated during development sessions.</p>
<div class="cell">
<details class="code-fold">
<summary>Show the code</summary>
<pre>#| label: learn-atlas

atlas_cache &lt;- &quot;atlas_s2_cache.rds&quot;

if (file.exists(atlas_cache)) {
  message(&quot;Loading cached atlas from &quot;, atlas_cache)
  atlas_s2 &lt;- readRDS(atlas_cache)
} else {
  message(&quot;Cache not found — fitting atlas (k = 25) ...&quot;)
  atlas_s2 &lt;- atlas_learn(X_sphere, k = 25L)
  saveRDS(atlas_s2, atlas_cache)
  message(&quot;Atlas saved to &quot;, atlas_cache)
}

# Attach chart labels to the point-cloud tibble for visualization
sphere_pts &lt;- sphere_pts |&gt;
  mutate(chart = factor(atlas_s2$clustering))</pre>
</details>
</div>
</section><section id="the-atlas-data-structure" class="level2">
<h2 class="anchored" data-anchor-id="the-atlas-data-structure">The atlas data structure</h2>
<p>This section gives an overview of the structure of the atlas. The <code>atlas_learn()</code> function returns an S3 object of class <code>&quot;atlas&quot;</code> containing a nested list of 25 <code>&quot;atlas_chart&quot;</code> objects, one per cluster.</p>
<pre>atlas_s2
├── k              integer          number of charts (25)
├── n_points       integer          total data points (2000)
├── ambient_dim    integer          dimension of embedding space (3 = D)
├── intrinsic_dim  integer          dimension of the manifold (2 = d)
├── clustering     integer[2000]    chart label for every data point
└── charts         list[25]         one atlas_chart per cluster
    └── charts[[j]]
        ├── mean     numeric[3]     centroid of cluster j in R³
        ├── L        numeric[3×2]   orthonormal tangent basis (columns = v₁, v₂)
        ├── M        numeric[3×1]   unit surface normal = v₃  [D−d=1: single vector]
        ├── K        numeric[1×3]   quadratic curvature coefficients  [D−d=1: 1 row]
        ├── ell_A    numeric[2×2]   ellipsoidal domain matrix  [d=2: 2×2]
        ├── idx      integer[n_j]   row indices into the N×3 data matrix
        └── n_points integer        cluster size n_j</pre>
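<p>To make the roles of <code>L</code>, <code>M</code>, and <code>ell_A</code> concrete, here is a minimal sketch that builds one chart-like list from a synthetic patch near the north pole of the unit sphere and tests domain membership with the rule <code>xi^T A xi &lt;= 1</code>. It mirrors the SVD and ellipsoid steps from the listing but is not the <code>atlas_learn()</code> implementation: <code>patch</code>, <code>chart_demo</code>, and <code>in_domain_demo</code> are hypothetical names, the patch size is arbitrary, and the inflation factor 1.1 is the default quoted in the code comments.</p>
<pre>set.seed(2)
# Synthetic patch: 200 points near the north pole of the unit sphere
uv    &lt;- matrix(runif(400, -0.3, 0.3), ncol = 2)
patch &lt;- cbind(uv, sqrt(1 - rowSums(uv^2)))          # N x 3, on S^2

m  &lt;- colMeans(patch)
Xc &lt;- sweep(patch, 2, m)                             # centered points
sv &lt;- svd(Xc)
L  &lt;- sv$v[, 1:2, drop = FALSE]                      # 3x2 tangent basis
M  &lt;- sv$v[, 3,   drop = FALSE]                      # 3x1 normal

tau   &lt;- Xc %*% L                                    # N x 2 tangent coordinates
A_raw &lt;- solve(cov(tau))
qvals &lt;- rowSums((tau %*% A_raw) * tau)              # xi^T A_raw xi for each row
ell_A &lt;- A_raw / (1.1 * max(qvals))

chart_demo &lt;- list(mean = m, L = L, M = M, ell_A = ell_A)

# Hypothetical membership test for the ellipsoidal domain
in_domain_demo &lt;- function(chart, xi) {
  as.numeric(t(xi) %*% chart$ell_A %*% xi) &lt;= 1
}

# [L M] should be an orthonormal frame of R^3
frame_err  &lt;- max(abs(crossprod(cbind(L, M)) - diag(3)))
# Every training point should lie inside the inflated domain
all_inside &lt;- all(apply(tau, 1, \(xi) in_domain_demo(chart_demo, xi)))</pre>
<p>By construction the outermost training point sits at a quadratic-form value of 1/1.1, so every point in the patch passes the test, while a distant coordinate such as <code>c(5, 5)</code> does not.</p>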
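<p>The table below summarizes each of the <img src="https://latex.codecogs.com/png.latex?k%20=%2025"/> charts: cluster size, centroid, normal alignment, curvature coefficients, and the semi-axes of the ellipsoidal domain.</p>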
<div class="cell">
<details class="code-fold">
<summary>Show the code</summary>
<pre>purrr::imap_dfr(atlas_s2$charts, function(ch, j) {
  tibble(
    Chart  = j,
    `n pts` = ch$n_points,
    `centre (x,y,z)` = paste(round(ch$mean, 2), collapse = &quot;, &quot;),
    # For the unit sphere, M should align with the radial direction (M·r̂ ≈ 1)
    `M · r̂`  = round(abs(sum(ch$M * ch$mean / sqrt(sum(ch$mean^2)))), 3),
    # Quadratic curvature coefficients
    k1 = round(ch$K[1], 3),
    k2 = round(ch$K[2], 3),
    k3 = round(ch$K[3], 3),
    # Semi-axes of the 2D ellipsoidal domain (1/sqrt of eigenvalues of ell_A)
    `ell axis a` = round(1 / sqrt(eigen(ch$ell_A, only.values=TRUE)$values[2]), 3),
    `ell axis b` = round(1 / sqrt(eigen(ch$ell_A, only.values=TRUE)$values[1]), 3)
  )
}) |&gt; print(n = 25)</pre>
</details>
<div class="cell-output cell-output-stdout">
<pre># A tibble: 25 × 9
   Chart `n pts` `centre (x,y,z)`    `M · r̂`     k1     k2     k3 `ell axis a`
   &lt;int&gt;   &lt;int&gt; &lt;chr&gt;                 &lt;dbl&gt;  &lt;dbl&gt;  &lt;dbl&gt;  &lt;dbl&gt;        &lt;dbl&gt;
 1     1      82 0.91, 0.3, -0.16      0.999 -0.216 -0.078 -0.068        0.498
 2     2      70 0.35, -0.9, -0.06     1     -0.136  0     -0.163        0.486
 3     3      84 -0.64, 0.71, -0.13    1      0.182 -0.031  0.101        0.44 
 4     4      73 -0.04, 0.95, -0.12    1      0.162 -0.024  0.112        0.463
 5     5      81 -0.75, 0.4, 0.46      1      0.115 -0.056  0.184        0.431
 6     6      73 0.28, 0.6, -0.7       0.999 -0.203  0.016 -0.114        0.518
 7     7      78 -0.93, 0.07, -0.22    1      0.169 -0.087  0.055        0.46 
 8     8      86 -0.08, -0.3, 0.91     1     -0.237  0.051  0.099        0.53 
 9     9      73 -0.18, -0.87, -0.38   1      0.111 -0.036  0.127        0.421
10    10      85 0.87, 0.07, 0.4       1     -0.148 -0.002 -0.13         0.471
11    11      75 -0.36, 0.21, 0.87     1      0.2   -0.034  0.054        0.556
12    12      67 -0.79, -0.52, -0.21   1      0.178 -0.001  0.109        0.431
13    13      84 0.04, 0.05, -0.96     1      0.182  0.012  0.081        0.446
14    14      66 -0.39, -0.83, 0.31    1     -0.189 -0.006 -0.066        0.511
15    15      84 0.28, -0.55, -0.74    1      0.132 -0.066  0.105        0.431
16    16      87 -0.52, 0.43, -0.68    1     -0.251  0.009  0.026        0.511
17    17     100 0.83, -0.44, -0.2     1      0.144  0.082  0.133        0.526
18    18      68 0.07, -0.8, 0.55      1     -0.24   0.072 -0.088        0.474
19    19      91 0.59, 0.71, 0.24      0.999  0.229 -0.002 -0.003        0.537
20    20      93 -0.81, -0.26, 0.45    1      0.17   0.084  0.115        0.455
21    21      95 0.41, 0.27, 0.83      1      0.155 -0.087  0.056        0.532
22    22      66 -0.46, -0.32, -0.78   1     -0.179 -0.025 -0.072        0.525
23    23      70 -0.12, 0.78, 0.54     1      0.183  0.045  0.1          0.495
24    24      93 0.67, 0.07, -0.69     1      0.192 -0.001  0.116        0.453
25    25      76 0.62, -0.45, 0.58     1     -0.215 -0.039  0.028        0.512
# &#x2139; 1 more variable: `ell axis b` &lt;dbl&gt;</pre>
</div>
</div>
<p>Key things to notice:</p>
<ul>
<li><em><code>M · r̂</code></em> is uniformly <img src="https://latex.codecogs.com/png.latex?%5Capprox%201"/>, confirming that the SVD-learned normal aligns with the sphere’s radial direction. Because the code takes the absolute value, the alignment is confirmed only up to sign.</li>
<li><em>k1, k2, k3</em> encode local curvature. The Hessian <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Bpmatrix%7D2k_1%20&#038;%20k_2%20%5C%5C%20k_2%20&#038;%202k_3%5Cend%7Bpmatrix%7D"/> should have eigenvalues <img src="https://latex.codecogs.com/png.latex?%5Capprox%20%5Cpm%201"/> for a unit sphere (developed further in the curvature proof section).</li>
<li><em><code>ell axis a</code>, <code>ell axis b</code></em>: nearly equal semi-axes (near-circular domains) reflect the sphere’s isotropy.</li>
</ul>
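<p>The curvature statement in the second bullet can be illustrated with a small self-contained fit. The sketch below makes two assumptions that should be read as such: tangent coordinates are measured from a point on the sphere (the north pole) rather than from a cluster mean, and the quadratic feature map is <code>[tau1^2, tau1*tau2, tau2^2]</code> as described in the code comments. Under those assumptions the normal offset is approximately <code>-(tau1^2 + tau2^2)/2</code>, so both Hessian eigenvalues come out close to -1, with the sign set by the choice of normal direction. <code>K_demo</code> and <code>H_demo</code> are illustrative names.</p>
<pre>set.seed(3)
# Tangent coordinates of 500 points in a small patch around the north pole
tau &lt;- matrix(runif(1000, -0.2, 0.2), ncol = 2)
# Normal offset measured from the pole along the outward normal (0, 0, 1)
nu  &lt;- sqrt(1 - rowSums(tau^2)) - 1

# Quadratic features [tau1^2, tau1*tau2, tau2^2] and a plain least-squares fit
Q      &lt;- cbind(tau[, 1]^2, tau[, 1] * tau[, 2], tau[, 2]^2)
K_demo &lt;- solve(crossprod(Q), crossprod(Q, nu))      # 3 x 1

# Hessian of k1*tau1^2 + k2*tau1*tau2 + k3*tau2^2
H_demo &lt;- matrix(c(2 * K_demo[1], K_demo[2],
                   K_demo[2],     2 * K_demo[3]), nrow = 2)
ev_demo &lt;- eigen(H_demo, symmetric = TRUE)$values    # both close to -1
ev_demo</pre>
<p>Flipping the normal direction flips the sign of <code>nu</code> and therefore of both eigenvalues, which is one reading of the ±1 in the bullet above.</p>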
<p>The following plot shows how the charts cover the sphere. Each point is colored by the chart it belongs to. With <img src="https://latex.codecogs.com/png.latex?k%20=%2025"/> charts, the sphere is tiled into roughly equal-area patches.</p>
<div class="cell">
<details class="code-fold">
<summary>Show the code</summary>
<pre>#| label: atlas-chart-plot

n_charts &lt;- length(unique(sphere_pts$chart))
chart_colors &lt;- colorRampPalette(RColorBrewer::brewer.pal(9, &quot;Set1&quot;))(n_charts)

plot_ly(
  data   = sphere_pts,
  x = ~x, y = ~y, z = ~z,
  type   = &quot;scatter3d&quot;,
  mode   = &quot;markers&quot;,
  color  = ~chart,
  colors = chart_colors,
  marker = list(size = 2.5, opacity = 0.6),
  text   = ~paste0(&quot;Chart &quot;, chart),
  hoverinfo = &quot;text&quot;
) |&gt;
  layout(
    title  = &quot;S² coloured by Atlas-Learn chart membership (k = 25)&quot;,
    scene  = list(
      xaxis = list(title = &quot;x&quot;),
      yaxis = list(title = &quot;y&quot;),
      zaxis = list(title = &quot;z&quot;),
      aspectmode = &quot;cube&quot;
    ),
    showlegend = FALSE
  )</pre>
</details>
<div class="cell-output-display">
<div class="plotly html-widget html-fill-item" id="htmlwidget-b25e2fe42113f49c5c85" style="width:100%;height:464px;"></div>
<script type="application/json" data-for="htmlwidget-b25e2fe42113f49c5c85">{"x":{"visdat":{"15e5137488d1f":["function () ","plotlyVisDat"]},"cur_data":"15e5137488d1f","attrs":{"15e5137488d1f":{"x":{},"y":{},"z":{},"mode":"markers","marker":{"size":2.5,"opacity":0.59999999999999998},"text":{},"hoverinfo":"text","color":{},"colors":["#E41A1C","#AA3B50","#705C83","#377EB8","#3E8E93","#459E6E","#4DAF4A","#658E67","#7E6E85","#984EA3","#BA5E6C","#DC6E36","#FF7F00","#FFA910","#FFD421","#FFFF33","#E1C62F","#C38E2B","#A65628","#C1645A","#DB728C","#F781BF","#D789B2","#B891A5","#999999"],"alpha_stroke":1,"sizes":[10,100],"spans":[1,20],"type":"scatter3d"}},"layout":{"margin":{"b":40,"l":60,"t":25,"r":10},"title":"S² coloured by Atlas-Learn chart membership (k = 25)","scene":{"xaxis":{"title":"x"},"yaxis":{"title":"y"},"zaxis":{"title":"z"},"aspectmode":"cube"},"showlegend":false,"hovermode":"closest"},"source":"A","config":{"modeBarButtonsToAdd":["hoverclosest","hovercompare"],"showSendToCloud":false},"data":[{"x":[0.85828767384000682,0.9956897805207271,0.99323115081949587,0.8676198323669565,0.97921027852975862,0.97106691843995907,0.97099414275410367,0.9264296214191039,0.92602836371821051,0.94631824868215364,0.83836565837582477,0.94559710052508428,0.99718662155994309,0.68628531865677778,0.91477144170169356,0.91237874278582864,0.75568331345719564,0.92614493733463188,0.93421786431192033,0.96380406691522136,0.91485148947180195,0.97227986280172096,0.80436992454472001,0.97506008197248228,0.91681537697953963,0.96962598816887402,0.89790344126080424,0.89999940077217899,0.90414486383917303,0.76528130584656462,0.93718361192964683,0.88591715120099523,0.99417155670331492,0.9299646682336784,0.92722896150263834,0.97901864850868292,0.90825719836036189,0.97579090889948694,0.99678531531469616,0.99431267307705939,0.95427991071964613,0.97566576838449459,0.89460957495896865,0.6759952351808477,0.71068212412284482,0.97895803877417142,0.95962452099787965,0.89688507496210923,0.97629297134894011,0.696
93350868881077,0.95642167148182344,0.87517180110813952,0.93210489544591046,0.98204848419001833,0.91556261845095865,0.70392247313588108,0.98881379159875138,0.95881626498295125,0.86916214186023888,0.86234405361916466,0.98065112518938247,0.99366895666272659,0.93911410213545521,0.95327058002271003,0.82654787539156971,0.92822519140821025,0.92320251468450876,0.91933791339216109,0.9975388650668302,0.80527095047761243,0.92623314022636938,0.93326028006307804,0.8852262176292649,0.76286489759342901,0.82127884561860653,0.88077505178087911,0.806696310029414,0.82202542553712199,0.83966487399880796,0.97356441717513487,0.95170249834950316,0.7322740680649571],"y":[0.23192239371423373,0.019844078959082558,0.0079721947325498022,0.37903009879356697,-0.0080366849489469181,0.23751006450572096,0.2187894986419246,0.30494599438712727,0.34941322455810792,0.0420586889110173,0.48506679431793531,0.21973423322251232,0.049127659893130136,0.65998770821924413,0.08813475659545919,0.33139482008981286,0.56633416961669669,0.34267143015266521,0.32653129263233599,0.26064823914992397,0.36487358820267934,0.22886919343528656,0.58091815876658859,0.21881463355556519,0.37420855687473548,0.21193540580838574,0.13583714759075396,0.30121221302896312,0.39411376892148936,0.5311285365424655,0.046833171166568342,0.45076971123291409,-0.076961769141358785,0.31821185314788586,0.37181264332709119,0.19864517515247815,0.32883874808176228,0.20076487179974165,0.072154069815583199,0.071531059327439514,0.29809070636610496,0.16726556992549366,0.44643608905280741,0.64913906240558839,0.58457006266936007,0.13079865456786718,-0.022491580969319053,0.36414756947076776,0.19139922680828059,0.67634311889245224,0.26259931524496899,0.46061132233351709,0.19243756622582514,0.14639625624468192,0.16702800637157356,0.67675358559970755,-0.052413204281694513,0.15525172004178062,0.44106949578501053,0.48051702965035326,0.19407485679895559,-0.025474930892972297,0.13707262769885203,0.2929825735102185,0.49680575433104468,0.34714850081783838,0.37940836
702330666,0.3487224384979315,0.053867690655229998,0.5481968282936146,0.24692473608161561,0.13664279062350754,0.20939695631873159,0.59832318888226133,0.56989042770673115,0.29750786452079331,0.58593157225335557,0.50066215949291482,0.41670486222776143,0.060674825415295931,0.29877070014317492,0.6667726103280055],"z":[-0.45777098229154944,-0.090598418843001169,-0.11588065046817052,-0.32182605657726526,-0.20268853474408383,-0.024859790224581933,0.096444440074265045,-0.22076253546401847,-0.14276508009061206,-0.32048843801021565,-0.24870309187099324,-0.23992288392037159,-0.056615499779582044,-0.30566106457263231,-0.39424037607386714,-0.24029669770970929,-0.32895035808905942,0.15758123621344575,-0.14357679802924386,0.056073309388011722,-0.17295669065788388,-0.047861891798674908,-0.12459180271252987,-0.037121323868632254,0.13934676349163064,-0.12210170691832896,-0.41870954073965538,-0.31507504079490883,-0.16491331765428172,-0.36365780699998135,-0.34567836439236999,-0.10935021098703145,-0.075497032608836931,0.18413835112005486,-0.044741604942828285,0.0454156389459969,-0.25871594343334431,-0.086750033777207108,0.034825642593205045,0.078902570996433496,-0.02217617584392418,-0.14176931092515588,-0.019197051413357265,-0.34878205182030791,-0.39141890639439209,-0.15662972349673512,-0.2803835007362066,-0.2509854775853454,-0.1010859538801015,-0.23841910576447847,0.12766826525330555,0.14802543120458722,-0.30683586327359064,-0.11894919443875539,-0.36585070285946131,-0.21563333738595253,-0.13964290730655188,-0.2378408573567867,-0.22364004794508219,0.15958106843754663,-0.025657759513705795,-0.10942135285586112,-0.31508062127977599,0.073732034303248054,-0.2645801431499421,-0.13373822346329678,-0.06120790727436537,-0.18223737785592661,-0.044883009977638604,-0.22587371198460446,-0.28481633495539416,-0.33219572156667704,-0.41536424774676556,-0.24504389334470025,-0.026943610515445488,-0.36840789718553413,-0.076975684612989426,-0.27131458092480887,-0.3483101450838148,-0.22020647395402188,0.0706
99528791010449,-0.13852427713572976],"mode":"markers","marker":{"color":"rgba(228,26,28,1)","size":2.5,"opacity":0.59999999999999998,"line":{"color":"rgba(228,26,28,1)"}},"text":["Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1","Chart 1"],"hoverinfo":["text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text"],"type":"scatter3d","name":"1","textfont":{"color":"rgba(228,26,28,1)"},"error_y":{"color":"rgba(228,26,28,1)"},"error_x":{"color":"rgba(228,26,28,1)"},"line":{"color":"rgba(228,26,28,1)"},"frame":null},{"x":[0.21797452921762139,0.36199219674571237,0.42146103819334213,0.52043856035575786,0.00052536318122974328,0.60580312107566969,0.46317913142595757,0.3363960299072104,0.39966603026347591,0.3733707724485994,0.3
6360312896169938,0.60142617831602552,0.6000771514587977,0.5367245816472741,0.39476428842571826,0.2213757442891775,0.35906412981019242,0.59361775496227864,0.34296063821515982,0.19722638600895381,0.061002298897287879,0.44730934178708209,0.19891059587328144,0.2986357250669564,0.18370515059096026,0.19295244997773464,0.13393239085812658,0.52123196219193768,0.40377458297140112,0.5921075158949094,0.25419971100262573,0.5086327114235295,0.41080129494532813,0.202664973375456,0.058946790789073993,0.32545651181472579,0.30277740946958337,0.12591639674398231,0.23456280735478563,0.4160805834428169,0.43236083345479315,-0.0066324754135548853,0.37478763875601789,0.34462952211867032,0.60628428505063015,0.51158553719388633,0.19344131856935776,0.4103776691510056,-0.0090196028510151609,0.3813982929592889,0.54119755139737469,0.56933086470290961,0.34154771285449192,0.23087064304598023,0.47044983866678858,0.092909417717534179,0.64347422729976755,0.25270344698198943,0.57217991011057434,0.533096728259848,0.47899619874655014,0.34029377095598556,0.61312380670108646,0.34350768127425674,0.29365959012725146,0.41977286839554412,0.56431951087107757,0.022399319400934588,0.039410886272324801,0.33218886712152979],"y":[-0.96597843163876962,-0.92315542307204057,-0.90635960171050078,-0.82671073704175713,-0.99986634306586331,-0.79559410200354252,-0.86109165640248431,-0.92706301052524342,-0.85315532967413088,-0.9241437919174843,-0.92264295865323098,-0.78130990136668832,-0.79821889621009734,-0.84324689285250798,-0.9146981349993577,-0.95565652428285608,-0.93165383742558694,-0.8026350099867251,-0.90885081132054502,-0.96507919934857944,-0.99729388045755174,-0.82238357895411485,-0.89360022035879993,-0.90666180967711851,-0.97176101422702466,-0.95481756759329939,-0.98017798984967519,-0.78855862438054303,-0.85741650365914068,-0.80566236090097687,-0.95898994058048337,-0.85959343992495474,-0.90237892422703769,-0.9388202032703955,-0.99756862147167147,-0.87266870938102303,-0.94957987754622186,-0.97734877207428716,-0.97
167468792542899,-0.87145060719563794,-0.90060945300656137,-0.99102875037061589,-0.91963327405745676,-0.92544933165371945,-0.77160943177288377,-0.79471964356257618,-0.91835678205778759,-0.90454912079914751,-0.99987205540862001,-0.90895624787634743,-0.83614249742776459,-0.81686447722654854,-0.92254343074652534,-0.97205241818801846,-0.81680705391797059,-0.99242093361820749,-0.75552643020391508,-0.96723259213887225,-0.80516081978298193,-0.83812330467734386,-0.86015660265373228,-0.93759292154548934,-0.78550423750255871,-0.93870854283548122,-0.87316856781994567,-0.8182779855371709,-0.77770412909642406,-0.99934725045811235,-0.99916901404456493,-0.90353298775443447],"z":[0.13918611360713837,-0.12940523307770496,0.029713055584579661,-0.21375935571268201,0.01634074654430152,0.0057099345140160431,-0.20972899533808231,0.16550494125112894,-0.3352507236413656,0.080947625916451313,0.12854079296812415,-0.16685739438980809,-0.052478624507784823,0.029349636286497269,-0.086535994894802556,-0.19419934973120689,-0.055624436121433828,-0.058266643434762941,0.23741988837718972,-0.17240618215873824,-0.041032128036022061,-0.35156735032796849,-0.4023844194598496,-0.29796151863411069,-0.14809776796028015,-0.22603708691895005,0.14599048905074599,-0.32633194373920549,0.3190658637322486,-0.017800276633352086,0.1253826175816358,-0.048906879965215783,0.13020972767844805,-0.27846639743074775,-0.037176892161369289,-0.36404310539364815,0.08138732379302388,0.17010126030072578,0.028785243630409359,0.25971289398148667,-0.044347749091684693,0.13348417961969966,0.11751113599166274,0.15739767160266641,-0.19245324237272141,0.32665107725188142,-0.34525537956505997,-0.11567651759833084,0.013210585806518892,-0.16832671081647263,0.089280089363455911,-0.092708102893084204,-0.1796072889119385,0.042577487882226837,-0.3339209277182818,-0.080427175853401353,0.12296638591215021,0.024537330493331119,-0.1559686018154024,-0.11557337269186967,0.17519492143765109,0.071551819797605262,-0.08403743524104354,-0.028787923045456
319,-0.38902531703934068,-0.3926982013508678,-0.27698335191234941,-0.028343314770609088,0.010395355522632587,-0.27070038160309196],"mode":"markers","marker":{"color":"rgba(170,59,80,1)","size":2.5,"opacity":0.59999999999999998,"line":{"color":"rgba(170,59,80,1)"}},"text":["Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2","Chart 2"],"hoverinfo":["text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text"],"type":"scatter3d","name":"2","textfont":{"color":"rgba(170,59,80,1)"},"error_y":{"color":"rgba(170,59,80,1)"},"error_x":{"color":"rgba(170,59,80,1)"},"line":{"color":"rgba(170,59,80,1)"},"frame":null},{"x":[-0.80784434337397881,-0.62836918113759199,-0.55580508785188998,-0.51037844630435547,-0.58119754743399166,-0.72894933354297631,-0.64116582787831033,-0.50173122555134653,-0.76929056854116551,-0.70713814566700584,-0.66186722291835987,-0.6592019554778078,-0.66066718286953963,-0.38548292643820492,-0.6555538113326248,-
0.64094212189484123,-0.54553497782691351,-0.82667875514486822,-0.62641882568490181,-0.62474716527985608,-0.73603297401845125,-0.82883797518232893,-0.62109724889318485,-0.54003273837451415,-0.68396857624965446,-0.48155317981038637,-0.82710389287715247,-0.67649029365813695,-0.40007507345765464,-0.77510387806463865,-0.68385096764238762,-0.51525879933826979,-0.81105204224062144,-0.62023985455598574,-0.46897008183358196,-0.71885460467061013,-0.4365579232105144,-0.60179658050697982,-0.88233694523059447,-0.5264531500406634,-0.8390300232870368,-0.3970294321812699,-0.4539572454854105,-0.44290095729510859,-0.75141332893656032,-0.81054617055984746,-0.75051647505408536,-0.77613053345835115,-0.7952886891201506,-0.64410931149836481,-0.76587631258866906,-0.48325032204908386,-0.72292657380466452,-0.79473824122717462,-0.86069679639822294,-0.51708921657932228,-0.81413902875167665,-0.76222825293580598,-0.87867362728850296,-0.82765334597530038,-0.81321208755018937,-0.55624141549033168,-0.67070420655826057,-0.66597514492022825,-0.4976808668439649,-0.51352823760932631,-0.6206782043561011,-0.40565644560206293,-0.49805903048027533,-0.62303170220710924,-0.55678409697652598,-0.76946900792473061,-0.7907481121839951,-0.41661099411889929,-0.42849557231598673,-0.53722620428931411,-0.69833485140313822,-0.54793276903301957,-0.44656020134668467,-0.61837496098550593,-0.70422790366053445,-0.6281595044990933,-0.54270395341419464,-0.76159038867312301],"y":[0.53243092866635189,0.64989058202786121,0.79134437052927209,0.8311133200346742,0.7768664979426797,0.67765866776166417,0.76305359942056783,0.8443925331196529,0.63488698580757208,0.69254216335107122,0.72044725443260016,0.73468416650787838,0.72529052477615219,0.84862908597727049,0.71860683230310773,0.62390553166584972,0.81129714843497147,0.56264870935098232,0.69761999690568477,0.64251804269813728,0.66173249534435963,0.54169050560255771,0.78317359956933985,0.83630577574730169,0.68760299262150482,0.87565814909129847,0.54070803489083019,0.72019956001025442
,0.87411763394465536,0.63165998263180079,0.66240911619737497,0.73022548591652015,0.55124978792302826,0.78410993083992486,0.88085022599095875,0.60735182435441937,0.84277384677139155,0.79859396066816291,0.46246452061806714,0.77336697125137521,0.51864221504178376,0.90926461135907177,0.7837533431860948,0.8643944970393499,0.63520332218817399,0.48034183584609197,0.65887640055159191,0.63056767603943464,0.60617421343275479,0.76482761500159624,0.52480339300252588,0.86578430902632086,0.67189178678314232,0.54402949534270617,0.4401749602202234,0.82424772515950095,0.57363717584913954,0.64618888613673697,0.47677383015490526,0.42634497336702243,0.48066165394449728,0.74656561166788216,0.68386708564180776,0.65860795297649277,0.85709931246984383,0.85100203083002324,0.78402064133459548,0.8312304389341747,0.82403440080073587,0.74227059795088013,0.82836984365230226,0.63703917830695389,0.61198132815992534,0.89288049270619674,0.84148300995555136,0.84026992826469693,0.71569172270836545,0.81988128490655776,0.88875938858496906,0.78536070553163206,0.70310657803779031,0.66493305075984255,0.82248233885695221,0.63059124843001468],"z":[-0.2527940329164266,-0.42754462175071251,0.25466603925451647,-0.22082683444023132,-0.24225576408207422,-0.09701339667662974,-0.081581772305071243,-0.18779517384245986,0.071488015819340986,0.14262185990810392,0.2070930534973742,-0.16028710920363648,-0.19357822230085731,-0.3622590065933764,0.23206339869648232,-0.44714101124554873,-0.2102106679230929,0.0053540319204331485,-0.34774386370554561,-0.44369082059711218,-0.14270797371864316,-0.13999645365402091,-0.029619594104588018,0.09464296558871868,-0.24369881255552162,-0.036460128147154949,-0.15340785961598163,-0.15386187424883238,-0.27542385086417198,-0.014820409938693041,-0.30587908858433355,-0.44864140404388309,-0.19575049448758366,0.021774737164378184,0.064575085882097635,-0.33818902820348745,-0.31488001346588135,0.0094107212498785644,-0.087224321439862113,-0.35320052178576594,-0.16443501086905601,-0.124921961687505
18,-0.42385553708299978,-0.23803549213334921,0.17859100922942167,-0.33509196667000651,0.051057903096079965,0.0024085198529064625,0.0082899895496666553,-0.012724549975246123,-0.37150352960452432,0.12994866864755764,0.16105463588610305,-0.2691152840852738,-0.25582616962492455,0.2307258755899966,-0.09010012401267882,-0.038052777294069411,0.024887177161872536,-0.36499301716685295,-0.32810131832957257,-0.36501407530158758,-0.2871962334029376,-0.35030368342995627,-0.13302076281979672,0.1099285800009966,-0.0083785797469318017,-0.38012998504564149,-0.27000834885984648,-0.24671006761491288,-0.061602528207003938,-0.045809727627783972,0.014009891543537515,-0.17087921267375333,-0.32908644527196884,0.07303734030574556,0.010667397174984286,0.16602517664432528,0.10339601431041959,-0.028652571141719829,-0.09850989608094099,-0.4040787979029119,0.17027983209118244,-0.14944817591458562],"mode":"markers","marker":{"color":"rgba(112,92,131,1)","size":2.5,"opacity":0.59999999999999998,"line":{"color":"rgba(112,92,131,1)"}},"text":["Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 3","Chart 
3"],"hoverinfo":["text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text"],"type":"scatter3d","name":"3","textfont":{"color":"rgba(112,92,131,1)"},"error_y":{"color":"rgba(112,92,131,1)"},"error_x":{"color":"rgba(112,92,131,1)"},"line":{"color":"rgba(112,92,131,1)"},"frame":null},{"x":[0.08732386984206339,-0.22580512344622267,-0.17822338343639457,-0.042613746794336452,-0.087844784514154714,-0.080907739303991616,-0.2658804553818539,-0.21377188528955529,-0.083603685115782603,0.33807161666225449,0.14622337401408486,-0.30305783956064475,-0.014183280315289799,-0.10606045509716983,-0.20991146392707799,0.24514214370888132,0.11848743779659664,0.10175852695064737,-0.22725921245366371,0.10182778511643226,-0.31944440930348289,0.01242793196379346,0.25223243845152066,-0.34542932434751261,-0.058017808470416619,-0.08467839794947285,-0.30431009167243761,0.11438269270370667,-0.22645762274270875,-0.059190243697837465,0.2849495467010974,-0.27373304379745461,-0.27859224075407352,-0.37783042132080713,0.2903014026649115,0.18785334848259816,-0.029158846856530671,0.041395339227985356,-0.034527271156466297,0.16757833495758853,0.064247304217548082,-0.12430587770405202,0.013352075369761799,-0.055810907833129586,0.027256214384781083,0.18132108130565724,0.053003547420055655,0.20899209516193168,-0.24172123450011901,-0.34168407155349656,-0.33880649357711407,-0.053206764941387076,0.26130898407587544,-0.091078606618733626,-0.21459455380615547,-0.12016687584483905,0.188534319167
48442,0.31623478546965622,-0.067141129916429482,-0.33624160168904821,0.17121677738089619,0.093701285319188143,-0.26844558334839208,0.33390154271318656,0.10415852323346038,-0.20091029299146343,-0.1843841189040481,0.15885547088762506,-0.22908810431738316,0.21186861959585357,-0.33810934656423114,0.114392126403549,-0.15335590397314647],"y":[0.93692122508253894,0.97372017377953157,0.89659393794625264,0.88039144487222254,0.99489303255257411,0.88733319724070825,0.90448481883621223,0.9767238323036298,0.99524880781642511,0.89568484619088828,0.98441279873986343,0.94494454392303007,0.99490947739889413,0.96966247405381556,0.95715100342436155,0.96716242364202187,0.96228890063927697,0.99248540508294414,0.94218697616269409,0.96902750269642768,0.94760229579389643,0.95000094060597862,0.89929486964887206,0.92878912093324684,0.86402905868195479,0.97854060289408107,0.95013219743473853,0.99300071734820716,0.96648916390802242,0.9778652334001412,0.89744151727166177,0.83463879467888169,0.96011751530463685,0.91879575900335075,0.95308332122302175,0.92898467199195967,0.95684488249935395,0.99347214031116837,0.90896828396812135,0.95038200595303457,0.95700822727667578,0.99209130244105115,0.97974468663487368,0.87699942694339783,0.99908272974119916,0.98191511253542563,0.99099434712921275,0.96156332880910567,0.9547559780473216,0.93034802382988802,0.89540055451769907,0.99797249300912616,0.96524110895358439,0.88794486417199991,0.92583517980515562,0.99091708208754892,0.97795942489787535,0.94718561226412756,0.99609877613335118,0.93489332931884472,0.95321006056506719,0.99528224655305375,0.95388704128261359,0.90570990691956865,0.99427262913060099,0.94811181875113049,0.88284457990447218,0.95390279319540328,0.95556711359124336,0.88440362958823859,0.88955839819269256,0.99165981293379968,0.97049321509341158],"z":[-0.33845702791586507,-0.029682813212275366,-0.40540811046957975,-0.47232930501922948,0.049710638821125003,-0.45397547818720324,-0.33348882431164373,-0.017667384352534964,0.049902248196303976,-0.2888
8793382793654,-0.097724953666329342,-0.12343319971114389,-0.099769566208124147,0.22024047374725345,-0.19949720287695513,-0.067098253872245397,-0.24486894207075227,-0.06795530067756772,-0.2462457153014837,0.22498177969828245,-0.0022712047211824469,-0.31199961435049772,-0.35727795120328659,0.13427341775968668,-0.50008771196007729,0.18785062525421387,-0.068148187827318754,0.029430850408971355,0.12089516595005989,0.20068906387314203,-0.33675284544005973,-0.47796265874058019,0.023679531179368666,0.11427390761673455,0.085774579085409614,-0.31890217727050169,-0.28913255175575608,-0.1062992583028972,-0.41543293837457901,0.26209071790799515,-0.28285603551194066,0.017403918784111877,0.19980508275330081,-0.47723908862099051,0.033027229830622687,0.054455277509987396,-0.12296677567064768,0.17809623479843142,-0.1732393349520861,-0.13305844506248818,-0.28890830185264343,-0.034927687607705475,-0.0052169365808367929,-0.45084210718050588,-0.31109226495027537,-0.060359410010278232,0.089722760487347883,0.053244496230036013,-0.057265145238488767,-0.11364879272878159,-0.24914934393018479,0.025165826547890922,0.1342999674379827,-0.26115000341087563,-0.023936185985803576,-0.24641232378780831,-0.43195826699957246,-0.2546260012313723,0.18549968162551533,-0.41586284758523107,-0.30719362944364542,-0.059373873285949265,-0.18607763480395079],"mode":"markers","marker":{"color":"rgba(55,126,184,1)","size":2.5,"opacity":0.59999999999999998,"line":{"color":"rgba(55,126,184,1)"}},"text":["Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 
4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4","Chart 4"],"hoverinfo":["text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text"],"type":"scatter3d","name":"4","textfont":{"color":"rgba(55,126,184,1)"},"error_y":{"color":"rgba(55,126,184,1)"},"error_x":{"color":"rgba(55,126,184,1)"},"line":{"color":"rgba(55,126,184,1)"},"frame":null},{"x":[-0.78606200508677559,-0.70607542753722252,-0.5742089148168884,-0.78243261271195208,-0.64320148767003982,-0.70978284732459085,-0.77790473654341763,-0.87684672040471789,-0.73478216532212781,-0.81068856934503231,-0.88868612224799859,-0.83904552222533779,-0.74936935286096351,-0.65460948801795116,-0.83298282535201451,-0.77627574151229739,-0.87551841747588777,-0.76565149794484044,-0.86681413008687824,-0.96404446385029796,-0.91044039274168254,-0.76936657542614595,-0.91606949670961413,-0.70888994014407802,-0.72775725458732288,-0.85890537837833147,-0.71556709502509785,-0.57799226608700449,-0.83432635088120388,-0.65097352461568236,-0.66581703786719515,-0.67988805956736986,-0.92429259235580219,-0.74349839540616891,-0.51916104135883567,-0.59359095171553511,-0.49061550651565883,-0.65544030750452231,-0.54605534485849327,-0.90040041522896885,-0.84223695696591738,-0.84723870107964627,-0.63893416799143332,-0.82588901809539617,-0.74397429147665972,-0.60099899418517211,-0.78574741370265766,-0.63974918057834584,-0.84797929138330785,-0.65230091531189693,
-0.88923001451052086,-0.79709365989781411,-0.78356322163239889,-0.86437898739822316,-0.80956049779089845,-0.72505884253543285,-0.64080499748832032,-0.81597587600002752,-0.78463117919063607,-0.88273541811024747,-0.69722843395531664,-0.79074354644766176,-0.49266694426968538,-0.53027304978523082,-0.9380322360581278,-0.70693038063415348,-0.58314209437229347,-0.69435288691487329,-0.89210800143086222,-0.70743719824515361,-0.87456115034197957,-0.53771320476540874,-0.72053475806216316,-0.71945884317389253,-0.94159717924977027,-0.79557711180189339,-0.65066684738467806,-0.95932612775619974,-0.65364956861835888,-0.81463428147287642,-0.85489555415746432],"y":[0.27058863319515314,0.62747196155411444,0.59246261710753023,0.34221532463609078,0.58242242571179881,0.19756154567246942,0.57161907524608857,0.30976948536985549,0.09886189010511498,0.48802822087473574,0.39284588069664922,0.19623030205438455,0.60055873021206352,0.50855049206214231,0.47092307105230574,0.39354650084255588,0.40049291319557034,0.54835770700237507,0.077498223739922059,0.21325975070598716,0.34461400881871362,0.16454317459045081,0.34510699919356136,0.50343272153574958,0.3669737824840803,0.34792380178382115,0.35596515405870371,0.67814128997328083,0.3976128583531755,0.54550132510454175,0.32983736638040945,0.50967002176236853,0.35778377695755237,0.25318831661899116,0.55078295183186143,0.48914358869762409,0.58246548853726632,0.44472947130018686,0.68368899514383741,0.40376285536443807,0.45135110476525925,0.44720694864315991,0.63826388515521981,0.31981360510274831,0.25990153491353796,0.69490261882100868,0.18542160323741741,0.66603157978593652,0.098019426766741122,0.65080976843191451,0.37608559798443103,0.28338120389450905,0.36404274062916603,0.39205034141186557,0.15573789745837083,0.62208182183428495,0.3955424192755409,0.21835104463057114,0.37706352132112936,0.3622852988184157,0.12527094364706262,0.48843027074315087,0.6139109109379316,0.63903532899380189,0.30503864245274254,0.35217964172211774,0.38225272035998142,0.08050
5151034594477,0.3232115357363281,0.069854926636283746,0.1505980625102821,0.7127867811105677,0.21539551723166844,0.65239812197295466,0.25177609058927075,0.34320970833010161,0.27082578731690005,0.12704782667602046,0.38956284593359558,0.5092401742454371,0.49842551510239153],"z":[0.55577721772715449,0.32823227765038615,0.56504174135625362,0.52027673227712523,0.49706736393272882,0.67614920344203711,0.26099014095962059,0.36767199356108904,0.67105997959151864,0.32343855546787392,0.23645103117451075,0.50744091300293803,0.2788813090883196,0.55934141203761112,0.29046699265018111,0.49246027739718556,0.27032004576176411,0.33627609023824345,0.49257211573421966,0.15855141263455158,0.22877822490409014,0.61725247371941816,0.20424944628030076,0.49399448139593011,0.57939591072499763,0.3758065714500845,0.60104287834838033,0.45392657024785871,0.38184231705963623,0.52788424352183938,0.66924956766888499,0.52724633272737276,0.13294349424540991,0.61895542033016682,0.65353649714961648,0.63905268302187324,0.64809750765562069,0.61042092088609934,0.48414142383262526,0.16203286359086641,0.29482043441385042,0.28669239347800618,0.42939765006303793,0.46435610018670559,0.61559194745495915,0.39485511183738708,0.59010154288262129,0.38356605777516961,0.52088704472407699,0.38852305104956025,0.26044117240235215,0.53323239833116531,0.50348938489332795,0.31487377779558295,0.56600133189931512,0.29547230293974286,0.65796287870034587,0.53526273090392351,0.49211483774706721,0.29921153699979197,0.70581846218556166,0.36899934196844708,0.61675981990993034,0.55717532336711895,0.16445957170799386,0.61336688604205858,0.71681807702407241,0.7151174652390182,0.31571762217208749,0.70331564731895935,0.46093710837885743,0.45033267047256226,0.65911640366539359,0.23823447152972227,0.2236147406511009,0.49926361301913863,0.70942656183615327,0.25209567695856105,0.64883205108344555,0.27757058991119266,0.14396387524902826],"mode":"markers","marker":{"color":"rgba(62,142,147,1)","size":2.5,"opacity":0.59999999999999998,"line":{"
color":"rgba(62,142,147,1)"}},"text":["Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5","Chart 5"],"hoverinfo":["text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text"],"type":"scatter3d","name":"5","textfont":{"color":"rgba(62,142,147,1)"},"error_y":{"color":"rgba(62,142,147,1)"},"error_x":{"color":"rgba(62,142,147,1)"},"line":{"color":"rgba(62,142,147,1)"},"frame":null},{"x":[0.10383887873408955,0.48157435039227775,0.59716695818295529,0.3946098975765403,0.56322208008035779,0.32371651992007772,0.3557492862582981,0.45500329140759671,0.37677792193106174,0.25812356620538568,0.083017516189382509,0.31348591297353401,0.40847199419458546,0.29186162698274365,0.44771319862053049,0.58172550335051743,0.063066314369701906,0.4773130447279293
5,0.59581540905319474,-0.087914790692266237,0.25683242935922779,-0.0074670290486821894,0.30459792078265863,0.38034610004887864,0.22536434649498752,0.13797729965961258,0.65995481705982217,0.056275460327441885,0.62801209868326557,0.37335348034513216,0.07392935625097409,0.29299382888879139,0.011182126121374416,0.38338520985177921,0.26997194637829802,0.32893131508453671,0.20718723908590395,0.46450259101909597,-0.014410662100171962,0.26303850933143252,0.18590716732960294,0.38598614476298576,0.21534176496312118,0.16727746289273596,0.11543688545254863,0.13927292943250591,0.25272992215416312,0.53554202726984212,0.64591889941240199,-0.031689445099527008,-0.089641892151071809,0.15618769344635716,0.34174497103905155,0.085669820844694322,0.31304199402807009,0.49497804343152213,0.21592464543296155,0.21360066903064845,0.20824569052863598,-0.11317139235432057,-0.020461909454804587,0.44276648726020285,0.32529573207323648,-0.11362216449264692,0.048017163757659416,0.08436451153718609,0.51466321524971459,0.44104438832082543,0.17818114426183548,0.2423128254687304,0.49566329711967089,0.33623625635568322,0.36044110939803242],"y":[0.79566664474821058,0.36968591384765032,0.72437600924591172,0.59075820799322765,0.64235552366518478,0.67640313309353695,0.64701185128535232,0.40445594131760804,0.73841295990137579,0.32817361269906614,0.5335999146470144,0.57956954817484485,0.51947132863647105,0.62049582168027406,0.52115405604725851,0.51053125018907297,0.62422781125507509,0.74677556023876024,0.63771530071907301,0.76692723372210192,0.44747952068178332,0.44729418143489152,0.34399038783991459,0.33426996169202822,0.47357073825385665,0.49863075207487895,0.51330806938335916,0.55594615795830016,0.56585567380492685,0.39415057262907099,0.55470978046024189,0.33826403391546828,0.79370810650121848,0.40824477519794122,0.77320936958355058,0.86917653564626252,0.70481041158553326,0.61364428260185011,0.46789389462924541,0.49283155432150916,0.72588546895562134,0.67524185760789357,0.77852751778356444,0.8753845648755
324,0.59278650943645239,0.60728576599993833,0.64483841612279391,0.71942309292938078,0.57694706894127079,0.72141195858623153,0.76534812375263095,0.61912665922104826,0.5961160998204954,0.48429578874321627,0.87190560889849711,0.8030067901915332,0.71221180454250732,0.67238930481027059,0.7212029197800337,0.65581528291657343,0.62740493782695972,0.4239624684612448,0.40485822479359673,0.5331852477027802,0.72897021864327438,0.63543428917713773,0.65496985537325414,0.60082425545500495,0.60870520601933809,0.47771664898133775,0.56550518297807939,0.68811547856973287,0.39963630250923127],"z":[-0.5967680267058314,-0.79461844312027097,-0.34448660537600517,-0.70376684237271558,-0.51976943900808681,-0.66157873021438718,-0.67440203856676806,-0.79333939542993914,-0.55927157821133722,-0.90866622282192111,-0.84165267366915941,-0.75221387995406996,-0.75053325621411204,-0.72787480102851998,-0.72660260228440166,-0.6332087186165154,-0.77869267342612147,-0.46313984924927343,-0.48818356543779368,-0.63568357517942786,-0.85662079229950916,-0.89435572270303976,-0.88819520361721516,-0.86232269881293189,-0.85143506340682518,-0.85576260602101684,-0.54861139738932263,-0.82931112498044979,-0.53423605300486066,-0.83979313215240836,-0.82875298475846637,-0.89427739521488547,-0.60819602245464921,-0.8284636288881303,-0.57381392363458883,-0.36923751141875966,-0.67846571886911999,-0.6384967011399566,-0.88366715237498283,-0.82941413158550847,-0.66221507918089617,-0.62854047585278761,-0.5895106685347854,-0.45356379263103019,-0.79704358708113443,-0.78218095703050483,-0.72132586501538754,-0.44229532033205043,-0.49992084875702858,-0.69178072037175287,-0.63734337734058477,-0.76960222469642769,-0.72653697105124604,-0.87069987412542105,-0.37654391396790732,-0.33192895539104927,-0.66793030546978116,-0.70870824530720711,-0.66068152757361531,-0.74639034736901511,-0.77842427650466561,-0.79007193539291631,-0.8545598308555783,-0.83833376131951809,-0.68285926245152961,-0.76753234025090944,-0.55329581908881664,-0.66670087864
622463,-0.77312964759767056,-0.84443549066782009,-0.65918266354128729,-0.64299476519227028,-0.84283630223944783],"mode":"markers","marker":{"color":"rgba(69,158,110,1)","size":2.5,"opacity":0.59999999999999998,"line":{"color":"rgba(69,158,110,1)"}},"text":["Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6","Chart 6"],"hoverinfo":["text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text"],"type":"scatter3d","name":"6","textfont":{"color":"rgba(69,158,110,1)"},"error_y":{"color":"rgba(69,158,110,1)"},"error_x":{"color":"rgba(69,158,110,1)"},"line":{"color":"rgba(69,158,110,1)"},"frame":null},{"x":[-0.96151498933112001,-0.96821556546481113,-0.93286677210069457,-0.99435675779007482,-0.97611525409026634,-0.91089686297545269,-0.93360386308169696,-0.94233388818847563,-0.82922298810545558,-0.98539703575972559,-0.96680049300150994,-0.99459923036494513,-0.97556626685028802,-0
.99702390396625284,-0.92874068231965656,-0.98537669111829373,-0.83816099614763684,-0.81535412604555557,-0.85546200880880874,-0.86403961215384606,-0.89439619061838294,-0.92130422957288149,-0.98869201491662895,-0.93145918896136204,-0.95211221270650348,-0.99472582258103737,-0.93516660453611977,-0.92051082406766471,-0.90617235437086674,-0.85404136466526714,-0.98406559394060511,-0.88710960109845816,-0.97190658089637416,-0.98406595336721614,-0.95785657492616127,-0.868253712765928,-0.97718346366816144,-0.99250789536720374,-0.85001889754844639,-0.95339924488057259,-0.84612542232573573,-0.86018049845040878,-0.95880956630541181,-0.92041939357256275,-0.95704428239561112,-0.94166057543510162,-0.99515626366903109,-0.90805499110628596,-0.98969401533468382,-0.94725855608325271,-0.91482030366812361,-0.90231044987858089,-0.99000343436024463,-0.97061798986010472,-0.98772901506752064,-0.986024090228801,-0.96245772727102408,-0.86990462395919255,-0.98970297955887054,-0.99386752566525172,-0.94464794742694602,-0.96060900672671723,-0.92550561796938846,-0.82895030819349158,-0.99257402010006079,-0.97995700597573132,-0.95032169647236464,-0.92176810385388064,-0.87920808020281549,-0.86230315285126835,-0.90060058567542867,-0.99409878071902347,-0.84446801945679495,-0.94351332820369294,-0.97438361201036749,-0.98966117167207779,-0.82537095970672003,-0.93661480190636859],"y":[0.27162400616401577,-0.22317517922687763,-0.12589278447234084,-0.018725063153571666,0.11691441774757072,-0.065329154755329522,0.1751947558628579,0.3289217798892326,0.067887302523594525,-0.15682804355721952,0.00024270386576435648,0.077932540049628532,0.21292367324581896,0.06947833016365354,0.098438197093208904,0.055887786192222302,0.39588891084737049,-0.059880080954923393,-0.18054404272297828,0.17092893240357671,0.3909278960289338,-0.092734343754216159,-0.01674173205897456,0.029575002062003638,-0.22311354516953641,-0.10224854359834699,0.32505660831587868,0.3380899251849192,0.36471195297337888,0.028717842944080935,-0.173092320124
78382,-0.17141804257948143,0.22349775283118323,0.057576820023631306,-0.13705457414557398,0.31802188338420556,0.1467512226521682,-0.10057214480501227,0.039883640944804059,0.089913955192236064,0.021049628511153162,0.17904156061075302,0.28221398884690985,0.30129111620847415,0.28983419701724966,-0.22909979416420059,0.084467317940897901,-0.12699132559798776,-0.068819915781534263,0.24021339211675713,-0.25095042722226957,0.040040394590689199,0.040697339794647631,0.22530641347518815,-0.068074490942309412,0.14723029204284852,0.016759549182991627,0.24219902415249422,-0.087200376671180801,0.095741206221265668,-0.054316840073821038,-0.26038877051654435,-0.19460402918395836,-0.056823557446794043,-0.11841883812321291,0.12395147521269179,0.30117098803684361,0.32501622618406162,-0.083521437410480556,0.37071780071999078,0.34200792322532003,0.012763901301040024,-0.0014497385773381303,0.01280829327838864,0.02565159607649484,0.12275013903393742,0.22654980851328871,0.24722424967942838],"z":[0.04134397860616458,-0.11292235460132349,-0.33750643301755195,0.10442226892337213,0.18311206856742498,-0.4074297565966844,-0.31255499413236965,-0.061784349847584803,-0.55477973120287061,-0.066314754541963283,-0.25553228333592409,-0.068548451177775735,-0.0541661181487142,0.0334080313332379,-0.35742225218564277,-0.16096375975757823,-0.37517744442448009,-0.57585764303803422,-0.48537470074370498,-0.47351752733811736,-0.21732656145468343,-0.37762263976037502,-0.1490228641778229,-0.36264183232560759,-0.20905186049640173,-0.008110069204121783,-0.14071824029088009,-0.19584439042955631,-0.21409543557092536,-0.51941181439906359,0.040668852161616038,-0.42854685895144945,-0.073799407109618048,0.16822339082136756,-0.25244172709062701,-0.38078546710312366,0.15354659548029306,-0.069378104060888277,-0.5252401060424744,-0.28800236200913776,-0.53256800770759583,-0.47752866894006729,-0.032240968663245419,-0.2491019936278463,-0.0078980866819618953,-0.24655353371053923,-0.050291978288441778,-0.39913573674857616,-0.125576
96923613548,-0.2121290978975594,-0.31642960524186492,-0.42922327388077991,0.13504416495561614,0.084485133644193397,-0.14056050824001434,0.077972652856260635,-0.27091371454298485,-0.42965751234442001,-0.11350817838683724,-0.05532596912235023,-0.32355824252590537,-0.097098014317452741,-0.32491325447335839,-0.5564283151179551,-0.02781714219599957,-0.15594966569915417,-0.078642921987921027,-0.2114427001215517,-0.46906004007905705,-0.34496606327593327,-0.2682334161363541,0.10772509919479485,-0.53560401638969779,-0.33108691778033955,-0.2234246456064283,-0.07418334484100339,-0.51714404486119736,-0.24825970921665433],"mode":"markers","marker":{"color":"rgba(77,175,74,1)","size":2.5,"opacity":0.59999999999999998,"line":{"color":"rgba(77,175,74,1)"}},"text":["Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 7","Chart 
7"],"hoverinfo":["text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text"],"type":"scatter3d","name":"7","textfont":{"color":"rgba(77,175,74,1)"},"error_y":{"color":"rgba(77,175,74,1)"},"error_x":{"color":"rgba(77,175,74,1)"},"line":{"color":"rgba(77,175,74,1)"},"frame":null},{"x":[-0.3599167746767547,-0.23355958536654514,0.14838658894708182,-0.34488478508343873,-0.095756391840087857,-0.16834286316500499,-0.12050563202449165,0.20749764914986099,-0.032598720229986479,-0.3959433364716563,-0.17726337601225781,-0.069911664919915018,-0.30791339571233778,0.28931482908012368,-0.39368246328020717,0.15133592023670925,0.12168890466985878,-0.45288152308937624,0.16633869623673295,0.15637142509223448,-0.36528643382003334,-0.18581958682082866,-0.31965238385326133,0.15308092597573927,0.35937144215511346,-0.32699386470267833,-0.0092401412146512867,0.0044803837193941112,0.10222786988058062,-0.25121580266043153,-0.03264138936532629,-0.16663578125930351,0.27317361753423341,-0.31753947202769078,0.0023642690427844873,-0.28061015384287269,0.0092601691491279968,-0.18484390022355673,-0.042365543631822468,-0.14151097936349402,-0.13170915286232976,0.14716296860819089,0.0037470914567929365,0.092742457560282932,0.27392651688820052,0.29025488771289626,0.10905509447055499,-0.25390902082361039,-0.43123789786464739,0.24805404438416145,0.15865256297110544,0.097405979812016638,0.28781131174899288,-0.15506905728213893,-0.31517398051698459,0.16922277042500858,0.22969626221659872,-0.37885658863285471,-0.13931158154996781,-
0.43040735181693779,0.1325684705028442,-0.047089625987290169,-0.45630367399585792,0.22019340543473195,-0.4221426489767281,-0.2795757569935553,-0.1012850637303867,-0.21067834601110458,-0.3072983759803472,-0.18009092909333441,0.34288181441548365,-0.18672429255576717,-0.11824492223498019,-0.050854560064148242,-0.035562547241419981,-0.39644703748016685,-0.03307012600721633,-0.07151453185685662,-0.061798749480164297,-0.17150504436407449,-0.12889001881111672,-0.10585102515971527,-0.1521792554212871,0.057959258413511475,-0.44501204316174714,0.016807322328133031],"y":[-0.556525241795089,-0.43138219209115564,-0.43602970859656226,-0.18769298880622631,-0.10781275395894056,-0.22579168218538853,-0.23272488706660507,-0.23459887649892805,-0.026001289673893615,-0.51831074573436808,-0.51345398394777853,-0.21487743605059176,-0.36087131960946855,-0.24411208858593994,-0.36192972274416846,-0.34523766690847685,-0.2477248748386624,-0.37534481001988357,-0.084059659195420536,-0.13825933756376294,-0.52445660991799603,-0.47993635597247819,-0.39635360649493545,-0.060510600290541144,-0.21610219080697796,-0.28609678663445681,-0.45358145898950264,-0.06765812038320658,-0.38319291186873594,-0.28928094053081904,-0.15277136266319005,-0.60216571903369043,-0.13311691474871445,-0.23341261201793181,0.029925670809208695,-0.56140966658517855,-0.47808491755822852,-0.07314924956552496,-0.43980562023065117,-0.33201658424840502,-0.15261576829200549,-0.09097381232907463,0.06191994981402018,-0.52529563695942449,-0.37121290344764507,-0.40005750069485796,-0.1945952723170836,-0.15119778630806766,-0.46838912006173777,-0.26854539748947504,-0.50220629097603442,-0.45992243273240957,-0.22700220943163843,-0.58515619056895107,-0.30881990792111419,-0.25553188431898477,-0.19094626742275222,-0.46913685168503899,-0.42588318016161014,-0.25568723993854769,-0.44264926800803872,-0.24246239066761635,-0.48398142871239747,-0.31134802304140435,-0.17683871070047771,-0.33780184173148015,-0.19090351819072396,-0.50407641349355803,-0.0864
71257577074051,-0.11523192025640851,-0.22419633747639098,-0.13296321754476376,-0.56218753371197083,-0.12970352730519435,-0.49070751628211506,-0.36693867902757382,-0.28040494827817442,-0.41765087935847645,-0.092176188011882834,-0.20456653760362628,-0.24200529360427336,-0.27135350397862867,-0.11546976125852684,0.077206417656396603,-0.4181448854705837,-0.40709829921909629],"z":[0.74882546067237854,0.8714122585952282,0.88761450722813606,0.91968789650127292,0.98954894952476025,0.95952008664608002,0.96504793642088771,0.94968841876834631,0.99913024995476007,0.758012430742383,0.83960865996778011,0.974135538097471,0.88031882373616099,0.92558425758033991,0.84499762952327728,0.92623344389721751,0.96115773776546121,0.80871169129386544,0.98247921699658036,0.97797358501702547,0.76909757871180773,0.85739849274978042,0.86065450217574835,0.98635931452736259,0.90782818291336298,0.9006795440800488,0.89116692030802369,0.99769850401207805,0.91799382073804736,0.92369213374331594,0.98772235494107008,0.78078739950433373,0.95270985178649426,0.91906867874786258,0.99954933067783713,0.77850955538451672,0.8782648011110723,0.98004179494455457,0.89709318196401,0.9325983221642673,0.97946981899440289,0.98491970542818308,0.99807408498600125,0.84585065487772226,0.88722333358600736,0.86931357765570283,0.97480237297713757,0.95533734280616045,0.77113261353224516,0.93078061891719699,0.85006952984258533,0.88260038010776043,0.93039488699287176,0.79595591593533754,0.89738265331834555,0.95187557488679886,0.95434749964624643,0.79773322585970163,0.89398870244622231,0.86566364532336593,0.8868411504663527,0.96901731472462416,0.74669199390336871,0.92443348746746778,0.88911397149786353,0.8987365085631609,0.97636938840150833,0.83756886515766382,0.9476763317361474,0.97687709657475352,0.91223246138542891,0.97337290970608592,0.81851286813616753,0.99024785216897726,0.87059832224622369,0.84153773076832294,0.95931195747107267,0.90578883560374379,0.99382315576076508,0.96371082356199622,0.96167603740468621,0.95664143562316
895,0.98158453963696957,0.99532903777435422,0.79190854029729962,0.91322970204055309],"mode":"markers","marker":{"color":"rgba(101,142,103,1)","size":2.5,"opacity":0.59999999999999998,"line":{"color":"rgba(101,142,103,1)"}},"text":["Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 8","Chart 
8"],"hoverinfo":["text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text"],"type":"scatter3d","name":"8","textfont":{"color":"rgba(101,142,103,1)"},"error_y":{"color":"rgba(101,142,103,1)"},"error_x":{"color":"rgba(101,142,103,1)"},"line":{"color":"rgba(101,142,103,1)"},"frame":null},{"x":[-0.11666413169093415,-0.24736557465378331,0.030757363451849231,-0.25228658282535188,-0.38645888434283659,0.021208717367138273,0.15597462920028832,-0.0016349018419069391,0.075664266627518129,0.019411760692241813,-0.030813963531450033,-0.21909619683109754,0.038198010733356089,-0.39364540681873433,-0.018781844282762652,-0.47312473626085766,-0.068251094509597932,0.019853074203819154,-0.08776932026676286,-0.49509568045389307,-0.11408548239839141,-0.32939047163583357,0.092016857369616789,0.021156115110809177,-0.43031437049947535,-0.19975080819936095,-0.14923425012464203,-0.29884475787151449,-0.33490250732233645,-0.12320749282861912,-0.2962621653580933,-0.48927772408781312,-0.34913676269450222,-0.031617452760191324,-0.050321477802836065,-0.27737357597871004,0.073238017755678758,-0.23667646414253049,-0.16582932898264208,-0.2155728718950351,-0.01051805888434857,-0.27730857644860962,-0.52005162255756932,-0.25072657793765912,0.069401571152157449,-0.32433192249842224,-0.47318745487737601,-0.19761894925305928,0.0081793984227139117,-0.3794082064759326,-0.25782409874269951,-0.35949872357537355,-0.49264870200527588,-0.29565930733920087,-0.13149692313074979,0.096
332912479825422,-0.064176546343506544,-0.064890143992612045,-0.23493375469512567,-0.16849223099461147,-0.33420858310287527,-0.44468834946670061,0.017448640144894459,0.11995459952293092,-0.094326122145092409,-0.23166376342155712,0.16782005558833019,-0.30112067250404889,0.089169434544269691,-0.22249803764833118,-0.31193775205589191,-0.15177499622365534,-0.41376288296004715],"y":[-0.99263651204301051,-0.96301951530592955,-0.92987631588994957,-0.85500237305940929,-0.74735915699960243,-0.84244921541247386,-0.88624737234834261,-0.82848551244867363,-0.86631826428012138,-0.96500420044802926,-0.901487644495928,-0.96941490364197225,-0.82502406765850733,-0.90450207303889851,-0.96510234443581044,-0.86165734312512909,-0.82358113419364209,-0.78790589948768308,-0.99459552950618457,-0.78771474313915701,-0.92838011867805204,-0.83269097711301487,-0.94803492179444449,-0.88093150892876171,-0.72382087297531195,-0.83410950814260953,-0.84458120402042369,-0.85886487463114702,-0.92968088400039217,-0.97501463741883276,-0.75089669924657898,-0.79183116033940337,-0.80277771561688416,-0.84599661612926591,-0.96618439797490141,-0.91232760568735727,-0.81588406611687925,-0.94040338375940147,-0.87294122724706091,-0.92355390938653481,-0.82443982166198515,-0.79572973723286766,-0.83520303400728046,-0.66116818968133528,-0.97166323662906873,-0.91242607660599662,-0.73929149175869913,-0.90112794936393081,-0.98269242243717148,-0.90309701150789812,-0.9320545673790821,-0.80710540113339591,-0.76771670325117958,-0.75878225659739074,-0.68102761423143066,-0.95501701476853973,-0.99682001390223174,-0.98386939233538384,-0.74745337551170343,-0.96555686066998569,-0.79230203366352747,-0.84375654167353897,-0.9449162756473799,-0.8483334747552751,-0.83007447417536917,-0.76512580585780565,-0.87079977102593864,-0.72736711774360541,-0.92917444726668408,-0.97028047020430663,-0.78008874460655997,-0.96705198894272892,-0.89167956906958712],"z":[-0.032591952010989064,-0.10678804060444227,-0.36658426281064738,-0.45312517276033748,-
0.54046629974618543,-0.53835816122591484,-0.43616225197911246,-0.5600081095471976,-0.49372824886813743,-0.26151496451348061,-0.43170652817934768,-0.11059656925499425,-0.56380510795861472,-0.16407100157812241,-0.2611985970288514,-0.18357452703639851,-0.56307717366144061,-0.61547554703429341,-0.055464214645326082,-0.36658662091940641,-0.35368751455098385,-0.44511532643809909,-0.30456967186182732,-0.47277065832167853,-0.53936340846121311,-0.51416042540222406,-0.51420980971306574,-0.41598429996520275,-0.15340718533843756,-0.18484147405251858,-0.59023967711254932,-0.36552800470963104,-0.48337507201358659,-0.53224999969825137,-0.25289416359737515,-0.30120132677257055,-0.57355852657929052,-0.24418379785493016,-0.45877472404390579,-0.3171380036510526,-0.56585188070312153,-0.53843675460666407,-0.17883568396791813,-0.70710169570520531,-0.22595127020031205,-0.24957455554977051,-0.47910512704402219,-0.38589528342708945,-0.18506404384970659,-0.20115963974967591,-0.25456043984740967,-0.46833912841975672,-0.40976617718115443,-0.58037493145093333,-0.7203540434129535,-0.28046830743551243,-0.047235905658453724,-0.16670419322326777,-0.62138521252199996,-0.19826829526573414,-0.51045284839347005,-0.30054478906095017,-0.32684671785682445,-0.5156948803924023,-0.54961709398776304,-0.60076151834800828,-0.46210841555148358,-0.61665502237156022,-0.35872504580765957,-0.095133760478347443,-0.54236186202615499,-0.20438884804025295,-0.18359690299257625],"mode":"markers","marker":{"color":"rgba(126,110,133,1)","size":2.5,"opacity":0.59999999999999998,"line":{"color":"rgba(126,110,133,1)"}},"text":["Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 
9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9","Chart 9"],"hoverinfo":["text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text"],"type":"scatter3d","name":"9","textfont":{"color":"rgba(126,110,133,1)"},"error_y":{"color":"rgba(126,110,133,1)"},"error_x":{"color":"rgba(126,110,133,1)"},"line":{"color":"rgba(126,110,133,1)"},"frame":null},{"x":[0.84716923297785385,0.79908672476228393,0.87096631232768162,0.87668948786450562,0.99092515073210752,0.95416080693026406,0.74155913049247812,0.96883391349909687,0.97228506023008898,0.74246271398978247,0.98239383442301298,0.81228898823990681,0.86431939404191294,0.74919246998335198,0.81421224086814681,0.96281242335731332,0.75977236333202158,0.76566870385383534,0.71520112768477184,0.73826197632774415,0.91892792084897357,0.78425365329711427,0.89118283594815606,0.93049356452704268,0.90968211316748315,0.99058315013439346,0.91607930102393564,0.99360309671474989,0.83854528694508579,0.99380726877914216,0.87411771531185867,0.94771041518387111,0.79778121873473729,0.73372174156811854,0.93077683377537912,0.6766549856796249,0.74937958491184931,0.91197618472331476,0.83519773466632219,0.82063935808704691,0.990771708420357,0.79251572861579633,0.88978196204969784,0.72624182473681398,0.87542460120840726,0.88185341045436438,0.7476108920701842
7,0.7224046240256411,0.67341352345443461,0.94915341961282607,0.95339170889456371,0.9614711056346178,0.80329206389622998,0.78025803184513243,0.89259477445381952,0.99392376040276942,0.81060917882646277,0.95716652556586257,0.92789554122400297,0.91563584759137095,0.85726524322851561,0.93102177064279723,0.83407385098415254,0.98019016235885692,0.86140704479514307,0.98795126832390412,0.95757391465460284,0.90336177297209708,0.94820871534420792,0.93751755683592986,0.87235753060561849,0.98988362829485443,0.93602585065514898,0.82231098097576549,0.86619402659815647,0.86337869930232747,0.95185263093767414,0.85346470311599032,0.83197139984331892,0.87324433613405916,0.97587670654147807,0.84250503863527526,0.87299602830207124,0.92148686239063871,0.90517310514931748],"y":[0.20348552728913719,0.11551452450478612,0.1801121955195378,0.26789491443951458,-0.10079121190468354,0.20880154363464523,0.14387572754208594,-0.14075755989185637,0.10665601681200254,0.021108368832741899,0.10425635634198128,0.13060962595135808,0.38364113691549606,-0.111853776241542,-0.011615326959860875,-0.03967896210238215,0.33304407085353049,0.1939727613972532,0.0028872700636324152,-0.10050194582534831,0.013609739778835315,0.18602121152870521,-0.11526357174548979,-0.074989462646009808,-0.16910079427492788,-0.013166134642053675,0.12131768881178502,-0.070152470134958983,0.25871335197013312,0.016213231022295094,0.13298795614860726,0.045058030078184264,0.018730732814802,0.21270999956026568,0.29927952773572875,-0.040436412837378456,0.34517419485573247,-0.11174113350683203,0.34266353159315982,-0.19590662601689732,-0.013351959791644497,0.2332147547013981,0.37956210147664154,0.30315493579995439,0.14854601201838574,-0.084268558380788811,0.32392657255253454,0.16001257587330167,-0.036642439277641055,0.24546491832928627,-0.073881277603268097,0.060128915759437303,0.41021277815803198,-0.133882830075682,0.092509525615633331,-0.070934970158045049,0.20892975526005539,-0.18779794888234683,0.30243566323821092,-0.28986146469574531,0.3
7947032810706294,-0.27407152339684882,-0.066924723348565571,-0.033057782011154883,0.42254641225114781,-0.11871479230654898,0.054114584506339769,0.12685346852495044,0.026936383961783387,-0.25913681144851131,-0.24450556615506486,0.029766354467221218,-0.19250970836467851,-0.15425525091682857,0.11495315840254045,0.019250502238511661,-0.13317958843772756,-0.030688816601782899,0.16671935394207232,0.365353610125079,-0.17123768932147937,0.089619342125485002,-0.01771554298066276,0.26519774458882095,0.30738928011226424],"z":[0.49081353982910525,0.59001423791050922,0.457140328362584,0.39955957839265471,0.088929619640111909,0.21442730678245425,0.65527843777090322,0.20383365126326694,0.20805349247530111,0.66955474391579628,0.15502569545060413,0.56844324711710226,0.32522524986416107,0.6528394715860486,0.5804511271417141,0.26724112220108509,0.55841526016592979,0.61329112481325865,0.69891273463144898,0.66698471736162901,0.39419062808156019,0.59189721755683422,0.43875672295689577,0.35855028498917829,0.37931962031871086,0.13627793500199925,0.38220509234815842,0.0884958594106139,0.47948848083615297,0.10992835694924007,0.46715353289619094,0.31593471299856896,0.60265602683648467,0.64529594918712985,0.20996725931763652,0.73518904158845544,0.56505310628563166,0.39473200729116803,0.43014700757339591,0.53681620489805937,0.13488197419792408,0.5634977356530726,0.25341797713190317,0.61698451917618513,0.45996287884190684,0.46393250860273844,0.57978403707966208,0.67270166845992219,0.73835733765736222,0.19711610767990367,0.29255051910877239,0.26824956014752399,0.43179548019543301,0.61096054827794444,0.44126698980107909,0.084165245294571048,0.54704763647168875,0.22037280397489681,0.21804250543937098,0.27855219598859549,0.34799220226705085,0.24100469425320625,0.54757820675149571,0.19528038473799822,0.28183759981766349,0.09929294791072614,0.28306149458512664,0.40968976635485882,0.31650381255894905,0.2321829958818854,0.42333127325400705,0.13872406631708159,0.29461096227169031,0.54773165704682469,0.48
630615836009383,0.50418909126892697,0.27611549431458116,0.52024628501385461,0.52917695231735706,0.32242994429543625,0.13543377490714203,0.53118135640397668,0.48740547196939599,0.28378181532025348,0.29355319822207104],"mode":"markers","marker":{"color":"rgba(152,78,163,1)","size":2.5,"opacity":0.59999999999999998,"line":{"color":"rgba(152,78,163,1)"}},"text":["Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 10","Chart 
10"],"hoverinfo":["text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text","text"],"type":"scatter3d","name":"10","textfont":{"color":"rgba(152,78,163,1)"},"error_y":{"color":"rgba(152,78,163,1)"},"error_x":{"color":"rgba(152,78,163,1)"},"line":{"color":"rgba(152,78,163,1)"},"frame":null},{"x":[-0.43343551692761551,-0.25584032143929408,-0.46046991654698372,-0.48736120982820236,-0.44478553611461891,-0.32815496870302358,-0.2927038774873737,-0.36845748713261772,-0.59419658454966184,-0.55840461841908973,-0.36615140626316495,-0.2488610887689709,-0.35164371219886714,-0.23616986920844246,-0.50529703131325598,-0.23232335052098588,-0.47385608910889682,-0.39850722374158165,-0.074819803231724102,-0.42989176283810721,-0.16125777125512622,-0.22257832926485729,-0.5746968426929201,-0.24830216137928637,-0.31997290336427447,-0.23683470580659052,0.024401121555230377,-0.66539055093268096,-0.50339331249769725,-0.5337917345692833,-0.0040464281713065266,-0.47591601528086536,-0.23606156672585585,-0.57999395092040817,-0.44298587539708745,-0.25220431842993807,-0.22321731043278617,-0.53113212255888564,-0.40917053696236566,-0.24372337815684345,-0.49182878978269678,-0.070023709127922676,-0.3310468297763895,-0.12536406197127778,-0.22904732922278973,0.028735737119957621,-0.61212087255394465,-0.63917659203900923,-0.27141144926598215,-0.35867081360051839,-0.19124417902655552,-0.4157352354128952,-0.54108434304575781,-0.40660572577874726,-0.035624267722908325,-0.4245187443345627
1,-0.23351294827871394,-0.072345901703402382,-0.31749456526117065,-0.41883867283685466,-0.61299926427757656,-0.31046277003504186,-0.60881095745536384,-0.52425484883688778,-0.36486495887822218,-0.49531759871295106,-0.38629481023937173,-0.18058582365040213,-0.094124690624267829,-0.23121608829214288,-0.52163656487861398,-0.23199200232692568,-0.35009183747862643,-0.64764232469899841,-0.56825710491790571],"y":[0.55256336873831524,0.28149457688669366,0.023868387956185861,-0.025836044589537387,0.30753158842652317,-0.0079764383192590018,0.55662235341292221,0.57027665145339601,-0.13299528959368914,0.12983696570294043,0.22437310445236086,0.36832698986151752,-0.06495826328160173,0.32940749091147331,0.082891743998979542,0.38362224454768729,0.28180464124275983,0.095726328616079062,0.49983971145464101,0.32094858611784072,0.33289752121421773,0.11899172951820104,0.31985687914039623,0.33435564750456565,0.45266550874257189,0.25902327361061261,0.19644173427582348,0.051616294666287595,-0.13552486238529582,0.35673926582441989,0.2144312359544796,-0.11611644232167417,0.056309243400912583,-0.072183855502467517,0.11101007003278095,0.41645935089714964,0.51918866941708264,0.10744416285333271,0.31573561649788845,0.40805464293262034,0.2431066764073461,0.30458299269395778,0.2512115119173407,0.26945048808695399,0.38041085273986497,0.17248299165881317,0.21909766146795664,0.15260346233654229,0.037782523860005884,0.30134673009201135,0.039990426087143406,0.42322058139475599,0.19760380349600301,-0.11306219958511016,0.40429097281563198,0.51097111647597349,0.5363069334088667,0.36833029174338799,0.048421529100044172,0.5137205674177785,0.0527841476673701,-0.0065310662857745805,-0.037678435787612305,-0.15929408843269796,0.18698711478638291,-0.089368989725173556,-0.056489402216473077,0.20725795784599269,0.50548912671503654,0.43778304626588538,-0.10980388407701726,0.039009048202379222,0.29651523785174827,0.17877665822246136,0.25899661606138669],"z":[0.71190404985100031,0.92482783971354365,0.88735435763373971
,0.8728181654587388,0.84118377836421132,0.94459022488445044,0.77749340562149882,0.73418228048831224,0.79324817797169089,0.81934525351971388,0.90310013713315129,0.8957697176374495,0.93387746717780828,0.91417421633377671,0.85895510297268629,0.89378959173336625,0.83429404348134995,0.91215594206005335,0.86288021178916097,0.8439105860888958,0.9290722100995481,0.96762588620185852,0.75326961698010564,0.90914923837408423,0.83229278400540352,0.93638467835262418,0.98021182930096984,0.74470878392457962,0.85336286807432771,0.76668342901393771,0.97673070570454001,0.87179178604856133,0.970105255022645,0.81141635915264487,0.88962929276749492,0.87347271898761392,0.82499524718150496,0.84044893970713019,0.85609023598954082,0.87982402974739671,0.83606434287503362,0.94990835385397077,0.90956075815483928,0.95481950463727117,0.8960049687884748,0.98459325358271599,0.75980540411546826,0.75376751553267241,0.96172153251245618,0.88348480220884085,0.98072750028222799,0.80501462938264012,0.8174230670556426,0.90658078668639064,0.91393638774752617,0.74745725886896253,0.81107741687446833,0.92687586368992925,0.94...</script></div></div></section>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://rworks.dev/posts/atlas-learn-sphere/"> R Works</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/the-atlas-learn-approach-to-the-manifold-hypothesis/">The Atlas-Learn Approach to the Manifold Hypothesis</a>]]></content:encoded>
					
		
		<enclosure url="https://rworks.dev/posts/atlas-learn-sphere/geo.png" length="0" type="image/png" />

		<post-id xmlns="com-wordpress:feed-additions:1">401347</post-id>	</item>
		<item>
		<title>More crochet/programming thoughts</title>
		<link>https://www.r-bloggers.com/2026/05/more-crochet-programming-thoughts/</link>
		
		<dc:creator><![CDATA[Maëlle&#039;s R blog on Maëlle Salmon&#039;s personal website]]></dc:creator>
		<pubDate>Tue, 19 May 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://masalmon.eu/2026/05/19/crochet-again/</guid>

					<description><![CDATA[<p>Here I am again, writing about crochet and programming!<br />
I’ve continued creating <del>my tons of shitty stitches</del> cute creatures.<br />
New Git analogy! The crochet lifeline<br />
First Git/crochet analogy.<br />
I read a great crochet book about creating your own amigurumi pa...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/more-crochet-programming-thoughts/">More crochet/programming thoughts</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://masalmon.eu/2026/05/19/crochet-again/"> Maëlle&#039;s R blog on Maëlle Salmon&#039;s personal website</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report an issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<p>Here I am again, writing about crochet and programming!
I’ve continued creating <a href="https://masalmon.eu/2026/01/26/amigurumi/#the-importance-of-practice" rel="nofollow" target="_blank"><del>my tons of shitty stitches</del></a> cute creatures.</p>
<h2 id="new-git-analogy-the-crochet-lifeline">New Git analogy! The crochet lifeline</h2>
<p><em><a href="https://masalmon.eu/2026/02/15/stitch-markers-git-commits/" rel="nofollow" target="_blank">First Git/crochet analogy</a></em>.</p>
<p>I read a great crochet book about creating your own amigurumi patterns: <a href="https://www.editions-eyrolles.com/livre/creer-ses-propres-modeles-d-amigurumis-au-crochet" rel="nofollow" target="_blank"><em>Créer ses propres modèles d’amigurumis au crochet</em></a> by Clotilde Massot and Lise Grandjonc.
One of its authors (Clotilde Massot) is a software developer who, among other things, published an octocat pattern that I bought and <a href="https://bsky.app/profile/masalmon.eu/post/3mjw2zxta2k2z" rel="nofollow" target="_blank">used</a>…
Anyway, in the book they explain that when you create a pattern, you will probably have to undo your work several times.
Undoing a round is easy if you have a stitch marker in the first stitch of the current/last round: you undo until you hit that stitch marker.
But what about undoing several rounds?
In that case, you’ll be better off if, instead of using a stitch marker at the beginning of the current round, you use contrasting yarn stuck <em>under the first stitch of each round</em>.
That yarn creates what the authors of the book call your <strong>lifeline</strong>!</p>
<p>Now, if that’s not a great analogy for commits and the ability to reset…</p>
<h2 id="communities-of-practice">Communities of practice</h2>
<p>If you’ve had the opportunity to attend or watch the wonderful <a href="https://yabellini.netlify.app/talk/2025_user2025/" rel="nofollow" target="_blank">useR! 2025 keynote talk by my rOpenSci colleague Yanina Bellini Saibene</a>, you’ve heard of communities of practice.
In her talk, Yani quoted Etienne Wenger, who defined communities of practice as <em>“groups of people who share a passion for something that they know how to do, and who interact regularly in order to learn how to do it better”</em>.
Yani mentioned her swimming team and English conversation club.
Well, I found a community of practice for crochet: a stitch club at a local café!
Participants meet up to crochet side by side, talking a lot about crochet: comparing yarns, exchanging tips, etc.
The first time I went, we even started with a round of introductions where we said what each of us would work on during that meeting, which reminded me of <a href="https://ropensci.org/coworking/" rel="nofollow" target="_blank">rOpenSci coworking sessions</a> where participants do exactly that.</p>
<h2 id="usefulness-of-seeing-others-work">Usefulness of seeing others’ work</h2>
<p>I’ve worked on some patterns by Yan Schenkel a.k.a. <a href="https://picapauyan.com/" rel="nofollow" target="_blank">Pica Pau</a>.
One cool aspect of the patterns is that they include a link to a gallery where anyone can upload pics of their take on each creature.
So you can look at them, maybe seeing a crucial (to you) angle that’s absent from the pattern, noticing whether some “flaw” of your own project is present in others’ projects, comparing variations and picking what you prefer before you start, etc.
For instance I stared at many pics of <a href="https://www.amigurumi.com/forum/Animal-Friends-of-Pica-Pau-3/Alberto-Seagull/" rel="nofollow" target="_blank">Alberto Seagull</a> before making the legs for mine.</p>
<p>The usefulness of seeing others’ crocheted animals reminds me of how reading and reviewing open-source code, or our colleagues’ code, helps us (and LLMs, I suppose) learn how to do, or not to do, some things, and how it helps us refine our taste!
Ironically, I do publish most of my code, but I haven’t had interest in doing that for my crocheting yet. <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f638.png" alt="😸" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<h2 id="conclusion">Conclusion</h2>
<p>In summary, I keep finding excuses to talk about crochet. My pink octocat approves!</p>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://masalmon.eu/2026/05/19/crochet-again/"> Maëlle&#039;s R blog on Maëlle Salmon&#039;s personal website</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/more-crochet-programming-thoughts/">More crochet/programming thoughts</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">401301</post-id>	</item>
		<item>
		<title>Querying Neo4j Aura from R with neo2R</title>
		<link>https://www.r-bloggers.com/2026/05/querying-neo4j-aura-from-r-with-neo2r/</link>
		
		<dc:creator><![CDATA[Patrice Godard]]></dc:creator>
		<pubDate>Mon, 18 May 2026 22:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://patzaw.github.io/posts/neo2R-Aura.html</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>1 Introduction<br />
Graph databases excel at storing and traversing highly connected data used for recommendation engines, fraud detection, knowledge graphs, and social networks. Neo4j is one of the most widely used graph databases, and with Neo...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/querying-neo4j-aura-from-r-with-neo2r/">Querying Neo4j Aura from R with neo2R</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://patzaw.github.io/posts/neo2R-Aura.html"> Patrice Godard</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report an issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
 




<p><img src="https://i1.wp.com/patzaw.github.io/posts/images/neo2R-Aura.png?w=450&#038;ssl=1" class="img-fluid" alt="Illustration of neo2R used to connect to Aura, generated with Gemini"  data-recalc-dims="1"></p>
<script src="https://patzaw.github.io/site_libs/htmlwidgets-1.6.4/htmlwidgets.js"></script>
<link rel="stylesheet" href="https://patzaw.github.io/site_libs/vis-9.1.0/vis-network.min.css">
<script src="https://patzaw.github.io/site_libs/vis-9.1.0/vis-network.min.js"></script>
<script src="https://patzaw.github.io/site_libs/visNetwork-binding-2.1.4/visNetwork.js"></script>
<section id="introduction" class="level2" data-number="1">
<h2 data-number="1" class="anchored" data-anchor-id="introduction"><span class="header-section-number">1</span> Introduction</h2>
<p>Graph databases excel at storing and traversing highly connected data, the kind used for recommendation engines, fraud detection, knowledge graphs, and social networks. <strong>Neo4j</strong> is one of the most widely used graph databases, and <strong>Neo4j Aura</strong>, its managed cloud service, makes it easy to spin up a production-grade instance without managing any infrastructure.</p>
<p>On the R side, the <a href="https://cran.r-project.org/package=neo2R" rel="nofollow" target="_blank">neo2R</a> package has long been available for querying self-hosted Neo4j instances from R. Version <strong>3.0.0</strong> brings two important changes:</p>
<ol type="1">
<li><strong>Unified connection model</strong> — a single <code>startGraph()</code> call handles both a self-hosted Neo4j instance (<code>http://localhost:7474</code>) and a <em>cloud</em> Neo4j Aura instance (<code>https://&lt;id&gt;.databases.neo4j.io</code>).</li>
<li><strong>httr2 backend</strong> — the internal HTTP layer has migrated from the superseded <code>httr</code> package to <a href="https://httr2.r-lib.org/" rel="nofollow" target="_blank"><code>httr2</code></a>.</li>
</ol>
<p>In this post, we’ll connect to the <strong>free Neo4j Aura demo database</strong> preloaded with the classic Movie Recommendations dataset, explore the graph with Cypher queries, and finish with an interactive network visualization built with <a href="https://datastorm-open.github.io/visNetwork/" rel="nofollow" target="_blank">visNetwork</a>.</p>
<hr>
</section>
<section id="prerequisites" class="level2" data-number="2">
<h2 data-number="2" class="anchored" data-anchor-id="prerequisites"><span class="header-section-number">2</span> Prerequisites</h2>
<div class="cell">
<pre>install.packages(c(&quot;neo2R&quot;, &quot;dplyr&quot;, &quot;visNetwork&quot;))</pre>
</div>
<div class="cell">
<pre>library(neo2R)
library(dplyr)</pre>
<div class="cell-output cell-output-stderr">
<pre>
Attaching package: 'dplyr'</pre>
</div>
<div class="cell-output cell-output-stderr">
<pre>The following objects are masked from 'package:stats':

    filter, lag</pre>
</div>
<div class="cell-output cell-output-stderr">
<pre>The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union</pre>
</div>
<pre>library(visNetwork)</pre>
</div>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>neo2R 3.0.0 requires R ≥ 4.1 and httr2 ≥ 1.0.0. Check your versions with <code>packageVersion(&quot;neo2R&quot;)</code> and <code>packageVersion(&quot;httr2&quot;)</code>.</p>
</div>
</div>
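<p>A minimal sketch of that check, gathered into one place so it can run before any connection is attempted. The helper name <code>meets_min_version()</code> is hypothetical and not part of neo2R; it relies only on base R (<code>requireNamespace()</code> and <code>packageVersion()</code>), and it reports a package that is not installed as not meeting the requirement:</p>
<div class="cell">
<pre>## Hypothetical helper: TRUE if `pkg` is installed at version `min` or later
meets_min_version &lt;- function(pkg, min) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)
  }
  utils::packageVersion(pkg) &gt;= min
}

## One logical value per requirement stated in the note above
checks &lt;- c(
  R = getRversion() &gt;= &quot;4.1&quot;,
  neo2R = meets_min_version(&quot;neo2R&quot;, &quot;3.0.0&quot;),
  httr2 = meets_min_version(&quot;httr2&quot;, &quot;1.0.0&quot;)
)
checks</pre>
</div>
<p>Any <code>FALSE</code> entry in <code>checks</code> points at the component to upgrade or install.</p>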
<hr>
</section>
<section id="connecting-to-neo4j-aura" class="level2" data-number="3">
<h2 data-number="3" class="anchored" data-anchor-id="connecting-to-neo4j-aura"><span class="header-section-number">3</span> Connecting to Neo4j Aura</h2>
<section id="create-and-connect-to-an-aura-instance" class="level3" data-number="3.1">
<h3 data-number="3.1" class="anchored" data-anchor-id="create-and-connect-to-an-aura-instance"><span class="header-section-number">3.1</span> Create and Connect to an Aura Instance</h3>
<p>Neo4j provides an <strong>Aura Free</strong> tier (up to 200 k nodes / 400 k relationships).</p>
<p>Create a free instance at https://console.neo4j.io and get your connection details.</p>
<p>Connect to your instance with <code>startGraph()</code>.</p>
<div class="cell">
<pre>my_aura &lt;- startGraph(
  url = &quot;https://&lt;INSTANCEID&gt;.databases.neo4j.io&quot;,
  database = &quot;INSTANCEID&quot;,
  username = &quot;INSTANCEID&quot;,
  password = &quot;INSTANCEPASSWORD&quot;
  ## api = &quot;v2&quot; is set automatically for *.databases.neo4j.io URLs
)</pre>
</div>
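<p>Hard-coding a password in a script makes it easy to leak through version control. One possible alternative, sketched below, reads the connection details from environment variables (set, for example, in <code>~/.Renviron</code>). The variable names <code>NEO4J_URL</code>, <code>NEO4J_DATABASE</code>, <code>NEO4J_USERNAME</code> and <code>NEO4J_PASSWORD</code> and the helper <code>aura_credentials()</code> are assumptions made for this illustration, not neo2R conventions:</p>
<div class="cell">
<pre>## Hypothetical helper: collect the startGraph() arguments from environment
## variables and stop with a clear message if any of them is unset
aura_credentials &lt;- function() {
  vars &lt;- c(
    url = &quot;NEO4J_URL&quot;,
    database = &quot;NEO4J_DATABASE&quot;,
    username = &quot;NEO4J_USERNAME&quot;,
    password = &quot;NEO4J_PASSWORD&quot;
  )
  values &lt;- Sys.getenv(vars, unset = NA, names = FALSE)
  if (anyNA(values)) {
    stop(
      &quot;Missing environment variable(s): &quot;,
      paste(vars[is.na(values)], collapse = &quot;, &quot;)
    )
  }
  ## Named list whose names match the startGraph() arguments used above
  as.list(stats::setNames(values, names(vars)))
}

## Usage, assuming the four variables are set:
## my_aura &lt;- do.call(startGraph, aura_credentials())</pre>
</div>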
</section>
<section id="the-movie-recommendations-dataset" class="level3" data-number="3.2">
<h3 data-number="3.2" class="anchored" data-anchor-id="the-movie-recommendations-dataset"><span class="header-section-number">3.2</span> The Movie Recommendations Dataset</h3>
<p>Neo4j provides <a href="https://neo4j.com/docs/getting-started/appendix/example-data/" rel="nofollow" target="_blank">example datasets</a>, and most of them are available as one-click templates in the <a href="https://console.neo4j.io/" rel="nofollow" target="_blank">Neo4j Aura console</a>.</p>
<p>The <a href="https://github.com/neo4j-graph-examples/recommendations" rel="nofollow" target="_blank">Movie Recommendations dataset</a> is a graph example using a dataset of movie reviews for generating personalized, real-time recommendations. This dataset is also available on a demo server that can be accessed as follows.</p>
<div class="cell">
<pre>graph &lt;- startGraph(
  url = &quot;https://demo.neo4jlabs.com:7473&quot;,
  database = &quot;recommendations&quot;,
  username = &quot;recommendations&quot;,
  password = &quot;recommendations&quot;
)</pre>
</div>
</section>
</section>
<section id="exploring-the-schema" class="level2" data-number="4">
<h2 data-number="4" class="anchored" data-anchor-id="exploring-the-schema"><span class="header-section-number">4</span> Exploring the schema</h2>
<p>The Movie database contains the following node labels and relationship types:</p>
<div class="cell">
<div class="cell-output-display">
<table class="caption-top table table-sm table-striped small">
<thead>
<tr class="header">
<th style="text-align: left;">Node label</th>
<th style="text-align: left;">Key properties</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;">Movie</td>
<td style="text-align: left;">title, released, imdbId</td>
</tr>
<tr class="even">
<td style="text-align: left;">Genre</td>
<td style="text-align: left;">name</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Actor</td>
<td style="text-align: left;">name, born, imdbId</td>
</tr>
<tr class="even">
<td style="text-align: left;">Director</td>
<td style="text-align: left;">name, born, imdbId</td>
</tr>
<tr class="odd">
<td style="text-align: left;">User</td>
<td style="text-align: left;">name</td>
</tr>
</tbody>
</table>
</div>
</div>
<p><br></p>
<div class="cell">
<div class="cell-output-display">
<table class="caption-top table table-sm table-striped small">
<thead>
<tr class="header">
<th style="text-align: left;">Relationship type</th>
<th style="text-align: left;">Key properties</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;">IN_GENRE</td>
<td style="text-align: left;"></td>
</tr>
<tr class="even">
<td style="text-align: left;">ACTED_IN</td>
<td style="text-align: left;">role</td>
</tr>
<tr class="odd">
<td style="text-align: left;">DIRECTED</td>
<td style="text-align: left;"></td>
</tr>
<tr class="even">
<td style="text-align: left;">RATED</td>
<td style="text-align: left;">rating, timestamp</td>
</tr>
</tbody>
</table>
</div>
</div>
<p><br></p>
<p>Let’s count the number of these different concepts:</p>
<div class="cell">
<pre>## Node types and counts
cypher(
  graph,
  &quot;
  MATCH (n)
  RETURN labels(n) AS label, count(n) AS n
  ORDER BY n DESC
&quot;
) |&gt;
  as_tibble() |&gt;
  ## filtering out technical nodes
  filter(!(label %in% c(&quot;_Bloom_Perspective_&quot;, &quot;_Bloom_Scene_&quot;, &quot;&quot;)))</pre>
<div class="cell-output cell-output-stdout">
<pre># A tibble: 6 × 2
  label                           n
  &lt;chr&gt;                       &lt;int&gt;
1 Actor || Person             14956
2 Movie                        9125
3 Director || Person           3604
4 User                          671
5 Actor || Director || Person   487
6 Genre                          20</pre>
</div>
<pre>## Relationship types and counts
cypher(
  graph,
  &quot;
  MATCH ()-[r]-&gt;()
  RETURN type(r) AS type, count(r) AS n
  ORDER BY n DESC
&quot;
) |&gt;
  as_tibble() |&gt;
  ## filtering out technical relationships
  filter(!(type %in% c(&quot;_Bloom_HAS_SCENE_&quot;)))</pre>
<div class="cell-output cell-output-stdout">
<pre># A tibble: 4 × 2
  type          n
  &lt;chr&gt;     &lt;int&gt;
1 RATED    100004
2 ACTED_IN  35910
3 IN_GENRE  20340
4 DIRECTED  10007</pre>
</div>
</div>
</section>
<section id="querying-with-cypher" class="level2" data-number="5">
<h2 data-number="5" class="anchored" data-anchor-id="querying-with-cypher"><span class="header-section-number">5</span> Querying with Cypher</h2>
<section id="top-prolific-actors" class="level3" data-number="5.1">
<h3 data-number="5.1" class="anchored" data-anchor-id="top-prolific-actors"><span class="header-section-number">5.1</span> Top prolific actors</h3>
<div class="cell">
<pre>cypher(
  graph,
  &quot;
  MATCH (p:Person)-[:ACTED_IN]-&gt;(m:Movie)
  RETURN p.name AS actor, count(m) AS movies
  ORDER BY movies DESC
  LIMIT 10
&quot;
) |&gt;
  as_tibble()</pre>
<div class="cell-output cell-output-stdout">
<pre># A tibble: 10 × 2
   actor             movies
   &lt;chr&gt;              &lt;int&gt;
 1 Robert De Niro        56
 2 Bruce Willis          49
 3 Samuel L. Jackson     45
 4 Nicolas Cage          45
 5 Michael Caine         40
 6 Clint Eastwood        40
 7 Tom Hanks             38
 8 John Cusack           38
 9 Morgan Freeman        38
10 Gene Hackman          38</pre>
</div>
</div>
</section>
<section id="movies-and-their-directors" class="level3" data-number="5.2">
<h3 data-number="5.2" class="anchored" data-anchor-id="movies-and-their-directors"><span class="header-section-number">5.2</span> Movies and their directors</h3>
<div class="cell">
<pre>cypher(
  graph,
  &quot;
  MATCH (d:Person)-[:DIRECTED]-&gt;(m:Movie)
  RETURN m.title AS movie, m.released AS released, d.name AS director
  ORDER BY m.released IS NOT NULL DESC, m.released DESC
  LIMIT 10
&quot;
) |&gt;
  as_tibble()</pre>
<div class="cell-output cell-output-stdout">
<pre># A tibble: 10 × 3
   movie         released   director            
   &lt;chr&gt;         &lt;chr&gt;      &lt;chr&gt;               
 1 Solace        2016-09-02 &quot;Afonso Poyart&quot;     
 2 Ben-hur       2016-08-12 &quot;Timur Bekmambetov&quot; 
 3 Rustom        2016-08-12 &quot;Tinu Suresh Desai&quot; 
 4 Mohenjo Daro  2016-08-12 &quot;Ashutosh Gowariker&quot;
 5 Suicide Squad 2016-08-05 &quot;David Ayer&quot;        
 6 Shin Godzilla 2016-07-29 &quot;Hideaki Anno&quot;      
 7 Shin Godzilla 2016-07-29 &quot; Shinji Higuchi&quot;   
 8 Jason Bourne  2016-07-29 &quot;Paul Greengrass&quot;   
 9 Star Trek 3   2016-07-22 &quot;Justin Lin&quot;        
10 Ghostbusters  2016-07-15 &quot;Paul Feig&quot;         </pre>
</div>
</div>
</section>
<section id="parameterised-queries" class="level3" data-number="5.3">
<h3 data-number="5.3" class="anchored" data-anchor-id="parameterised-queries"><span class="header-section-number">5.3</span> Parameterised queries</h3>
<p>neo2R supports <strong>named parameters</strong>, keeping queries safe from injection and easy to reuse:</p>
<div class="cell">
<pre>## Find all co-stars of a given actor
cypher(
  graph,
  &quot;
  MATCH (a:Person {name: $actor})-[:ACTED_IN]-&gt;(m:Movie)&lt;-[:ACTED_IN]-(co:Person)
  RETURN DISTINCT co.name AS co_star, m.title AS shared_movie
  ORDER BY co_star
  &quot;,
  parameters = list(actor = &quot;Tom Hanks&quot;)
) |&gt;
  as_tibble()</pre>
<div class="cell-output cell-output-stdout">
<pre># A tibble: 114 × 2
   co_star            shared_movie        
   &lt;chr&gt;              &lt;chr&gt;               
 1 Adrian Zmed        Bachelor Party      
 2 Alexander Godunov  Money Pit, The      
 3 Amy Adams          Charlie Wilson's War
 4 Annie Rose Buckley Saving Mr. Banks    
 5 Audrey Tautou      Da Vinci Code, The  
 6 Ayelet Zurer       Angels & Demons     
 7 Barkhad Abdi       Captain Phillips    
 8 Barkhad Abdirahman Captain Phillips    
 9 Barry Pepper       Saving Private Ryan 
10 Bill Paxton        Apollo 13           
# &#x2139; 104 more rows</pre>
</div>
</div>
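<p>Because the actor is passed as a parameter, the same query text can be reused for several names. A possible sketch follows. The wrapper <code>co_stars()</code> and its <code>query_fun</code> argument are hypothetical conveniences (the argument exists only so the function can be exercised without a live database), and the sketch assumes that <code>cypher()</code> returns a data frame, as the pipelines above suggest:</p>
<div class="cell">
<pre>## Same parameterised query as above, stored once
co_star_query &lt;- &quot;
  MATCH (a:Person {name: $actor})-[:ACTED_IN]-&gt;(m:Movie)&lt;-[:ACTED_IN]-(co:Person)
  RETURN DISTINCT co.name AS co_star, m.title AS shared_movie
  ORDER BY co_star
&quot;

## Hypothetical wrapper: run the query for one actor and record who was asked
co_stars &lt;- function(graph, actor, query_fun = neo2R::cypher) {
  res &lt;- query_fun(graph, co_star_query, parameters = list(actor = actor))
  res$actor &lt;- rep(actor, nrow(res))
  res
}

## Usage against the demo graph defined earlier:
## lapply(c(&quot;Tom Hanks&quot;, &quot;Meg Ryan&quot;), function(a) co_stars(graph, a)) |&gt;
##   bind_rows()</pre>
</div>
<p>Only the parameter value changes between calls; the query string itself is never assembled from user input.</p>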
<hr>
</section>
</section>
<section id="network-visualisation-with-visnetwork" class="level2" data-number="6">
<h2 data-number="6" class="anchored" data-anchor-id="network-visualisation-with-visnetwork"><span class="header-section-number">6</span> Network visualisation with visNetwork</h2>
<p>The real power of a graph database is visible when you <em>draw</em> the graph. Let’s pull Tom Hanks’s ego network (his movies and everyone he has acted alongside in them) and render it with <strong>visNetwork</strong>.</p>
<section id="step-1-fetch-nodes-and-edges" class="level3" data-number="6.1">
<h3 data-number="6.1" class="anchored" data-anchor-id="step-1-fetch-nodes-and-edges"><span class="header-section-number">6.1</span> Step 1 — Fetch nodes and edges</h3>
<div class="cell">
<pre>## Tom Hanks, his movies, and his co-stars
hub &lt;- &quot;Tom Hanks&quot;
nodes_raw &lt;- cypher(
  graph,
  &quot;
  MATCH (hub:Person {name: $hub})-[hr:ACTED_IN]-&gt;(m:Movie)
  &lt;-[cr:ACTED_IN]-(co:Person)
  RETURN hub.name AS hub, hr.role AS hub_role,
  m.title AS movie, m.year AS year,
  co.name AS co, cr.role AS co_role
  &quot;,
  parameters = list(hub = hub)
) |&gt;
  as_tibble()</pre>
</div>
</section>
<section id="step-2-shape-data-for-visnetwork" class="level3" data-number="6.2">
<h3 data-number="6.2" class="anchored" data-anchor-id="step-2-shape-data-for-visnetwork"><span class="header-section-number">6.2</span> Step 2 — Shape data for visNetwork</h3>
<p>visNetwork expects two data frames: <code>nodes</code> (with columns <code>id</code>, <code>label</code>, <code>group</code>, …) and <code>edges</code> (with columns <code>from</code>, <code>to</code>, …).</p>
<div class="cell">
<pre>nodes &lt;- bind_rows(
  nodes_raw |&gt;
    distinct(
      id = hub,
      group = &quot;Hub&quot;
    ),
  nodes_raw |&gt;
    distinct(
      id = co,
      group = &quot;Co-star&quot;
    ),
  nodes_raw |&gt;
    distinct(
      id = movie,
      group = &quot;Movie&quot;,
      year
    )
) |&gt;
  distinct() |&gt;
  mutate(
    title = sprintf(
      '&lt;b&gt;%s&lt;/b&gt;: %s%s',
      group,
      id,
      ifelse(!is.na(year), sprintf(&quot;(%s)&quot;, year), &quot;&quot;)
    ),
    shape = ifelse(group == &quot;Movie&quot;, &quot;dot&quot;, &quot;star&quot;),
    size = ifelse(group == &quot;Hub&quot;, 30, 18)
  ) |&gt;
  arrange(id)

edges &lt;- bind_rows(
  nodes_raw |&gt;
    distinct(
      from = hub,
      to = movie,
      role = hub_role
    ),
  nodes_raw |&gt;
    distinct(
      from = co,
      to = movie,
      role = co_role
    )
) |&gt;
  mutate(
    title = sprintf('&lt;b&gt;Role&lt;/b&gt;: %s', role),
    arrows = &quot;to&quot;
  )</pre>
</div>
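<p>visNetwork identifies nodes by <code>id</code>, so duplicated ids or edges that point at a missing node tend to produce an error or an incomplete drawing. A small consistency check can catch both before plotting. The helper <code>check_network()</code> below is a hypothetical sketch using base R only, shown here on toy data with the same column names as above:</p>
<div class="cell">
<pre>## Hypothetical helper: basic consistency checks before plotting
check_network &lt;- function(nodes, edges) {
  stopifnot(
    &quot;node ids must be unique&quot; = anyDuplicated(nodes$id) == 0,
    &quot;every edge endpoint must be a known node id&quot; =
      all(c(edges$from, edges$to) %in% nodes$id)
  )
  invisible(TRUE)
}

## Toy data with the same column names as the real tables
toy_nodes &lt;- data.frame(
  id = c(&quot;Tom Hanks&quot;, &quot;Meg Ryan&quot;, &quot;Sleepless in Seattle&quot;),
  group = c(&quot;Hub&quot;, &quot;Co-star&quot;, &quot;Movie&quot;)
)
toy_edges &lt;- data.frame(
  from = c(&quot;Tom Hanks&quot;, &quot;Meg Ryan&quot;),
  to = c(&quot;Sleepless in Seattle&quot;, &quot;Sleepless in Seattle&quot;)
)
check_network(toy_nodes, toy_edges)</pre>
</div>
<p>The same call, <code>check_network(nodes, edges)</code>, can be run on the tables built in this step.</p>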
</section>
<section id="step-3-draw-the-network" class="level3" data-number="6.3">
<h3 data-number="6.3" class="anchored" data-anchor-id="step-3-draw-the-network"><span class="header-section-number">6.3</span> Step 3 — Draw the network</h3>
<div class="cell">
<pre>visNetwork(nodes, edges) |&gt;
  visGroups(
    ## groupname must match the values in nodes$group exactly (case-sensitive)
    groupname = &quot;Hub&quot;,
    color = list(
      background = &quot;#3B82F6&quot;,
      border = &quot;#1D4ED8&quot;,
      highlight = &quot;#93C5FD&quot;
    )
  ) |&gt;
  visGroups(
    groupname = &quot;Movie&quot;,
    color = list(
      background = &quot;#F97316&quot;,
      border = &quot;#C2410C&quot;,
      highlight = &quot;#FED7AA&quot;
    ),
    shape = &quot;square&quot;
  ) |&gt;
  visGroups(
    groupname = &quot;Co-star&quot;,
    color = list(
      background = &quot;#6B7280&quot;,
      border = &quot;#374151&quot;,
      highlight = &quot;#D1D5DB&quot;
    )
  ) |&gt;
  visEdges(
    color = list(color = &quot;#CBD5E1&quot;, highlight = &quot;#3B82F6&quot;),
    width = 1.5
  ) |&gt;
  visOptions(
    highlightNearest = list(enabled = TRUE, degree = 1, hover = TRUE),
    nodesIdSelection = TRUE
  ) |&gt;
  visLayout(randomSeed = 42) |&gt;
  visPhysics(
    solver = &quot;forceAtlas2Based&quot;,
    forceAtlas2Based = list(
      gravitationalConstant = -60,
      springLength = 120,
      springConstant = 0.04
    )
  ) |&gt;
  visLegend(position = &quot;right&quot;, main = &quot;Node type&quot;)</pre>
<div class="cell-output-display">
<div class="visNetwork html-widget html-fill-item" id="htmlwidget-6a800980bfb41b9a5d8f" style="width:100%;height:464px;"></div>
<script type="application/json" data-for="htmlwidget-6a800980bfb41b9a5d8f">{"x":{"nodes":{"id":["'burbs, The","Adrian Zmed","Alexander Godunov","Amy Adams","Angels & Demons","Annie Rose Buckley","Apollo 13","Audrey Tautou","Ayelet Zurer","Bachelor Party","Barkhad Abdi","Barkhad Abdirahman","Barry Pepper","Big","Bill Paxton","Bonfire of the Vanities","Bonnie Hunt","Bruce Dern","Bruce Willis","Buzz Kilman","Captain Phillips","Carl Weathers","Carrie Fisher","Catch Me If You Can","Catherine Keener","Catherine Zeta-Jones","Charles Durning","Charlie Wilson's War","Chi McBride","Christopher Plummer","Christopher Walken","Cloud Atlas","Colin Farrell","Craig T. Nelson","Da Vinci Code, The","Dabney Coleman","Dan Aykroyd","Daryl Hannah","David Andrews","David Morse","Denzel Washington","Don Rickles","Dragnet","Eddie Deezen","Edward Burns","Elizabeth Perkins","Emma Thompson","Eugene Levy","Eva Marie Saint","Ewan McGregor","Extremely Loud and Incredibly Close","Forrest Gump","From the Earth to the Moon","Gary Sinise","Geena Davis","George Grizzard","Green Mile, The","Greg Kinnear","Halle Berry","Harry Morgan","Hector Elizondo","Hugo Weaving","Ian McKellen","Irma P. Hall","J.K. 
Simmons","Jackie Gleason","Jean Reno","Jim Broadbent","Jim Varney","Joan Cusack","Joe Versus the Volcano","John Candy","John Goodman","John Heard","Julia Roberts","Kelsey Grammer","Kevin Bacon","Kim Cattrall","Ladykillers, The","Lane Smith","Larry Crowne","League of Their Own, A","Leonardo DiCaprio","Leslie Zemeckis","Lloyd Bridges","Lori Petty","Lori Singer","Madonna","Man with One Red Shoe, The","Mare Winningham","Mark Rydell","Marlon Wayans","Martin Sheen","Maureen Stapleton","Meg Ryan","Melanie Griffith","Michael Clarke Duncan","Michael Conner Humphreys","Money Pit, The","Ned Beatty","Nick Searcy","Nona Gaye","Nothing in Common","Parker Posey","Philadelphia","Philip Seymour Hoffman","Polar Express, The","Punchline","Randall Park","Reginald VelJohnson","Rick Ducommun","Rita Wilson","Robert Loggia","Robert Stack","Roberta Maxwell","Robin Wright","Ross Malinger","Roxana Ortega","Sally Field","Sandra Bullock","Sarah Mahoney","Saving Mr. Banks","Saving Private Ryan","Shelley Long","Sleepless in Seattle","Splash","Stanley Tucci","Stellan Skarsgård","Tawny Kitaen","Terminal, The","Thomas Horn","Tim Allen","Tim Thomerson","Tom Hanks","Tom Sizemore","Toy Story","Toy Story 2","Toy Story 3","Toy Story of Terror","Turner & Hooch","Victor Garber","Volunteers","You've Got Mail","Zoe 
Caldwell"],"group":["Movie","Co-star","Co-star","Co-star","Movie","Co-star","Movie","Co-star","Co-star","Movie","Co-star","Co-star","Co-star","Movie","Co-star","Movie","Co-star","Co-star","Co-star","Co-star","Movie","Co-star","Co-star","Movie","Co-star","Co-star","Co-star","Movie","Co-star","Co-star","Co-star","Movie","Co-star","Co-star","Movie","Co-star","Co-star","Co-star","Co-star","Co-star","Co-star","Co-star","Movie","Co-star","Co-star","Co-star","Co-star","Co-star","Co-star","Co-star","Movie","Movie","Movie","Co-star","Co-star","Co-star","Movie","Co-star","Co-star","Co-star","Co-star","Co-star","Co-star","Co-star","Co-star","Co-star","Co-star","Co-star","Co-star","Co-star","Movie","Co-star","Co-star","Co-star","Co-star","Co-star","Co-star","Co-star","Movie","Co-star","Movie","Movie","Co-star","Co-star","Co-star","Co-star","Co-star","Co-star","Movie","Co-star","Co-star","Co-star","Co-star","Co-star","Co-star","Co-star","Co-star","Co-star","Movie","Co-star","Co-star","Co-star","Movie","Co-star","Movie","Co-star","Movie","Movie","Co-star","Co-star","Co-star","Co-star","Co-star","Co-star","Co-star","Co-star","Co-star","Co-star","Co-star","Co-star","Co-star","Movie","Movie","Co-star","Movie","Movie","Co-star","Co-star","Co-star","Movie","Co-star","Co-star","Co-star","Hub","Co-star","Movie","Movie","Movie","Movie","Movie","Co-star","Movie","Movie","Co-star"],"year":[1989,null,null,null,2009,null,1995,null,null,1984,null,null,null,1988,null,1990,null,null,null,null,2013,null,null,2002,null,null,null,2007,null,null,null,2012,null,null,2006,null,null,null,null,null,null,null,1987,null,null,null,null,null,null,null,2011,1994,1998,null,null,null,1999,null,null,null,null,null,null,null,null,null,null,null,null,null,1990,null,null,null,null,null,null,null,2004,null,2011,1992,null,null,null,null,null,null,1985,null,null,null,null,null,null,null,null,null,1986,null,null,null,1986,null,1993,null,2004,1988,null,null,null,null,null,null,null,null,null,null,null,null,null,2013,1
998,null,1993,1984,null,null,null,2004,null,null,null,null,null,1995,1999,2010,2013,1989,null,1985,1998,null],"title":["<b>Movie<\/b>: 'burbs, The(1989)","<b>Co-star<\/b>: Adrian Zmed","<b>Co-star<\/b>: Alexander Godunov","<b>Co-star<\/b>: Amy Adams","<b>Movie<\/b>: Angels & Demons(2009)","<b>Co-star<\/b>: Annie Rose Buckley","<b>Movie<\/b>: Apollo 13(1995)","<b>Co-star<\/b>: Audrey Tautou","<b>Co-star<\/b>: Ayelet Zurer","<b>Movie<\/b>: Bachelor Party(1984)","<b>Co-star<\/b>: Barkhad Abdi","<b>Co-star<\/b>: Barkhad Abdirahman","<b>Co-star<\/b>: Barry Pepper","<b>Movie<\/b>: Big(1988)","<b>Co-star<\/b>: Bill Paxton","<b>Movie<\/b>: Bonfire of the Vanities(1990)","<b>Co-star<\/b>: Bonnie Hunt","<b>Co-star<\/b>: Bruce Dern","<b>Co-star<\/b>: Bruce Willis","<b>Co-star<\/b>: Buzz Kilman","<b>Movie<\/b>: Captain Phillips(2013)","<b>Co-star<\/b>: Carl Weathers","<b>Co-star<\/b>: Carrie Fisher","<b>Movie<\/b>: Catch Me If You Can(2002)","<b>Co-star<\/b>: Catherine Keener","<b>Co-star<\/b>: Catherine Zeta-Jones","<b>Co-star<\/b>: Charles Durning","<b>Movie<\/b>: Charlie Wilson's War(2007)","<b>Co-star<\/b>: Chi McBride","<b>Co-star<\/b>: Christopher Plummer","<b>Co-star<\/b>: Christopher Walken","<b>Movie<\/b>: Cloud Atlas(2012)","<b>Co-star<\/b>: Colin Farrell","<b>Co-star<\/b>: Craig T. 
Nelson","<b>Movie<\/b>: Da Vinci Code, The(2006)","<b>Co-star<\/b>: Dabney Coleman","<b>Co-star<\/b>: Dan Aykroyd","<b>Co-star<\/b>: Daryl Hannah","<b>Co-star<\/b>: David Andrews","<b>Co-star<\/b>: David Morse","<b>Co-star<\/b>: Denzel Washington","<b>Co-star<\/b>: Don Rickles","<b>Movie<\/b>: Dragnet(1987)","<b>Co-star<\/b>: Eddie Deezen","<b>Co-star<\/b>: Edward Burns","<b>Co-star<\/b>: Elizabeth Perkins","<b>Co-star<\/b>: Emma Thompson","<b>Co-star<\/b>: Eugene Levy","<b>Co-star<\/b>: Eva Marie Saint","<b>Co-star<\/b>: Ewan McGregor","<b>Movie<\/b>: Extremely Loud and Incredibly Close(2011)","<b>Movie<\/b>: Forrest Gump(1994)","<b>Movie<\/b>: From the Earth to the Moon(1998)","<b>Co-star<\/b>: Gary Sinise","<b>Co-star<\/b>: Geena Davis","<b>Co-star<\/b>: George Grizzard","<b>Movie<\/b>: Green Mile, The(1999)","<b>Co-star<\/b>: Greg Kinnear","<b>Co-star<\/b>: Halle Berry","<b>Co-star<\/b>: Harry Morgan","<b>Co-star<\/b>: Hector Elizondo","<b>Co-star<\/b>: Hugo Weaving","<b>Co-star<\/b>: Ian McKellen","<b>Co-star<\/b>: Irma P. Hall","<b>Co-star<\/b>: J.K. 
Simmons","<b>Co-star<\/b>: Jackie Gleason","<b>Co-star<\/b>: Jean Reno","<b>Co-star<\/b>: Jim Broadbent","<b>Co-star<\/b>: Jim Varney","<b>Co-star<\/b>: Joan Cusack","<b>Movie<\/b>: Joe Versus the Volcano(1990)","<b>Co-star<\/b>: John Candy","<b>Co-star<\/b>: John Goodman","<b>Co-star<\/b>: John Heard","<b>Co-star<\/b>: Julia Roberts","<b>Co-star<\/b>: Kelsey Grammer","<b>Co-star<\/b>: Kevin Bacon","<b>Co-star<\/b>: Kim Cattrall","<b>Movie<\/b>: Ladykillers, The(2004)","<b>Co-star<\/b>: Lane Smith","<b>Movie<\/b>: Larry Crowne(2011)","<b>Movie<\/b>: League of Their Own, A(1992)","<b>Co-star<\/b>: Leonardo DiCaprio","<b>Co-star<\/b>: Leslie Zemeckis","<b>Co-star<\/b>: Lloyd Bridges","<b>Co-star<\/b>: Lori Petty","<b>Co-star<\/b>: Lori Singer","<b>Co-star<\/b>: Madonna","<b>Movie<\/b>: Man with One Red Shoe, The(1985)","<b>Co-star<\/b>: Mare Winningham","<b>Co-star<\/b>: Mark Rydell","<b>Co-star<\/b>: Marlon Wayans","<b>Co-star<\/b>: Martin Sheen","<b>Co-star<\/b>: Maureen Stapleton","<b>Co-star<\/b>: Meg Ryan","<b>Co-star<\/b>: Melanie Griffith","<b>Co-star<\/b>: Michael Clarke Duncan","<b>Co-star<\/b>: Michael Conner Humphreys","<b>Movie<\/b>: Money Pit, The(1986)","<b>Co-star<\/b>: Ned Beatty","<b>Co-star<\/b>: Nick Searcy","<b>Co-star<\/b>: Nona Gaye","<b>Movie<\/b>: Nothing in Common(1986)","<b>Co-star<\/b>: Parker Posey","<b>Movie<\/b>: Philadelphia(1993)","<b>Co-star<\/b>: Philip Seymour Hoffman","<b>Movie<\/b>: Polar Express, The(2004)","<b>Movie<\/b>: Punchline(1988)","<b>Co-star<\/b>: Randall Park","<b>Co-star<\/b>: Reginald VelJohnson","<b>Co-star<\/b>: Rick Ducommun","<b>Co-star<\/b>: Rita Wilson","<b>Co-star<\/b>: Robert Loggia","<b>Co-star<\/b>: Robert Stack","<b>Co-star<\/b>: Roberta Maxwell","<b>Co-star<\/b>: Robin Wright","<b>Co-star<\/b>: Ross Malinger","<b>Co-star<\/b>: Roxana Ortega","<b>Co-star<\/b>: Sally Field","<b>Co-star<\/b>: Sandra Bullock","<b>Co-star<\/b>: Sarah Mahoney","<b>Movie<\/b>: Saving Mr. 
Banks(2013)","<b>Movie<\/b>: Saving Private Ryan(1998)","<b>Co-star<\/b>: Shelley Long","<b>Movie<\/b>: Sleepless in Seattle(1993)","<b>Movie<\/b>: Splash(1984)","<b>Co-star<\/b>: Stanley Tucci","<b>Co-star<\/b>: Stellan Skarsgård","<b>Co-star<\/b>: Tawny Kitaen","<b>Movie<\/b>: Terminal, The(2004)","<b>Co-star<\/b>: Thomas Horn","<b>Co-star<\/b>: Tim Allen","<b>Co-star<\/b>: Tim Thomerson","<b>Hub<\/b>: Tom Hanks","<b>Co-star<\/b>: Tom Sizemore","<b>Movie<\/b>: Toy Story(1995)","<b>Movie<\/b>: Toy Story 2(1999)","<b>Movie<\/b>: Toy Story 3(2010)","<b>Movie<\/b>: Toy Story of Terror(2013)","<b>Movie<\/b>: Turner & Hooch(1989)","<b>Co-star<\/b>: Victor Garber","<b>Movie<\/b>: Volunteers(1985)","<b>Movie<\/b>: You've Got Mail(1998)","<b>Co-star<\/b>: Zoe Caldwell"],"shape":["dot","star","star","star","dot","star","dot","star","star","dot","star","star","star","dot","star","dot","star","star","star","star","dot","star","star","dot","star","star","star","dot","star","star","star","dot","star","star","dot","star","star","star","star","star","star","star","dot","star","star","star","star","star","star","star","dot","dot","dot","star","star","star","dot","star","star","star","star","star","star","star","star","star","star","star","star","star","dot","star","star","star","star","star","star","star","dot","star","dot","dot","star","star","star","star","star","star","dot","star","star","star","star","star","star","star","star","star","dot","star","star","star","dot","star","dot","star","dot","dot","star","star","star","star","star","star","star","star","star","star","star","star","star","dot","dot","star","dot","dot","star","star","star","dot","star","star","star","star","star","dot","dot","dot","dot","dot","star","dot","dot","star"],"size":[18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,1
8,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,30,18,18,18,18,18,18,18,18,18,18],"label":["'burbs, The","Adrian Zmed","Alexander Godunov","Amy Adams","Angels & Demons","Annie Rose Buckley","Apollo 13","Audrey Tautou","Ayelet Zurer","Bachelor Party","Barkhad Abdi","Barkhad Abdirahman","Barry Pepper","Big","Bill Paxton","Bonfire of the Vanities","Bonnie Hunt","Bruce Dern","Bruce Willis","Buzz Kilman","Captain Phillips","Carl Weathers","Carrie Fisher","Catch Me If You Can","Catherine Keener","Catherine Zeta-Jones","Charles Durning","Charlie Wilson's War","Chi McBride","Christopher Plummer","Christopher Walken","Cloud Atlas","Colin Farrell","Craig T. Nelson","Da Vinci Code, The","Dabney Coleman","Dan Aykroyd","Daryl Hannah","David Andrews","David Morse","Denzel Washington","Don Rickles","Dragnet","Eddie Deezen","Edward Burns","Elizabeth Perkins","Emma Thompson","Eugene Levy","Eva Marie Saint","Ewan McGregor","Extremely Loud and Incredibly Close","Forrest Gump","From the Earth to the Moon","Gary Sinise","Geena Davis","George Grizzard","Green Mile, The","Greg Kinnear","Halle Berry","Harry Morgan","Hector Elizondo","Hugo Weaving","Ian McKellen","Irma P. Hall","J.K. 
Simmons","Jackie Gleason","Jean Reno","Jim Broadbent","Jim Varney","Joan Cusack","Joe Versus the Volcano","John Candy","John Goodman","John Heard","Julia Roberts","Kelsey Grammer","Kevin Bacon","Kim Cattrall","Ladykillers, The","Lane Smith","Larry Crowne","League of Their Own, A","Leonardo DiCaprio","Leslie Zemeckis","Lloyd Bridges","Lori Petty","Lori Singer","Madonna","Man with One Red Shoe, The","Mare Winningham","Mark Rydell","Marlon Wayans","Martin Sheen","Maureen Stapleton","Meg Ryan","Melanie Griffith","Michael Clarke Duncan","Michael Conner Humphreys","Money Pit, The","Ned Beatty","Nick Searcy","Nona Gaye","Nothing in Common","Parker Posey","Philadelphia","Philip Seymour Hoffman","Polar Express, The","Punchline","Randall Park","Reginald VelJohnson","Rick Ducommun","Rita Wilson","Robert Loggia","Robert Stack","Roberta Maxwell","Robin Wright","Ross Malinger","Roxana Ortega","Sally Field","Sandra Bullock","Sarah Mahoney","Saving Mr. Banks","Saving Private Ryan","Shelley Long","Sleepless in Seattle","Splash","Stanley Tucci","Stellan Skarsgård","Tawny Kitaen","Terminal, The","Thomas Horn","Tim Allen","Tim Thomerson","Tom Hanks","Tom Sizemore","Toy Story","Toy Story 2","Toy Story 3","Toy Story of Terror","Turner & Hooch","Victor Garber","Volunteers","You've Got Mail","Zoe Caldwell"]},"edges":{"from":["Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Tom Hanks","Sally Field","Mark Rydell","John Goodman","Martin Sheen","Leonardo DiCaprio","Christopher Walken","Dan Aykroyd","Harry Morgan","Christopher Plummer","Colin Farrell","Emma Thompson","Annie Rose Buckley","Tawny 
Kitaen","Adrian Zmed","George Grizzard","Tim Thomerson","Rita Wilson","John Candy","Dabney Coleman","Lori Singer","Charles Durning","Eugene Levy","John Candy","Daryl Hannah","John Heard","Elizabeth Perkins","Robert Loggia","Hector Elizondo","Eva Marie Saint","Jackie Gleason","Alexander Godunov","Shelley Long","Maureen Stapleton","Tim Allen","Joan Cusack","Carl Weathers","Barkhad Abdirahman","Catherine Keener","Barkhad Abdi","Sarah Mahoney","Roxana Ortega","Randall Park","Hugo Weaving","Jim Broadbent","Halle Berry","Stellan Skarsgård","Ayelet Zurer","Ewan McGregor","Thomas Horn","Zoe Caldwell","Sandra Bullock","Amy Adams","Philip Seymour Hoffman","Julia Roberts","Tim Allen","Ned Beatty","Joan Cusack","David Andrews","Lane Smith","Nick Searcy","Bonnie Hunt","Michael Clarke Duncan","David Morse","Tom Sizemore","Edward Burns","Barry Pepper","Kelsey Grammer","Tim Allen","Joan Cusack","Bill Paxton","Kevin Bacon","Gary Sinise","Jim Varney","Tim Allen","Don Rickles","Geena Davis","Lori Petty","Madonna","Robin Wright","Sally Field","Michael Conner Humphreys","Roberta Maxwell","Denzel Washington","Buzz Kilman","Victor Garber","Rita Wilson","Ross Malinger","Craig T. Nelson","Reginald VelJohnson","Mare Winningham","Rick Ducommun","Carrie Fisher","Bruce Dern","Meg Ryan","Robert Stack","Lloyd Bridges","Bruce Willis","Melanie Griffith","Kim Cattrall","Catherine Zeta-Jones","Chi McBride","Stanley Tucci","Audrey Tautou","Jean Reno","Ian McKellen","Eddie Deezen","Leslie Zemeckis","Nona Gaye","Irma P. Hall","Marlon Wayans","J.K. Simmons","Meg Ryan","Parker Posey","Greg Kinnear"],"to":["Punchline","Catch Me If You Can","Dragnet","Saving Mr. 
Banks","Bachelor Party","Volunteers","Man with One Red Shoe, The","Splash","Big","Nothing in Common","Money Pit, The","Toy Story of Terror","Captain Phillips","Larry Crowne","Cloud Atlas","Angels & Demons","Extremely Loud and Incredibly Close","Charlie Wilson's War","Toy Story 3","From the Earth to the Moon","Green Mile, The","Saving Private Ryan","Toy Story 2","Apollo 13","Toy Story","League of Their Own, A","Forrest Gump","Philadelphia","Sleepless in Seattle","Turner & Hooch","'burbs, The","Joe Versus the Volcano","Bonfire of the Vanities","Terminal, The","Da Vinci Code, The","Polar Express, The","Ladykillers, The","You've Got Mail","Punchline","Punchline","Punchline","Catch Me If You Can","Catch Me If You Can","Catch Me If You Can","Dragnet","Dragnet","Dragnet","Saving Mr. Banks","Saving Mr. Banks","Saving Mr. Banks","Bachelor Party","Bachelor Party","Bachelor Party","Volunteers","Volunteers","Volunteers","Man with One Red Shoe, The","Man with One Red Shoe, The","Man with One Red Shoe, The","Splash","Splash","Splash","Big","Big","Big","Nothing in Common","Nothing in Common","Nothing in Common","Money Pit, The","Money Pit, The","Money Pit, The","Toy Story of Terror","Toy Story of Terror","Toy Story of Terror","Captain Phillips","Captain Phillips","Captain Phillips","Larry Crowne","Larry Crowne","Larry Crowne","Cloud Atlas","Cloud Atlas","Cloud Atlas","Angels & Demons","Angels & Demons","Angels & Demons","Extremely Loud and Incredibly Close","Extremely Loud and Incredibly Close","Extremely Loud and Incredibly Close","Charlie Wilson's War","Charlie Wilson's War","Charlie Wilson's War","Toy Story 3","Toy Story 3","Toy Story 3","From the Earth to the Moon","From the Earth to the Moon","From the Earth to the Moon","Green Mile, The","Green Mile, The","Green Mile, The","Saving Private Ryan","Saving Private Ryan","Saving Private Ryan","Toy Story 2","Toy Story 2","Toy Story 2","Apollo 13","Apollo 13","Apollo 13","Toy Story","Toy Story","Toy Story","League of Their Own, 
A","League of Their Own, A","League of Their Own, A","Forrest Gump","Forrest Gump","Forrest Gump","Philadelphia","Philadelphia","Philadelphia","Sleepless in Seattle","Sleepless in Seattle","Sleepless in Seattle","Turner & Hooch","Turner & Hooch","Turner & Hooch","'burbs, The","'burbs, The","'burbs, The","Joe Versus the Volcano","Joe Versus the Volcano","Joe Versus the Volcano","Bonfire of the Vanities","Bonfire of the Vanities","Bonfire of the Vanities","Terminal, The","Terminal, The","Terminal, The","Da Vinci Code, The","Da Vinci Code, The","Da Vinci Code, The","Polar Express, The","Polar Express, The","Polar Express, The","Ladykillers, The","Ladykillers, The","Ladykillers, The","You've Got Mail","You've Got Mail","You've Got Mail"],"role":["Steven Gold","Carl Hanratty","Pep Streebeck","Walt Disney","Rick Gassko","Lawrence Whatley Bourne III","Richard Harlan Drew","Allen Bauer","Joshua \"Josh\" Baskin","David Basner","Walter Fielding, Jr.","Woody (Voice)","Captain Richard Phillips","Larry Crowne","Dr. Henry Goose / Hotel Manager / Isaac Sachs / Dermot Hoggins / Cavendish Look-a-Like Actor / Zachry","Robert Langdon","Thomas Schell","Charlie Wilson","Woody (voice)",null,"Paul Edgecomb","Captain John H. Miller","Woody (voice)","Jim Lovell","Woody (voice)","Jimmy Dugan - Manager","Forrest Gump","Andrew Beckett","Sam Baldwin","Scott Turner","Ray Peterson","Joe Banks","Sherman McCoy","Viktor Navorski","Robert Langdon","Hero Boy / Father / Conductor / Hobo / Scrooge / Santa Claus (voice)","Professor G.H. Dorr","Joe Fox","Lilah Krytsick","Romeo","John Krytsick","Roger Strong","Frank Abagnale Jr.","Frank Abagnale","Sgt. Joe Friday","Captain Gannon","Reverend Jonathan Whirley","Travers Robert Goff","P.L. 
Travers","Ginty","Debbie Thompson","Jay O'Neill","Ed Thompson","John Reynolds","Beth Wexler","Tom Tuttle","Cooper","Maddy","Ross","Walter Kornbluth","Freddie Bauer","Madison","Paul","Susan","MacMillan",null,"Lorraine Basner","Max Basner","Max Beissart","Anna Crowley","Estelle","Buzz Lightyear (Voice)","Jessie (Voice)","Combat Carl / Combat Carl Jr. (Voice)","Bilal","Andrea Phillips","Muse","Samantha","Alvarez","Trainee Wong","Haskell Moore / Tadeusz Kesselring / Bill Smoke / Nurse Noakes / Boardman Mephi / Old Georgie","Captain Molyneux / Vyvyan Ayrs / Timothy Cavendish / Korean Musician / Prescient 2","Native Woman / Jocasta Ayrs / Luisa Rey / Indian Party Guest / Ovid / Meronym","Commander Richter","Vittoria Vetra","Camerlengo Patrick McKenna","Oskar Schell","Oskar's Grandmother","Linda Schell","Bonnie Bach","Gust Avrakotos","Joanne Herring","Buzz Lightyear (voice)","Lotso (voice)","Jessie the Yodeling Cowgirl (voice)",null,null,null,"Jan Edgecomb","John Coffey","Brutus \"Brutal\" Howell","Technical Sergeant Michael Horvath","Private Richard Reiben","Private Daniel Jackson","Stinky Pete the Prospector (voice)","Buzz Lightyear (voice)","Jessie the Yodeling Cowgirl (voice)","Fred Haise","Jack Swigert","Ken Mattingly","Slinky Dog (voice)","Buzz Lightyear (voice)","Mr. Potato Head (voice)","Dottie Hinson - Catcher","Kit Keller - Pitcher","Mae Mordabito - Center Field","Jenny Curran","Mrs. Gump","Young Forrest Gump","Judge Tate","Joe Miller","Crutches","Greg","Suzy","Jonah Baldwin","Howard Hyde","Det. David Sutton","Emily Carson","Art Weingartner","Carol Peterson","Lt. Mark Rumsfield","DeDe/Angelica Graynamore/Patricia Graynamore","Dr. 
Ellison","Samuel Harvey Graynamore","Peter Fallow","Maria Ruskin","Judy McCoy","Amelia Warren","Mulroy","Frank Dixon","Sophie Neveu","Captain Bezu Fache","Sir Leigh Teabing","Know-It-All (voice)","Sister Sarah / Mother (voice)","Hero Girl (voice)","Marva Munson","Gawain MacSam","Garth Pancake","Kathleen Kelly","Patricia Eden","Frank Navasky"],"title":["<b>Role<\/b>: Steven Gold","<b>Role<\/b>: Carl Hanratty","<b>Role<\/b>: Pep Streebeck","<b>Role<\/b>: Walt Disney","<b>Role<\/b>: Rick Gassko","<b>Role<\/b>: Lawrence Whatley Bourne III","<b>Role<\/b>: Richard Harlan Drew","<b>Role<\/b>: Allen Bauer","<b>Role<\/b>: Joshua \"Josh\" Baskin","<b>Role<\/b>: David Basner","<b>Role<\/b>: Walter Fielding, Jr.","<b>Role<\/b>: Woody (Voice)","<b>Role<\/b>: Captain Richard Phillips","<b>Role<\/b>: Larry Crowne","<b>Role<\/b>: Dr. Henry Goose / Hotel Manager / Isaac Sachs / Dermot Hoggins / Cavendish Look-a-Like Actor / Zachry","<b>Role<\/b>: Robert Langdon","<b>Role<\/b>: Thomas Schell","<b>Role<\/b>: Charlie Wilson","<b>Role<\/b>: Woody (voice)","<b>Role<\/b>: NA","<b>Role<\/b>: Paul Edgecomb","<b>Role<\/b>: Captain John H. Miller","<b>Role<\/b>: Woody (voice)","<b>Role<\/b>: Jim Lovell","<b>Role<\/b>: Woody (voice)","<b>Role<\/b>: Jimmy Dugan - Manager","<b>Role<\/b>: Forrest Gump","<b>Role<\/b>: Andrew Beckett","<b>Role<\/b>: Sam Baldwin","<b>Role<\/b>: Scott Turner","<b>Role<\/b>: Ray Peterson","<b>Role<\/b>: Joe Banks","<b>Role<\/b>: Sherman McCoy","<b>Role<\/b>: Viktor Navorski","<b>Role<\/b>: Robert Langdon","<b>Role<\/b>: Hero Boy / Father / Conductor / Hobo / Scrooge / Santa Claus (voice)","<b>Role<\/b>: Professor G.H. Dorr","<b>Role<\/b>: Joe Fox","<b>Role<\/b>: Lilah Krytsick","<b>Role<\/b>: Romeo","<b>Role<\/b>: John Krytsick","<b>Role<\/b>: Roger Strong","<b>Role<\/b>: Frank Abagnale Jr.","<b>Role<\/b>: Frank Abagnale","<b>Role<\/b>: Sgt. 
Joe Friday","<b>Role<\/b>: Captain Gannon","<b>Role<\/b>: Reverend Jonathan Whirley","<b>Role<\/b>: Travers Robert Goff","<b>Role<\/b>: P.L. Travers","<b>Role<\/b>: Ginty","<b>Role<\/b>: Debbie Thompson","<b>Role<\/b>: Jay O'Neill","<b>Role<\/b>: Ed Thompson","<b>Role<\/b>: John Reynolds","<b>Role<\/b>: Beth Wexler","<b>Role<\/b>: Tom Tuttle","<b>Role<\/b>: Cooper","<b>Role<\/b>: Maddy","<b>Role<\/b>: Ross","<b>Role<\/b>: Walter Kornbluth","<b>Role<\/b>: Freddie Bauer","<b>Role<\/b>: Madison","<b>Role<\/b>: Paul","<b>Role<\/b>: Susan","<b>Role<\/b>: MacMillan","<b>Role<\/b>: NA","<b>Role<\/b>: Lorraine Basner","<b>Role<\/b>: Max Basner","<b>Role<\/b>: Max Beissart","<b>Role<\/b>: Anna Crowley","<b>Role<\/b>: Estelle","<b>Role<\/b>: Buzz Lightyear (Voice)","<b>Role<\/b>: Jessie (Voice)","<b>Role<\/b>: Combat Carl / Combat Carl Jr. (Voice)","<b>Role<\/b>: Bilal","<b>Role<\/b>: Andrea Phillips","<b>Role<\/b>: Muse","<b>Role<\/b>: Samantha","<b>Role<\/b>: Alvarez","<b>Role<\/b>: Trainee Wong","<b>Role<\/b>: Haskell Moore / Tadeusz Kesselring / Bill Smoke / Nurse Noakes / Boardman Mephi / Old Georgie","<b>Role<\/b>: Captain Molyneux / Vyvyan Ayrs / Timothy Cavendish / Korean Musician / Prescient 2","<b>Role<\/b>: Native Woman / Jocasta Ayrs / Luisa Rey / Indian Party Guest / Ovid / Meronym","<b>Role<\/b>: Commander Richter","<b>Role<\/b>: Vittoria Vetra","<b>Role<\/b>: Camerlengo Patrick McKenna","<b>Role<\/b>: Oskar Schell","<b>Role<\/b>: Oskar's Grandmother","<b>Role<\/b>: Linda Schell","<b>Role<\/b>: Bonnie Bach","<b>Role<\/b>: Gust Avrakotos","<b>Role<\/b>: Joanne Herring","<b>Role<\/b>: Buzz Lightyear (voice)","<b>Role<\/b>: Lotso (voice)","<b>Role<\/b>: Jessie the Yodeling Cowgirl (voice)","<b>Role<\/b>: NA","<b>Role<\/b>: NA","<b>Role<\/b>: NA","<b>Role<\/b>: Jan Edgecomb","<b>Role<\/b>: John Coffey","<b>Role<\/b>: Brutus \"Brutal\" Howell","<b>Role<\/b>: Technical Sergeant Michael Horvath","<b>Role<\/b>: Private Richard Reiben","<b>Role<\/b>: Private Daniel 
Jackson","<b>Role<\/b>: Stinky Pete the Prospector (voice)","<b>Role<\/b>: Buzz Lightyear (voice)","<b>Role<\/b>: Jessie the Yodeling Cowgirl (voice)","<b>Role<\/b>: Fred Haise","<b>Role<\/b>: Jack Swigert","<b>Role<\/b>: Ken Mattingly","<b>Role<\/b>: Slinky Dog (voice)","<b>Role<\/b>: Buzz Lightyear (voice)","<b>Role<\/b>: Mr. Potato Head (voice)","<b>Role<\/b>: Dottie Hinson - Catcher","<b>Role<\/b>: Kit Keller - Pitcher","<b>Role<\/b>: Mae Mordabito - Center Field","<b>Role<\/b>: Jenny Curran","<b>Role<\/b>: Mrs. Gump","<b>Role<\/b>: Young Forrest Gump","<b>Role<\/b>: Judge Tate","<b>Role<\/b>: Joe Miller","<b>Role<\/b>: Crutches","<b>Role<\/b>: Greg","<b>Role<\/b>: Suzy","<b>Role<\/b>: Jonah Baldwin","<b>Role<\/b>: Howard Hyde","<b>Role<\/b>: Det. David Sutton","<b>Role<\/b>: Emily Carson","<b>Role<\/b>: Art Weingartner","<b>Role<\/b>: Carol Peterson","<b>Role<\/b>: Lt. Mark Rumsfield","<b>Role<\/b>: DeDe/Angelica Graynamore/Patricia Graynamore","<b>Role<\/b>: Dr. Ellison","<b>Role<\/b>: Samuel Harvey Graynamore","<b>Role<\/b>: Peter Fallow","<b>Role<\/b>: Maria Ruskin","<b>Role<\/b>: Judy McCoy","<b>Role<\/b>: Amelia Warren","<b>Role<\/b>: Mulroy","<b>Role<\/b>: Frank Dixon","<b>Role<\/b>: Sophie Neveu","<b>Role<\/b>: Captain Bezu Fache","<b>Role<\/b>: Sir Leigh Teabing","<b>Role<\/b>: Know-It-All (voice)","<b>Role<\/b>: Sister Sarah / Mother (voice)","<b>Role<\/b>: Hero Girl (voice)","<b>Role<\/b>: Marva Munson","<b>Role<\/b>: Gawain MacSam","<b>Role<\/b>: Garth Pancake","<b>Role<\/b>: Kathleen Kelly","<b>Role<\/b>: Patricia Eden","<b>Role<\/b>: Frank 
Navasky"],"arrows":["to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to"]},"nodesToDataframe":true,"edgesToDataframe":true,"options":{"width":"100%","height":"100%","nodes":{"shape":"dot"},"manipulation":{"enabled":false},"groups":{"hub":{"color":{"background":"#3B82F6","border":"#1D4ED8","highlight":"#93C5FD"}},"movie":{"color":{"background":"#F97316","border":"#C2410C","highlight":"#FED7AA"},"shape":"square"},"useDefaultGroups":true,"costar":{"color":{"background":"#6B7280","border":"#374151","highlight":"#D1D5DB"}}},"edges":{"width":1.5,"color":{"color":"#CBD5E1","highlight":"#3B82F6"}},"interaction":{"hover":true,"zoomSpeed":1},"layout":{"randomSeed":42},"physics":{"solver":"forceAtlas2Based","forceAtlas2Based":{"gravitationalConstant":-60,"springLength":120,"springConstant":0.04}}},"groups":["Movie","Co-star","Hub"],"width":null,"height":null,"idselection":{"enabled":true,"style":"width: 150px; height: 26px","useLabels":true,"main":"Select by id"},"byselection":{"enabled":false,"style":"width: 150px; height: 26px","multiple":false,"hideColor":"rgba(200,200,200,0.5)","highlight":false},"main":null,"submain":null,"footer":null,"background":"rgba(0, 0, 0, 0)","tooltipStay":300,"tooltipStyle":"position: fixed;visibility:hidden;padding: 5px;white-space: 
nowrap;font-family: verdana;font-size:14px;font-color:#000000;background-color: #f5f4ed;-moz-border-radius: 3px;-webkit-border-radius: 3px;border-radius: 3px;border: 1px solid #808074;box-shadow: 3px 3px 10px rgba(0, 0, 0, 0.2);","highlight":{"enabled":true,"hoverNearest":true,"degree":1,"algorithm":"all","hideColor":"rgba(200,200,200,0.5)","labelOnly":true},"collapse":{"enabled":false,"fit":false,"resetHighlight":true,"clusterOptions":null,"keepCoord":true,"labelSuffix":"(cluster)"},"legend":{"width":0.2,"useGroups":true,"position":"right","ncol":1,"stepX":100,"stepY":100,"zoom":true,"main":{"text":"Node type","style":"font-family:Georgia, Times New Roman, Times, serif;font-weight:bold;font-size:14px;text-align:center;"}}},"evals":[],"jsHooks":[]}</script>
</div>
</div>
<p>Hover over any node to see its label. Use the <strong>Select by id</strong> dropdown, or click a node, to highlight movies shared with Tom Hanks.</p>
</section>
</section>
<section id="conclusion" class="level2" data-number="7">
<h2 data-number="7" class="anchored" data-anchor-id="conclusion"><span class="header-section-number">7</span> Conclusion</h2>
<p>neo2R 3.0.0 removes the last friction point for R users who want to work with <strong>Neo4j Aura</strong>: a single <code>startGraph()</code> call now handles cloud and local instances uniformly, the httr2 backend gives reliable retries and clean error handling, and the Cypher query interface remains exactly as it was.</p>
<section id="further-reading" class="level3" data-number="7.1">
<h3 data-number="7.1" class="anchored" data-anchor-id="further-reading"><span class="header-section-number">7.1</span> Further reading</h3>
<ul>
<li><a href="https://cran.r-project.org/package=neo2R" rel="nofollow" target="_blank">neo2R on CRAN</a> — package documentation</li>
<li><a href="https://github.com/patzaw/neo2R" rel="nofollow" target="_blank">neo2R GitHub</a> — source, changelog, and issues</li>
<li><a href="https://console.neo4j.io/" rel="nofollow" target="_blank">Neo4j Aura console</a> — create your free instance</li>
<li><a href="https://neo4j.com/docs/cypher-manual/current/" rel="nofollow" target="_blank">Neo4j Cypher reference</a> — query language docs</li>
<li><a href="https://datastorm-open.github.io/visNetwork/" rel="nofollow" target="_blank">visNetwork documentation</a> — all chart options</li>
</ul>


</section>
</section>

 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://patzaw.github.io/posts/neo2R-Aura.html"> Patrice Godard</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/querying-neo4j-aura-from-r-with-neo2r/">Querying Neo4j Aura from R with neo2R</a>]]></content:encoded>
					
		
		<enclosure url="https://patzaw.github.io/posts/images/neo2R-Aura.png" length="0" type="image/png" />

		<post-id xmlns="com-wordpress:feed-additions:1">401303</post-id>	</item>
		<item>
		<title>Leaflet attribution</title>
		<link>https://www.r-bloggers.com/2026/05/leaflet-attribution/</link>
		
		<dc:creator><![CDATA[Michael]]></dc:creator>
		<pubDate>Mon, 18 May 2026 04:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://r.iresmi.net/posts/2026/leaflet_attribution/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>Leaflets – CC-BY-NC-ND by Steve Walser</p>
<p>Note for myself and others: how to remove the “Leaflet &#124; ” prefix in map attribution using the R package {leaflet}.</p>
<p>Note</p>
<p>It’s allowed. Masking the attribution is sometimes useful in certain circumstances, but generally please cite the software and data used…</p>
<p>According to some sources ...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/leaflet-attribution/">Leaflet attribution</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://r.iresmi.net/posts/2026/leaflet_attribution/"> r.iresmi.net</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
</div>
 






<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="https://www.flickr.com/photos/bassbro/54876966918/" rel="nofollow" target="_blank"><img src="https://i2.wp.com/r.iresmi.net/posts/2026/leaflet_attribution/images/54876966918_065fda8cbb_c.jpg?w=578&#038;ssl=1" class="preview-image img-fluid figure-img" alt="A photo of autumn leaves" data-recalc-dims="1"></a></p>
<figcaption>Leaflets – CC-BY-NC-ND by Steve Walser</figcaption>
</figure>
</div>
<p>Note for myself and others: how to remove the “Leaflet | ” prefix in map attribution using the R package {leaflet}.</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>It’s <a href="https://groups.google.com/g/leaflet-js/c/fA6M7fbchOs/m/JTNVhqdc7JcJ?pli=1" rel="nofollow" target="_blank">allowed</a>. Masking the attribution is useful in some circumstances, but in general please cite the software and data used.</p>
</div>
</div>
<p>According to <a href="https://stackoverflow.com/questions/57092107/how-can-i-remove-attribution-in-leaflet-map-in-r/77265384" rel="nofollow" target="_blank">some sources</a> we could write <code>leaflet(options = leafletOptions(attributionPrefix = &quot;&quot;))</code> but it doesn’t work in my case.</p>
<p>So instead we can execute some JavaScript:</p>
<div class="cell">
<pre>library(leaflet)

leaflet() |&gt;
  addTiles(urlTemplate = &quot;&quot;, attribution = &quot;Only my data&quot;) |&gt; 
  htmlwidgets::onRender(&quot;function(el, x) {
    // Remove Leaflet attribution prefix
    this.attributionControl.setPrefix('');
    }&quot;)</pre>
<div id="fig-leaflet" class="cell-output-display quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-leaflet-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<div class="leaflet html-widget html-fill-item" id="htmlwidget-bf42d06680c921c885f8" style="width:100%;height:464px;"></div>
<script type="application/json" data-for="htmlwidget-bf42d06680c921c885f8">{"x":{"options":{"crs":{"crsClass":"L.CRS.EPSG3857","code":null,"proj4def":null,"projectedBounds":null,"options":{}}},"calls":[{"method":"addTiles","args":["",null,null,{"minZoom":0,"maxZoom":18,"tileSize":256,"subdomains":"abc","errorTileUrl":"","tms":false,"noWrap":false,"zoomOffset":0,"zoomReverse":false,"opacity":1,"zIndex":1,"detectRetina":false,"attribution":"Only my data"}]}]},"evals":[],"jsHooks":{"render":[{"code":"function(el, x, data) {\n  return (function(el, x) {\n    // Remove Leaflet attribution prefix\n    this.attributionControl.setPrefix('');\n    }).call(this.getMap(), el, x, data);\n}","data":null}]}}</script>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-leaflet-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure 1: A blank leaflet map with a custom attribution
</figcaption>
</figure>
</div>
</div>




 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://r.iresmi.net/posts/2026/leaflet_attribution/"> r.iresmi.net</a></strong>.</div>
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/leaflet-attribution/">Leaflet attribution</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">401318</post-id>	</item>
		<item>
		<title>Five tips for managing your R-universe 🚀</title>
		<link>https://www.r-bloggers.com/2026/05/five-tips-for-managing-your-r-universe-%f0%9f%9a%80/</link>
		
		<dc:creator><![CDATA[R &#124; Dr Tom Palmer]]></dc:creator>
		<pubDate>Mon, 18 May 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://remlapmot.github.io/post/2026/runiverse-tips/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> Introduction</p>
<p>rOpenSci’s<br />
R-universe system is an open source platform allowing users to create their own CRAN-like universe of R packages.<br />
It is absolutely fantastic. It is particularly useful in one area I research, Mendelian randomization (at ...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/five-tips-for-managing-your-r-universe-%f0%9f%9a%80/">Five tips for managing your R-universe 🚀</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://remlapmot.github.io/post/2026/runiverse-tips/"> R | Dr Tom Palmer</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
</div>
<h2 id="introduction">Introduction</h2>
<p>
<a href="https://ropensci.org/" rel="nofollow" target="_blank">rOpenSci</a>’s 
<a href="https://r-universe.dev/" rel="nofollow" target="_blank">R-universe</a> system is an open source platform allowing users to create their own CRAN-like universe of R packages.</p>
<p>It is absolutely fantastic. It is particularly useful in one area I research, Mendelian randomization (at the interface of Epidemiology and Genetic Epidemiology), because a lot of the packages are GitHub/GitLab-only.</p>
<p>Therefore, I set up and maintain 
<a href="https://mrcieu.r-universe.dev/" rel="nofollow" target="_blank">https://mrcieu.r-universe.dev/</a> to include both packages from our MRCIEU GitHub organisation (from the MRC Integrative Epidemiology Unit at the University of Bristol, UK), and as many of the GitHub-only packages for Mendelian randomization I could find.</p>
<p>It is difficult to overstate how useful this is. For the first time, researchers not only have a list of the Mendelian randomization packages in one place, they can also install binaries without the hassle of <code>remotes::install_github()</code>, which is felt especially on (Ubuntu) Linux. Researchers can also see how often packages are updated, and R-universe checks for changes in packages approximately every hour, so the universe stays up to date.</p>
<p>This post gives five tips I have developed to help manage my R-universe.</p>
<h2 id="tip-1-referring-to-a-package-from-a-pull-request-instead-of-from-a-branch-on-a-fork">Tip 1: Referring to a package from a pull request instead of from a branch on a fork</h2>
<p>In the Mendelian randomization field many of these GitHub-only packages are not well written, or are abandoned once the PhD student/researcher leaves. Often when I add a package to our R-universe I find that its build fails, or it has <code>R CMD check</code> errors and warnings, or after several months its build starts failing because it is not maintained. I sometimes look into the failed builds and <code>check</code> problems. If it’s clear that just a few fixes are required to rectify the situation, I often open a pull request. Often that pull request gets no response.</p>
<p>Previously, for such cases I would switch the source of the package entry in <em>packages.json</em> to be from the relevant branch on my fork. However, I have always felt a bit uneasy about this. I wondered if GitHub had a way to refer to the pull request branch without having to switch the repository. It turns out that it does. The format of pull request branch names is <code>refs/pull/{number}/head</code> where <code>{number}</code> is the number assigned once the PR is opened. Therefore, when I open a PR on a package I now add the <code>&quot;branch&quot;</code> field to the package entry in <em>packages.json</em> as follows.</p>
<pre>  {
    &quot;package&quot;: &quot;GWASBrewer&quot;,
    &quot;url&quot;: &quot;https://github.com/jean997/GWASBrewer&quot;,
    &quot;branch&quot;: &quot;refs/pull/18/head&quot;
  },
</pre>
<p>I switch back to the default branch if the PR is merged.</p>
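<p>As a rough sketch of the bookkeeping described above, the following Python sets or removes the <code>"branch"</code> field for one entry. The helper names <code>pr_ref</code> and <code>set_branch</code> are hypothetical and are not part of R-universe or GitHub; the entry mirrors the GWASBrewer example.</p>

```python
import json


def pr_ref(number):
    """Build the ref name of a pull request's head: refs/pull/{number}/head."""
    return f"refs/pull/{int(number)}/head"


def set_branch(packages, pkgname, branch=None):
    """Return a copy of `packages` with the branch of `pkgname` set or removed.

    Passing branch=None drops the field, i.e. reverts to the default branch.
    """
    found = False
    updated = []
    for entry in packages:
        entry = dict(entry)  # copy so the input list is left untouched
        if entry["package"] == pkgname:
            found = True
            if branch is None:
                entry.pop("branch", None)
            else:
                entry["branch"] = branch
        updated.append(entry)
    if not found:
        raise KeyError(f"'{pkgname}' not found")
    return updated


# Illustrative registry with a single entry (same package as the example above)
registry = [{"package": "GWASBrewer", "url": "https://github.com/jean997/GWASBrewer"}]
pinned = set_branch(registry, "GWASBrewer", pr_ref(18))
print(json.dumps(pinned, indent=2))
```

<p>Calling <code>set_branch(pinned, "GWASBrewer")</code> without a branch removes the field again, which corresponds to switching back once the PR is merged.</p>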
<h2 id="tip-2-justfile-recipe-for-adding-a-package-to-packagesjson">Tip 2: Justfile recipe for adding a package to packages.json</h2>
<p>I regularly find that I need to add or remove a package. Manually editing the <em>packages.json</em> file is not hard, but I have found the following 
<a href="https://just.systems/" rel="nofollow" target="_blank">Justfile</a> recipes helpful for doing this quickly. (Just is like Make, but it is designed specifically for running commands and has a much friendlier syntax.)</p>
<p>These recipes require 
<a href="https://docs.astral.sh/uv/" rel="nofollow" target="_blank">uv</a> and just to be installed and on your <code>PATH</code> (uv automatically installs the required version of Python and creates/destroys/manages any required virtual environments). To use them, copy them into a text file named <em>justfile</em> at the top level of your R-universe registry repository and follow the instructions.</p>
<p>This recipe adds a package to your <em>packages.json</em> in alphabetical order. It has one required argument (<code>url</code>) and three optional arguments.</p>
<pre># add a package entry to packages.json in alphabetical order
[arg(&quot;branch&quot;, short=&quot;b&quot;)]
[arg(&quot;pkgname&quot;, short=&quot;p&quot;)]
[arg(&quot;subdir&quot;, short=&quot;s&quot;)]
add url pkgname=&quot;&quot; branch=&quot;&quot; subdir=&quot;&quot;:
    #!/usr/bin/env -S uv run --python 3.14 python3
    import json, re, sys
    url = &quot;{{ url }}&quot;
    if re.fullmatch(r'[^/]+/[^/]+', url):
        url = f&quot;https://github.com/{url}&quot;
    pkgname = &quot;{{ pkgname }}&quot; or url.rstrip(&quot;/&quot;).split(&quot;/&quot;)[-1]
    branch = &quot;{{ branch }}&quot;
    subdir = &quot;{{ subdir }}&quot;
    with open(&quot;packages.json&quot;) as f:
        packages = json.load(f)
    if any(p[&quot;package&quot;] == pkgname for p in packages):
        print(f&quot;Error: '{pkgname}' already exists in packages.json&quot;, file=sys.stderr)
        sys.exit(1)
    entry = {&quot;package&quot;: pkgname, &quot;url&quot;: url}
    if branch:
        entry[&quot;branch&quot;] = branch
    if subdir:
        entry[&quot;subdir&quot;] = subdir
    packages.append(entry)
    packages.sort(key=lambda p: p[&quot;package&quot;].lower())
    with open(&quot;packages.json&quot;, &quot;w&quot;) as f:
        json.dump(packages, f, indent=2)
        f.write(&quot;\n&quot;)
    print(f&quot;Added {pkgname}&quot;)
</pre>
<p>Here <code>url</code> is the repository URL, for example <code>https://github.com/MRCIEU/TwoSampleMR</code>. For GitHub packages you can shorten this to <code>MRCIEU/TwoSampleMR</code>.</p>
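<p>To make the shorthand handling easier to see, here is the same logic pulled out of the recipe into two small functions. This is only a sketch: the function names <code>normalise_url</code> and <code>default_pkgname</code> are made up for illustration, while the regular expression and the string handling are copied from the recipe.</p>

```python
import re


def normalise_url(url):
    """Expand an owner/repo shorthand to a full GitHub URL; leave other URLs alone."""
    if re.fullmatch(r"[^/]+/[^/]+", url):
        return f"https://github.com/{url}"
    return url


def default_pkgname(url):
    """Fall back to the last path component of the URL as the package name."""
    return url.rstrip("/").split("/")[-1]


for given in ["MRCIEU/TwoSampleMR", "https://github.com/MRCIEU/TwoSampleMR/"]:
    full = normalise_url(given)
    print(given, "becomes", full, "with package name", default_pkgname(full))
```

<p>Only a string with exactly one slash counts as shorthand, so full URLs from GitLab or elsewhere pass through unchanged.</p>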
<p>To add a GitHub package whose name matches its repository name, simply run</p>
<pre>just add username/reponame
</pre>
<p>You can inspect the recipe’s arguments and options with</p>
<pre>just --usage add

Usage: just add [OPTIONS] url

Arguments:
  url

Options:
  -p pkgname [default: &quot;&quot;]
  -b branch [default: &quot;&quot;]
  -s subdir [default: &quot;&quot;]
</pre>
<p>The 3 optional arguments allow you to specify the package name (<code>-p pkgname</code>), branch (<code>-b branchname</code>), or subdirectory (<code>-s subdirectory</code>) the package is in. For example, to add a GitHub package whose package name does not match its repository name run</p>
<pre>just add username/reponame -p pkgname
</pre>
<h2 id="tip-3-justfile-recipe-for-removing-a-package-from-packagesjson">Tip 3: Justfile recipe for removing a package from packages.json</h2>
<p>This recipe removes a package from your <em>packages.json</em>.</p>
<pre># remove a package entry from packages.json
remove pkgname:
    #!/usr/bin/env -S uv run --python 3.14 python3
    import json, sys
    pkgname = &quot;{{ pkgname }}&quot;
    with open(&quot;packages.json&quot;) as f:
        packages = json.load(f)
    filtered = [p for p in packages if p[&quot;package&quot;] != pkgname]
    if len(filtered) == len(packages):
        print(f&quot;Error: '{pkgname}' not found in packages.json&quot;, file=sys.stderr)
        sys.exit(1)
    with open(&quot;packages.json&quot;, &quot;w&quot;) as f:
        json.dump(filtered, f, indent=2)
        f.write(&quot;\n&quot;)
    print(f&quot;Removed {pkgname}&quot;)
</pre>
<p>Run it with</p>
<pre>just remove pkgname
</pre>
<h2 id="tip-4-justfile-recipe-for-checking-packagesjson-is-valid">Tip 4: Justfile recipe for checking packages.json is valid</h2>
<p>When manually editing <em>packages.json</em> it is very easy to forget a comma or to miss a closing bracket or quotation mark. This recipe checks your JSON is valid.</p>
<pre># check packages.json
check:
    uv run --python 3.14 -m json.tool packages.json &gt; /dev/null && echo &quot;JSON check passed&quot;
</pre>
<p>Run it with</p>
<pre>just check
</pre>
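<p>The recipe above only checks syntax. If you also want to catch structural slips, a slightly stricter check could look like the sketch below. It assumes, based on the entries written by the <code>add</code> recipe, that every entry is an object with <code>package</code> and <code>url</code> keys; the function name <code>check_packages</code> is made up for this example.</p>

```python
import json


def check_packages(text):
    """Return a list of problems found in the text of a packages.json file."""
    try:
        packages = json.loads(text)
    except json.JSONDecodeError as err:
        return ['invalid JSON: ' + str(err)]
    if not isinstance(packages, list):
        return ['top-level value is not a list']
    problems = []
    seen = set()
    for i, entry in enumerate(packages):
        if not isinstance(entry, dict):
            problems.append('entry %d is not an object' % i)
            continue
        # Assumed required keys, taken from what the add recipe writes.
        for key in ('package', 'url'):
            if key not in entry:
                problems.append('entry %d is missing %r' % (i, key))
        name = entry.get('package')
        if name is not None:
            if name in seen:
                problems.append('duplicate package %r' % name)
            seen.add(name)
    return problems


good = '[{"package": "TwoSampleMR", "url": "https://github.com/MRCIEU/TwoSampleMR"}]'
bad = '[{"package": "TwoSampleMR"}, {"package": "TwoSampleMR", "url": "x"}]'
print(check_packages(good))
print(check_packages(bad))
```

<p>An empty list means no problems were found; anything else is a human-readable list of what to fix.</p>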
<h2 id="tip-5-conveniently-view-a-packages-dependencies">Tip 5: Conveniently view a package’s dependencies</h2>
<p>Knowing a package’s full strong dependency list is useful — for example, when a breaking change somewhere in the chain causes unexpected build failures. While there are several ways to determine this in R, R-universe shows you the full list immediately.</p>
<p>Navigate to the R-universe page for the package you are interested in, say 
<a href="https://mrcieu.r-universe.dev/TwoSampleMR" rel="nofollow" target="_blank">https://mrcieu.r-universe.dev/TwoSampleMR</a> and click the dependencies pill.</p>
<img src="https://i0.wp.com/remlapmot.github.io/post/2026/runiverse-tips/img/twosamplemr-dependencies-hover.png?w=450&#038;ssl=1" alt="Screenshot of hovering mouse over dependencies pill on an R-universe package page." style="display: block; margin: auto;" data-recalc-dims="1">
<p>It expands showing the full dependency list.</p>
<img src="https://i2.wp.com/remlapmot.github.io/post/2026/runiverse-tips/img/twosamplemr-dependencies-clicked.png?w=450&#038;ssl=1" alt="Screenshot of expanding the dependencies pill on an R-universe package page." style="display: block; margin: auto;" data-recalc-dims="1">
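<p>Conceptually, the full list is the set of packages reachable from the package by repeatedly following its direct strong dependencies. The sketch below illustrates that idea on a made-up dependency map; the package names and the function <code>full_dependencies</code> are purely illustrative and say nothing about how R-universe computes its list.</p>

```python
def full_dependencies(pkg, direct):
    """Collect every package reachable from pkg through direct dependencies."""
    found = set()
    stack = list(direct.get(pkg, []))
    while stack:
        dep = stack.pop()
        if dep not in found:
            found.add(dep)
            stack.extend(direct.get(dep, []))
    return sorted(found)


# Made-up dependency map, for illustration only.
direct = {
    'pkgA': ['pkgB', 'pkgC'],
    'pkgB': ['pkgD'],
    'pkgC': ['pkgD', 'pkgE'],
    'pkgD': [],
}
print(full_dependencies('pkgA', direct))
```

<p>A breaking change in any package of the resulting list, not just in the direct dependencies, can surface as a build failure.</p>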
<h2 id="summary">Summary</h2>
<p>In summary, I have shown five tips I find useful to manage a large R-universe.</p>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://remlapmot.github.io/post/2026/runiverse-tips/"> R | Dr Tom Palmer</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/five-tips-for-managing-your-r-universe-%f0%9f%9a%80/">Five tips for managing your R-universe 🚀</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">401290</post-id>	</item>
		<item>
		<title>Conformalized TabPFN: Prediction Intervals for a Pretrained Transformer for Tabular Data in Python and R</title>
		<link>https://www.r-bloggers.com/2026/05/conformalized-tabpfn-prediction-intervals-for-a-pretrained-transformer-for-tabular-data-in-python-and-r/</link>
		
		<dc:creator><![CDATA[T. Moudiki]]></dc:creator>
		<pubDate>Sun, 17 May 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://thierrymoudiki.github.io//blog/2026/05/17/r/python/conformalized-tabpfn</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> Prediction Intervals for Tabular Regression in Python and R via Conformalized TabPFN</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/conformalized-tabpfn-prediction-intervals-for-a-pretrained-transformer-for-tabular-data-in-python-and-r/">Conformalized TabPFN: Prediction Intervals for a Pretrained Transformer for Tabular Data in Python and R</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://thierrymoudiki.github.io//blog/2026/05/17/r/python/conformalized-tabpfn"> T. Moudiki's Webpage - R</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
</div>
<p>Knowing a model’s prediction is useful. Knowing how confident that prediction is, even more so. Conformal prediction provides exactly that: prediction intervals with guaranteed marginal coverage, provided the data are exchangeable, regardless of the underlying model or data distribution.</p>

<p>In this post, we pair two powerful tools: <code>TabPFN</code>, a <strong>pretrained transformer for tabular data</strong>, and <code>nnetsauce</code>’s <code>PredictionInterval</code>, which implements Split Conformal Prediction and wraps any scikit-learn-compatible regressor into a conformal predictor. We demonstrate the full pipeline on the diabetes dataset, first in Python, then in R via reticulate. Both versions produce identical results: a coverage rate of 96.6% at a nominal 95% level.</p>
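<p>To make the idea of split conformal prediction concrete, here is a minimal, dependency-free sketch. It is a generic textbook version under simplifying assumptions (a hand-rolled least-squares line as the base model, synthetic data, absolute residuals as conformity scores) and is not meant to reproduce the internals of <code>nnetsauce</code>.</p>

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def split_conformal(x_train, y_train, x_cal, y_cal, alpha=0.05):
    """Fit on the training part, calibrate the interval half-width on the rest."""
    a, b = fit_line(x_train, y_train)
    scores = sorted(abs(y - (a + b * x)) for x, y in zip(x_cal, y_cal))
    n = len(scores)
    # Conformal quantile rank; capped at n for simplicity (small calibration
    # sets would strictly require an infinite interval instead).
    k = min(n, math.ceil((n + 1) * (1 - alpha)))
    q = scores[k - 1]

    def predict(x):
        mean = a + b * x
        return mean - q, mean, mean + q

    return predict


# Synthetic data: a noisy straight line.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(600)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]

predict = split_conformal(xs[:200], ys[:200], xs[200:400], ys[200:400])
hits = sum(1 for x, y in zip(xs[400:], ys[400:])
           if predict(x)[0] <= y <= predict(x)[2])
coverage = hits / 200
print('empirical coverage:', coverage)
```

<p>The base model never sees the calibration points, which is what lets their residuals stand in for the errors on new data.</p>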

<h1 id="1---python-version">1 &#8211; Python version</h1>

<pre>!pip install tabpfn tabpfn_client

!pip install nnetsauce

import tabpfn_client

API_TOKEN = &quot;&quot; # &lt;- Paste your TabPFN token here (from https://priorlabs.ai/tabpfn)


tabpfn_client.set_access_token(API_TOKEN)

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from tabpfn_client import TabPFNRegressor
from sklearn.metrics import mean_squared_error
import numpy as np

reg = TabPFNRegressor()

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

reg.fit(X_train, y_train)
preds = reg.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, preds))
print(-rmse)  # negated RMSE, hence the minus sign in the output below

00:00 Fitting... |

WARNING:tabpfn_client.client:The provided train set hashes match previously uploaded train sets.


00:00 Fitting... Done!
00:00 Predicting... -

WARNING:tabpfn_client.client:The provided test set hash matches a previously uploaded test set.


00:01 Predicting... Done!
-51.559912022529886

import nnetsauce as ns

reg_conformal = ns.PredictionInterval(reg, level=95)
reg_conformal.fit(X_train, y_train)
preds = reg_conformal.predict(X_test, return_pi=True)

00:00 Fitting... |

WARNING:tabpfn_client.client:The provided train set hashes match previously uploaded train sets.


00:00 Fitting... Done!
00:00 Predicting... -

WARNING:tabpfn_client.client:The provided test set hash matches a previously uploaded test set.


00:01 Predicting... Done!
00:00 Predicting... -

WARNING:tabpfn_client.client:The provided test set hash matches a previously uploaded test set.


00:01 Predicting... Done!
00:00 Predicting... -

WARNING:tabpfn_client.client:The provided test set hash matches a previously uploaded test set.


00:01 Predicting... Done!

print(f&quot;coverage_rate: {np.mean((preds.lower&lt;=y_test)*(preds.upper&gt;=y_test))}&quot;)

coverage_rate: 0.9662921348314607

import warnings
import matplotlib.pyplot as plt


warnings.filterwarnings('ignore')

split_color = 'green'
split_color2 = 'orange'
local_color = 'gray'

def plot_func(x,
              y,
              y_u=None,
              y_l=None,
              pred=None,
              shade_color=&quot;&quot;,
              method_name=&quot;&quot;,
              title=&quot;&quot;):

    fig = plt.figure()

    plt.plot(x, y, 'k.', alpha=.3, markersize=10,
             fillstyle='full', label=u'Test set observations')

    if (y_u is not None) and (y_l is not None):
        plt.fill(np.concatenate([x, x[::-1]]),
                 np.concatenate([y_u, y_l[::-1]]),
                 alpha=.3, fc=shade_color, ec='None',
                 label = method_name + ' Prediction interval')

    if pred is not None:
        plt.plot(x, pred, 'k--', lw=2, alpha=0.9,
                 label=u'Predicted value')

    #plt.ylim([-2.5, 7])
    plt.xlabel('$X$')
    plt.ylabel('$Y$')
    plt.legend(loc='upper right')
    plt.title(title)

    plt.show()


max_idx = 50
plot_func(x = range(max_idx),
          y = y_test[0:max_idx],
          y_u = preds.upper[0:max_idx],
          y_l = preds.lower[0:max_idx],
          pred = preds.mean[0:max_idx],
          shade_color=split_color2,
          title = f&quot;conformalized TabPFN ({max_idx} first points in test set)&quot;)

</pre>

<p><img src="https://i2.wp.com/thierrymoudiki.github.io/images/2026-05-17/2026-05-17-conformalized-tabpfn_10_0.png?w=578&#038;ssl=1" alt="Conformalized TabPFN prediction intervals for the first 50 test points (Python version)" class="img-responsive" data-recalc-dims="1" /></p>
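<p>The printed coverage rate can be unpacked with a little arithmetic. The sketch below relies on two facts: the scikit-learn diabetes dataset has 442 rows, and <code>train_test_split</code> rounds a fractional test size up. The binomial standard error is a rough yardstick added here for interpretation, not something computed in the original notebook.</p>

```python
import math

n_samples = 442                       # size of the scikit-learn diabetes dataset
n_test = math.ceil(0.2 * n_samples)   # train_test_split rounds the test size up
reported = 0.9662921348314607         # coverage rate printed above
covered = round(reported * n_test)    # test points falling inside their interval

# Binomial standard error of an empirical coverage around the nominal 95% level
std_err = math.sqrt(0.95 * 0.05 / n_test)

print(n_test, covered, covered / n_test)
print(round(std_err, 4))
```

<p>So 86 of the 89 test points fall inside their intervals, and the gap between the observed coverage and the nominal 95% is smaller than one binomial standard error (about 2.3 percentage points).</p>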

<h1 id="2---r-version">2 &#8211; R version</h1>

<p>For this R version, I used R in the same notebook as Python, in Google Colab.</p>

<pre>%load_ext rpy2.ipython

%R install.packages(&quot;reticulate&quot;)

%%R

# Conformalized TabPFN in R via reticulate

library(reticulate)

# ── 0. Python environment ──────────────────────────────────────────────────────
# Use your preferred Python env. Uncomment one (automatic on Google Colab):
# use_python(&quot;/usr/bin/python3&quot;)
# use_virtualenv(&quot;r-tabpfn&quot;)
# use_condaenv(&quot;r-tabpfn&quot;)

# Install required packages into the active Python env (run once)
# py_install(c(&quot;tabpfn&quot;, &quot;tabpfn_client&quot;, &quot;nnetsauce&quot;, &quot;scikit-learn&quot;,
#              &quot;matplotlib&quot;, &quot;numpy&quot;), pip = TRUE)

# ── 1. Imports ─────────────────────────────────────────────────────────────────
sklearn_datasets  &lt;- import(&quot;sklearn.datasets&quot;)
sklearn_model_sel &lt;- import(&quot;sklearn.model_selection&quot;)
sklearn_metrics   &lt;- import(&quot;sklearn.metrics&quot;)
tabpfn_client     &lt;- import(&quot;tabpfn_client&quot;)
ns                &lt;- import(&quot;nnetsauce&quot;)
np                &lt;- import(&quot;numpy&quot;)
plt               &lt;- import(&quot;matplotlib.pyplot&quot;)
warnings          &lt;- import(&quot;warnings&quot;)

# ── 2. TabPFN API token ────────────────────────────────────────────────────────
API_TOKEN &lt;- &quot;&quot;   # &lt;-- paste your TabPFN token here (from https://priorlabs.ai/tabpfn)
tabpfn_client$set_access_token(API_TOKEN)

TabPFNRegressor &lt;- tabpfn_client$TabPFNRegressor

# ── 3. Data ────────────────────────────────────────────────────────────────────
diabetes   &lt;- sklearn_datasets$load_diabetes(return_X_y = TRUE)
X          &lt;- diabetes[[1]]
y          &lt;- diabetes[[2]]

split      &lt;- sklearn_model_sel$train_test_split(X, y, test_size = 0.2, random_state = 42L)
X_train    &lt;- split[[1]]
X_test     &lt;- split[[2]]
y_train    &lt;- split[[3]]
y_test     &lt;- split[[4]]

# ── 4. Fit TabPFN regressor ────────────────────────────────────────────────────
reg   &lt;- TabPFNRegressor()
reg$fit(X_train, y_train)
preds_plain &lt;- reg$predict(X_test)

rmse &lt;- sqrt(sklearn_metrics$mean_squared_error(y_test, preds_plain))
cat(sprintf(&quot;TabPFN RMSE: %.4f\n&quot;, rmse))

# ── 5. Conformal prediction with nnetsauce ─────────────────────────────────────
reg_conformal &lt;- ns$PredictionInterval(reg, level = 95L)
reg_conformal$fit(X_train, y_train)
preds &lt;- reg_conformal$predict(X_test, return_pi = TRUE)

coverage &lt;- np$mean((preds$lower &lt;= y_test) * (preds$upper &gt;= y_test))
cat(sprintf(&quot;Coverage rate: %.4f\n&quot;, coverage))

# ── 6. Plot (first 50 test points) ────────────────────────────────────────────
warnings$filterwarnings(&quot;ignore&quot;)

max_idx    &lt;- 50L
x_range    &lt;- np$array(0:(max_idx - 1))   # numeric index
y_obs      &lt;- y_test[1:max_idx]
y_upper    &lt;- preds$upper[1:max_idx]
y_lower    &lt;- preds$lower[1:max_idx]
y_pred     &lt;- preds$mean[1:max_idx]

# Build the filled polygon (matplotlib-style concatenation)
x_fill &lt;- np$concatenate(list(x_range, x_range[max_idx:1]))
y_fill &lt;- np$concatenate(list(y_upper, y_lower[max_idx:1]))

fig &lt;- plt$figure()
plt$plot(x_range, y_obs,  &quot;k.&quot;, alpha = 0.3, markersize = 10L,
         label = &quot;Test set observations&quot;)
plt$fill(x_fill, y_fill, alpha = 0.3, fc = &quot;orange&quot;, ec = &quot;None&quot;,
         label = &quot;Conformal Prediction interval&quot;)
plt$plot(x_range, y_pred, &quot;k--&quot;, lw = 2L, alpha = 0.9,
         label = &quot;Predicted value&quot;)
plt$xlabel(&quot;Index&quot;)
plt$ylabel(&quot;Y&quot;)
plt$legend(loc = &quot;upper right&quot;)
plt$title(sprintf(&quot;Conformalized TabPFN (first %d points in test set)&quot;, max_idx))
plt$tight_layout()
plt$show()
# To save instead: plt$savefig(&quot;conformalized_tabpfn.png&quot;, dpi = 150L)

00:02 Fitting... Done!
00:02 Predicting... Done!
TabPFN RMSE: 51.5599
00:01 Fitting... Done!
00:02 Predicting... Done!
00:00 Predicting... -

WARNING:tabpfn_client.client:The provided test set hash matches a previously uploaded test set.


00:01 Predicting... Done!
00:02 Predicting... Done!
Coverage rate: 0.9663
</pre>

<p><img src="https://i2.wp.com/thierrymoudiki.github.io/images/2026-05-17/2026-05-17-conformalized-tabpfn_14_3.png?w=578&#038;ssl=1" alt="Conformalized TabPFN prediction intervals for the first 50 test points (R version)" class="img-responsive" data-recalc-dims="1" /></p>


<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://thierrymoudiki.github.io//blog/2026/05/17/r/python/conformalized-tabpfn"> T. Moudiki's Webpage - R</a></strong>.</div>
<hr />
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/conformalized-tabpfn-prediction-intervals-for-a-pretrained-transformer-for-tabular-data-in-python-and-r/">Conformalized TabPFN: Prediction Intervals for a Pretrained Transformer for Tabular Data in Python and R</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">401270</post-id>	</item>
		<item>
		<title>Exploring the CovR/S Two-Component System in Streptococcus pyogenes</title>
		<link>https://www.r-bloggers.com/2026/05/exploring-the-covr-s-two-component-system-in-streptococcus-pyogenes/</link>
		
		<dc:creator><![CDATA[r on Everyday Is A School Day]]></dc:creator>
		<pubDate>Sat, 16 May 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://www.kenkoonwong.com/blog/haddock/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
Exploring the CovR/S two-component system in Group A Strep 🧫 — from genome annotation with Bakta &#038; BaktFold, to AlphaFold confidence metrics, and a first attempt at protein docking with Haddock3. Learning as we go! 🙌</p>
<p>Motivations</p>
<p>     ...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/exploring-the-covr-s-two-component-system-in-streptococcus-pyogenes/">Exploring the CovR/S Two-Component System in Streptococcus pyogenes</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://www.kenkoonwong.com/blog/haddock/"> r on Everyday Is A School Day</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
</div>
<blockquote>
<p>Exploring the CovR/S two-component system in Group A Strep <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f9eb.png" alt="🧫" class="wp-smiley" style="height: 1em; max-height: 1em;" /> — from genome annotation with Bakta &#038; BaktFold, to AlphaFold confidence metrics, and a first attempt at protein docking with Haddock3. Learning as we go! <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f64c.png" alt="🙌" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
</blockquote>
<p align="center">
  <img loading="lazy" src="https://i2.wp.com/www.kenkoonwong.com/blog/haddock/strep.jpg?w=50%25&#038;ssl=1" alt="image" height="auto" data-recalc-dims="1">
</p>




<h2 id="motivations">Motivations
  <a href="https://www.kenkoonwong.com/blog/haddock/#motivations" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<p>Since the last 
<a href="https://www.kenkoonwong.com/blog/ampc/" rel="nofollow" target="_blank">ampC adventure</a>, I’ve been really curious about the mechanisms behind bacterial virulence. Remember how organisms with chromosomal ampC use ampG, ampD, and then ampR to repress the class C beta-lactamase gene? It’s such an orchestrated endeavor. What about Streptococcus pyogenes and its virulence? How can it be a harmless colonizer on one end and, on the other, a virulent pathogen that causes a number of devastating infections? Let’s learn a bit about the mechanism, and why not use this opportunity to learn some other bioinformatic tools along the way, and see if we can use existing knowledge to make it more fun and educational? I’m looking forward to this! Join me in exploring the mechanism of the CovR/S two-component system in Streptococcus pyogenes, aka Group A Strep!</p>




<h2 id="objectives">Objectives:
  <a href="https://www.kenkoonwong.com/blog/haddock/#objectives" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<ul>
<li>
<a href="https://www.kenkoonwong.com/blog/haddock/#covrs" rel="nofollow" target="_blank">What is CovR/S Two-component System?</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/haddock/#ncbi" rel="nofollow" target="_blank">Let’s Look at Where It Shows Up in NCBI</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/haddock/#annotate" rel="nofollow" target="_blank">What If We Have WGS? How to Annotate?</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/haddock/#hypothetical" rel="nofollow" target="_blank">What Are Hypotheticals?</a>
<ul>
<li>
<a href="https://www.kenkoonwong.com/blog/haddock/#alphafoid" rel="nofollow" target="_blank">What Is An Acceptable AlphaFold Confidence?</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/haddock/#baktfold" rel="nofollow" target="_blank">A New Tool Called BaktFold</a></li>
</ul>
</li>
<li>
<a href="https://www.kenkoonwong.com/blog/haddock/#strep" rel="nofollow" target="_blank">Do All Streptococcus Pyogenes Have CovR/S Two-component System?</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/haddock/#otherstrep" rel="nofollow" target="_blank">Do Other Streptococcus species Have CovR/S Two-component System</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/haddock/#covr" rel="nofollow" target="_blank">What Does CovR Look Like?</a>
<ul>
<li>
<a href="https://www.kenkoonwong.com/blog/haddock/#phos" rel="nofollow" target="_blank">What would a Phosphorylated CovR Look like?</a></li>
</ul>
</li>
<li>
<a href="https://www.kenkoonwong.com/blog/haddock/#opportunities" rel="nofollow" target="_blank">Opportunities For Improvement</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/haddock/#lessons" rel="nofollow" target="_blank">Lessons Learnt</a></li>
</ul>




<h2 id="covrs">What is CovR/S Two-component System?
  <a href="https://www.kenkoonwong.com/blog/haddock/#covrs" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<p>The CovR/S (Control of Virulence) system functions as a sophisticated environmental sensor that integrates multiple host-derived signals to orchestrate the transition from colonization to invasive disease. The evidence reveals three primary environmental triggers that modulate this master regulatory system. Put simply, <code>CovS (Sensor)</code> senses the signal -> gets activated -> triggers <code>CovR (Regulator)</code> -> downstream repression. Breaking this system de-represses the virulence factors.</p>
<ol>
<li>
<p><strong>Magnesium Levels</strong>: The Baseline Sensor
High extracellular magnesium concentrations (typical of healthy tissue) activate CovS kinase activity, leading to increased CovR phosphorylation and repression of virulence genes. This creates a colonization-friendly state where GAS maintains low virulence factor expression suitable for asymptomatic carriage.</p>
</li>
<li>
<p><strong>LL-37 Antimicrobial Peptide</strong>: The Invasion Signal
LL-37 cathelicidin peptide — released by neutrophils and epithelial cells during inflammation — directly binds to the extracellular domain of CovS and inhibits its kinase activity. This creates a paradoxical host-pathogen interaction where the host’s antimicrobial defense actually triggers bacterial virulence. LL-37 binding to CovS reduces CovR phosphorylation, leading to derepression of multiple virulence factors including pyrogenic exotoxin A, DNase Sda1, streptolysin O, and hyaluronic acid capsule. Critically, LL-37 signaling converts GAS from a colonizing to an invasive phenotype, with marked increases in resistance to opsonophagocytic killing by human leukocytes.</p>
</li>
<li>
<p><strong>Acidic Stress</strong>: The Tissue Environment Sensor
Acidic conditions (pH &lt; 7.0, typical of infected or inflamed tissue) enhance CovR/S-dependent gene repression through activation of the covR/S promoter itself. This creates a negative feedback loop where tissue acidosis increases CovR/S expression, which then more strongly represses virulence factors.</p>
</li>
</ol>
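<p>The first two triggers can be condensed into a toy truth table. The sketch below is only a mnemonic for the logic described above, not a biological model: the function name <code>covrs_state</code> is made up, acidic stress is left out, and the rule that LL-37 overrides magnesium when both are present is an assumption made here for simplicity.</p>

```python
def covrs_state(high_magnesium, ll37_present):
    """Toy logic: return (CovS kinase active, CovR phosphorylated, virulence genes)."""
    # Assumption: LL-37 inhibition overrides magnesium activation when both are present.
    kinase_active = high_magnesium and not ll37_present
    covr_phosphorylated = kinase_active
    virulence = 'repressed' if covr_phosphorylated else 'derepressed'
    return kinase_active, covr_phosphorylated, virulence


print(covrs_state(high_magnesium=True, ll37_present=False))  # healthy tissue
print(covrs_state(high_magnesium=True, ll37_present=True))   # inflammation
```

<p>Read this way, the colonizing state is the one where the kinase is on, and anything that switches it off (LL-37, or a broken CovS) releases the virulence genes.</p>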
<p>The image below, linked directly from the source article, depicts the mechanism of the CovR/S system in Streptococcus pyogenes.</p>
<p><img src="https://www.pnas.org/cms/10.1073/pnas.202353699/asset/49eaaa3c-e74d-44d4-af86-b97d7eba66ba/assets/graphic/pq2023536004.jpeg$0" alt=""></p>




<h2 id="ncbi">Let’s Look at Where It Shows Up in NCBI
  <a href="https://www.kenkoonwong.com/blog/haddock/#ncbi" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<p>Let’s go to 
<a href="https://www.ncbi.nlm.nih.gov/datasets/gene/GCF_900475035.1/?search=cov" rel="nofollow" target="_blank">here</a>. I picked the Streptococcus pyogenes reference genome annotation, searched for <code>cov</code>, and this popped up.</p>
<p align="center">
  <img loading="lazy" src="https://i2.wp.com/www.kenkoonwong.com/blog/haddock/ncbi.png?w=450&#038;ssl=1" alt="image" height="auto" data-recalc-dims="1">
</p>
<p>There you go! CovS and CovR. This system is sometimes <code>also known as CsrR/CsrS (Capsule Synthesis Regulator)</code>. Perhaps two different research groups discovered the same genes and named them differently?</p>
<p>It’s also so interesting that these 2 genes are so close to each other. <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f914.png" alt="🤔" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>




<h2 id="annotate">What If We Have WGS? How to Annotate?
  <a href="https://www.kenkoonwong.com/blog/haddock/#annotate" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<p>Alright, let’s pick a random Streptococcus pyogenes genome and see if we can use Bakta to annotate it. Let’s look at this one.</p>
<p align="center">
  <img loading="lazy" src="https://i1.wp.com/www.kenkoonwong.com/blog/haddock/random.png?w=450&#038;ssl=1" alt="image" height="auto" data-recalc-dims="1">
</p>
<p>
<a href="https://github.com/oschwengers/bakta" rel="nofollow" target="_blank">Install Bakta, see here</a></p>
<pre>#make sure you use the environment name you created
conda activate bakta_env 

bakta \
  --db /path/to/bakta_db \
  --output random_bakta_output \
  --prefix random \
  --threads 8 \
  --skip-crispr \
  --force \
  random.fna
</pre><p>After it is done, when you look in the folder, you will see something like this:</p>
<p align="center">
  <img loading="lazy" src="https://i0.wp.com/www.kenkoonwong.com/blog/haddock/bakta.png?w=40%25&#038;ssl=1" alt="image" height="auto" data-recalc-dims="1">
</p>
<p>When we look at the annotated gff3 file, we can see that there are 2 features annotated as <code>two-component system response regulator</code> and <code>two-component system sensor histidine kinase</code>. These are likely to be CovR and CovS.</p>
<pre>library(ape)
library(tidyverse)

# Bakta appends a FASTA section to the GFF3 that ape cannot parse; find the line where it starts
readLines(&quot;random.gff3&quot;) |&gt; str_detect(&quot;FASTA&quot;) |&gt; which()

## [1] 2025

tmp &lt;- tempfile()
readLines(&quot;random.gff3&quot;)[1:2024] |&gt; writeLines(tmp)
gff &lt;- read.gff(tmp, GFF3 = TRUE)
gff |&gt;
  filter(str_detect(attributes, &quot;[Cc][Oo][Vv]&quot;))

##        seqid    source type  start    end score strand phase
## 331 contig_1 Pyrodigal  CDS 303144 303830    NA      +     0
## 332 contig_1 Pyrodigal  CDS 303836 305338    NA      +     0
##                                                                                                                                                                                                                             attributes
## 331           ID=EEBJGP_00327;Name=two-component system response regulator CovR;locus_tag=EEBJGP_00327;product=two-component system response regulator CovR;Dbxref=BlastRules:WP_002991052,SO:0001217,UniRef:UniRef50_Q49XM7;gene=covR
## 332 ID=EEBJGP_00328;Name=two-component system sensor histidine kinase CovS;locus_tag=EEBJGP_00328;product=two-component system sensor histidine kinase CovS;Dbxref=BlastRules:WP_002991036,SO:0001217,UniRef:UniRef50_D3KVE8;gene=covS
</pre><p>There you go! We found them after annotation. Wait a minute… what are those <code>hypotheticals</code> in our folder? Why are they there? Are they important? Let’s find out.</p>
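<p>For readers who prefer Python, the GFF3 filtering step above can be sketched without any third-party package. The example parses a tiny inline GFF3 string rather than the real file: the two cov rows are abbreviated from the output above, while the third row and the FASTA stub are made up so that the example is self-contained. The function name <code>read_features</code> is hypothetical.</p>

```python
# Two abbreviated rows from the annotation above, plus one made-up row and a FASTA stub.
GFF3_TEXT = '''##gff-version 3
contig_1\tPyrodigal\tCDS\t303144\t303830\t.\t+\t0\tID=EEBJGP_00327;product=two-component system response regulator CovR;gene=covR
contig_1\tPyrodigal\tCDS\t303836\t305338\t.\t+\t0\tID=EEBJGP_00328;product=two-component system sensor histidine kinase CovS;gene=covS
contig_1\tPyrodigal\tCDS\t1\t300\t.\t+\t0\tID=EXAMPLE_00001;product=hypothetical protein
##FASTA
>contig_1
ACGT
'''


def read_features(text):
    """Parse GFF3 feature lines, stopping at the embedded FASTA section."""
    features = []
    for line in text.splitlines():
        if line.lstrip('#').startswith('FASTA'):
            break  # everything after this marker is sequence, not annotation
        if not line or line.startswith('#'):
            continue
        cols = line.split('\t')
        attrs = dict(item.split('=', 1) for item in cols[8].split(';') if '=' in item)
        features.append({'start': int(cols[3]), 'end': int(cols[4]), 'attrs': attrs})
    return features


cov = [f for f in read_features(GFF3_TEXT)
       if 'cov' in f['attrs'].get('gene', '').lower()]
for f in cov:
    print(f['attrs']['gene'], f['start'], f['end'])
```

<p>Unlike the R snippet, which matches the pattern anywhere in the attributes column, this sketch only looks at the <code>gene</code> attribute, so it is a little stricter.</p>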




<h2 id="hypothetical">What Are Hypotheticals?
  <a href="https://www.kenkoonwong.com/blog/haddock/#hypothetical" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<p>Hypotheticals are proteins whose function we don’t know. They are annotated as “hypothetical protein” because they are predicted from the DNA sequence, but there is no experimental evidence of their function, typically because they have no known homologs in other organisms or no known domains or motifs from which a function could be inferred.</p>
<p>When we take a peek at the <code>random.hypotheticals.tsv</code>, it looks like this:</p>
<p align="center">
  <img loading="lazy" src="https://i1.wp.com/www.kenkoonwong.com/blog/haddock/hypothetical.png?w=450&#038;ssl=1" alt="image" height="auto" data-recalc-dims="1">
</p>
<p>Let’s check how many hypotheticals we have for this genome:</p>
<pre>tmp &lt;- tempfile()
readLines(&quot;random.hypotheticals.tsv&quot;)[3:194] |&gt; writeLines(tmp)
hypo &lt;- read_tsv(tmp)
nrow(hypo)

## [1] 191
</pre><p>OK, we have 191 hypotheticals. Let’s see whether a new tool on the block can add some annotation to these hypotheticals and whether we find anything interesting. We can also count them directly in the GFF with <code>filter</code>:</p>
<pre>gff |&gt;
  filter(str_detect(attributes,&quot;hypothetical protein&quot;)) |&gt;
  nrow()

## [1] 192
</pre><p>Hmm… the two counts don’t tally (191 vs. 192). But let’s move on.</p>
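<p>If we ever want to chase the off-by-one, one option is to compare identifiers from the two sources. Below is a minimal sketch in base R, assuming both sources carry a locus tag per entry; the attribute strings and tags are made-up stand-ins, not the real data.</p>

```r
# Toy stand-ins (hypothetical locus tags, not the real genome)
tsv_ids <- c("EEBJGP_00010", "EEBJGP_00042", "EEBJGP_00100")
gff_attrs <- c(
  "ID=EEBJGP_00010;Name=hypothetical protein;locus_tag=EEBJGP_00010",
  "ID=EEBJGP_00042;Name=hypothetical protein;locus_tag=EEBJGP_00042",
  "ID=EEBJGP_00077;Name=hypothetical protein;locus_tag=EEBJGP_00077",
  "ID=EEBJGP_00100;Name=hypothetical protein;locus_tag=EEBJGP_00100"
)

# Pull the locus_tag value out of each GFF attribute string
gff_ids <- sub(".*locus_tag=([^;]+).*", "\\1", gff_attrs)

# Entries present in one source but missing from the other
only_in_gff <- setdiff(gff_ids, tsv_ids)
only_in_tsv <- setdiff(tsv_ids, gff_ids)

only_in_gff
only_in_tsv
```

<p>Whatever shows up in only one of the two sets is the candidate to inspect by hand.</p>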




<h2 id="baktfold">A New Tool Called BaktFold
  <a href="https://www.kenkoonwong.com/blog/haddock/#baktfold" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<p>
<a href="https://github.com/gbouras13/baktfold" rel="nofollow" target="_blank">BaktFold</a> is a new tool that annotates hypothetical proteins by structure rather than by sequence alone. Instead of predicting a full 3D structure, it translates each protein sequence into structural (3Di) tokens and searches them against databases of known and predicted structures, including AlphaFold models, to infer a likely function. 
<a href="https://www.biorxiv.org/content/10.64898/2026.03.31.715528v1" rel="nofollow" target="_blank">Check out their paper</a></p>
<pre>baktfold run \
  -i random.json \
  -o random_baktfold_output \
  -d /path/to/baktfold_db \
  -t 8 \
  -f

# Back in R: find where the embedded FASTA section starts (bakta/baktfold append ##FASTA, which ape::read.gff() cannot parse)
readLines(&quot;baktfold.gff3&quot;) |&gt; str_detect(&quot;FASTA&quot;) |&gt; which()

## [1] 2025

# Keep only the feature lines above the FASTA section
tmp &lt;- tempfile()
readLines(&quot;baktfold.gff3&quot;)[1:2024] |&gt; writeLines(tmp)
gff_baktfold &lt;- read.gff(tmp, GFF3 = TRUE)
gff_baktfold |&gt;
  filter(str_detect(attributes,&quot;hypothetical protein&quot;)) |&gt;
  nrow()

## [1] 106
</pre><p>Wow, this is really cool! We have far fewer hypotheticals: 106 instead of 192, so 86 fewer! Let’s look at what the previous hypotheticals are annotated as now.</p>
<pre>gff_hypo &lt;- gff |&gt;
  filter(str_detect(attributes,&quot;hypothetical protein&quot;)) |&gt;
  pull(start)

new_df &lt;- tibble(start = gff_hypo, temp = NA)

gff_baktfold_hypo &lt;- gff_baktfold |&gt;
  right_join(new_df) |&gt;
  mutate(temp = case_when(
    !str_detect(attributes,&quot;hypothetical protein&quot;) ~ attributes,
    TRUE ~ temp
  ))

## Joining with `by = join_by(start)`
</pre><p align="center">
  <img loading="lazy" src="https://i1.wp.com/www.kenkoonwong.com/blog/haddock/baktfold_hypo.png?w=450&#038;ssl=1" alt="image" height="auto" data-recalc-dims="1">
</p>
<p>Take a look at the image above. If the <code>temp</code> column is NA, the protein is still hypothetical after baktfold, just as it was in bakta. If it is filled, baktfold was able to identify it in one or more databases. How baktfold works is interesting: it runs sequential protein structure-based searches against four complementary structure databases (SwissProt, AlphaFold Cluster Database, PDB, CATH). Protein sequences are translated into Foldseek 3Di tokens by the ProstT5 protein language model and then searched against the structure databases with Foldseek. Pretty cool! Also, when comparing a few of the baktfold-predicted functions with NCBI’s annotation, we sometimes see a baktfold-annotated function where NCBI labels the gene as uncharacterized. This is not an exhaustive or thorough comparison by any means, but it is interesting to note.</p>
<p>Speaking of AlphaFold, we’ve always wanted to know a bit more about AlphaFold confidence. When we look at the predicted structure of a protein, how do we know if we can trust it? What is an acceptable AlphaFold confidence? Let’s learn a bit more.</p>




<h2 id="alphafoid">What Is An Acceptable AlphaFold Confidence?
  <a href="https://www.kenkoonwong.com/blog/haddock/#alphafoid" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<p><strong>These are edited responses from a back-and-forth query with Claude Sonnet 4.6.</strong></p>
<p>AlphaFold reports per-residue confidence as <strong>pLDDT</strong> (predicted local distance difference test), scored 0–100:</p>
<table>
<thead>
<tr>
<th>pLDDT</th>
<th>Interpretation</th>
</tr>
</thead>
<tbody>
<tr>
<td>&gt; 90</td>
<td>High confidence — trust side-chain positions</td>
</tr>
<tr>
<td>70–90</td>
<td>Good — backbone reliable, some side-chain uncertainty</td>
</tr>
<tr>
<td>50–70</td>
<td>Low — treat as a rough scaffold only</td>
</tr>
<tr>
<td>&lt; 50</td>
<td>Likely disordered or misfolded prediction</td>
</tr>
</tbody>
</table>
<p>It tells you how well-placed AlphaFold thinks each amino acid is relative to nearby residues. Crucially, it is not a measure of experimental validation — it is the model’s self-assessed confidence.</p>
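<p>To make the table concrete, here is a small sketch that bins pLDDT values into the four bands with base R’s <code>cut()</code>. The scores are made up, and treating a boundary value such as exactly 90 as belonging to the higher band is my assumption, since the table does not say.</p>

```r
# Made-up pLDDT scores for illustration
plddt <- c(96.2, 88.5, 71.0, 64.3, 42.9, 90.0, 50.0)

band <- cut(
  plddt,
  breaks = c(0, 50, 70, 90, 100),
  labels = c("very low (< 50)", "low (50-70)", "good (70-90)", "high (>= 90)"),
  right = FALSE,          # intervals are [lower, upper)
  include.lowest = TRUE   # so a score of exactly 100 still lands in the top band
)

data.frame(plddt, band)
table(band)
```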
<p>For docking, pLDDT is essentially a proxy for how much you can trust the binding pocket geometry. Active site residues ideally need pLDDT ≥ 90, with ≥ 70 as a minimum. Second-shell residues within ~8 Å should also clear 70, and any low-confidence loops capping the binding site entrance are a red flag even if the catalytic residues themselves look fine. In practice, it is a good idea to color the structure by B-factor in ChimeraX (AlphaFold model files store pLDDT in the B-factor column) to check that the binding site is acceptable.</p>
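<p>The same check can be done numerically. AlphaFold model files store pLDDT in the B-factor column of each ATOM record, so we can read it per residue and flag anything below a threshold. This is a base R sketch on three toy records built with <code>sprintf()</code>; the residues, scores and the threshold of 70 are illustrative assumptions. With bio3d, the same values should be available as <code>pdb$atom$b</code> after <code>read.pdb()</code>.</p>

```r
# Build three toy C-alpha records in fixed-width PDB format (values are made up)
toy <- data.frame(resno = c(52, 53, 54),
                  resid = c("VAL", "ASP", "ILE"),
                  b     = c(93.10, 95.42, 68.75))
pdb_lines <- sprintf("ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f",
                     seq_len(nrow(toy)), " CA ", toy$resid, "A", toy$resno,
                     0, 0, 0, 1, toy$b)

# Residue number sits in columns 23-26, the B-factor (pLDDT) in columns 61-66
is_ca <- startsWith(pdb_lines, "ATOM") & trimws(substr(pdb_lines, 13, 16)) == "CA"
ca <- data.frame(
  resno = as.integer(substr(pdb_lines[is_ca], 23, 26)),
  plddt = as.numeric(substr(pdb_lines[is_ca], 61, 66))
)

# Flag residues of interest that fall below the chosen threshold
site <- c(52, 53, 54)
flagged <- ca[ca$resno %in% site & ca$plddt < 70, ]
flagged
```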
<p>For MD, it seems to be a bit more forgiving. Regions scoring 70–90 will generally equilibrate fine — the force field redistributes strain and lets uncertain side chains settle. Regions in the 50–70 band need longer equilibration (100+ ns) and staged restraint release to avoid unphysical collapse (interesting area to explore). Below 50, MD often can’t rescue the geometry — these regions should either be truncated if non-essential, cross-checked against other databases, or explored with enhanced sampling methods. <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f914.png" alt="🤔" class="wp-smiley" style="height: 1em; max-height: 1em;" /> OK what about PAE on their website?</p>
<p>PAE (predicted aligned error) is the second major confidence metric AlphaFold produces, and it tells you something fundamentally different from pLDDT. Where pLDDT is a per-residue score asking “how confident am I in this residue’s local geometry,” PAE is a pairwise score asking “how confident am I in the relative position and orientation of residue A with respect to residue B.” It’s an N×N matrix where every cell (i,j) contains the expected position error in Å for residue j when residue i is used as the alignment reference.</p>
<p>Why does it matter? pLDDT can look great across an entire protein, with every residue scoring above 80, but if the PAE between two domains is high, that confident-looking structure is misleading. The two domains are individually well-folded, but AlphaFold is telling you it has no idea how they pack against each other. For docking, check the PAE within the domain containing your binding site: you want a dark block there, confirming the domain’s internal geometry is reliable as a unit. For MD, high inter-domain PAE is a heads-up that you may need enhanced sampling to explore the conformational space between domains rather than assuming the AlphaFold pose is the dominant one.</p>
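<p>A tiny numerical illustration of that point: a toy 6×6 PAE matrix with two confident domains whose relative placement is uncertain. The values and the domain boundaries are invented; with a real prediction you would load the PAE matrix that ships with the model and define the domains yourself.</p>

```r
# Toy PAE matrix in Å (made up): residues 1-3 form domain 1, residues 4-6 domain 2
n <- 6
domain <- c(1, 1, 1, 2, 2, 2)
pae <- matrix(25, n, n)                 # high expected error by default
pae[domain == 1, domain == 1] <- 2      # low error within domain 1
pae[domain == 2, domain == 2] <- 3      # low error within domain 2
diag(pae) <- 0

# Compare the mean PAE within domains to the mean PAE between domains
same <- outer(domain, domain, "==")
mean_intra <- mean(pae[same & row(pae) != col(pae)])
mean_inter <- mean(pae[!same])
c(intra = mean_intra, inter = mean_inter)
```

<p>A low within-domain mean next to a high between-domain mean is the pattern described above: each domain can be trusted on its own, their relative orientation cannot.</p>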
<blockquote>
<p>Two metrics before trusting AlphaFold. pLDDT is local — want ≥ 70 at active site, ≥ 90 for catalytic residues. PAE is pairwise — dark green means confident relative positioning between any two residues. Single-chain: check diagonal at binding site. Multi-chain: off-diagonal blocks tell you if the predicted interface is real. Check both before docking or MD.</p>
</blockquote>




<h2 id="strep">Do All Streptococcus Pyogenes Have CovR/S Two-component System?
  <a href="https://www.kenkoonwong.com/blog/haddock/#strep" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<p>Interestingly, after downloading 2840 Streptococcus pyogenes RefSeq annotation files (gff3), I found that <code>99.96% (2839/2840) of these contain CovR/S genes</code>. Why not 100%? It turns out that <code>GCF_005472355.1</code> doesn’t have CovR/S listed in its annotation. I annotated it with Bakta, still no luck. I then ran Baktfold on top of that and couldn’t find it either. Finally, I used <code>tblastn</code> to search for the CovR/S proteins, and the best hit had only 42% identity. Wow, is this isolate really missing both genes? <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f914.png" alt="🤔" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<blockquote>
<p>Note: Expand below to see the utility of using <code>--dehydrated</code> and then <code>rehydrate</code> for downloading and annotating a large number of genomes.</p>
</blockquote>
<details>
<summary>code in terminal</summary>
<pre>datasets download genome taxon &quot;Streptococcus pyogenes&quot; \
  --annotated \
  --assembly-source refseq \
  --include gff3 \
  --dehydrated \
  --filename strep_pyogenes_refseq_gff3.zip

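# Unzip the dehydrated package first, then fetch the actual files
unzip strep_pyogenes_refseq_gff3.zip -d strep_pyogenes_refseq_gff3
datasets rehydrate --directory strep_pyogenes_refseq_gff3/ --max-workers 10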
</pre></details>
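<p>Once the files are rehydrated, checking each annotation for the two genes can be sketched in base R as below. The helper <code>has_gene()</code> is hypothetical (it is not the code used for the numbers above), the GFF lines are toy data, and it only matches a <code>gene=</code> attribute, so a gene annotated under another name would be missed.</p>

```r
# Does a set of GFF3 lines contain a feature whose attributes include gene=<name>?
has_gene <- function(gff_lines, gene) {
  feature_lines <- gff_lines[!startsWith(gff_lines, "#")]   # drop header/comment lines
  attrs <- vapply(strsplit(feature_lines, "\t", fixed = TRUE),
                  function(x) if (length(x) >= 9) x[9] else NA_character_,
                  character(1))
  pattern <- paste0("(^|;)gene=", gene, "(;|$)")
  any(grepl(pattern, attrs, ignore.case = TRUE), na.rm = TRUE)
}

# Toy annotation with made-up IDs and coordinates
toy_gff <- c(
  "##gff-version 3",
  "contig_1\tBakta\tCDS\t100\t786\t.\t+\t0\tID=X_00327;Name=response regulator CovR;gene=covR",
  "contig_1\tBakta\tCDS\t790\t2290\t.\t+\t0\tID=X_00328;Name=sensor histidine kinase CovS;gene=covS",
  "contig_1\tBakta\tCDS\t3000\t3300\t.\t-\t0\tID=X_00329;Name=hypothetical protein"
)

with_both    <- has_gene(toy_gff, "covR") && has_gene(toy_gff, "covS")
without_covS <- has_gene(toy_gff[-3], "covS")
c(with_both = with_both, without_covS = without_covS)

# For real files (the directory layout and file extension are assumptions):
# files <- list.files("strep_pyogenes_refseq_gff3", pattern = "\\.gff$",
#                     recursive = TRUE, full.names = TRUE)
# both <- vapply(files, function(f) {
#   l <- readLines(f)
#   has_gene(l, "covR") && has_gene(l, "covS")
# }, logical(1))
# mean(both)   # fraction of genomes with both genes annotated
```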




<h2 id="otherstrep">Do Other Streptococcus species Have CovR/S Two-component System?
  <a href="https://www.kenkoonwong.com/blog/haddock/#otherstrep" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<p>We’ve seen what other streptococci can do clinically, so I wonder whether they have a similar system to Streptococcus pyogenes. Let’s download all reference genomes of the genus Streptococcus and see if we can find it in their annotations.</p>
<details>
<summary>code</summary>
<h4 id="terminal">Terminal</h4>
<pre>datasets download genome taxon &quot;Streptococcus&quot; \
  --reference \
  --include gff3 \
  --assembly-source RefSeq \
  --dehydrated \
  --filename strep_dehydrated.zip

unzip strep_dehydrated.zip

datasets rehydrate --directory strep_dehydrated
</pre><p>After downloading the gff3 files above, I used Claude Code to find covS and covR in their annotations and write the result to a csv. Since these are reference genomes, the annotations should be quite reliable.</p>




<h4 id="r">R
  <a href="https://www.kenkoonwong.com/blog/haddock/#r" rel="nofollow" target="_blank"></a>
</h4>
<pre>df_cov &lt;- read_csv(&quot;covr_covs_annotation.csv&quot;)

df_cov_single &lt;- df_cov |&gt;
  filter(covR_present == &quot;yes&quot; &amp; covS_present == &quot;no&quot;) |&gt;
  pull(organism) 

df_cov |&gt; head(10)

## # A tibble: 10 × 8
##    accession  organism covR_present covS_present covR_gene_names covS_gene_names
##    &lt;chr&gt;      &lt;chr&gt;    &lt;chr&gt;        &lt;chr&gt;        &lt;chr&gt;           &lt;chr&gt;          
##  1 GCF_00002… Strepto… no           no           &lt;NA&gt;            &lt;NA&gt;           
##  2 GCF_00016… Strepto… no           no           &lt;NA&gt;            &lt;NA&gt;           
##  3 GCF_00018… Strepto… no           no           &lt;NA&gt;            &lt;NA&gt;           
##  4 GCF_00018… Strepto… no           no           &lt;NA&gt;            &lt;NA&gt;           
##  5 GCF_00018… Strepto… yes          no           covR            &lt;NA&gt;           
##  6 GCF_00018… Strepto… yes          no           covR            &lt;NA&gt;           
##  7 GCF_00022… Strepto… no           no           &lt;NA&gt;            &lt;NA&gt;           
##  8 GCF_00025… Strepto… no           no           &lt;NA&gt;            &lt;NA&gt;           
##  9 GCF_00037… Strepto… no           no           &lt;NA&gt;            &lt;NA&gt;           
## 10 GCF_00037… Strepto… no           no           &lt;NA&gt;            &lt;NA&gt;           
## # &#x2139; 2 more variables: covR_annotation &lt;chr&gt;, covS_annotation &lt;chr&gt;
</pre></details>
<p>Wow, among the Streptococcus reference genomes (n=153), Streptococcus pyogenes appears to be the ONLY species annotated with both genes of the CovR/S two-component system?</p>
<p>Interestingly, there were 17 that have covR but not covS.</p>
<p>These are Streptococcus parauberis NCFD 2020, Streptococcus ictaluri 707-05, Streptococcus didelphis DSM 15616, Streptococcus castoreus DSM 17536, Streptococcus iniae, Streptococcus phocae, Streptococcus bovimastitidis, Streptococcus catagoni, Streptococcus equi subsp. zooepidemicus, Streptococcus dysgalactiae, Streptococcus halichoeri, Streptococcus penaeicida, Streptococcus hongkongensis, Streptococcus porcinus, Streptococcus uberis, Streptococcus canis, Streptococcus pseudoporcinus. How curious! <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f9d0.png" alt="🧐" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
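<p>A quick way to see all four presence/absence combinations at once is a cross-tabulation. This sketch uses a toy data frame with the same two column names as the csv above; the organisms and values are placeholders, not the real result.</p>

```r
# Toy version of the annotation summary (placeholders, not the real data)
df_toy <- data.frame(
  organism     = c("species A", "species B", "species C", "species D", "species E"),
  covR_present = c("yes", "yes", "no", "yes", "no"),
  covS_present = c("yes", "no", "no", "no", "no")
)

# Rows are covR, columns are covS
presence <- table(covR = df_toy$covR_present, covS = df_toy$covS_present)
presence

# Organisms with the regulator but no annotated sensor kinase
covR_only <- df_toy$organism[df_toy$covR_present == "yes" & df_toy$covS_present == "no"]
covR_only
```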




<h2 id="covr">What Does CovR Look Like?
  <a href="https://www.kenkoonwong.com/blog/haddock/#covr" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<p>Let’s take a look at CovR in the 
<a href="https://alphafold.com/entry/AF-D3KVK6-F1" rel="nofollow" target="_blank">AlphaFold Database</a>. Looking at the 
<a href="https://pubmed.ncbi.nlm.nih.gov/16788170/" rel="nofollow" target="_blank">CovR active site</a>, the D53A mutant showed no dimerization of CovR, which suggests that D53 is the phosphorylation site. Dimerization here means that one phosphorylated CovR molecule binds to another phosphorylated CovR; the dimer is the active form of CovR, which then binds to DNA and represses the virulence genes.</p>
<p align="center">
  <img loading="lazy" src="https://i1.wp.com/www.kenkoonwong.com/blog/haddock/alphafold.png?w=450&#038;ssl=1" alt="image" height="auto" data-recalc-dims="1">
</p>
<p>The above shows the PAE, the pLDDT and the predicted structure. The PAE is pretty good, and the pLDDT is also pretty good, with most of the residues above 70. The predicted structure also looks pretty reasonable, with the active site D53 highlighted in red.</p>




<h2 id="phos">What Would a Phosphorylated CovR Look Like?
  <a href="https://www.kenkoonwong.com/blog/haddock/#phos" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<p>Apparently for a 
<a href="https://pubmed.ncbi.nlm.nih.gov/28289082/" rel="nofollow" target="_blank">CovR with D53E mutation</a> (so it mimics the phosphorylated state), the structure is quite different from the wild type. The D53E mutation causes a conformational change that allows CovR to dimerize and bind to DNA, even in the absence of phosphorylation. Let’s visualize a D53E CovR predicted by AF3.</p>
<p align="center">
  <img loading="lazy" src="https://i0.wp.com/www.kenkoonwong.com/blog/haddock/d53e.png?w=450&#038;ssl=1" alt="image" height="auto" data-recalc-dims="1">
</p>
<p>We can see that the highlighted residue is glutamate (E). I’m not sure I can tell the difference between wild-type and D53E by eye. Let’s plug both into R and see if we can spot differences in RMSD.</p>
<pre>library(bio3d)

covr_wt &lt;- read.pdb(&quot;AF-D3KVK6-F1-model_v6.pdb&quot;)
covr_d53e &lt;- read.pdb(&quot;covr_d53e_chainA.pdb&quot;)
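# Select the C-alpha atoms of both structures
covr_wt_idx &lt;- atom.select(covr_wt, &quot;calpha&quot;)
covr_d53e_idx &lt;- atom.select(covr_d53e, &quot;calpha&quot;)

# Superpose D53E onto wild type using the C-alpha atoms; returns the fitted D53E coordinates
fit &lt;- fit.xyz(fixed = covr_wt$xyz,
               mobile = covr_d53e$xyz,
               fixed.inds = covr_wt_idx$xyz,
               mobile.inds = covr_d53e_idx$xyz)

n_res &lt;- length(covr_wt_idx$atom)
n_res_seq &lt;- seq(1, n_res * 3, 3)  # position of each residue's x coordinate in the flat xyz vector

rmsd_vec &lt;- vector(mode = &quot;numeric&quot;, length = n_res)

# With one C-alpha per residue, the per-residue RMSD is simply the distance between matched atoms
for (i in 1:n_res) {
  idx &lt;- c(n_res_seq[i]:(n_res_seq[i]+2))
  rmsd_vec[i] &lt;- rmsd(covr_wt$xyz[covr_wt_idx$xyz][idx], fit[covr_d53e_idx$xyz][idx])
}

df &lt;- tibble(
  residue = covr_wt$atom$resno[covr_wt_idx$atom],
  aa = covr_wt$atom$resid[covr_wt_idx$atom],  # three-letter residue name of each C-alpha
  rms = rmsd_vec
)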

df |&gt;
  ggplot(aes(x=residue,y=rms)) +
  geom_line() +
  geom_label(aes(label=aa),size=2) +
  theme_bw()
</pre><img src="https://i2.wp.com/www.kenkoonwong.com/blog/haddock/index_files/figure-html/unnamed-chunk-11-1.png?w=450&#038;ssl=1" alt="" data-recalc-dims="1" />
<p>Alright, what this tells us is that certain residues barely moved, but some stretches, such as the residues in the 180s, show really high RMSD. Later on, after HADDOCK, we’ll see whether these high-RMSD residues correspond to the residues that interact in the protein-protein complex.</p>
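<p>For my own understanding, here is what a least-squares superposition followed by a per-residue distance looks like in base R. This is a sketch of the Kabsch algorithm on random toy coordinates, not bio3d’s implementation; the function name <code>kabsch_fit()</code> and all data are made up for illustration.</p>

```r
# Least-squares superposition (Kabsch algorithm) of `mobile` onto `fixed`.
# Both are N x 3 matrices of matched coordinates, e.g. C-alpha atoms.
kabsch_fit <- function(fixed, mobile) {
  cf <- colMeans(fixed)
  cm <- colMeans(mobile)
  P <- sweep(mobile, 2, cm)                 # centred mobile coordinates
  Q <- sweep(fixed, 2, cf)                  # centred fixed coordinates
  s <- svd(crossprod(P, Q))                 # SVD of the 3 x 3 covariance matrix t(P) %*% Q
  d <- sign(det(s$v %*% t(s$u)))            # guard against an improper rotation (reflection)
  R <- s$v %*% diag(c(1, 1, d)) %*% t(s$u)  # optimal rotation
  sweep(P %*% t(R), 2, cf, "+")             # rotate, then move onto the fixed centroid
}

# Toy data: 30 "residues", then a rotated and shifted copy with one residue moved on purpose
set.seed(1)
fixed <- matrix(rnorm(90), ncol = 3)
theta <- pi / 5
Rz <- matrix(c(cos(theta), -sin(theta), 0,
               sin(theta),  cos(theta), 0,
               0,           0,          1), nrow = 3, byrow = TRUE)
rigid_copy <- sweep(fixed %*% t(Rz), 2, c(5, -3, 2), "+")
mobile <- rigid_copy
mobile[7, ] <- mobile[7, ] + c(0, 0, 4)

fitted <- kabsch_fit(fixed, mobile)
per_residue <- sqrt(rowSums((fixed - fitted)^2))   # displacement per residue after fitting
round(per_residue, 2)
```

<p>The deliberately moved residue should stand out in <code>per_residue</code>, which is the same kind of signal the plot above shows for the 180s.</p>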
<p>Next we’ll take a look at using 
<a href="https://github.com/haddocking/haddock3" rel="nofollow" target="_blank">Haddock3</a> for the first time and see what we get. Our plan is to compare the HADDOCK score and buried surface area (BSA) to see whether a homodimer of CovR is more favorable for D53E than for wild-type D53. Since we’ve never done this before, we had Claude Code set it up for us. This is mainly for my own notes, so that in the future I can come back, reference it and modify accordingly. The steps are:</p>
<ol>
<li>Prepare the input files: Duplicate the CovR pdb but change the chain to A and B.</li>
<li>Set up the haddock input files: Create a text file of active and passive residues, then use <code>haddock3-restraints</code> to generate the <code>tbl</code> file.</li>
</ol>
<p>We can create the <code>actpass_A.txt</code> file with the following content:</p>




<h4 id="create-actpass_atxt">create actpass_A.txt
  <a href="https://www.kenkoonwong.com/blog/haddock/#create-actpass_atxt" rel="nofollow" target="_blank"></a>
</h4>
<pre>87 88 89 90 91 98 99 100 101 106 107 108 109 110 111 112 113 114 115 116 117 118 120 121
81 82 83 84 85 86 93 94 102 103 104 105 119 122 123 124 125 126
</pre><p>These residues cover the alpha4-beta5-alpha5 region. The first row lists the active residues (solvent-exposed residues); the second row lists the passive residues (surface-exposed neighbors surrounding the active patch). Then we use <code>haddock3-restraints</code>:</p>
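<p>The two-line file can also be written from R, which makes it easier to regenerate when the residue selection changes. A minimal sketch: the residue numbers are the ones listed above, the temporary output path is just for illustration, and the check that no residue appears in both lists is my own sanity check.</p>

```r
# Residue numbers from the two rows of actpass_A.txt
active  <- c(87:91, 98:101, 106:118, 120, 121)
passive <- c(81:86, 93, 94, 102:105, 119, 122:126)

# Sanity check (my assumption): a residue should not be listed as both active and passive
stopifnot(length(intersect(active, passive)) == 0)

# First line: active residues, second line: passive residues, space-separated
actpass <- c(paste(active, collapse = " "), paste(passive, collapse = " "))
out <- file.path(tempdir(), "actpass_A.txt")   # illustrative location
writeLines(actpass, out)
readLines(out)
```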




<h4 id="generate-tbl-file">Generate tbl file
  <a href="https://www.kenkoonwong.com/blog/haddock/#generate-tbl-file" rel="nofollow" target="_blank"></a>
</h4>
<pre>conda run --name haddock3 haddock3-restraints active_passive_to_ambig \
      actpass_A.txt actpass_A.txt \
      --segid-one A --segid-two B \
      &gt; ambig_covr_dimer.tbl
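</pre><p>This basically tells haddock3 that residue 87 in chain A should be close to at least one of residues 87, 88, etc. in chain B. The three numbers at the end are the target distance and its lower and upper margins, so <code>2.0 2.0 0.0</code> allows distances of up to 2.0 Å. The file looks something like this:</p>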
<pre>assign (resi 87 and segid A)
  (
         (resi 87 and segid B)
          or
         (resi 88 and segid B)
          or
         ...
  ) 2.0 2.0 0.0
</pre><ol start="3">
<li>Create a <code>covr_dimer_workflow.toml</code> file to specify the parameters for the docking run.</li>
</ol>




<h4 id="create-covr_dimer_workflowtoml">Create covr_dimer_workflow.toml
  <a href="https://www.kenkoonwong.com/blog/haddock/#create-covr_dimer_workflowtoml" rel="nofollow" target="_blank"></a>
</h4>
<pre>  run_dir = &quot;run_covr_dimer&quot;
  mode = &quot;local&quot;
  ncores = 4
  postprocess = true

  molecules = [
      &quot;covr_d53e_chainA.pdb&quot;,
      &quot;covr_d53e_chainB.pdb&quot;,
  ]

  [topoaa]
  autotoppar = false
  delenph = true

  [rigidbody]
  ambig_fname = &quot;ambig_covr_dimer.tbl&quot;
  sampling = 200
  sym_on = true
  nc2sym = 1
  c2sym_sta1_1 = 1
  c2sym_end1_1 = 228
  c2sym_seg1_1 = &quot;A&quot;
  c2sym_sta2_1 = 1
  c2sym_end2_1 = 228
  c2sym_seg2_1 = &quot;B&quot;

  [seletop]
  select = 100
  
  [flexref]
  ambig_fname = &quot;ambig_covr_dimer.tbl&quot;
  sym_on = true
  nc2sym = 1
  c2sym_sta1_1 = 1
  c2sym_end1_1 = 228
  c2sym_seg1_1 = &quot;A&quot;
  c2sym_sta2_1 = 1
  c2sym_end2_1 = 228
  c2sym_seg2_1 = &quot;B&quot;

  [emref]
  ambig_fname = &quot;ambig_covr_dimer.tbl&quot;
  sym_on = true
  nc2sym = 1
  c2sym_sta1_1 = 1
  c2sym_end1_1 = 228
  c2sym_seg1_1 = &quot;A&quot;
  c2sym_sta2_1 = 1
  c2sym_end2_1 = 228
  c2sym_seg2_1 = &quot;B&quot;

  [clustfcc]
  plot_matrix = true

  [seletopclusts]
  top_models = 4

  [emscoring]
</pre><p>You will have to adapt some of the settings and parameters above, including the paths to your <code>tbl</code> and <code>pdb</code> files, and set <code>ncores</code> to match your machine.</p>
<ol start="4">
<li>Run Haddock: Use the command line to run Haddock with your workflow file. For example:</li>
</ol>
<pre>conda install -c bioconda -n haddock3 haddock3
conda activate haddock3
haddock3 covr_dimer_workflow.toml
</pre><p>What does 
<a href="https://github.com/haddocking/haddock3" rel="nofollow" target="_blank">Haddock3</a> do? Haddock3 is a flexible docking software that allows you to model the interaction between two or more biomolecules (like proteins) based on experimental data or predicted interactions. It uses a combination of rigid-body docking, semi-flexible refinement, and scoring to predict the most likely binding modes between the molecules. The workflow above goes like this: rigid-body docking → selection of the top models → semi-flexible refinement → energy minimization → clustering → scoring.</p>
<ol start="5">
<li>Look at the analysis results</li>
</ol>
<pre>result_d53 &lt;- read_tsv(&quot;capri_ss_d53.tsv&quot;) |&gt; arrange(caprieval_rank) |&gt; mutate(prot=&quot;d53&quot;) |&gt; slice_head(n=5)
result_d53e &lt;- read_tsv(&quot;capri_ss_d53e.tsv&quot;) |&gt; arrange(caprieval_rank) |&gt; mutate(prot=&quot;d53e&quot;) |&gt; slice_head(n=5)
compare_result &lt;- rbind(result_d53,result_d53e) |&gt;
  select(score,bsa,total,prot) |&gt;
  pivot_longer(cols = c(score:total), names_to = &quot;param&quot;, values_to = &quot;values&quot;) 

compare_result |&gt;
  ggplot(aes(x=prot,y=values,fill=prot)) +
  geom_boxplot(alpha=0.2,width=0.3) +
  geom_violin(alpha=0.5) +
  facet_wrap(.~param, scales=&quot;free_y&quot;) +
  theme_bw()
</pre><img src="https://i2.wp.com/www.kenkoonwong.com/blog/haddock/index_files/figure-html/unnamed-chunk-17-1.png?w=450&#038;ssl=1" alt="" data-recalc-dims="1" />
<p>Among the top 5 ranked docking models, the phosphomimetic D53E mutant showed consistently better (more negative) HADDOCK scores (median −138.1 vs −117.4, in arbitrary units) and a substantially larger buried surface area (median 2,567 vs 2,161 Ų, +18.8%), supporting enhanced dimerization propensity. Energy decomposition revealed that D53E gains its advantage primarily through van der Waals interactions (median −86.3 vs −32.7 kcal/mol, ~2.6×), despite weaker electrostatic contributions (median −335.7 vs −514.1 kcal/mol). Not really sure what this means. <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f914.png" alt="🤔" class="wp-smiley" style="height: 1em; max-height: 1em;" /> But let’s visualize our rank 1 pdb!</p>
<p align="center">
  <img loading="lazy" src="https://i1.wp.com/www.kenkoonwong.com/blog/haddock/dimer.png?w=450&#038;ssl=1" alt="image" height="auto" data-recalc-dims="1">
</p>
<blockquote>
<p>Note to self: BSA is calculated by taking the sum of the solvent-accessible surface areas (SASA) of the two individual proteins and subtracting the SASA of the complex. A larger BSA indicates a more extensive interface between the two proteins, which often correlates with stronger binding affinity. In this case, the D53E mutant’s larger BSA suggests it forms a more stable dimer compared to the wild-type D53, consistent with its role as a phosphomimetic that promotes dimerization and activation of CovR.</p>
</blockquote>
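<p>One possible reading of the energy decomposition above: the HADDOCK score is a weighted sum of energy terms, and electrostatics usually counts far less than van der Waals. The sketch below assumes the default refinement weights of 1.0 for van der Waals and 0.2 for electrostatics (check the parameters of your own run), leaves out the desolvation and restraint terms, and uses the medians quoted above, which are not strictly additive. So treat it as a rough back-of-the-envelope check only.</p>

```r
# Medians quoted above (top 5 models per construct)
vdw  <- c(d53 = -32.7,  d53e = -86.3)
elec <- c(d53 = -514.1, d53e = -335.7)
bsa  <- c(d53 = 2161,   d53e = 2567)

# Assumed scoring weights (HADDOCK defaults for the refinement stages)
w_vdw  <- 1.0
w_elec <- 0.2

# Weighted contribution of each term to the score
weighted <- rbind(vdw = w_vdw * vdw, elec = w_elec * elec)
weighted

# Difference D53E minus D53 per term; negative values favour D53E
delta <- weighted[, "d53e"] - weighted[, "d53"]
delta
sum(delta)

# Relative change in buried surface area, in percent
bsa_change <- round(100 * (bsa[["d53e"]] - bsa[["d53"]]) / bsa[["d53"]], 1)
bsa_change
```

<p>Under these assumptions the van der Waals gain (about −53.6) outweighs the weighted electrostatic loss (about +35.7), which is in the same direction as the roughly 21-point difference in the median HADDOCK score.</p>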
<p>Let’s see which residues of these 2 chains interact!</p>
<pre>haddock_d53e_d53e &lt;- read.pdb(&quot;emscoring_1.pdb&quot;)

# Return the residues of chains A and B that have at least one atom within `cutoff` Å of the other chain
get_interface &lt;- function(pdb, cutoff = 5.0) {
  
  chainA &lt;- atom.select(pdb, chain = &quot;A&quot;)
  chainB &lt;- atom.select(pdb, chain = &quot;B&quot;)
  
  # Get coordinates as matrices
  coordA &lt;- matrix(pdb$xyz[chainA$xyz], ncol = 3, byrow = TRUE)
  coordB &lt;- matrix(pdb$xyz[chainB$xyz], ncol = 3, byrow = TRUE)
  
  # Find contacting atom indices
  contact_i &lt;- c()
  contact_j &lt;- c()
  
  for (i in 1:nrow(coordA)) {
    dists &lt;- sqrt(rowSums(sweep(coordB, 2, coordA[i,])^2))
    hits &lt;- which(dists &lt; cutoff)
    if (length(hits) &gt; 0) {
      contact_i &lt;- c(contact_i, rep(i, length(hits)))
      contact_j &lt;- c(contact_j, hits)
    }
  }
  
  # Map back to residues
  resA &lt;- pdb$atom[chainA$atom[unique(contact_i)], ] |&gt;
    as_tibble() |&gt;
    distinct(resno, resid) |&gt;
    mutate(chain = &quot;A&quot;)
  
  resB &lt;- pdb$atom[chainB$atom[unique(contact_j)], ] |&gt;
    as_tibble() |&gt;
    distinct(resno, resid) |&gt;
    mutate(chain = &quot;B&quot;)
  
  bind_rows(resA, resB)
}

cont &lt;- get_interface(haddock_d53e_d53e) |&gt;
  filter(chain == &quot;A&quot;) |&gt; 
  select(residue=resno, aa=resid) |&gt;
  mutate(contact = 1)

df |&gt;
  left_join(cont, by = c(&quot;residue&quot;,&quot;aa&quot;)) |&gt;
  mutate(contact = case_when(
    is.na(contact) ~ 0,
    TRUE ~ contact
  )) |&gt;
  ggplot(aes(x=residue,y=rms)) +
  geom_line() +
  ggrepel::geom_label_repel(aes(label=aa,fill=as.factor(contact)),alpha=0.5,size=2) +
  theme_bw() +
  theme(legend.position = &quot;none&quot;)
</pre><img src="https://i2.wp.com/www.kenkoonwong.com/blog/haddock/index_files/figure-html/unnamed-chunk-18-1.png?w=450&#038;ssl=1" alt="" data-recalc-dims="1" />
<p>I couldn’t easily find a function in bio3d to list close contacts between the 2 chains, so I got Claude to write code that keeps atom pairs closer than 5 Å and maps them back to our RMSD df. Wow, interesting! Not all high-RMSD residues are close-contact residues, and not all close-contact residues have high RMSD. <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f914.png" alt="🤔" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
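<p>To put a number on that impression instead of eyeballing the plot, we could cross-tabulate “high RMSD” against “in contact”. The sketch below uses a made-up per-residue table with the same <code>rms</code> and <code>contact</code> columns as above; the 2 Å threshold is an arbitrary choice for illustration.</p>

```r
# Toy per-residue table (values are made up): displacement in Å and a 0/1 contact flag
toy <- data.frame(
  residue = 1:8,
  rms     = c(0.3, 0.4, 2.8, 0.5, 3.1, 0.2, 0.6, 2.5),
  contact = c(0,   1,   1,   0,   0,   0,   1,   1)
)

high <- toy$rms > 2                                  # arbitrary threshold for "high"
tab <- table(high_rms = high, contact = toy$contact)
tab

# Share of high-displacement residues that also sit at the interface
share_high_in_contact <- mean(toy$contact[high] == 1)
share_high_in_contact
```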




<h2 id="final-thoughts">Final Thoughts
  <a href="https://www.kenkoonwong.com/blog/haddock/#final-thoughts" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<p>Wow, all of the above took a long time! But it was quite interesting, and we learnt quite a few things. It’s our first time running a protein-protein docking! <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f64c.png" alt="🙌" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Even though I don’t deeply understand the method and the results, it’s a good start! Let’s keep at it and learn some more next time! If you notice something wrong here, please feel free to let me know!</p>




<h2 id="opportunities">Opportunities For Improvement
  <a href="https://www.kenkoonwong.com/blog/haddock/#opportunities" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<ul>
<li>Need to explore ProstT5 and Foldseek in the future; the concept is quite interesting. Turning 3D coordinates into a one-dimensional string of tokens, very latent spacey.</li>
<li>Need to understand haddock3 a bit more, the method and its output</li>
<li>Need to understand how alpha helices and beta sheets occur, the math behind it and see if we can reproduce that from scratch</li>
<li>Need to figure out how to reproduce a phosphorylated CovR structure instead of using a phosphomimetic</li>
<li>Need to learn pymol</li>
</ul>




<h2 id="lessons">Lessons learnt
  <a href="https://www.kenkoonwong.com/blog/haddock/#lessons" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<ul>
<li>learnt CovR/S two-component system in streptococcus pyogenes</li>
<li>learnt what hypotheticals are, and what other methods we can use to further identify these hypotheticals</li>
<li>learnt strep pyogenes is the only strep species (reference genomes only) that has CovR/S; some other strep species have CovR but no CovS.</li>
<li>learnt Baktfold</li>
<li>learnt the bare basics of haddock3</li>
<li>learnt RMSD, BSA, haddock score, angstrom unit (just Euclidean distance of xyz)</li>
</ul>
<p>If you like this article:</p>
<ul>
<li>please feel free to send me a 
<a href="https://www.kenkoonwong.com/blog/" rel="nofollow" target="_blank">comment or visit my other blogs</a></li>
<li>please feel free to follow me on 
<a href="https://bsky.app/profile/kenkoonwong.bsky.social" rel="nofollow" target="_blank">BlueSky</a>, 
<a href="https://twitter.com/kenkoonwong/" rel="nofollow" target="_blank">twitter</a>, 
<a href="https://github.com/kenkoonwong/" rel="nofollow" target="_blank">GitHub</a> or 
<a href="https://rstats.me/@kenkoonwong" rel="nofollow" target="_blank">Mastodon</a></li>
<li>if you would like to collaborate please feel free to 
<a href="https://www.kenkoonwong.com/contact/" rel="nofollow" target="_blank">contact me</a></li>
</ul>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://www.kenkoonwong.com/blog/haddock/"> r on Everyday Is A School Day</a></strong>.</div>
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/exploring-the-covr-s-two-component-system-in-streptococcus-pyogenes/">Exploring the CovR/S Two-Component System in Streptococcus pyogenes</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">401268</post-id>	</item>
		<item>
		<title>Probabilistic Time Series Cross-Validation with R package crossvalidation</title>
		<link>https://www.r-bloggers.com/2026/05/probabilistic-time-series-cross-validation-with-r-package-crossvalidation/</link>
		
		<dc:creator><![CDATA[T. Moudiki]]></dc:creator>
		<pubDate>Sat, 16 May 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://thierrymoudiki.github.io//blog/2026/05/16/r/crossvalidation</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> Examples of use of R package crossvalidation for Probabilistic Time Series Cross-Validation (measuring coverage and Winkler score)</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/probabilistic-time-series-cross-validation-with-r-package-crossvalidation/">Probabilistic Time Series Cross-Validation with R package crossvalidation</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://thierrymoudiki.github.io//blog/2026/05/16/r/crossvalidation"> T. Moudiki's Webpage - R</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
</div>
<p>A previous post introduced the <code>crossvalidation</code> package for R. This time, the focus is on probabilistic forecasting with <code>crossvalidation</code>: evaluating not just how accurate the point forecasts are, but also how well calibrated the prediction intervals are, using empirical coverage rates and Winkler scores.</p>

<pre>install.packages(&quot;remotes&quot;)

install.packages(&quot;forecast&quot;)

remotes::install_github(&quot;Techtonique/crossvalidation&quot;)

library(crossvalidation)
</pre>

<h1 id="example-1">Example 1</h1>

<pre>require(forecast)
data(&quot;AirPassengers&quot;)



eval_metric &lt;- function(predicted, observed)
{
  error &lt;- observed - predicted$mean

  me &lt;- mean(error)
  rmse &lt;- sqrt(mean(error^2))
  mae &lt;- mean(abs(error))

  # ----- 80% interval -----

  lower80 &lt;- predicted$lower[, 1]
  upper80 &lt;- predicted$upper[, 1]

  coverage80 &lt;- mean(
    observed &gt;= lower80 & observed &lt;= upper80
  )

  alpha80 &lt;- 0.20

  winkler80 &lt;- ifelse(
    observed &lt; lower80,
    (upper80 - lower80) + (2 / alpha80) * (lower80 - observed),
    ifelse(
      observed &gt; upper80,
      (upper80 - lower80) + (2 / alpha80) * (observed - upper80),
      (upper80 - lower80)
    )
  )

  # ----- 95% interval -----

  lower95 &lt;- predicted$lower[, 2]
  upper95 &lt;- predicted$upper[, 2]

  coverage95 &lt;- mean(
    observed &gt;= lower95 & observed &lt;= upper95
  )

  alpha95 &lt;- 0.05

  winkler95 &lt;- ifelse(
    observed &lt; lower95,
    (upper95 - lower95) + (2 / alpha95) * (lower95 - observed),
    ifelse(
      observed &gt; upper95,
      (upper95 - lower95) + (2 / alpha95) * (observed - upper95),
      (upper95 - lower95)
    )
  )

  c(
    ME = me,
    RMSE = rmse,
    MAE = mae,
    Coverage80 = coverage80,
    Winkler80 = mean(winkler80),
    Coverage95 = coverage95,
    Winkler95 = mean(winkler95)
  )
}

(res &lt;- crossval_ts(
  y = AirPassengers,
  initial_window = 10,
  horizon = 3,
  fcast_func = forecast::thetaf,
  eval_metric = eval_metric
))
print(colMeans(res))


Loading required package: forecast



  |======================================================================| 100%
</pre>

<table class="dataframe">
<caption>A matrix: 132 × 7 of type dbl</caption>
<thead>
	<tr><th></th><th scope="col">ME</th><th scope="col">RMSE</th><th scope="col">MAE</th><th scope="col">Coverage80</th><th scope="col">Winkler80</th><th scope="col">Coverage95</th><th scope="col">Winkler95</th></tr>
</thead>
<tbody>
	<tr><th scope="row">result.1</th><td>-28.794660</td><td>29.300287</td><td>28.794660</td><td>0.0000000</td><td>153.10992</td><td>0.3333333</td><td>207.58384</td></tr>
	<tr><th scope="row">result.2</th><td> 16.198526</td><td>16.894302</td><td>16.198526</td><td>1.0000000</td><td> 45.01795</td><td>1.0000000</td><td> 68.84902</td></tr>
	<tr><th scope="row">result.3</th><td> 11.201494</td><td>15.993359</td><td>12.578276</td><td>1.0000000</td><td> 45.05996</td><td>1.0000000</td><td> 68.91326</td></tr>
	<tr><th scope="row">result.4</th><td> 21.430125</td><td>22.483895</td><td>21.430125</td><td>0.6666667</td><td> 63.01207</td><td>1.0000000</td><td> 68.84778</td></tr>
	<tr><th scope="row">result.5</th><td> 10.055765</td><td>11.527746</td><td>10.055765</td><td>1.0000000</td><td> 45.99967</td><td>1.0000000</td><td> 70.35043</td></tr>
	<tr><th scope="row">result.6</th><td> -2.640822</td><td>10.676714</td><td> 9.999466</td><td>1.0000000</td><td> 46.56907</td><td>1.0000000</td><td> 71.22125</td></tr>
	<tr><th scope="row">result.7</th><td> 14.296434</td><td>23.709132</td><td>20.531135</td><td>0.6666667</td><td> 75.04186</td><td>1.0000000</td><td> 67.58381</td></tr>
	<tr><th scope="row">result.8</th><td> 38.247497</td><td>39.529998</td><td>38.247497</td><td>0.0000000</td><td>198.74990</td><td>0.3333333</td><td>212.44029</td></tr>
	<tr><th scope="row">result.9</th><td> 23.043159</td><td>23.947630</td><td>23.043159</td><td>0.3333333</td><td> 93.83463</td><td>1.0000000</td><td> 64.19366</td></tr>
	<tr><th scope="row">result.10</th><td>-21.689067</td><td>27.907560</td><td>21.689067</td><td>0.6666667</td><td> 90.23377</td><td>1.0000000</td><td> 84.12361</td></tr>
	<tr><th scope="row">result.11</th><td>-41.782157</td><td>46.664199</td><td>41.782157</td><td>0.3333333</td><td>222.06310</td><td>0.3333333</td><td>345.16553</td></tr>
	<tr><th scope="row">result.12</th><td>-34.934831</td><td>36.512081</td><td>34.934831</td><td>0.3333333</td><td>162.38092</td><td>0.6666667</td><td>212.58117</td></tr>
	<tr><th scope="row">result.13</th><td> -4.002700</td><td>12.728771</td><td> 9.999100</td><td>1.0000000</td><td> 59.64475</td><td>1.0000000</td><td> 91.21878</td></tr>
	<tr><th scope="row">result.14</th><td> 30.349582</td><td>30.588761</td><td>30.349582</td><td>0.6666667</td><td> 72.14355</td><td>1.0000000</td><td> 99.76932</td></tr>
	<tr><th scope="row">result.15</th><td> 21.192349</td><td>25.806712</td><td>21.192349</td><td>0.6666667</td><td> 71.39094</td><td>1.0000000</td><td>101.02401</td></tr>
	<tr><th scope="row">result.16</th><td> 23.193143</td><td>25.914875</td><td>23.193143</td><td>0.6666667</td><td> 91.70660</td><td>1.0000000</td><td> 76.57925</td></tr>
	<tr><th scope="row">result.17</th><td> 30.081542</td><td>30.679960</td><td>30.081542</td><td>0.3333333</td><td>111.58689</td><td>1.0000000</td><td> 75.78459</td></tr>
	<tr><th scope="row">result.18</th><td> -6.530509</td><td> 9.111376</td><td> 6.999059</td><td>1.0000000</td><td> 69.51704</td><td>1.0000000</td><td>106.31714</td></tr>
	<tr><th scope="row">result.19</th><td> 19.907586</td><td>23.010762</td><td>19.907586</td><td>1.0000000</td><td> 67.03506</td><td>1.0000000</td><td>102.52128</td></tr>
	<tr><th scope="row">result.20</th><td> 17.631089</td><td>19.829355</td><td>17.631089</td><td>1.0000000</td><td> 67.97573</td><td>1.0000000</td><td>103.95991</td></tr>
	<tr><th scope="row">result.21</th><td> 11.738022</td><td>14.718185</td><td>12.229846</td><td>1.0000000</td><td> 61.61617</td><td>1.0000000</td><td> 94.23380</td></tr>
	<tr><th scope="row">result.22</th><td>-21.787490</td><td>28.489509</td><td>21.787490</td><td>0.6666667</td><td> 93.30920</td><td>1.0000000</td><td> 88.70090</td></tr>
	<tr><th scope="row">result.23</th><td>-43.557571</td><td>47.527244</td><td>43.557571</td><td>0.3333333</td><td>206.77368</td><td>0.6666667</td><td>216.50078</td></tr>
	<tr><th scope="row">result.24</th><td>-34.473558</td><td>35.514155</td><td>34.473558</td><td>0.3333333</td><td>146.63288</td><td>0.6666667</td><td>173.17046</td></tr>
	<tr><th scope="row">result.25</th><td> -4.699360</td><td>10.498595</td><td> 7.201224</td><td>1.0000000</td><td> 60.07550</td><td>1.0000000</td><td> 91.87755</td></tr>
	<tr><th scope="row">result.26</th><td> 25.974138</td><td>26.581272</td><td>25.974138</td><td>1.0000000</td><td> 63.01942</td><td>1.0000000</td><td> 96.37989</td></tr>
	<tr><th scope="row">result.27</th><td> 16.905109</td><td>19.474600</td><td>16.905109</td><td>1.0000000</td><td> 58.04472</td><td>1.0000000</td><td> 88.77173</td></tr>
	<tr><th scope="row">result.28</th><td> 15.218760</td><td>16.352917</td><td>15.218760</td><td>1.0000000</td><td> 55.27721</td><td>1.0000000</td><td> 84.53920</td></tr>
	<tr><th scope="row">result.29</th><td>  7.625241</td><td> 8.933828</td><td> 7.625241</td><td>1.0000000</td><td> 55.27718</td><td>1.0000000</td><td> 84.53916</td></tr>
	<tr><th scope="row">result.30</th><td>  2.261970</td><td>17.595326</td><td>15.666212</td><td>1.0000000</td><td> 57.13292</td><td>1.0000000</td><td> 87.37725</td></tr>
	<tr><th scope="row">⋮</th><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td></tr>
	<tr><th scope="row">result.103</th><td>  95.047754</td><td>111.26440</td><td> 95.04775</td><td>0.3333333</td><td> 485.7096</td><td>0.6666667</td><td> 594.3549</td></tr>
	<tr><th scope="row">result.104</th><td> 121.335201</td><td>125.76554</td><td>121.33520</td><td>0.0000000</td><td> 646.5750</td><td>0.3333333</td><td> 772.4818</td></tr>
	<tr><th scope="row">result.105</th><td>  27.661546</td><td> 53.66952</td><td> 52.33567</td><td>0.6666667</td><td> 149.4669</td><td>1.0000000</td><td> 226.7499</td></tr>
	<tr><th scope="row">result.106</th><td> -82.928463</td><td>106.53675</td><td> 87.39838</td><td>0.3333333</td><td> 439.0476</td><td>0.6666667</td><td> 391.0034</td></tr>
	<tr><th scope="row">result.107</th><td>-168.429957</td><td>174.86402</td><td>168.42996</td><td>0.0000000</td><td>1125.8534</td><td>0.0000000</td><td>2680.3671</td></tr>
	<tr><th scope="row">result.108</th><td> -86.047368</td><td> 89.34969</td><td> 86.04737</td><td>0.6666667</td><td> 241.5086</td><td>1.0000000</td><td> 281.3325</td></tr>
	<tr><th scope="row">result.109</th><td> -35.392983</td><td> 38.64620</td><td> 35.39298</td><td>1.0000000</td><td> 192.3314</td><td>1.0000000</td><td> 294.1455</td></tr>
	<tr><th scope="row">result.110</th><td>  32.273683</td><td> 33.69167</td><td> 32.27368</td><td>1.0000000</td><td> 199.9978</td><td>1.0000000</td><td> 305.8702</td></tr>
	<tr><th scope="row">result.111</th><td>  35.911969</td><td> 45.52857</td><td> 35.91197</td><td>1.0000000</td><td> 195.2069</td><td>1.0000000</td><td> 298.5432</td></tr>
	<tr><th scope="row">result.112</th><td>  28.584481</td><td> 41.79144</td><td> 38.16654</td><td>1.0000000</td><td> 196.5409</td><td>1.0000000</td><td> 300.5833</td></tr>
	<tr><th scope="row">result.113</th><td>  78.144295</td><td> 79.31310</td><td> 78.14430</td><td>1.0000000</td><td> 196.9343</td><td>1.0000000</td><td> 301.1850</td></tr>
	<tr><th scope="row">result.114</th><td>  37.152546</td><td> 52.61404</td><td> 39.21044</td><td>1.0000000</td><td> 192.5487</td><td>1.0000000</td><td> 294.4778</td></tr>
	<tr><th scope="row">result.115</th><td>  95.078342</td><td>110.88602</td><td> 95.07834</td><td>0.6666667</td><td> 366.3676</td><td>1.0000000</td><td> 274.9151</td></tr>
	<tr><th scope="row">result.116</th><td> 109.166178</td><td>116.17612</td><td>109.16618</td><td>0.3333333</td><td> 406.7397</td><td>1.0000000</td><td> 277.4405</td></tr>
	<tr><th scope="row">result.117</th><td>  41.289554</td><td> 62.02085</td><td> 57.33490</td><td>0.3333333</td><td> 215.1577</td><td>1.0000000</td><td> 222.4127</td></tr>
	<tr><th scope="row">result.118</th><td> -92.399494</td><td>116.61777</td><td> 92.82407</td><td>0.3333333</td><td> 466.7285</td><td>0.6666667</td><td> 445.3571</td></tr>
	<tr><th scope="row">result.119</th><td>-175.618445</td><td>183.27955</td><td>175.61845</td><td>0.0000000</td><td>1143.5479</td><td>0.0000000</td><td>2574.2409</td></tr>
	<tr><th scope="row">result.120</th><td> -94.580461</td><td> 97.36039</td><td> 94.58046</td><td>0.6666667</td><td> 277.7847</td><td>1.0000000</td><td> 293.2590</td></tr>
	<tr><th scope="row">result.121</th><td> -27.751828</td><td> 32.93559</td><td> 27.75183</td><td>1.0000000</td><td> 202.1374</td><td>1.0000000</td><td> 309.1425</td></tr>
	<tr><th scope="row">result.122</th><td>  36.177008</td><td> 38.16646</td><td> 36.17701</td><td>1.0000000</td><td> 208.6352</td><td>1.0000000</td><td> 319.0800</td></tr>
	<tr><th scope="row">result.123</th><td>   5.992278</td><td> 14.16185</td><td> 13.99743</td><td>1.0000000</td><td> 200.0098</td><td>1.0000000</td><td> 305.8885</td></tr>
	<tr><th scope="row">result.124</th><td>  12.637863</td><td> 33.65269</td><td> 27.98030</td><td>1.0000000</td><td> 200.1828</td><td>1.0000000</td><td> 306.1532</td></tr>
	<tr><th scope="row">result.125</th><td>  71.834372</td><td> 76.95073</td><td> 71.83437</td><td>1.0000000</td><td> 200.5753</td><td>1.0000000</td><td> 306.7534</td></tr>
	<tr><th scope="row">result.126</th><td>  85.518711</td><td> 93.75094</td><td> 85.51871</td><td>0.6666667</td><td> 252.5638</td><td>1.0000000</td><td> 295.0496</td></tr>
	<tr><th scope="row">result.127</th><td>  94.429064</td><td>115.52397</td><td> 94.42906</td><td>0.6666667</td><td> 407.3636</td><td>0.6666667</td><td> 417.2566</td></tr>
	<tr><th scope="row">result.128</th><td> 173.325805</td><td>177.66652</td><td>173.32580</td><td>0.0000000</td><td>1129.6141</td><td>0.0000000</td><td>2547.8618</td></tr>
	<tr><th scope="row">result.129</th><td>  33.890665</td><td> 63.84191</td><td> 61.66861</td><td>0.6666667</td><td> 242.6901</td><td>1.0000000</td><td> 230.3885</td></tr>
	<tr><th scope="row">result.130</th><td>-119.059067</td><td>137.73685</td><td>119.05907</td><td>0.3333333</td><td> 619.4166</td><td>0.3333333</td><td> 668.9786</td></tr>
	<tr><th scope="row">result.131</th><td>-180.821172</td><td>190.45241</td><td>180.82117</td><td>0.0000000</td><td>1152.4949</td><td>0.0000000</td><td>2469.3936</td></tr>
	<tr><th scope="row">result.132</th><td>-103.156396</td><td>108.61881</td><td>103.15640</td><td>0.6666667</td><td> 330.0400</td><td>1.0000000</td><td> 302.1675</td></tr>
</tbody>
</table>

<pre>         ME        RMSE         MAE  Coverage80   Winkler80  Coverage95 
  2.6570822  51.4271704  46.5118747   0.6590909 218.4527816   0.8459596 
  Winkler95 
312.1383104 
</pre>
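<p>The Winkler score computed inside <code>eval_metric</code> can be isolated into a small helper to see how it behaves. The sketch below is illustrative only: <code>winkler_score</code> is a hypothetical name, not a function from <code>crossvalidation</code>, and the interval bounds and observations are made-up numbers. It should give the same values as the nested <code>ifelse()</code> calls above, because an observation can fall below the lower bound or above the upper bound, but not both.</p>

<pre># Hypothetical helper for illustration; not part of crossvalidation
winkler_score &lt;- function(lower, upper, observed, alpha) {
  width &lt;- upper - lower
  below &lt;- pmax(lower - observed, 0)  # shortfall under the lower bound
  above &lt;- pmax(observed - upper, 0)  # excess over the upper bound
  # Interval width plus a penalty of 2 / alpha per unit of miss
  width + (2 / alpha) * (below + above)
}

# Made-up 80% interval [100, 120] and three observations:
# one inside, one 5 units below, one 10 units above
lower &lt;- c(100, 100, 100)
upper &lt;- c(120, 120, 120)
observed &lt;- c(110, 95, 130)

scores80 &lt;- winkler_score(lower, upper, observed, alpha = 0.20)
print(scores80)  # 20 70 120

# Empirical coverage: share of observations inside the interval
coverage &lt;- mean(observed &gt;= lower &amp; observed &lt;= upper)
print(coverage)
</pre>

<p>An observation inside the interval is charged only the interval width, so narrow intervals are rewarded. A miss adds <code>2 / alpha</code> times the distance to the nearest bound, which is why the same miss costs more at the 95% level (<code>2 / 0.05 = 40</code>) than at the 80% level (<code>2 / 0.20 = 10</code>).</p>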

<h1 id="example-2">Example 2</h1>

<pre>eval_metric &lt;- function(predicted, observed)
{
  error &lt;- observed - predicted$mean

  me &lt;- mean(error)
  rmse &lt;- sqrt(mean(error^2))
  mae &lt;- mean(abs(error))

  # Only one interval returned
  lower &lt;- predicted$lower
  upper &lt;- predicted$upper

  coverage &lt;- mean(
    observed &gt;= lower & observed &lt;= upper
  )

  alpha &lt;- 0.05

  winkler &lt;- ifelse(
    observed &lt; lower,
    (upper - lower) + (2 / alpha) * (lower - observed),
    ifelse(
      observed &gt; upper,
      (upper - lower) + (2 / alpha) * (observed - upper),
      (upper - lower)
    )
  )

  c(
    ME = me,
    RMSE = rmse,
    MAE = mae,
    Coverage95 = coverage,
    Winkler95 = mean(winkler)
  )
}

fcast_func &lt;- function(y, h, ...)
{
  forecast::thetaf(
    y,
    h = h,
    level = 95
  )
}

res &lt;- crossval_ts(
  y = AirPassengers,
  initial_window = 10,
  horizon = 3,
  fcast_func = fcast_func,
  eval_metric = eval_metric
)

print(colMeans(res))

  |======================================================================| 100%
         ME        RMSE         MAE  Coverage95   Winkler95 
  2.6570822  51.4271704  46.5118747   0.8459596 312.1383104 

boxplot(res[, &quot;Coverage95&quot;])
</pre>

<p><img src="https://i1.wp.com/thierrymoudiki.github.io/images/2026-05-16/2026-05-16-crossvalidation_10_0.png?w=578&#038;ssl=1" alt="Boxplot of Coverage95 across the cross-validation folds" class="img-responsive" data-recalc-dims="1" /></p>
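<p>As a side note, the 132 rows of <code>res</code> follow directly from the series length, <code>initial_window</code> and <code>horizon</code>. The sketch below reconstructs the fold indices by hand. It rests on two assumptions that the examples do not state: the forecast origin moves forward one observation per fold, and the training window has a fixed length. It is not the package's own code, and <code>origins</code> and <code>folds</code> are hypothetical names.</p>

<pre>n &lt;- length(AirPassengers)  # 144 monthly observations
initial_window &lt;- 10
horizon &lt;- 3

# Assumption: the forecast origin advances one observation per fold
origins &lt;- seq(from = initial_window, to = n - horizon)
length(origins)  # 132, the number of rows in res

# Assumption: fixed-length training window; an expanding window
# would use 1:origin for the training indices instead
folds &lt;- lapply(origins, function(origin) {
  list(
    train = (origin - initial_window + 1):origin,
    test = (origin + 1):(origin + horizon)
  )
})

str(folds[[1]])             # first fold: train 1..10, test 11..13
str(folds[[length(folds)]]) # last fold: test 142..144
</pre>

<p>Each fold evaluates only <code>horizon = 3</code> observations, which is why the per-fold coverage values in the table can only be 0, 1/3, 2/3 or 1.</p>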


<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://thierrymoudiki.github.io//blog/2026/05/16/r/crossvalidation"> T. Moudiki's Webpage - R</a></strong>.</div>
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/probabilistic-time-series-cross-validation-with-r-package-crossvalidation/">Probabilistic Time Series Cross-Validation with R package crossvalidation</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">401253</post-id>	</item>
		<item>
		<title>muttest 0.2.0: More Mutators, Better Reporting, and Parallel Execution</title>
		<link>https://www.r-bloggers.com/2026/05/muttest-0-2-0-more-mutators-better-reporting-and-parallel-execution/</link>
		
		<dc:creator><![CDATA[jakub::sobolewski]]></dc:creator>
		<pubDate>Fri, 15 May 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://jakubsobolewski.com/blog/muttest-0_2_0</guid>

					<description><![CDATA[<p>Expanded mutator library, improved reporting, and parallel execution for mutation testing in R.</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/muttest-0-2-0-more-mutators-better-reporting-and-parallel-execution/">muttest 0.2.0: More Mutators, Better Reporting, and Parallel Execution</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://jakubsobolewski.com/blog/muttest-0_2_0"> jakub::sobolewski</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
</div>
<p>Your tests pass. Coverage is high. Everything looks fine, until someone finds a bug in production that your tests didn’t catch, all because of a poor assertion.</p>
<p>Code coverage tells you which lines ran. It says nothing about whether those lines are actually tested. You can delete every assertion in your test suite, run <code>covr</code>, and still see 100%. Coverage is a measure of execution, not correctness. That gap is exactly what <a href="https://github.com/jakubsob/muttest" rel="nofollow" target="_blank"><code>{muttest}</code></a> was built to close — and 0.2.0 makes it much more capable than the previous version.</p>
<blockquote>
<p><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f4dd.png" alt="📝" class="wp-smiley" style="height: 1em; max-height: 1em;" /> See the full changelog <a href="https://github.com/jakubsob/muttest/blob/main/NEWS.md" rel="nofollow" target="_blank">here</a>.</p>
</blockquote>
<h2 id="what-is-mutation-testing">What Is Mutation Testing?</h2>
<p>Mutation testing asks a harder question than coverage: <em>if this code were subtly wrong, would your tests notice?</em></p>
<p>It works by making small, deliberate changes to your source code — swapping <code>&gt;</code> for <code>&gt;=</code>, flipping <code>TRUE</code> to <code>FALSE</code>, replacing <code>&&</code> with <code>||</code> — and then running your test suite against each modified version. Each modified version is called a <strong>mutant</strong>. If your tests fail, the mutant is <strong>killed</strong>: your tests noticed the change. If your tests pass, the mutant <strong>survived</strong>: your tests are blind to that kind of bug.</p>
<p>The result is a <strong>mutation score</strong>:</p>
<p><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mtext>Mutation Score</mtext><mo>=</mo><mfrac><mtext>Killed Mutants</mtext><mtext>Total Mutants</mtext></mfrac><mo>×</mo><mn>100</mn><mi mathvariant="normal">%</mi></mrow><annotation encoding="application/x-tex">\text{Mutation Score} = \frac{\text{Killed Mutants}}{\text{Total Mutants}} \times 100\%</annotation></semantics></math></p>
<ul>
<li><strong>0%</strong> — Your tests pass no matter what the code does. Assertions are missing or trivial.</li>
<li><strong>100%</strong> — Every mutation triggers a test failure. Your tests are tight.</li>
</ul>
<p>Unlike coverage, this score reflects <strong>assertion quality</strong>, not just execution. A test suite full of <code>expect_true(is.numeric(x))</code> checks will hit 100% coverage while missing every meaningful failure. Mutation testing exposes that.</p>
<h2 id="why-you-should-care">Why You Should Care</h2>
<p>Here is the canonical example. The function <code>is_adult</code> has a boundary condition:</p>
<pre># R/is_adult.R
is_adult &lt;- function(age) {
  age &gt;= 18
}</pre>
<p>And these tests give 100% coverage:</p>
<pre># tests/testthat/test-is_adult.R
test_that(&quot;is_adult returns TRUE for adults&quot;, {
  expect_true(is_adult(25))
})

test_that(&quot;is_adult returns FALSE for minors&quot;, {
  expect_false(is_adult(10))
})</pre>
<p>Both tests pass. Both would still pass if <code>&gt;=</code> were accidentally replaced with <code>&gt;</code>. The boundary value <code>18</code> is never tested, so this mutant survives:</p>
<pre>#' R/is_adult.R — mutant 1: &quot;&gt;=&quot; → &quot;&gt;&quot;
is_adult &lt;- function(age) {
  age &gt; 18
}</pre>
<p>Imagine this bug makes it to production. An 18-year-old user tries to sign up, and the system rejects them. The bug is real, but your tests never saw it coming.</p>
<p>Running <code>muttest</code> exposes this immediately:</p>
<pre>library(muttest)

plan &lt;- muttest_plan(
  mutators = comparison_operators()
)
muttest(plan)</pre>
<p>The progress table shows one survivor. The fix is a single test:</p>
<pre>test_that(&quot;is_adult returns TRUE at the boundary age&quot;, {
  expect_true(is_adult(18))  # kills the &gt;= → &gt; mutant
})</pre>
<p><strong>This surviving mutant is not a problem to fix — it’s a specification you forgot to write.</strong></p>
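<p>The kill-or-survive loop can be reproduced by hand in a few lines of base R. This is only a sketch of the idea, not how <code>muttest</code> works internally: the mutant is written out manually, and <code>weak_suite</code>, <code>strong_suite</code>, <code>is_killed</code> and <code>mutation_score</code> are hypothetical helpers standing in for the <code>testthat</code> tests and the package's reporter.</p>

<pre># Original function and the &quot;&gt;=&quot; to &quot;&gt;&quot; mutant from the example
is_adult &lt;- function(age) age &gt;= 18
is_adult_mutant &lt;- function(age) age &gt; 18

# Hypothetical mini test suites: each takes an implementation and
# returns TRUE when all of its checks pass
weak_suite &lt;- function(f) isTRUE(f(25)) &amp;&amp; isFALSE(f(10))
strong_suite &lt;- function(f) weak_suite(f) &amp;&amp; isTRUE(f(18))

# A mutant is killed when a suite that passes on the original
# fails on the mutant
is_killed &lt;- function(suite, original, mutant) {
  stopifnot(suite(original))
  !suite(mutant)
}

mutants &lt;- list(is_adult_mutant)

mutation_score &lt;- function(suite) {
  killed &lt;- vapply(
    mutants,
    function(m) is_killed(suite, is_adult, m),
    logical(1)
  )
  100 * sum(killed) / length(killed)
}

mutation_score(weak_suite)    # 0: the mutant survives
mutation_score(strong_suite)  # 100: the boundary check kills it
</pre>

<p>The only difference between the two suites is the check at <code>18</code>, and that single check moves the score from 0% to 100% for this mutant.</p>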
<h3 id="the-llm-test-problem">The LLM Test Problem</h3>
<p>Many developers now use LLMs to generate tests. Who likes to write tests themselves anyway?</p>
<p>LLMs are fast and produce syntactically correct code, but they may cover only the obvious cases, miss boundaries, or merely test properties of the code. The <code>is_adult</code> test suite above is what a language model might produce: structurally fine, semantically incomplete.</p>
<p>Mutation testing gives you an objective signal for how strong tests actually are, whether you wrote them yourself or they were generated by an LLM. A low mutation score doesn’t mean the LLM did a bad job — it means you now know exactly where to strengthen the assertions. <strong>LLM-generated tests need external validation just as much as human-written tests do.</strong></p>
<p><code>muttest</code> provides tools to help with this validation.</p>
<hr>
<h2 id="whats-new-in-020">What’s New in 0.2.0</h2>
<h3 id="expanded-mutator-library">Expanded Mutator Library</h3>
<p>The biggest addition in this release is a full roster of new mutators, organized into individual mutators and ready-made preset collections.</p>
<p><strong>New individual mutators:</strong></p>
<ul>
<li><code>boolean_literal(&quot;TRUE&quot;, &quot;FALSE&quot;)</code> — flips boolean constants: <code>TRUE → FALSE</code></li>
<li><code>na_literal(&quot;NA&quot;, &quot;NULL&quot;)</code> — swaps NA variants and NULL: <code>NA → NULL</code></li>
<li><code>negate_condition()</code> — wraps <code>if</code> conditions in <code>!(...)</code>: <code>if (x &gt; 0)</code> → <code>if (!(x &gt; 0))</code></li>
<li><code>remove_condition_negation()</code> — strips leading <code>!</code> from conditions: <code>if (!done)</code> → <code>if (done)</code></li>
<li><code>numeric_increment()</code> / <code>numeric_decrement()</code> — shifts numeric constants by one: <code>5 → 6</code>, <code>5 → 4</code></li>
<li><code>index_increment()</code> / <code>index_decrement()</code> — shifts subscript indices: <code>x[i]</code> → <code>x[i + 1L]</code></li>
<li><code>string_empty()</code> — replaces non-empty strings with <code>&quot;&quot;</code>: <code>&quot;hello&quot; → &quot;&quot;</code></li>
<li><code>string_fill()</code> — replaces empty strings with <code>&quot;mutant&quot;</code>: <code>&quot;&quot; → &quot;mutant&quot;</code></li>
<li><code>call_name(&quot;any&quot;, &quot;all&quot;)</code> — swaps function names: <code>any(x) → all(x)</code></li>
<li><code>remove_negation()</code> — removes <code>!</code> anywhere: <code>!is.na(x) → is.na(x)</code></li>
<li><code>replace_return_value()</code> — replaces explicit return values with <code>NULL</code>: <code>return(x) → return(NULL)</code></li>
<li><code>delete_statement()</code> — removes assignments and standalone calls one at a time, catching untested side effects and dead assignments</li>
</ul>
<p><strong>New preset collections</strong> — pass a single call and get the full set of relevant mutators:</p>
<ul>
<li><code>boolean_literals()</code> — <code>TRUE &#x2194; FALSE</code>, <code>T &#x2194; F</code></li>
<li><code>na_literals()</code> — <code>NA &#x2194; NULL</code>, <code>NA &#x2194; NA_real_</code>, <code>NA &#x2194; NA_integer_</code>, <code>NA &#x2194; NA_character_</code></li>
<li><code>numeric_literals()</code> — combines <code>numeric_increment()</code> and <code>numeric_decrement()</code></li>
<li><code>index_mutations()</code> — combines <code>index_increment()</code> and <code>index_decrement()</code></li>
<li><code>string_literals()</code> — combines <code>string_empty()</code> and <code>string_fill()</code></li>
<li><code>condition_mutations()</code> — combines <code>negate_condition()</code> and <code>remove_condition_negation()</code></li>
</ul>
<p>The three operator presets from 0.1.0 are still there — <code>arithmetic_operators()</code>, <code>comparison_operators()</code>, <code>logical_operators()</code> — and now they have company.</p>
<p>A practical starting configuration covers most of what you’d want to catch in business logic:</p>
<pre>plan &lt;- muttest_plan(
  source_files = &quot;R/my_file.R&quot;,
  mutators = c(
    arithmetic_operators(),
    comparison_operators(),
    logical_operators(),
    condition_mutations(),
    numeric_literals(),
    list(remove_negation())
  )
)</pre>
<p>Layer in <code>boolean_literals()</code>, <code>na_literals()</code>, <code>string_literals()</code>, or <code>index_mutations()</code> based on what your code actually does.</p>
<h3 id="mutators-are-now-parametrized">Mutators Are Now Parametrized</h3>
<p>Individual mutators accept configuration arguments. <code>operator(&quot;+&quot;, &quot;-&quot;)</code> and <code>boolean_literal(&quot;TRUE&quot;, &quot;FALSE&quot;)</code> let you define exactly which token to replace and with what — so you can express the mutations that matter for your domain without writing a custom mutator from scratch. The <code>Mutator</code> base class is also now exported for cases where you want to go further and build an entirely custom mutator.</p>
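<p>To make the idea of a parametrized mutator concrete, here is a minimal base R sketch that works on R's own language objects. It is an illustration under stated assumptions, not a description of how <code>operator()</code> is implemented in <code>muttest</code>: <code>make_operator_mutator</code> and <code>total_price</code> are hypothetical names, and the helper does not handle calls with empty arguments such as <code>x[, 1]</code>.</p>

<pre># Hypothetical helper, not muttest's implementation: returns a function
# that replaces every call to `from` with a call to `to`
make_operator_mutator &lt;- function(from, to) {
  mutate &lt;- function(expr) {
    # Symbols and constants are returned unchanged
    if (!is.call(expr)) {
      return(expr)
    }
    # Rewrite nested calls first, then the head of this call
    parts &lt;- lapply(as.list(expr), mutate)
    if (identical(parts[[1]], as.name(from))) {
      parts[[1]] &lt;- as.name(to)
    }
    as.call(parts)
  }
  mutate
}

# Made-up function under test
total_price &lt;- function(price, shipping) price + shipping

# Build a mutant by swapping &quot;+&quot; for &quot;-&quot; in the function body
plus_to_minus &lt;- make_operator_mutator(&quot;+&quot;, &quot;-&quot;)
mutant &lt;- total_price
body(mutant) &lt;- plus_to_minus(body(total_price))

total_price(10, 5)  # 15
mutant(10, 5)       # 5
</pre>

<p>Because the token pair is an argument, the same helper yields a <code>&gt;</code> to <code>&gt;=</code> mutator or any other swap without new code, which is the appeal of parametrization.</p>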
<h3 id="survived-mutants-are-now-reported">Survived Mutants Are Now Reported</h3>
<p>The <code>ProgressMutationReporter</code> previously showed you only killed and total mutant counts. In 0.2.0, it now reports <strong>survived mutants</strong> — the ones your tests missed.</p>
<p>This is the signal that matters. Survivors are not noise; each one represents a real gap in your test suite. Seeing them surfaced directly in the progress output makes the feedback loop tighter: run <code>muttest</code>, read the survivors, add a test, repeat.</p>
<pre>i Mutation Testing
  |   K |   S |   E |   T |   % | Mutator  | File
v |   1 |   0 |   0 |   1 | 100 | &gt; → &lt;    | shipping.R
x |   1 |   1 |   0 |   2 |  50 | &gt; → &gt;=   | shipping.R
-- Survived Mutants -----------------------------------------------
shipping.R  &gt; → &gt;=
2-   if (weight_kg &gt; 5) 15.00 else 5.00
2+   if (weight_kg &gt;= 5) 15.00 else 5.00
-- Results --------------------------------------------------------
[ KILLED 1 | SURVIVED 1 | ERRORS 0 | TOTAL 2 | SCORE 50.0% ]</pre>
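<p>To see why that survivor matters, here is a self-contained base R sketch of the same situation. The function names <code>shipping_cost()</code> and <code>shipping_mutant()</code> and the test weights are assumptions made for illustration; only the <code>if</code> expression comes from the report above.</p>
<pre># Original logic and the surviving mutant (&gt; replaced by &gt;=)
shipping_cost &lt;- function(weight_kg) if (weight_kg &gt; 5) 15.00 else 5.00
shipping_mutant &lt;- function(weight_kg) if (weight_kg &gt;= 5) 15.00 else 5.00

# Tests far from the boundary cannot tell the two apart, so the mutant survives
weights &lt;- c(1, 10)
identical(sapply(weights, shipping_cost), sapply(weights, shipping_mutant))

# A test at the boundary gives different results, so it would kill the mutant
shipping_cost(5)
shipping_mutant(5)

# Mutation score as in the summary line: killed / total, in percent
score &lt;- 100 * 1 / 2</pre>
<p>Adding an assertion at exactly 5 kg is the kind of test the survivor is asking for.</p>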
<h3 id="timeouts-and-improved-error-handling">Timeouts and Improved Error Handling</h3>
<p>Mutation testing works by running your test suite once per mutant. Some mutations produce code that hangs — an infinite loop, a blocking call, a computation that never completes. In 0.1.0 that would stall your entire run.</p>
<p>In 0.2.0, <code>muttest()</code> supports <strong>per-mutant timeouts</strong>. Set a timeout and any mutant whose test run exceeds it is marked as errored. The rest of the run continues unaffected.</p>
<p>Error handling in general has been improved. When test execution fails unexpectedly, errors are now captured and reported cleanly rather than surfacing as unhandled conditions that stop the whole run. This makes mutation testing more robust in real projects where test environments are not always perfectly controlled.</p>
<h3 id="parallel-execution">Parallel Execution</h3>
<p>The 0.1.0 release ran mutants sequentially. In large files with many mutants, that adds up. <code>muttest()</code> now supports <strong>parallel execution</strong> with {mirai} under the hood: mutants can be run concurrently across multiple workers, cutting run time on larger repositories.</p>
<hr>
<h2 id="getting-started">Getting Started</h2>
<p>Install from CRAN:</p>
<pre>install.packages(&quot;muttest&quot;)</pre>
<p>Pick one file with meaningful logic — branching, comparisons, arithmetic. Define a plan:</p>
<pre>library(muttest)

plan &lt;- muttest_plan(
  source_files = &quot;R/your_file.R&quot;,
  mutators = comparison_operators()
)

muttest(plan)</pre>
<p>Read the output. Find the survivors. Add the tests they imply. Repeat.</p>
<p>Start with one file and one mutator preset. Aim for a meaningful score improvement each iteration rather than chasing 100% immediately. <strong>A score of 80%+ on critical business logic is a strong starting target.</strong></p>
<p>Try it on a file where you suspect the tests are weak. The survivors will tell you exactly what to add.</p>
<hr>
<h2 id="id-love-to-hear-from-you">I’d Love to Hear From You</h2>
<p><code>{muttest}</code> is still fresh and its features and interface might change. The new mutator library covers a wide range of patterns, but there are certainly mutations specific to your domain that aren’t covered yet. If you run into a case where the right mutation is missing, an existing mutator behaves unexpectedly, or something in the output is hard to interpret, please open an issue on GitHub.</p>
<p>Feature requests are equally welcome. If there’s a kind of code change you’d want to test for and there’s no good way to express it yet, please drop an issue in the repository.</p>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://jakubsobolewski.com/blog/muttest-0_2_0"> jakub::sobolewski</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/muttest-0-2-0-more-mutators-better-reporting-and-parallel-execution/">muttest 0.2.0: More Mutators, Better Reporting, and Parallel Execution</a>]]></content:encoded>
					
		
		<enclosure url="https://jakubsobolewski.com/rss-image.png" length="0" type="image/png" />

		<post-id xmlns="com-wordpress:feed-additions:1">401242</post-id>	</item>
		<item>
		<title>Is logistic regression regression?</title>
		<link>https://www.r-bloggers.com/2026/05/is-logistic-regression-regression/</link>
		
		<dc:creator><![CDATA[datascienceconfidential - r]]></dc:creator>
		<pubDate>Thu, 14 May 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://datascienceconfidential.github.io/r/predictive-models/2026/05/14/is-logistic-regression-regression.html</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> I came across a post recently by a machine learning engineer who made the bold claim that logistic regression is the worst name for an algorithm ever, or something along those lines. Many statisticians of the more old-school type seemed to disagree. T...</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/is-logistic-regression-regression/">Is logistic regression regression?</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://datascienceconfidential.github.io/r/predictive-models/2026/05/14/is-logistic-regression-regression.html"> datascienceconfidential - r</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<p>I came across a post recently by a machine learning engineer who made the bold claim that logistic regression is the worst name for an algorithm ever, or something along those lines<sup><a href="https://datascienceconfidential.github.io/r/predictive-models/2026/05/14/is-logistic-regression-regression.html#myfootnote1" rel="nofollow" target="_blank">1</a></sup>. Many statisticians of the more old-school type seemed to disagree. This led me to think a bit more deeply about the subject. I’ve already written several posts on bad terminology in statistics (see <a href="https://datascienceconfidential.github.io/statistics/probability/2020/07/23/confidence-intervals.html" rel="nofollow" target="_blank">confidence level</a>, <a href="https://datascienceconfidential.github.io/statistics/linear-regression/2020/08/16/line-of-best-fit.html" rel="nofollow" target="_blank">line of best fit</a>, <a href="https://datascienceconfidential.github.io/statistics/linear-regression/python/2021/06/01/r-squared.html" rel="nofollow" target="_blank">r squared</a>) so I might have been expected to agree with the machine learning view, but in this case I agree with the statisticians, and I would like to explain why.</p>

<h1 id="what-data-scientists-think-regression-is">What data scientists think regression is</h1>

<p>In data science classes, students are taught that there are two kinds of predictive modelling. In both cases, the aim is to predict a response $Y$ given a vector of features $X$. If $Y$ is real-valued (<code>numeric</code> in R terminology) then it’s a <em>regression</em> problem. If $Y$ is categorical then it’s a <em>classification</em> problem. I’m not sure where this terminology originated, but it’s certainly been propagated very widely by Hastie, Tibshirani and Friedman’s classic <a href="https://hastie.su.domains/ElemStatLearn/" rel="nofollow" target="_blank"><em>The Elements of Statistical Learning</em></a>.</p>

<p>In logistic regression, your data consists of some feature values $X$ and a response $Y \in \lbrace 0, 1 \rbrace$. In this case, the response is definitely categorical, so someone trained in data science would indeed call this a classification problem. But if you look more closely at the output produced by logistic regression, its predicted values are numbers, namely the probability of each data point being in the class labelled $1$. You need to do something to these numbers (for example, use a cutoff) in order to get a predicted class.</p>

<p>For example, in R:</p>

<pre>set.seed(100)
N &lt;- 100
a &lt;- -1
b &lt;- 1
x &lt;- 2 * rnorm(N)

# simulated binary data
y &lt;- rbinom(N, 1, 1/(1 + exp(-a -b * x)))

# plot observed values in grey
plot(x, y, pch=19, xlab=&quot;x&quot;, ylab=&quot;y&quot;,
     col=rgb(0, 0, 0, 0.3), las=1)

# fit logistic regression
model &lt;- glm(y ~ x, family=&quot;binomial&quot;)

# plot predicted values in red
points(x, 
       predict(model, data.frame(x=x), 
                  type=&quot;response&quot;),
       col=rgb(1, 0, 0, 0.3),
       pch=19)
</pre>

<div style="width:70%; margin:0 auto;">
 <img src="https://i2.wp.com/datascienceconfidential.github.io/blog/images/2026/logistic_regression_example.png?w=578&#038;ssl=1" data-recalc-dims="1" />
</div>
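
<p>The cutoff step can be sketched as follows. This is a minimal illustration that repeats the simulation above; the cutoff of 0.5 and the names <code>p_hat</code> and <code>y_hat</code> are arbitrary choices for illustration, not part of the model.</p>

<pre>set.seed(100)
N &lt;- 100
a &lt;- -1
b &lt;- 1
x &lt;- 2 * rnorm(N)
y &lt;- rbinom(N, 1, 1/(1 + exp(-a -b * x)))
model &lt;- glm(y ~ x, family=&quot;binomial&quot;)

# the model itself returns probabilities...
p_hat &lt;- predict(model, type=&quot;response&quot;)

# ...and a separate rule turns them into classes
cutoff &lt;- 0.5
y_hat &lt;- as.integer(p_hat &gt; cutoff)

# compare predicted and observed classes
table(predicted=y_hat, observed=y)
</pre>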

<p>In fact, it’s quite hard to think of a machine learning algorithm which directly predicts class membership rather than some sort of measure of how strongly a data point is a member of a class. Even Naive Bayes is making some sort of attempt to predict the probability of class membership. The simplest algorithm which directly predicts the class instead of the probability of class membership is the <a href="https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm" rel="nofollow" target="_blank">1-nearest neighbour algorithm</a>. (But if you used a larger number of neighbours, say 20, you would get some sort of estimate of how confident you were in your prediction.)</p>
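
<p>A rough base R sketch of that difference, using a hypothetical helper <code>knn_vote()</code> on data simulated as in the example above: with one neighbour the output can only be 0 or 1, while with 20 neighbours it is a proportion.</p>

<pre>set.seed(100)
N &lt;- 100
x &lt;- 2 * rnorm(N)
y &lt;- rbinom(N, 1, 1/(1 + exp(1 - x)))

# proportion of the k nearest training points (by distance in x) with y = 1
knn_vote &lt;- function(x0, k) mean(y[order(abs(x - x0))[seq_len(k)]])

knn_vote(0.5, k=1)    # always exactly 0 or 1: a class, not a probability
knn_vote(0.5, k=20)   # a multiple of 1/20: a crude probability estimate
</pre>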

<h1 id="what-statisticians-think-regression-is">What statisticians think regression is</h1>

<p>The term <em>regression</em> comes from Galton’s idea of <em>regression to the mean</em> (which I have written about <a href="https://datascienceconfidential.github.io/statistics/probability/2024/12/02/mount-everest.html" rel="nofollow" target="_blank">here</a>). Originally this was the observation that tall parents tend to have children who are shorter than them, and vice versa. The heights of children seem to regress towards the mean of the whole population.</p>

<p>More generally, the values of the response $Y$ corresponding to some fixed value of the features $x_0$ will follow some probability distribution. The mean of this distribution is $E[Y \vert x_0]$. The observed values of $Y$ will cluster around this mean. If you repeatedly draw values of $Y$, a large value will tend to be followed by a smaller value, and vice versa. Thus, $E[Y \vert X]$ will tend to be smaller than $Y$ if $Y$ is unusually large, and larger than $Y$ if $Y$ is unusually small<sup><a href="https://datascienceconfidential.github.io/r/predictive-models/2026/05/14/is-logistic-regression-regression.html#myfootnote2" rel="nofollow" target="_blank">2</a></sup>. You can see this if you use linear regression to predict $Y$ given $X$, as in the following example.</p>

<pre>set.seed(100)
N &lt;- 500

x &lt;- rnorm(N)
y &lt;- 0.4 * x + 0.8 * rnorm(N)
plot(x, y)
abline(coef(lm(y~x)), col=&quot;red&quot;)
</pre>

<div style="width:70%; margin:0 auto;">
 <img src="https://i0.wp.com/datascienceconfidential.github.io/blog/images/2026/regression_example.png?w=578&#038;ssl=1" data-recalc-dims="1" />
</div>

<p>(Note how the slope of the regression line is shallower than the “slope” which the eye perceives in the cloud of data points, which is the <a href="https://en.wikipedia.org/wiki/Principal_axis_theorem" rel="nofollow" target="_blank">principal axis</a>.)</p>
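
<p>One way to check this numerically, as a sketch: take the principal axis to be the first principal component from <code>prcomp()</code> and compare its slope with the least-squares slope. The data are simulated exactly as in the example above; the names <code>slope_lm</code> and <code>slope_pc</code> are just labels for this illustration.</p>

<pre>set.seed(100)
N &lt;- 500
x &lt;- rnorm(N)
y &lt;- 0.4 * x + 0.8 * rnorm(N)

# least-squares slope of y on x
slope_lm &lt;- unname(coef(lm(y~x))[2])

# slope of the first principal component of the (x, y) cloud
pc1 &lt;- prcomp(cbind(x, y))$rotation[, 1]
slope_pc &lt;- unname(pc1[2] / pc1[1])

c(regression=slope_lm, principal_axis=slope_pc)
</pre>

<p>With the population values used here (variance $1$ for $x$, variance $0.8$ for $y$, covariance $0.4$) the regression slope is $0.4$ while the principal axis has a slope of about $0.78$, so the sample values should come out close to those.</p>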

<p>But some algorithms don’t give you any regression effect. For example, an overfitted decision tree (a.k.a. a 1-NN regressor) will not show any regression to the mean, as in the following example. Note that the blue line does not under- or over-predict for the extreme values of $x$.</p>

<pre>x &lt;- c(1:9)
y &lt;- c(-10, seq(-1,1, length=7), 10)
pred_nn &lt;- function(xx) y[which.min(abs(xx - x))[1]]

plot(x, y)
abline(coef(lm(y~x)), col=&quot;red&quot;)
xx &lt;- seq(1, 9, length=1000)
lines(xx, sapply(xx, pred_nn), type=&quot;s&quot;, lty=2, col=&quot;blue&quot;)
</pre>

<div style="width:70%; margin:0 auto;">
 <img src="https://i0.wp.com/datascienceconfidential.github.io/blog/images/2026/regression_example_2.png?w=578&#038;ssl=1" data-recalc-dims="1" />
</div>

<p>In this case, you have an algorithm which is predicting a numerical value, so data scientists would call it a regression, but it’s not actually exhibiting any regression. How annoying!</p>

<h1 id="what-regression-actually-is">What regression actually is</h1>

<p>Although it’s too late to rewrite the textbooks, maybe it could be argued that regression and classification should have been defined in the following way. If a predictive model directly predicts a response $Y$ given features $X$, then it should be called a classification model (even if $Y$ is numeric, as in the previous example). But if the model predicts $E[Y \vert X]$, then it should be called a regression model.</p>

<p>What about logistic regression? In this case, the model is predicting $P(Y=1 \vert X)$ which is just $E[Y \vert X]$. So the statisticians were right in the first place! Logistic regression <em>is</em> a regression model. It only becomes a classification model if you apply a second model to it. Usually this takes the form of a decision tree which predicts $Y=1$ if $E[Y \vert X] > p_0$ for some choice of $p_0$ and $Y=0$ otherwise. This decision tree <em>is</em> a classification model. But logistic regression itself isn’t.</p>
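
<p>To spell out the first step, that $P(Y=1 \vert X)$ is just $E[Y \vert X]$: because $Y$ only takes the values $0$ and $1$, the conditional expectation is $E[Y \vert X] = 0 \cdot P(Y=0 \vert X) + 1 \cdot P(Y=1 \vert X) = P(Y=1 \vert X)$.</p>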

<hr />

<p><small>
<a name="myfootnote1">1</a>: I’m a little wary of calling myself a data scientist these days, partly because I think the profession has been devalued by various attempts to cash in on its popularity (leading to a glut of people with high confidence and low experience) and partly because I think data science is becoming a bit of a toxic brand with all the <a href="https://datascienceconfidential.github.io/economics/ai/llm/r/2026/01/07/so-how-much-does-openai-owe-us.html" rel="nofollow" target="_blank">real-world harm</a> being done by AI, data centres, mass surveillance, etc.
</small></p>

<p><small>
<a name="myfootnote2">2</a>: Anecdote time: at one of my old jobs we had to entertain a vendor who was basically selling a Kaggle-style workflow as a software-as-a-service product. The sales rep built a model on some of our data and presented it. In their write-up they included the observation that “interestingly, we noticed that the model tends to underpredict for large values of $x$ and overpredict for small values of $x$”. Well, that’s not very surprising because that’s what <em>every</em> predictive model does!
</small></p>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://datascienceconfidential.github.io/r/predictive-models/2026/05/14/is-logistic-regression-regression.html"> datascienceconfidential - r</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/is-logistic-regression-regression/">Is logistic regression regression?</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">401226</post-id>	</item>
		<item>
		<title>15 Years of rOpenSci, and We&#8217;re Just Getting Started 🎉</title>
		<link>https://www.r-bloggers.com/2026/05/15-years-of-ropensci-and-were-just-getting-started-%f0%9f%8e%89/</link>
		
		<dc:creator><![CDATA[rOpenSci]]></dc:creator>
		<pubDate>Wed, 13 May 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://ropensci.org/blog/2026/05/13/anniversary2026/</guid>

					<description><![CDATA[<p>Digging through our memory box, we came across a conversation from which we tried to piece together when it all began with rOpenSci.<br />
On July 13, 2011, an email was sent with the idea of a shared blog, a clever domain name, and a way to connect R packa...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/15-years-of-ropensci-and-were-just-getting-started-%f0%9f%8e%89/">15 Years of rOpenSci, and We’re Just Getting Started 🎉</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://ropensci.org/blog/2026/05/13/anniversary2026/"> rOpenSci - open tools for open science</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>

<p>Digging through our memory box, we came across a conversation from which we tried to piece together when it all began with rOpenSci.</p>
<p>On July 13, 2011, an email was sent with the idea of a shared blog, a clever domain name, and a way to connect R package developers who cared about open science. The name “rOpenSci” appears in that email. A few months before that, the first commits had already been pushed to what would become taxize and treeBASE, two packages that quietly planted the seed of something much bigger.</p>
<p>That was 15 years ago. This year, we celebrate. <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f389.png" alt="🎉" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<figure class="center"><img src="https://i1.wp.com/ropensci.org/blog/2026/05/13/anniversary2026/PixelArt15yearrOpenSci.png?w=450&#038;ssl=1"
alt="Retro pixel-art graphic celebrating rOpenSci&#39;s 15th anniversary. The text &#39;rOpenSci&#39; appears at the top in pixel font, flanked by three pixel-art balloons. A browser window frames the central message: &#39;15 YEARS / TRANSFORMING OPEN SCIENCE&#39; in bold pixel letters, overlaid on the rOpenSci geometric network pattern. A pixel badge reads &#39;OMG&#39;. A pixel folder and sparkle icons complete the design."  data-recalc-dims="1"><figcaption>
<p>Template design by Lauren Creatives in Canva. Adapted by Yani.</p>
</figcaption>
</figure>
<h2>
<em>Quinceañera</em> time <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f382.png" alt="🎂" class="wp-smiley" style="height: 1em; max-height: 1em;" />
</h2><p>Fifteen years<sup id="fnref:1"><a href="https://ropensci.org/blog/2026/05/13/anniversary2026/#fn:1" class="footnote-ref" role="doc-noteref" rel="nofollow" target="_blank">1</a></sup> is a milestone worth marking properly, which is why we want to celebrate with our community. We have a full year of activities planned, and we want you along for all of it.</p>
<p>Expect several diverse events, retrospectives and a few surprises we’re still stitching together. We’ll be reflecting on what we’ve built, highlighting the work of contributors old and new, and dreaming out loud about the next 15 years.</p>
<p>Stay tuned to this blog and our newsletter for announcements as the year unfolds.</p>
<h2>
First up: co-working session and casual virtual community celebration
</h2><p>We’re kicking things off with one <a href="https://ropensci.org/events/coworking-2026-06" rel="nofollow" target="_blank">co-working session on Tuesday, June 2</a> and two virtual celebrations on <a href="https://ropensci.org/events/celebrations-2026-06-10" rel="nofollow" target="_blank">Wednesday, June 10</a> and <a href="https://ropensci.org/events/celebrations-2026-06-17" rel="nofollow" target="_blank">Wednesday, June 17</a> in different timezones, so as many people as possible can join.</p>
<p>Each 90-minute celebration is built around rotating small-group conversations. You’ll meet community members you may not know yet, and together you’ll dig into questions to reflect on rOpenSci’s past and future.</p>
<p>Old friends and new faces alike, we would love to share these celebrations with you. No registration needed.</p>
<h2>
Thank you
</h2><p>To everyone who has contributed a package, reviewed code, written a blog post, helped in the forum, showed up to a community call, or simply used our tools in your research, thank you!</p>
<p>Here’s to 15 years and to whatever comes next.</p>
<div class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1">
<p>The fifteenth birthday of young women in many Latin American cultures is a special marker of adulthood called a Quinceañera. <a href="https://ropensci.org/blog/2026/05/13/anniversary2026/#fnref:1" class="footnote-backref" role="doc-backlink" rel="nofollow" target="_blank"><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/21a9.png" alt="↩" class="wp-smiley" style="height: 1em; max-height: 1em;" />︎</a></p>
</li>
</ol>
</div>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://ropensci.org/blog/2026/05/13/anniversary2026/"> rOpenSci - open tools for open science</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/15-years-of-ropensci-and-were-just-getting-started-%f0%9f%8e%89/">15 Years of rOpenSci, and We’re Just Getting Started 🎉</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">401220</post-id>	</item>
		<item>
		<title>Durations of wars by @ellis2013nz</title>
		<link>https://www.r-bloggers.com/2026/05/durations-of-wars-by-ellis2013nz/</link>
		
		<dc:creator><![CDATA[free range statistics - R]]></dc:creator>
		<pubDate>Tue, 12 May 2026 13:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://freerangestats.info/blog/2026/05/13/war-durations</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> How long do wars last, on average? If a war such as that currently under way in Iran has lasted 74 days so far, how long do we expect it to last in total? For all sorts of reasons, inquiring minds are interested. Luckily there are some very well curate...</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/durations-of-wars-by-ellis2013nz/">Durations of wars by @ellis2013nz</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://freerangestats.info/blog/2026/05/13/war-durations"> free range statistics - R</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<p>How long do wars last, on average? If a war such as that currently under way in Iran has lasted 74 days so far, how long do we expect it to last in total? For all sorts of reasons, inquiring minds are interested. Luckily there are some very well curated datasets out there, including the <a href="https://correlatesofwar.org/data-sets/cow-war/" rel="nofollow" target="_blank">Correlates of War</a>, that make it easy to answer these questions.</p>

<p>One caveat applies to all of this: I am not a military historian, just an interested amateur. I’m very open to having mistakes of interpretation or method pointed out to me.</p>

<h2 id="distribution-of-wars-durations">Distribution of wars’ durations</h2>

<p>The Correlates of War data lets us see, for example, that this is the distribution (on a logarithmic scale) of durations of wars post-Napoleon:</p>

<object type="image/svg+xml" data="https://freerangestats.info/img/0321-density.svg" width="450"><img src="https://i1.wp.com/freerangestats.info/img/0321-density.png?w=450&#038;ssl=1" data-recalc-dims="1" /></object>

<p>You can see I’ve compared this to a log-normal distribution and found that it doesn’t have quite as fat tails as that. But that’s ok, I’m not too worried about the precise shape, because later on I’ll be using pretty straightforward empirical methods.</p>

<p>This data is only for inter-state wars, which are in contrast to intra-state (eg civil wars) and extra-state (eg with external non-state actors). As I’m interested in a reference population to compare the current USA-Israel-Iran war to, it’s the inter-state population I want.</p>

<p>The median length of a war is 139 days and the mean is 408 days.</p>

<p>The four-day war in the dataset is the so-called “<a href="https://en.wikipedia.org/wiki/Football_War" rel="nofollow" target="_blank">Football War</a>” of 1969 between Honduras and El Salvador. The 3,734-day war was the much better-known “Vietnam War Phase II”, involving the USA, Australia, Vietnam, Cambodia and others.</p>

<p>Here’s the code to import the data from the Correlates of War project and draw that first density plot:</p>

<figure class="highlight"><pre>library(tidyverse)
library(lubridate)
library(janitor)
library(glue)
library(ggrepel)
library(scales)

# https://correlatesofwar.org/data-sets/cow-war/


#----- import interstate war data----------------------

interstate &lt;- read_csv(&quot;https://correlatesofwar.org/wp-content/uploads/Inter-StateWarData_v4.0.csv&quot;) |&gt; 
  clean_names() |&gt; 
  mutate(start_date = as.Date(sprintf(&quot;%04d-%02d-%02d&quot;, start_year1, start_month1, start_day1)),
         end_date = as.Date(sprintf(&quot;%04d-%02d-%02d&quot;, end_year1, end_month1, end_day1)))

interstate_wars &lt;- interstate |&gt; 
  group_by(war_num, war_name) |&gt; 
  summarise(earliest_start= min(start_date),
            latest_end = max(end_date),
            bat_death = sum(bat_death)) |&gt; 
  mutate(duration = as.numeric(latest_end - earliest_start),
         start_year = year(earliest_start)) |&gt; 
  ungroup()

# what years covered? 1823 to 2003 at time of writing
range(interstate_wars$start_year)

#==========================plots=================
 
simple_caption &lt;- &quot;Source: Correlates of War, Inter-State War Data; analysis by freerangestats.info&quot;

#-----------------distribution of duration------------
summary(interstate_wars$duration)

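# note: `mean` is given the whole vector of log10 durations, which rnorm()
# recycles, so each draw is centred on one observed war rather than on the
# overall mean of the logs
sim_norm &lt;- data.frame(duration = 10 ^ (rnorm(1e6, 
                                        mean = log10(interstate_wars$duration), 
                                        sd = sd(log10(interstate_wars$duration)))))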

interstate_wars |&gt; 
  ggplot(aes(x = duration)) +
  geom_density() +
  geom_rug() +
  geom_density(data = sim_norm, colour = &quot;orange&quot;) +
  annotate(&quot;text&quot;, x= 1, y = 0.18, label = &quot;Simulated log-normal distribution&quot;, 
           colour = &quot;orange&quot;, hjust = 0) +
  annotate(&quot;text&quot;, x= 300, y = 0.51, label = &quot;Empirical distribution of war durations&quot;, 
           colour = &quot;black&quot;, hjust = 0) +
  # carefully chosen labels for x axis:
  scale_x_log10(label = comma, breaks = c(range(interstate_wars$duration), 10, 100, 1000)) +
  labs(x = &quot;Duration of wars (in days, logarithmic scale)&quot;,
       y = &quot;Density&quot;,
       title = &quot;Distribution of war durations, 1823 to 2003&quot;,
       subtitle = &quot;More concentrated, less-fat tails than a log-normal distribution&quot;,
       caption = simple_caption) +
  # use coord to limit x axis so statistical calculations are all done on full data:
  coord_cartesian(xlim = c(1, 8000))</pre></figure>

<p>OK, so my main analytical task here is to work out the conditional expected duration of a war that has reached 74 days &#8211; the length so far of the USA-Israel-Iran war. Yes, I know there’s an incompletely observed ceasefire, but there’s also a blockade (or two), and that’s unambiguously an act of war under international law. So I’m counting the war as ongoing.</p>

<p>My chart to answer this question is this one:</p>

<object type="image/svg+xml" data="https://freerangestats.info/img/0321-cumulative-distribution.svg" width="450"><img src="https://i1.wp.com/freerangestats.info/img/0321-cumulative-distribution.png?w=450&#038;ssl=1" data-recalc-dims="1" /></object>

<p>What’s happening here is:</p>

<ul>
  <li>the empirical cumulative distribution function of durations is the dark line &#8211; basically the cumulative frequency on the vertical axis, but expressed as a proportion.</li>
  <li>the grey line is a simple LOESS smoother of that cumulative frequency, useful for modelling values that aren’t exactly matched in the data.</li>
  <li>the red lines show the duration of the current war, and where it would fit in the distribution of 1823 to 2003 wars. It’s about 0.33 (defined in the code below as the variable <code>current_cf</code>), meaning that the current war is already longer than about 33% of wars.</li>
  <li>the horizontal blue line is halfway in the vertical space between the horizontal red line and 1. The point where it meets the smoothed line, marked by the vertical blue line, shows the expected median duration of a war that has reached this 0.33 point on the cumulative frequency.</li>
</ul>

<p>So we see that of wars that get as long as 74 days, we expect the median total length to be 261 days. That’s a bit grim for those of us who think that even extending into June is going to be very bad indeed for the world economy, but it’s good to know. Of course, there’s plenty of wars that get to 74 days and then stop soon after, so there’s hope there too.</p>

<p>Here’s the code to do that bit of statistical inference and draw the chart:</p>

<figure class="highlight"><pre>#-------------------cumulative distribution--------------
interstate_cumulative &lt;- interstate_wars |&gt; 
  arrange(duration) |&gt; 
  mutate(cumulative_freq = 1:n() / n()) 

# smoothed model of the cumulative distribution, including estimates of where
# the Iran war is on it:
model &lt;- loess(cumulative_freq ~ log(duration), data = interstate_cumulative)
current_dur &lt;- 74 # as at 13 May 2026 - war started 28 February 2026
current_cf &lt;- predict(model, newdata = data.frame(duration = current_dur))

# inverse model to estimate duration given a cumulative frequency, useful for
# annotations on the chart:
inv_model &lt;- loess(duration ~ x, 
                   data = data.frame(duration = interstate_cumulative$duration, 
                                     x = fitted(model)))

# of wars that last this long, what is the median cumulative frequency (i.e. half-way to 1):
conditional_median_freq &lt;- (1 + current_cf) / 2
# of wars with that median cumulative frequency, convert it back into a duration,
conditional_median_dur &lt;- predict(inv_model, data.frame(x = conditional_median_freq))

# Draw chart of cumulative distribution:
interstate_cumulative |&gt; 
  ggplot(aes(x = duration, y = cumulative_freq)) +
  geom_smooth(method = &quot;loess&quot;, colour = &quot;grey80&quot;) +
  geom_line() +
  # note that (seems a bit odd) need to manually do the scale transform to geom_segment here:
  geom_segment(x = log10(current_dur), xend = log10(current_dur), y = -Inf, yend = current_cf, colour = &quot;red&quot;) +
  geom_segment(x = 0, xend = log10(current_dur), y = current_cf, yend = current_cf, colour = &quot;red&quot;) +
  geom_segment(x = log10(conditional_median_dur), xend = log10(conditional_median_dur), y = -Inf, yend = conditional_median_freq, colour = &quot;blue&quot;) +
  geom_segment(x = 0, xend = log10(conditional_median_dur), y = conditional_median_freq, yend = conditional_median_freq, colour = &quot;blue&quot;) +
  
  annotate(&quot;text&quot;, x = current_dur * 0.95, y = 0.39, label = &quot;Current Iran war&quot;, colour = &quot;red&quot;, hjust = 1) +
  annotate(&quot;text&quot;, x = conditional_median_dur * 1.05, y = 0.62, colour = &quot;blue&quot;, hjust = 0, vjust = 1, 
           label = glue(&quot;Median expectation conditional 
on at least {current_dur} days&quot;)) +
  scale_x_log10(label = comma, breaks = c(10, current_dur, 100, conditional_median_dur, 1000)) +
  labs(x = &quot;Total duration of war (in days, logarithmic scale)&quot;,
       y = &quot;Cumulative frequency of wars&quot;,
       title = &quot;Expectations of duration of Iran war, based on modern inter-state wars' duration&quot;,
       subtitle = glue(&quot;Comparison to wars from 1823 to 2003. The median war that lasts {current_dur} days goes on to last {round(conditional_median_dur)} days.&quot;),
       caption = simple_caption)</pre></figure>

<p>We can use the same approach to calculate not just the median war duration (conditional on getting to 74 days) but other percentiles. For example, the code below gives an 80% prediction interval (between the 0.1 and 0.9 quantiles) for total duration of 94.9 to 1,752 days. To put this another way, from this 74 day point, only 10% of wars will have a total duration of 94.9 days or fewer (ie no more than another 21 days).</p>

<p>All up, that’s a big range of course; the main thing it tells us is that wars last longer than many people would like, and there’s a big variation in wars’ duration.</p>

<figure class="highlight"><pre># some prediction intervals, conditional on getting to 74 days:
probs &lt;- c(0.05, 0.1, 0.5, 0.8, 0.9, 0.95)
more_freqs &lt;- probs * (1 - current_cf) + current_cf
conditional_dur &lt;- predict(inv_model, data.frame(x = more_freqs))
tibble(probability = probs, duration = conditional_dur)
# so 80% of wars that reach 74 days will have a total duration between 95 and 1,752 days</pre></figure>

<pre>  probability duration
        &lt;dbl&gt;    &lt;dbl&gt;
1        0.05     82.3
2        0.1      94.9
3        0.5     261. 
4        0.8    1141. 
5        0.9    1752. 
6        0.95   2119. 
</pre>
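<p>As a sanity check on the rescaling arithmetic above, here is a minimal sketch on simulated data (not the Correlates of War data). It assumes made-up log-normal durations and uses the raw empirical distribution rather than a LOESS smoother; the names <code>sim_dur</code>, <code>t0</code>, <code>cf_t0</code> and <code>cond_quantile</code> are illustrative only. The idea is that quantile p of wars that have already passed <code>t0</code> days sits at cumulative frequency cf + p * (1 - cf) of the full distribution, which for p = 0.5 is the <code>(1 + current_cf) / 2</code> used earlier.</p>

```r
set.seed(42)
# Simulated war durations in days (an assumption, for illustration only)
sim_dur <- round(10 ^ rnorm(5000, mean = 2.2, sd = 0.7)) + 1
t0 <- 74

# Share of simulated wars that had ended by day t0
cf_t0 <- mean(sim_dur <= t0)

# Conditional quantile via rescaling: map p from [0, 1] into [cf_t0, 1]
cond_quantile <- function(p) {
  quantile(sim_dur, probs = cf_t0 + p * (1 - cf_t0), type = 1, names = FALSE)
}

probs <- c(0.1, 0.5, 0.9)
rescaled <- cond_quantile(probs)

# Direct answer: quantiles of only those wars that lasted longer than t0
direct <- quantile(sim_dur[sim_dur > t0], probs = probs, type = 1, names = FALSE)

data.frame(probability = probs, rescaled, direct)
```

<p>The two columns should agree closely, which suggests the rescaling is just a shortcut for "throw away the wars that ended before day 74 and take quantiles of the rest".</p>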

<h2 id="duration-and-other-factors">Duration and other factors</h2>

<p>So I’d answered my main question but I was naturally curious about some other relationships too. Obviously one expects longer wars to have more deaths in battle; can we see this in the data? Yes we can:</p>

<object type="image/svg+xml" data="https://freerangestats.info/img/0321-duration-deaths.svg" width="450"><img src="https://i1.wp.com/freerangestats.info/img/0321-duration-deaths.png?w=450&#038;ssl=1" data-recalc-dims="1" /></object>

<p>I like this chart as presenting the scale of nearly two centuries of inter-state war in one easy visualisation.</p>

<p>We also see that if there’s a pattern in the relationship between duration, deaths and when the war started (the starting year mapped to colour in the chart above) it’s not an obvious one. We’ll come back to that in the next chart, but first, here’s the code to create the scatter plot above.</p>

<figure class="highlight"><pre>#------------------Compare duration and number of deaths----------------
interstate_wars |&gt; 
  ggplot(aes(x = duration, y = bat_death, label = war_name)) +
  geom_point(aes(colour = start_year), size = 3.5) +
  geom_text_repel(colour = &quot;grey50&quot;, size = 2, seed = 123) +
  scale_y_log10(label = comma) +
  scale_x_log10(label = comma) +
  scale_colour_viridis_c() +
  labs(title = &quot;Inter-state wars, 1823-2003&quot;,
       colour = &quot;Starting year&quot;,
       x = &quot;Duration in days&quot;,
       y = &quot;Number of battle deaths&quot;,
       caption = simple_caption) +
  theme(legend.position = c(0.15, 0.8))</pre></figure>

<p>I was a bit worried about that “two centuries” thing. Are recent wars all much shorter, or perhaps much longer, than older wars? If so it would be a big limitation on my inference about likely war length. So I prepared one more plot to check out if there was an obvious relationship, more rigorously than just eye-balling colour on the previous plot. I was a bit surprised to see that actually there is no real growth or reduction in war duration over time:</p>

<object type="image/svg+xml" data="https://freerangestats.info/img/0321-duration-history.svg" width="450"><img src="https://i0.wp.com/freerangestats.info/img/0321-duration-history.png?w=450&#038;ssl=1" data-recalc-dims="1" /></object>

<p>I also quite like this chart as giving us an instant comparison of our current USA-Israel-Iran war with some of those in history. We can see that it is already longer than the Boxer Rebellion, but not quite as long as the Falkland Islands or the War for Kosovo (for all of these names I am using those provided by the Correlates of War project - I’m well aware that these are contested labels).</p>

<p>Here’s my final chunk of code drawing that last chart:</p>

<figure class="highlight"><pre>#------------Compare duration with when in history it happened---------------
interstate_wars |&gt; 
  arrange(bat_death) |&gt; 
  ggplot(aes(x = earliest_start, y = duration)) +
  geom_hline(yintercept = current_dur, colour = &quot;red&quot;) +
  geom_point(aes(size = bat_death), shape = 1) +
  geom_text_repel(aes(label = war_name), colour = &quot;steelblue&quot;, size = 3, seed = 123) +
  annotate(&quot;text&quot;, x= as.Date(&quot;1820-01-01&quot;), y = current_dur + 8, hjust = 0,
           label = &quot;Duration of 2026 US-Israel-Iran war so far&quot;, colour = &quot;red&quot;) +
  scale_y_log10(label = comma) +
  scale_size_area(label = comma, max_size = 25) +
  labs(title = &quot;Inter-state wars, 1823-2003&quot;,
       subtitle = glue(&quot;Compared to the USA-Israel-Iran war as at {Sys.Date()}&quot;),
       x = &quot;Start of war&quot;,
       y = &quot;Duration of war (days)&quot;,
       size = &quot;Number of battle deaths:&quot;,
       caption = simple_caption)</pre></figure>

<p>That’s all folks. Stay safe out there.</p>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://freerangestats.info/blog/2026/05/13/war-durations"> free range statistics - R</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/durations-of-wars-by-ellis2013nz/">Durations of wars by @ellis2013nz</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">401218</post-id>	</item>
		<item>
		<title>Learning Data Science: Why a High R^2 Can Be Misleading</title>
		<link>https://www.r-bloggers.com/2026/05/learning-data-science-why-a-high-r2-can-be-misleading/</link>
		
		<dc:creator><![CDATA[Learning Machines]]></dc:creator>
		<pubDate>Mon, 11 May 2026 15:09:30 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://blog.ephorie.de/?p=7048</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> A high R^2 can make a regression model look impressively accurate, but this number can be deceptive. If you want to understand why a high R^2 is not always a sign of a good model, read on! In the post, Learning Data Science: Modelling Basics, we built a simple model to predict ...</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/learning-data-science-why-a-high-r2-can-be-misleading/">Learning Data Science: Why a High R^2 Can Be Misleading</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on <strong><a href="https://blog.ephorie.de/learning-data-science-why-a-high-r2-can-be-misleading?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=learning-data-science-why-a-high-r2-can-be-misleading"> R-Bloggers – Learning Machines</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>]. (You can report an issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>.)
</div>
<p><img loading="lazy" fetchpriority="high" decoding="async" src="https://i1.wp.com/blog.ephorie.de/wp-content/uploads/2020/01/discount-1015451_1280-e1568068884649-300x277.jpg?resize=300%2C277&#038;ssl=1" alt="" width="300" height="277" class="alignleft size-medium wp-image-2386" srcset_temp="https://i1.wp.com/blog.ephorie.de/wp-content/uploads/2020/01/discount-1015451_1280-e1568068884649-300x277.jpg?resize=300%2C277&#038;ssl=1 300w, https://blog.ephorie.de/wp-content/uploads/2020/01/discount-1015451_1280-e1568068884649-768x709.jpg 768w, https://blog.ephorie.de/wp-content/uploads/2020/01/discount-1015451_1280-e1568068884649-840x776.jpg 840w, https://blog.ephorie.de/wp-content/uploads/2020/01/discount-1015451_1280-e1568068884649.jpg 1081w" sizes="(max-width: 300px) 85vw, 300px" data-recalc-dims="1" /></p>
<p>A high <img loading="lazy" decoding="async" src="https://i1.wp.com/blog.ephorie.de/wp-content/ql-cache/quicklatex.com-c7d6931063ed333ca39b952ccfd482b8_l3.png?resize=21%2C15&#038;ssl=1" class="ql-img-inline-formula quicklatex-auto-format" alt="R^2" title="Rendered by QuickLaTeX.com" height="15" width="21" style="vertical-align: 0px;" data-recalc-dims="1"/> can make a regression model look impressively accurate — but this number can be deceptive. If you want to understand why a high <img loading="lazy" decoding="async" src="https://i1.wp.com/blog.ephorie.de/wp-content/ql-cache/quicklatex.com-c7d6931063ed333ca39b952ccfd482b8_l3.png?resize=21%2C15&#038;ssl=1" class="ql-img-inline-formula quicklatex-auto-format" alt="R^2" title="Rendered by QuickLaTeX.com" height="15" width="21" style="vertical-align: 0px;" data-recalc-dims="1"/> is not always a sign of a good model, read on!</p>
<p><span id="more-7048"></span></p>
<p>In the post, <a href="https://blog.ephorie.de/learning-data-science-modelling-basics" rel="nofollow" target="_blank">Learning Data Science: Modelling Basics</a>, we built a simple model to predict income from age. R printed a model summary containing something called <code>R-squared</code>, but we did not yet discuss what that value actually means.</p>
<p>At first sight, a high <img loading="lazy" decoding="async" src="https://i1.wp.com/blog.ephorie.de/wp-content/ql-cache/quicklatex.com-c7d6931063ed333ca39b952ccfd482b8_l3.png?resize=21%2C15&#038;ssl=1" class="ql-img-inline-formula quicklatex-auto-format" alt="R^2" title="Rendered by QuickLaTeX.com" height="15" width="21" style="vertical-align: 0px;" data-recalc-dims="1"/> looks highly reassuring. In our example, the linear model achieved an <img loading="lazy" decoding="async" src="https://i1.wp.com/blog.ephorie.de/wp-content/ql-cache/quicklatex.com-c7d6931063ed333ca39b952ccfd482b8_l3.png?resize=21%2C15&#038;ssl=1" class="ql-img-inline-formula quicklatex-auto-format" alt="R^2" title="Rendered by QuickLaTeX.com" height="15" width="21" style="vertical-align: 0px;" data-recalc-dims="1"/> close to 90%. That sounds impressive.</p>
<p>However, just as high classification accuracy can be misleading — as discussed in <a href="https://blog.ephorie.de/zeror-the-simplest-possible-classifier-or-why-high-accuracy-can-be-misleading" rel="nofollow" target="_blank">ZeroR: The Simplest Possible Classifier, or Why High Accuracy can be Misleading</a> — a high <img loading="lazy" decoding="async" src="https://i1.wp.com/blog.ephorie.de/wp-content/ql-cache/quicklatex.com-c7d6931063ed333ca39b952ccfd482b8_l3.png?resize=21%2C15&#038;ssl=1" class="ql-img-inline-formula quicklatex-auto-format" alt="R^2" title="Rendered by QuickLaTeX.com" height="15" width="21" style="vertical-align: 0px;" data-recalc-dims="1"/> can also create a false sense of confidence.</p>
<p>To understand why, it helps to examine the formula itself and then revisit the three models from the previous post: the <em>mean model</em>, the <em>linear model</em>, and the <em>polynomial model</em>.</p>
<hr />
<h2>The Meaning of <img loading="lazy" decoding="async" src="https://i1.wp.com/blog.ephorie.de/wp-content/ql-cache/quicklatex.com-c7d6931063ed333ca39b952ccfd482b8_l3.png?resize=21%2C15&#038;ssl=1" class="ql-img-inline-formula quicklatex-auto-format" alt="R^2" title="Rendered by QuickLaTeX.com" height="15" width="21" style="vertical-align: 0px;" data-recalc-dims="1"/></h2>
<p>The coefficient of determination is defined as:</p>
<p><img loading="lazy" decoding="async" src="https://i2.wp.com/blog.ephorie.de/wp-content/ql-cache/quicklatex.com-00e6249ac935aca67c46d1437aa02ef0_l3.png?resize=144%2C31&#038;ssl=1" class="ql-img-inline-formula quicklatex-auto-format" alt="R^2 = 1 - \frac{\sum (y_i-\hat y_i)^2}{\sum (y_i-\bar y)^2}" title="Rendered by QuickLaTeX.com" height="31" width="144" style="vertical-align: -11px;" data-recalc-dims="1"/></p>
<p>At first glance, the formula appears intimidating, but its basic idea is relatively simple.</p>
<p>The denominator</p>
<p><img loading="lazy" decoding="async" src="https://i2.wp.com/blog.ephorie.de/wp-content/ql-cache/quicklatex.com-2c3d465528e95ffd2ac7eee3afdd8fc0_l3.png?resize=149%2C20&#038;ssl=1" class="ql-img-inline-formula quicklatex-auto-format" alt="SS_{tot} = \sum (y_i-\bar y)^2" title="Rendered by QuickLaTeX.com" height="20" width="149" style="vertical-align: -5px;" data-recalc-dims="1"/></p>
<p>measures the <em>total variation in the target variable</em>. It quantifies how strongly the observed values differ from their mean.</p>
<p>The numerator</p>
<p><img loading="lazy" decoding="async" src="https://i1.wp.com/blog.ephorie.de/wp-content/ql-cache/quicklatex.com-61749266165fa0e8fe29b5c6c993ee17_l3.png?resize=156%2C20&#038;ssl=1" class="ql-img-inline-formula quicklatex-auto-format" alt="SS_{res} = \sum (y_i-\hat y_i)^2" title="Rendered by QuickLaTeX.com" height="20" width="156" style="vertical-align: -5px;" data-recalc-dims="1"/></p>
<p>measures the <em>remaining unexplained error after fitting the model</em>.</p>
<p>Thus, <img loading="lazy" decoding="async" src="https://i1.wp.com/blog.ephorie.de/wp-content/ql-cache/quicklatex.com-c7d6931063ed333ca39b952ccfd482b8_l3.png?resize=21%2C15&#038;ssl=1" class="ql-img-inline-formula quicklatex-auto-format" alt="R^2" title="Rendered by QuickLaTeX.com" height="15" width="21" style="vertical-align: 0px;" data-recalc-dims="1"/> measures the <em>proportion of variation explained by the model</em>.</p>
<p>An <img loading="lazy" decoding="async" src="https://i1.wp.com/blog.ephorie.de/wp-content/ql-cache/quicklatex.com-c7d6931063ed333ca39b952ccfd482b8_l3.png?resize=21%2C15&#038;ssl=1" class="ql-img-inline-formula quicklatex-auto-format" alt="R^2" title="Rendered by QuickLaTeX.com" height="15" width="21" style="vertical-align: 0px;" data-recalc-dims="1"/> of:</p>
<ul>
<li>0 means the model explains none of the variation,</li>
<li>1 means the model explains all variation perfectly.</li>
</ul>
<p>This sounds straightforward enough. The difficulty is that perfectly explaining the observed data is not necessarily the same thing as building a useful predictive model.</p>
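<p>To make the formula concrete, the following is a minimal sketch that computes the two sums of squares by hand and compares the result with what <code>lm()</code> reports. The age and income values are simulated here purely for illustration; they are an assumption and not the data from the earlier post.</p>

```r
set.seed(1)
# Hypothetical age/income data, simulated for illustration only
age <- runif(50, min = 20, max = 65)
income <- 1000 + 40 * age + rnorm(50, sd = 300)

fit <- lm(income ~ age)

ss_res <- sum((income - fitted(fit))^2)   # unexplained variation (numerator)
ss_tot <- sum((income - mean(income))^2)  # total variation (denominator)
r2_manual <- 1 - ss_res / ss_tot

# The hand-computed value next to the one reported by summary()
c(manual = r2_manual, lm = summary(fit)$r.squared)
```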
<hr />
<h2>The Mean Model</h2>
<p>Let us begin with the simplest possible regression model.</p>
<p>Suppose we completely ignore age and simply predict the average income for every individual:</p>
<p><img loading="lazy" decoding="async" src="https://i1.wp.com/blog.ephorie.de/wp-content/ql-cache/quicklatex.com-6e7456d883a3a0982c0da3efb13957f8_l3.png?resize=47%2C16&#038;ssl=1" class="ql-img-inline-formula quicklatex-auto-format" alt="\hat y_i = \bar y" title="Rendered by QuickLaTeX.com" height="16" width="47" style="vertical-align: -4px;" data-recalc-dims="1"/></p>
<p>This is effectively the regression equivalent of ZeroR. The model does not learn any relationship at all.</p>
<p>In this case:</p>
<p><img loading="lazy" decoding="async" src="https://i0.wp.com/blog.ephorie.de/wp-content/ql-cache/quicklatex.com-8b14fd9ed1319341f6f2f7ffa5ad52d6_l3.png?resize=119%2C16&#038;ssl=1" class="ql-img-inline-formula quicklatex-auto-format" alt="y_i - \hat y_i = y_i - \bar y" title="Rendered by QuickLaTeX.com" height="16" width="119" style="vertical-align: -4px;" data-recalc-dims="1"/></p>
<p>Therefore, the residual sum of squares becomes identical to the total sum of squares:</p>
<p><img loading="lazy" decoding="async" src="https://i0.wp.com/blog.ephorie.de/wp-content/ql-cache/quicklatex.com-dad01239a88490f46b0c50e3d31d2e2e_l3.png?resize=199%2C20&#038;ssl=1" class="ql-img-inline-formula quicklatex-auto-format" alt="\sum (y_i-\hat y_i)^2 = \sum (y_i-\bar y)^2" title="Rendered by QuickLaTeX.com" height="20" width="199" style="vertical-align: -5px;" data-recalc-dims="1"/></p>
<p>Substituting this into the formula gives:</p>
<p><img loading="lazy" decoding="async" src="https://i2.wp.com/blog.ephorie.de/wp-content/ql-cache/quicklatex.com-201f649c3db6193ecc7d5d6ec2cf3dab_l3.png?resize=145%2C24&#038;ssl=1" class="ql-img-inline-formula quicklatex-auto-format" alt="R^2 = 1 - \frac{SS_{tot}}{SS_{tot}} = 0" title="Rendered by QuickLaTeX.com" height="24" width="145" style="vertical-align: -8px;" data-recalc-dims="1"/></p>
<p>The model explains none of the variation in the data.</p>
<p>This corresponds to the <em>underfitting</em> case discussed previously: the model is too simple to capture the underlying structure.</p>
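<p>The mean model can be sketched in R as an intercept-only regression. The data below are again simulated and purely illustrative (an assumption, not the post's original data); the point is only that the residual and total sums of squares coincide.</p>

```r
set.seed(1)
# Hypothetical age/income data, simulated for illustration only
age <- runif(50, min = 20, max = 65)
income <- 1000 + 40 * age + rnorm(50, sd = 300)

# Intercept-only model: age is ignored, every prediction is mean(income)
mean_fit <- lm(income ~ 1)

ss_res <- sum((income - fitted(mean_fit))^2)
ss_tot <- sum((income - mean(income))^2)
r2_mean <- 1 - ss_res / ss_tot
r2_mean  # zero, up to floating-point rounding
```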
<hr />
<h2>The Polynomial Model</h2>
<p>Now consider the opposite extreme.</p>
<p>Instead of fitting a straight line, suppose we fit a polynomial of sufficiently high degree. In fact, if we have <img loading="lazy" decoding="async" src="https://i0.wp.com/blog.ephorie.de/wp-content/ql-cache/quicklatex.com-ec4217f4fa5fcd92a9edceba0e708cf7_l3.png?resize=11%2C8&#038;ssl=1" class="ql-img-inline-formula quicklatex-auto-format" alt="n" title="Rendered by QuickLaTeX.com" height="8" width="11" style="vertical-align: 0px;" data-recalc-dims="1"/> observations with distinct age values, a polynomial of degree up to <img loading="lazy" decoding="async" src="https://i1.wp.com/blog.ephorie.de/wp-content/ql-cache/quicklatex.com-3fd905b384548c9de7011828b88081d5_l3.png?resize=40%2C12&#038;ssl=1" class="ql-img-inline-formula quicklatex-auto-format" alt="n-1" title="Rendered by QuickLaTeX.com" height="12" width="40" style="vertical-align: 0px;" data-recalc-dims="1"/> can pass exactly through all observed data points: </p>
<p><img loading="lazy" decoding="async" src="https://i2.wp.com/blog.ephorie.de/wp-content/ql-cache/quicklatex.com-2ba542bb2156501d745b39f95647edf6_l3.png?resize=291%2C19&#038;ssl=1" class="ql-img-inline-formula quicklatex-auto-format" alt="y = a_0 + a_1x + a_2x^2 + \dots + a_{n-1}x^{n-1}" title="Rendered by QuickLaTeX.com" height="19" width="291" style="vertical-align: -4px;" data-recalc-dims="1"/></p>
<p>In that case:</p>
<p><img loading="lazy" decoding="async" src="https://i2.wp.com/blog.ephorie.de/wp-content/ql-cache/quicklatex.com-3723adf0d61a6d5758ddf7bbbe0865d5_l3.png?resize=52%2C16&#038;ssl=1" class="ql-img-inline-formula quicklatex-auto-format" alt="y_i = \hat y_i" title="Rendered by QuickLaTeX.com" height="16" width="52" style="vertical-align: -4px;" data-recalc-dims="1"/></p>
<p>for all observations, implying:</p>
<p><img loading="lazy" decoding="async" src="https://i0.wp.com/blog.ephorie.de/wp-content/ql-cache/quicklatex.com-67e17d4eacf7a7e6995c47e968ca1464_l3.png?resize=123%2C20&#038;ssl=1" class="ql-img-inline-formula quicklatex-auto-format" alt="\sum (y_i-\hat y_i)^2 = 0" title="Rendered by QuickLaTeX.com" height="20" width="123" style="vertical-align: -5px;" data-recalc-dims="1"/></p>
<p>and therefore:</p>
<p><img loading="lazy" decoding="async" src="https://i1.wp.com/blog.ephorie.de/wp-content/ql-cache/quicklatex.com-449b1f5b7fe03b52c0d2a080f98ea2ed_l3.png?resize=53%2C15&#038;ssl=1" class="ql-img-inline-formula quicklatex-auto-format" alt="R^2 = 1" title="Rendered by QuickLaTeX.com" height="15" width="53" style="vertical-align: 0px;" data-recalc-dims="1"/></p>
<p>The model achieves a perfect fit.</p>
<p>At first sight, this appears ideal. In practice, however, such a model often performs poorly on unseen data because it has adapted itself not only to the underlying relationship, but also to random fluctuations and noise within the training data.</p>
<p>This is the classical <em>overfitting</em> problem.</p>
<p>A perfect <img loading="lazy" decoding="async" src="https://i1.wp.com/blog.ephorie.de/wp-content/ql-cache/quicklatex.com-c7d6931063ed333ca39b952ccfd482b8_l3.png?resize=21%2C15&#038;ssl=1" class="ql-img-inline-formula quicklatex-auto-format" alt="R^2" title="Rendered by QuickLaTeX.com" height="15" width="21" style="vertical-align: 0px;" data-recalc-dims="1"/> may therefore indicate not a particularly good model, but a model that has become too flexible.</p>
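<p>A small simulation can illustrate this. The sketch below assumes only eight simulated observations (not the post's data) and fits a polynomial of degree seven, so the model has as many coefficients as there are data points. It then scores the same fit on fresh data drawn from the same assumed process.</p>

```r
set.seed(2)
n <- 8
x <- sort(runif(n, min = 20, max = 65))   # distinct hypothetical "age" values
y <- 1000 + 40 * x + rnorm(n, sd = 300)   # hypothetical "income"

# Degree n - 1 polynomial: as many coefficients as observations
poly_fit <- lm(y ~ poly(x, degree = n - 1))
r2_train <- 1 - sum((y - fitted(poly_fit))^2) / sum((y - mean(y))^2)

# Fresh observations from the same process, inside the training range
x_new <- runif(200, min = min(x), max = max(x))
y_new <- 1000 + 40 * x_new + rnorm(200, sd = 300)
pred_new <- predict(poly_fit, newdata = data.frame(x = x_new))
r2_test <- 1 - sum((y_new - pred_new)^2) / sum((y_new - mean(y_new))^2)

c(train = r2_train, test = r2_test)
```

<p>The training value is 1 up to rounding, while the value on new data is lower and can even be negative when the curve swings wildly between the training points.</p>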
<hr />
<h2>The Linear Model</h2>
<p>The linear model from the previous post lies between these two extremes.</p>
<p>It is simple enough to avoid memorizing every random fluctuation, yet flexible enough to capture a meaningful trend in the data.</p>
<p>This balance between simplicity and flexibility is one of the central themes in statistical learning.</p>
<p>The idea was summarized in the previous post with the following plot:</p>
<p><img loading="lazy" decoding="async" src="https://i2.wp.com/blog.ephorie.de/wp-content/uploads/2019/02/mb4.png?w=450&#038;ssl=1" alt="" class="aligncenter size-full wp-image-554" srcset_temp="https://i2.wp.com/blog.ephorie.de/wp-content/uploads/2019/02/mb4.png?w=450&#038;ssl=1 534w, https://blog.ephorie.de/wp-content/uploads/2019/02/mb4-300x231.png 300w" sizes="auto, (max-width: 534px) 85vw, 534px" data-recalc-dims="1" /></p>
<p>and by the famous observation attributed to George Box:</p>
<blockquote><p>
“All models are wrong, but some are useful.”
</p></blockquote>
<p>The objective in modelling is therefore not to maximize complexity or maximize <img loading="lazy" decoding="async" src="https://i1.wp.com/blog.ephorie.de/wp-content/ql-cache/quicklatex.com-c7d6931063ed333ca39b952ccfd482b8_l3.png?resize=21%2C15&#038;ssl=1" class="ql-img-inline-formula quicklatex-auto-format" alt="R^2" title="Rendered by QuickLaTeX.com" height="15" width="21" style="vertical-align: 0px;" data-recalc-dims="1"/>, but to find a model that generalizes well beyond the observed sample.</p>
<hr />
<h2>Why <img loading="lazy" decoding="async" src="https://i1.wp.com/blog.ephorie.de/wp-content/ql-cache/quicklatex.com-c7d6931063ed333ca39b952ccfd482b8_l3.png?resize=21%2C15&#038;ssl=1" class="ql-img-inline-formula quicklatex-auto-format" alt="R^2" title="Rendered by QuickLaTeX.com" height="15" width="21" style="vertical-align: 0px;" data-recalc-dims="1"/> Alone Is Insufficient</h2>
<p>The key limitation of <img loading="lazy" decoding="async" src="https://i1.wp.com/blog.ephorie.de/wp-content/ql-cache/quicklatex.com-c7d6931063ed333ca39b952ccfd482b8_l3.png?resize=21%2C15&#038;ssl=1" class="ql-img-inline-formula quicklatex-auto-format" alt="R^2" title="Rendered by QuickLaTeX.com" height="15" width="21" style="vertical-align: 0px;" data-recalc-dims="1"/> is that it evaluates fit on the observed data only.</p>
<p>It does not directly measure:</p>
<ul>
<li>predictive performance on unseen data,</li>
<li>robustness,</li>
<li>causal validity, or</li>
<li>generalization ability.</li>
</ul>
<p>As model complexity increases, the <img loading="lazy" decoding="async" src="https://i1.wp.com/blog.ephorie.de/wp-content/ql-cache/quicklatex.com-c7d6931063ed333ca39b952ccfd482b8_l3.png?resize=21%2C15&#038;ssl=1" class="ql-img-inline-formula quicklatex-auto-format" alt="R^2" title="Rendered by QuickLaTeX.com" height="15" width="21" style="vertical-align: 0px;" data-recalc-dims="1"/> measured on the training data almost always increases as well; for nested least-squares models it can never decrease. A sufficiently flexible model can often achieve values very close to 1 even when its predictions on new data are poor.</p>
<p>For this reason, practical data science relies on additional evaluation methods and safeguards such as:</p>
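<p>One way to see this effect is to add predictors that are pure noise. The sketch below uses simulated data and ten made-up noise columns (all assumptions for illustration): the ordinary value cannot go down when columns are added to a least-squares fit, whereas the adjusted version reported by <code>summary()</code> penalizes the extra terms.</p>

```r
set.seed(4)
n <- 40
# Hypothetical age/income data, simulated for illustration only
age <- runif(n, min = 20, max = 65)
income <- 1000 + 40 * age + rnorm(n, sd = 300)

# Ten predictors of pure noise, unrelated to income by construction
noise <- matrix(rnorm(n * 10), nrow = n,
                dimnames = list(NULL, paste0("noise", 1:10)))
dat <- data.frame(income, age, noise)

base_fit  <- lm(income ~ age, data = dat)
noisy_fit <- lm(income ~ ., data = dat)   # age plus all ten noise columns

rbind(
  base  = c(r2 = summary(base_fit)$r.squared,
            adj_r2 = summary(base_fit)$adj.r.squared),
  noisy = c(r2 = summary(noisy_fit)$r.squared,
            adj_r2 = summary(noisy_fit)$adj.r.squared)
)
```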
<ul>
<li>train-test splits,</li>
<li>cross-validation,</li>
<li>regularization,</li>
<li>adjusted <img loading="lazy" decoding="async" src="https://i1.wp.com/blog.ephorie.de/wp-content/ql-cache/quicklatex.com-c7d6931063ed333ca39b952ccfd482b8_l3.png?resize=21%2C15&#038;ssl=1" class="ql-img-inline-formula quicklatex-auto-format" alt="R^2" title="Rendered by QuickLaTeX.com" height="15" width="21" style="vertical-align: 0px;" data-recalc-dims="1"/>, and</li>
<li>out-of-sample testing.</li>
</ul>
<p>The goal is not to reproduce historical observations perfectly, but to construct models that remain useful when confronted with new data.</p>
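<p>A simple train-test split, the first item in the list above, might look like the following sketch. The data are simulated, and the 70/30 split, the range of polynomial degrees and the helper <code>r2()</code> are arbitrary choices made for illustration.</p>

```r
set.seed(3)
n <- 200
# Hypothetical age/income data, simulated for illustration only
age <- runif(n, min = 20, max = 65)
income <- 1000 + 40 * age + rnorm(n, sd = 300)
dat <- data.frame(age, income)

# Train-test split: 70% of rows for fitting, 30% held out
train_idx <- sample(n, size = round(0.7 * n))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Illustrative helper: R^2 of predictions against observations
r2 <- function(obs, pred) 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)

# Fit polynomials of increasing degree, score each on both sets
results <- do.call(rbind, lapply(1:6, function(d) {
  fit <- lm(income ~ poly(age, d), data = train)
  data.frame(degree   = d,
             r2_train = r2(train$income, fitted(fit)),
             r2_test  = r2(test$income, predict(fit, newdata = test)))
}))
results
```

<p>The training column can only go up as the degree grows, so it is the held-out column that indicates whether the extra flexibility is actually helping.</p>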
<p>A high <img loading="lazy" decoding="async" src="https://i1.wp.com/blog.ephorie.de/wp-content/ql-cache/quicklatex.com-c7d6931063ed333ca39b952ccfd482b8_l3.png?resize=21%2C15&#038;ssl=1" class="ql-img-inline-formula quicklatex-auto-format" alt="R^2" title="Rendered by QuickLaTeX.com" height="15" width="21" style="vertical-align: 0px;" data-recalc-dims="1"/> can therefore mean two very different things:</p>
<ul>
<li>the model has identified a genuine structure,</li>
<li>or the model has merely adapted itself too closely to the training data.</li>
</ul>
<p>Distinguishing between these possibilities is one of the central challenges of machine learning and statistical modelling.</p>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://blog.ephorie.de/learning-data-science-why-a-high-r2-can-be-misleading?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=learning-data-science-why-a-high-r2-can-be-misleading"> R-Bloggers – Learning Machines</a></strong>.</div>
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/learning-data-science-why-a-high-r2-can-be-misleading/">Learning Data Science: Why a High R^2 Can Be Misleading</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">401170</post-id>	</item>
		<item>
		<title>How to Build an Expected Goals (xG) Model in R with worldfootballR</title>
		<link>https://www.r-bloggers.com/2026/05/how-to-build-an-expected-goals-xg-model-in-r-with-worldfootballr/</link>
		
		<dc:creator><![CDATA[rprogrammingbooks]]></dc:creator>
		<pubDate>Sat, 09 May 2026 23:13:30 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://rprogrammingbooks.com/?p=2554</guid>

					<description><![CDATA[<p>Expected goals has become one of the most important concepts in modern football analytics. Instead of judging a team only by goals scored, xG helps us estimate the quality of the chances created. In this tutorial, we will build a practical expected goals model in R using football data, feature ...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/how-to-build-an-expected-goals-xg-model-in-r-with-worldfootballr/">How to Build an Expected Goals (xG) Model in R with worldfootballR</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://rprogrammingbooks.com/expected-goals-model-r-worldfootballr/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=expected-goals-model-r-worldfootballr"> Blog - R Programming Books</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report an issue with the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>

<p><strong>Expected goals</strong> has become one of the most important concepts in modern football analytics. Instead of judging a team only by goals scored, xG helps us estimate the quality of the chances created. In this tutorial, we will build a practical expected goals model in R using football data, feature engineering, logistic regression, model evaluation, and visualization.</p>

<p>This is a hands-on guide for analysts who want to move beyond simple football statistics and start building reproducible soccer analytics workflows in R.</p>

<h2>What Is Expected Goals?</h2>

<p>Expected goals, usually written as xG, measures the probability that a shot becomes a goal. A shot from two meters in front of goal will usually have a high xG value, while a long-range shot from outside the box will usually have a low xG value.</p>

<p>An xG model can use variables such as:</p>

<ul>
  <li>Shot distance</li>
  <li>Shot angle</li>
  <li>Body part used</li>
  <li>Game state</li>
  <li>Minute of the match</li>
  <li>Shot type</li>
  <li>Set-piece situation</li>
  <li>Home or away context</li>
</ul>

<p>In this post, we will build a clean starter model using R. You can later extend it with richer event data, tracking data, or more advanced machine learning models.</p>

<h2>Install and Load R Packages</h2>

<pre># Core data science packages
install.packages(c(
  &quot;tidyverse&quot;,
  &quot;ggplot2&quot;,
  &quot;dplyr&quot;,
  &quot;readr&quot;,
  &quot;janitor&quot;,
  &quot;broom&quot;,
  &quot;yardstick&quot;,
  &quot;rsample&quot;,
  &quot;pROC&quot;,
  &quot;patchwork&quot;
))

# Football data package
install.packages(&quot;worldfootballR&quot;)

library(tidyverse)
library(ggplot2)
library(dplyr)
library(readr)
library(janitor)
library(broom)
library(yardstick)
library(rsample)
library(pROC)
library(patchwork)
library(worldfootballR)</pre>

<h2>Create a Simple Shot Dataset</h2>

<p>Different public football data sources structure shot data differently. To make this tutorial reproducible, we will first create a synthetic shot dataset that behaves like real football event data. Later, you can replace this with your own data from FBref, StatsBomb open data, Wyscout-style exports, or custom event feeds.</p>

<pre>set.seed(123)

n_shots &lt;- 5000

shots &lt;- tibble(
  shot_id = 1:n_shots,
  player = sample(
    c(&quot;Player A&quot;, &quot;Player B&quot;, &quot;Player C&quot;, &quot;Player D&quot;, &quot;Player E&quot;),
    n_shots,
    replace = TRUE
  ),
  team = sample(
    c(&quot;Team Red&quot;, &quot;Team Blue&quot;, &quot;Team Green&quot;, &quot;Team White&quot;),
    n_shots,
    replace = TRUE
  ),
  minute = sample(1:95, n_shots, replace = TRUE),
  x_location = runif(n_shots, min = 70, max = 120),
  y_location = runif(n_shots, min = 0, max = 80),
  body_part = sample(
    c(&quot;Right Foot&quot;, &quot;Left Foot&quot;, &quot;Header&quot;, &quot;Other&quot;),
    n_shots,
    replace = TRUE,
    prob = c(0.43, 0.32, 0.20, 0.05)
  ),
  situation = sample(
    c(&quot;Open Play&quot;, &quot;Corner&quot;, &quot;Free Kick&quot;, &quot;Penalty&quot;, &quot;Counter Attack&quot;),
    n_shots,
    replace = TRUE,
    prob = c(0.68, 0.12, 0.08, 0.03, 0.09)
  ),
  home_away = sample(c(&quot;Home&quot;, &quot;Away&quot;), n_shots, replace = TRUE)
)

glimpse(shots)</pre>

<h2>Engineer Shot Distance and Angle</h2>

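<p>Distance and angle are two of the most important features in a basic xG model. We will assume the goal is centered at x = 120 and y = 40. The angle computed here is the shot's deviation from the central axis through the goal: 0 degrees is straight in front, and values approach 90 degrees near the goal line. It is a simple positional proxy, not the goal-mouth opening angle that many xG models use.</p>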

<pre>goal_x &lt;- 120
goal_y &lt;- 40

shots &lt;- shots %&gt;%
  mutate(
    distance_to_goal = sqrt(
      (goal_x - x_location)^2 + (goal_y - y_location)^2
    ),
    angle_to_goal = atan2(
      abs(goal_y - y_location),
      goal_x - x_location
    ),
    angle_degrees = angle_to_goal * 180 / pi
  )

shots %&gt;%
  select(shot_id, x_location, y_location, distance_to_goal, angle_degrees) %&gt;%
  head()</pre>
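
<p>The angle above measures how far off-center a shot is. Many xG models instead use the angle subtended by the goal mouth. Here is a minimal sketch of that alternative, assuming the goal is 8 units wide on this 120 x 80 pitch (posts at y = 36 and y = 44). The helper <code>shot_opening_angle()</code> and its <code>goal_width</code> argument are illustrative names, not part of the original tutorial.</p>

<pre># Hypothetical helper: angle in degrees between the lines from the shot
# location to the two goalposts. goal_width = 8 is an assumption.
shot_opening_angle &lt;- function(x, y, goal_x = 120, goal_y = 40, goal_width = 8) {
  dx &lt;- goal_x - x
  dy &lt;- y - goal_y
  # atan2(cross product, dot product) of the two shot-to-post vectors
  angle &lt;- atan2(goal_width * dx, dx^2 + dy^2 - (goal_width / 2)^2)
  angle * 180 / pi
}

central_close &lt;- shot_opening_angle(x = 116, y = 40)  # 4 units out, central
wide_far      &lt;- shot_opening_angle(x = 90,  y = 10)  # far out and wide

central_close
wide_far</pre>

<p>With this definition a larger angle means more of the goal is visible, so a positive model coefficient has a direct football interpretation.</p>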

<h2>Create a Goal Outcome</h2>

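<p>For demonstration, we will simulate goals with a simple hand-picked logistic formula. Shots closer to goal should be more likely to become goals. Penalties should have a higher probability. Headers and long-range attempts should usually be harder. The coefficients below, including the positive weight on <code>angle_degrees</code>, are arbitrary choices for the synthetic data rather than estimates from real matches.</p>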

<pre>shots &lt;- shots %&gt;%
  mutate(
    linear_probability =
      -2.8 -
      0.08 * distance_to_goal +
      0.025 * angle_degrees +
      if_else(body_part == &quot;Header&quot;, -0.35, 0) +
      if_else(body_part == &quot;Other&quot;, -0.60, 0) +
      if_else(situation == &quot;Penalty&quot;, 3.00, 0) +
      if_else(situation == &quot;Counter Attack&quot;, 0.35, 0) +
      if_else(situation == &quot;Free Kick&quot;, -0.45, 0),
    
    goal_probability = plogis(linear_probability),
    goal = rbinom(n(), size = 1, prob = goal_probability)
  )

shots %&gt;%
  summarise(
    total_shots = n(),
    total_goals = sum(goal),
    conversion_rate = mean(goal)
  )</pre>
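
<p>To see how the simulated coefficients turn into a probability, the linear predictor for a single shot can be worked through by hand. The sketch below reuses the intercept, distance, and angle coefficients from the simulation and applies them to a hypothetical right-footed open-play shot from 12 units out at 28 degrees (no body-part or situation adjustment applies to that shot).</p>

<pre># Linear predictor on the log-odds scale
log_odds &lt;- -2.8 - 0.08 * 12 + 0.025 * 28

# plogis() is the inverse logit: 1 / (1 + exp(-log_odds))
shot_probability &lt;- plogis(log_odds)

log_odds          # -3.06
shot_probability  # roughly 0.045</pre>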

<h2>Explore the Shot Data</h2>

<pre># Goal rate by body part
shots %&gt;%
  count(body_part, goal) %&gt;%
  group_by(body_part) %&gt;%
  mutate(rate = n / sum(n))

# Conversion rate and average distance by situation
shots %&gt;%
  group_by(situation) %&gt;%
  summarise(
    shots = n(),
    goals = sum(goal),
    conversion_rate = mean(goal),
    avg_distance = mean(distance_to_goal),
    .groups = &quot;drop&quot;
  ) %&gt;%
  arrange(desc(conversion_rate))</pre>

<h2>Visualize Shot Locations</h2>

<pre>ggplot(shots, aes(x = x_location, y = y_location, color = factor(goal))) +
  geom_point(alpha = 0.35) +
  coord_fixed() +
  labs(
    title = &quot;Shot Map&quot;,
    x = &quot;Pitch Length&quot;,
    y = &quot;Pitch Width&quot;,
    color = &quot;Goal&quot;
  ) +
  theme_minimal()</pre>

<h2>Split Data into Training and Testing Sets</h2>

<pre>set.seed(123)

shot_split &lt;- initial_split(shots, prop = 0.80, strata = goal)

train_data &lt;- training(shot_split)
test_data  &lt;- testing(shot_split)

nrow(train_data)
nrow(test_data)</pre>

<h2>Build a Logistic Regression xG Model</h2>

<p>Expected goals is naturally suited to logistic regression because the outcome is binary: goal or no goal.</p>

<pre>xg_model &lt;- glm(
  goal ~ distance_to_goal +
    angle_degrees +
    body_part +
    situation +
    home_away +
    minute,
  data = train_data,
  family = binomial()
)

summary(xg_model)</pre>

<h2>Convert Model Output into xG Values</h2>

<pre>test_predictions &lt;- test_data %&gt;%
  mutate(
    xg = predict(xg_model, newdata = test_data, type = &quot;response&quot;)
  )

test_predictions %&gt;%
  select(player, team, goal, xg, distance_to_goal, angle_degrees) %&gt;%
  head(10)</pre>

<h2>Evaluate the xG Model</h2>

<p>A good xG model should not only predict goals, but also produce well-calibrated probabilities. If 100 shots each have an xG of 0.10, we would expect roughly 10 goals over a large enough sample.</p>

<pre>test_predictions %&gt;%
  summarise(
    actual_goals = sum(goal),
    expected_goals = sum(xg),
    avg_xg = mean(xg),
    actual_conversion = mean(goal)
  )</pre>
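
<p>The intuition about 100 shots at an xG of 0.10 can be checked numerically. Assuming the shots are independent, the goal count follows a binomial distribution, so even a perfectly calibrated model will often miss the expected total by a few goals. The variable names below are illustrative.</p>

<pre>n_shots_example     &lt;- 100
xg_per_shot_example &lt;- 0.10

# Mean and standard deviation of the goal count
expected_goals_example &lt;- n_shots_example * xg_per_shot_example
goals_sd_example &lt;- sqrt(
  n_shots_example * xg_per_shot_example * (1 - xg_per_shot_example)
)

# Probability that the goal count lands between 7 and 13 inclusive,
# i.e. within one standard deviation of the expected total
prob_within_one_sd &lt;- pbinom(13, n_shots_example, xg_per_shot_example) -
  pbinom(6, n_shots_example, xg_per_shot_example)

expected_goals_example  # 10
goals_sd_example        # 3
prob_within_one_sd</pre>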

<h3>ROC AUC</h3>

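<pre># levels and direction are set explicitly so pROC does not have to guess
# which class is the control (0) and which is the case (1)
roc_obj &lt;- roc(
  response = test_predictions$goal,
  predictor = test_predictions$xg,
  levels = c(0, 1),
  direction = &quot;&lt;&quot;
)

auc(roc_obj)

plot(
  roc_obj,
  main = &quot;ROC Curve for xG Model&quot;
)</pre>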

<h3>Brier Score</h3>

<pre>brier_score &lt;- mean((test_predictions$xg - test_predictions$goal)^2)

brier_score</pre>
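
<p>A Brier score is easier to read next to a baseline. One common reference is a model that predicts the overall conversion rate for every shot. Here is a minimal sketch with made-up numbers; the vectors and the <code>brier()</code> helper are illustrative and not taken from the model above.</p>

<pre>brier &lt;- function(predicted, observed) mean((predicted - observed)^2)

# Made-up outcomes and predictions for ten shots
observed_goals &lt;- c(0, 0, 1, 0, 0, 0, 1, 0, 0, 0)
model_xg &lt;- c(0.05, 0.10, 0.60, 0.02, 0.08, 0.15, 0.40, 0.03, 0.07, 0.10)

# Baseline: every shot gets the overall conversion rate
baseline_xg &lt;- rep(mean(observed_goals), length(observed_goals))

model_brier    &lt;- brier(model_xg, observed_goals)
baseline_brier &lt;- brier(baseline_xg, observed_goals)

# Brier skill score: above 0 means the model beats the constant baseline
brier_skill &lt;- 1 - model_brier / baseline_brier

model_brier
baseline_brier
brier_skill</pre>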

<h2>Create xG Buckets for Calibration</h2>

<pre>calibration_table &lt;- test_predictions %&gt;%
  mutate(
    xg_bucket = cut(
      xg,
      breaks = seq(0, 1, by = 0.05),
      include.lowest = TRUE
    )
  ) %&gt;%
  group_by(xg_bucket) %&gt;%
  summarise(
    shots = n(),
    avg_xg = mean(xg),
    actual_goal_rate = mean(goal),
    goals = sum(goal),
    .groups = &quot;drop&quot;
  ) %&gt;%
  filter(shots &gt;= 10)

calibration_table

ggplot(calibration_table, aes(x = avg_xg, y = actual_goal_rate)) +
  geom_point(size = 3) +
  geom_abline(intercept = 0, slope = 1, linetype = &quot;dashed&quot;) +
  labs(
    title = &quot;xG Model Calibration&quot;,
    x = &quot;Average Predicted xG&quot;,
    y = &quot;Actual Goal Rate&quot;
  ) +
  theme_minimal()</pre>

<h2>Player-Level xG Analysis</h2>

<p>Once every shot has an xG value, we can aggregate by player. This allows us to compare goals, expected goals, overperformance, and shot volume.</p>

<pre>player_xg &lt;- test_predictions %&gt;%
  group_by(player) %&gt;%
  summarise(
    shots = n(),
    goals = sum(goal),
    # Compute the per-shot mean before xg is overwritten with the group
    # total; summarise() evaluates its arguments in order
    xg_per_shot = mean(xg),
    xg = sum(xg),
    goals_minus_xg = goals - xg,
    conversion_rate = mean(goal),
    .groups = &quot;drop&quot;
  ) %&gt;%
  arrange(desc(xg))

player_xg

ggplot(player_xg, aes(x = reorder(player, xg), y = xg)) +
  geom_col() +
  coord_flip() +
  labs(
    title = &quot;Expected Goals by Player&quot;,
    x = &quot;Player&quot;,
    y = &quot;Total xG&quot;
  ) +
  theme_minimal()</pre>

<h2>Team-Level xG Analysis</h2>

<pre>team_xg &lt;- test_predictions %&gt;%
  group_by(team) %&gt;%
  summarise(
    shots = n(),
    goals = sum(goal),
    # Compute the per-shot mean before xg is overwritten with the group total
    avg_xg_per_shot = mean(xg),
    xg = sum(xg),
    goals_minus_xg = goals - xg,
    .groups = &quot;drop&quot;
  ) %&gt;%
  arrange(desc(xg))

team_xg

ggplot(team_xg, aes(x = reorder(team, goals_minus_xg), y = goals_minus_xg)) +
  geom_col() +
  coord_flip() +
  labs(
    title = &quot;Goals Minus xG by Team&quot;,
    x = &quot;Team&quot;,
    y = &quot;Goals - Expected Goals&quot;
  ) +
  theme_minimal()</pre>

<h2>Shot Quality Distribution</h2>

<pre>ggplot(test_predictions, aes(x = xg)) +
  geom_histogram(bins = 40) +
  labs(
    title = &quot;Distribution of Shot Quality&quot;,
    x = &quot;Expected Goals&quot;,
    y = &quot;Number of Shots&quot;
  ) +
  theme_minimal()</pre>

<h2>Compare Goals and xG by Situation</h2>

<pre>situation_xg &lt;- test_predictions %&gt;%
  group_by(situation) %&gt;%
  summarise(
    shots = n(),
    goals = sum(goal),
    # avg_xg must be computed before xg is overwritten with the group total
    avg_xg = mean(xg),
    xg = sum(xg),
    .groups = &quot;drop&quot;
  ) %&gt;%
  arrange(desc(avg_xg))

situation_xg

situation_long &lt;- situation_xg %&gt;%
  select(situation, goals, xg) %&gt;%
  pivot_longer(
    cols = c(goals, xg),
    names_to = &quot;metric&quot;,
    values_to = &quot;value&quot;
  )

ggplot(situation_long, aes(x = reorder(situation, value), y = value, fill = metric)) +
  geom_col(position = &quot;dodge&quot;) +
  coord_flip() +
  labs(
    title = &quot;Goals vs Expected Goals by Situation&quot;,
    x = &quot;Situation&quot;,
    y = &quot;Value&quot;,
    fill = &quot;Metric&quot;
  ) +
  theme_minimal()</pre>

<h2>Build a More Advanced xG Model with Interactions</h2>

<p>A simple model is useful, but football is full of interactions. For example, distance may affect headers differently than footed shots. We can include interaction terms in the model.</p>

<pre>xg_model_interaction &lt;- glm(
  goal ~ distance_to_goal * body_part +
    angle_degrees +
    situation +
    home_away +
    minute,
  data = train_data,
  family = binomial()
)

summary(xg_model_interaction)

# Score the interaction model on the held-out test set
test_predictions_interaction &lt;- test_data %&gt;%
  mutate(
    xg_interaction = predict(
      xg_model_interaction,
      newdata = test_data,
      type = &quot;response&quot;
    )
  )

mean((test_predictions_interaction$xg_interaction - test_predictions_interaction$goal)^2)</pre>
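
<p>Because the basic and interaction models are nested, they can also be compared with a likelihood-ratio test and AIC, in addition to the test-set Brier score. Below is a self-contained sketch on a small simulated dataset; the <code>toy</code> data frame and its coefficients are invented for illustration and are unrelated to the shot data above.</p>

<pre>set.seed(1)
n_toy &lt;- 2000

# Invented data: goal probability falls with distance and for headers
toy &lt;- data.frame(
  distance = runif(n_toy, 2, 35),
  header   = rbinom(n_toy, 1, 0.2)
)
toy$goal &lt;- rbinom(
  n_toy, 1,
  plogis(-0.5 - 0.10 * toy$distance - 0.5 * toy$header)
)

basic_fit       &lt;- glm(goal ~ distance + header, data = toy, family = binomial())
interaction_fit &lt;- glm(goal ~ distance * header, data = toy, family = binomial())

# Likelihood-ratio test for the extra interaction term
lr_test &lt;- anova(basic_fit, interaction_fit, test = &quot;Chisq&quot;)

# Lower AIC indicates a better trade-off between fit and complexity
aic_table &lt;- AIC(basic_fit, interaction_fit)

lr_test
aic_table</pre>

<p>The same two calls should work on <code>xg_model</code> and <code>xg_model_interaction</code>, since both were fitted to the same training data.</p>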

<h2>Compare Two xG Models</h2>

<pre>model_comparison &lt;- tibble(
  model = c(&quot;Basic Logistic Regression&quot;, &quot;Interaction Logistic Regression&quot;),
  brier_score = c(
    mean((test_predictions$xg - test_predictions$goal)^2),
    mean((test_predictions_interaction$xg_interaction - test_predictions_interaction$goal)^2)
  ),
  total_predicted_goals = c(
    sum(test_predictions$xg),
    sum(test_predictions_interaction$xg_interaction)
  ),
  actual_goals = c(
    sum(test_predictions$goal),
    sum(test_predictions_interaction$goal)
  )
)

model_comparison</pre>

<h2>Create a Reusable xG Prediction Function</h2>

<pre>predict_xg &lt;- function(model, new_shots) {
  new_shots %&gt;%
    mutate(
      predicted_xg = predict(
        model,
        newdata = new_shots,
        type = &quot;response&quot;
      )
    )
}

new_predictions &lt;- predict_xg(xg_model, test_data)

head(new_predictions)</pre>
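
<p>When a prediction function is reused on new data, a missing predictor column only surfaces as an error inside <code>predict()</code>. Here is a sketch of an earlier, clearer check. <code>check_shot_columns()</code> is a hypothetical helper, and the toy model on <code>mtcars</code> is there only so the example runs on its own.</p>

<pre># Hypothetical helper: stop early if new data lacks a model predictor
check_shot_columns &lt;- function(model, new_shots) {
  # Predictor names used by the fitted model formula
  required &lt;- all.vars(delete.response(terms(model)))
  missing_cols &lt;- setdiff(required, names(new_shots))
  if (length(missing_cols) &gt; 0) {
    stop(&quot;Missing columns: &quot;, paste(missing_cols, collapse = &quot;, &quot;))
  }
  invisible(TRUE)
}

# Stand-in model so the example is self-contained
toy_model &lt;- glm(am ~ wt, data = mtcars, family = binomial())

ok_result &lt;- check_shot_columns(toy_model, data.frame(wt = 2.5))

bad_result &lt;- tryCatch(
  check_shot_columns(toy_model, data.frame(hp = 110)),
  error = function(e) conditionMessage(e)
)

ok_result
bad_result</pre>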

<h2>Create a Custom Shot Example</h2>

<pre>custom_shot &lt;- tibble(
  distance_to_goal = 12,
  angle_degrees = 28,
  body_part = &quot;Right Foot&quot;,
  situation = &quot;Open Play&quot;,
  home_away = &quot;Home&quot;,
  minute = 62
)

predict(
  xg_model,
  newdata = custom_shot,
  type = &quot;response&quot;
)</pre>

<h2>Use worldfootballR for Real Football Workflows</h2>

<p>For real projects, you can use packages such as <code>worldfootballR</code> to collect football data from public sources and build reproducible analysis pipelines. The exact available columns depend on the source and endpoint, so always inspect your data before modeling.</p>

<pre>library(worldfootballR)
library(tidyverse)

# Example: get FBref match results
# Adjust country, gender, season_end_year, and tier depending on your project

premier_league_results &lt;- fb_match_results(
  country = &quot;ENG&quot;,
  gender = &quot;M&quot;,
  season_end_year = 2025,
  tier = &quot;1st&quot;
)

glimpse(premier_league_results)
premier_league_results %&gt;%
  clean_names() %&gt;%
  head()</pre>

<p>If you are building a full football analytics pipeline with FBref, Transfermarkt, and Understat-style workflows, a more structured project template can save a lot of time. I cover that type of end-to-end workflow in <a href="https://rprogrammingbooks.com/product/mastering-football-data-worldfootballr/" rel="nofollow" target="_blank">Mastering Football Data with worldfootballR</a>, especially for readers who want reusable R scripts, clean folders, and practical football data examples.</p>

<h2>Example: Clean Match Results Data</h2>

<pre>clean_results &lt;- premier_league_results %&gt;%
  clean_names()

# The exact structure depends on the returned data,
# so always check the column names first
clean_results %&gt;%
  glimpse()

names(clean_results)</pre>

<h2>Build a Match-Level Team Summary</h2>

<pre># This is an example pattern.
# You may need to adjust column names depending on your data source.

team_summary_example &lt;- clean_results %&gt;%
  summarise(
    matches = n()
  )

team_summary_example</pre>

<h2>Save Your xG Model</h2>

<p>Once you have trained a model, save it so you can reuse it later in reports, dashboards, APIs, or automated pipelines.</p>

<pre>saveRDS(xg_model, &quot;xg_model_logistic_regression.rds&quot;)

loaded_xg_model &lt;- readRDS(&quot;xg_model_logistic_regression.rds&quot;)

predict(
  loaded_xg_model,
  newdata = custom_shot,
  type = &quot;response&quot;
)</pre>
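
<p>It is worth confirming that a reloaded model reproduces the original predictions before relying on it in a pipeline. Below is a minimal round-trip sketch that writes to a temporary file and uses a stand-in model on <code>mtcars</code> so that it runs on its own; the object names are illustrative.</p>

<pre># Stand-in model so the example is self-contained
toy_model &lt;- glm(am ~ wt, data = mtcars, family = binomial())

# Save to a temporary path and read it back
model_path &lt;- tempfile(fileext = &quot;.rds&quot;)
saveRDS(toy_model, model_path)
reloaded_model &lt;- readRDS(model_path)

new_car &lt;- data.frame(wt = 2.8)
original_pred &lt;- predict(toy_model, newdata = new_car, type = &quot;response&quot;)
reloaded_pred &lt;- predict(reloaded_model, newdata = new_car, type = &quot;response&quot;)

# TRUE when the reloaded model gives the same prediction
predictions_match &lt;- isTRUE(all.equal(original_pred, reloaded_pred))
predictions_match</pre>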

<h2>Create an xG Report Table</h2>

<pre>xg_report &lt;- test_predictions %&gt;%
  group_by(team, player) %&gt;%
  summarise(
    shots = n(),
    goals = sum(goal),
    # Derive these from the shot-level xg before xg is overwritten
    # with the rounded group total below
    xg_per_shot = round(mean(xg), 3),
    goals_minus_xg = round(goals - sum(xg), 2),
    xg = round(sum(xg), 2),
    .groups = &quot;drop&quot;
  ) %&gt;%
  arrange(desc(xg))

xg_report

write_csv(xg_report, &quot;xg_player_report.csv&quot;)</pre>

<h2>Create an xG Shot Map</h2>

<pre>ggplot(test_predictions, aes(x = x_location, y = y_location)) +
  geom_point(aes(size = xg, alpha = xg)) +
  coord_fixed() +
  labs(
    title = &quot;xG Shot Map&quot;,
    x = &quot;Pitch Length&quot;,
    y = &quot;Pitch Width&quot;,
    size = &quot;xG&quot;,
    alpha = &quot;xG&quot;
  ) +
  theme_minimal()</pre>

<h2>Create a High-Value Chances Table</h2>

<pre>big_chances &lt;- test_predictions %&gt;%
  filter(xg &gt;= 0.30) %&gt;%
  arrange(desc(xg)) %&gt;%
  select(
    player,
    team,
    minute,
    body_part,
    situation,
    distance_to_goal,
    angle_degrees,
    xg,
    goal
  )

big_chances %&gt;%
  head(20)</pre>

<h2>Model Improvement Ideas</h2>

<p>This starter xG model can be improved in many ways. A professional football analytics workflow may include:</p>

<ul>
  <li>More accurate shot coordinates</li>
  <li>Goalkeeper position</li>
  <li>Defender pressure</li>
  <li>Pass type before the shot</li>
  <li>Through balls and cutbacks</li>
  <li>Shot speed</li>
  <li>First-time shots</li>
  <li>Game state</li>
  <li>Team strength</li>
  <li>Player finishing history</li>
</ul>

<h2>Train an XGBoost-Style Model Later</h2>

<p>Logistic regression is interpretable and a good starting point. For higher predictive performance, you can later compare it with random forests, gradient boosting, or Bayesian models.</p>

<pre># Example packages for future model upgrades
# install.packages(c(&quot;xgboost&quot;, &quot;ranger&quot;, &quot;tidymodels&quot;))

library(tidymodels)

# A future tidymodels workflow could look like this:

xg_recipe &lt;- recipe(
  goal ~ distance_to_goal + angle_degrees + body_part + situation + home_away + minute,
  data = train_data
) %&gt;%
  step_dummy(all_nominal_predictors()) %&gt;%
  step_normalize(all_numeric_predictors())

xg_recipe</pre>

<h2>Build a Tidymodels Logistic Regression Workflow</h2>

<pre>logistic_spec &lt;- logistic_reg() %&gt;%
  set_engine(&quot;glm&quot;) %&gt;%
  set_mode(&quot;classification&quot;)

xg_workflow &lt;- workflow() %&gt;%
  add_recipe(xg_recipe) %&gt;%
  add_model(logistic_spec)

xg_fit &lt;- fit(
  xg_workflow,
  data = train_data %&gt;%
    mutate(goal = factor(goal, levels = c(0, 1)))
)

xg_fit
tidy(xg_fit)</pre>

<h2>Predict Probabilities with Tidymodels</h2>

<pre>tidy_predictions &lt;- predict(
  xg_fit,
  new_data = test_data,
  type = &quot;prob&quot;
) %&gt;%
  bind_cols(test_data %&gt;% mutate(goal = factor(goal, levels = c(0, 1))))

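head(tidy_predictions)

# Level &quot;1&quot; (goal) is the second factor level, so tell yardstick to treat
# it as the event; the default uses the first level and would report 1 - AUC
tidy_predictions %&gt;%
  roc_auc(
    truth = goal,
    .pred_1,
    event_level = &quot;second&quot;
  )</pre>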

<h2>Turn xG into Match Insights</h2>

<p>The real value of expected goals is not just predicting whether one shot becomes a goal. The value comes from aggregation. Once every shot has a probability, you can create match-level and season-level insights.</p>

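<pre># Synthetic match IDs for illustration only; real event data
# carries its own match identifier
set.seed(123)

match_shots &lt;- test_predictions %&gt;%
  mutate(
    match_id = sample(1:100, n(), replace = TRUE)
  )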

match_xg &lt;- match_shots %&gt;%
  group_by(match_id, team) %&gt;%
  summarise(
    shots = n(),
    goals = sum(goal),
    xg = sum(xg),
    .groups = &quot;drop&quot;
  )

match_xg %&gt;%
  arrange(match_id, desc(xg)) %&gt;%
  head(20)</pre>
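
<p>Summing xG gives the expected number of goals, but the same shot probabilities also describe the chance of scoring at all. Here is a minimal sketch that assumes shots are independent events (a simplification) and uses made-up xG values for one team in one match.</p>

<pre># Made-up shot-level xG values for one team in one match
team_shot_xg &lt;- c(0.50, 0.50, 0.10)

# Expected goals for the match
total_xg &lt;- sum(team_shot_xg)

# P(at least one goal) = 1 - P(every shot misses)
prob_at_least_one_goal &lt;- 1 - prod(1 - team_shot_xg)

total_xg                # 1.1
prob_at_least_one_goal  # 0.775</pre>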

<h2>Find Teams Creating Better Chances</h2>

<pre>team_chance_quality &lt;- test_predictions %&gt;%
  group_by(team) %&gt;%
  summarise(
    shots = n(),
    total_xg = sum(xg),
    avg_xg_per_shot = mean(xg),
    big_chances = sum(xg &gt;= 0.30),
    low_quality_shots = sum(xg &lt;= 0.05),
    .groups = &quot;drop&quot;
  ) %&gt;%
  arrange(desc(avg_xg_per_shot))

team_chance_quality</pre>

<h2>Final Thoughts</h2>

<p>Building an expected goals model in R is one of the best ways to learn football analytics because it combines data cleaning, feature engineering, statistical modeling, visualization, and interpretation. A simple logistic regression model can already teach you a lot about shot quality, player performance, and team attacking style.</p>

<p>From here, the next steps are clear: use richer football data, improve your features, compare different models, evaluate calibration, and build repeatable workflows that can be updated every week during the season.</p>

<p>Expected goals is not the final answer to football analysis, but it is one of the best starting points for serious soccer data science in R.</p>
<p>The post <a href="https://rprogrammingbooks.com/expected-goals-model-r-worldfootballr/" rel="nofollow" target="_blank">How to Build an Expected Goals (xG) Model in R with worldfootballR</a> appeared first on <a href="https://rprogrammingbooks.com/" rel="nofollow" target="_blank">R Programming Books</a>.</p>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://rprogrammingbooks.com/expected-goals-model-r-worldfootballr/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=expected-goals-model-r-worldfootballr"> Blog - R Programming Books</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/how-to-build-an-expected-goals-xg-model-in-r-with-worldfootballr/">How to Build an Expected Goals (xG) Model in R with worldfootballR</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">401149</post-id>	</item>
		<item>
		<title>One interface, (Almost) Every Classifier (and Regressor): unifiedml v0.3.0</title>
		<link>https://www.r-bloggers.com/2026/05/one-interface-almost-every-classifier-and-regressor-unifiedml-v0-3-0/</link>
		
		<dc:creator><![CDATA[T. Moudiki]]></dc:creator>
		<pubDate>Sat, 09 May 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://thierrymoudiki.github.io//blog/2026/05/09/r/New-UnifiedML</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> News from R package unifiedml, that offers a unified interface to R machine learning models</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/one-interface-almost-every-classifier-and-regressor-unifiedml-v0-3-0/">One interface, (Almost) Every Classifier (and Regressor): unifiedml v0.3.0</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://thierrymoudiki.github.io//blog/2026/05/09/r/New-UnifiedML"> T. Moudiki's Webpage - R</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report an issue with the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<p>In the new version of <a href="https://cran.r-project.org/web/packages/unifiedml/index.html" rel="nofollow" target="_blank"><code>unifiedml</code></a> available on CRAN, you can benchmark different models using k-fold cross-validation (section 1 of this blog post), and there’s a unified interface for predicting model probabilities (section 2 of this blog post).</p>

<pre>install.packages(&quot;unifiedml&quot;)

install.packages(c(&quot;e1071&quot;, &quot;randomForest&quot;, &quot;caret&quot;))

install.packages(&quot;glmnet&quot;)

library(unifiedml)
</pre>

<h1 id="1---benchmarking-models">1 &#8211; Benchmarking models</h1>

<pre>set.seed(123)

X &lt;- iris[, 1:4]
y &lt;- iris$Species

models &lt;- list( # `Model` is exported from package 'unifiedml'
  glm  = Model$new(caret::train), # caret can be used (see https://topepo.github.io/caret/available-models.html)
  rf   = Model$new(randomForest::randomForest), # or a native pkg
  svm  = Model$new(e1071::svm) # or another pkg
)

params &lt;- list(
  glm = list(method = &quot;glmnet&quot;,
             tuneGrid = data.frame(alpha = 0, lambda = 0.01), # for a caret model, all hyperparameters must be provided
             trControl = caret::trainControl(method = &quot;none&quot;)), # namespaced because caret is not attached
  rf  = list(ntree = 150), # not every hyperparameter has to be specified
  svm = list(kernel = &quot;radial&quot;,
             cost = 1,
             gamma = 0.1)
)

results &lt;- unifiedml::benchmark(models, X, y, cv = 5, params = params)
</pre>

<pre>[1/3] Fitting model: glm
Mean CV score for glm: 0.9533

[2/3] Fitting model: rf
Mean CV score for rf: 0.9600

[3/3] Fitting model: svm
Mean CV score for svm: 0.9733
</pre>

<pre>print(results) # 5-fold cross-validation results
</pre>

<pre>$glm
$glm$avg_score
[1] 0.9533333

$glm$scores
    fold1     fold2     fold3     fold4     fold5
0.9333333 0.9666667 0.9333333 0.9333333 1.0000000


$rf
$rf$avg_score
[1] 0.96

$rf$scores
    fold1     fold2     fold3     fold4     fold5
0.9333333 1.0000000 0.9333333 0.9333333 1.0000000


$svm
$svm$avg_score
[1] 0.9733333

$svm$scores
    fold1     fold2     fold3     fold4     fold5
0.9666667 1.0000000 0.9666667 0.9333333 1.0000000
</pre>

<pre># initialize empty vectors
model_vec &lt;- c()
fold_vec  &lt;- c()
score_vec &lt;- c()

for (model in names(results)) {
  scores &lt;- results[[model]]$scores

  model_vec &lt;- c(model_vec, rep(model, length(scores)))
  fold_vec  &lt;- c(fold_vec, names(scores))
  score_vec &lt;- c(score_vec, as.numeric(scores))
}

df &lt;- data.frame(
  model = model_vec,
  fold  = fold_vec,
  score = score_vec
)

library(ggplot2)

ggplot(df, aes(x = model, y = score, fill = model)) +
  geom_violin(trim = FALSE, alpha = 0.6) +
  geom_jitter(width = 0.08, size = 2) +
  theme_minimal() +
  labs(
    title = &quot;Cross-validation score distribution&quot;,
    x = &quot;Model&quot;,
    y = &quot;Score&quot;
  ) +
  theme(legend.position = &quot;none&quot;)
</pre>
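
<p>The loop above can also be written without growing vectors. Here is a sketch in base R, using a hand-built list with the same shape as the printed <code>results</code>. Only the glm and svm scores are copied in to keep it short, and <code>results_example</code> is an illustrative name.</p>

<pre># Same structure as the benchmark output: one avg_score and one
# named vector of fold scores per model
results_example &lt;- list(
  glm = list(
    avg_score = 0.9533333,
    scores = c(fold1 = 0.9333333, fold2 = 0.9666667, fold3 = 0.9333333,
               fold4 = 0.9333333, fold5 = 1.0000000)
  ),
  svm = list(
    avg_score = 0.9733333,
    scores = c(fold1 = 0.9666667, fold2 = 1.0000000, fold3 = 0.9666667,
               fold4 = 0.9333333, fold5 = 1.0000000)
  )
)

# One small data frame per model, then stack them
score_rows &lt;- lapply(names(results_example), function(model_name) {
  scores &lt;- results_example[[model_name]]$scores
  data.frame(
    model = model_name,
    fold  = names(scores),
    score = as.numeric(scores)
  )
})

scores_df &lt;- do.call(rbind, score_rows)
scores_df</pre>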

<p><img src="https://i0.wp.com/thierrymoudiki.github.io/images/2026-05-09/2026-05-09-New-UnifiedML_9_0.png?w=578&#038;ssl=1" alt="Violin plot of cross-validation scores for the glm, rf, and svm models" class="img-responsive" data-recalc-dims="1" /></p>

<h1 id="2---unified-interface-for-predicting-probabilities">2 &#8211; Unified interface for predicting probabilities</h1>

<pre># Load required packages
library(unifiedml)
library(randomForest)
library(nnet)
library(e1071)

# Load iris dataset
data(iris)

# Setup reproducible data
set.seed(42)

# Create feature matrix (all 4 numeric features)
X &lt;- as.matrix(iris[, 1:4])
colnames(X) &lt;- c(&quot;Sepal.Length&quot;, &quot;Sepal.Width&quot;, &quot;Petal.Length&quot;, &quot;Petal.Width&quot;)

# Target: Species (multi-class with 3 levels)
y_multiclass &lt;- iris$Species

# Create binary classification target (Versicolor vs others)
y_binary &lt;- factor(
  ifelse(iris$Species == &quot;versicolor&quot;, &quot;versicolor&quot;, &quot;other&quot;),
  levels = c(&quot;other&quot;, &quot;versicolor&quot;)
)

# Split into train/test (75% train, 25% test)
set.seed(42)
train_idx &lt;- sample(1:nrow(X), size = floor(0.75 * nrow(X)), replace = FALSE)
test_idx &lt;- setdiff(1:nrow(X), train_idx)

X_train &lt;- X[train_idx, ]
X_test &lt;- X[test_idx, ]
y_train_multiclass &lt;- y_multiclass[train_idx]
y_test_multiclass &lt;- y_multiclass[test_idx]
y_train_binary &lt;- y_binary[train_idx]
y_test_binary &lt;- y_binary[test_idx]

cat(&quot;\n&quot;)
cat(&quot;============================================================================\n&quot;)
cat(&quot;IRIS DATASET - Summary\n&quot;)
cat(&quot;============================================================================\n&quot;)
cat(sprintf(&quot;Training samples: %d\n&quot;, nrow(X_train)))
cat(sprintf(&quot;Test samples: %d\n&quot;, nrow(X_test)))
cat(sprintf(&quot;Features: %d\n&quot;, ncol(X_train)))
cat(sprintf(&quot;Classes: %s\n&quot;, paste(levels(y_multiclass), collapse = &quot;, &quot;)))

# ============================================================================
# EXAMPLE 1: randomForest - Multi-class Classification on IRIS
# ============================================================================

cat(&quot;\n&quot;)
cat(&quot;============================================================================\n&quot;)
cat(&quot;EXAMPLE 1: randomForest - Multi-class Classification\n&quot;)
cat(&quot;============================================================================\n&quot;)

mod_rf &lt;- Model$new(randomForest::randomForest)
mod_rf$fit(X_train, y_train_multiclass, ntree = 100)

cat(&quot;\nPredicting probabilities for first 5 test samples:\n&quot;)
probs_rf &lt;- mod_rf$predict_proba(X_test[1:5, ])

cat(&quot;\nProbability matrix:\n&quot;)
print(round(probs_rf, 3))

cat(&quot;\nInterpretation:\n&quot;)
for(i in 1:5) {
  cat(sprintf(&quot;\nSample %d (Actual: %s):\n&quot;, i, as.character(y_test_multiclass[i])))
  cat(sprintf(&quot;  setosa:     %.1f%%\n&quot;, probs_rf[i, &quot;setosa&quot;] * 100))
  cat(sprintf(&quot;  versicolor: %.1f%%\n&quot;, probs_rf[i, &quot;versicolor&quot;] * 100))
  cat(sprintf(&quot;  virginica:  %.1f%%\n&quot;, probs_rf[i, &quot;virginica&quot;] * 100))
  cat(sprintf(&quot;  Predicted:  %s\n&quot;, colnames(probs_rf)[which.max(probs_rf[i, ])]))
}

# Get class predictions
pred_classes_rf &lt;- mod_rf$predict(X_test[1:5, ], type = &quot;class&quot;)
cat(&quot;\nPredicted classes (first 5):&quot;, as.character(pred_classes_rf), &quot;\n&quot;)
cat(&quot;Actual classes (first 5):   &quot;, as.character(y_test_multiclass[1:5]), &quot;\n&quot;)

# Calculate accuracy on full test set
probs_all_rf &lt;- mod_rf$predict_proba(X_test)
pred_all_rf &lt;- colnames(probs_all_rf)[apply(probs_all_rf, 1, which.max)]
accuracy_rf &lt;- mean(pred_all_rf == as.character(y_test_multiclass))
cat(sprintf(&quot;\nTest set accuracy: %.1f%%\n&quot;, accuracy_rf * 100))

# ============================================================================
# EXAMPLE 2: nnet - Multi-class Classification on IRIS
# ============================================================================

cat(&quot;\n&quot;)
cat(&quot;============================================================================\n&quot;)
cat(&quot;EXAMPLE 2: nnet - Multi-class Classification\n&quot;)
cat(&quot;============================================================================\n&quot;)

mod_nnet &lt;- Model$new(nnet::nnet)
mod_nnet$fit(X_train, y_train_multiclass, size = 10, maxit = 200, trace = FALSE)

cat(&quot;\nPredicting probabilities for first 5 test samples:\n&quot;)
probs_nnet &lt;- mod_nnet$predict_proba(X_test[1:5, ])

cat(&quot;\nProbability matrix (all 3 classes):\n&quot;)
print(round(probs_nnet, 3))

cat(&quot;\nDetailed predictions:\n&quot;)
for(i in 1:5) {
  cat(sprintf(&quot;\nSample %d (Actual: %s):\n&quot;, i, as.character(y_test_multiclass[i])))
  cat(sprintf(&quot;  setosa:     %.1f%%\n&quot;, probs_nnet[i, &quot;setosa&quot;] * 100))
  cat(sprintf(&quot;  versicolor: %.1f%%\n&quot;, probs_nnet[i, &quot;versicolor&quot;] * 100))
  cat(sprintf(&quot;  virginica:  %.1f%%\n&quot;, probs_nnet[i, &quot;virginica&quot;] * 100))
  cat(sprintf(&quot;  Predicted:  %s\n&quot;, colnames(probs_nnet)[which.max(probs_nnet[i, ])]))
}

# Get class predictions
pred_classes_nnet &lt;- mod_nnet$predict(X_test[1:5, ], type = &quot;class&quot;)
cat(&quot;\nPredicted classes (first 5):&quot;, as.character(pred_classes_nnet), &quot;\n&quot;)
cat(&quot;Actual classes (first 5):   &quot;, as.character(y_test_multiclass[1:5]), &quot;\n&quot;)

# Calculate accuracy
probs_all_nnet &lt;- mod_nnet$predict_proba(X_test)
pred_all_nnet &lt;- colnames(probs_all_nnet)[apply(probs_all_nnet, 1, which.max)]
accuracy_nnet &lt;- mean(pred_all_nnet == as.character(y_test_multiclass))
cat(sprintf(&quot;\nTest set accuracy: %.1f%%\n&quot;, accuracy_nnet * 100))

# ============================================================================
# EXAMPLE 3: SVM - Multi-class Classification on IRIS
# ============================================================================

cat(&quot;\n&quot;)
cat(&quot;============================================================================\n&quot;)
cat(&quot;EXAMPLE 3: SVM - Multi-class Classification\n&quot;)
cat(&quot;============================================================================\n&quot;)

mod_svm &lt;- Model$new(e1071::svm)
mod_svm$fit(X_train, y_train_multiclass, probability = TRUE, kernel = &quot;radial&quot;)

cat(&quot;\nPredicting probabilities for first 5 test samples:\n&quot;)
probs_svm &lt;- mod_svm$predict_proba(X_test[1:5, ])

cat(&quot;\nProbability matrix:\n&quot;)
print(round(probs_svm, 4))

cat(&quot;\nDetailed predictions:\n&quot;)
for(i in 1:5) {
  cat(sprintf(&quot;\nSample %d (Actual: %s):\n&quot;, i, as.character(y_test_multiclass[i])))
  cat(sprintf(&quot;  setosa:     %.1f%%\n&quot;, probs_svm[i, &quot;setosa&quot;] * 100))
  cat(sprintf(&quot;  versicolor: %.1f%%\n&quot;, probs_svm[i, &quot;versicolor&quot;] * 100))
  cat(sprintf(&quot;  virginica:  %.1f%%\n&quot;, probs_svm[i, &quot;virginica&quot;] * 100))
  cat(sprintf(&quot;  Predicted:  %s\n&quot;, colnames(probs_svm)[which.max(probs_svm[i, ])]))
}

# Calculate accuracy
probs_all_svm &lt;- mod_svm$predict_proba(X_test)
pred_all_svm &lt;- colnames(probs_all_svm)[apply(probs_all_svm, 1, which.max)]
accuracy_svm &lt;- mean(pred_all_svm == as.character(y_test_multiclass))
cat(sprintf(&quot;\nTest set accuracy: %.1f%%\n&quot;, accuracy_svm * 100))

============================================================================
IRIS DATASET - Summary
============================================================================
Training samples: 112
Test samples: 38
Features: 4
Classes: setosa, versicolor, virginica

============================================================================
EXAMPLE 1: randomForest - Multi-class Classification
============================================================================

Predicting probabilities for first 5 test samples:

Probability matrix:
  setosa versicolor virginica
1      1          0         0
2      1          0         0
3      1          0         0
4      1          0         0
5      1          0         0
attr(,&quot;assign&quot;)
[1] 1 1 1
attr(,&quot;contrasts&quot;)
attr(,&quot;contrasts&quot;)$pred
[1] &quot;contr.treatment&quot;

attr(,&quot;extraction_method&quot;)
[1] &quot;fallback::1&quot;
attr(,&quot;model_class&quot;)
[1] &quot;randomForest.formula&quot;

Interpretation:

Sample 1 (Actual: setosa):
  setosa:     100.0%
  versicolor: 0.0%
  virginica:  0.0%
  Predicted:  setosa

Sample 2 (Actual: setosa):
  setosa:     100.0%
  versicolor: 0.0%
  virginica:  0.0%
  Predicted:  setosa

Sample 3 (Actual: setosa):
  setosa:     100.0%
  versicolor: 0.0%
  virginica:  0.0%
  Predicted:  setosa

Sample 4 (Actual: setosa):
  setosa:     100.0%
  versicolor: 0.0%
  virginica:  0.0%
  Predicted:  setosa

Sample 5 (Actual: setosa):
  setosa:     100.0%
  versicolor: 0.0%
  virginica:  0.0%
  Predicted:  setosa

Predicted classes (first 5): setosa setosa setosa setosa setosa 
Actual classes (first 5):    setosa setosa setosa setosa setosa 

Test set accuracy: 94.7%

============================================================================
EXAMPLE 2: nnet - Multi-class Classification
============================================================================

Predicting probabilities for first 5 test samples:

Probability matrix (all 3 classes):
  setosa versicolor virginica
1      1          0         0
2      1          0         0
3      1          0         0
4      1          0         0
5      1          0         0
attr(,&quot;extraction_method&quot;)
[1] &quot;fallback::5&quot;
attr(,&quot;model_class&quot;)
[1] &quot;nnet.formula&quot;

Detailed predictions:

Sample 1 (Actual: setosa):
  setosa:     100.0%
  versicolor: 0.0%
  virginica:  0.0%
  Predicted:  setosa

Sample 2 (Actual: setosa):
  setosa:     100.0%
  versicolor: 0.0%
  virginica:  0.0%
  Predicted:  setosa

Sample 3 (Actual: setosa):
  setosa:     100.0%
  versicolor: 0.0%
  virginica:  0.0%
  Predicted:  setosa

Sample 4 (Actual: setosa):
  setosa:     100.0%
  versicolor: 0.0%
  virginica:  0.0%
  Predicted:  setosa

Sample 5 (Actual: setosa):
  setosa:     100.0%
  versicolor: 0.0%
  virginica:  0.0%
  Predicted:  setosa

Predicted classes (first 5): setosa setosa setosa setosa setosa 
Actual classes (first 5):    setosa setosa setosa setosa setosa 

Test set accuracy: 97.4%

============================================================================
EXAMPLE 3: SVM - Multi-class Classification
============================================================================

Predicting probabilities for first 5 test samples:

Probability matrix:
  setosa versicolor virginica
1      1          0         0
2      1          0         0
3      1          0         0
4      1          0         0
5      1          0         0
attr(,&quot;assign&quot;)
[1] 1 1 1
attr(,&quot;contrasts&quot;)
attr(,&quot;contrasts&quot;)$pred
[1] &quot;contr.treatment&quot;

attr(,&quot;extraction_method&quot;)
[1] &quot;fallback::1&quot;
attr(,&quot;model_class&quot;)
[1] &quot;svm.formula&quot;

Detailed predictions:

Sample 1 (Actual: setosa):
  setosa:     100.0%
  versicolor: 0.0%
  virginica:  0.0%
  Predicted:  setosa

Sample 2 (Actual: setosa):
  setosa:     100.0%
  versicolor: 0.0%
  virginica:  0.0%
  Predicted:  setosa

Sample 3 (Actual: setosa):
  setosa:     100.0%
  versicolor: 0.0%
  virginica:  0.0%
  Predicted:  setosa

Sample 4 (Actual: setosa):
  setosa:     100.0%
  versicolor: 0.0%
  virginica:  0.0%
  Predicted:  setosa

Sample 5 (Actual: setosa):
  setosa:     100.0%
  versicolor: 0.0%
  virginica:  0.0%
  Predicted:  setosa

Test set accuracy: 94.7%
</pre>


<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://thierrymoudiki.github.io//blog/2026/05/09/r/New-UnifiedML"> T. Moudiki's Webpage - R</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/one-interface-almost-every-classifier-and-regressor-unifiedml-v0-3-0/">One interface, (Almost) Every Classifier (and Regressor): unifiedml v0.3.0</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">401135</post-id>	</item>
		<item>
		<title>Edge detection in Python</title>
		<link>https://www.r-bloggers.com/2026/05/edge-detection-in-python/</link>
		
		<dc:creator><![CDATA[Francisco de Abreu e Lima]]></dc:creator>
		<pubDate>Fri, 08 May 2026 19:56:22 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">http://poissonisfish.com/?p=10082</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> Great strides in artificial intelligence development during the last five years produced agents that are now commonplace at work and home. It is humbling to note that virtually all frontier large language models today trace back to a preprint introducing the transformer neural network architecture – a fifteen-page paper that profoundly ...</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/edge-detection-in-python/">Edge detection in Python</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on <strong><a href="https://poissonisfish.com/2026/05/08/edge-detection-in-python/">poissonisfish</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>]. (You can report an issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>.)
</div>

<figure class="wp-block-image"><img data-attachment-id="10046" data-permalink="https://poissonisfish.com/?attachment_id=10046" data-orig-file="https://i0.wp.com/poissonisfish.com/wp-content/uploads/2026/05/butterfly_canny.png?w=578&#038;ssl=1" data-orig-size="2756,1824" data-comments-opened="1" data-image-meta="{"aperture":"0","credit":"","camera":"","caption":"","created_timestamp":"0","copyright":"","focal_length":"0","iso":"0","shutter_speed":"0","title":"","orientation":"0","alt":""}" data-image-title="butterfly_canny" data-image-description="" data-image-caption="" data-large-file="https://i0.wp.com/poissonisfish.com/wp-content/uploads/2026/05/butterfly_canny.png?w=578&#038;ssl=1?w=1024" src="https://i0.wp.com/poissonisfish.com/wp-content/uploads/2026/05/butterfly_canny.png?w=578&#038;ssl=1" alt="" class="wp-image-10046" data-recalc-dims="1" /><figcaption class="wp-element-caption">Edge detection is ubiquitous in animal vision and yet poorly understood. Canny edge detection on <em>Polygonia c-album</em> (Portugal, 2010)</figcaption></figure>



<p class="wp-block-paragraph">Great strides in artificial intelligence development during the last five years have produced agents that are now commonplace at work and home. It is humbling to note that virtually all frontier large language models today trace back to a preprint introducing the transformer neural network architecture<sup data-fn="563b7add-8b04-4fd2-a688-2383895c42c9" class="fn"><a href="https://poissonisfish.com/2026/05/08/edge-detection-in-python/#563b7add-8b04-4fd2-a688-2383895c42c9" id="563b7add-8b04-4fd2-a688-2383895c42c9-link" rel="nofollow" target="_blank">1</a></sup> – a fifteen-page paper that profoundly rocked the world through waves of excitement and angst.</p>



<p class="wp-block-paragraph">This paradigm shift in model design has also heavily influenced computer vision, leading to a surge in vision-language models (VLMs). Not only can such systems easily generalize across tasks such as segmentation, depth estimation and image generation or editing<sup data-fn="bd3dda1f-27f7-44e9-8d0f-9a42c28ed201" class="fn"><a href="https://poissonisfish.com/2026/05/08/edge-detection-in-python/#bd3dda1f-27f7-44e9-8d0f-9a42c28ed201" id="bd3dda1f-27f7-44e9-8d0f-9a42c28ed201-link" rel="nofollow" target="_blank">2</a></sup>, they have also blown legacy models out of the water in object detection benchmarks, with little to no fine-tuning<sup data-fn="63b9bf52-e78a-405a-bc33-5082dc51f74e" class="fn"><a href="https://poissonisfish.com/2026/05/08/edge-detection-in-python/#63b9bf52-e78a-405a-bc33-5082dc51f74e" id="63b9bf52-e78a-405a-bc33-5082dc51f74e-link" rel="nofollow" target="_blank">3</a></sup>.</p>



<p class="wp-block-paragraph">However, it should not be lightly assumed that the transformer architecture is the only path forward to a more meaningful, cost-effective or even better-performing AI – not when we are still having <a href="https://techcrunch.com/2024/08/27/why-ai-cant-spell-strawberry/" rel="nofollow" target="_blank">trouble counting “r” in the word <em>strawberry</em></a>. Neuromorphic computation<sup data-fn="9740d4a1-35a7-4003-9e9f-63a4fa16b90b" class="fn"><a href="https://poissonisfish.com/2026/05/08/edge-detection-in-python/#9740d4a1-35a7-4003-9e9f-63a4fa16b90b" id="9740d4a1-35a7-4003-9e9f-63a4fa16b90b-link" rel="nofollow" target="_blank">4</a></sup>, photonic neural networks<sup data-fn="58f9bd6d-ec82-4b9b-8285-5f8b083184ad" class="fn"><a href="https://poissonisfish.com/2026/05/08/edge-detection-in-python/#58f9bd6d-ec82-4b9b-8285-5f8b083184ad" id="58f9bd6d-ec82-4b9b-8285-5f8b083184ad-link" rel="nofollow" target="_blank">5</a></sup>, JEPA<sup data-fn="7b848804-3070-4e9f-bc5f-ea3e60f0bf14" class="fn"><a href="https://poissonisfish.com/2026/05/08/edge-detection-in-python/#7b848804-3070-4e9f-bc5f-ea3e60f0bf14" id="7b848804-3070-4e9f-bc5f-ea3e60f0bf14-link" rel="nofollow" target="_blank">6</a></sup> and many other techniques have recently shown us different ways to design and implement intelligent systems that produce optimal solutions for a variety of problems.</p>



<p class="wp-block-paragraph">Today I want to focus on a topic from a timeless book that inspired me to think differently and, particularly, to effectively apply a first principles approach to problem-solving. The topic is edge detection, and that book is <em>Vision</em>, by David Marr<sup data-fn="d6a1d31f-a93f-4844-8b90-70b4da955016" class="fn"><a href="https://poissonisfish.com/2026/05/08/edge-detection-in-python/#d6a1d31f-a93f-4844-8b90-70b4da955016" id="d6a1d31f-a93f-4844-8b90-70b4da955016-link" rel="nofollow" target="_blank">7</a></sup>. Just as <em>On the Origin of Species</em><sup data-fn="210cf9a3-ea19-4402-b37e-c2535bd96366" class="fn"><a href="https://poissonisfish.com/2026/05/08/edge-detection-in-python/#210cf9a3-ea19-4402-b37e-c2535bd96366" id="210cf9a3-ea19-4402-b37e-c2535bd96366-link" rel="nofollow" target="_blank">8</a></sup> and <em>On Growth and Form</em><sup data-fn="8d6afae8-ee06-495e-9b78-9705d6088f63" class="fn"><a href="https://poissonisfish.com/2026/05/08/edge-detection-in-python/#8d6afae8-ee06-495e-9b78-9705d6088f63" id="8d6afae8-ee06-495e-9b78-9705d6088f63-link" rel="nofollow" target="_blank">9</a></sup>, this is yet another masterpiece that brought together different disciplines – in this case neurophysiology and computer vision – to revolutionise science.</p>



<p class="wp-block-paragraph">In this blog post we will define and compare algorithms for image edge detection, and explore their remarkable similarity with neurophysiological readings.</p>



<h1 class="wp-block-heading">Introduction</h1>



<p class="wp-block-paragraph">Modern computer vision is deeply rooted in Marr’s pioneering work. To understand any information-processing system, Marr argued, one must describe it at three interdependent levels of analysis:</p>



<ul class="wp-block-list">
<li>The computational level – <strong>what</strong> problem is being solved and <strong>why</strong> (e.g. edge detection)</li>



<li>The algorithmic level – <strong>how</strong> it is solved, and what representations and procedures are used (e.g. the Laplacian operator)</li>



<li>The implementational level – <strong>where</strong> it is physically realised (e.g. <em>in vivo</em>, <em>in silico</em>)</li>
</ul>



<p class="wp-block-paragraph">This layered thinking is what makes the book so enduring. Marr was not merely describing the visual system; he was arguing that to truly understand it you have to explain it at all three levels simultaneously. The book also features memorable passages on random dot stereograms<sup data-fn="ab32dfeb-f0f5-476f-9b2a-ad0939f8514f" class="fn"><a href="https://poissonisfish.com/2026/05/08/edge-detection-in-python/#ab32dfeb-f0f5-476f-9b2a-ad0939f8514f" id="ab32dfeb-f0f5-476f-9b2a-ad0939f8514f-link" rel="nofollow" target="_blank">10</a></sup>, binocular disparity and motion perception. Overall, it is highly recommended for science enthusiasts.</p>



<p class="wp-block-paragraph">Let us now introduce the key concepts of edge detection that grew out of this structured approach, to better understand how the problem is solved in practice.</p>



<h2 class="wp-block-heading">Zero-crossing <img src="https://i0.wp.com/s0.wp.com/wp-content/mu-plugins/wpcom-smileys/twemoji/2/72x72/270f.png?w=578&#038;ssl=1" alt="&#x270f;" class="wp-smiley" style="height: 1em; max-height: 1em;" data-recalc-dims="1" /></h2>



<p class="wp-block-paragraph">From a computational perspective, edges are fundamentally spatial discontinuities in images. If for a brief moment we consider a simple greyscale image, edges are wherever dark-to-light and light-to-dark transitions occur, in whatever direction.</p>



<p class="wp-block-paragraph">Because such transitions mathematically translate to local changes in pixel intensity, the most natural approach to identify edges is to compute <strong>image gradients</strong>, the two-dimensional equivalent of the derivative. The first derivative of image intensity evaluated across an edge produces a peak (for a dark-to-bright transition) or a trough (for a bright-to-dark transition), depending on the direction. However, the <strong>second derivative</strong> provides not only the means to identify both transition types, but also a beautifully simple detection mechanism: it crosses zero at the precise location of the edge. This is the essence of the <strong>zero-crossing</strong>.</p>



<figure class="wp-block-image size-full"><img src="https://poissonisfish.com/wp-content/uploads/2026/03/step_edge_derivatives.png" alt="" class="wp-image-9914" /><figcaption class="wp-element-caption">First and second derivatives of a one-dimensional signal. Left: signal displaying low-to-high (dark-to-bright) transition. Middle: first derivative of the signal, capturing that transition as a peak. Right: second derivative of the signal, exhibiting the zero-crossing.</figcaption></figure>
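
<p class="wp-block-paragraph">As a rough numerical check of this idea, the sketch below builds a blurred one-dimensional edge and differentiates it twice with finite differences. The logistic ramp, its width and the helper names (<code>blurred_edge</code>, <code>zero_crossings</code>) are assumptions made for illustration, using only the Python standard library; they are not taken from the code accompanying this post.</p>

```python
import math


def blurred_edge(n=41, centre=20.5, width=2.0):
    """Dark-to-bright edge modelled as a logistic ramp (illustrative choice)."""
    return [1.0 / (1.0 + math.exp(-(i - centre) / width)) for i in range(n)]


def first_derivative(signal):
    """Central differences; the two end samples are left at 0."""
    d = [0.0] * len(signal)
    for i in range(1, len(signal) - 1):
        d[i] = (signal[i + 1] - signal[i - 1]) / 2.0
    return d


def second_derivative(signal):
    """[1, -2, 1] stencil; the two end samples are left at 0."""
    d = [0.0] * len(signal)
    for i in range(1, len(signal) - 1):
        d[i] = signal[i + 1] - 2.0 * signal[i] + signal[i - 1]
    return d


def zero_crossings(values):
    """Sub-sample positions where two consecutive samples differ in sign."""
    positions = []
    for i in range(len(values) - 1):
        a, b = values[i], values[i + 1]
        if a * b < 0.0:
            # Linear interpolation between the two samples
            positions.append(i + a / (a - b))
    return positions


signal = blurred_edge()
d1 = first_derivative(signal)
d2 = second_derivative(signal)

peak_index = max(range(len(d1)), key=d1.__getitem__)
crossings = zero_crossings(d2)

print("first-derivative peak at sample", peak_index)
print("second-derivative zero-crossings at", crossings)
```

<p class="wp-block-paragraph">With the edge placed halfway between samples 20 and 21, the first derivative peaks on those two samples and the second derivative changes sign exactly once, at about 20.5.</p>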



<p class="wp-block-paragraph">Marr and Hildreth formalised this insight by proposing the <strong>Laplacian of Gaussian</strong> (LoG) as the operator of choice<sup data-fn="4b5583df-7b37-4973-9b87-cd2a13af4711" class="fn"><a href="https://poissonisfish.com/2026/05/08/edge-detection-in-python/#4b5583df-7b37-4973-9b87-cd2a13af4711" id="4b5583df-7b37-4973-9b87-cd2a13af4711-link" rel="nofollow" target="_blank">11</a></sup>. The Laplacian <img src="https://s0.wp.com/latex.php?latex=%5Cnabla%5E2+&#038;bg=f9f9f9&#038;%23038;fg=555555&#038;%23038;s=0&#038;%23038;c=20201002" srcset_temp="https://s0.wp.com/latex.php?latex=%5Cnabla%5E2+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=%5Cnabla%5E2+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="\nabla^2 " class="latex" /> is the sum of second partial derivatives in both spatial dimensions:</p>



<p class="wp-block-paragraph"><img src="https://s0.wp.com/latex.php?latex=%5Cnabla%5E2+f+%3D+%5Cfrac%7B%5Cpartial%5E2+f%7D%7B%5Cpartial+x%5E2%7D+%2B+%5Cfrac%7B%5Cpartial%5E2+f%7D%7B%5Cpartial+y%5E2%7D+&#038;bg=f9f9f9&#038;%23038;fg=555555&#038;%23038;s=2&#038;%23038;c=20201002" srcset_temp="https://s0.wp.com/latex.php?latex=%5Cnabla%5E2+f+%3D+%5Cfrac%7B%5Cpartial%5E2+f%7D%7B%5Cpartial+x%5E2%7D+%2B+%5Cfrac%7B%5Cpartial%5E2+f%7D%7B%5Cpartial+y%5E2%7D+&#038;bg=f9f9f9&#038;fg=555555&#038;s=2&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=%5Cnabla%5E2+f+%3D+%5Cfrac%7B%5Cpartial%5E2+f%7D%7B%5Cpartial+x%5E2%7D+%2B+%5Cfrac%7B%5Cpartial%5E2+f%7D%7B%5Cpartial+y%5E2%7D+&#038;bg=f9f9f9&#038;fg=555555&#038;s=2&#038;c=20201002&#038;zoom=4.5 4x" alt="\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} " class="latex" /></p>
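
<p class="wp-block-paragraph">On a pixel grid the two second partial derivatives are commonly approximated with finite differences. The following is a minimal sketch of that approximation, assuming the standard 5-point stencil and a hypothetical helper named <code>laplacian</code>; it is meant to make the definition concrete, not to reproduce any particular library.</p>

```python
def laplacian(image):
    """5-point finite-difference Laplacian; border pixels are left at 0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            d2x = image[y][x + 1] - 2.0 * image[y][x] + image[y][x - 1]
            d2y = image[y + 1][x] - 2.0 * image[y][x] + image[y - 1][x]
            out[y][x] = d2x + d2y
    return out


# f(x, y) = x^2 + y^2: both second partials equal 2, so the Laplacian is 4
bowl = [[float(x * x + y * y) for x in range(7)] for y in range(7)]
# A linear ramp has no curvature, so its Laplacian is 0
ramp = [[float(3 * x + 2 * y) for x in range(7)] for y in range(7)]

lap_bowl = laplacian(bowl)
lap_ramp = laplacian(ramp)
print(lap_bowl[3][3], lap_ramp[3][3])  # prints 4.0 0.0
```

<p class="wp-block-paragraph">A uniform slope therefore produces no response at all; only changes in slope do, which is why the operator reacts to edges and, unfortunately, to noise.</p>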



<p class="wp-block-paragraph">Applied directly to a noisy image, the Laplacian amplifies every small intensity fluctuation. The Gaussian pre-filter <img src="https://s0.wp.com/latex.php?latex=G_%5Csigma+&#038;bg=f9f9f9&#038;%23038;fg=555555&#038;%23038;s=0&#038;%23038;c=20201002" srcset_temp="https://s0.wp.com/latex.php?latex=G_%5Csigma+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=G_%5Csigma+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="G_\sigma " class="latex" /> solves this by smoothing the image at a chosen scale <img src="https://s0.wp.com/latex.php?latex=%5Csigma+&#038;bg=f9f9f9&#038;%23038;fg=555555&#038;%23038;s=0&#038;%23038;c=20201002" srcset_temp="https://s0.wp.com/latex.php?latex=%5Csigma+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=%5Csigma+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="\sigma " class="latex" /> before differentiation. Because convolution is associative, the two steps can be combined into a single kernel – the LoG, also denoted <img src="https://s0.wp.com/latex.php?latex=%5Cnabla%5E2+G+&#038;bg=f9f9f9&#038;%23038;fg=555555&#038;%23038;s=0&#038;%23038;c=20201002" srcset_temp="https://s0.wp.com/latex.php?latex=%5Cnabla%5E2+G+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=%5Cnabla%5E2+G+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="\nabla^2 G " class="latex" />:</p>



<p class="wp-block-paragraph"><img src="https://s0.wp.com/latex.php?latex=%5Cnabla%5E2+G_%5Csigma%28x%2C+y%29+%3D+-%5Cfrac%7B1%7D%7B%5Cpi%5Csigma%5E4%7D%5Cleft%281+-+%5Cfrac%7Bx%5E2+%2B+y%5E2%7D%7B2%5Csigma%5E2%7D%5Cright%29e%5E%7B-%5Cfrac%7Bx%5E2%2By%5E2%7D%7B2%5Csigma%5E2%7D%7D+&#038;bg=f9f9f9&#038;%23038;fg=555555&#038;%23038;s=2&#038;%23038;c=20201002" srcset_temp="https://s0.wp.com/latex.php?latex=%5Cnabla%5E2+G_%5Csigma%28x%2C+y%29+%3D+-%5Cfrac%7B1%7D%7B%5Cpi%5Csigma%5E4%7D%5Cleft%281+-+%5Cfrac%7Bx%5E2+%2B+y%5E2%7D%7B2%5Csigma%5E2%7D%5Cright%29e%5E%7B-%5Cfrac%7Bx%5E2%2By%5E2%7D%7B2%5Csigma%5E2%7D%7D+&#038;bg=f9f9f9&#038;fg=555555&#038;s=2&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=%5Cnabla%5E2+G_%5Csigma%28x%2C+y%29+%3D+-%5Cfrac%7B1%7D%7B%5Cpi%5Csigma%5E4%7D%5Cleft%281+-+%5Cfrac%7Bx%5E2+%2B+y%5E2%7D%7B2%5Csigma%5E2%7D%5Cright%29e%5E%7B-%5Cfrac%7Bx%5E2%2By%5E2%7D%7B2%5Csigma%5E2%7D%7D+&#038;bg=f9f9f9&#038;fg=555555&#038;s=2&#038;c=20201002&#038;zoom=4.5 4x" alt="\nabla^2 G_\sigma(x, y) = -\frac{1}{\pi\sigma^4}\left(1 - \frac{x^2 + y^2}{2\sigma^2}\right)e^{-\frac{x^2+y^2}{2\sigma^2}} " class="latex" /></p>
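
<p class="wp-block-paragraph">The associativity argument can be checked numerically. The sketch below is a one-dimensional, discrete stand-in under stated assumptions: a sampled Gaussian plays the role of the pre-filter and the <code>[1, -2, 1]</code> stencil plays the role of the Laplacian. The function names and the test signal are hypothetical.</p>

```python
import math


def convolve(a, b):
    """Full discrete 1-D convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def gaussian_kernel(sigma, radius):
    """Sampled Gaussian, normalised to sum to 1."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


signal = [0.0] * 10 + [1.0] * 10 + [0.2] * 10  # two step edges
gauss = gaussian_kernel(sigma=1.5, radius=5)
second_diff = [1.0, -2.0, 1.0]                  # 1-D analogue of the Laplacian

# Route 1: smooth first, then differentiate
two_step = convolve(convolve(signal, gauss), second_diff)

# Route 2: merge both operators into one kernel, then filter once
combined = convolve(gauss, second_diff)
one_step = convolve(signal, combined)

max_gap = max(abs(p - q) for p, q in zip(two_step, one_step))
print("largest difference between the two routes:", max_gap)
print("sum of the combined kernel:", sum(combined))
```

<p class="wp-block-paragraph">Both routes agree up to floating-point rounding, and the merged kernel sums to zero, so flat regions give no response. The closed-form LoG given above is the continuous, two-dimensional counterpart of <code>combined</code>.</p>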



<p class="wp-block-paragraph">This kernel, which resembles an inverted sombrero and whose sign-flipped form is known as the <strong>Mexican hat wavelet</strong>, produces a response that crosses zero exactly at an intensity edge. The width of the Gaussian <img src="https://s0.wp.com/latex.php?latex=%5Csigma+&#038;bg=f9f9f9&#038;%23038;fg=555555&#038;%23038;s=0&#038;%23038;c=20201002" srcset_temp="https://s0.wp.com/latex.php?latex=%5Csigma+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=%5Csigma+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="\sigma " class="latex" /> determines the scale of detection: a small <img src="https://s0.wp.com/latex.php?latex=%5Csigma+&#038;bg=f9f9f9&#038;%23038;fg=555555&#038;%23038;s=0&#038;%23038;c=20201002" srcset_temp="https://s0.wp.com/latex.php?latex=%5Csigma+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=%5Csigma+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="\sigma " class="latex" /> preserves fine detail, whereas a large <img src="https://s0.wp.com/latex.php?latex=%5Csigma+&#038;bg=f9f9f9&#038;%23038;fg=555555&#038;%23038;s=0&#038;%23038;c=20201002" srcset_temp="https://s0.wp.com/latex.php?latex=%5Csigma+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=%5Csigma+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="\sigma " class="latex" /> captures only coarse structure. Marr argued that the visual system operates simultaneously at multiple scales – an idea that would later resonate in scale-space theory and, much later, in the multi-scale feature maps of deep convolutional networks.</p>
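
<p class="wp-block-paragraph">To see how the scale parameter shapes the kernel, the closed-form expression above can be sampled on a pixel grid. This is a sketch under assumptions: the grid is truncated at four standard deviations from the centre, and <code>log_value</code> and <code>log_kernel</code> are hypothetical helper names.</p>

```python
import math


def log_value(x, y, sigma):
    """Closed-form Laplacian of Gaussian at offset (x, y) from the centre."""
    r2 = x * x + y * y
    s2 = sigma * sigma
    return -(1.0 - r2 / (2.0 * s2)) * math.exp(-r2 / (2.0 * s2)) / (math.pi * s2 * s2)


def log_kernel(sigma):
    """Sample the LoG on a square grid reaching 4 sigma from the centre."""
    radius = math.ceil(4.0 * sigma)
    offsets = range(-radius, radius + 1)
    return [[log_value(x, y, sigma) for x in offsets] for y in offsets]


summary = {}
for sigma in (1.5, 3.0):
    kernel = log_kernel(sigma)
    centre = len(kernel) // 2
    middle_row = kernel[centre]
    # First offset from the centre at which the kernel turns positive
    first_positive = next(d for d in range(centre + 1)
                          if middle_row[centre + d] > 0.0)
    total = sum(sum(row) for row in kernel)
    summary[sigma] = (len(kernel), middle_row[centre], first_positive, total)
    print(sigma, summary[sigma])
```

<p class="wp-block-paragraph">The sign of the kernel flips at a radius of <code>sigma * sqrt(2)</code>, so doubling the scale from 1.5 to 3.0 moves the first positive sample from offset 3 to offset 5 and widens the central lobe accordingly. The kernel sums to nearly zero at both scales, which is why uniform regions give no response.</p>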


<div class="wp-block-image wp-image-10036 size-large">
<figure class="aligncenter size-full is-resized"><img data-attachment-id="10036" data-permalink="https://poissonisfish.com/?attachment_id=10036" data-orig-file="https://i1.wp.com/poissonisfish.com/wp-content/uploads/2026/05/laplacian_of_gaussian.png?w=578&#038;ssl=1" data-orig-size="1179,980" data-comments-opened="1" data-image-meta="{"aperture":"0","credit":"","camera":"","caption":"","created_timestamp":"0","copyright":"","focal_length":"0","iso":"0","shutter_speed":"0","title":"","orientation":"0","alt":""}" data-image-title="laplacian_of_gaussian" data-image-description="" data-image-caption="" data-large-file="https://i1.wp.com/poissonisfish.com/wp-content/uploads/2026/05/laplacian_of_gaussian.png?w=578&#038;ssl=1?w=1024" src="https://i1.wp.com/poissonisfish.com/wp-content/uploads/2026/05/laplacian_of_gaussian.png?w=578&#038;ssl=1" alt="" class="wp-image-10036" style="aspect-ratio:1.2048252458447897;object-fit:cover;width:650px" data-recalc-dims="1" /><figcaption class="wp-element-caption">One-dimensional cross-section of the Laplacian of Gaussian. The characteristic positive central lobe flanked by two negative side lobes – the “Mexican hat” – is what produces a zero-crossing wherever image intensity changes sharply.</figcaption></figure>
</div>


<p class="wp-block-paragraph">Marr went further and showed that the centre-surround organisation of <strong>retinal ganglion cell</strong> receptive fields – which he modelled as a <strong>Difference of Gaussians</strong> (DoG), the difference between a narrow excitatory Gaussian and a broader inhibitory one – is a close biological approximation of the LoG. Put differently, your retina is already computing zero-crossings before the signal ever reaches the visual cortex. The agreement between computational predictions and <em>in vivo</em> electrophysiological measurements, documented in <em>Vision</em> (p. 64), remains one of the most compelling examples of theory meeting experiment in all of neuroscience.</p>
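
<p class="wp-block-paragraph">The similarity between the two operators can be made tangible by comparing their radial profiles. The sketch below rests on several assumptions: the surround is 1.6 times wider than the centre (the ratio usually quoted from Marr and Hildreth), the LoG scale is chosen so that both profiles cross zero at the same radius, and the LoG is negated so that its centre is positive like the excitatory centre of the DoG. All function names are hypothetical.</p>

```python
import math


def gaussian(r, sigma):
    """Radial profile of a unit-volume 2-D Gaussian."""
    return math.exp(-r * r / (2.0 * sigma * sigma)) / (2.0 * math.pi * sigma * sigma)


def difference_of_gaussians(r, sigma_centre, sigma_surround):
    """Narrow excitatory centre minus broader inhibitory surround."""
    return gaussian(r, sigma_centre) - gaussian(r, sigma_surround)


def laplacian_of_gaussian(r, sigma):
    """Radial profile of the closed-form LoG."""
    u = r * r / (2.0 * sigma * sigma)
    return -(1.0 - u) * math.exp(-u) / (math.pi * sigma ** 4)


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


SIGMA_CENTRE, SIGMA_SURROUND = 1.0, 1.6    # assumed widths, ratio 1.6
radii = [0.01 * k for k in range(601)]     # r from 0 to 6

dog_profile = [difference_of_gaussians(r, SIGMA_CENTRE, SIGMA_SURROUND)
               for r in radii]

# Radius at which the DoG switches from excitation to inhibition
zero_radius = next(r for r, v in zip(radii, dog_profile) if v < 0.0)

# An LoG crosses zero at sigma * sqrt(2); pick the sigma that matches
sigma_log = zero_radius / math.sqrt(2.0)

# Negated LoG, so that the centre is positive like the DoG centre
log_profile = [-laplacian_of_gaussian(r, sigma_log) for r in radii]

correlation = pearson(dog_profile, log_profile)
print("DoG zero-crossing radius:", round(zero_radius, 2))
print("matched LoG sigma:", round(sigma_log, 3))
print("correlation between profiles:", round(correlation, 4))
```

<p class="wp-block-paragraph">The two curves are not identical, but once their zero-crossings are aligned they are very strongly correlated, which is the sense in which a centre-surround receptive field approximates the LoG.</p>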


<div class="wp-block-image wp-image-10033 size-large">
<figure class="aligncenter"><img loading="lazy" data-attachment-id="10033" data-permalink="https://poissonisfish.com/?attachment_id=10033" data-orig-file="https://poissonisfish.com/wp-content/uploads/2026/05/zero_cross_deblurred.png" data-orig-size="3537,2418" data-comments-opened="1" data-image-meta="{"aperture":"0","credit":"","camera":"","caption":"","created_timestamp":"0","copyright":"","focal_length":"0","iso":"0","shutter_speed":"0","title":"","orientation":"0","alt":""}" data-image-title="zero_cross_deblurred" data-image-description="" data-image-caption="" data-large-file="https://i2.wp.com/poissonisfish.com/wp-content/uploads/2026/05/zero_cross_deblurred.png?w=450&#038;ssl=1" src="https://i2.wp.com/poissonisfish.com/wp-content/uploads/2026/05/zero_cross_deblurred.png?w=450&#038;ssl=1" alt="" class="wp-image-10033" srcset_temp="https://i2.wp.com/poissonisfish.com/wp-content/uploads/2026/05/zero_cross_deblurred.png?w=450&#038;ssl=1 1024w, https://poissonisfish.com/wp-content/uploads/2026/05/zero_cross_deblurred.png?w=2048 2048w, https://poissonisfish.com/wp-content/uploads/2026/05/zero_cross_deblurred.png?w=150 150w, https://poissonisfish.com/wp-content/uploads/2026/05/zero_cross_deblurred.png?w=300 300w, https://poissonisfish.com/wp-content/uploads/2026/05/zero_cross_deblurred.png?w=768 768w, https://poissonisfish.com/wp-content/uploads/2026/05/zero_cross_deblurred.png?w=1440 1440w" sizes="(max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /><figcaption class="wp-element-caption">Response of the LoG operator to three idealised stimuli: a step edge, a thin bar and a wide bar (columns). Top: input intensity profile. Middle: filter response. Bottom: histogram of intracellular recordings from cat retinal X-cells exposed to analogous stimuli, after Marr (1982). The agreement is striking and forms the basis for the claim that retinal ganglion cells implement a biological LoG.</figcaption></figure>
</div>


<p class="wp-block-paragraph">Zero-crossings are theoretically elegant, but the workhorse operators most computer-vision tools reach for – including, as we will see, the Canny detector itself – operate on first-derivative gradients. Let us look at those.</p>



<h2 class="wp-block-heading">Image gradients <img src="https://i1.wp.com/s0.wp.com/wp-content/mu-plugins/wpcom-smileys/twemoji/2/72x72/1f4d0.png?w=578&#038;ssl=1" alt="&#x1f4d0;" class="wp-smiley" style="height: 1em; max-height: 1em;" data-recalc-dims="1" /></h2>



<p class="wp-block-paragraph">In practice, image gradients are computed using <strong>convolution filters</strong> – small kernels that slide across the image and produce a weighted local sum at each pixel, as illustrated in my <a href="https://poissonisfish.com/2018/07/08/convolutional-neural-networks-in-r/" rel="nofollow" target="_blank">post on convolutional neural networks</a>. The two most widely used first-order gradient operators are:</p>



<p class="wp-block-paragraph"><strong>Sobel:</strong> weights the central row and column more heavily, providing a modest degree of smoothing:</p>



<p class="wp-block-paragraph"><img src="https://s0.wp.com/latex.php?latex=G_x+%3D+%5Cbegin%7Bbmatrix%7D+-1+%26+0+%26+1+%5C%5C+-2+%26+0+%26+2+%5C%5C+-1+%26+0+%26+1+%5Cend%7Bbmatrix%7D%2C+%5Cquad+G_y+%3D+%5Cbegin%7Bbmatrix%7D+-1+%26+-2+%26+-1+%5C%5C+0+%26+0+%26+0+%5C%5C+1+%26+2+%26+1+%5Cend%7Bbmatrix%7D+&#038;bg=f9f9f9&#038;%23038;fg=555555&#038;%23038;s=2&#038;%23038;c=20201002" srcset_temp="https://s0.wp.com/latex.php?latex=G_x+%3D+%5Cbegin%7Bbmatrix%7D+-1+%26+0+%26+1+%5C%5C+-2+%26+0+%26+2+%5C%5C+-1+%26+0+%26+1+%5Cend%7Bbmatrix%7D%2C+%5Cquad+G_y+%3D+%5Cbegin%7Bbmatrix%7D+-1+%26+-2+%26+-1+%5C%5C+0+%26+0+%26+0+%5C%5C+1+%26+2+%26+1+%5Cend%7Bbmatrix%7D+&#038;bg=f9f9f9&#038;fg=555555&#038;s=2&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=G_x+%3D+%5Cbegin%7Bbmatrix%7D+-1+%26+0+%26+1+%5C%5C+-2+%26+0+%26+2+%5C%5C+-1+%26+0+%26+1+%5Cend%7Bbmatrix%7D%2C+%5Cquad+G_y+%3D+%5Cbegin%7Bbmatrix%7D+-1+%26+-2+%26+-1+%5C%5C+0+%26+0+%26+0+%5C%5C+1+%26+2+%26+1+%5Cend%7Bbmatrix%7D+&#038;bg=f9f9f9&#038;fg=555555&#038;s=2&#038;c=20201002&#038;zoom=4.5 4x" alt="G_x = \begin{bmatrix} -1 &#038; 0 &#038; 1 \\ -2 &#038; 0 &#038; 2 \\ -1 &#038; 0 &#038; 1 \end{bmatrix}, \quad G_y = \begin{bmatrix} -1 &#038; -2 &#038; -1 \\ 0 &#038; 0 &#038; 0 \\ 1 &#038; 2 &#038; 1 \end{bmatrix} " class="latex" /></p>



<p class="wp-block-paragraph"><strong>Prewitt:</strong> weights all neighbours equally:</p>



<p class="wp-block-paragraph"><img src="https://s0.wp.com/latex.php?latex=G_x+%3D+%5Cbegin%7Bbmatrix%7D+-1+%26+0+%26+1+%5C%5C+-1+%26+0+%26+1+%5C%5C+-1+%26+0+%26+1+%5Cend%7Bbmatrix%7D%2C+%5Cquad+G_y+%3D+%5Cbegin%7Bbmatrix%7D+-1+%26+-1+%26+-1+%5C%5C+0+%26+0+%26+0+%5C%5C+1+%26+1+%26+1+%5Cend%7Bbmatrix%7D+&#038;bg=f9f9f9&#038;%23038;fg=555555&#038;%23038;s=2&#038;%23038;c=20201002" srcset_temp="https://s0.wp.com/latex.php?latex=G_x+%3D+%5Cbegin%7Bbmatrix%7D+-1+%26+0+%26+1+%5C%5C+-1+%26+0+%26+1+%5C%5C+-1+%26+0+%26+1+%5Cend%7Bbmatrix%7D%2C+%5Cquad+G_y+%3D+%5Cbegin%7Bbmatrix%7D+-1+%26+-1+%26+-1+%5C%5C+0+%26+0+%26+0+%5C%5C+1+%26+1+%26+1+%5Cend%7Bbmatrix%7D+&#038;bg=f9f9f9&#038;fg=555555&#038;s=2&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=G_x+%3D+%5Cbegin%7Bbmatrix%7D+-1+%26+0+%26+1+%5C%5C+-1+%26+0+%26+1+%5C%5C+-1+%26+0+%26+1+%5Cend%7Bbmatrix%7D%2C+%5Cquad+G_y+%3D+%5Cbegin%7Bbmatrix%7D+-1+%26+-1+%26+-1+%5C%5C+0+%26+0+%26+0+%5C%5C+1+%26+1+%26+1+%5Cend%7Bbmatrix%7D+&#038;bg=f9f9f9&#038;fg=555555&#038;s=2&#038;c=20201002&#038;zoom=4.5 4x" alt="G_x = \begin{bmatrix} -1 &#038; 0 &#038; 1 \\ -1 &#038; 0 &#038; 1 \\ -1 &#038; 0 &#038; 1 \end{bmatrix}, \quad G_y = \begin{bmatrix} -1 &#038; -1 &#038; -1 \\ 0 &#038; 0 &#038; 0 \\ 1 &#038; 1 &#038; 1 \end{bmatrix} " class="latex" /></p>



<p class="wp-block-paragraph">In both cases the gradient magnitude at each pixel is <img src="https://s0.wp.com/latex.php?latex=%5C%7C%5Cnabla+f%5C%7C+%3D+%5Csqrt%7BG_x%5E2+%2B+G_y%5E2%7D+&#038;bg=f9f9f9&#038;%23038;fg=555555&#038;%23038;s=0&#038;%23038;c=20201002" srcset_temp="https://s0.wp.com/latex.php?latex=%5C%7C%5Cnabla+f%5C%7C+%3D+%5Csqrt%7BG_x%5E2+%2B+G_y%5E2%7D+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=%5C%7C%5Cnabla+f%5C%7C+%3D+%5Csqrt%7BG_x%5E2+%2B+G_y%5E2%7D+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="\|\nabla f\| = \sqrt{G_x^2 + G_y^2} " class="latex" />, and the gradient direction is <img src="https://s0.wp.com/latex.php?latex=%5Ctheta+%3D+%5Carctan%28G_y+%2F+G_x%29+&#038;bg=f9f9f9&#038;%23038;fg=555555&#038;%23038;s=0&#038;%23038;c=20201002" srcset_temp="https://s0.wp.com/latex.php?latex=%5Ctheta+%3D+%5Carctan%28G_y+%2F+G_x%29+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=%5Ctheta+%3D+%5Carctan%28G_y+%2F+G_x%29+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="\theta = \arctan(G_y / G_x) " class="latex" />. Where the magnitude is large, a transition is occurring; where it is small, the neighbourhood is uniform.</p>
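


<p class="wp-block-paragraph">To make the two formulas concrete, here is a minimal sketch that applies the Sobel kernels above to a toy step edge. The helper <code>correlate_valid</code> and the 5×6 test image are illustrative assumptions, not part of any library. The direction is computed with <code>np.arctan2</code> rather than a plain arctangent, since it copes with a zero horizontal gradient and keeps the quadrant.</p>

```python
import numpy as np

# Sobel kernels as given above
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = np.array([[-1, -2, -1],
                    [0, 0, 0],
                    [1, 2, 1]], dtype=float)


def correlate_valid(img, kernel):
    """Slide the kernel over img (no padding, no kernel flip) and return the weighted sums."""
    kh, kw = kernel.shape
    out_h = img.shape[0] - kh + 1
    out_w = img.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out


# Hypothetical toy image: dark left half, bright right half (a vertical step edge)
step = np.zeros((5, 6))
step[:, 3:] = 1.0

gx = correlate_valid(step, SOBEL_X)
gy = correlate_valid(step, SOBEL_Y)
magnitude = np.hypot(gx, gy)             # sqrt(gx**2 + gy**2)
theta = np.degrees(np.arctan2(gy, gx))   # gradient direction in degrees

print(magnitude[0])   # large only in the two columns that straddle the step
print(theta[0])
```

<p class="wp-block-paragraph">On this input the magnitude is non-zero only where the 3×3 window straddles the step, and the direction there is 0°, that is, pointing across the edge from dark to bright.</p>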



<figure class="wp-block-image size-full"><img data-attachment-id="10175" data-permalink="https://poissonisfish.com/2026/05/08/edge-detection-in-python/prewitt_sobel/" data-orig-file="https://i2.wp.com/poissonisfish.com/wp-content/uploads/2026/05/prewitt_sobel.png?w=578&#038;ssl=1" data-orig-size="2800,1000" data-comments-opened="1" data-image-meta="{"aperture":"0","credit":"","camera":"","caption":"","created_timestamp":"0","copyright":"","focal_length":"0","iso":"0","shutter_speed":"0","title":"","orientation":"0","alt":""}" data-image-title="prewitt_sobel" data-image-description="" data-image-caption="" data-large-file="https://i2.wp.com/poissonisfish.com/wp-content/uploads/2026/05/prewitt_sobel.png?w=578&#038;ssl=1?w=1024" src="https://i2.wp.com/poissonisfish.com/wp-content/uploads/2026/05/prewitt_sobel.png?w=578&#038;ssl=1" alt="" class="wp-image-10175" data-recalc-dims="1" /><figcaption class="wp-element-caption">Effect of Prewitt and Sobel operators on a natural image of a brick floor (gradient magnitude).</figcaption></figure>



<p class="wp-block-paragraph">The limitation of these operators is that they are sensitive to noise and produce thick, diffuse edges. Every pixel with a large gradient is flagged regardless of whether it truly lies on the edge or merely near it. This is precisely the problem that John Canny set out to solve.</p>



<h2 class="wp-block-heading">Canny edge detection <img src="https://i0.wp.com/s0.wp.com/wp-content/mu-plugins/wpcom-smileys/twemoji/2/72x72/1f50d.png?w=578&#038;ssl=1" alt="&#x1f50d;" class="wp-smiley" style="height: 1em; max-height: 1em;" data-recalc-dims="1" /></h2>



<p class="wp-block-paragraph">Published in 1986, John Canny’s paper <em>A Computational Approach to Edge Detection</em><sup data-fn="b51508a4-906a-492d-87d6-4a3a4971d9cc" class="fn"><a href="https://poissonisfish.com/2026/05/08/edge-detection-in-python/#b51508a4-906a-492d-87d6-4a3a4971d9cc" id="b51508a4-906a-492d-87d6-4a3a4971d9cc-link" rel="nofollow" target="_blank">12</a></sup> remains one of the most cited works in computer vision. Canny framed edge detection as an explicit optimisation problem and derived a detector that simultaneously maximises three criteria: <em>i</em>) good detection (few missed edges, few “false alarms”), <em>ii</em>) good localisation (detected edges close to true edges), and <em>iii</em>) single response (one response per edge, not many). The resulting algorithm is a four-step pipeline outlined below:</p>



<h3 class="wp-block-heading">Step 1 – Gaussian smoothing</h3>



<p class="wp-block-paragraph">As with the LoG, the first step is to suppress noise by convolving the image with a Gaussian kernel of width <img src="https://s0.wp.com/latex.php?latex=%5Csigma+&#038;bg=f9f9f9&#038;%23038;fg=555555&#038;%23038;s=0&#038;%23038;c=20201002" srcset_temp="https://s0.wp.com/latex.php?latex=%5Csigma+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=%5Csigma+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="\sigma " class="latex" />. The choice of <img src="https://s0.wp.com/latex.php?latex=%5Csigma+&#038;bg=f9f9f9&#038;%23038;fg=555555&#038;%23038;s=0&#038;%23038;c=20201002" srcset_temp="https://s0.wp.com/latex.php?latex=%5Csigma+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=%5Csigma+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="\sigma " class="latex" /> directly governs the trade-off between noise suppression and fine detail preservation. A larger <img src="https://s0.wp.com/latex.php?latex=%5Csigma+&#038;bg=f9f9f9&#038;%23038;fg=555555&#038;%23038;s=0&#038;%23038;c=20201002" srcset_temp="https://s0.wp.com/latex.php?latex=%5Csigma+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=%5Csigma+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="\sigma " class="latex" /> removes more noise but blurs genuine edges.</p>
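


<p class="wp-block-paragraph">The trade-off can be seen on a one-dimensional profile. The sketch below is a rough illustration under simple assumptions: a hand-built Gaussian kernel truncated at about three standard deviations, an ideal 40-sample step edge, and hypothetical helper names <code>gaussian_kernel_1d</code> and <code>steepest_slope</code>. It measures the steepest pixel-to-pixel jump that survives smoothing.</p>

```python
import numpy as np


def gaussian_kernel_1d(sigma):
    """Normalised 1D Gaussian kernel truncated at about three standard deviations."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    return kernel / kernel.sum()


# Hypothetical intensity profile: an ideal step edge from 0 to 1
profile = np.concatenate([np.zeros(20), np.ones(20)])


def steepest_slope(sigma):
    """Largest pixel-to-pixel jump left after smoothing (valid here for sigma up to 4)."""
    smoothed = np.convolve(profile, gaussian_kernel_1d(sigma), mode='same')
    # Ignore the borders, where zero padding creates artificial transitions
    return np.abs(np.diff(smoothed[12:28])).max()


for sigma in (1.0, 2.0, 4.0):
    print(f"sigma={sigma}: steepest slope = {steepest_slope(sigma):.3f}")
```

<p class="wp-block-paragraph">The steepest slope shrinks as the kernel widens: the same unit step is spread over more pixels, which is exactly the blurring of genuine edges mentioned above.</p>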



<h3 class="wp-block-heading">Step 2 – Gradient computation</h3>



<p class="wp-block-paragraph">The smoothed image is then differentiated – typically using Sobel kernels – to obtain the gradient magnitude <img src="https://s0.wp.com/latex.php?latex=%5C%7C%5Cnabla+f%5C%7C+&#038;bg=f9f9f9&#038;%23038;fg=555555&#038;%23038;s=0&#038;%23038;c=20201002" srcset_temp="https://s0.wp.com/latex.php?latex=%5C%7C%5Cnabla+f%5C%7C+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=%5C%7C%5Cnabla+f%5C%7C+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="\|\nabla f\| " class="latex" /> and direction <img src="https://s0.wp.com/latex.php?latex=%5Ctheta+&#038;bg=f9f9f9&#038;%23038;fg=555555&#038;%23038;s=0&#038;%23038;c=20201002" srcset_temp="https://s0.wp.com/latex.php?latex=%5Ctheta+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=%5Ctheta+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="\theta " class="latex" /> at every pixel.</p>



<h3 class="wp-block-heading">Step 3 – Non-maximum suppression (NMS)</h3>



<p class="wp-block-paragraph">This step thins the edges. For each pixel, Canny checks whether its gradient magnitude is a local maximum along the gradient direction – that is, whether it is larger than its two neighbours in the direction <img src="https://s0.wp.com/latex.php?latex=%5Ctheta+&#038;bg=f9f9f9&#038;%23038;fg=555555&#038;%23038;s=0&#038;%23038;c=20201002" srcset_temp="https://s0.wp.com/latex.php?latex=%5Ctheta+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=%5Ctheta+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="\theta " class="latex" />. If it is not, it is suppressed to zero. The result is a set of thin, one-pixel-wide candidate edges.</p>
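


<p class="wp-block-paragraph">A minimal sketch of this step is given below. It is not how any particular library implements it: the function name <code>non_maximum_suppression</code> is hypothetical, directions are simply rounded to the nearest of 0°, 45°, 90° and 135°, border pixels are left at zero, and ties are kept (<code>&gt;=</code>) so that flat-topped ridges do not vanish entirely.</p>

```python
import numpy as np


def non_maximum_suppression(magnitude, theta):
    """Keep a pixel only if it is at least as large as both neighbours along the gradient.

    magnitude, theta: 2D arrays of equal shape, theta in radians.
    """
    # Neighbour offsets (row, col) per quantised direction;
    # rows grow downwards, matching image coordinates
    offsets = {0: (0, 1), 45: (1, 1), 90: (1, 0), 135: (1, -1)}
    angle = np.degrees(theta) % 180            # a direction and its opposite are equivalent
    binned = (np.round(angle / 45) % 4) * 45   # 0, 45, 90 or 135

    thinned = np.zeros_like(magnitude)
    rows, cols = magnitude.shape
    for r in range(1, rows - 1):               # border pixels stay at zero
        for c in range(1, cols - 1):
            dr, dc = offsets[int(binned[r, c])]
            here = magnitude[r, c]
            ahead = magnitude[r + dr, c + dc]
            behind = magnitude[r - dr, c - dc]
            if here >= ahead and here >= behind:
                thinned[r, c] = here
    return thinned


# Hypothetical thick vertical edge: a three-pixel-wide ridge with a horizontal gradient
ridge = np.tile(np.array([0.0, 1.0, 3.0, 1.0, 0.0]), (5, 1))
thin = non_maximum_suppression(ridge, np.zeros_like(ridge))
print(thin)   # only the central column (value 3) survives in the interior rows
```

<p class="wp-block-paragraph">Production implementations usually vectorise this loop, and some interpolate between neighbours instead of rounding the direction, but the comparison along the gradient is the same idea.</p>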



<h3 class="wp-block-heading">Step 4 – Hysteresis thresholding</h3>



<p class="wp-block-paragraph">The final step uses <strong>two thresholds</strong>, <img src="https://s0.wp.com/latex.php?latex=T_%5Ctext%7Bhigh%7D+&#038;bg=f9f9f9&#038;%23038;fg=555555&#038;%23038;s=0&#038;%23038;c=20201002" srcset_temp="https://s0.wp.com/latex.php?latex=T_%5Ctext%7Bhigh%7D+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=T_%5Ctext%7Bhigh%7D+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="T_\text{high} " class="latex" /> and <img src="https://s0.wp.com/latex.php?latex=T_%5Ctext%7Blow%7D+&#038;bg=f9f9f9&#038;%23038;fg=555555&#038;%23038;s=0&#038;%23038;c=20201002" srcset_temp="https://s0.wp.com/latex.php?latex=T_%5Ctext%7Blow%7D+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=T_%5Ctext%7Blow%7D+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="T_\text{low} " class="latex" />, to prune candidate pixels. A pixel whose gradient magnitude exceeds <img src="https://s0.wp.com/latex.php?latex=T_%5Ctext%7Bhigh%7D+&#038;bg=f9f9f9&#038;%23038;fg=555555&#038;%23038;s=0&#038;%23038;c=20201002" srcset_temp="https://s0.wp.com/latex.php?latex=T_%5Ctext%7Bhigh%7D+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=T_%5Ctext%7Bhigh%7D+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="T_\text{high} " class="latex" /> is accepted as an edge, and conversely a pixel below <img src="https://s0.wp.com/latex.php?latex=T_%5Ctext%7Blow%7D+&#038;bg=f9f9f9&#038;%23038;fg=555555&#038;%23038;s=0&#038;%23038;c=20201002" srcset_temp="https://s0.wp.com/latex.php?latex=T_%5Ctext%7Blow%7D+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=T_%5Ctext%7Blow%7D+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="T_\text{low} " class="latex" /> is rejected. 
A pixel between the two thresholds is accepted only if it is connected, directly or through other such pixels, to a strong edge pixel. This connectivity analysis – the defining feature of hysteresis – ensures that long, continuous edges are preserved even when their local gradient fluctuates, while isolated noise responses are discarded. For a more visual understanding of NMS and hysteresis I recommend reading the <a href="https://docs.opencv.org/4.x/da/d22/tutorial_py_canny.html" rel="nofollow" target="_blank">Canny edge detection</a> documentation from OpenCV.</p>
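


<p class="wp-block-paragraph">The connectivity analysis can be sketched as a breadth-first search that starts from every strong pixel and spreads through neighbouring weak pixels. The function name <code>hysteresis</code> and the choice of 8-connectivity are assumptions made for illustration.</p>

```python
from collections import deque

import numpy as np


def hysteresis(magnitude, t_low, t_high):
    """Return a boolean edge mask from a (thinned) gradient magnitude map."""
    strong = magnitude > t_high
    candidate = magnitude >= t_low             # strong and weak pixels together
    edges = strong.copy()
    queue = deque(zip(*np.nonzero(strong)))    # start the search from every strong pixel
    rows, cols = magnitude.shape
    while queue:
        r, c = queue.popleft()
        # Visit the 8-connected neighbourhood, clipped to the image
        for rr in range(max(r - 1, 0), min(r + 2, rows)):
            for cc in range(max(c - 1, 0), min(c + 2, cols)):
                if candidate[rr, cc] and not edges[rr, cc]:
                    edges[rr, cc] = True
                    queue.append((rr, cc))
    return edges


# Hypothetical one-row example: a strong pixel flanked by weak ones, plus an isolated weak pixel
row = np.array([[0, 60, 200, 60, 0, 60, 0]], dtype=float)
print(hysteresis(row, t_low=50, t_high=150))
```

<p class="wp-block-paragraph">The two weak pixels next to the strong one are kept, whereas the isolated weak pixel further right is discarded even though its magnitude is identical.</p>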


<div class="wp-block-image wp-image-10050 size-large">
<figure class="aligncenter"><img loading="lazy" data-attachment-id="10050" data-permalink="https://poissonisfish.com/?attachment_id=10050" data-orig-file="https://poissonisfish.com/wp-content/uploads/2026/05/canny_workflow.png" data-orig-size="3004,1638" data-comments-opened="1" data-image-meta="{"aperture":"0","credit":"","camera":"","caption":"","created_timestamp":"0","copyright":"","focal_length":"0","iso":"0","shutter_speed":"0","title":"","orientation":"0","alt":""}" data-image-title="canny_workflow" data-image-description="" data-image-caption="" data-large-file="https://i0.wp.com/poissonisfish.com/wp-content/uploads/2026/05/canny_workflow.png?w=450&#038;ssl=1" src="https://i0.wp.com/poissonisfish.com/wp-content/uploads/2026/05/canny_workflow.png?w=450&#038;ssl=1" alt="" class="wp-image-10050" srcset_temp="https://i0.wp.com/poissonisfish.com/wp-content/uploads/2026/05/canny_workflow.png?w=450&#038;ssl=1 1024w, https://poissonisfish.com/wp-content/uploads/2026/05/canny_workflow.png?w=2048 2048w, https://poissonisfish.com/wp-content/uploads/2026/05/canny_workflow.png?w=150 150w, https://poissonisfish.com/wp-content/uploads/2026/05/canny_workflow.png?w=300 300w, https://poissonisfish.com/wp-content/uploads/2026/05/canny_workflow.png?w=768 768w, https://poissonisfish.com/wp-content/uploads/2026/05/canny_workflow.png?w=1440 1440w" sizes="(max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /><figcaption class="wp-element-caption">The Canny pipeline applied to a sample image. From left to right: original greyscale input, Gaussian-blurred image, Sobel gradient magnitude, output of NMS, and final edge map after hysteresis thresholding. Each step removes a specific failure mode of the previous one (AI-generated image)</figcaption></figure>
</div>


<p class="wp-block-paragraph">The elegance of Canny lies in how each step addresses a specific failure mode of earlier operators. In essence, Gaussian smoothing tackles noise, NMS tackles thick edges, and hysteresis tackles the trade-off between false edges and broken edges that a single threshold cannot solve.</p>



<h1 class="wp-block-heading">Let’s get started with Python</h1>



<p class="wp-block-paragraph">Time to practise! We will first build the separate components (Gaussian blur, Sobel gradients, LoG zero-crossings), then run the full Canny pipeline and explore how its parameters trade off recall against noise. We will use <code>opencv</code>, <code>scikit-image</code> and <code>scipy</code> alongside the usual suspects <code>numpy</code> and <code>matplotlib</code>. You can install all packages using the following shell command:</p>



<p class="wp-block-paragraph"><code>pip install opencv-python scikit-image scipy matplotlib numpy</code></p>



<h2 class="wp-block-heading">Image loading and preprocessing</h2>



<p class="wp-block-paragraph">We start by loading a greyscale image. For demonstration purposes I use a stock picture from <code>scikit-image</code> – feel free to use any other image of your choice.</p>


<div class="wp-block-code">
	<div class="cm-editor">
		<div class="cm-scroller">
			
<pre>
import cv2
import numpy as np
import matplotlib.pyplot as plt
from skimage import data
from scipy.ndimage import gaussian_laplace

# Load a greyscale test image (uint8, values 0–255)
image = data.camera()

fig, ax = plt.subplots(figsize=(5, 5))
ax.imshow(image, cmap=&apos;gray&apos;)
ax.set_title(&apos;Original image&apos;)
ax.axis(&apos;off&apos;)
plt.tight_layout()
plt.show()</pre>
		</div>
	</div>
</div>

<div class="wp-block-image">
<figure class="aligncenter size-full is-resized"><img data-attachment-id="10134" data-permalink="https://poissonisfish.com/2026/05/08/edge-detection-in-python/camera_original/" data-orig-file="https://i2.wp.com/poissonisfish.com/wp-content/uploads/2026/05/camera_original-1.png?w=578&#038;ssl=1" data-orig-size="1000,1000" data-comments-opened="1" data-image-meta="{"aperture":"0","credit":"","camera":"","caption":"","created_timestamp":"0","copyright":"","focal_length":"0","iso":"0","shutter_speed":"0","title":"","orientation":"0","alt":""}" data-image-title="camera_original" data-image-description="" data-image-caption="" data-large-file="https://i2.wp.com/poissonisfish.com/wp-content/uploads/2026/05/camera_original-1.png?w=578&#038;ssl=1?w=1000" src="https://i2.wp.com/poissonisfish.com/wp-content/uploads/2026/05/camera_original-1.png?w=578&#038;ssl=1" alt="" class="wp-image-10134" style="width:650px" data-recalc-dims="1" /></figure>
</div>


<h2 class="wp-block-heading">The Laplacian of Gaussian and zero-crossings</h2>



<p class="wp-block-paragraph">Let us inspect the LoG response and its zero-crossings – the theoretical backbone we discussed earlier. </p>


<div class="wp-block-code">
	<div class="cm-editor">
		<div class="cm-scroller">
			
<pre>
# LoG: positive sigma = apply Gaussian of that std, then Laplacian
log_response = gaussian_laplace(image.astype(float), sigma=2.0)

# Zero-crossings: sign changes between neighbouring pixels
def zero_crossings(log_img):
    &quot;&quot;&quot;Return a binary mask of zero-crossing locations.&quot;&quot;&quot;
    zc = np.zeros_like(log_img, dtype=bool)
    # Check horizontal and vertical sign changes
    for shift in [(0, 1), (1, 0)]:
        shifted = np.roll(log_img, shift=shift, axis=(0, 1))
        zc |= (np.sign(log_img) != np.sign(shifted))
    return zc

zc_mask = zero_crossings(log_response)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].imshow(log_response, cmap=&apos;RdBu_r&apos;)
axes[0].set_title(&apos;LoG response (σ=2.0)&apos;)
axes[0].axis(&apos;off&apos;)
axes[1].imshow(zc_mask, cmap=&apos;gray&apos;)
axes[1].set_title(&apos;Zero-crossings&apos;)
axes[1].axis(&apos;off&apos;)
plt.tight_layout()
plt.show()</pre>
		</div>
	</div>
</div>


<figure class="wp-block-image size-large"><img loading="lazy" data-attachment-id="10137" data-permalink="https://poissonisfish.com/2026/05/08/edge-detection-in-python/camera_zerocross/" data-orig-file="https://poissonisfish.com/wp-content/uploads/2026/05/camera_zerocross-1.png" data-orig-size="2000,800" data-comments-opened="1" data-image-meta="{"aperture":"0","credit":"","camera":"","caption":"","created_timestamp":"0","copyright":"","focal_length":"0","iso":"0","shutter_speed":"0","title":"","orientation":"0","alt":""}" data-image-title="camera_zerocross" data-image-description="" data-image-caption="" data-large-file="https://i2.wp.com/poissonisfish.com/wp-content/uploads/2026/05/camera_zerocross-1.png?w=450&#038;ssl=1" src="https://i2.wp.com/poissonisfish.com/wp-content/uploads/2026/05/camera_zerocross-1.png?w=450&#038;ssl=1" alt="" class="wp-image-10137" srcset_temp="https://i2.wp.com/poissonisfish.com/wp-content/uploads/2026/05/camera_zerocross-1.png?w=450&#038;ssl=1 1024w, https://poissonisfish.com/wp-content/uploads/2026/05/camera_zerocross-1.png?w=150 150w, https://poissonisfish.com/wp-content/uploads/2026/05/camera_zerocross-1.png?w=300 300w, https://poissonisfish.com/wp-content/uploads/2026/05/camera_zerocross-1.png?w=768 768w, https://poissonisfish.com/wp-content/uploads/2026/05/camera_zerocross-1.png?w=1440 1440w, https://poissonisfish.com/wp-content/uploads/2026/05/camera_zerocross-1.png 2000w" sizes="(max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /></figure>



<p class="wp-block-paragraph">Notice how the zero-crossing map already captures much of the scene’s edge structure, but it is sensitive to low-level noise and retains spurious responses in flat regions. This motivates the additional refinement steps of the Canny algorithm.</p>
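


<p class="wp-block-paragraph">One simple way to tame those spurious responses, short of the full Canny machinery, is to keep only sign changes that span a minimum jump in LoG response. The sketch below is an illustrative variant and not part of the original workflow: the function name <code>strong_zero_crossings</code> and the parameter <code>min_jump</code> are hypothetical, and the threshold would need tuning for each image. Neighbours are compared by slicing, so nothing wraps around at the image borders.</p>

```python
import numpy as np


def strong_zero_crossings(log_img, min_jump):
    """Zero-crossings whose sign change spans at least min_jump in LoG response."""
    zc = np.zeros(log_img.shape, dtype=bool)
    pairs = [
        (log_img[:, :-1], log_img[:, 1:], zc[:, :-1]),   # horizontal neighbours
        (log_img[:-1, :], log_img[1:, :], zc[:-1, :]),   # vertical neighbours
    ]
    for a, b, target in pairs:
        sign_change = np.sign(a) != np.sign(b)
        big_enough = np.abs(a - b) >= min_jump
        target |= np.logical_and(sign_change, big_enough)   # writes into zc through the view
    return zc


# Hypothetical LoG response: one clear crossing, then one caused by tiny fluctuations
response = np.array([[-5.0, 5.0, 0.01, -0.01]])
print(strong_zero_crossings(response, min_jump=1.0))
print(strong_zero_crossings(response, min_jump=0.0))
```

<p class="wp-block-paragraph">With <code>min_jump=1.0</code> only the first crossing is retained; with <code>min_jump=0.0</code> the function falls back to plain sign-change detection and the near-zero fluctuation is flagged as well.</p>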



<h2 class="wp-block-heading">Gaussian smoothing and gradient computation</h2>



<p class="wp-block-paragraph">Before running the full Canny pipeline, it is instructive to inspect the intermediate steps. Here we apply a Gaussian blur and then compute Sobel gradients manually.</p>


<div class="wp-block-code">
	<div class="cm-editor">
		<div class="cm-scroller">
			
<pre>
# Gaussian blur: ksize sets the kernel extent (must be odd), sigmaX the standard deviation
blurred = cv2.GaussianBlur(image, ksize=(5, 5), sigmaX=1.4)

# Sobel gradients in x and y
Gx = cv2.Sobel(blurred, cv2.CV_64F, dx=1, dy=0, ksize=3)
Gy = cv2.Sobel(blurred, cv2.CV_64F, dx=0, dy=1, ksize=3)

# Gradient magnitude, rescaled to 0–255 for display
magnitude = np.sqrt(Gx**2 + Gy**2)
magnitude = (magnitude / magnitude.max() * 255).astype(np.uint8)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, img, title in zip(axes, [blurred, magnitude], [&apos;Gaussian blur (σ=1.4)&apos;, &apos;|∇f| - Sobel magnitude&apos;]):
    ax.imshow(img, cmap=&apos;gray&apos;)
    ax.set_title(title)
    ax.axis(&apos;off&apos;)
plt.tight_layout()
plt.show()</pre>
		</div>
	</div>
</div>


<figure class="wp-block-image size-large"><img loading="lazy" data-attachment-id="10136" data-permalink="https://poissonisfish.com/2026/05/08/edge-detection-in-python/camera_smooth_grad/" data-orig-file="https://poissonisfish.com/wp-content/uploads/2026/05/camera_smooth_grad-1.png" data-orig-size="2000,800" data-comments-opened="1" data-image-meta="{"aperture":"0","credit":"","camera":"","caption":"","created_timestamp":"0","copyright":"","focal_length":"0","iso":"0","shutter_speed":"0","title":"","orientation":"0","alt":""}" data-image-title="camera_smooth_grad" data-image-description="" data-image-caption="" data-large-file="https://i1.wp.com/poissonisfish.com/wp-content/uploads/2026/05/camera_smooth_grad-1.png?w=450&#038;ssl=1" src="https://i1.wp.com/poissonisfish.com/wp-content/uploads/2026/05/camera_smooth_grad-1.png?w=450&#038;ssl=1" alt="" class="wp-image-10136" srcset_temp="https://i1.wp.com/poissonisfish.com/wp-content/uploads/2026/05/camera_smooth_grad-1.png?w=450&#038;ssl=1 1024w, https://poissonisfish.com/wp-content/uploads/2026/05/camera_smooth_grad-1.png?w=150 150w, https://poissonisfish.com/wp-content/uploads/2026/05/camera_smooth_grad-1.png?w=300 300w, https://poissonisfish.com/wp-content/uploads/2026/05/camera_smooth_grad-1.png?w=768 768w, https://poissonisfish.com/wp-content/uploads/2026/05/camera_smooth_grad-1.png?w=1440 1440w, https://poissonisfish.com/wp-content/uploads/2026/05/camera_smooth_grad-1.png 2000w" sizes="(max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /></figure>



<h2 class="wp-block-heading">Canny edge detection with OpenCV</h2>



<p class="wp-block-paragraph">The OpenCV <code>Canny()</code> function accepts the image, the two hysteresis thresholds, and an optional aperture size for the Sobel operator. Crucially, the Gaussian smoothing step should be applied manually beforehand so you have full control over <img src="https://s0.wp.com/latex.php?latex=%5Csigma+&#038;bg=f9f9f9&#038;%23038;fg=555555&#038;%23038;s=0&#038;%23038;c=20201002" srcset_temp="https://s0.wp.com/latex.php?latex=%5Csigma+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=%5Csigma+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="\sigma " class="latex" />.</p>


<div class="wp-block-code">
	<div class="cm-editor">
		<div class="cm-scroller">
			
<pre>
def run_canny(image, sigma, t_low, t_high, aperture=3):
    &quot;&quot;&quot;Apply Gaussian blur then Canny edge detection.&quot;&quot;&quot;
    # Kernel size: 2 * ceil(3*sigma) + 1 ensures the kernel covers ±3σ
    ksize = 2 * int(np.ceil(3 * sigma)) + 1
    blurred = cv2.GaussianBlur(image, (ksize, ksize), sigmaX=sigma)
    edges = cv2.Canny(blurred, threshold1=t_low, threshold2=t_high, apertureSize=aperture)
    return edges

edges = run_canny(image, sigma=1.4, t_low=50, t_high=150)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].imshow(image, cmap=&apos;gray&apos;)
axes[0].set_title(&apos;Original&apos;)
axes[0].axis(&apos;off&apos;)
axes[1].imshow(edges, cmap=&apos;gray&apos;)
axes[1].set_title(&apos;Canny edges (σ=1.4, T_low=50, T_high=150)&apos;)
axes[1].axis(&apos;off&apos;)
plt.tight_layout()
plt.show()</pre>
		</div>
	</div>
</div>


<figure class="wp-block-image size-large"><img loading="lazy" data-attachment-id="10132" data-permalink="https://poissonisfish.com/2026/05/08/edge-detection-in-python/camera_canny/" data-orig-file="https://poissonisfish.com/wp-content/uploads/2026/05/camera_canny-1.png" data-orig-size="2000,800" data-comments-opened="1" data-image-meta="{"aperture":"0","credit":"","camera":"","caption":"","created_timestamp":"0","copyright":"","focal_length":"0","iso":"0","shutter_speed":"0","title":"","orientation":"0","alt":""}" data-image-title="camera_canny" data-image-description="" data-image-caption="" data-large-file="https://i1.wp.com/poissonisfish.com/wp-content/uploads/2026/05/camera_canny-1.png?w=450&#038;ssl=1" src="https://i1.wp.com/poissonisfish.com/wp-content/uploads/2026/05/camera_canny-1.png?w=450&#038;ssl=1" alt="" class="wp-image-10132" srcset_temp="https://i1.wp.com/poissonisfish.com/wp-content/uploads/2026/05/camera_canny-1.png?w=450&#038;ssl=1 1024w, https://poissonisfish.com/wp-content/uploads/2026/05/camera_canny-1.png?w=150 150w, https://poissonisfish.com/wp-content/uploads/2026/05/camera_canny-1.png?w=300 300w, https://poissonisfish.com/wp-content/uploads/2026/05/camera_canny-1.png?w=768 768w, https://poissonisfish.com/wp-content/uploads/2026/05/camera_canny-1.png?w=1440 1440w, https://poissonisfish.com/wp-content/uploads/2026/05/camera_canny-1.png 2000w" sizes="(max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /></figure>
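


<p class="wp-block-paragraph">As a quick check of the kernel-size rule used in <code>run_canny</code>, the snippet below evaluates 2·⌈3σ⌉ + 1 for a few values of σ. The helper name <code>gaussian_ksize</code> is made up for this illustration.</p>

```python
import math


def gaussian_ksize(sigma):
    """Odd kernel width covering about three standard deviations on each side."""
    return 2 * math.ceil(3 * sigma) + 1


for sigma in (0.5, 1.4, 2.0, 4.0):
    print(f"sigma={sigma}: ksize={gaussian_ksize(sigma)}")
# sigma=0.5: ksize=5
# sigma=1.4: ksize=11
# sigma=2.0: ksize=13
# sigma=4.0: ksize=25
```

<p class="wp-block-paragraph">Note that the earlier manual example fixed <code>ksize=(5, 5)</code> with σ=1.4, which covers only about ±1.4σ, so its blur is slightly more truncated than the one <code>run_canny</code> applies for the same σ.</p>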



<h2 class="wp-block-heading">The effect of hysteresis thresholds</h2>



<p class="wp-block-paragraph">The ratio <img src="https://s0.wp.com/latex.php?latex=T_%5Ctext%7Bhigh%7D+%2F+T_%5Ctext%7Blow%7D+&#038;bg=f9f9f9&#038;%23038;fg=555555&#038;%23038;s=0&#038;%23038;c=20201002" srcset_temp="https://s0.wp.com/latex.php?latex=T_%5Ctext%7Bhigh%7D+%2F+T_%5Ctext%7Blow%7D+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=T_%5Ctext%7Bhigh%7D+%2F+T_%5Ctext%7Blow%7D+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="T_\text{high} / T_\text{low} " class="latex" /> is at least as important as the absolute values. A common rule of thumb is to use a 2:1 or 3:1 ratio. Let us explore this now.</p>


<div class="wp-block-code">
	<div class="cm-editor">
		<div class="cm-scroller">
			
<pre>
fig, axes = plt.subplots(2, 3, figsize=(14, 8))
configs = [
    (1.4, 20, 60, &apos;↑ Recall, ↑ Noise\nσ=1.4, T=20/60&apos;),
    (1.4, 50, 150, &apos;Balanced\nσ=1.4, T=50/150&apos;),
    (1.4, 100, 200, &apos;↓ Recall, ↓ Noise\nσ=1.4, T=100/200&apos;),
    (0.5, 50, 150, &apos;Fine scale\nσ=0.5, T=50/150&apos;),
    (2.0, 50, 150, &apos;Coarse scale\nσ=2.0, T=50/150&apos;),
    (4.0, 50, 150, &apos;Very coarse scale\nσ=4.0, T=50/150&apos;),
]

for ax, (sigma, tl, th, title) in zip(axes.ravel(), configs):
    result = run_canny(image, sigma=sigma, t_low=tl, t_high=th)
    ax.imshow(result, cmap=&apos;gray&apos;)
    ax.set_title(title, fontsize=9)
    ax.axis(&apos;off&apos;)

plt.tight_layout()
plt.show()</pre>
		</div>
	</div>
</div>


<figure class="wp-block-image size-large"><img loading="lazy" data-attachment-id="10133" data-permalink="https://poissonisfish.com/2026/05/08/edge-detection-in-python/camera_hyst_thresh/" data-orig-file="https://poissonisfish.com/wp-content/uploads/2026/05/camera_hyst_thresh-1.png" data-orig-size="2800,1600" data-comments-opened="1" data-image-meta="{"aperture":"0","credit":"","camera":"","caption":"","created_timestamp":"0","copyright":"","focal_length":"0","iso":"0","shutter_speed":"0","title":"","orientation":"0","alt":""}" data-image-title="camera_hyst_thresh" data-image-description="" data-image-caption="" data-large-file="https://i0.wp.com/poissonisfish.com/wp-content/uploads/2026/05/camera_hyst_thresh-1.png?w=450&#038;ssl=1" src="https://i0.wp.com/poissonisfish.com/wp-content/uploads/2026/05/camera_hyst_thresh-1.png?w=450&#038;ssl=1" alt="" class="wp-image-10133" srcset_temp="https://i0.wp.com/poissonisfish.com/wp-content/uploads/2026/05/camera_hyst_thresh-1.png?w=450&#038;ssl=1 1024w, https://poissonisfish.com/wp-content/uploads/2026/05/camera_hyst_thresh-1.png?w=2048 2048w, https://poissonisfish.com/wp-content/uploads/2026/05/camera_hyst_thresh-1.png?w=150 150w, https://poissonisfish.com/wp-content/uploads/2026/05/camera_hyst_thresh-1.png?w=300 300w, https://poissonisfish.com/wp-content/uploads/2026/05/camera_hyst_thresh-1.png?w=768 768w, https://poissonisfish.com/wp-content/uploads/2026/05/camera_hyst_thresh-1.png?w=1440 1440w" sizes="(max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /></figure>



<p class="wp-block-paragraph">The top row demonstrates the threshold effect – lower thresholds recover more edges but also more noise, higher thresholds yield cleaner output at the cost of broken contours. The bottom row shows the scale effect governed by <img src="https://s0.wp.com/latex.php?latex=%5Csigma+&#038;bg=f9f9f9&#038;%23038;fg=555555&#038;%23038;s=0&#038;%23038;c=20201002" srcset_temp="https://s0.wp.com/latex.php?latex=%5Csigma+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=%5Csigma+&#038;bg=f9f9f9&#038;fg=555555&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="\sigma " class="latex" /> – at small scales the detector responds to fine texture and noise, at large scales only the dominant structural boundaries survive.</p>
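


<p class="wp-block-paragraph">To put a rough number on these visual differences, one option is the fraction of pixels marked as edges. The sketch below assumes a binary edge map in which non-zero means edge, as returned by <code>cv2.Canny</code> (0 or 255). The helper <code>edge_density</code> and the two 4×4 maps are hypothetical stand-ins for real detector output.</p>

```python
import numpy as np


def edge_density(edge_map):
    """Fraction of pixels marked as edges in a binary (0 / non-zero) edge map."""
    return np.count_nonzero(edge_map) / edge_map.size


# Hypothetical 4x4 edge maps standing in for two detector configurations
permissive = np.array([[255, 255, 0, 0],
                       [0, 255, 255, 0],
                       [0, 0, 255, 255],
                       [255, 0, 0, 255]], dtype=np.uint8)
strict = np.array([[255, 0, 0, 0],
                   [0, 255, 0, 0],
                   [0, 0, 255, 0],
                   [0, 0, 0, 255]], dtype=np.uint8)

print(edge_density(permissive))  # 0.5
print(edge_density(strict))      # 0.25
```

<p class="wp-block-paragraph">A single number says nothing about contour continuity, so it complements rather than replaces visual inspection, but it makes it easy to rank configurations from permissive to strict.</p>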



<h2 class="wp-block-heading">Overlaying edges on the original image</h2>



<p class="wp-block-paragraph">A useful visualisation is to overlay the detected edges on the original image. This makes it easier to judge how well the detected edges follow the true object boundaries.</p>


<div class="wp-block-code">
	<div class="cm-editor">
		<div class="cm-scroller">
			
<pre>
# Convert to RGB so we can draw edges in red
overlay = cv2.cvtColor(image, cv2.COLOR_GRAY2RGB)
overlay[edges &gt; 0] = [220, 40, 40] # red edges

fig, ax = plt.subplots(figsize=(6, 6))
ax.imshow(overlay)
ax.set_title(&apos;Canny edges overlaid (σ=1.4, T=50/150)&apos;)
ax.axis(&apos;off&apos;)
plt.tight_layout()
plt.show()</pre>
		</div>
	</div>
</div>

<div class="wp-block-image">
<figure class="aligncenter size-large is-resized"><img loading="lazy" data-attachment-id="10135" data-permalink="https://poissonisfish.com/2026/05/08/edge-detection-in-python/camera_overlay/" data-orig-file="https://poissonisfish.com/wp-content/uploads/2026/05/camera_overlay-1.png" data-orig-size="1200,1200" data-comments-opened="1" data-image-meta="{"aperture":"0","credit":"","camera":"","caption":"","created_timestamp":"0","copyright":"","focal_length":"0","iso":"0","shutter_speed":"0","title":"","orientation":"0","alt":""}" data-image-title="camera_overlay" data-image-description="" data-image-caption="" data-large-file="https://i2.wp.com/poissonisfish.com/wp-content/uploads/2026/05/camera_overlay-1.png?w=450&#038;ssl=1" src="https://i2.wp.com/poissonisfish.com/wp-content/uploads/2026/05/camera_overlay-1.png?w=450&#038;ssl=1" alt="" class="wp-image-10135" style="width:650px" srcset_temp="https://i2.wp.com/poissonisfish.com/wp-content/uploads/2026/05/camera_overlay-1.png?w=450&#038;ssl=1 1024w, https://poissonisfish.com/wp-content/uploads/2026/05/camera_overlay-1.png?w=150 150w, https://poissonisfish.com/wp-content/uploads/2026/05/camera_overlay-1.png?w=300 300w, https://poissonisfish.com/wp-content/uploads/2026/05/camera_overlay-1.png?w=768 768w, https://poissonisfish.com/wp-content/uploads/2026/05/camera_overlay-1.png 1200w" sizes="(max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /></figure>
</div>


<p class="wp-block-paragraph">As with the butterfly picture, the detector precisely identifies the sharpest edges in the image.</p>



<h1 class="wp-block-heading">Conclusion <img src="https://i1.wp.com/s0.wp.com/wp-content/mu-plugins/wpcom-smileys/twemoji/2/72x72/1f3c1.png?w=578&#038;ssl=1" alt="&#x1f3c1;" class="wp-smiley" style="height: 1em; max-height: 1em;" data-recalc-dims="1" /></h1>



<p class="wp-block-paragraph">We have traced a line from the centre-surround receptive fields of retinal ganglion cells to the LoG operator and Marr’s zero-crossings, and from there to the Canny detector – one of the most popular algorithms in image processing. The key ideas are worth summarising:</p>



<ul class="wp-block-list">
<li><strong>Edges are zero-crossings of the second derivative</strong> of image intensity, an idea Marr derived from first principles and validated against neurophysiology <img src="https://i0.wp.com/s0.wp.com/wp-content/mu-plugins/wpcom-smileys/twemoji/2/72x72/1f9e0.png?w=578&#038;ssl=1" alt="&#x1f9e0;" class="wp-smiley" style="height: 1em; max-height: 1em;" data-recalc-dims="1" /></li>



<li><strong>The LoG operator</strong> implements this computationally: a Gaussian pre-filter controls scale and suppresses noise, whereas the Laplacian finds sign changes <img src="https://i0.wp.com/s0.wp.com/wp-content/mu-plugins/wpcom-smileys/twemoji/2/72x72/1f4bb.png?w=578&#038;ssl=1" alt="&#x1f4bb;" class="wp-smiley" style="height: 1em; max-height: 1em;" data-recalc-dims="1" /></li>



<li><strong>Canny refines the idea</strong> with NMS for thin, well-localised edges, and hysteresis thresholding to preserve continuous contours without fragmenting them <img src="https://i1.wp.com/s0.wp.com/wp-content/mu-plugins/wpcom-smileys/twemoji/2/72x72/1f3d9.png?w=578&#038;ssl=1" alt="&#x1f3d9;" class="wp-smiley" style="height: 1em; max-height: 1em;" data-recalc-dims="1" /></li>
</ul>



<p class="wp-block-paragraph">Edge detection may seem like a solved problem in an era of end-to-end learned vision systems, but it remains the conceptual foundation of a surprisingly wide range of techniques. Some practical applications worth exploring on your own include:</p>



<ul class="wp-block-list">
<li><strong>Hough transform</strong> for line and circle detection – it operates directly on edge maps</li>



<li><strong>Contour-based object detection</strong> – a classical pre-deep-learning approach that is still competitive in constrained domains</li>



<li><strong>Medical image segmentation</strong> – where edge-based pre-processing still complements learned models for thin-structure detection</li>
</ul>



<p class="wp-block-paragraph">That brings us to a close – thanks for reading, I hope this post was insightful and entertaining. Stay curious! <img src="https://i1.wp.com/s0.wp.com/wp-content/mu-plugins/wpcom-smileys/twemoji/2/72x72/1f4a1.png?w=578&#038;ssl=1" alt="&#x1f4a1;" class="wp-smiley" style="height: 1em; max-height: 1em;" data-recalc-dims="1" /></p>



<h1 class="wp-block-heading">References <img src="https://i0.wp.com/s0.wp.com/wp-content/mu-plugins/wpcom-smileys/twemoji/2/72x72/1f4d6.png?w=578&#038;ssl=1" alt="&#x1f4d6;" class="wp-smiley" style="height: 1em; max-height: 1em;" data-recalc-dims="1" /></h1>


<ol class="wp-block-footnotes"><li id="563b7add-8b04-4fd2-a688-2383895c42c9">Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., &#038; Polosukhin, I. (2017). <em>Attention Is All You Need.</em> arXiv:1706.03762. <a href="https://poissonisfish.com/2026/05/08/edge-detection-in-python/#563b7add-8b04-4fd2-a688-2383895c42c9-link" rel="nofollow" target="_blank"><img src="https://i0.wp.com/s0.wp.com/wp-content/mu-plugins/wpcom-smileys/twemoji/2/72x72/21a9.png?w=578&#038;ssl=1" alt="&#x21a9;" class="wp-smiley" style="height: 1em; max-height: 1em;" data-recalc-dims="1" />︎</a></li><li id="bd3dda1f-27f7-44e9-8d0f-9a42c28ed201">Vision Banana team, Google DeepMind (2026). <em>Image Generators are Generalist Vision Learners.</em> arXiv:2604.20329. <a href="https://poissonisfish.com/2026/05/08/edge-detection-in-python/#bd3dda1f-27f7-44e9-8d0f-9a42c28ed201-link" rel="nofollow" target="_blank"><img src="https://i0.wp.com/s0.wp.com/wp-content/mu-plugins/wpcom-smileys/twemoji/2/72x72/21a9.png?w=578&#038;ssl=1" alt="&#x21a9;" class="wp-smiley" style="height: 1em; max-height: 1em;" data-recalc-dims="1" />︎</a></li><li id="63b9bf52-e78a-405a-bc33-5082dc51f74e">Robicheaux, P. <em>et al.</em> (2025). <em>RF-DETR: Neural Architecture Search for Real-Time Detection Transformers.</em> arXiv:2511.09554 (ICLR 2026). <a href="https://poissonisfish.com/2026/05/08/edge-detection-in-python/#63b9bf52-e78a-405a-bc33-5082dc51f74e-link" rel="nofollow" target="_blank"><img src="https://i0.wp.com/s0.wp.com/wp-content/mu-plugins/wpcom-smileys/twemoji/2/72x72/21a9.png?w=578&#038;ssl=1" alt="&#x21a9;" class="wp-smiley" style="height: 1em; max-height: 1em;" data-recalc-dims="1" />︎</a></li><li id="9740d4a1-35a7-4003-9e9f-63a4fa16b90b">Kudithipudi, D. <em>et al.</em> (2025). <em>Neuromorphic computing at scale.</em> Nature 637, 801–812. 
<a href="https://poissonisfish.com/2026/05/08/edge-detection-in-python/#9740d4a1-35a7-4003-9e9f-63a4fa16b90b-link" rel="nofollow" target="_blank"><img src="https://i0.wp.com/s0.wp.com/wp-content/mu-plugins/wpcom-smileys/twemoji/2/72x72/21a9.png?w=578&#038;ssl=1" alt="&#x21a9;" class="wp-smiley" style="height: 1em; max-height: 1em;" data-recalc-dims="1" />︎</a></li><li id="58f9bd6d-ec82-4b9b-8285-5f8b083184ad">Ashtiani, F., Idjadi, M. H., &#038; Kim, K. (2026). <em>Integrated photonic neural network with on-chip backpropagation training.</em> Nature 651, 927–932. <a href="https://poissonisfish.com/2026/05/08/edge-detection-in-python/#58f9bd6d-ec82-4b9b-8285-5f8b083184ad-link" rel="nofollow" target="_blank"><img src="https://i0.wp.com/s0.wp.com/wp-content/mu-plugins/wpcom-smileys/twemoji/2/72x72/21a9.png?w=578&#038;ssl=1" alt="&#x21a9;" class="wp-smiley" style="height: 1em; max-height: 1em;" data-recalc-dims="1" />︎</a></li><li id="7b848804-3070-4e9f-bc5f-ea3e60f0bf14">Assran, M. <em>et al.</em> (2023). <em>Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture (I-JEPA).</em> arXiv:2301.08243. <a href="https://poissonisfish.com/2026/05/08/edge-detection-in-python/#7b848804-3070-4e9f-bc5f-ea3e60f0bf14-link" rel="nofollow" target="_blank"><img src="https://i0.wp.com/s0.wp.com/wp-content/mu-plugins/wpcom-smileys/twemoji/2/72x72/21a9.png?w=578&#038;ssl=1" alt="&#x21a9;" class="wp-smiley" style="height: 1em; max-height: 1em;" data-recalc-dims="1" />︎</a></li><li id="d6a1d31f-a93f-4844-8b90-70b4da955016">Marr, D. (1982). <em>Vision: A Computational Investigation into the Human Representation and Processing of Visual Information.</em> W. H. Freeman; reissued by MIT Press (2010). 
<a href="https://poissonisfish.com/2026/05/08/edge-detection-in-python/#d6a1d31f-a93f-4844-8b90-70b4da955016-link" rel="nofollow" target="_blank"><img src="https://i0.wp.com/s0.wp.com/wp-content/mu-plugins/wpcom-smileys/twemoji/2/72x72/21a9.png?w=578&#038;ssl=1" alt="&#x21a9;" class="wp-smiley" style="height: 1em; max-height: 1em;" data-recalc-dims="1" />︎</a></li><li id="210cf9a3-ea19-4402-b37e-c2535bd96366">Darwin, C. (1859). <em>On the Origin of Species by Means of Natural Selection.</em> John Murray, London. <a href="https://poissonisfish.com/2026/05/08/edge-detection-in-python/#210cf9a3-ea19-4402-b37e-c2535bd96366-link" rel="nofollow" target="_blank"><img src="https://i0.wp.com/s0.wp.com/wp-content/mu-plugins/wpcom-smileys/twemoji/2/72x72/21a9.png?w=578&#038;ssl=1" alt="&#x21a9;" class="wp-smiley" style="height: 1em; max-height: 1em;" data-recalc-dims="1" />︎</a></li><li id="8d6afae8-ee06-495e-9b78-9705d6088f63">Thompson, D’A. W. (1917). <em>On Growth and Form.</em> Cambridge University Press. <a href="https://poissonisfish.com/2026/05/08/edge-detection-in-python/#8d6afae8-ee06-495e-9b78-9705d6088f63-link" rel="nofollow" target="_blank"><img src="https://i0.wp.com/s0.wp.com/wp-content/mu-plugins/wpcom-smileys/twemoji/2/72x72/21a9.png?w=578&#038;ssl=1" alt="&#x21a9;" class="wp-smiley" style="height: 1em; max-height: 1em;" data-recalc-dims="1" />︎</a></li><li id="ab32dfeb-f0f5-476f-9b2a-ad0939f8514f">Julesz, B. (1971). <em>Foundations of Cyclopean Perception.</em> University of Chicago Press. <a href="https://poissonisfish.com/2026/05/08/edge-detection-in-python/#ab32dfeb-f0f5-476f-9b2a-ad0939f8514f-link" rel="nofollow" target="_blank"><img src="https://i0.wp.com/s0.wp.com/wp-content/mu-plugins/wpcom-smileys/twemoji/2/72x72/21a9.png?w=578&#038;ssl=1" alt="&#x21a9;" class="wp-smiley" style="height: 1em; max-height: 1em;" data-recalc-dims="1" />︎</a></li><li id="4b5583df-7b37-4973-9b87-cd2a13af4711">Marr, D., &#038; Hildreth, E. (1980). 
<em>Theory of edge detection.</em> Proceedings of the Royal Society of London B, 207(1167), 187–217. <a href="https://poissonisfish.com/2026/05/08/edge-detection-in-python/#4b5583df-7b37-4973-9b87-cd2a13af4711-link" rel="nofollow" target="_blank"><img src="https://i0.wp.com/s0.wp.com/wp-content/mu-plugins/wpcom-smileys/twemoji/2/72x72/21a9.png?w=578&#038;ssl=1" alt="&#x21a9;" class="wp-smiley" style="height: 1em; max-height: 1em;" data-recalc-dims="1" />︎</a></li><li id="b51508a4-906a-492d-87d6-4a3a4971d9cc">Canny, J. (1986). <em>A Computational Approach to Edge Detection.</em> IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8(6), 679–698. <a href="https://poissonisfish.com/2026/05/08/edge-detection-in-python/#b51508a4-906a-492d-87d6-4a3a4971d9cc-link" rel="nofollow" target="_blank"><img src="https://i0.wp.com/s0.wp.com/wp-content/mu-plugins/wpcom-smileys/twemoji/2/72x72/21a9.png?w=578&#038;ssl=1" alt="&#x21a9;" class="wp-smiley" style="height: 1em; max-height: 1em;" data-recalc-dims="1" />︎</a></li></ol>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://poissonisfish.com/2026/05/08/edge-detection-in-python/"> poissonisfish</a></strong>.</div>
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/edge-detection-in-python/">Edge detection in Python</a>]]></content:encoded>
					
		
		<enclosure url="https://1.gravatar.com/avatar/ddaccde3ecefe0821900911d3cd41d541083048d067f5e78cd9d597f0ea3ceaa?s=96&#038;d=identicon&#038;r=G" length="0" type="" />
<enclosure url="https://poissonisfish.com/wp-content/uploads/2026/05/butterfly_canny.png" length="0" type="" />
<enclosure url="https://poissonisfish.com/wp-content/uploads/2026/03/step_edge_derivatives.png" length="0" type="" />
<enclosure url="https://poissonisfish.com/wp-content/uploads/2026/05/laplacian_of_gaussian.png" length="0" type="" />
<enclosure url="https://poissonisfish.com/wp-content/uploads/2026/05/zero_cross_deblurred.png?w=1024" length="0" type="" />
<enclosure url="https://poissonisfish.com/wp-content/uploads/2026/05/prewitt_sobel.png" length="0" type="" />
<enclosure url="https://poissonisfish.com/wp-content/uploads/2026/05/canny_workflow.png?w=1024" length="0" type="" />
<enclosure url="https://poissonisfish.com/wp-content/uploads/2026/05/camera_original-1.png" length="0" type="" />
<enclosure url="https://poissonisfish.com/wp-content/uploads/2026/05/camera_zerocross-1.png?w=1024" length="0" type="" />
<enclosure url="https://poissonisfish.com/wp-content/uploads/2026/05/camera_smooth_grad-1.png?w=1024" length="0" type="" />
<enclosure url="https://poissonisfish.com/wp-content/uploads/2026/05/camera_canny-1.png?w=1024" length="0" type="" />
<enclosure url="https://poissonisfish.com/wp-content/uploads/2026/05/camera_hyst_thresh-1.png?w=1024" length="0" type="" />
<enclosure url="https://poissonisfish.com/wp-content/uploads/2026/05/camera_overlay-1.png?w=1024" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">401133</post-id>	</item>
		<item>
		<title>Differencing: A Transformation or a Trap?</title>
		<link>https://www.r-bloggers.com/2026/05/differencing-a-transformation-or-a-trap/</link>
		
		<dc:creator><![CDATA[M. Fatih Tüzen]]></dc:creator>
		<pubDate>Thu, 07 May 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://mfatihtuzen.github.io/posts/2026-05-07_timeseries_differencing/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>1 Introduction<br />
Differencing is one of the most common transformations in time series analysis.<br />
It is also one of the easiest transformations to misunderstand.<br />
In many ARIMA-style workflows, differencing is introduced almost mechanically: i...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/differencing-a-transformation-or-a-trap/">Differencing: A Transformation or a Trap?</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://mfatihtuzen.github.io/posts/2026-05-07_timeseries_differencing/"> A Statistician&#039;s R Notebook</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report an issue with the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>.)
</div>
 





<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://i1.wp.com/mfatihtuzen.github.io/posts/2026-05-07_timeseries_differencing/timeseries_differencing.png?w=578&#038;ssl=1" class="img-fluid quarto-figure quarto-figure-center figure-img" data-recalc-dims="1"></p>
</figure>
</div>
<section id="introduction" class="level1" data-number="1">
<h1 data-number="1"><span class="header-section-number">1</span> Introduction</h1>
<p>Differencing is one of the most common transformations in time series analysis.</p>
<p>It is also one of the easiest transformations to misunderstand.</p>
<p>In many ARIMA-style workflows, differencing is introduced almost mechanically: if a series is not stationary, take a difference; if it still appears non-stationary, take another one. While this advice is not entirely wrong, it can quietly create a dangerous habit. Differencing is not merely a technical preprocessing step — it changes the object of analysis itself.</p>
<p>In the previous article of this series, <em>Why Most Time Series Models Fail Before They Start</em>, we explored stationarity using real CPI data and discussed why many forecasting problems begin long before model estimation. The central idea was simple but important: unstable statistical properties can make even sophisticated models misleading.</p>
<p>You can read the first article here:</p>
<p><a href="https://mfatihtuzen.github.io/posts/2026-04-16_timeseries_stationary/" class="uri" rel="nofollow" target="_blank">https://mfatihtuzen.github.io/posts/2026-04-16_timeseries_stationary/</a></p>
<p>This article continues that discussion with a more subtle question:</p>
<blockquote class="blockquote">
<p>What exactly happens when we difference a time series?</p>
</blockquote>
<p>To explore this question, we will use the <strong>S&#038;P CoreLogic Case-Shiller U.S. National Home Price Index</strong>, available from the Federal Reserve Economic Data (FRED) database under the code <code>CSUSHPINSA</code>.</p>
<p>FRED series link:</p>
<p><a href="https://fred.stlouisfed.org/series/CSUSHPINSA" class="uri" rel="nofollow" target="_blank">https://fred.stlouisfed.org/series/CSUSHPINSA</a></p>
<p>The series tracks U.S. national home prices and provides a rich real-world example: long-run growth, sharp reversals during the housing crisis, and rapid post-pandemic acceleration.</p>
<p>That makes it an ideal setting for a deeper lesson:</p>
<blockquote class="blockquote">
<p>Differencing can stabilize a series, but it can also reshape the structure of the signal.</p>
</blockquote>
</section>
<section id="setup" class="level1" data-number="2">
<h1 data-number="2"><span class="header-section-number">2</span> Setup</h1>
<p>The data used in this article can be downloaded directly from FRED:</p>
<p><a href="https://fred.stlouisfed.org/graph/fredgraph.csv?id=CSUSHPINSA" class="uri" rel="nofollow" target="_blank">https://fred.stlouisfed.org/graph/fredgraph.csv?id=CSUSHPINSA</a></p>
<p>For reproducibility, the CSV file is saved in the same folder as this Quarto document.</p>
<div class="cell">
<pre>library(readr)
library(dplyr)
library(ggplot2)
library(tidyr)
library(slider)
library(forecast)
library(tseries)
library(scales)

theme_set(theme_minimal(base_size = 13))</pre>
</div>
<div class="cell">
<pre>hpi &lt;- read_csv(&quot;CSUSHPINSA.csv&quot;, show_col_types = FALSE) %&gt;%
  transmute(
    date = as.Date(observation_date),
    hpi  = as.numeric(CSUSHPINSA)
  ) %&gt;%
  arrange(date) %&gt;%
  filter(!is.na(date), !is.na(hpi))

hpi %&gt;% slice_head(n = 5)</pre>
<div class="cell-output cell-output-stdout">
<pre># A tibble: 5 × 2
  date         hpi
  &lt;date&gt;     &lt;dbl&gt;
1 1987-01-01  63.7
2 1987-02-01  64.1
3 1987-03-01  64.5
4 1987-04-01  65.0
5 1987-05-01  65.5</pre>
</div>
</div>
<p>We will create several transformed versions of the series.</p>
<div class="cell">
<pre>hpi &lt;- hpi %&gt;%
  mutate(
    diff_1   = hpi - lag(hpi),
    diff_2   = diff_1 - lag(diff_1),
    log_hpi  = log(hpi),
    log_diff = log_hpi - lag(log_hpi)
  )</pre>
</div>
<p>The variables have different meanings:</p>
<ul>
<li><code>hpi</code>: the index level itself</li>
<li><code>diff_1</code>: monthly absolute change in the index</li>
<li><code>diff_2</code>: change in the monthly change</li>
<li><code>log_diff</code>: approximate monthly proportional change</li>
</ul>
<p>This distinction matters. Transformations are not neutral. Each one changes what the series represents.</p>
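<p>A toy example may make these definitions concrete. The sketch below uses a made-up four-point vector (not the FRED data) and base R only. Note that base <code>diff()</code> drops the leading <code>NA</code> that <code>x - lag(x)</code> keeps.</p>

```r
# Hypothetical toy index values, for illustration only (not the FRED series)
x <- c(100, 102, 105, 103)

# First difference: absolute change between consecutive observations
d1 <- diff(x)                      # 2, 3, -2

# Log difference: approximate proportional change
log_d <- diff(log(x))

# Exact proportional change, for comparison
pct <- x[-1] / x[-length(x)] - 1   # 0.0200, 0.0294, -0.0190

# For small changes the two columns agree closely
round(cbind(log_d, pct), 4)
```

<p>The approximation weakens as changes grow larger, which is why <code>log_diff</code> is best read as an approximate rather than an exact growth rate.</p>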
</section>
<section id="the-raw-series-persistence-everywhere" class="level1" data-number="3">
<h1 data-number="3"><span class="header-section-number">3</span> The raw series: persistence everywhere</h1>
<p>Let us begin with the raw housing price index.</p>
<div class="cell">
<pre>ggplot(hpi, aes(date, hpi)) +
  geom_line(linewidth = 0.8, color = &quot;#1f4e5f&quot;) +
  labs(
    title = &quot;Raw Housing Price Index: Strong Persistence and Long-Run Trend&quot;,
    subtitle = &quot;S&amp;P CoreLogic Case-Shiller U.S. National Home Price Index&quot;,
    x = NULL,
    y = &quot;Index&quot;
  )</pre>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://i2.wp.com/mfatihtuzen.github.io/posts/2026-05-07_timeseries_differencing/index_files/figure-html/raw-series-plot-1.png?w=450&#038;ssl=1" class="img-fluid figure-img"  data-recalc-dims="1"></p>
</figure>
</div>
</div>
</div>
<p>Even before running a formal statistical test, the plot already reveals something important. The series does not fluctuate around a stable mean. Instead, it exhibits strong persistence and a pronounced long-run upward movement.</p>
<p>Several major economic episodes are immediately visible: the housing boom of the mid-2000s, the collapse following the global financial crisis, the gradual recovery during the 2010s, and the rapid acceleration after 2020.</p>
<p>This is clearly not a series that looks ready for direct stationary modeling.</p>
<p>But the key issue is not simply the presence of trend. The trend itself carries economic meaning. Housing prices are not merely noisy observations around a fixed level; they reflect long-run structural forces such as credit conditions, interest rates, demographic demand, construction constraints, and broader macroeconomic cycles.</p>
<p>This creates the central tension behind differencing:</p>
<blockquote class="blockquote">
<p>By removing persistence, we may improve the statistical properties of the series — while simultaneously weakening part of its long-run economic signal.</p>
</blockquote>
</section>
<section id="the-acf-of-the-raw-series" class="level1" data-number="4">
<h1 data-number="4"><span class="header-section-number">4</span> The ACF of the raw series</h1>
<p>The autocorrelation function provides another perspective on the same phenomenon.</p>
<div class="cell">
<pre>forecast::ggAcf(na.omit(hpi$hpi), lag.max = 26) +
  labs(
    title = &quot;ACF of Raw Housing Price Index&quot;,
    x = &quot;Lag&quot;,
    y = &quot;ACF&quot;
  )</pre>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://i1.wp.com/mfatihtuzen.github.io/posts/2026-05-07_timeseries_differencing/index_files/figure-html/raw-acf-1.png?w=450&#038;ssl=1" class="img-fluid figure-img"  data-recalc-dims="1"></p>
</figure>
</div>
</div>
</div>
<p>The ACF declines extremely slowly and remains strongly positive even at relatively long lags. This is one of the classic visual signatures of a highly persistent process.</p>
<p>In practical terms, today’s housing price index is strongly related to its past values. That is not surprising. Housing markets do not reset from month to month; they evolve gradually through credit conditions, market expectations, supply constraints, and macroeconomic forces.</p>
<p>From a modeling standpoint, however, this dependence structure creates a challenge. Methods built around stationarity assumptions may struggle to distinguish genuine short-run dynamics from long-run drift if we model the raw level directly.</p>
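<p>To see why a slowly decaying ACF signals persistence, it can help to compare two simulated extremes. The following is a minimal sketch on artificial data (a random walk and white noise, both assumptions made for illustration), not on the housing index itself.</p>

```r
set.seed(123)
n <- 500

white_noise <- rnorm(n)            # no memory
random_walk <- cumsum(rnorm(n))    # every shock persists forever

# Sample autocorrelation at lag 12 (element 1 of $acf is lag 0)
acf_wn <- acf(white_noise, lag.max = 12, plot = FALSE)$acf[13]
acf_rw <- acf(random_walk, lag.max = 12, plot = FALSE)$acf[13]

c(white_noise = acf_wn, random_walk = acf_rw)
```

<p>The white-noise autocorrelation should sit near zero, while the random walk typically remains strongly correlated with its own value twelve steps earlier, much like the raw index above.</p>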
<p>To formalize this intuition, let us turn to the Augmented Dickey–Fuller (ADF) test.</p>
<div class="cell">
<pre>adf_level &lt;- tseries::adf.test(na.omit(hpi$hpi))
adf_level</pre>
<div class="cell-output cell-output-stdout">
<pre>
    Augmented Dickey-Fuller Test

data:  na.omit(hpi$hpi)
Dickey-Fuller = -0.97386, Lag order = 7, p-value = 0.9427
alternative hypothesis: stationary</pre>
</div>
</div>
<p>The ADF test fails to reject the null hypothesis of a unit root for the raw series (p-value ≈ 0.94). In other words, the test offers no statistical evidence of stationarity in the level of the housing price index.</p>
<p>This result aligns closely with what we already observed visually: the series behaves more like a drifting process than a stable mean-reverting one.</p>
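<p>The logic of the test can be sketched with a simplified Dickey–Fuller regression in base R. This is an illustration on simulated data, not a replacement for <code>tseries::adf.test()</code>, which also includes a linear trend and lagged differences (the "Lag order" in the output above). The helper name <code>df_stat</code> is hypothetical.</p>

```r
set.seed(42)
n <- 500

random_walk <- cumsum(rnorm(n))   # has a unit root
white_noise <- rnorm(n)           # stationary

# Simplified Dickey-Fuller regression: delta x_t = a + g * x_{t-1} + error.
# Under a unit root g = 0, so a strongly negative t-statistic on g is
# evidence against the unit root.
df_stat <- function(x) {
  dx    <- diff(x)
  x_lag <- x[-length(x)]
  fit   <- lm(dx ~ x_lag)
  coef(summary(fit))["x_lag", "t value"]
}

c(random_walk = df_stat(random_walk), white_noise = df_stat(white_noise))
```

<p>The statistic does not follow the usual t distribution. For this constant-only version, the large-sample 5% critical value is roughly -2.86, so only values well below that point towards stationarity.</p>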
<p>So far, the standard recommendation appears sensible:</p>
<blockquote class="blockquote">
<p>If the series is non-stationary, take a difference.</p>
</blockquote>
</section>
<section id="first-differencing-less-trend-but-not-no-structure" class="level1" data-number="5">
<h1 data-number="5"><span class="header-section-number">5</span> First differencing: less trend, but not no structure</h1>
<p>A first difference replaces the level of the series with its period-to-period change:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5CDelta%20x_t%20=%20x_t%20-%20x_%7Bt-1%7D.%0A"></p>
<p>This operation is often described as “removing the trend.” That description is useful, but incomplete.</p>
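<p>One way to see why the description is incomplete: differencing does not delete the trend, it converts it into the mean of the changes. The sketch below uses a simulated random walk with an assumed drift of 0.5 per period.</p>

```r
set.seed(1)
n <- 200

# Random walk with drift: x_t = x_{t-1} + 0.5 + noise
increments <- rnorm(n, mean = 0.5)
x <- cumsum(increments)

# Differencing undoes the accumulation and recovers the increments exactly
dx <- diff(x)
isTRUE(all.equal(dx, increments[-1]))

# The upward drift does not vanish: it becomes the mean of the differences
mean(dx)
```

<p>Whatever structure the increments have (here, none beyond the drift) is carried over unchanged into the differenced series.</p>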
<p>Let us now examine the first-differenced series.</p>
<div class="cell">
<pre>ggplot(hpi, aes(date, diff_1)) +
  geom_line(linewidth = 0.7, color = &quot;#d95f02&quot;, na.rm = TRUE) +
  labs(
    title = &quot;First Difference of the Housing Price Index&quot;,
    subtitle = &quot;Monthly absolute change in the index&quot;,
    x = NULL,
    y = &quot;Δ Index&quot;
  )</pre>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://i1.wp.com/mfatihtuzen.github.io/posts/2026-05-07_timeseries_differencing/index_files/figure-html/first-difference-plot-1.png?w=450&#038;ssl=1" class="img-fluid figure-img"  data-recalc-dims="1"></p>
</figure>
</div>
</div>
</div>
<p>The transformation clearly changes the behavior of the data. The dominant upward drift visible in the raw housing price index is no longer the central feature. Instead, the series fluctuates around a much more stable level.</p>
<p>That is the good news.</p>
<p>But something equally important remains: the differenced series is still highly structured. It does not resemble random white noise. Distinct regimes, bursts of volatility, and recurring short-run movements are still visible throughout the series.</p>
<p>The periods surrounding the housing crisis and the post-pandemic surge are especially revealing. The magnitude of month-to-month changes increases sharply, and the volatility structure itself becomes more pronounced.</p>
<p>In other words, differencing reduced the trend — but it did not eliminate dependence.</p>
<p>This is a crucial distinction.</p>
<p>A transformed series can become statistically more manageable while still retaining meaningful internal structure. That is precisely why treating differencing as a mechanical preprocessing step can be misleading.</p>
</section>
<section id="the-acf-after-first-differencing" class="level1" data-number="6">
<h1 data-number="6"><span class="header-section-number">6</span> The ACF after first differencing</h1>
<p>Let us now inspect the autocorrelation structure after first differencing.</p>
<div class="cell">
<pre>forecast::ggAcf(na.omit(hpi$diff_1), lag.max = 26) +
  labs(
    title = &quot;ACF of First Difference&quot;,
    x = &quot;Lag&quot;,
    y = &quot;ACF&quot;
  )</pre>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://i2.wp.com/mfatihtuzen.github.io/posts/2026-05-07_timeseries_differencing/index_files/figure-html/first-diff-acf-1.png?w=450&#038;ssl=1" class="img-fluid figure-img"  data-recalc-dims="1"></p>
</figure>
</div>
</div>
</div>
<p>The ACF is no longer dominated by the extremely slow decay observed in the raw housing price index. This is an important change. The transformation has substantially reduced the long-run persistence associated with the level series.</p>
<p>But the structure has not vanished.</p>
<p>Several lags remain clearly significant, and the series still exhibits meaningful short-run dynamics. Cyclical patterns and medium-range dependence are still visible, suggesting that the transformation reduced the trend without erasing the internal behavior of the process.</p>
<p>This is particularly important in housing markets, where adjustments tend to occur gradually rather than instantaneously. Prices respond over time through financing conditions, supply rigidities, expectations, and broader economic cycles.</p>
<p>A common beginner misconception is that differencing should transform a series into white noise. It should not. If every form of dependence disappeared completely, there would be little left to model.</p>
<p>The goal of differencing is not to destroy structure. The goal is to remove problematic non-stationarity while preserving meaningful dynamics.</p>
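<p>A small simulation illustrates the point. Assume, purely for illustration, that the monthly changes of some series follow an AR(1) process with coefficient 0.7. Differencing the accumulated level recovers those changes with their dependence intact.</p>

```r
set.seed(2026)

# Simulated stationary AR(1) changes with coefficient 0.7 (an assumed value)
changes <- as.numeric(arima.sim(model = list(ar = 0.7), n = 500))

# Integrate the changes to obtain a non-stationary level series
level <- cumsum(changes)

# Differencing removes the unit root but keeps the AR(1) dependence
d_level  <- diff(level)
acf_lag1 <- acf(d_level, lag.max = 1, plot = FALSE)$acf[2]
acf_lag1

# Ljung-Box test: a small p-value means the differences are not white noise
lb <- Box.test(d_level, lag = 12, type = "Ljung-Box")
lb$p.value
```

<p>The lag-1 autocorrelation should land near the assumed 0.7, and that remaining dependence is exactly what an ARIMA-style model would go on to describe.</p>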
<p>The Augmented Dickey–Fuller test now tells a very different statistical story.</p>
<div class="cell">
<pre>adf_diff1 &lt;- tseries::adf.test(na.omit(hpi$diff_1))
adf_diff1</pre>
<div class="cell-output cell-output-stdout">
<pre>
    Augmented Dickey-Fuller Test

data:  na.omit(hpi$diff_1)
Dickey-Fuller = -3.9775, Lag order = 7, p-value = 0.01019
alternative hypothesis: stationary</pre>
</div>
</div>
<p>The ADF test rejects the null hypothesis of a unit root for the first-differenced series at the 5% level (p-value ≈ 0.010). Statistically speaking, the transformation appears successful: the series is now much more compatible with stationarity assumptions.</p>
<p>But this is where a subtle danger begins.</p>
<p>Once a transformation starts “working,” it becomes tempting to continue applying it mechanically. And that raises an important question:</p>
<blockquote class="blockquote">
<p>What happens if we difference the series again?</p>
</blockquote>
</section>
<section id="second-differencing-cleaner-or-distorted" class="level1" data-number="7">
<h1 data-number="7"><span class="header-section-number">7</span> Second differencing: cleaner or distorted?</h1>
<p>If one difference helps, should two differences help even more?</p>
<p>This is where the trap begins.</p>
<p>A second difference is defined as:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5CDelta%5E2%20x_t%20=%20%5CDelta%20x_t%20-%20%5CDelta%20x_%7Bt-1%7D.%0A"></p>
<p>Conceptually, it measures the change in the change. In our case, the transformation no longer asks how much housing prices change from one month to the next. Instead, it asks whether those monthly changes themselves are accelerating or decelerating.</p>
<p>That is a fundamentally different question.</p>
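<p>Expanding the definition gives <code>x_t - 2 * x_{t-1} + x_{t-2}</code>, so every second difference mixes three consecutive observations. A short check on made-up values (not the FRED data):</p>

```r
# Hypothetical toy values, for illustration only
x <- c(100, 102, 105, 103, 104)
n <- length(x)

# Second difference via base R
d2 <- diff(x, differences = 2)

# Same quantity written out: x_t - 2 * x_{t-1} + x_{t-2}
d2_manual <- x[3:n] - 2 * x[2:(n - 1)] + x[1:(n - 2)]

d2          # 1 -5  3
d2_manual   # 1 -5  3
```

<p>The middle observation enters with weight -2, which is one reason the second difference reacts so sharply to any single-month wiggle.</p>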
<div class="cell">
<pre>ggplot(hpi, aes(date, diff_2)) +
  geom_line(linewidth = 0.7, color = &quot;#7b3294&quot;, na.rm = TRUE) +
  labs(
    title = &quot;Second Difference of the Housing Price Index&quot;,
    subtitle = &quot;Change in the monthly change&quot;,
    x = NULL,
    y = &quot;Δ² Index&quot;
  )</pre>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://i1.wp.com/mfatihtuzen.github.io/posts/2026-05-07_timeseries_differencing/index_files/figure-html/second-difference-plot-1.png?w=450&#038;ssl=1" class="img-fluid figure-img"  data-recalc-dims="1"></p>
</figure>
</div>
</div>
</div>
<p>The second-differenced series appears more centered and much more aggressively oscillatory. It reacts strongly to turning points, reversals, and short-run fluctuations. At the same time, however, it becomes increasingly difficult to interpret in economic terms.</p>
<p>This is where the statistical and substantive perspectives begin to diverge.</p>
<p>From a purely statistical viewpoint, the second difference may appear attractive because the series now looks even more stationary. But statistical improvement alone does not guarantee that the transformed series remains meaningful for analysis or forecasting.</p>
<p>The key question is no longer:</p>
<blockquote class="blockquote">
<p>“Did we remove non-stationarity?”</p>
</blockquote>
<p>The key question becomes:</p>
<blockquote class="blockquote">
<p>“What happened to the original signal?”</p>
</blockquote>
</section>
<section id="the-acf-after-second-differencing" class="level1" data-number="8">
<h1 data-number="8"><span class="header-section-number">8</span> The ACF after second differencing</h1>
<p>The autocorrelation structure after second differencing makes the issue even clearer.</p>
<div class="cell">
<pre>forecast::ggAcf(na.omit(hpi$diff_2), lag.max = 26) +
  labs(
    title = &quot;ACF of Second Difference&quot;,
    x = &quot;Lag&quot;,
    y = &quot;ACF&quot;
  )</pre>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://i1.wp.com/mfatihtuzen.github.io/posts/2026-05-07_timeseries_differencing/index_files/figure-html/second-diff-acf-1.png?w=450&#038;ssl=1" class="img-fluid figure-img"  data-recalc-dims="1"></p>
</figure>
</div>
</div>
</div>
<p>The pattern is now fundamentally different from what we observed earlier. The raw housing price index exhibited strong long-run persistence, while the first difference retained a more moderate and interpretable dependence structure. The second difference, however, shows a clearly alternating, oscillatory pattern.</p>
<p>This is one of the classic warning signs of over-differencing.</p>
<p>Excessive differencing can artificially induce negative dependence and amplify short-run fluctuations that were far less dominant in the original data. In practical terms, the transformation may begin to reshape the signal rather than simply stabilize it.</p>
<p>In other words:</p>
<blockquote class="blockquote">
<p>The second difference may look statistically cleaner, while simultaneously becoming substantively less meaningful.</p>
</blockquote>
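<p>The mechanism can be illustrated with a small simulation. This is a minimal sketch on a simulated random walk, not the housing data: the seed, the sample size, and the helper <code>acf1()</code> are assumptions made for illustration. For a random walk, one difference recovers white noise, while a second difference produces a series of the form e<sub>t</sub> − e<sub>t−1</sub>, whose theoretical lag-1 autocorrelation is −0.5.</p>
<div class="cell">
<pre>set.seed(123)                       # arbitrary seed, for reproducibility only
n &lt;- 5000
rw &lt;- cumsum(rnorm(n))              # random walk: one difference is enough

d1 &lt;- diff(rw)                      # recovers the white-noise increments
d2 &lt;- diff(rw, differences = 2)     # over-differenced: e_t - e_{t-1}

# Hypothetical helper: lag-1 sample autocorrelation
acf1 &lt;- function(x) acf(x, lag.max = 1, plot = FALSE)$acf[2]

round(c(first = acf1(d1), second = acf1(d2)), 2)</pre>
</div>
<p>The first value should be close to 0 and the second close to −0.5. The negative dependence in the second series is created by the transformation itself, not by the underlying data.</p>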
<p>Let us now examine the Augmented Dickey–Fuller result.</p>
<div class="cell">
<pre>adf_diff2 &lt;- tseries::adf.test(na.omit(hpi$diff_2))
adf_diff2</pre>
<div class="cell-output cell-output-stdout">
<pre>
    Augmented Dickey-Fuller Test

data:  na.omit(hpi$diff_2)
Dickey-Fuller = -16.035, Lag order = 7, p-value = 0.01
alternative hypothesis: stationary</pre>
</div>
</div>
<p>The ADF test strongly rejects the null hypothesis of a unit root for the second-differenced series. In fact, the warning that <code>adf.test()</code> issues here indicates that the true p-value is smaller than the 0.01 printed by the function.</p>
<p>From a purely statistical perspective, this might appear highly desirable. The transformation seems extremely successful at producing stationarity.</p>
<p>But this creates a useful paradox.</p>
<blockquote class="blockquote">
<p>The test becomes increasingly confident — but should we?</p>
</blockquote>
<p>A more stationary series is not automatically a better modeling target. Sometimes it is simply a more aggressively transformed version of the original data, with less economically meaningful structure left to explain.</p>
<p>At this point, another question naturally emerges:</p>
<blockquote class="blockquote">
<p>Is repeated ordinary differencing always the most meaningful transformation for economic time series?</p>
</blockquote>
</section>
<section id="a-brief-note-on-log-differencing" class="level1" data-number="9">
<h1 data-number="9"><span class="header-section-number">9</span> A brief note on log differencing</h1>
<p>So far, we have focused on ordinary differencing based on absolute changes. But in many economic and financial applications, analysts often prefer log differences instead.</p>
<p>Why?</p>
<p>Because the interpretation of absolute changes becomes increasingly problematic when the scale of a series evolves over time. A one-point increase in a housing price index does not carry the same meaning when the index is near 80 and when it exceeds 300.</p>
<p>Log differencing addresses this issue by focusing on proportional change rather than absolute change:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5CDelta%20%5Clog(x_t)%20=%20%5Clog(x_t)%20-%20%5Clog(x_%7Bt-1%7D).%0A"></p>
<p>For relatively small changes, this quantity closely approximates the growth rate:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5CDelta%20%5Clog(x_t)%20%5Capprox%20%5Cfrac%7Bx_t%20-%20x_%7Bt-1%7D%7D%7Bx_%7Bt-1%7D%7D.%0A"></p>
<p>This is one reason why log differences are widely used in macroeconomics, inflation analysis, and financial modeling. They often provide a more interpretable representation of economic dynamics because they express changes relative to the current scale of the series.</p>
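<p>A quick numerical check makes the approximation concrete. The values below are hypothetical index levels chosen for illustration, not observations from the housing series.</p>
<div class="cell">
<pre>x &lt;- c(100, 101, 103, 110, 150)        # hypothetical index values

log_diff &lt;- diff(log(x))               # log(x_t) - log(x_{t-1})
growth   &lt;- diff(x) / head(x, -1)      # (x_t - x_{t-1}) / x_{t-1}

# Side-by-side comparison; the gap widens as the changes get larger
round(cbind(log_diff, growth, gap = growth - log_diff), 4)</pre>
</div>
<p>For the 1% step the two measures are nearly identical, while for the jump from 110 to 150 they differ noticeably. The exact relationship is that the log difference equals <code>log(1 + growth)</code>, so the approximation holds only when changes are small.</p>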
<p>But an important caution remains.</p>
<p>Log differencing does not eliminate the broader trade-offs discussed in this article. It still transforms the dependence structure of the data, and it still changes the underlying modeling question.</p>
<p>The key lesson is therefore not:</p>
<blockquote class="blockquote">
<p>“Which transformation is universally correct?”</p>
</blockquote>
<p>The real question is:</p>
<blockquote class="blockquote">
<p>“Which transformation preserves the most meaningful structure for the problem we are trying to study?”</p>
</blockquote>
</section>
<section id="comparing-the-three-versions" class="level1" data-number="10">
<h1 data-number="10"><span class="header-section-number">10</span> Comparing the three versions</h1>
<p>A direct comparison makes the effect of differencing much easier to see. The figure below summarizes the central theme of this article.</p>
<div class="cell">
<pre>hpi_long &lt;- hpi %&gt;%
  select(date, hpi, diff_1, diff_2) %&gt;%
  pivot_longer(
    cols = c(hpi, diff_1, diff_2),
    names_to = &quot;series&quot;,
    values_to = &quot;value&quot;
  ) %&gt;%
  mutate(
    series = recode(
      series,
      hpi = &quot;Raw level&quot;,
      diff_1 = &quot;First difference&quot;,
      diff_2 = &quot;Second difference&quot;
    ),
    series = factor(series, levels = c(&quot;Raw level&quot;, &quot;First difference&quot;, &quot;Second difference&quot;))
  )

ggplot(hpi_long, aes(date, value)) +
  geom_line(linewidth = 0.7, color = &quot;#2c3e50&quot;, na.rm = TRUE) +
  facet_wrap(~ series, scales = &quot;free_y&quot;, ncol = 1) +
  labs(
    title = &quot;Raw series, first difference, and second difference&quot;,
    subtitle = &quot;Each transformation changes both the statistical properties and the interpretation&quot;,
    x = NULL,
    y = NULL
  )</pre>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://i0.wp.com/mfatihtuzen.github.io/posts/2026-05-07_timeseries_differencing/index_files/figure-html/compare-series-1.png?w=450&#038;ssl=1" class="img-fluid figure-img"  data-recalc-dims="1"></p>
</figure>
</div>
</div>
</div>
<p>The raw housing price index contains long-run persistence, structural trend, and broad economic cycles. The first difference shifts attention toward month-to-month changes and substantially reduces the long-run drift. The second difference goes even further, emphasizing acceleration and deceleration in those monthly movements.</p>
<p>Each transformation produces a series with different statistical properties.</p>
<p>But more importantly, each transformation changes the interpretation of the data itself.</p>
<p>That is the crucial point.</p>
<p>These are not simply cleaner or noisier versions of the same series. They are fundamentally different analytical objects, each answering a different question about the underlying process.</p>
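<p>The shift in interpretation is easy to see on a toy example. The sketch below uses hypothetical values and base R's <code>diff()</code>. The level rises throughout, the first difference stays positive, and the second difference turns negative as soon as growth slows.</p>
<div class="cell">
<pre>x &lt;- c(100, 102, 105, 109, 112, 114)   # hypothetical index values

d1 &lt;- diff(x)                          # monthly change:        2  3  4  3  2
d2 &lt;- diff(x, differences = 2)         # change in the change:  1  1 -1 -1

# The second difference written out: x_t - 2 * x_{t-1} + x_{t-2}
n &lt;- length(x)
manual &lt;- x[3:n] - 2 * x[2:(n - 1)] + x[1:(n - 2)]

all.equal(d2, manual)</pre>
</div>
<p>A negative second difference here does not mean that prices fell. It means only that they rose more slowly than in the previous month.</p>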
</section>
<section id="rolling-volatility-transformation-does-not-solve-everything" class="level1" data-number="11">
<h1 data-number="11"><span class="header-section-number">11</span> Rolling volatility: transformation does not solve everything</h1>
<p>Differencing may stabilize the mean of a series, but it does not guarantee stable variance.</p>
<div class="cell">
<pre>hpi &lt;- hpi %&gt;%
  mutate(
    # .before = 23 plus the current month gives a 24-observation window
    roll_sd_diff1 = slider::slide_dbl(diff_1, sd, .before = 23, .complete = TRUE),
    roll_sd_diff2 = slider::slide_dbl(diff_2, sd, .before = 23, .complete = TRUE)
  )</pre>
</div>
<div class="cell">
<pre>hpi %&gt;%
  select(date, roll_sd_diff1, roll_sd_diff2) %&gt;%
  pivot_longer(
    cols = c(roll_sd_diff1, roll_sd_diff2),
    names_to = &quot;series&quot;,
    values_to = &quot;rolling_sd&quot;
  ) %&gt;%
  mutate(
    series = recode(
      series,
      roll_sd_diff1 = &quot;First difference&quot;,
      roll_sd_diff2 = &quot;Second difference&quot;
    )
  ) %&gt;%
  ggplot(aes(date, rolling_sd, color = series)) +
  geom_line(linewidth = 0.8, na.rm = TRUE) +
  scale_color_manual(values = c(&quot;First difference&quot; = &quot;#d95f02&quot;, &quot;Second difference&quot; = &quot;#7b3294&quot;)) +
  labs(
    title = &quot;24-month rolling standard deviation&quot;,
    subtitle = &quot;Differencing changes the volatility structure too&quot;,
    x = NULL,
    y = &quot;Rolling SD&quot;,
    color = NULL
  )</pre>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://i0.wp.com/mfatihtuzen.github.io/posts/2026-05-07_timeseries_differencing/index_files/figure-html/rolling-volatility-plot-1.png?w=450&#038;ssl=1" class="img-fluid figure-img"  data-recalc-dims="1"></p>
</figure>
</div>
</div>
</div>
<p>The rolling standard deviation highlights an important lesson that is often overlooked in introductory time series discussions: stationarity is not a single on–off property. A transformation can improve one aspect of the data while leaving other forms of instability unresolved.</p>
<p>The housing price series illustrates this clearly. Even after differencing, the post-2020 period remains substantially more volatile than earlier decades. Large swings, volatility bursts, and changing dispersion are still visible in both transformed series.</p>
<p>This matters because many classical time series models implicitly assume not only stable mean behavior, but also relatively stable variance structure.</p>
<p>A model that ignores changing volatility may appear statistically successful while still producing fragile forecasts and misleading uncertainty estimates in practice.</p>
<p>In other words:</p>
<blockquote class="blockquote">
<p>Differencing can reduce trend-related non-stationarity without fully stabilizing the broader dynamics of the process.</p>
</blockquote>
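<p>The same point can be reproduced on simulated data. This is a minimal sketch under stated assumptions: the regime lengths, the standard deviations, and the seed are hypothetical, and the rolling window is built with base R's <code>embed()</code> rather than <code>slider</code>.</p>
<div class="cell">
<pre>set.seed(42)                                   # arbitrary seed
# Hypothetical increments: a calm regime followed by a more volatile one
shocks &lt;- c(rnorm(300, sd = 1), rnorm(100, sd = 3))
level  &lt;- cumsum(shocks)                       # non-stationary level series
d1     &lt;- diff(level)                          # first difference

# 24-observation rolling SD: each row of embed() is one window
window  &lt;- 24
roll_sd &lt;- apply(embed(d1, window), 1, sd)
plot(roll_sd, type = &quot;l&quot;)                      # the late jump stays visible

# Dispersion before and after the regime change
c(calm = sd(d1[1:299]), volatile = sd(d1[300:399]))</pre>
</div>
<p>Differencing removes the stochastic trend in the level, but the rolling standard deviation still jumps when the volatile regime begins.</p>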
</section>
<section id="a-compact-comparison" class="level1" data-number="12">
<h1 data-number="12"><span class="header-section-number">12</span> A compact comparison</h1>
<p>The table below summarizes both the transformations examined directly in this article and closely related alternatives frequently used in applied time series analysis.</p>
<table class="caption-top table">
<colgroup>
<col style="width: 25%">
<col style="width: 25%">
<col style="width: 25%">
<col style="width: 25%">
</colgroup>
<thead>
<tr class="header">
<th>Series version</th>
<th>What it represents</th>
<th>What improves</th>
<th>What may be lost</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Raw level</td>
<td>Housing price index itself</td>
<td>Preserves long-run economic structure and trend information</td>
<td>Strong persistence and unit-root-like behavior</td>
</tr>
<tr class="even">
<td>First difference</td>
<td>Monthly absolute change</td>
<td>Reduces long-run drift and improves stationarity properties</td>
<td>Level interpretation and part of the long-run dependence structure</td>
</tr>
<tr class="odd">
<td>Second difference</td>
<td>Change in monthly change</td>
<td>Produces an even stronger stationarity signal</td>
<td>Economic interpretability and smoother dependence dynamics</td>
</tr>
<tr class="even">
<td>Log difference</td>
<td>Approximate proportional change</td>
<td>Often provides a more scale-adjusted interpretation of change</td>
<td>May still contain volatility shifts, structural breaks, or persistence</td>
</tr>
</tbody>
</table>
<p>No transformation is universally best. The appropriate choice depends on the analytical question, the structure of the data, and the type of signal we want to preserve.</p>
</section>
<section id="the-paradox-of-differencing" class="level1" data-number="13">
<h1 data-number="13"><span class="header-section-number">13</span> The paradox of differencing</h1>
<p>Differencing is powerful because it reduces persistence.</p>
<p>But that is also where the danger begins.</p>
<p>Persistence is not always a statistical nuisance. In many economic and financial time series, persistence is part of the signal itself. Long-run movements in housing prices, inflation, production, or income are often economically meaningful features of the process rather than accidental artifacts.</p>
<p>This creates a practical tension at the heart of time series modeling.</p>
<p>If we difference too little, we may mistake long-run drift for stable structure.</p>
<p>If we difference too aggressively, we may weaken meaningful dependence and end up modeling noise-like fluctuations instead of economically relevant dynamics.</p>
<p>And if we difference mechanically, without thinking carefully about interpretation, we may ultimately answer a question nobody intended to ask.</p>
<p>That is why differencing should not be treated as a preprocessing ritual.</p>
<p>It is a modeling decision.</p>
</section>
<section id="common-mistakes" class="level1" data-number="14">
<h1 data-number="14"><span class="header-section-number">14</span> Common mistakes</h1>
<p>Most mistakes with differencing are not computational. They are conceptual.</p>
<p><strong>Mistake 1: assuming first differencing automatically solves the problem</strong></p>
<p>First differencing often reduces trend and improves stationarity properties, but it does not guarantee white noise, stable variance, or a well-specified model.</p>
<p><strong>Mistake 2: increasing the differencing order simply because a test improves</strong></p>
<p>A second difference may appear statistically “better” according to a unit root test, but that does not automatically make it a more meaningful modeling target.</p>
<p><strong>Mistake 3: forgetting that differencing changes the question</strong></p>
<p>Modeling levels, monthly changes, and changes in monthly changes are fundamentally different analytical tasks.</p>
<p><strong>Mistake 4: ignoring the ACF after transformation</strong></p>
<p>The ACF is not merely a diagnostic plot. It reveals how the dependence structure of the series has been reshaped by the transformation itself.</p>
<p><strong>Mistake 5: treating preprocessing as separate from modeling</strong></p>
<p>Every transformation changes what the model sees. And once the model sees a different series, the modeling problem itself has changed.</p>
</section>
<section id="practical-workflow" class="level1" data-number="15">
<h1 data-number="15"><span class="header-section-number">15</span> Practical workflow</h1>
<p>A sensible differencing workflow should not begin with the question:</p>
<blockquote class="blockquote">
<p>“How many differences do I need?”</p>
</blockquote>
<p>A better workflow is something closer to this:</p>
<ol type="1">
<li>Plot the raw series.</li>
<li>Ask what the level of the series actually represents.</li>
<li>Inspect the autocorrelation structure.</li>
<li>Apply the smallest transformation that addresses the main statistical problem.</li>
<li>Re-examine the transformed series visually.</li>
<li>Re-check the dependence structure using the ACF.</li>
<li>Use tests such as the ADF test as supporting evidence rather than final truth.</li>
<li>Ask whether the transformed series still answers the substantive question of interest.</li>
</ol>
<p>This workflow is slower than blindly calling <code>auto.arima()</code> and accepting whatever transformation it selects automatically. But it is also safer. And in real analytical work, safer usually wins.</p>
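<p>The mechanical steps of the workflow above can be sketched in a few lines. The series here is a simulated random walk with drift standing in for real data, and the object names are assumptions made for illustration. The sketch needs the <code>tseries</code> package. Steps 2 and 8 are judgment calls and cannot be automated.</p>
<div class="cell">
<pre>set.seed(1)                                   # arbitrary seed
# Stand-in series: random walk with drift (hypothetical monthly data)
y &lt;- ts(cumsum(rnorm(240, mean = 0.3)), frequency = 12)

plot(y)                                       # step 1: plot the raw series
acf(y, lag.max = 26)                          # step 3: slow decay suggests persistence

dy &lt;- diff(y)                                 # step 4: smallest transformation first
plot(dy)                                      # step 5: re-examine visually
acf(dy, lag.max = 26)                         # step 6: re-check the dependence

# Step 7: the test is supporting evidence, not a verdict
adf_dy &lt;- tseries::adf.test(dy)
adf_dy$p.value</pre>
</div>
<p>Only if the first difference still shows clear persistence would a further transformation be worth considering. Even then, the final step is to ask whether the result still answers the original question.</p>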
</section>
<section id="final-thoughts" class="level1" data-number="16">
<h1 data-number="16"><span class="header-section-number">16</span> Final thoughts</h1>
<p>Differencing is not a trap by itself.</p>
<p>It becomes a trap when we start treating it as a harmless default.</p>
<p>The housing price example illustrates this tension clearly. The raw series is highly persistent and visibly non-stationary. The first difference improves the statistical behavior of the data while still preserving meaningful short-run dynamics. The second difference pushes the series even further toward stationarity, but it also reshapes the dependence structure and weakens the direct economic interpretation.</p>
<p>This is the central trade-off behind differencing.</p>
<p>The real question is not:</p>
<blockquote class="blockquote">
<p>“Is the series stationary now?”</p>
</blockquote>
<p>The more difficult — and ultimately more useful — question is:</p>
<blockquote class="blockquote">
<p>“After transformation, am I still modeling the signal I actually care about?”</p>
</blockquote>
<p>That question matters far more than the differencing order itself.</p>
</section>
<section id="references-and-further-reading" class="level1" data-number="17">
<h1 data-number="17"><span class="header-section-number">17</span> References and further reading</h1>
<p><strong>Data source</strong></p>
<ul>
<li><p>Federal Reserve Bank of St. Louis. <em>S&P CoreLogic Case-Shiller U.S. National Home Price Index (CSUSHPINSA).</em><br>
<a href="https://fred.stlouisfed.org/series/CSUSHPINSA" class="uri" rel="nofollow" target="_blank">https://fred.stlouisfed.org/series/CSUSHPINSA</a></p></li>
<li><p>FRED CSV download link used in this article:<br>
<a href="https://fred.stlouisfed.org/graph/fredgraph.csv?id=CSUSHPINSA" class="uri" rel="nofollow" target="_blank">https://fred.stlouisfed.org/graph/fredgraph.csv?id=CSUSHPINSA</a></p></li>
</ul>
<p><strong>Core time series references</strong></p>
<ul>
<li><p>Box, G. E. P., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M. (2015). <em>Time Series Analysis: Forecasting and Control.</em> Wiley.</p></li>
<li><p>Hyndman, R. J., & Athanasopoulos, G. (2021). <em>Forecasting: Principles and Practice</em> (3rd ed.).<br>
<a href="https://otexts.com/fpp3/" class="uri" rel="nofollow" target="_blank">https://otexts.com/fpp3/</a></p></li>
<li><p>Hamilton, J. D. (1994). <em>Time Series Analysis.</em> Princeton University Press.</p></li>
</ul>
<p><strong>Unit roots and differencing</strong></p>
<ul>
<li><p>Dickey, D. A., & Fuller, W. A. (1979). Distribution of the estimators for autoregressive time series with a unit root. <em>Journal of the American Statistical Association.</em></p></li>
<li><p>Said, S. E., & Dickey, D. A. (1984). Testing for unit roots in autoregressive-moving average models of unknown order. <em>Biometrika.</em></p></li>
</ul>
<p><strong>Practical R resources</strong></p>
<ul>
<li><p>R Core Team. <em>R: A Language and Environment for Statistical Computing.</em><br>
<a href="https://www.r-project.org/" class="uri" rel="nofollow" target="_blank">https://www.r-project.org/</a></p></li>
<li><p>Hyndman, R. J. et al. <em>forecast package documentation.</em><br>
<a href="https://pkg.robjhyndman.com/forecast/" class="uri" rel="nofollow" target="_blank">https://pkg.robjhyndman.com/forecast/</a></p></li>
</ul>



</section>

 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://mfatihtuzen.github.io/posts/2026-05-07_timeseries_differencing/"> A Statistician&#039;s R Notebook</a></strong>.</div>
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/differencing-a-transformation-or-a-trap/">Differencing: A Transformation or a Trap?</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">401116</post-id>	</item>
		<item>
		<title>New Mentoring Team with Experienced Mentors and New Voices</title>
		<link>https://www.r-bloggers.com/2026/05/new-mentoring-team-with-experienced-mentors-and-new-voices/</link>
		
		<dc:creator><![CDATA[rOpenSci]]></dc:creator>
		<pubDate>Wed, 06 May 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://ropensci.org/blog/2026/05/06/mentors-2026/</guid>

					<description><![CDATA[<p>Read it in: Español. We are excited to introduce the new team of mentors for the rOpenSci 2026 Champions Program! This year we have eleven individuals committed to open science, bringing together a rich diversity of backgrounds and perspectives. The t...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/new-mentoring-team-with-experienced-mentors-and-new-voices/">New Mentoring Team with Experienced Mentors and New Voices</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://ropensci.org/blog/2026/05/06/mentors-2026/"> rOpenSci - open tools for open science</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issues about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
</div>

<p><a href='https://ropensci.org/es/blog/2026/05/26/mentoras_es-2026/' rel="nofollow" target="_blank">Read it in: Español</a>.</p> <p>We are excited to introduce the new team of mentors for the rOpenSci 2026 Champions Program! This year we have eleven individuals committed to open science, bringing together a rich diversity of backgrounds and perspectives. The team is made up of people joining the program for the first time, former Champions returning as mentors, and experienced mentors from previous cohorts returning to continue to strengthen this community.</p>
<p>This year’s mentors come from a variety of disciplines and countries, and are active voices in the R community in Latin America and beyond. With their guidance, the new group of Champions will not only develop their projects, but also grow as leaders in open science and research software development.</p>
<h2>
New mentors
</h2><h3>
Alber Hamersson Sánchez Ipia
</h3><figure class="pull-left"><img src="https://i0.wp.com/ropensci.org/img/team/alber-sanchez.jpg?w=250&#038;ssl=1"
alt="Profile photo of Alber Hamersson Sánchez Ipia"  data-recalc-dims="1"><figcaption>
<p><strong>Alber Hamersson Sánchez Ipia </br> National Institute for Space Research (INPE), Brazil </br> rOpenSci</strong></p>
</figcaption>
</figure>
<p>Hi! I’m Alber and I’m going to be an rOpenSci mentor this year.</p>
<p>I was born in Colombia, in the department of Cauca, in one of the country’s most mountainous regions, called Tierradentro.
I am a Cadastral and Geodetic Engineer from the Francisco José de Caldas District University in Colombia, where I also earned a Master’s in Information and Communication Sciences.
I completed a second Master’s degree, in Geoinformatics,
at the University of Münster in Germany, and later earned a PhD in Earth System Science at the National Institute for Space Research (INPE) in Brazil. I currently live and work in Brazil, where I am a research assistant at INPE.</p>
<p>Part of my daily work involves writing R code to process spatial data and to make the results of scientific experiments reproducible, so I am familiar with R package development.
Additionally, I am a co-author of the segmetric package, which is currently available on CRAN,
and I maintain one of the Data Carpentry lessons,
specifically the introduction to R for geospatial data.</p>
<p>I am interested in sharing the knowledge and experience I have accumulated so far with anyone who is going to write scientific or statistical software,
particularly in Spanish.
For this reason I am joining rOpenSci,
where I hope to be part of and help build a community of developers.</p>
</br>
</br>
<h3>
Pablo Paccioretti
</h3><figure class="pull-right"><img src="https://i0.wp.com/ropensci.org/img/team/pablo-paccioretti.jpg?w=250&#038;ssl=1"
alt="Profile photo of Pablo Paccioretti"  data-recalc-dims="1"><figcaption>
<p><strong>Pablo Paccioretti </br> Universidad Nacional de Córdoba (UNC) &#038; CONICET, Argentina </br> rOpenSci</strong></p>
</figcaption>
</figure>
<p>Hello! I am Pablo, an Agricultural Engineer with a PhD from the National University of Córdoba (Argentina), where I work as a researcher and teacher. Since my student years I have been interested in statistics, which steered my work towards data analysis. In particular, I apply and develop methodologies and software tools to analyze georeferenced data from field trials and agricultural monitoring platforms.</p>
<p>I am interested in the development of open tools for data processing and analysis. I have developed scientific software, including R packages for georeferenced data analysis.</p>
<p>My participation in the Champions Program arises from an interest in strengthening the links between applied data analysis and programming, and promoting good practices in both areas. Through this program I hope to contribute to the community by sharing experiences and resources, while also learning from other professionals working in different contexts and disciplines.</p>
</br>
</br>
<h2>
Champions to mentors
</h2><h3>
Erick Navarro Delgado
</h3><figure class="pull-left"><img src="https://i0.wp.com/ropensci.org/img/team/erick-navarro-delgado.jpg?w=250&#038;ssl=1"
alt="Profile photo of Erick Navarro Delgado"  data-recalc-dims="1"><figcaption>
<p><strong>Erick Navarro Delgado </br> The University of British Columbia </br> rOpenSci</strong></p>
</figcaption>
</figure>
<p>Hello! My name is Erick Navarro. I have a degree in biology from the Universidad Nacional Autónoma de México and am a PhD candidate in Bioinformatics at The University of British Columbia. I was born and raised in Mexico City, but currently live in Vancouver, Canada. My research focuses on developing computational tools to understand how genetic factors and environmental exposures/lived experiences act together or separately to shape our molecular landscape.</p>
<p>I am excited to participate in the rOpenSci Champions Program because I believe that open and accessible science is essential for conducting relevant research whose results benefit everyone in our society. In this program I hope to connect with new members of the open science community, share my programming skills, and drive software development in Latin America.</p>
</br>
<h3>
Guadalupe Pascal
</h3><figure class="pull-right"><img src="https://i0.wp.com/ropensci.org/img/team/guadalupe-pascal.jpg?w=250&#038;ssl=1"
alt="Profile photo of Guadalupe Pascal"  data-recalc-dims="1"><figcaption>
<p><strong>Guadalupe Pascal </br> UNLZ-UCA-UGR </br>rOpenSci</strong></p>
</figcaption>
</figure>
<p>Hello! My name is Guadalupe.</p>
<p>I am a researcher and project coordinator in optimization and data science for decision making in social systems, with transfeminist, open science and regional perspectives. I am also an associate professor of optimization and quantitative methods (UNLZ-UCA) and a professor in data science and artificial intelligence courses (UGR). I hold a Master’s in Decision Systems Engineering from URJC (Spain) and a degree in industrial engineering from UNLZ (Argentina). I am a PhD student in Information Technology and Engineering (URJC-UNLZ) and hold diplomas in Gender and Society (UNLZ), Cognitive Neuroscience (Neurotransmitting), and Education in the Age of Artificial Intelligence (UMET). I am part of the Matilda Latin American Open Chair and Women in Engineering, as a founding member and representative of the Gender Network of Engineering Faculties of Argentina.</p>
<p>I am also currently part of the rOpenSci community as a 2025-2026 cohort Champion, and I am very excited to be a mentor in this program for several reasons. First, I find it deeply gratifying to be engaged with the current cohort. At the simplest level, the quality and rigor with which the program is implemented in all its instances have a direct impact on the quality and rigor of my own work. From a holistic point of view, this is extremely valuable and compelling evidence of the synergy within communities of practice in developing skills and producing situated knowledge: the rOpenSci Champions Program is a concrete, real example of how communities share knowledge and, fundamentally, values, perspectives and embodied learning. Second, I am looking forward to the challenge of being a mentor in this program because, although it is a role that I have played in other environments, I have never mentored the development of someone else’s R package. Finally, I would like to use this role to share my experiences as both a mentor and a mentee with the community. I believe that accompanying each other in a formative and transformative process is one of the most human dimensions of the ecosystem in which we work.</p>
</br>
<h3>
Andrea Gomez Vargas
</h3><figure class="pull-left"><img src="https://i2.wp.com/ropensci.org/img/team/andrea-vargas.png?w=250&#038;ssl=1"
alt="Profile photo of Andrea Gomez Vargas"  data-recalc-dims="1"><figcaption>
<p><strong>Andrea Gomez Vargas </br> INDEC </br> R-Ladies, rOpenSci</strong></p>
</figcaption>
</figure>
<p>I am Colombian by origin and Argentinean by choice: Argentina is where I now live, develop my career, and actively participate in the R community. I am a sociologist and work in the national statistics office of Argentina, in the area of social statistics, where I analyze information about the population to understand inequalities and living conditions.</p>
<p>The R community is my favorite space to share knowledge and build collectively. Currently, I am co-organizer of <a href="https://renbaires.github.io/" rel="nofollow" target="_blank">R in Buenos Aires</a> and <a href="https://rse-argentina.github.io/" rel="nofollow" target="_blank">RSE Argentina</a>, and I also participate in communities such as R-Ladies, LatinR and rOpenSci, contributing to the strengthening of networks at local, regional and global levels, promoting the learning and use of open tools in data science.</p>
<p>I was a <a href="https://blog/2025/05/15/puentes-comunidades-campeones-ropensci/" rel="nofollow" target="_blank">Champion in the 2023-2024 cohort</a>, where I developed <a href="https://soyandrea.github.io/arcenso/" rel="nofollow" target="_blank">{ARcenso}, a package that facilitates access to historical census data for Argentina</a>. I am motivated to stay in the program as a mentor to keep promoting open knowledge and to accompany other people as they develop projects with an impact on their communities.</p>
</br>
<h3>
Monika Ávila Márquez
</h3><figure class="pull-right"><img src="https://i1.wp.com/ropensci.org/img/team/monika-avila-marquez.jpeg?w=250&#038;ssl=1"
alt="Profile photo of Monika Ávila Márquez"  data-recalc-dims="1"><figcaption>
<p><strong>Monika Ávila Márquez </br> University of Geneva </br> R-Ladies, rOpenSci</strong></p>
</figcaption>
</figure>
<p>Hi, I am Monika, a postdoctoral researcher in statistics at the University of Geneva, where I work on causal inference and machine learning methods for panel data. I have a PhD in econometrics and my research focuses on the development of semi-parametric estimators that combine machine learning techniques with econometric foundations for estimating panel data models. I also work on mixed effects model selection and causal inference with interference.</p>
<p>I am co-organizer of the R-Ladies Geneva chapter, where I strive to build an inclusive community of practice for people using R in research.</p>
<p>I am participating as a mentor in this program because I want to give back for all that rOpenSci has given me. This community has accompanied me in my professional development &#8211; as a source of resources, as a learning space and as an example of what it means to do open science with rigor and generosity. Today I have the opportunity to offer that same support to others, and that excites me deeply.</p>
</br>
<h2>
Returning mentors
</h2><h3>
Luis D. Verde Arregoitia
</h3><figure class="pull-left"><img src="https://i2.wp.com/ropensci.org/img/team/luis-verde.jpeg?w=250&#038;ssl=1"
alt="Profile photo of Luis D. Verde Arregoitia"  data-recalc-dims="1"><figcaption>
<p><strong>Luis D. Verde Arregoitia</br>Instituto de Ecología AC &#8211; INECOL</br>LatinR, rOpenSci, The Carpentries</strong></p>
</figcaption>
</figure>
<p>Hi, I’m Luis D. Verde Arregoitia, a Mexican living in Xalapa, Mexico. I am a biologist with a PhD in Biological Sciences and a mammal specialist with experience in R programming for data analysis, visualization and statistical modeling. I am also a certified instructor and the author of several packages.</p>
<p>I was a mentor in two previous cohorts of the program, where I supported software developers in Latin America, and I return to this new cohort with great enthusiasm.</p>
</br>
</br>
</br>
</br>
</br>
</br>
<h3>
Pao Corrales
</h3><figure class="pull-right"><img src="https://i1.wp.com/ropensci.org/img/team/paola-corrales.png?w=250&#038;ssl=1"
alt="Profile photo of Pao Corrales"  data-recalc-dims="1"><figcaption>
<p><strong>Pao Corrales</br>Australian National University &#038; 21st century weather CoE </br> R-Ladies, LatinR, rOpenSci, The Carpentries, RForwards </strong></p>
</figcaption>
</figure>
<p>I have a PhD in Atmospheric Sciences from the University of Buenos Aires (Argentina) and am currently working in Australia at the <em>21st Century Weather Centre</em> as a research software engineer.</p>
<p>I actively participate in R-Ladies, R Forwards, The Carpentries, LatinR and rOpenSci, learning and sharing knowledge about R in the community. In 2023 I participated in the Champions Program as a Champion, submitting the agroclimate package to the rOpenSci peer review process. I learned a lot and connected with people from all over the world. It was an excellent experience!</p>
<p>I am passionate about teaching and helping other people grow in what they do, access new opportunities and develop professionally and as individuals. I am very excited to participate again this year as a mentor in the Latin America Champions Program.</p>
<h3>
Francisco Cardozo
</h3><figure class="pull-right"><img src="https://i2.wp.com/ropensci.org/img/team/francisco-cardozo.jpg?w=250&#038;ssl=1"
alt="Profile photo of Francisco Cardozo"  data-recalc-dims="1"><figcaption>
<p><strong>Francisco Cardozo<br /> University of Miami <br /> rOpenSci &#8211; The Carpentries</strong></p>
</figcaption>
</figure>
<p>My name is Francisco Cardozo. I am originally from Colombia and came to the United States to pursue my doctoral studies. I am currently working at the University of Miami as a postdoctoral researcher in the IMPAC research center, an institution dedicated to advancing our understanding of adolescent development. I have participated in the Champions Program on several occasions. Much of my professional work has focused on research design and the application of statistical methods, particularly through the use of the R software environment.</p>
</br>
</br>
</br>
</br>
<h3>
Milagros Mendoza
</h3><figure class="pull-left"><img src="https://i1.wp.com/ropensci.org/img/team/milagros-mendoza.jpeg?w=250&#038;ssl=1"
alt="Milagros Mendoza&#39;s Profile Picture "  data-recalc-dims="1"><figcaption>
<p><strong>Milagros Mendoza </br> Universidade Federal Rural de Pernambuco</br> R-Ladies Natal, rOpenSci</strong></p>
</figcaption>
</figure>
<p>Hello, my name is Milagros. I am an ecologist and statistician driven by a desire to understand the complex systems that intertwine nature, society, and data. Throughout my career, I have worked with interdisciplinary data in the fields of climate, demography, and ecology, always striving to translate that data into knowledge that engages with reality and contributes to more informed decision-making. I am currently pursuing a postdoctoral fellowship at the Vale Institute of Technology in Brazil, where I am part of the research group on territories and natural resources.</p>
<p>I decided to serve as a mentor at rOpenSci because I am motivated to help more people develop confidence in using scientific tools, strengthen their critical thinking, and actively engage within the academic community. In this sense, I view mentoring as a learning space focused on dialogue and mutual growth.</p>
<h3>
Elio Campitelli
</h3><figure class="pull-left"><img src="https://i1.wp.com/ropensci.org/img/team/elio-campitelli.jpg?w=250&#038;ssl=1"
alt="Profile photo of Elio Campitelli"  data-recalc-dims="1"><figcaption>
<p><strong>Elio Campitelli </br> Monash University &#8211; rOpenSci</strong></p>
</figcaption>
</figure>
<p>I am from Argentina but two years ago I moved to Australia because it is the only other country that starts with A and uses the same type of plug.</p>
<p>I am doing a postdoc at Monash University researching interactions between Antarctic sea ice and the atmosphere.</p>
<p>I have been a mentor to previous cohorts of the program. It was a great experience that I want to repeat once more.</p>
</br>
</br>
</br>
</br>
</br>
</br>
<h2>
What’s next
</h2><p>We are happy to have this diverse and talented team of mentors, who embody the values of collaboration and commitment to collective growth. Their support will be key to helping the new Champions move their ideas and projects forward and contribute to the development of a stronger and more diverse open science community.</p>
<p>The selection of Champions is now complete, and we’ll be announcing them soon.</p>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://ropensci.org/blog/2026/05/06/mentors-2026/"> rOpenSci - open tools for open science</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/new-mentoring-team-with-experienced-mentors-and-new-voices/">New Mentoring Team with Experienced Mentors and New Voices</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">401076</post-id>	</item>
		<item>
		<title>Differential Machine Learning with Twin Networks in R: Forecasting Bitcoin with Volatility Proxies</title>
		<link>https://www.r-bloggers.com/2026/05/differential-machine-learning-with-twin-networks-in-r-forecasting-bitcoin-with-volatility-proxies/</link>
		
		<dc:creator><![CDATA[Selcuk Disci]]></dc:creator>
		<pubDate>Tue, 05 May 2026 14:04:43 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">http://datageeek.com/?p=11991</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> Introduction Differential Machine Learning (DML), as introduced in the recent arXiv paper (Differential Machine Learning for 0DTE Options with Stochastic Volatility and Jumps), extends supervised learning by incorporating not only function values but also their derivatives. In financial contexts, this often means sensitivities such as Greeks. However, when direct derivatives ...</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/differential-machine-learning-with-twin-networks-in-r-forecasting-bitcoin-with-volatility-proxies/">Differential Machine Learning with Twin Networks in R: Forecasting Bitcoin with Volatility Proxies</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://datageeek.com/2026/05/05/differential-machine-learning-with-twin-networks-in-r-forecasting-bitcoin-with-volatility-proxies/"> DataGeeek</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>

<h2 class="wp-block-heading">Introduction</h2>



<p class="wp-block-paragraph">Differential Machine Learning (DML), as introduced in the recent <strong><em><a href="https://arxiv.org/html/2603.07600v1" rel="nofollow" target="_blank">arXiv paper (Differential Machine Learning for 0DTE Options with Stochastic Volatility and Jumps)</a></em></strong>, extends supervised learning by incorporating not only function values but also their derivatives. In financial contexts, this often means sensitivities such as Greeks. However, when direct derivatives are unavailable, we can approximate market dynamics using <strong>volatility indicators</strong>.</p>



<p class="wp-block-paragraph">In this project, we adapt DML to Bitcoin price forecasting. Instead of derivatives, we use <strong>RSI, MACD, and Bollinger Bands</strong> as proxies for volatility. These indicators capture momentum, trend strength, and price dispersion, providing a practical way to embed uncertainty into the learning process. To implement this, we design a <strong>twin-network architecture</strong> in Keras: one network learns price dynamics from time-based features, while the other learns volatility signals. Finally, we combine them via a stacking ensemble to achieve robust forecasts with confidence intervals.</p>



<h2 class="wp-block-heading">Why Volatility Variables Instead of Derivatives?</h2>



<ul class="wp-block-list">
<li><strong>RSI (Relative Strength Index)</strong>: Measures momentum and overbought/oversold conditions.</li>



<li><strong>MACD (Moving Average Convergence Divergence)</strong>: Captures trend direction and strength.</li>



<li><strong>Bollinger Bands (upper/lower bands, %B)</strong>: Quantifies price dispersion and volatility.</li>
</ul>



<p class="wp-block-paragraph">These indicators act as empirical substitutes for theoretical derivatives. While DML in its pure form requires sensitivities, in practice, these volatility proxies provide similar information about how prices respond to market forces.</p>
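<p class="wp-block-paragraph">To make two of these indicators concrete, here is a minimal base-R sketch. It is an illustration under stated assumptions, not the code used in this post: the helper names (<code>roll_apply</code>, <code>rsi_simple</code>, <code>percent_b</code>), the window lengths, and the synthetic price series are all hypothetical, and the RSI uses simple averages of gains and losses rather than Wilder smoothing.</p>

```r
# Illustrative helpers (hypothetical names); base R only.

roll_apply <- function(x, n, f) {
  # Apply f over a trailing window of length n; NA until the window is full.
  out <- rep(NA_real_, length(x))
  for (i in seq_along(x)) {
    if (i >= n) out[i] <- f(x[(i - n + 1):i])
  }
  out
}

rsi_simple <- function(prices, n = 14) {
  # RSI from simple averages of gains and losses (not Wilder smoothing).
  change <- c(NA_real_, diff(prices))
  gain <- pmax(change, 0)
  loss <- pmax(-change, 0)
  avg_gain <- roll_apply(gain, n, mean)
  avg_loss <- roll_apply(loss, n, mean)
  # A window with no losses gives RSI = 100; a completely flat window gives NaN.
  100 - 100 / (1 + avg_gain / avg_loss)
}

percent_b <- function(prices, n = 20, k = 2) {
  # Position of the price inside the Bollinger Bands: 0 = lower band, 1 = upper band.
  mid <- roll_apply(prices, n, mean)
  s <- roll_apply(prices, n, sd)
  lower <- mid - k * s
  upper <- mid + k * s
  (prices - lower) / (upper - lower)
}

# Usage on a synthetic random-walk price series (assumed data).
set.seed(123)
prices <- 100 + cumsum(rnorm(60))
indicators <- data.frame(
  price = prices,
  rsi = rsi_simple(prices),
  pct_b = percent_b(prices)
)
tail(indicators, 3)
```

<p class="wp-block-paragraph">The leading rows are <code>NA</code> until each trailing window is full, so a real pipeline would drop or impute them before feeding the features to a network.</p>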



<h3 class="wp-block-heading">Why Twin Networks?</h3>



<p class="wp-block-paragraph">The idea is to separate the learning tasks:</p>



<ul class="wp-block-list">
<li>The <strong>primary network</strong> models the continuous component of the price process.</li>



<li>The <strong>auxiliary network</strong> models the volatility/jump component.</li>
</ul>



<p class="wp-block-paragraph">Together, they mimic the decomposition found in stochastic models such as Bates or Heston, but implemented within a flexible neural framework.</p>



<h2 class="wp-block-heading">Ensemble via Stacking</h2>



<p class="wp-block-paragraph">Once both networks are trained, their predictions are combined using a <strong>linear regression meta-model</strong>. This stacking ensemble learns the optimal weighting between the primary and auxiliary outputs. The result is a forecast that integrates both trend and volatility signals, significantly improving accuracy compared to either network alone.</p>
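<p class="wp-block-paragraph">The stacking step can be sketched in base R with <code>lm()</code>. This is a toy illustration, not the post's pipeline: the series is synthetic, and <code>pred_primary</code> and <code>pred_auxiliary</code> are hypothetical stand-ins for the two network outputs, deliberately mis-scaled so that the meta-model has something to correct.</p>

```r
# Toy stacking example; all data and names here are assumptions.
set.seed(42)
n <- 200
actual <- 100 + cumsum(rnorm(n))                 # synthetic target series
pred_primary <- 0.5 * actual + rnorm(n)          # stand-in for the primary network
pred_auxiliary <- 20 + 0.3 * actual + rnorm(n)   # stand-in for the auxiliary network

stack_df <- data.frame(actual, pred_primary, pred_auxiliary)
train <- stack_df[1:150, ]
holdout <- stack_df[151:200, ]

# Meta-model: a linear regression on the two base predictions.
meta_model <- lm(actual ~ pred_primary + pred_auxiliary, data = train)

# Stacked forecast on data the meta-model has not seen.
stacked <- predict(meta_model, newdata = holdout)

coef(meta_model)
```

<p class="wp-block-paragraph">In this toy setup the meta-model mostly learns an intercept and a rescaling of each input, which is one way a linear stack can beat base predictions that carry the right signal on the wrong scale.</p>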



<h2 class="wp-block-heading">Evaluation</h2>



<figure class="wp-block-image size-large"><img loading="lazy" src="https://i1.wp.com/datageeek.com/wp-content/uploads/2026/05/image-1.png?w=450&#038;ssl=1" alt="" class="wp-image-12020" data-recalc-dims="1" /></figure>



<ul class="wp-block-list">
<li>Metrics: RMSE and MAPE, computed with the <code>yardstick</code> package.</li>



<li>Results:
<ul class="wp-block-list">
<li>Individual networks → RMSE ~76,000, MAPE ~99%.</li>



<li>Stacking ensemble → RMSE ~3,030, MAPE ~3.65%.</li>
</ul>
</li>
</ul>



<p class="wp-block-paragraph">This demonstrates the power of combining price and volatility signals in a unified framework.</p>
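<p class="wp-block-paragraph">For reference, the two metrics can be written out in a few lines of base R. The post computes them with <code>yardstick</code>; the functions below are a hand-rolled sketch of the same definitions, with MAPE expressed in percent, and the three-point example data is made up.</p>

```r
# Hand-rolled metric definitions (illustrative; the post uses yardstick).
rmse <- function(truth, estimate) {
  sqrt(mean((truth - estimate)^2))
}

mape <- function(truth, estimate) {
  # Mean absolute percentage error, in percent; undefined if truth contains 0.
  mean(abs((truth - estimate) / truth)) * 100
}

truth <- c(100, 200, 400)      # made-up actual values
estimate <- c(110, 190, 400)   # made-up forecasts

rmse(truth, estimate)
mape(truth, estimate)
```

<p class="wp-block-paragraph">RMSE is in the units of the target (here, price), while MAPE is scale-free, which is why the two are usually reported together.</p>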



<h2 class="wp-block-heading">Confidence Intervals</h2>



<p class="wp-block-paragraph">To quantify uncertainty, we compute <strong>residual-based confidence intervals</strong> around the point forecasts:</p>



<p class="wp-block-paragraph"><math display="block"><mrow><msub><mover accent="true"><mi>y</mi><mo>^</mo></mover><mi>t</mi></msub><mo>±</mo><mn>1.96</mn><mo>⋅</mo><msub><mi>σ</mi><mtext>residuals</mtext></msub></mrow></math></p>



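<p class="wp-block-paragraph">This approach uses the standard deviation of the training residuals to generate approximate 95% bands. The 1.96 multiplier is the normal quantile, so the coverage is only nominal unless the residuals are roughly normal with constant variance. In exchange, it provides interpretable uncertainty estimates without requiring explicit probabilistic modeling.</p>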
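<p class="wp-block-paragraph">The formula above translates directly into base R. The sketch below is illustrative only: <code>residual_interval</code> is a hypothetical helper, and both the residuals and the point forecasts are synthetic numbers.</p>

```r
# Residual-based bands: forecast +/- z * sd(training residuals).
residual_interval <- function(point_forecast, train_residuals, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)        # about 1.96 when level = 0.95
  half_width <- z * sd(train_residuals)  # same width for every forecast
  data.frame(
    forecast = point_forecast,
    lower = point_forecast - half_width,
    upper = point_forecast + half_width
  )
}

# Usage with synthetic residuals and made-up point forecasts.
set.seed(1)
train_residuals <- rnorm(500, mean = 0, sd = 300)
bands <- residual_interval(c(95000, 96000, 97000), train_residuals)
bands
```

<p class="wp-block-paragraph">Because the half-width is a single constant, the band does not widen with the forecast horizon; that simplicity is the trade-off for skipping a probabilistic model.</p>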



<h2 class="wp-block-heading">Visualization</h2>



<p class="wp-block-paragraph">The forecasts are visualized with <code>ggplot2</code>:</p>



<ul class="wp-block-list">
<li><strong>Grey ribbon</strong> → confidence intervals.</li>



<li><strong>Red line</strong> → stacking ensemble forecast.</li>



<li><strong>Black line</strong> → actual BTC prices.</li>
</ul>



<figure class="wp-block-image size-large"><img loading="lazy" src="https://i1.wp.com/datageeek.com/wp-content/uploads/2026/05/image-2.png?w=450&#038;ssl=1" alt="" class="wp-image-12021" data-recalc-dims="1" /></figure>



<p class="wp-block-paragraph">This design clearly communicates both the central forecast and the uncertainty range. The chart below shows exactly this: a red forecast line, black actuals, and a grey confidence band, illustrating how the ensemble integrates volatility information into predictive intervals.</p>



<figure class="wp-block-image size-large"><img loading="lazy" src="https://i2.wp.com/datageeek.com/wp-content/uploads/2026/05/btc_dml.png?w=450&#038;ssl=1" alt="" class="wp-image-12007" data-recalc-dims="1" /></figure>



<h2 class="wp-block-heading">Keras3 in R: Flexible Deep Learning for Financial Forecasting</h2>



<h3 class="wp-block-heading">What is Keras3?</h3>



<p class="wp-block-paragraph"><strong><em><a href="https://keras3.posit.co/" rel="nofollow" target="_blank">Keras3</a></em></strong> is the modern R interface to the Keras deep learning library. It allows R users to define, train, and evaluate neural networks with concise syntax while leveraging the computational power of a backend such as TensorFlow. Unlike the earlier <code>keras</code> R package, which targeted TensorFlow only, Keras3 wraps Keras 3, which can also run on JAX and PyTorch backends.</p>



<h3 class="wp-block-heading">How We Used Keras3</h3>



<p class="wp-block-paragraph">In our workflow, Keras3 was the backbone for implementing the <strong>twin-network architecture</strong>:</p>



<figure class="wp-block-image size-large"><img loading="lazy" src="https://i2.wp.com/datageeek.com/wp-content/uploads/2026/05/image.png?w=450&#038;ssl=1" alt="" class="wp-image-12017" data-recalc-dims="1" /></figure>



<h3 class="wp-block-heading">Why ReLU?</h3>



<ul class="wp-block-list">
<li><strong>ReLU (Rectified Linear Unit)</strong> is the activation function used in hidden layers.</li>



<li>Formula: <math><mrow><mtext>ReLU</mtext><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mo>=</mo><mi>max</mi><mo>⁡</mo><mo stretchy="false">(</mo><mn>0</mn><mo separator="true">,</mo><mi>x</mi><mo stretchy="false">)</mo></mrow></math>.</li>



<li>Benefits:
<ul class="wp-block-list">
<li>Introduces non-linearity, enabling the network to learn complex relationships.</li>



<li>Efficient and helps avoid vanishing gradients.</li>



<li>Well-suited for financial data where signals can be sparse and directional.</li>
</ul>
</li>
</ul>
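<p class="wp-block-paragraph">The ReLU formula above is a one-liner in base R. The snippet is a plain illustration of the function and its gradient, independent of Keras; the names <code>relu</code> and <code>relu_grad</code> are chosen here for the example.</p>

```r
# ReLU and its (sub)gradient, vectorised over x.
relu <- function(x) pmax(0, x)

relu_grad <- function(x) as.numeric(x > 0)  # 1 where x > 0, else 0

x <- c(-2, -0.5, 0, 1.5)
relu(x)
relu_grad(x)
```

<p class="wp-block-paragraph">The gradient is exactly 1 for every positive input, so it does not shrink as it passes through active units, which is the intuition behind the vanishing-gradient benefit listed above.</p>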



<h3 class="wp-block-heading">Why Adam?</h3>



<ul class="wp-block-list">
<li><strong>Adam (Adaptive Moment Estimation)</strong> is the optimizer chosen.</li>



<li>Combines <strong>momentum</strong> (using past gradients to accelerate learning) and <strong>adaptive learning rates</strong> (adjusting step sizes per parameter).</li>



<li>Benefits:
<ul class="wp-block-list">
<li>Robust for noisy, non-stationary data like cryptocurrency prices.</li>



<li>Requires minimal tuning, making it ideal for plug-and-play workflows.</li>



<li>Widely adopted in both academic and applied machine learning.</li>
</ul>
</li>
</ul>
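<p class="wp-block-paragraph">To show how momentum and per-parameter step sizes combine, here is a small base-R sketch of the Adam update applied to a two-parameter quadratic. It is a teaching example under assumptions, not what Keras runs internally: <code>adam_minimize</code> and the test function are hypothetical, and the hyperparameters are the commonly used defaults apart from the learning rate.</p>

```r
# Minimal Adam loop (illustrative; not the Keras implementation).
adam_minimize <- function(grad_fn, theta, steps = 2000, lr = 0.01,
                          beta1 = 0.9, beta2 = 0.999, eps = 1e-8) {
  m <- numeric(length(theta))  # first moment: running mean of gradients (momentum)
  v <- numeric(length(theta))  # second moment: running mean of squared gradients
  for (step in seq_len(steps)) {
    g <- grad_fn(theta)
    m <- beta1 * m + (1 - beta1) * g
    v <- beta2 * v + (1 - beta2) * g^2
    m_hat <- m / (1 - beta1^step)  # bias correction for the zero initialisation
    v_hat <- v / (1 - beta2^step)
    # Each parameter gets its own effective step size via sqrt(v_hat).
    theta <- theta - lr * m_hat / (sqrt(v_hat) + eps)
  }
  theta
}

# Toy objective: f(theta) = (theta1 - 3)^2 + 10 * (theta2 + 1)^2, minimised at (3, -1).
grad_fn <- function(theta) c(2 * (theta[1] - 3), 20 * (theta[2] + 1))

theta_hat <- adam_minimize(grad_fn, theta = c(0, 0))
theta_hat
```

<p class="wp-block-paragraph">The two coordinates have gradients that differ by a factor of ten, yet dividing by <code>sqrt(v_hat)</code> gives both a similar initial step length, which is the "adaptive learning rate" described above.</p>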



<h3 class="wp-block-heading">Contribution to the R Ecosystem</h3>



<p class="wp-block-paragraph">Keras3 bridges the gap between R’s <strong>tidyverse/tidymodels ecosystem</strong> and modern deep learning:</p>



<ul class="wp-block-list">
<li>Integrates seamlessly with data preprocessing pipelines (<code>recipes</code>, <code>timetk</code>).</li>



<li>Allows financial analysts and data scientists to stay within R while accessing TensorFlow’s deep learning capabilities.</li>



<li>Encourages reproducibility: models can be defined, trained, and evaluated entirely in R, without switching to Python.</li>



<li>Expands R’s role beyond traditional statistical modeling into <strong>state-of-the-art AI applications</strong>.</li>
</ul>



<h2 class="wp-block-heading">Why It Matters for DML</h2>



<p class="wp-block-paragraph">By using Keras3:</p>



<ul class="wp-block-list">
<li>We could <strong>separate learning tasks</strong> into a primary network (trend/seasonality) and an auxiliary network (volatility/momentum).</li>



<li>Both networks were trained with ReLU activations and Adam optimization, ensuring stability and efficiency.</li>



<li>Their outputs were combined in a stacking ensemble, yielding forecasts that integrate both price dynamics and volatility signals.</li>
</ul>



<p class="wp-block-paragraph">This demonstrates how Keras3 empowers R users to implement advanced architectures like twin networks, making Differential Machine Learning concepts practical in financial forecasting.</p>



<h2 class="wp-block-heading">Conclusion</h2>



<p class="wp-block-paragraph">This case study demonstrates how Differential Machine Learning concepts can be adapted for financial forecasting in R:</p>



<ul class="wp-block-list">
<li>Volatility indicators serve as practical substitutes for derivatives.</li>



<li>Twin-network architecture in Keras captures both trend and volatility.</li>



<li>A stacking ensemble significantly improves predictive performance.</li>



<li>Residual-based confidence intervals provide interpretable uncertainty estimates.</li>
</ul>



<p class="wp-block-paragraph">By combining academic ideas with reproducible R workflows, we can build robust forecasting pipelines that bridge theory and practice.</p>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://datageeek.com/2026/05/05/differential-machine-learning-with-twin-networks-in-r-forecasting-bitcoin-with-volatility-proxies/"> DataGeeek</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/differential-machine-learning-with-twin-networks-in-r-forecasting-bitcoin-with-volatility-proxies/">Differential Machine Learning with Twin Networks in R: Forecasting Bitcoin with Volatility Proxies</a>]]></content:encoded>
					
		
		<enclosure url="https://datageeek.com/wp-content/uploads/2026/05/btc_dml.png" length="0" type="" />
<enclosure url="https://1.gravatar.com/avatar/db5e3f9ef188ea98fe38ab05c5a3fad9fb52fe3472715a8fc02f7ea41731f77c?s=96&#038;d=identicon&#038;r=G" length="0" type="" />
<enclosure url="https://datageeek.com/wp-content/uploads/2026/05/image-1.png?w=1012" length="0" type="" />
<enclosure url="https://datageeek.com/wp-content/uploads/2026/05/image-2.png?w=1024" length="0" type="" />
<enclosure url="https://datageeek.com/wp-content/uploads/2026/05/btc_dml.png?w=1024" length="0" type="" />
<enclosure url="https://datageeek.com/wp-content/uploads/2026/05/image.png?w=1024" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">401060</post-id>	</item>
		<item>
		<title>Setting function parameters for debugging</title>
		<link>https://www.r-bloggers.com/2026/05/setting-function-parameters-for-debugging/</link>
		
		<dc:creator><![CDATA[Jason Bryer]]></dc:creator>
		<pubDate>Tue, 05 May 2026 04:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://bryer.org/posts/2026-05-05-Setting_Function_Parameters_for_Debugging.html</guid>

					<description><![CDATA[<p>I tend to write a lot of functions that create specific graphics implemented with ggplot2. Although I try to pick graphic parameters (e.g. colors, text size, etc.) that are reasonable, I will typically define all relevant aesthetics as param...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/setting-function-parameters-for-debugging/">Setting function parameters for debugging</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://bryer.org/posts/2026-05-05-Setting_Function_Parameters_for_Debugging.html"> Jason Bryer</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report an issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
</div>
 




<p>I tend to write a lot of functions that create specific graphics implemented with <a href="https://ggplot2.tidyverse.org/" rel="nofollow" target="_blank"><code>ggplot2</code></a>. Although I try to pick reasonable graphic parameters (e.g. colors, text size, etc.), I will typically define all relevant aesthetics as parameters to my function. As a result, my functions tend to have a lot of parameters. When I need to debug a function, all of those parameters have to be set in the global environment, which usually means highlighting each assignment and running it. The function below automates this process. You can pass it any function and it will attempt to assign the default parameter values in the given environment (the global environment by default). It returns a data frame with one column indicating whether each variable was set and another with its value. This is useful for seeing which parameters don’t have a default value and need to be set yourself.</p>
<div class="cell">
<pre>#' Set function parameters to an environment.
#'
#' This function is designed to help debug functions. It will attempt to set all
#' the default parameter values to the specified environment (global environment
#' by default). This is useful for when you want to execute code within the 
#' function definition interactively but need the parameters set in the current 
#' environment.
#'
#' **Warning:** This function will modify the global environment and therefore 
#' violates CRAN policy
#' [&quot;Packages should not modify the global environment (user’s workspace)&quot;]
#' (https://cran.r-project.org/web/packages/policies.html#Source-packages).
#'
#' @param FUN the function to assign parameters to an environment.
#' @param envir the environment to assign the variables to. Defaults to the 
#'        global environment.
#' @param verbose if `TRUE`, return the data frame visibly so that it prints at
#'        the console; otherwise return it invisibly.
#' @return a data frame where row names correspond to the parameter name with 
#'        two columns: `set` which is logical indicating if the variable was set 
#'        and `value` with a character representation of the variable value.
set_function_params &lt;- function(FUN, envir = globalenv(), verbose = interactive()) {
    params &lt;- formals(FUN)
    params_set &lt;- data.frame(row.names = names(params),
                             set = rep(FALSE, length(params)),
                             value = rep(NA_character_, length(params)))
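    for(param in names(params)) {
        value &lt;- params[[param]]
        # Parameters without a default hold the empty symbol, so missing() is TRUE.
        if(!missing(value)) {
            if(is.character(value)) {
                assign(param, value, envir = envir)
                params_set[param, &quot;value&quot;] &lt;- value
            } else {
                # Evaluate the default once. deparse() turns NULL and vector
                # defaults (e.g. c(&quot;a&quot;, &quot;b&quot;)) into a single string.
                default &lt;- eval(value)
                assign(param, default, envir = envir)
                params_set[param, &quot;value&quot;] &lt;- paste(deparse(default), collapse = &quot; &quot;)
            }
            params_set[param, &quot;set&quot;] &lt;- TRUE
        }
    }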
    if(verbose) {
        return(params_set)
    } else {
        invisible(params_set)
    }
}</pre>
</div>
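<p>If you only want to see which parameters have defaults, without assigning anything to an environment, the same information can be read from <code>formals()</code> directly. The following is a minimal sketch and not part of the function above; <code>demo_plot</code> is a hypothetical function made up for illustration.</p>
<div class="cell">
<pre># Hypothetical function used only for illustration
demo_plot &lt;- function(df, text_size = 4, title = &quot;Cluster Profiles&quot;) NULL

# Deparse each default; parameters without a default deparse to &quot;&quot;
defaults &lt;- sapply(formals(demo_plot), deparse)

# Names of the parameters that must be supplied by hand
names(defaults)[defaults == &quot;&quot;]</pre>
</div>
<p>Here only <code>df</code> is returned, since the other two parameters have defaults.</p>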
<p>Very recently I was trying to debug a function that creates profile plots for cluster analysis (<a href="https://github.com/jbryer/clav/blob/master/R/profile_plot.R" rel="nofollow" target="_blank"><code>clav::profile_plot()</code></a>, <a href="https://clav.bryer.org/reference/profile_plot.html" rel="nofollow" target="_blank">documentation</a>). This function has 23 parameters! Setting these all manually is pretty tedious.</p>
<div class="cell">
<pre># List objects in the current environment
ls()</pre>
<div class="cell-output cell-output-stdout">
<pre>[1] &quot;set_function_params&quot;</pre>
</div>
<pre># Call the function
param_set_result &lt;- set_function_params(clav::profile_plot)

# Check to see if the parameters are actually set
ls()</pre>
<div class="cell-output cell-output-stdout">
<pre> [1] &quot;bonferroni&quot;          &quot;center_alpha&quot;        &quot;center_band&quot;        
 [4] &quot;center_fill&quot;         &quot;cluster_label_hjust&quot; &quot;color_palette&quot;      
 [7] &quot;hjust&quot;               &quot;label_clusters&quot;      &quot;label_means&quot;        
[10] &quot;label_outcome_means&quot; &quot;label_profile_means&quot; &quot;param_set_result&quot;   
[13] &quot;point_size&quot;          &quot;se_factor&quot;           &quot;set_function_params&quot;
[16] &quot;standardize&quot;         &quot;text_size&quot;           &quot;title&quot;              
[19] &quot;ylab&quot;               </pre>
</div>
</div>
<p>We can examine the data frame which gives a summary of the parameters set (or not).</p>
<div class="cell">
<pre>param_set_result</pre>
<div class="cell-output cell-output-stdout">
<pre>                      set               value
df                  FALSE                &lt;NA&gt;
clusters            FALSE                &lt;NA&gt;
df_dep              FALSE                &lt;NA&gt;
standardize          TRUE                TRUE
bonferroni           TRUE                TRUE
label_means          TRUE                TRUE
label_profile_means  TRUE                TRUE
label_outcome_means  TRUE                TRUE
center_band          TRUE                0.25
center_fill          TRUE             #f0f9e8
center_alpha         TRUE                 0.1
text_size            TRUE                   4
hjust                TRUE                 0.5
point_size           TRUE                   2
se_factor            TRUE                1.96
color_palette        TRUE                   2
cluster_labels      FALSE                &lt;NA&gt;
cluster_order       FALSE                &lt;NA&gt;
label_clusters       TRUE                TRUE
cluster_label_x     FALSE                &lt;NA&gt;
cluster_label_hjust  TRUE                   5
ylab                 TRUE Mean Standard Score
title                TRUE    Cluster Profiles</pre>
</div>
</div>
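<p>To pull out just the parameters that still need a value, the result can be filtered on the <code>set</code> column. This is a minimal sketch; <code>param_set_demo</code> is a small hypothetical stand-in with the same structure as the data frame above, so that the example runs on its own.</p>
<div class="cell">
<pre># Hypothetical stand-in with the same structure as param_set_result
param_set_demo &lt;- data.frame(row.names = c(&quot;df&quot;, &quot;clusters&quot;, &quot;standardize&quot;, &quot;text_size&quot;),
                             set = c(FALSE, FALSE, TRUE, TRUE),
                             value = c(NA, NA, &quot;TRUE&quot;, &quot;4&quot;))

# Parameters that were not set and must be assigned by hand
unset_params &lt;- rownames(param_set_demo)[!param_set_demo$set]
unset_params</pre>
</div>
<p>For the stand-in this gives <code>df</code> and <code>clusters</code>. With a real result, substitute <code>param_set_result</code> for <code>param_set_demo</code>.</p>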



 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://bryer.org/posts/2026-05-05-Setting_Function_Parameters_for_Debugging.html"> Jason Bryer</a></strong>.</div>
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/setting-function-parameters-for-debugging/">Setting function parameters for debugging</a>]]></content:encoded>
					
		
		<enclosure url="https://bryer.org/posts/2026-05-05-banner.png" length="0" type="image/png" />

		<post-id xmlns="com-wordpress:feed-additions:1">401048</post-id>	</item>
		<item>
		<title>JAGS 5.0.0-beta is available</title>
		<link>https://www.r-bloggers.com/2026/05/jags-5-0-0-beta-is-available/</link>
		
		<dc:creator><![CDATA[Martyn]]></dc:creator>
		<pubDate>Mon, 04 May 2026 17:20:26 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">http://martynplummer.wordpress.com/?p=1992</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> JAGS 5.0.0-beta is now available from SourceForge. The beta release is for two groups of people: authors of software that depends on JAGS, and anyone who wants to try out the new version before the official release. Please send feedback via the JAGS forums or file a bug report. … Continue reading →</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/jags-5-0-0-beta-is-available/">JAGS 5.0.0-beta is available</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://martynplummer.wordpress.com/2026/05/04/jags-5-0-0-beta-is-available/"> R – JAGS News</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report an issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
</div>

<figure class="wp-block-image size-large"><a href="https://i1.wp.com/martynplummer.wordpress.com/wp-content/uploads/2026/05/img_0083.jpg?ssl=1" rel="nofollow" target="_blank"><img loading="lazy" src="https://i1.wp.com/martynplummer.wordpress.com/wp-content/uploads/2026/05/img_0083.jpg?w=450&#038;ssl=1" alt="" class="wp-image-1993" data-recalc-dims="1" /></a></figure>



<p class="wp-block-paragraph">JAGS 5.0.0-beta is now available from SourceForge. </p>



<p class="wp-block-paragraph">The beta release is for two groups of people:</p>



<ul class="wp-block-list">
<li>People who have written software depending on JAGS, in particular authors of R packages that depend on one of the four interfaces between R and JAGS – rjags, runjags, R2jags, and jagsUI. Currently some of these packages do not pass the CRAN tests with the new version of JAGS. Having some time to fix these problems before the official release is helpful.</li>



<li>Anyone who wants to try out the new version and find problems with it before the official release. </li>
</ul>



<p class="wp-block-paragraph">Please send feedback via the <a href="https://sourceforge.net/p/mcmc-jags/discussion/" rel="nofollow" target="_blank">JAGS forums</a> or file a <a href="https://sourceforge.net/p/mcmc-jags/bugs/" rel="nofollow" target="_blank">bug report</a>.</p>



<h1 class="wp-block-heading">The JAGS library</h1>



<p class="wp-block-paragraph">The following packages are available:</p>



<ul class="wp-block-list">
<li><a href="https://sourceforge.net/projects/mcmc-jags/files/JAGS/5.x/Source/" rel="nofollow" target="_blank">Source tarball</a></li>



<li><a href="https://sourceforge.net/projects/mcmc-jags/files/JAGS/5.x/Windows/" rel="nofollow" target="_blank">Windows binary</a> installer (x86_64)</li>



<li><a href="https://sourceforge.net/projects/mcmc-jags/files/JAGS/5.x/macOS/" rel="nofollow" target="_blank">macOS binary</a> installer
<ul class="wp-block-list">
<li>There is a single macOS installer for both x86_64 and arm64.</li>
</ul>
</li>
</ul>



<h2 class="wp-block-heading">The rjags package</h2>



<p class="wp-block-paragraph">To interface with JAGS 5.0.0 from R you will need rjags_5-1. This is not yet available from CRAN because some of the reverse dependencies do not yet work with version 5.0.0 of JAGS. The following packages are provided:</p>



<ul class="wp-block-list">
<li><a href="https://sourceforge.net/projects/mcmc-jags/files/rjags/5/Source/" rel="nofollow" target="_blank">Source tarball</a></li>



<li><a href="https://sourceforge.net/projects/mcmc-jags/files/rjags/5/Source/" rel="nofollow" target="_blank">Windows binary</a> (x86_64)</li>



<li><a href="https://sourceforge.net/projects/mcmc-jags/files/rjags/5/macOS/" rel="nofollow" target="_blank">macOS binaries</a>
<ul class="wp-block-list">
<li>Separate binaries are provided for x86_64 and arm64 and for R versions 4.5.3 and 4.6.0.</li>
</ul>
</li>
</ul>




<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://martynplummer.wordpress.com/2026/05/04/jags-5-0-0-beta-is-available/"> R – JAGS News</a></strong>.</div>
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/05/jags-5-0-0-beta-is-available/">JAGS 5.0.0-beta is available</a>]]></content:encoded>
					
		
		<enclosure url="https://martynplummer.wordpress.com/wp-content/uploads/2026/05/img_0083.jpg" length="0" type="" />
<enclosure url="https://0.gravatar.com/avatar/fdc509bd31ae635d89cccbdc64ef09464ea1c20d7858c4089a07ea3bea91b8e3?s=96&#038;d=identicon&#038;r=G" length="0" type="" />
<enclosure url="https://martynplummer.wordpress.com/wp-content/uploads/2026/05/img_0083.jpg?w=584" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">401041</post-id>	</item>
	</channel>
</rss>
