<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://javiercancela.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://javiercancela.com/" rel="alternate" type="text/html" /><updated>2026-04-04T11:00:00+00:00</updated><id>https://javiercancela.com/feed.xml</id><title type="html">Rough Notes</title><subtitle>Personal page of Javier Cancela.</subtitle><entry><title type="html">Testing Qwen3-Coder-Next - II</title><link href="https://javiercancela.com/2026/03/31/testing-qwen3-coder-ii.html" rel="alternate" type="text/html" title="Testing Qwen3-Coder-Next - II" /><published>2026-03-31T00:00:00+00:00</published><updated>2026-03-31T00:00:00+00:00</updated><id>https://javiercancela.com/2026/03/31/testing-qwen3-coder-ii</id><content type="html" xml:base="https://javiercancela.com/2026/03/31/testing-qwen3-coder-ii.html"><![CDATA[<p>While writing a more challenging test, I saw a new <a href="https://www.reddit.com/r/LocalLLM/comments/1rz86mm/qwen3527bclaude46opusreasoningdistilled_legendary/">Qwen-based model in Reddit</a>. <a href="https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF">Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF</a> uses <a href="https://huggingface.co/learn/llm-course/chapter11/3#supervised-fine-tuning">Supervised Fine-Tuning</a> and <a href="https://huggingface.co/learn/llm-course/chapter11/4">LoRA</a> with more than 13,000 reasoning examples generated with Claude Opus (mostly 4.6). The datasets used are listed <a href="https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF#%F0%9F%93%9A-all-datasets-used">here</a>.</p>

<p>So I decided to test both Qwen3-Coder-Next and this new model to see the differences.</p>

<h1 id="running-the-models">Running the models</h1>
<p>As with Qwen3-Coder, I’ll use the 4-bit quantization, the only one that can reasonably run on my PC:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
</pre></td><td class="rouge-code"><pre> hf download /
    Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF   /
    <span class="nt">--include</span> <span class="s2">"Qwen3.5-27B.Q4_K_M.gguf"</span>   /
    <span class="nt">--local-dir</span> ./models/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2
</pre></td></tr></tbody></table></code></pre></div></div>

<p>To run the models, I’m making some compromises:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
</pre></td><td class="rouge-code"><pre>llama-server /
    <span class="nt">-m</span> ./models/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2/Qwen3.5-27B.Q4_K_M.gguf /
    <span class="nt">--jinja</span> /
    <span class="nt">--n-gpu-layers</span> auto /
    <span class="nt">--ctx-size</span> 16384 /
    <span class="nt">--batch-size</span> 1024 /
    <span class="nt">--ubatch-size</span> 512 /
    <span class="nt">-n</span> <span class="nt">-1</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>With 16GB of VRAM, not all the layers will fit on the GPU, so <code class="language-plaintext highlighter-rouge">--n-gpu-layers auto</code> let’s <code class="language-plaintext highlighter-rouge">llama.cpp</code> figure out the best configuration. For the context, <code class="language-plaintext highlighter-rouge">--ctx-size 16384</code> was my first try, and it happened to be the right size for this exercise. For Qwen3-Coder-Next, I used the same non-params config as before:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
</pre></td><td class="rouge-code"><pre>llama-server /
    <span class="nt">-m</span> ./models/Qwen3-Coder-Next/Qwen3-Coder-Next-UD-Q4_K_XL.gguf
</pre></td></tr></tbody></table></code></pre></div></div>

<h1 id="the-test">The test</h1>

<p>I asked ChatGPT for a prompt for this test. He suggested this problem:</p>
<blockquote>
  <p>Implement a small local workflow engine in Python.</p>
</blockquote>

<p>The full prompt is <a href="/assets/files/qwen-test-prompt.md">here</a>. It defines some requirements (parallel execution, retries, replay,…), a specific <code class="language-plaintext highlighter-rouge">json</code> format, states and behavior, and some constraints. It will take me quite some time to implement this by myself.</p>

<h1 id="the-solutions">The solutions</h1>
<h2 id="qwen3-coder-next">Qwen3-Coder-Next</h2>

<p>It takes two and a half minutes to produce a solution. Less than 6,000 tokens, at 40 tokens/s</p>
<figure><img src="/assets/images/2026-03-31-testing-qwen3-coder-ii/2026-03-31-16-24-31.png" alt="Llama.cpp data for Qwen3-Coder-Next" /><figcaption>Llama.cpp data for Qwen3-Coder-Next</figcaption></figure>
<p><br />
The full Qwen3-Coder-Next answer is <a href="/assets/files/qwen3-coder-answer.md">here</a>. It makes some comments on the design, gives a proposed file structure, and then shows the code in just one file for simplicity.</p>

<p>Comments are sparse, which is fine by me, but mostly useless. Naming is ok. Too much indentation at times (an if inside an if inside a while inside a with…). I’m not looking at the logic, I’ll ask someone (something?) to do it for me.</p>

<p>The answer includes a table showing how the code implements each requirement, with special emphasis on how it avoids race conditions (<em>spoiler</em>: it doesn’t). It finishes with instructions to run the code and a list of possible improvements.</p>

<h2 id="qwen35-27b-claude-46-opus-reasoning-distilled-v2">Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2</h2>

<p>This is a thinking model, so we get some “thinking time” before getting the solution. Also, unlike the previous model, this one is not based on a MoE architecture, so all layers are always active, and we can’t move them from RAM to VRAM. Some layers will use the CPU instead of the GPU, reducing performance.</p>

<figure><img src="/assets/images/2026-03-31-testing-qwen3-coder-ii/2026-03-31-13-42-25.png" alt="Llama.cpp data for Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2" /><figcaption>Llama.cpp data for Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2</figcaption></figure>
<p><br />
The full answer is <a href="/assets/files/qwen3.5-opus-answer.md">here</a>. The answer starts with the reasoning context, where the model thinks of a plan before implementing the code.</p>

<p>The style is similar to Qwen3-Coder, but it looks less elegant: more loops, a long succession of if-elif statements for the events, function comments, but without any additional information.</p>

<p>Apart from that, instructions on running the code and the justification for some of the decisions. The justifications are too generic, so the relationship with the code isn’t clear.</p>

<h1 id="the-evaluations">The evaluations</h1>

<p>To decide how good the solutions are, I sent all the info to both ChatGPT (GPT5.4 Thinking) and Claude 4.6 Extended. This is the prompt I used:</p>
<blockquote>
  <p>The prompt.md file contains a coding problem to test two local LLM models. The answers from each model are qwen3-coder-answer.md and qwen3.5-opus-answer.md. Analyze and compare the answers. List the good and the not-so-good from their answers, and give a final score to each.</p>
</blockquote>

<p>The answers from each model are: <a href="/assets/files/chatgpt-qwen-test-review.md">ChatGPT</a>, <a href="/assets/files/claude-qwen-test-review.md">Claude</a>.</p>

<p>Here is the summary from GPT 5.4 Thinking (model A is <code class="language-plaintext highlighter-rouge">Qwen3-Coder-Next</code>, model B is <code class="language-plaintext highlighter-rouge">Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2</code>):</p>
<figure><img src="/assets/images/2026-03-31-testing-qwen3-coder-ii/2026-03-31-17-26-02.png" alt="Everybode passes! I wish I had a teacher like this at college." /><figcaption>Everybode passes! I wish I had a teacher like this at college.</figcaption></figure>
<p><br /></p>

<p>So it doesn’t look too bad, until you look at the details on the most basic feature requested.
For Qwen3-Coder:</p>
<blockquote>
  <ul>
    <li>Event log implementation is flawed
      <ul>
        <li>Uses temp + rename per append, but writes only one line → overwrites log</li>
        <li>This breaks persistence and replay (critical bug).</li>
      </ul>
    </li>
  </ul>
</blockquote>

<p>For Qwen3.5-Opus:</p>
<blockquote>
  <ul>
    <li>Event log correctness issue (same as Model A)
      <ul>
        <li>Same temp+rename pattern → overwrites file → catastrophic.</li>
      </ul>
    </li>
  </ul>
</blockquote>

<p>I don’t know if I’d give a 7.5 for something that doesn’t work at all.</p>

<p>Let’s see what Claude thinks:</p>
<figure><img src="/assets/images/2026-03-31-testing-qwen3-coder-ii/2026-03-31-17-32-55.png" alt="Not so fast!" /><figcaption>Not so fast!</figcaption></figure>
<p><br /></p>

<p>Now, that sounds like a more reasonable score if, indeed, the basic features don’t work (although Claude thinks that the Opus version is almost ok, mmm, suspicious).</p>

<p>So, who’s right? There is only one way to find out: <del>review the code myself</del> ask the models again! But this time with the evidence from the other model 😈:</p>
<blockquote>
  <p>This is Claude’s review of the same code. The conclusions are different. Review them and tell me if you agree or you think Claude is wrong.</p>
</blockquote>

<blockquote>
  <p>This is ChatGPT’s review of the same code. The conclusions are different. Review them and tell me if you agree or you think ChatGPT is wrong.</p>
</blockquote>

<p>These are the answers: <a href="/assets/files/chatgpt-qwen-test-second-review.md">ChatGPT</a>, <a href="/assets/files/claude-qwen-test-second-review.md">Claude</a>.</p>
<figure><img src="/assets/images/2026-03-31-testing-qwen3-coder-ii/2026-03-31-18-35-30.png" alt="Sorry Claude, you were right" /><figcaption>Sorry Claude, you were right</figcaption></figure>
<p><br /></p>
<figure><img src="/assets/images/2026-03-31-testing-qwen3-coder-ii/2026-03-31-18-36-27.png" alt="Ok, you were not that bad either, ChatGPT" /><figcaption>Ok, you were not that bad either, ChatGPT</figcaption></figure>
<p><br /></p>

<h1 id="summary">Summary</h1>

<p>In the end, neither version is good enough. <code class="language-plaintext highlighter-rouge">Qwen3-Coder-Next</code> makes a couple of really critical mistakes that make the whole project unusable, while <code class="language-plaintext highlighter-rouge">Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2</code> errors are not as dramatic, but are bad enough to need serious revisions. Interestingly, once the critical bugs are fixed, the <code class="language-plaintext highlighter-rouge">Qwen3-Coder-Next</code> solution is better.</p>

<p>I would say that we can use these models to assist in the coding, but not as autonomous agents.</p>

<p>As a reference, <a href="/assets/files/chatgpt-qwen-test-solution.md">this is a solution proposed by ChatGPT</a> with Claude’s approval.</p>]]></content><author><name></name></author><category term="local-llm" /><summary type="html"><![CDATA[While writing a more challenging test, I saw a new Qwen-based model in Reddit. Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF uses Supervised Fine-Tuning and LoRA with more than 13,000 reasoning examples generated with Claude Opus (mostly 4.6). The datasets used are listed here.]]></summary></entry><entry><title type="html">Testing Qwen3-Coder-Next</title><link href="https://javiercancela.com/2026/03/24/testing-qwen3-coder.html" rel="alternate" type="text/html" title="Testing Qwen3-Coder-Next" /><published>2026-03-24T00:00:00+00:00</published><updated>2026-03-24T00:00:00+00:00</updated><id>https://javiercancela.com/2026/03/24/testing-qwen3-coder</id><content type="html" xml:base="https://javiercancela.com/2026/03/24/testing-qwen3-coder.html"><![CDATA[<p>Are local models useful for coding on a normal PC? Let’s find out.</p>

<h1 id="using-llmfit">Using <code class="language-plaintext highlighter-rouge">llmfit</code></h1>

<p>A good first step is using <a href="https://github.com/AlexsJones/llmfit">llmfit</a> to figure out what some good options are:</p>

<figure><img src="/assets/images/2027-03-22-testing-qwen3-coder/2026-03-24-10-41-40.png" alt="llmfit running on my Linux box" /><figcaption>llmfit running on my Linux box</figcaption></figure>
<p><br /></p>

<p>The Inst column shows if the model is supported (<code class="language-plaintext highlighter-rouge">-</code>), if it is available (<code class="language-plaintext highlighter-rouge">✓</code>), if it is available through llama.cpp (<code class="language-plaintext highlighter-rouge">L</code>), Ollama (<code class="language-plaintext highlighter-rouge">O</code>), … Most of the data shown is scraped from the web, but for some models, it is calculated based on a few heuristics. This explains the difference between Qwen3-Coder-Next and Qwen3-Coder-Next-FP8. These two are the same model, with the -FP8 version already quantized to 8 bits.</p>

<p>But the way llmfit works is searching for the optimal quantized model (for example, from <a href="https://huggingface.co/unsloth">Unsloth</a>) and calculating the scores from there. The differences then come from the fact that it uses the scraped data for <code class="language-plaintext highlighter-rouge">Qwen3-Coder-Next</code>, but for <code class="language-plaintext highlighter-rouge">Qwen3-Coder-Next-FP8</code> it infers some params.</p>

<p>For example, these models (or model, because they are both the same) use a <a href="https://huggingface.co/blog/moe">MoE architecture</a>, where instead of a big FFN, we have several different small FFNs. According to the <a href="https://huggingface.co/Qwen/Qwen3-Coder-Next#model-overview">HF card</a>, <code class="language-plaintext highlighter-rouge">Qwen3-Coder-Next</code> (and the FP8 variant) has 512 experts, of which only 10 are activated at the same time, with one additional shared expert. Each expert has a hidden layer of 512 dimensions.</p>

<p>So the main difference is in the number of active parameters, that is, the number of params for the active experts: for Qwen3-Coder-Next we have the 3B set in the HF card, but for Qwen3-Coder-Next-FP8 we have almost 5.5B. So this would make a better use of the GPU, while also leaving more RAM free. Maybe even allowing a better quantization.</p>

<h1 id="downloading-the-model">Downloading the model</h1>

<p>In any case, this is all wrong: there is only one model to quantize, and the FP8 (80GB) version wouldn’t fit in my RAM + VRAM.</p>

<p>So I’m going to download this one:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre>hf download unsloth/Qwen3-Coder-Next-GGUF   <span class="nt">--include</span> <span class="s2">"Qwen3-Coder-Next-UD-Q4_K_XL.gguf"</span>   <span class="nt">--local-dir</span> ./models/Qwen3-Coder-Next
</pre></td></tr></tbody></table></code></pre></div></div>

<p>The UD-Q4_K_XL quantization is a special unsloth dynamic quantization that is supposed to perform better.</p>

<h1 id="testing-the-model">Testing the model</h1>

<p>Once downloaded, we run it with llama.cpp:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre>llama-server <span class="nt">-m</span> ./models/Qwen3-Coder-Next/Qwen3-Coder-Next-UD-Q4_K_XL.gguf
</pre></td></tr></tbody></table></code></pre></div></div>

<figure><img src="/assets/images/2027-03-22-testing-qwen3-coder/2026-03-24-19-10-05.png" alt="A simple exercise" /><figcaption>A simple exercise</figcaption></figure>
<p><br /></p>

<p>The code generated is <a href="/assets/files/llama-server-qwen-test.py">in this file</a>. Qwen also added these comments:</p>

<figure><img src="/assets/images/2027-03-22-testing-qwen3-coder/2026-03-24-22-22-07.png" alt="Not bad!" /><figcaption>Not bad!</figcaption></figure>
<p><br /></p>

<p>The result isn’t bad: there are things to improve (project in <code class="language-plaintext highlighter-rouge">FROM</code> clause, location, no real reason for the <code class="language-plaintext highlighter-rouge">ORDER BY</code>), but the code works.</p>

<p>I will try something more complex when I have time.</p>]]></content><author><name></name></author><category term="images" /><summary type="html"><![CDATA[Are local models useful for coding on a normal PC? Let’s find out.]]></summary></entry><entry><title type="html">Using Telegram to control Claude</title><link href="https://javiercancela.com/2026/03/22/controling-claude-telegram.html" rel="alternate" type="text/html" title="Using Telegram to control Claude" /><published>2026-03-22T00:00:00+00:00</published><updated>2026-03-22T00:00:00+00:00</updated><id>https://javiercancela.com/2026/03/22/controling-claude-telegram</id><content type="html" xml:base="https://javiercancela.com/2026/03/22/controling-claude-telegram.html"><![CDATA[<p>Anthropic has just released a feature called <a href="https://code.claude.com/docs/en/channels-reference">Channels</a>, which allows a Claude session to communicate with an external system in one of two ways:</p>
<ul>
  <li>listening for events on an HTTP port</li>
  <li>polling an external API</li>
</ul>

<figure><img src="/assets/images/2026-03-23-10-10-10.png" alt="From https://code.claude.com/docs/en/channels-reference" /><figcaption><em>From https://code.claude.com/docs/en/channels-reference</em></figcaption></figure>
<p><br /></p>

<p>The channel is an MCP server automatically run when we use the <code class="language-plaintext highlighter-rouge">--channels</code> parameter when launching Claude.</p>

<h1 id="testing-channels-with-telegram">Testing channels with Telegram</h1>

<p>We first need to create a <a href="https://core.telegram.org/bots/features#creating-a-new-bot">Telegram bot</a> and safely store the bot token.</p>
<figure><img src="/assets/images/2026-03-23-11-12-20.png" alt="Creating a bot in Telegram" /><figcaption><em>Creating a bot in Telegram</em></figcaption></figure>
<p><br /></p>

<p>We also have to verify that Claude’s version is <code class="language-plaintext highlighter-rouge">2.1.80</code> or later. To run the MCP server, Claude also needs a JavaScript runtime. I’m using <a href="https://bun.sh/">bun</a>, but I think <a href="https://nodejs.org/en">node</a> and <a href="https://deno.com/">deno</a> are valid options too.</p>

<p>The next step is installing the Telegram plugin inside Claude:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
</pre></td><td class="rouge-code"><pre>/plugin marketplace add anthropics/claude-plugins-official <span class="c"># In case it wasn't already added</span>
/plugin <span class="nb">install </span>telegram@claude-plugins-official
/reload-plugins
</pre></td></tr></tbody></table></code></pre></div></div>

<p>Now the Telegram plugin should be available, and we just have to configure the bot token:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre>/telegram:configure 123456789:AAHfiqksKZ8...
</pre></td></tr></tbody></table></code></pre></div></div>

<p>The final step is to pair the bot with Claude. For that, we DM the bot we created before:</p>
<figure><img src="/assets/images/2026-03-23-controling-claude-telegram/2026-03-23-19-45-52.png" alt="If you did everything ok, you only need to say Hi! once" /><figcaption>If you did everything ok, you only need to say Hi! once</figcaption></figure>
<p><br /></p>

<p>The bot’s response includes the command to pair the session inside Claude:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre>/telegram:access pair a12345
</pre></td></tr></tbody></table></code></pre></div></div>

<p>I said ‘Hi’ three times because the MCP server for the channel wasn’t working. I installed <code class="language-plaintext highlighter-rouge">bun</code> but I didn’t make sure the <code class="language-plaintext highlighter-rouge">PATH</code> was updated, so the MCP server was failing. You can check the MCP server status with <code class="language-plaintext highlighter-rouge">/mcp</code>:</p>

<figure><img src="/assets/images/2026-03-23-11-22-58.png" alt="Checking the mcp status" /><figcaption><em>Checking the mcp status</em></figcaption></figure>
<p><br /></p>

<p>And that’s it. Now I can use Telegram as a front-end for an existing Claude session:</p>
<figure><img src="/assets/images/2026-03-23-controling-claude-telegram/2026-03-24-09-16-47.png" alt="Now I want the same for Codex" /><figcaption>Now I want the same for Codex</figcaption></figure>
<p><br /></p>]]></content><author><name></name></author><category term="claude" /><summary type="html"><![CDATA[Anthropic has just released a feature called Channels, which allows a Claude session to communicate with an external system in one of two ways: listening for events on an HTTP port polling an external API]]></summary></entry><entry><title type="html">Old man tries things, Nano Banana edition</title><link href="https://javiercancela.com/2026/03/15/generating-images.html" rel="alternate" type="text/html" title="Old man tries things, Nano Banana edition" /><published>2026-03-15T00:00:00+00:00</published><updated>2026-03-15T00:00:00+00:00</updated><id>https://javiercancela.com/2026/03/15/generating-images</id><content type="html" xml:base="https://javiercancela.com/2026/03/15/generating-images.html"><![CDATA[<h1 id="-update-">** Update **</h1>
<p>This entry was written when the blog looked like this:</p>
<figure><img src="/assets/images/2026-03-15-generating-images/2026-03-24-09-34-34.png" alt="This blog used to be this ugly" /><figcaption>This blog used to be this ugly</figcaption></figure>
<p><br /></p>

<h1 id="original-text">Original text</h1>

<p>I use Gemini or ChatGPT to generate images for my posts. And they are getting really good at this, especially Gemini’s Nano Banana 2.</p>

<p>I used Nano Banana 2 to generate two images. With this prompt:</p>
<blockquote>
  <p>A 10-year old kid playing a conversational game with a ZX Spectrum. The monitor is an old TV. The image is photorealistic, but with aesthetics from the 80s, and a Polaroid-like look. Make the proportions 16-9.</p>
</blockquote>

<p>The first one was this:</p>
<figure><img src="/assets/images/2027-01-01-llm-rpg/2026-03-08-22-48-55.png" alt="Is the Spectrum too big or the rest of the image too small?" /><figcaption>Is the Spectrum too big or the rest of the image too small?</figcaption></figure>
<p><br />
The mix of right and wrong details is amusing. The general aesthetic is very good, the key elements (ZX Spectrum, game on old TV, kid) are quite good (the Spectrum is too big). The game text is legible and consistent with conversational games.</p>

<p>On the other hand, too many posters and books mean lots of garbled text or deformed faces.</p>

<p>This is the second one:</p>
<figure><img src="/assets/images/2027-01-01-generating-images/2026-03-12-22-49-43.png" alt="Greetings Thorin!" /><figcaption>Greetings Thorin!</figcaption></figure>
<p><br />
This one is even better, in part because most of the text is out of focus. Some text in the foreground is garbled, but some out-of-focus text looks good (the cassette on the TV seems to read “The Hobbit”, and the imperial trooper in the background is in front of something that reads like Star Wars). The Spectrum has better proportions, and everything looks consistent with the date helpfully included in the Polaroid border.</p>

<p>My favourite detail is the game on the TV: it is actually The Hobbit, probably the best known conversational game for the Spectrum, but with graphics that look more like those of an NES.</p>

<p>As a check, I used the same prompt with ChatGPT Images:</p>
<figure><img src="/assets/images/2026-03-15-llm-convo-game-ii/2026-03-12-19-47-58.png" alt="TODO" /><figcaption>TODO</figcaption></figure>
<p><br />
While the worst image quality is consistent with an 80s Polaroid, everything in the image is worse: the Spectrum-like computer, the generic non-game-like text in the TV, the Rubik cube with a wrong perspective…</p>

<p>I’ll stick with Nano Banana 2 for the future.</p>]]></content><author><name></name></author><category term="images" /><summary type="html"><![CDATA[** Update ** This entry was written when the blog looked like this: This blog used to be this ugly]]></summary></entry><entry><title type="html">Writing an agentic conversational game - II</title><link href="https://javiercancela.com/2026/03/14/llm-convo-game-ii.html" rel="alternate" type="text/html" title="Writing an agentic conversational game - II" /><published>2026-03-14T00:00:00+00:00</published><updated>2026-03-14T00:00:00+00:00</updated><id>https://javiercancela.com/2026/03/14/llm-convo-game-ii</id><content type="html" xml:base="https://javiercancela.com/2026/03/14/llm-convo-game-ii.html"><![CDATA[<h1 id="the-implementation">The implementation</h1>

<p>Once <a href="/2026/03/12/llm-convo-game-i.html">the plan</a> is defined, it’s time to start coding. I just fed OpenAI Codex the prompt as ChatGPT had defined it and generated this version:
https://github.com/javiercancela/virtual-world/tree/v1</p>

<p>Just as a test, I used Cursor in Auto mode to add `uv:</p>
<figure><img src="/assets/images/2027-01-01-llm-rpg/2026-03-08-16-08-01.png" alt="Use uv, please" /><figcaption>Use uv, please</figcaption></figure>
<p><br /></p>

<p>Running the code results in this interaction:</p>

<figure><img src="/assets/images/2026-03-15-llm-convo-game-ii/2026-03-14-19-07-44.png" alt="This image does not resolve into a safe action" /><figcaption>This image does not resolve into a safe action</figcaption></figure>
<p><br /></p>

<p>Ok, now we need some actual local LLM to test. I deleted my previous llama.cpp install, so I downloaded and compiled it again:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
</pre></td><td class="rouge-code"><pre>cmake <span class="nt">-B</span> build <span class="nt">-DGGML_CUDA</span><span class="o">=</span>ON
cmake <span class="nt">--build</span> build <span class="nt">--config</span> Release <span class="nt">-j</span> 16
</pre></td></tr></tbody></table></code></pre></div></div>

<p>For the model, we use the Qwen3.5 family, with the Unsloth quantization:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
</pre></td><td class="rouge-code"><pre>hf download unsloth/Qwen3.5-4B-GGUF   <span class="nt">--include</span> <span class="s2">"Qwen3.5-4B-UD-Q4_K_XL.gguf"</span>   <span class="nt">--local-dir</span> ./models/Qwen3.5-4B
hf download unsloth/Qwen3.5-9B-GGUF   <span class="nt">--include</span> <span class="s2">"Qwen3.5-9B-UD-Q5_K_XL.gguf"</span>   <span class="nt">--local-dir</span> ./models/Qwen3.5-9B
</pre></td></tr></tbody></table></code></pre></div></div>

<p>The 4B version for routing, the 9B version for the rest.</p>

<p>We run one instance of <code class="language-plaintext highlighter-rouge">llama.cpp</code> for each, in different ports. To fit this into my system, I run the 9B model with the GPU and the 4B one with the CPU:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
</pre></td><td class="rouge-code"><pre> ./llama.cpp/build/bin/llama-server  /
    <span class="nt">-m</span> ./models/Qwen3.5-9B/Qwen3.5-9B-UD-Q5_K_XL.gguf   /
    <span class="nt">--alias</span> Qwen3.5-9B   /
    <span class="nt">--port</span> 8081

<span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">""</span>  ./llama.cpp/build/bin/llama-server  /
    <span class="nt">-m</span> ./models/Qwen3.5-4B/Qwen3.5-4B-UD-Q4_K_XL.gguf /
    <span class="nt">-ngl</span> 0  /
    <span class="nt">-t</span> 8  /
    <span class="nt">-c</span> 2048  /
    <span class="nt">--port</span> 8082
</pre></td></tr></tbody></table></code></pre></div></div>

<p>I used CUDA_VISIBLE_DEVICES=”” to force using the CPU and avoid compiling this model specifically for the CPU. But to make it work, I had to add the <code class="language-plaintext highlighter-rouge">-ngl 0</code> param too. The option <code class="language-plaintext highlighter-rouge">-c 2048</code> limits the context to use less memory, because context usage for routing is small.</p>

<p>I found an error after trying with the llama servers running:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
</pre></td><td class="rouge-code"><pre>...
slot update_slots: id  3 | task 910 | prompt processing done, n_tokens = 110, batch.n_tokens = 110
slot print_timing: id  3 | task 910 |
prompt eval time =      56.08 ms /   110 tokens (    0.51 ms per token,  1961.52 tokens per second)
 eval time =    2224.34 ms /   128 tokens (   17.38 ms per token,    57.55 tokens per second)
 total time =    2280.42 ms /   238 tokens
slot      release: id  3 | task 910 | stop processing: n_tokens = 237, truncated = 0
srv  update_slots: all slots are idle
srv          stop: cancel task, id_task = 910
srv  update_slots: all slots are idle
srv    operator(): got exception: {"error":{"code":500,"message":"Failed to parse input at pos 494: ","type":"server_error"}}
srv  log_server_r: done request: POST /v1/chat/completions 127.0.0.1 500

</pre></td></tr></tbody></table></code></pre></div></div>

<p>I just pasted the log to Claude and asked him to fix it. This is the fixed version: https://github.com/javiercancela/virtual-world/tree/v2.</p>

<p>This is the end result for this version:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
</pre></td><td class="rouge-code"><pre>Night Desk
Rain hammers against the steel shutters as Mara Voss leans over her desk, her
eyes fixed on the locked exit door. The only sound is the rhythmic clatter of
keys against the steel cabinet and the soft rustle of paper in the visitor
ledger. She offers a knowing smile, suggesting that the storm outside is just
as trapped as you are inside.
Commands: inspect &lt;object&gt;, talk to Mara, use &lt;item&gt; on &lt;target&gt;, take &lt;item&gt;,
move &lt;object&gt;, inventory, look, help, quit.

&gt;  What is the time according to the clock?
Rain drums against the high windows as Mara Voss measures your story, her eyes
fixed on the locked steel cabinet and the sealed exit door. The only things
within reach are the visitor ledger, the coat rack, the framed photo, and the
desk, all set against the cold glow of the office.

&gt; Examine the framed photo
The photograph reveals Mara with a commendation ribbon pinned to her jacket,
while the engraved plaque beneath reads: "Attention kept us alive." It feels
less like sentiment and more like a test, the stark white paper contrasting
with the dim, humming light of the security office.

&gt; Examine the visitor ledger
The ledger lies open on the desk, its final entry underlined twice in stark
red pencil, while the name "Silas Vale" stares back from the visitor's
signature. Mara's sharp eyes are already scanning the page, waiting for you
to confirm whether that name belongs in the report or the witness box.

&gt; Ask Mara about Silas Vale
Rain drums against the high windows as Mara Voss measures your story, her gaze
fixed on the locked steel cabinet and sealed exit door. The only things within
reach are the visitor ledger, the framed photo, the coat rack, and the heavy
steel cabinet that remains stubbornly closed.

&gt; talk to Mara
"The ink on that ledger doesn't dry quickly when someone is watching you, so
why are you still looking at me?" Mara's voice is low and steady, cutting
through the rain outside as she avoids your eyes. "If you believe Silas Vale's
name belongs in the witness box, then you know exactly what to do with the
ledger before I say another word."

&gt; Yes, I think it begongs in the witness box
That does not resolve into a safe action. Rephrase it with a clear object or
request.

&gt; open the desk
Using the desk that way does nothing useful.

&gt; open the steel cabinet
The keypad waits for a three-digit code you do not have yet.

&gt; inventory
Inventory: empty.

&gt; take the ledger
You cannot take the visitor ledger.

&gt; inspect the desk
The desk is a cluttered fortress of incident forms and a dead flashlight, the
visitor ledger already open to the latest page as if waiting for an answer.
The ink on the recent entries glistens under the flickering light, demanding
that you look closer at who has been here.

&gt; read the ledger
Inventory: empty.

&gt; inspect the ledger
The ledger's final entry trembles under double red underlines, the ink stark
against the white page. You trace the name Silas Vale, knowing Mara's eyes will
follow the trail of that signature. The silence in the office feels heavier
now, waiting for the next clue to fall into place.

&gt; help
Commands: inspect &lt;object&gt;, talk to Mara, use &lt;item&gt; on &lt;target&gt;, take &lt;item&gt;,
move &lt;object&gt;, inventory, look, help, quit.

&gt; move ledger
You shift the visitor ledger, but it changes nothing useful.

&gt; inspect the coat rack
Mara's raincoat drips onto the tiles, the metallic clink of a leather key clip
hanging empty from the sleeve seam. The rack remains barren, confirming that
whatever key belongs there is not on it. The silence of the office is broken
only by the rhythmic tapping of water against the floor.

&gt; inspect the leather key clip
That does not resolve into a safe action. Rephrase it with a clear object or
request.

&gt; tak the key clip
That does not resolve into a safe action. Rephrase it with a clear object or
request.

&gt; take the key clip
That does not resolve into a safe action. Rephrase it with a clear object or
request.

&gt; inspect Mara's raincoat
Mara Voss watches your every move with controlled impatience, her eyes scanning
the room as if expecting a trap. She looks tired and suspicious, refusing to
volunteer the cabinet code despite your persistent inquiries. The silence in
the Security Office grows heavier with each passing second.

&gt; inspect raincoat
The empty clip suggests the important key is stored elsewhere.

&gt; move the coat rack
You shift the desk, but it changes nothing useful.

&gt; move coat rack
You drag the coat rack aside, revealing only the slick sheen of wet tile and a
faint trail of grit beneath it. No spare key is hidden in the shadows of the
wood, and the silence of the office remains unbroken by the sound of brass or
distant sirens.

&gt; move framed photo
You straighten the framed photo, revealing a pristine, dust-free rectangle behind
it that offers no hidden switch. The recent handling of the frame only deepens
the room's quiet theme: notice the details, do not expect tricks. The air in the
Security Office remains still, waiting for the next observation.

&gt; talk to mara
"Careful with the coat rack, I've been testing to see if you'll notice the grit
trail beneath it. If you think I'm hiding something in the office, you should be looking at the visitor ledger instead."

&gt; move visitor ledger
You shift the visitor ledger, but it changes nothing useful.

&gt; take visitor ledger
You cannot take the visitor ledger.

&gt; talk to Silas Vale
The ledger you just inspected was a dead end, Silas, but it confirmed that
someone else has been watching this room. If you truly want to know what I'm
hiding, you need to get out here and talk to the visitor in the corner, not keep
asking me.

&gt; talk to the visitor
That does not resolve into a safe action. Rephrase it with a clear object or
request.

&gt; talk to mara
I know you want answers, Mara, but the visitor in the corner holds the truth I
cannot give you. Go inspect the ledger again; if that's a dead end, then someone
else is watching this room, and you're the only one who can see it.

&gt; inspect the corner
That does not resolve into a safe action. Rephrase it with a clear object or
request.

&gt; tell mara Silas Vale
Silas Vale is a man of many talents, yes, but his ledger is the only place his
true nature hides. Go back to the desk, open the ledger, and tell me what you
find there. If the pages are blank, then someone else is watching this room,
and you're the only one who can see it.

&gt; open the ledger
Using the visitor ledger that way does nothing useful.

&gt; inspect ledger
The ledger's final entry trembles under the red pencil, its double underlining
stark against the dim light. You recognize the name Silas Vale from the
visitor log, a detail Mara would likely dissect with sharp curiosity. The air
in the office feels heavy with the weight of a secret waiting to be uncovered.

&gt; talk to mara
Silence is the only thing that keeps the shadows from shifting, so stop talking
and look at the ledger again. If you think you see something there, I will
believe you; if not, then someone else is already watching us.

&gt; tell Mara I see something
The red pencil on the ledger is the only thing that matters right now, yet you
claim the steel cabinet holds a truth I cannot yet read. If that cabinet
contains more than a key, then the shadows are already shifting, and I am no
longer the only one watching us. Go back to the ledger; the answer to what
you see is written there, not in the cold steel.

&gt; Silas Vale
That does not resolve into a safe action. Rephrase it with a clear object or
request.
</pre></td></tr></tbody></table></code></pre></div></div>

<p>Something is wrong with how the game decides that Mara should trust me. So the next step will be fixing this.</p>]]></content><author><name></name></author><category term="vibecoding" /><category term="agents" /><summary type="html"><![CDATA[The implementation]]></summary></entry><entry><title type="html">Writing an agentic conversational game - I</title><link href="https://javiercancela.com/2026/03/12/llm-convo-game-i.html" rel="alternate" type="text/html" title="Writing an agentic conversational game - I" /><published>2026-03-12T00:00:00+00:00</published><updated>2026-03-12T00:00:00+00:00</updated><id>https://javiercancela.com/2026/03/12/llm-convo-game-i</id><content type="html" xml:base="https://javiercancela.com/2026/03/12/llm-convo-game-i.html"><![CDATA[<h1 id="the-idea">The idea</h1>

<p>My first computer was a Sinclair ZX81, but after only a few months, my father sold it and bought a ZX Spectrum. Thanks to the Spectrum, I learnt to code, but most of the time I spent with this computer, and this was a good part of my childhood, I was playing games.</p>

<p>And one of my favourite types of games was conversational games.</p>
<figure><img src="/assets/images/2027-01-01-llm-rpg/2026-03-08-22-56-26.png" alt="TODO" /><figcaption>I liked the book more than the movies. And the game more than the book.</figcaption></figure>
<p><br /></p>

<p>So why not try to recreate games like this one, but with LLMs to have actual conversations? How hard can it be? (Famous last words). Let’s find out!</p>

<h1 id="the-plan">The plan</h1>

<p>My go-to model is GPT. At the moment of starting this project, GPT 5.4 was already announced but not yet released, so I used GPT 5.3 (web interface).</p>

<p>This is the initial prompt:</p>
<figure><img src="/assets/images/2027-01-01-llm-rpg/2026-03-12-16-51-33.png" alt="A reasonable request" /><figcaption>A reasonable request</figcaption></figure>
<p><br />
And this is the answer:</p>
<figure><img src="/assets/images/2027-01-01-llm-rpg/2026-03-12-16-53-44.png" alt="Ha! Not that fast!" /><figcaption>Ha! Not that fast!</figcaption></figure>
<p><br />
Ok then, let’s clarify things first:</p>
<figure><img src="/assets/images/2027-01-01-llm-rpg/2026-03-12-16-55-06.png" alt="Here you go!" /><figcaption>Here you go!</figcaption></figure>
<p><br />
And finally:</p>
<figure><img src="/assets/images/2027-01-01-llm-rpg/2026-03-12-17-04-30.png" alt="Finally!" /><figcaption>Finally!</figcaption></figure>
<p><br /></p>

<p><a href="/assets/files/03-project_plan_detective_llm_agents.md">The plan</a> is quite comprehensive.</p>

<h1 id="the-review">The review</h1>

<p>But is it a good plan? Let’s ask Claude! I fed Claude Opus 4.6 the original prompt, the questions from ChatGPT, my answers, and the plan. This was Claude’s answer:</p>
<figure><img src="/assets/images/2027-01-01-llm-rpg/2026-03-12-17-28-44.png" alt="Not bad" /><figcaption>Not bad</figcaption></figure>
<p><br /></p>

<p>I was interested in knowing more about the model recommendations:</p>
<figure><img src="/assets/images/2027-01-01-llm-rpg/2026-03-12-17-29-50.png" alt="My specs suck" /><figcaption>My specs suck</figcaption></figure>
<p><br />
I have an NVIDIA GeForce RTX 5060 Ti with an AMD Ryzen 9 7900 × 24 and 64 GB of RAM running on Ubuntu 24.04. This was his (its?) recommendation:</p>
<figure><img src="/assets/images/2027-01-01-llm-rpg/2026-03-12-17-34-23.png" alt="Outdated" /><figcaption>Outdated</figcaption></figure>
<p><br />
A problem with conversations like this one is that the model finds a recent result (Qwen-3) but doesn’t bother to check if there is a better option that wasn’t available when the model was trained, as is the case with Qwen-3.5, which was announced in mid February. (To be fair, this happens less with ChatGPT). I have to ask Claude to double-check:</p>
<figure><img src="/assets/images/2027-01-01-llm-rpg/2026-03-12-17-40-50.png" alt="Are you sure?!" /><figcaption>Are you sure?!</figcaption></figure>
<p><br />
After that, he did a web search to get all the data:</p>
<figure><img src="/assets/images/2027-01-01-llm-rpg/2026-03-12-17-42-16.png" alt="Claude thinks he is talking to Trump" /><figcaption>Claude thinks he is talking to Trump</figcaption></figure>
<p><br />
I find it interesting that he uses this sycophantic tone to recognize the mistake.</p>

<p>We are now ready to start the coding phase.</p>]]></content><author><name></name></author><category term="vibecoding" /><category term="agents" /><summary type="html"><![CDATA[The idea]]></summary></entry><entry><title type="html">Reinforcement Learning II - Basic algorithms</title><link href="https://javiercancela.com/2026/02/17/q-learning.html" rel="alternate" type="text/html" title="Reinforcement Learning II - Basic algorithms" /><published>2026-02-17T00:00:00+00:00</published><updated>2026-02-17T00:00:00+00:00</updated><id>https://javiercancela.com/2026/02/17/q-learning</id><content type="html" xml:base="https://javiercancela.com/2026/02/17/q-learning.html"><![CDATA[<p>With the basic engine <a href="/2026/01/25/connect-4.html">created</a>, the next step is adding a couple of basic non-RL agents.</p>

<h1 id="the-heuristic-agent">The heuristic agent</h1>

<p>The first agent will follow a simple heuristic: make the most central move possible. For this, I wrote this prompt:</p>

<blockquote>
  <p>Add a new “heuristic” agent. The agent:</p>
  <ul>
    <li>if it has a winning move, it plays it</li>
    <li>if the opponent has one winning move, it blocks it</li>
    <li>if a move gives the opponent a winning move, it avoids it</li>
    <li>for the rest of the cases, it prioritices central moves, the more in the center the better</li>
    <li>for ties, choose a random move</li>
  </ul>
</blockquote>

<p>It implements it without a hitch. I test it against the random agent, playing the heuristic as the second player:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
</pre></td><td class="rouge-code"><pre>Running 1000 games: Random vs Heuristic

 100/1000 games completed
 200/1000 games completed
 300/1000 games completed
 400/1000 games completed
 500/1000 games completed
 600/1000 games completed
 700/1000 games completed
 800/1000 games completed
 900/1000 games completed
 1000/1000 games completed

<span class="o">========================================</span>
Results
<span class="o">========================================</span>

Random <span class="o">(</span>Player 1<span class="o">)</span>:
 Wins:     13 <span class="o">(</span>  1.3%<span class="o">)</span>

Heuristic <span class="o">(</span>Player 2<span class="o">)</span>:
 Wins:    987 <span class="o">(</span> 98.7%<span class="o">)</span>

Draws:        0 <span class="o">(</span>  0.0%<span class="o">)</span>

Total:     1000 games
<span class="o">========================================</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>That simple rule makes all the difference, and all of a sudden, playing randomly is an almost sure defeat.</p>

<h1 id="the-minimax-agent">The minimax agent</h1>

<p>I (vibe)coded the heuristic agent in January, but I didn’t continue the post until February, when the new Claude Opus 4.6 and GPT-5.3-Codex were <a href="https://x.com/CrypSaf/status/2019487372811940340?s=20">simultaneously released</a>. So the new code comes from these two models (mostly GPT).</p>

<p>My next prompt:</p>
<blockquote>
  <p>Add a minimax agent to use as baseline for future Q-Learning agents (like the heuristic agent)</p>
</blockquote>

<p>Minimax is an algorithm that can be used in two-player games with perfect information. The basic idea is to build a tree with all possible moves. If I’m the first player (the maximizer), at the beginning of the game, the board is empty, and that is the root of the tree.</p>

<p>From there, we get seven edges, one for one of my possible moves, that end in nodes corresponding to the state of the board after that move (a piece at the bottom of the selected column). From each node grow seven edges with the possible moves of my opponent (the minimizer), and so on.</p>

<p>This calculation repeats until all edges get to a terminal state (win, lose, or draw), or we get to a predefined evaluation limit (we can build the whole tree because it is computationally too expensive)</p>

<p>Once we have the tree, we compute a score for each leaf (the terminal nodes), and we choose one of them: if the edges getting to the nodes correspond to moves for the adversary (minimizer nodes), we choose the lowest score, if to my moves (maximizer nodes), we choose the highest one. The reason is that we’ll compute the score so that the higher the score, the better for the maximizer player, and we assume the minizer we’ll choose the move we evaluate as the worst for us.</p>

<figure><img src="/assets/images/2027-02-01-the-basics/2026-02-08-22-35-36.png" alt="TODO" /><figcaption>Example image tree. <a href="https://www.researchgate.net/figure/MiniMax-Generated-Tree-3_fig1_360872512">Source</a>.</figcaption></figure>
<p><br /></p>

<p>And how do we compute the score? We also use a heuristic for this. The one Codex chose (the idea seems logical, the detail may be suboptimal, or maybe not) is to add points based on how many pieces in a row you have, with an additional score based on how central your pieces are. This is how Claude explains the evaluation function:</p>
<figure><img src="/assets/images/2027-02-01-the-basics/2026-02-08-22-43-04.png" alt="Claude in Cursor" /><figcaption>Claude in Cursor</figcaption></figure>
<p><br /></p>

<p>The application lets you choose the depth of the tree, four by default. It works, but it is a little slow. I imagine that I should be using all 24 cores for this, so I prompted Codex again:</p>
<blockquote>
  <p>Make the benchmark.py script multithreaded so it can run several games at the same time</p>
</blockquote>

<p>When I tried the first implementation, the result was way worse than the single-thread version. So I tried again:</p>

<blockquote>
  <p>There is some problem with the multithreading. Now the execution takes way longer. Anaylize the code with detail and identify the issues.</p>
</blockquote>

<p>It finally creates a multithreaded version that improves speed.</p>

<p>The Minimax agent beats the heuristic, although not for that much:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
</pre></td><td class="rouge-code"><pre>Select Player 1 <span class="o">(</span>X<span class="o">)</span>:

 1. Random
 2. Heuristic
 3. Minimax

Enter choice: 2

Select Player 2 <span class="o">(</span>O<span class="o">)</span>:

 1. Random
 2. Heuristic
 3. Minimax

Enter choice: 3
Select Player 2 <span class="o">(</span>O<span class="o">)</span> minimax depth <span class="o">[</span>4]:

Number of games <span class="o">[</span>1000]: 10000
Workers <span class="o">[</span>24]:

Running 10000 games on 24 workers: Heuristic vs Minimax<span class="o">(</span><span class="nv">d</span><span class="o">=</span>4<span class="o">)</span>

 4800/10000 games completed
 7400/10000 games completed
 10000/10000 games completed

<span class="o">========================================</span>
Results
<span class="o">========================================</span>

Heuristic <span class="o">(</span>Player 1<span class="o">)</span>:
 Wins:   3904 <span class="o">(</span> 39.0%<span class="o">)</span>

Minimax<span class="o">(</span><span class="nv">d</span><span class="o">=</span>4<span class="o">)</span> <span class="o">(</span>Player 2<span class="o">)</span>:
 Wins:   4818 <span class="o">(</span> 48.2%<span class="o">)</span>

Draws:     1278 <span class="o">(</span> 12.8%<span class="o">)</span>

Total:    10000 games
<span class="o">========================================</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>This version of the code is <a href="https://github.com/javiercancela/connect-4/tree/v2">here</a>.</p>]]></content><author><name></name></author><category term="reinforcement-learning" /><category term="vibecoding" /><category term="connect-4" /><summary type="html"><![CDATA[With the basic engine created, the next step is adding a couple of basic non-RL agents.]]></summary></entry><entry><title type="html">Reinforcement Learning II - Basic algorithms</title><link href="https://javiercancela.com/2026/02/17/the-basics.html" rel="alternate" type="text/html" title="Reinforcement Learning II - Basic algorithms" /><published>2026-02-17T00:00:00+00:00</published><updated>2026-02-17T00:00:00+00:00</updated><id>https://javiercancela.com/2026/02/17/the-basics</id><content type="html" xml:base="https://javiercancela.com/2026/02/17/the-basics.html"><![CDATA[<p>With the basic engine <a href="/2026/01/25/connect-4.html">created</a>, the next step is adding a couple of basic non-RL agents.</p>

<h1 id="the-heuristic-agent">The heuristic agent</h1>

<p>The first agent will follow a simple heuristic: make the most central move possible. For this, I wrote this prompt:</p>

<blockquote>
  <p>Add a new “heuristic” agent. The agent:</p>
  <ul>
    <li>if it has a winning move, it plays it</li>
    <li>if the opponent has one winning move, it blocks it</li>
    <li>if a move gives the opponent a winning move, it avoids it</li>
    <li>for the rest of the cases, it prioritices central moves, the more in the center the better</li>
    <li>for ties, choose a random move</li>
  </ul>
</blockquote>

<p>It implements it without a hitch. I test it against the random agent, playing the heuristic as the second player:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
</pre></td><td class="rouge-code"><pre>Running 1000 games: Random vs Heuristic

 100/1000 games completed
 200/1000 games completed
 300/1000 games completed
 400/1000 games completed
 500/1000 games completed
 600/1000 games completed
 700/1000 games completed
 800/1000 games completed
 900/1000 games completed
 1000/1000 games completed

<span class="o">========================================</span>
Results
<span class="o">========================================</span>

Random <span class="o">(</span>Player 1<span class="o">)</span>:
 Wins:     13 <span class="o">(</span>  1.3%<span class="o">)</span>

Heuristic <span class="o">(</span>Player 2<span class="o">)</span>:
 Wins:    987 <span class="o">(</span> 98.7%<span class="o">)</span>

Draws:        0 <span class="o">(</span>  0.0%<span class="o">)</span>

Total:     1000 games
<span class="o">========================================</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>That simple rule makes all the difference, and all of a sudden, playing randomly is an almost sure defeat.</p>

<h1 id="the-minimax-agent">The minimax agent</h1>

<p>I (vibe)coded the heuristic agent in January, but I didn’t continue the post until February, when the new Claude Opus 4.6 and GPT-5.3-Codex were <a href="https://x.com/CrypSaf/status/2019487372811940340?s=20">simultaneously released</a>. So the new code comes from these two models (mostly GPT).</p>

<p>My next prompt:</p>
<blockquote>
  <p>Add a minimax agent to use as baseline for future Q-Learning agents (like the heuristic agent)</p>
</blockquote>

<p>Minimax is an algorithm that can be used in two-player games with perfect information. The basic idea is to build a tree with all possible moves. If I’m the first player (the maximizer), at the beginning of the game, the board is empty, and that is the root of the tree.</p>

<p>From there, we get seven edges, one for one of my possible moves, that end in nodes corresponding to the state of the board after that move (a piece at the bottom of the selected column). From each node grow seven edges with the possible moves of my opponent (the minimizer), and so on.</p>

<p>This calculation repeats until all edges get to a terminal state (win, lose, or draw), or we get to a predefined evaluation limit (we can build the whole tree because it is computationally too expensive)</p>

<p>Once we have the tree, we compute a score for each leaf (the terminal nodes), and we choose one of them: if the edges getting to the nodes correspond to moves for the adversary (minimizer nodes), we choose the lowest score, if to my moves (maximizer nodes), we choose the highest one. The reason is that we’ll compute the score so that the higher the score, the better for the maximizer player, and we assume the minizer we’ll choose the move we evaluate as the worst for us.</p>

<figure><img src="/assets/images/2027-02-01-the-basics/2026-02-08-22-35-36.png" alt="TODO" /><figcaption>Example image tree. <a href="https://www.researchgate.net/figure/MiniMax-Generated-Tree-3_fig1_360872512">Source</a>.</figcaption></figure>
<p><br /></p>

<p>And how do we compute the score? We also use a heuristic for this. The one Codex chose (the idea seems logical, the detail may be suboptimal, or maybe not) is to add points based on how many pieces in a row you have, with an additional score based on how central your pieces are. This is how Claude explains the evaluation function:</p>
<figure><img src="/assets/images/2027-02-01-the-basics/2026-02-08-22-43-04.png" alt="Claude in Cursor" /><figcaption>Claude in Cursor</figcaption></figure>
<p><br /></p>

<p>The application lets you choose the depth of the tree, four by default. It works, but it is a little slow. I imagine that I should be using all 24 cores for this, so I prompted Codex again:</p>
<blockquote>
  <p>Make the benchmark.py script multithreaded so it can run several games at the same time</p>
</blockquote>

<p>When I tried the first implementation, the result was way worse than the single-thread version. So I tried again:</p>

<blockquote>
  <p>There is some problem with the multithreading. Now the execution takes way longer. Anaylize the code with detail and identify the issues.</p>
</blockquote>

<p>It finally creates a multithreaded version that improves speed.</p>

<p>The Minimax agent beats the heuristic, although not for that much:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
</pre></td><td class="rouge-code"><pre>Select Player 1 <span class="o">(</span>X<span class="o">)</span>:

 1. Random
 2. Heuristic
 3. Minimax

Enter choice: 2

Select Player 2 <span class="o">(</span>O<span class="o">)</span>:

 1. Random
 2. Heuristic
 3. Minimax

Enter choice: 3
Select Player 2 <span class="o">(</span>O<span class="o">)</span> minimax depth <span class="o">[</span>4]:

Number of games <span class="o">[</span>1000]: 10000
Workers <span class="o">[</span>24]:

Running 10000 games on 24 workers: Heuristic vs Minimax<span class="o">(</span><span class="nv">d</span><span class="o">=</span>4<span class="o">)</span>

 4800/10000 games completed
 7400/10000 games completed
 10000/10000 games completed

<span class="o">========================================</span>
Results
<span class="o">========================================</span>

Heuristic <span class="o">(</span>Player 1<span class="o">)</span>:
 Wins:   3904 <span class="o">(</span> 39.0%<span class="o">)</span>

Minimax<span class="o">(</span><span class="nv">d</span><span class="o">=</span>4<span class="o">)</span> <span class="o">(</span>Player 2<span class="o">)</span>:
 Wins:   4818 <span class="o">(</span> 48.2%<span class="o">)</span>

Draws:     1278 <span class="o">(</span> 12.8%<span class="o">)</span>

Total:    10000 games
<span class="o">========================================</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>This version of the code is <a href="https://github.com/javiercancela/connect-4/tree/v2">here</a>.</p>]]></content><author><name></name></author><category term="reinforcement-learning" /><category term="vibecoding" /><category term="connect-4" /><summary type="html"><![CDATA[With the basic engine created, the next step is adding a couple of basic non-RL agents.]]></summary></entry><entry><title type="html">Running a multimodal embedding model for image classification</title><link href="https://javiercancela.com/2026/02/07/image-classification.html" rel="alternate" type="text/html" title="Running a multimodal embedding model for image classification" /><published>2026-02-07T00:00:00+00:00</published><updated>2026-02-07T00:00:00+00:00</updated><id>https://javiercancela.com/2026/02/07/image-classification</id><content type="html" xml:base="https://javiercancela.com/2026/02/07/image-classification.html"><![CDATA[<p>I had the idea of creating an application to classify a big set of images at work. I wasn’t fully convinced that it was going to work because it required processing a couple of hundred thousand images, but it was worth a try.</p>

<h1 id="the-technical-approach">The technical approach</h1>

<p>I’m using an OpenCLIP model specifically trained in the domain I’m interested in. OpenCLIP models are dual-encoder architectures: one image encoder and one text encoder, with both encoders mapping their inputs into a shared embedding space. They are trained with <a href="https://medium.com/data-science/contrastive-loss-explaned-159f2d4a87ec">contrastive loss</a>. Given a batch of image–text pairs, the model produces normalized embeddings:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
</pre></td><td class="rouge-code"><pre>vᵢ = image_encoder(Iᵢ)   ∈ ℝᵈ
tᵢ = text_encoder(Tᵢ)    ∈ ℝᵈ
||vᵢ|| = ||tᵢ|| = 1
</pre></td></tr></tbody></table></code></pre></div></div>
<p>A similarity matrix is built over all image–text combinations in the batch:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre>Sᵢⱼ = vᵢ · tⱼ
</pre></td></tr></tbody></table></code></pre></div></div>
<p>where τ is a learned temperature, cross-entropy loss is applied in both directions (image→text and text→image) to maximize similarity for matched pairs and minimize it against all other in-batch pairs. After training, the model supports zero-shot image–text matching via embedding similarity.</p>

<p>This illustration from the <a href="https://arxiv.org/abs/2103.00020">OpenAI original paper</a> makes a good representation of the process:</p>
<figure><img src="/assets/images/2027-02-01-the-basics copy/2026-02-07-18-32-19.png" alt="TODO" /><figcaption>Original text: <em>Figure 1. Summary of our approach. While standard image models jointly train an image feature extractor and a linear classifier to predict some label, CLIP jointly trains an image encoder and a text encoder to predict the correct pairings of a batch of (image, text) training examples. At test time, the learned text encoder synthesizes a zero-shot linear classifier by embedding the names or descriptions of the target dataset’s classes.</em></figcaption></figure>
<p><br /></p>

<p>The code for the paper is in <a href="https://github.com/OpenAI/CLIP">this repo</a>. There are lots of available models derived from this architecture, some of which can be <a href="https://huggingface.co/models?library=open_clip">found on HuggingFace</a>.</p>

<h1 id="the-application">The application</h1>

<p>All the code for the application was written by either Claude Code or OpenAI Codex, using Gemini to double-check some things. I took the role of a requirements analyst (this was a thing in the past), and just looked at the code in critical sections (database access, credentials management).</p>

<p>For the first step, we scale the image to 224px in the smaller dimension and center-crop it to make the image square, as the model expects. After that, we use a producer-consumer pattern to retrieve images and process them in batches through the model. We store the resulting embeddings in a parquet file (good enough!) with the image id.</p>

<p>For the classification step, we need a list of prompts describing each category in different ways. I used three or four descriptions for each category, and even tried with negative descriptions, even though I don’t think they helped. I also added some neutral descriptions to act as a threshold.</p>

<p>For each set of descriptions, we calculate the normalized mean:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
</pre></td><td class="rouge-code"><pre> <span class="n">embeddings</span> <span class="o">=</span> <span class="nf">encode_texts</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">tokenizer</span><span class="p">,</span> <span class="n">prompts</span><span class="p">,</span> <span class="n">device</span><span class="p">)</span>
 <span class="n">mean_emb</span> <span class="o">=</span> <span class="n">embeddings</span><span class="p">.</span><span class="nf">mean</span><span class="p">(</span><span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="c1"># axis=0 calculates the mean of each feature based on the value of that feature for each of the embedding vectors
</span> <span class="n">mean_emb</span> <span class="o">=</span> <span class="n">mean_emb</span> <span class="o">/</span> <span class="n">np</span><span class="p">.</span><span class="n">linalg</span><span class="p">.</span><span class="nf">norm</span><span class="p">(</span><span class="n">mean_emb</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>The algorithm for the classification calculates the cosine similarity between image embeddings and description embeddings, and decides based on a score. Something like this:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
</pre></td><td class="rouge-code"><pre><span class="k">for</span> <span class="n">img_emb</span> <span class="ow">in</span> <span class="n">image_embeddings</span><span class="p">:</span>
  <span class="n">img_categories</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">float</span><span class="p">]]</span> <span class="o">=</span> <span class="p">[]</span>
  <span class="n">neutral_sim</span> <span class="o">=</span> <span class="nf">float</span><span class="p">(</span><span class="n">img_emb</span> <span class="o">@</span> <span class="n">neutral_embedding</span><span class="p">)</span>

  <span class="k">for</span> <span class="n">title</span><span class="p">,</span> <span class="p">(</span><span class="n">pos_emb</span><span class="p">,</span> <span class="n">neg_emb</span><span class="p">)</span> <span class="ow">in</span> <span class="n">category_embeddings</span><span class="p">.</span><span class="nf">items</span><span class="p">():</span>
    <span class="n">pos_sim</span> <span class="o">=</span> <span class="nf">float</span><span class="p">(</span><span class="n">img_emb</span> <span class="o">@</span> <span class="n">pos_emb</span><span class="p">)</span>
    <span class="n">neg_sim</span> <span class="o">=</span> <span class="nf">float</span><span class="p">(</span><span class="n">img_emb</span> <span class="o">@</span> <span class="n">neg_emb</span><span class="p">)</span>
    <span class="n">score</span> <span class="o">=</span> <span class="n">pos_sim</span> <span class="o">-</span> <span class="n">neg_sim</span>
    <span class="n">threshold</span> <span class="o">=</span> <span class="n">category_thresholds</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">title</span><span class="p">,</span> <span class="n">score_threshold</span><span class="p">)</span>

    <span class="c1"># Match if score exceeds threshold and positive beats neutral by margin.
</span>    <span class="k">if</span> <span class="n">score</span> <span class="o">&gt;=</span> <span class="n">threshold</span> <span class="ow">and</span> <span class="n">pos_sim</span> <span class="o">&gt;=</span> <span class="n">neutral_sim</span> <span class="o">+</span> <span class="n">neutral_margin</span><span class="p">:</span>
      <span class="n">img_categories</span><span class="p">.</span><span class="nf">append</span><span class="p">((</span><span class="n">title</span><span class="p">,</span> <span class="n">score</span><span class="p">))</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>There is a lot to improve in this exercise, but the overall results were positive. I will probably try one or two different approaches to solve this same problem, but at least I know that there is one thing that mostly works.</p>]]></content><author><name></name></author><category term="vertex-ai" /><summary type="html"><![CDATA[I had the idea of creating an application to classify a big set of images at work. I wasn’t fully convinced that it was going to work because it required processing a couple of hundred thousand images, but it was worth a try.]]></summary></entry><entry><title type="html">Reinforcement Learning I - Connect 4</title><link href="https://javiercancela.com/2026/01/25/connect-4.html" rel="alternate" type="text/html" title="Reinforcement Learning I - Connect 4" /><published>2026-01-25T00:00:00+00:00</published><updated>2026-01-25T00:00:00+00:00</updated><id>https://javiercancela.com/2026/01/25/connect-4</id><content type="html" xml:base="https://javiercancela.com/2026/01/25/connect-4.html"><![CDATA[<p>I’m going to vibe code a <a href="https://en.wikipedia.org/wiki/Connect_Four">Connect 4</a> game. Not because I specially like Connect-4, but because Connect-4 looks like a good game to learn <a href="https://spinningup.openai.com/en/latest/spinningup/rl_intro.html">Reinforcement Learning</a>: the world is the board, the agent is the player, and the actions are the plays. The final reward is, of course, winning the game. It isn’t trivial like tic-tac-toe, but it is orders of magnitude smaller than chess or go.</p>

<p>I will try not to touch a single line of code or documentation. Everything will be created by Claude Code or OpenAI Codex (depending on my mood), and committed as is to <a href="https://github.com/javiercancela/connect-4">this public repository</a>. I’m reusing an old repo from a previous non-vibe coding attempt. I removed all the code, but not the previous commits.</p>

<h1 id="vibe-coding">Vibe coding</h1>

<p>After cloning the repo and launching Claude Code, I gave it the following prompt (typos included; most prompts contain typos and other errors because English is not my native language, and I didn’t want to spend time double-checking or using Grammarly, as I did for this text):</p>

<blockquote>
  <p>Write a game engine for a connect-4 game. The engine should contain only the logic for setting moves, checking results, tracking who should play, but no graphical representation and no algorithm to decide the best move. Only an interface to check status, heck current player, and play a move. The engine will be used to train different Reinforcement Learning models in the future.</p>
</blockquote>

<p>The result looks good, but all the code is in just one file, and is full of comments. My preference is having multiple small files with specialized code, and relying on naming for documentation, using comments only for complex logic. So my next prompt is:</p>

<blockquote>
  <p>Refactor the code with the following goals:</p>
  <ul>
    <li>Small files or classes, with very limited scopes: for example, game and board should be different classes. Try to make classes and modules as small in scope as possible,  while still being useful.</li>
    <li>Remove comments unless there is complex logic to explain. With limited scope, method and variable names should be self-explaining.</li>
  </ul>
</blockquote>

<p>The result is in <a href="https://github.com/javiercancela/connect-4/tree/2c820f7611c592acc697afdea989b49ba2f01288">this commit</a>. I think the code is ok. It includes things like this:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
</pre></td><td class="rouge-code"><pre>    <span class="k">def</span> <span class="nf">get_state_flat</span><span class="p">(</span><span class="n">self</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">self</span><span class="p">.</span><span class="n">_board</span><span class="p">.</span><span class="nf">get_grid</span><span class="p">().</span><span class="nf">flatten</span><span class="p">()</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>This code is unnecessary now, but it will be useful to train the models.</p>

<p>Next step is updating the documentation. I’m instructing Claude to use an AGENTS.md instead of its default CLAUDE.md, to make it more model-agnostic (no prompt without its typo):</p>

<blockquote>
  <p>Update the documentation. README.md with a brier summary of the app and instructions to test and run. AGENTS.md with full information for LLM agents.</p>
</blockquote>

<figure><img src="/assets/images/2026-02-02-connect-4/2026-01-25-19-12-36.png" alt="Not bad" /><figcaption>Not bad</figcaption></figure>
<p><br /></p>

<p>Ok, now we need someone to play against. The baseline model will be one player making random moves. Any future model should perform better than random.</p>

<blockquote>
  <p>Add the first algorithm, a model that just plays a random move. The random algorithm will just choose one of the legals move at random.</p>
</blockquote>

<p><a href="https://github.com/javiercancela/connect-4/tree/a3bcf93af19e97d9a2c7eeab013999e088d4ad52">Code looks good</a>, but we need a way to test it. I want to play against any model, so we add code to do so:</p>

<blockquote>
  <p>Now add a new module to play a game. The program will show the available agents to play with, and the player will select one, as well as the option to be player one or two.
 The game will be shown in the console, where the player will press the keys 1 to 7 to select the move.</p>
</blockquote>

<p><a href="https://github.com/javiercancela/connect-4/tree/d0baa2d83f1c4335ad1b96e006f66ab7d7b1910b">It works</a>!, but it is slightly confusing because columns are named 0-6, and you don’t know which pieces are yours. So let’s fix it:</p>

<blockquote>
  <p>Modify the board to show columns as 1-7 instead of 0-6. Also, show a legend with the symbol for the player (X or O)</p>
</blockquote>

<p>Done! Now it looks good:</p>

<figure><img src="/assets/images/2026-02-02-connect-4/2026-01-25-11-51-32.png" alt="Exciting game!" /><figcaption>Exciting game!</figcaption></figure>
<p><br /></p>

<p>The final step is creating a way to test the models by playing multiple games:</p>

<blockquote>
  <p>create a module to test models. The module will run two models against each other, one with white and the other with black, or even the same model against itself. The module will ask for the models to play and the number of games (1000 by default), and will show the final results for wins, loses, and draws both in number and percentage</p>
</blockquote>

<figure><img src="/assets/images/2026-02-02-connect-4/2026-01-25-19-21-32.png" alt="Nicely done!" /><figcaption>Nicely done!</figcaption></figure>
<p><br /></p>

<p>The first player has a slight advantage in the game by playing first, and that shows in the percentages.</p>

<p><a href="https://github.com/javiercancela/connect-4/tree/0a39cb6f7181e67737f6224f7763af464dda0466">This is the final version for today</a>. In total, I spent around half an hour on this code, and all the prompts I used are the ones in this post.</p>

<p>Training the models should be more challenging. We’ll see.</p>]]></content><author><name></name></author><category term="reinforcement-learning" /><category term="vibecoding" /><category term="connect-4" /><summary type="html"><![CDATA[I’m going to vibe code a Connect 4 game. Not because I specially like Connect-4, but because Connect-4 looks like a good game to learn Reinforcement Learning: the world is the board, the agent is the player, and the actions are the plays. The final reward is, of course, winning the game. It isn’t trivial like tic-tac-toe, but it is orders of magnitude smaller than chess or go.]]></summary></entry></feed>