<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Some Creativity</title>
	<atom:link href="https://blog.somecreativity.com/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog.somecreativity.com</link>
	<description>Weblog of Sid Uppal</description>
	<lastBuildDate>Wed, 25 Feb 2026 06:35:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<site xmlns="com-wordpress:feed-additions:1">7388</site><cloud domain='blog.somecreativity.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>https://s0.wp.com/i/buttonw-com.png</url>
		<title>Some Creativity</title>
		<link>https://blog.somecreativity.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="https://blog.somecreativity.com/osd.xml" title="Some Creativity" />
	<atom:link rel='hub' href='https://blog.somecreativity.com/?pushpress=hub'/>
	<item>
		<title>Stop Building State Machines for Your AI Agents (use Durable Functions instead)</title>
		<link>https://blog.somecreativity.com/2026/02/24/stop-building-state-machines-for-your-ai-agents-use-durable-functions-instead/</link>
					<comments>https://blog.somecreativity.com/2026/02/24/stop-building-state-machines-for-your-ai-agents-use-durable-functions-instead/#respond</comments>
		
		<dc:creator><![CDATA[Sid]]></dc:creator>
		<pubDate>Wed, 25 Feb 2026 06:31:15 +0000</pubDate>
				<category><![CDATA[General]]></category>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[artificial-intelligence]]></category>
		<category><![CDATA[chatgpt]]></category>
		<category><![CDATA[llm]]></category>
		<category><![CDATA[technology]]></category>
		<guid isPermaLink="false">http://blog.somecreativity.com/?p=2085</guid>

					<description><![CDATA[I built a&#160;sample&#160;that I think captures something important:&#160;AI agents that interact with the real world need workflows that pause, and Durable Functions make this much easier than current alternatives. The Problem Say you&#8217;re building a support agent. A customer asks for a refund. The agent can look up the order, check the return policy, and [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">I built a&nbsp;<a href="https://github.com/SidU/durable-support-agent">sample</a>&nbsp;that I think captures something important:&nbsp;AI agents that interact with the real world need workflows that pause, and Durable Functions make this much easier than current alternatives.</p>



<h4 class="wp-block-heading">The Problem</h4>



<p class="wp-block-paragraph">Say you&#8217;re building a support agent. A customer asks for a refund. The agent can look up the order, check the return policy, and decide a refund is warranted — but it can&#8217;t just&nbsp;<em>issue</em>&nbsp;the refund. A human needs to approve it.</p>



<p class="wp-block-paragraph"><strong>User in Teams:</strong></p>



<figure class="wp-block-image size-large"><a href="https://blog.somecreativity.com/wp-content/uploads/2026/02/image-2.png"><img width="991" height="351" data-attachment-id="2089" data-permalink="https://blog.somecreativity.com/2026/02/24/stop-building-state-machines-for-your-ai-agents-use-durable-functions-instead/image-80/" data-orig-file="https://blog.somecreativity.com/wp-content/uploads/2026/02/image-2.png" data-orig-size="991,351" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image" data-image-description="" data-image-caption="" data-medium-file="https://blog.somecreativity.com/wp-content/uploads/2026/02/image-2.png?w=300" data-large-file="https://blog.somecreativity.com/wp-content/uploads/2026/02/image-2.png?w=991" src="https://blog.somecreativity.com/wp-content/uploads/2026/02/image-2.png?w=991" alt="" class="wp-image-2089" srcset="https://blog.somecreativity.com/wp-content/uploads/2026/02/image-2.png 991w, https://blog.somecreativity.com/wp-content/uploads/2026/02/image-2.png?w=128 128w, https://blog.somecreativity.com/wp-content/uploads/2026/02/image-2.png?w=300 300w, https://blog.somecreativity.com/wp-content/uploads/2026/02/image-2.png?w=768 768w" sizes="(max-width: 991px) 100vw, 991px" /></a></figure>



<p class="wp-block-paragraph"><strong>Supervisor Dashboard:</strong></p>



<figure class="wp-block-image size-large"><a href="https://blog.somecreativity.com/wp-content/uploads/2026/02/image-3.png"><img width="1024" height="571" data-attachment-id="2091" data-permalink="https://blog.somecreativity.com/2026/02/24/stop-building-state-machines-for-your-ai-agents-use-durable-functions-instead/image-81/" data-orig-file="https://blog.somecreativity.com/wp-content/uploads/2026/02/image-3.png" data-orig-size="1289,720" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image" data-image-description="" data-image-caption="" data-medium-file="https://blog.somecreativity.com/wp-content/uploads/2026/02/image-3.png?w=300" data-large-file="https://blog.somecreativity.com/wp-content/uploads/2026/02/image-3.png?w=1024" src="https://blog.somecreativity.com/wp-content/uploads/2026/02/image-3.png?w=1024" alt="" class="wp-image-2091" srcset="https://blog.somecreativity.com/wp-content/uploads/2026/02/image-3.png?w=1024 1024w, https://blog.somecreativity.com/wp-content/uploads/2026/02/image-3.png?w=128 128w, https://blog.somecreativity.com/wp-content/uploads/2026/02/image-3.png?w=300 300w, https://blog.somecreativity.com/wp-content/uploads/2026/02/image-3.png?w=768 768w, https://blog.somecreativity.com/wp-content/uploads/2026/02/image-3.png 1289w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></figure>



<p class="wp-block-paragraph">So now you need to:</p>



<ol class="wp-block-list">
<li>Save the pending request somewhere</li>



<li>Pause the workflow</li>



<li>Wait for a supervisor to approve or reject (could be hours or days)</li>



<li>Resume exactly where you left off</li>



<li>Process the refund and notify the customer</li>
</ol>



<p class="wp-block-paragraph">The typical approach? A state machine. You model every state (<code>pending_approval</code>,&nbsp;<code>approved</code>,&nbsp;<code>processing</code>,&nbsp;<code>completed</code>), every transition, and wire up polling or webhooks to detect when things change. You write a bunch of glue code to serialize context, handle edge cases, and coordinate between services.</p>



<p class="wp-block-paragraph">It works. It&#8217;s also tedious, error-prone, and obscures what&#8217;s actually a simple workflow.</p>
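
<p class="wp-block-paragraph">For contrast, a hand-rolled version of that state machine might look roughly like the sketch below. The state names come from the list above (plus a <code>rejected</code> state); the event names, the <code>transitions</code> table, and the <code>advance</code> helper are hypothetical and not taken from the sample.</p>

```typescript
// Hypothetical hand-rolled state machine for the refund case.
// State names mirror the ones listed above; everything else is illustrative.
type CaseState = 'pending_approval' | 'approved' | 'rejected' | 'processing' | 'completed';
type CaseEvent = 'approve' | 'reject' | 'start_refund' | 'refund_done';

// Every legal transition has to be spelled out by hand.
const transitions: Record<CaseState, Partial<Record<CaseEvent, CaseState>>> = {
  pending_approval: { approve: 'approved', reject: 'rejected' },
  approved: { start_refund: 'processing' },
  processing: { refund_done: 'completed' },
  rejected: {},
  completed: {},
};

// Returns the next state, or throws if the event is not valid in the current state.
function advance(state: CaseState, event: CaseEvent): CaseState {
  const next = transitions[state][event];
  if (next === undefined) {
    throw new Error(`Illegal transition: ${event} while ${state}`);
  }
  return next;
}

// Walk one case through the happy path. A real system would also persist each
// new state and rely on polling or a webhook to trigger every step.
let state: CaseState = 'pending_approval';
state = advance(state, 'approve');
state = advance(state, 'start_refund');
state = advance(state, 'refund_done');
console.log(state); // completed
```

<p class="wp-block-paragraph">Even this toy version says nothing about where the state is stored or what wakes the case up after approval; that is the glue code the rest of this post tries to avoid.</p>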



<h4 class="wp-block-heading">The Durable Functions Approach</h4>



<p class="wp-block-paragraph">Let&#8217;s start with the diagram. A customer asks the bot for a refund. The bot uses AI to look up the order, creates a case, and starts a Durable Functions orchestration that pauses until a supervisor approves or rejects it. Once approved, the orchestrator processes the refund and notifies the customer, all without polling or a state machine.&nbsp;</p>



<figure class="wp-block-image"><img src="https://blog.somecreativity.com/wp-content/uploads/2026/02/c2vxdwvuy2veawfncmftciagicbwyxj0awnpcgfudcbvc2vyigfzifrlyw1zifvzzxikicagihbhcnrpy2lwyw50iejvdcbhcybuzwftcybcb3qkicagihbhcnrpy2lwyw50ie9wzw5bssbhcybpcgvuqukgr1bultrvciagicbwyxj0awnpcgfudc.jpg" alt="Sequence diagram showing the full refund workflow, from customer message through AI tool calling, Durable Functions orchestration, supervisor approval, and proactive notification" class="wp-image-2096" /></figure>



<p class="wp-block-paragraph">Here&#8217;s the entire approval workflow in my sample:</p>


<div class="wp-block-code">
	<div class="cm-editor">
		<div class="cm-scroller">
			
<pre><code class="language-typescript"><div class="cm-line">import { OrchestrationHandler } from &apos;durable-functions&apos;;</div><div class="cm-line"></div><div class="cm-line">export const supportCaseOrchestrator: OrchestrationHandler = function* (context) {</div><div class="cm-line">  const { caseId, action } = context.df.getInput();</div><div class="cm-line"></div><div class="cm-line">  // Mark as pending</div><div class="cm-line">  yield context.df.callActivity(&apos;updateCase&apos;, { caseId, status: &apos;pending_approval&apos; });</div><div class="cm-line"></div><div class="cm-line">  // Durable timers take an absolute deadline; derive it from the replay-safe clock</div><div class="cm-line">  const sevenDaysFromNow = new Date(context.df.currentUtcDateTime.getTime() + 7 * 24 * 60 * 60 * 1000);</div><div class="cm-line"></div><div class="cm-line">  // Wait for a human: costs nothing while paused</div><div class="cm-line">  const approvalTask = context.df.waitForExternalEvent(&apos;Approval&apos;);</div><div class="cm-line">  const timeoutTask = context.df.createTimer(sevenDaysFromNow);</div><div class="cm-line">  const winner = yield context.df.Task.any([approvalTask, timeoutTask]);</div><div class="cm-line"></div><div class="cm-line">  // If the approval arrived first, cancel the timer so the instance can complete</div><div class="cm-line">  if (winner === approvalTask) {</div><div class="cm-line">    timeoutTask.cancel();</div><div class="cm-line">  }</div><div class="cm-line"></div><div class="cm-line">  if (winner === approvalTask &amp;&amp; approvalTask.result.approved) {</div><div class="cm-line">    yield context.df.callActivity(&apos;updateCase&apos;, { caseId, status: &apos;approved&apos; });</div><div class="cm-line">    if (action === &apos;refund&apos;) {</div><div class="cm-line">      yield context.df.callActivity(&apos;issueRefund&apos;, { caseId });</div><div class="cm-line">    }</div><div class="cm-line">    yield context.df.callActivity(&apos;notifyBot&apos;, { caseId, message: &apos;Approved!&apos; });</div><div class="cm-line">  } else {</div><div class="cm-line">    // Rejected by the supervisor, or no decision within seven days</div><div class="cm-line">    yield context.df.callActivity(&apos;updateCase&apos;, { caseId, status: &apos;rejected&apos; });</div><div class="cm-line">    yield context.df.callActivity(&apos;notifyBot&apos;, { caseId, message: &apos;Rejected.&apos; });</div><div class="cm-line">  }</div><div class="cm-line">};</div></code></pre>
		</div>
	</div>
</div>


<p class="wp-block-paragraph">That&#8217;s it. Read it top to bottom: it&#8217;s just the workflow. No state machine. No polling. No webhook plumbing. The orchestrator pauses at&nbsp;<code>waitForExternalEvent</code>, serializes its state, and stops executing entirely. </p>



<p class="wp-block-paragraph">When a supervisor clicks &#8220;Approve&#8221; in the dashboard,&nbsp;the dashboard calls an HTTP endpoint on the Functions app, which raises the event on the waiting instance, using the case ID as the instance ID:</p>


<div class="wp-block-code">
	<div class="cm-editor">
		<div class="cm-scroller">
			
<pre><code><div class="cm-line">raiseEvent(caseId, &apos;Approval&apos;, { approved: true })</div></code></pre>
		</div>
	</div>
</div>


<p class="wp-block-paragraph">The framework routes the event to the paused orchestration instance, replays the orchestrator against its recorded history, and continues from the exact yield where it was waiting. The orchestrator then runs the remaining steps (update the case, process the refund, notify the customer) as if no time had passed.</p>
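
<p class="wp-block-paragraph">As a rough sketch of the two calls involved: the bot starts one orchestration per case with the case ID as its instance ID, and the approve endpoint later raises the <code>Approval</code> event on that same ID. <code>OrchestrationClientLike</code> is a hypothetical minimal interface modelled on the <code>startNew</code> and <code>raiseEvent</code> methods of the Durable Functions client (check the exact signatures for your SDK version); <code>startCase</code>, <code>approveCase</code>, and the in-memory fake are assumptions for illustration, not code from the sample.</p>

```typescript
// Hypothetical minimal client shape, modelled on the Durable Functions client.
// Verify the exact signatures against the SDK version you use.
interface OrchestrationClientLike {
  startNew(name: string, options: { instanceId: string; input: unknown }): Promise<string>;
  raiseEvent(instanceId: string, eventName: string, eventData: unknown): Promise<void>;
}

// Bot side: start one orchestration per case, reusing the case ID as instance ID.
async function startCase(client: OrchestrationClientLike, caseId: string, action: string): Promise<string> {
  return client.startNew('supportCaseOrchestrator', { instanceId: caseId, input: { caseId, action } });
}

// Dashboard side: deliver the supervisor's decision to the waiting instance.
async function approveCase(client: OrchestrationClientLike, caseId: string, approved: boolean): Promise<void> {
  await client.raiseEvent(caseId, 'Approval', { approved });
}

// In-memory fake that records calls, so the sketch runs without Azure.
const calls: string[] = [];
const fakeClient: OrchestrationClientLike = {
  async startNew(name, options) {
    calls.push(`start ${name} ${options.instanceId}`);
    return options.instanceId;
  },
  async raiseEvent(instanceId, eventName, eventData) {
    calls.push(`raise ${eventName} on ${instanceId} ${JSON.stringify(eventData)}`);
  },
};

// Runs both sides against the fake and returns the recorded call log.
async function demo(): Promise<string[]> {
  await startCase(fakeClient, 'case-abc123', 'refund');
  await approveCase(fakeClient, 'case-abc123', true);
  return calls;
}

demo().then((log) => console.log(log.join('\n')));
```

<p class="wp-block-paragraph">Reusing the case ID as the instance ID means the dashboard needs no extra lookup table: whatever identifies the case also addresses the paused workflow.</p>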



<p class="wp-block-paragraph"><strong>Key:</strong>&nbsp;<code>waitForExternalEvent</code>&nbsp;costs nothing while waiting. No process stays resident and nothing polls; on the Consumption plan, no compute is billed while the orchestrator is awaiting. Each customer&#8217;s case gets its own orchestration instance, waiting independently.</p>



<h4 class="wp-block-heading">Why This Matters for AI Agents</h4>



<p class="wp-block-paragraph">As we build agents that do more than answer questions (agents that take actions, trigger workflows, and interact with external systems), we&#8217;re going to hit this pattern constantly:</p>



<ul class="wp-block-list">
<li><strong>Refund approvals</strong>: agent submits, human approves</li>



<li><strong>Deployment requests</strong>: agent prepares a change, human confirms</li>



<li><strong>Escalations</strong>: agent triages, human takes over</li>



<li><strong>Multi-step processes</strong>: agent starts, waits for external data, continues</li>
</ul>



<p class="wp-block-paragraph">Every one of these is a &#8220;pause and wait&#8221; problem. You&nbsp;<em>could</em>&nbsp;solve each one with a state machine, a database, and some glue code. Or you could write the workflow as a straight-line function and let the infrastructure handle the rest.</p>



<h4 class="wp-block-heading">What About the Alternatives?</h4>



<figure class="wp-block-table"><table class="has-fixed-layout"><thead><tr><th class="has-text-align-left" data-align="left">Approach</th><th class="has-text-align-left" data-align="left">How it works</th><th class="has-text-align-left" data-align="left">Trade-offs</th></tr></thead><tbody><tr><td><strong>Polling loop</strong></td><td>Bot checks a &#8220;pending&#8221; flag in a database every N seconds</td><td>Wastes compute. 1,000 pending cases = 1,000 polling loops. Latency depends on poll interval.</td></tr><tr><td><strong>Queue + worker</strong></td><td>Bot writes to a queue; worker picks up after approval</td><td>You build the state machine yourself: track which step each case is on, handle retries, deal with poison messages. &#8220;Wait for approval&#8221; doesn&#8217;t map naturally to a queue.</td></tr><tr><td><strong>Webhook callback</strong></td><td>Bot registers a callback URL; approval service calls it</td><td>Bot must be running when the callback arrives hours later. If it restarts, the callback URL may be stale. No built-in retry or state tracking.</td></tr><tr><td><strong>Database + cron</strong></td><td>Store pending cases in DB, cron job checks for approved ones</td><td>Same polling problem. Cron frequency = latency floor. State machine lives in application code. Error handling is manual.</td></tr><tr><td><strong>Durable Functions</strong></td><td><code>waitForExternalEvent</code>&nbsp;pauses at zero cost;&nbsp;<code>raiseEvent</code>&nbsp;resumes the instance on demand</td><td>Requires the Azure Functions runtime. In return: no polling, no state machine code, built-in retry, scales to thousands of concurrent cases.</td></tr></tbody></table></figure>
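
<p class="wp-block-paragraph">To put rough numbers on the polling row: the cost grows with both the number of pending cases and the poll frequency. The helper below is a back-of-the-envelope sketch; the 1,000 cases and 5-second interval are example inputs, <code>pollingCost</code> is a hypothetical name, and the delay figure assumes approvals arrive at a uniformly random point within a poll interval.</p>

```typescript
// Back-of-the-envelope cost of running one polling loop per pending case.
interface PollingCost {
  checksPerDay: number;        // total status checks across all cases
  averageDelaySeconds: number; // mean wait between approval and detection
}

function pollingCost(pendingCases: number, intervalSeconds: number): PollingCost {
  const secondsPerDay = 24 * 60 * 60;
  return {
    checksPerDay: pendingCases * (secondsPerDay / intervalSeconds),
    // An approval lands at a uniformly random point within an interval,
    // so on average it is noticed half an interval later.
    averageDelaySeconds: intervalSeconds / 2,
  };
}

const cost = pollingCost(1000, 5);
console.log(cost.checksPerDay);        // 17280000
console.log(cost.averageDelaySeconds); // 2.5
```

<p class="wp-block-paragraph">Lengthening the interval cuts the number of checks but raises the delay by the same factor, which is the trade-off an event-driven resume sidesteps.</p>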



<p class="wp-block-paragraph">Durable Functions win here because:</p>



<ul class="wp-block-list">
<li><strong>Zero-cost waiting</strong>: a case pending for 3 days uses no compute until approved</li>



<li><strong>No state machine</strong>: the orchestrator reads like a sequential function, but the framework handles checkpointing, replay, and fault tolerance</li>



<li><strong>Parallel independence</strong>: Alice&#8217;s refund and Bob&#8217;s escalation are separate instances; approving one doesn&#8217;t affect the other</li>
</ul>
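
<p class="wp-block-paragraph">The &#8220;checkpointing and replay&#8221; part can be pictured with a toy model. This is not the real Durable Functions runtime, only a sketch of the idea under simplifying assumptions: the workflow is a generator, the recorded results of completed steps live in a <code>history</code> array, and <code>replay</code> (a hypothetical helper) re-runs the generator from the top, feeding those results back in until it reaches a step with no recorded result.</p>

```typescript
// Toy model of replay, not the real Durable Functions runtime.
// Each yielded string names a step; `history` holds results already recorded.
type Step = string;

function* refundWorkflow(): Generator<Step, string, unknown> {
  yield 'updateCase:pending_approval';
  const approval = (yield 'waitForExternalEvent:Approval') as { approved: boolean };
  if (approval.approved) {
    yield 'issueRefund';
    return 'approved';
  }
  return 'rejected';
}

// Re-runs the workflow from the top, feeding recorded results back in.
// Stops at the first step with no recorded result: that is where it "waits".
function replay(history: unknown[]): { done: boolean; waitingOn?: Step; result?: string } {
  const run = refundWorkflow();
  let next = run.next();
  for (const recorded of history) {
    if (next.done) break;
    next = run.next(recorded);
  }
  if (next.done) {
    return { done: true, result: next.value };
  }
  return { done: false, waitingOn: next.value };
}

// First step recorded, no approval yet: the workflow is parked on the event.
console.log(replay([undefined]));
// Approval and refund recorded: the same code now runs to completion.
console.log(replay([undefined, { approved: true }, undefined]));
```

<p class="wp-block-paragraph">Nothing is kept in memory between the two calls; the history alone is enough to get back to the same point, which is also why orchestrator code has to be deterministic.</p>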



<h4 class="wp-block-heading">The Full Sample</h4>



<p class="wp-block-paragraph">The&nbsp;<a href="https://github.com/SidU/durable-support-agent">durable-support-agent</a>&nbsp;sample has three pieces:</p>



<ol class="wp-block-list">
<li><strong>A Teams bot</strong>&nbsp;that uses GPT-4o with tool calling to handle customer support — order lookups, knowledge base search, refund requests, escalations</li>



<li><strong>Azure Durable Functions</strong>&nbsp;that orchestrate the approval workflow with zero-cost pausing</li>



<li><strong>A Next.js dashboard</strong>&nbsp;where supervisors approve or reject pending cases</li>
</ol>



<p class="wp-block-paragraph">The whole thing runs locally. The bot creates cases, the orchestrator pauses, the dashboard lets you approve, and the customer gets notified, all coordinated through a workflow you can read in 30 lines.</p>



<p class="wp-block-paragraph">If you&#8217;re building agents that need human-in-the-loop workflows, give Durable Functions a look. </p>



<h4 class="wp-block-heading">Learn More</h4>



<ul class="wp-block-list">
<li><a href="https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-overview">Azure Durable Functions overview</a>&nbsp;— what they are and how they work</li>



<li><a href="https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-overview?tabs=in-process%2Cnodejs-v3%2Cv1-model&amp;pivots=csharp#human">Human interaction pattern</a>&nbsp;— the exact pattern used in this sample (<code>waitForExternalEvent</code>&nbsp;+&nbsp;<code>raiseEvent</code>)</li>



<li><a href="https://learn.microsoft.com/en-us/azure/azure-functions/durable/quickstart-js-vscode">Durable Functions for JavaScript/TypeScript</a>&nbsp;— quickstart for the Node.js SDK</li>



<li><a href="https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-code-constraints">Orchestrator function constraints</a>&nbsp;— rules for deterministic replay (important to understand before writing orchestrators)</li>



<li><a href="https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-timers">Timers in Durable Functions</a>&nbsp;— how&nbsp;<code>createTimer</code>&nbsp;works for timeouts and deadlines</li>



<li><a href="https://github.com/SidU/durable-support-agent">durable-support-agent sample</a>&nbsp;— the full source code for this post</li>
</ul>



]]></content:encoded>
					
					<wfw:commentRss>https://blog.somecreativity.com/2026/02/24/stop-building-state-machines-for-your-ai-agents-use-durable-functions-instead/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">2085</post-id>
		<media:thumbnail url="https://blog.somecreativity.com/wp-content/uploads/2026/02/c2vxdwvuy2veawfncmftciagicbwyxj0awnpcgfudcbvc2vyigfzifrlyw1zifvzzxikicagihbhcnrpy2lwyw50iejvdcbhcybuzwftcybcb3qkicagihbhcnrpy2lwyw50ie9wzw5bssbhcybpcgvuqukgr1bultrvciagicbwyxj0awnpcgfudc.jpg" />
		<media:content url="https://blog.somecreativity.com/wp-content/uploads/2026/02/c2vxdwvuy2veawfncmftciagicbwyxj0awnpcgfudcbvc2vyigfzifrlyw1zifvzzxikicagihbhcnrpy2lwyw50iejvdcbhcybuzwftcybcb3qkicagihbhcnrpy2lwyw50ie9wzw5bssbhcybpcgvuqukgr1bultrvciagicbwyxj0awnpcgfudc.jpg" medium="image">
			<media:title type="html">Sequence diagram showing the full refund workflow</media:title>
		</media:content>

		<media:content url="https://0.gravatar.com/avatar/0eda0091e089b109dc142c655f5833609a7b82daf5ef673a798b5d1b014e3c27?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">Sid</media:title>
		</media:content>

		<media:content url="https://blog.somecreativity.com/wp-content/uploads/2026/02/image-2.png?w=991" medium="image" />

		<media:content url="https://blog.somecreativity.com/wp-content/uploads/2026/02/image-3.png?w=1024" medium="image" />
	</item>
		<item>
		<title>Giving OpenClaw Its Own Identity, And a Sandbox to Run In</title>
		<link>https://blog.somecreativity.com/2026/02/16/giving-openclaw-its-own-identity-and-a-sandbox-to-run-in/</link>
					<comments>https://blog.somecreativity.com/2026/02/16/giving-openclaw-its-own-identity-and-a-sandbox-to-run-in/#respond</comments>
		
		<dc:creator><![CDATA[Sid]]></dc:creator>
		<pubDate>Mon, 16 Feb 2026 08:35:40 +0000</pubDate>
				<category><![CDATA[General]]></category>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[artificial-intelligence]]></category>
		<category><![CDATA[chatgpt]]></category>
		<category><![CDATA[llm]]></category>
		<category><![CDATA[openclaw]]></category>
		<category><![CDATA[technology]]></category>
		<guid isPermaLink="false">http://blog.somecreativity.com/?p=2056</guid>

					<description><![CDATA[OpenClaw is an open-source AI agent framework that gives LLMs real tools and autonomy. It already has a built-in Teams channel, but it works as a traditional bot using delegated auth, meaning the agent acts with your permissions. OpenClaw A365 takes a different approach. Instead of a bot wearing your credentials, it gives the agent [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph"><a href="https://openclaw.ai">OpenClaw</a> is an open-source AI agent framework that gives LLMs real tools and autonomy. It already has a built-in Teams channel, but it works as a traditional bot using delegated auth, meaning the agent acts with your permissions.</p>



<p class="wp-block-paragraph"><a href="https://github.com/SidU/openclaw-a365"><strong>OpenClaw A365</strong></a> takes a different approach. Instead of a bot wearing your credentials, it gives the agent its own identity in your Microsoft 365 tenant, sandboxes its runtime, and makes every action observable to IT, all while extending its reach beyond Teams to Outlook, Word, Excel, and PowerPoint.</p>



<p class="wp-block-paragraph">Two things I kept coming back to while investigating OpenClaw:</p>



<h4 class="wp-block-heading"><strong>1. Agents need their own identity, not yours.</strong></h4>



<p class="wp-block-paragraph">Traditional bot frameworks use delegated auth: the agent acts <em>as you</em>, with access to everything you can see. That&#8217;s terrifying when agents can reason and act autonomously, especially as they get more capable.<br /><br />With A365&#8217;s <a href="https://learn.microsoft.com/en-us/microsoft-agent-365/developer/identity">agentic-identity model</a>, the agent gets its own Entra ID account (e.g. agent@contoso.com). You share a resource, such as a calendar, with it as you would with a colleague. It only sees what you&#8217;ve explicitly granted.</p>



<figure class="wp-block-image size-large"><a href="https://blog.somecreativity.com/wp-content/uploads/2026/02/screenshot-2026-02-15-at-11.57.49-pm.png"><img width="1024" height="751" data-attachment-id="2060" data-permalink="https://blog.somecreativity.com/2026/02/16/giving-openclaw-its-own-identity-and-a-sandbox-to-run-in/screenshot-2026-02-15-at-11-57-49-pm/" data-orig-file="https://blog.somecreativity.com/wp-content/uploads/2026/02/screenshot-2026-02-15-at-11.57.49-pm.png" data-orig-size="1880,1380" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="Screenshot 2026-02-15 at 11.57.49 PM" data-image-description="" data-image-caption="" data-medium-file="https://blog.somecreativity.com/wp-content/uploads/2026/02/screenshot-2026-02-15-at-11.57.49-pm.png?w=300" data-large-file="https://blog.somecreativity.com/wp-content/uploads/2026/02/screenshot-2026-02-15-at-11.57.49-pm.png?w=1024" src="https://blog.somecreativity.com/wp-content/uploads/2026/02/screenshot-2026-02-15-at-11.57.49-pm.png?w=1024" alt="" class="wp-image-2060" srcset="https://blog.somecreativity.com/wp-content/uploads/2026/02/screenshot-2026-02-15-at-11.57.49-pm.png?w=1024 1024w, https://blog.somecreativity.com/wp-content/uploads/2026/02/screenshot-2026-02-15-at-11.57.49-pm.png?w=128 128w, https://blog.somecreativity.com/wp-content/uploads/2026/02/screenshot-2026-02-15-at-11.57.49-pm.png?w=300 300w, https://blog.somecreativity.com/wp-content/uploads/2026/02/screenshot-2026-02-15-at-11.57.49-pm.png?w=768 768w, 
https://blog.somecreativity.com/wp-content/uploads/2026/02/screenshot-2026-02-15-at-11.57.49-pm.png?w=1440 1440w, https://blog.somecreativity.com/wp-content/uploads/2026/02/screenshot-2026-02-15-at-11.57.49-pm.png 1880w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></figure>



<p class="wp-block-paragraph">(See <a href="https://youtu.be/7uD2vyfBUUs">demo video</a>)</p>



<p class="wp-block-paragraph"><a href="https://purview.microsoft.com">Audit logs</a> and the <a href="https://learn.microsoft.com/en-us/microsoft-agent-365/developer/observability?tabs=python">Observability</a> stack show that the agent acted, rather than you acting through some app. This is how trust should work.</p>



<h4 class="wp-block-heading"><strong>2. If an agent can run code, you need to control what it can reach.</strong></h4>



<p class="wp-block-paragraph">OpenClaw agents can generate and execute code, including network requests. OpenClaw A365 enforces network policy at the container level via iptables. You choose: unrestricted, locked down to Microsoft + your LLM provider, or a custom allowlist. The agent cannot call a domain you haven&#8217;t approved.</p>
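<p class="wp-block-paragraph">To make the allowlist idea concrete, here is a minimal sketch of how an egress policy could be turned into iptables rules. It is not taken from the OpenClaw A365 code: the policy names, the <code>build_egress_rules</code> function, and the example addresses are all assumptions for illustration. Note that iptables matches IP addresses rather than domain names, so a real setup would also need to resolve approved domains to addresses and allow traffic to its DNS resolver.</p>

```python
from ipaddress import ip_network

# Hypothetical policy names; the post describes the three modes only in prose.
POLICIES = ("unrestricted", "restricted", "allowlist")


def build_egress_rules(policy, allowed_cidrs=()):
    """Return iptables commands (as strings) for an outbound network policy.

    This only builds the command strings; it does not run them.
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    if policy == "unrestricted":
        return ["iptables -P OUTPUT ACCEPT"]

    rules = [
        # Keep loopback traffic and replies to already-open connections working.
        "iptables -A OUTPUT -o lo -j ACCEPT",
        "iptables -A OUTPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT",
    ]
    for cidr in allowed_cidrs:
        network = ip_network(cidr)  # raises ValueError on malformed input
        rules.append(f"iptables -A OUTPUT -d {network} -j ACCEPT")
    # Anything that matched no ACCEPT rule falls through to the default policy.
    rules.append("iptables -P OUTPUT DROP")
    return rules


if __name__ == "__main__":
    # Documentation-range addresses stand in for approved endpoints.
    for rule in build_egress_rules("allowlist", ["203.0.113.10", "198.51.100.0/24"]):
        print(rule)
```

<p class="wp-block-paragraph">In this sketch, "restricted" and "allowlist" differ only in where the address list comes from: a preset for Microsoft and your LLM provider, or a list you supply yourself.</p>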



<p class="wp-block-paragraph">Combining a real identity with least-privilege access and a sandboxed runtime gets us closer to highly autonomous agents that are still observable, governable, and safe to deploy in the enterprise.</p>



<h4 class="wp-block-heading"><strong>Why this matters</strong></h4>



<p class="wp-block-paragraph">Agent 365 <a href="https://www.microsoft.com/en-us/microsoft-365/blog/2025/11/18/microsoft-agent-365-the-control-plane-for-ai-agents/">was released</a> in preview to Frontier customers last November at Ignite. It was a super-intense push for me, my team, and many others across the company. Back then, we didn&#8217;t know that an agent framework like OpenClaw would arrive and make it obvious to everyone why agents need their own identities, sandboxed runtimes, and observability. </p>



<figure class="wp-block-image size-large"><a href="https://blog.somecreativity.com/wp-content/uploads/2026/02/star-history-2026216.png"><img loading="lazy" width="1024" height="749" data-attachment-id="2069" data-permalink="https://blog.somecreativity.com/2026/02/16/giving-openclaw-its-own-identity-and-a-sandbox-to-run-in/star-history-2026216/" data-orig-file="https://blog.somecreativity.com/wp-content/uploads/2026/02/star-history-2026216.png" data-orig-size="3152,2306" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="star-history-2026216" data-image-description="" data-image-caption="" data-medium-file="https://blog.somecreativity.com/wp-content/uploads/2026/02/star-history-2026216.png?w=300" data-large-file="https://blog.somecreativity.com/wp-content/uploads/2026/02/star-history-2026216.png?w=1024" src="https://blog.somecreativity.com/wp-content/uploads/2026/02/star-history-2026216.png?w=1024" alt="" class="wp-image-2069" srcset="https://blog.somecreativity.com/wp-content/uploads/2026/02/star-history-2026216.png?w=1024 1024w, https://blog.somecreativity.com/wp-content/uploads/2026/02/star-history-2026216.png?w=2048 2048w, https://blog.somecreativity.com/wp-content/uploads/2026/02/star-history-2026216.png?w=128 128w, https://blog.somecreativity.com/wp-content/uploads/2026/02/star-history-2026216.png?w=300 300w, https://blog.somecreativity.com/wp-content/uploads/2026/02/star-history-2026216.png?w=768 768w, https://blog.somecreativity.com/wp-content/uploads/2026/02/star-history-2026216.png?w=1440 1440w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></figure>



<p class="wp-block-paragraph">The fact that the platform was already there waiting speaks to the foresight Microsoft had. I hope to see Google and other identity providers follow suit.</p>



<h4 class="wp-block-heading"><strong>Links</strong></h4>



<p class="wp-block-paragraph"><strong>Demo video:</strong> <a href="https://youtu.be/7uD2vyfBUUs">https://youtu.be/7uD2vyfBUUs</a></p>



<p class="wp-block-paragraph"><strong>GitHub</strong>: <a href="https://github.com/SidU/openclaw-a365">https://github.com/SidU/openclaw-a365</a></p>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.somecreativity.com/2026/02/16/giving-openclaw-its-own-identity-and-a-sandbox-to-run-in/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">2056</post-id>
		<media:thumbnail url="https://blog.somecreativity.com/wp-content/uploads/2026/02/chatgpt-image-feb-16-2026-12_33_46-am.png" />
		<media:content url="https://blog.somecreativity.com/wp-content/uploads/2026/02/chatgpt-image-feb-16-2026-12_33_46-am.png" medium="image">
			<media:title type="html">ChatGPT Image Feb 16, 2026, 12_33_46 AM</media:title>
		</media:content>

		<media:content url="https://0.gravatar.com/avatar/0eda0091e089b109dc142c655f5833609a7b82daf5ef673a798b5d1b014e3c27?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">Sid</media:title>
		</media:content>

		<media:content url="https://blog.somecreativity.com/wp-content/uploads/2026/02/screenshot-2026-02-15-at-11.57.49-pm.png?w=1024" medium="image" />

		<media:content url="https://blog.somecreativity.com/wp-content/uploads/2026/02/star-history-2026216.png?w=1024" medium="image" />
	</item>
		<item>
		<title>GoodDocs</title>
		<link>https://blog.somecreativity.com/2025/12/27/gooddocs/</link>
					<comments>https://blog.somecreativity.com/2025/12/27/gooddocs/#respond</comments>
		
		<dc:creator><![CDATA[Sid]]></dc:creator>
		<pubDate>Sat, 27 Dec 2025 09:54:15 +0000</pubDate>
				<category><![CDATA[General]]></category>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[artificial-intelligence]]></category>
		<category><![CDATA[chatgpt]]></category>
		<category><![CDATA[llm]]></category>
		<category><![CDATA[technology]]></category>
		<guid isPermaLink="false">http://blog.somecreativity.com/?p=2020</guid>

					<description><![CDATA[Many of the docs we write exist to help teams make better decisions by writing down the thinking and reviewing it with others. With AI, it is easier than ever to generate a doc from a few words of a prompt, but when a draft looks &#8220;done&#8221; too quickly, important context and key aspects can [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">Many of the docs we write exist to help teams make better decisions by writing down the thinking and reviewing it with others. With AI, it is easier than ever to generate a doc from a few words of a prompt, but when a draft looks &#8220;done&#8221; too quickly, important context and key aspects can get skipped. That is why teams often adopt doc templates: they force the right questions to show up every time.</p>



<p class="wp-block-paragraph">There are real benefits to standardized doc formats when you work with many people. A consistent template reminds you of the things you missed before and trains the team to avoid repeating past mistakes. It keeps everyone aligned on what needs to be answered, makes reviews dramatically faster, and helps new teammates find what they need without decoding each author&#8217;s personal style.</p>



<p class="wp-block-paragraph">The downside is toil: filling in every section takes time, which is exactly when people reach for AI and generate a draft from a few keywords. That is useful, but it can also skip critical thinking. The challenge is letting people use AI for speed while still ensuring the important parts are covered. That is where GoodDocs comes in.</p>



<p class="wp-block-paragraph"><a href="https://github.com/SidU/GoodDocs">GoodDocs</a> solves that by making documentation easy to write and easy to trust, even when AI helps produce the first draft. It encourages using AI as a thought-partner and research-partner, with an additional review layer that checks for missing reasoning, while still reducing toil so doc creators can focus on shipping real improvements for customers and business impact.</p>



<p class="wp-block-paragraph">We already have all the pieces: GitHub for storage and version control, GitHub Actions on PRs to run validation automatically, Codex/Claude-Code/GitHub Copilot CLI as the orchestration and review layer, and VS Code/Cursor as the editor. GoodDocs brings those parts together into a single, lightweight system for structured docs. </p>



<h4 class="wp-block-heading"><strong>How to use it</strong></h4>



<ol class="wp-block-list">
<li><strong>Set up</strong> by:
<ul class="wp-block-list">
<li>Creating a repo using <a href="https://github.com/SidU/GoodDocs">GoodDocs</a> as a template.</li>



<li>Cloning your repo locally.</li>
</ul>
</li>



<li><strong>Initialize</strong> the repo defaults with <code>make init</code>. This is a one-time step.</li>



<li><strong>Run</strong> Codex/Claude-Code/GitHub Copilot CLI in a terminal inside your repo.</li>



<li><strong>Create</strong> a new doc with <code>make new-doc</code>, then draft it using the <code>$doc-author</code> skill.</li>



<li><strong>Edit</strong> your doc using your favorite editor, filling out all the sections.</li>



<li><strong>Share</strong> your doc by opening a PR from your branch. </li>



<li><strong>Validation</strong> runs automatically, and an optional LLM review can run when enabled (ensure <code>OPENAI_API_KEY</code> is set in your repo&#8217;s Actions secrets at <code>https://github.com/your_account/repo/settings/secrets/actions</code>).</li>
</ol>



<h4 class="wp-block-heading"><strong>Example</strong></h4>



<p class="wp-block-paragraph">This repository includes a complete example document at <a href="https://github.com/SidU/GoodDocs/blob/main/docs/example/0001-example.md">docs/example/0001-example.md</a>. It follows the template, passes validation, and shows the expected level of detail across sections like Motivation, Proposed Solution, and Alternatives &amp; Open Questions.</p>



<p class="wp-block-paragraph">You can view a sample PR where the LLM left template-based review comments <a href="https://github.com/SidU/GoodDocs-Sample4/pull/1/changes#diff-0ac83c8a544fd919161b18e766fbb1f0a0876f72025a42a48425330ad2ba6192">here</a>.</p>



<figure class="wp-block-image size-large"><a href="https://blog.somecreativity.com/wp-content/uploads/2025/12/screenshot-2025-12-27-at-1.43.22-am.png"><img loading="lazy" width="1024" height="651" data-attachment-id="2028" data-permalink="https://blog.somecreativity.com/2025/12/27/gooddocs/screenshot-2025-12-27-at-1-43-22-am/" data-orig-file="https://blog.somecreativity.com/wp-content/uploads/2025/12/screenshot-2025-12-27-at-1.43.22-am.png" data-orig-size="2796,1780" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="Screenshot 2025-12-27 at 1.43.22 AM" data-image-description="" data-image-caption="" data-medium-file="https://blog.somecreativity.com/wp-content/uploads/2025/12/screenshot-2025-12-27-at-1.43.22-am.png?w=300" data-large-file="https://blog.somecreativity.com/wp-content/uploads/2025/12/screenshot-2025-12-27-at-1.43.22-am.png?w=1024" src="https://blog.somecreativity.com/wp-content/uploads/2025/12/screenshot-2025-12-27-at-1.43.22-am.png?w=1024" alt="" class="wp-image-2028" srcset="https://blog.somecreativity.com/wp-content/uploads/2025/12/screenshot-2025-12-27-at-1.43.22-am.png?w=1024 1024w, https://blog.somecreativity.com/wp-content/uploads/2025/12/screenshot-2025-12-27-at-1.43.22-am.png?w=2048 2048w, https://blog.somecreativity.com/wp-content/uploads/2025/12/screenshot-2025-12-27-at-1.43.22-am.png?w=128 128w, https://blog.somecreativity.com/wp-content/uploads/2025/12/screenshot-2025-12-27-at-1.43.22-am.png?w=300 300w, https://blog.somecreativity.com/wp-content/uploads/2025/12/screenshot-2025-12-27-at-1.43.22-am.png?w=768 768w, 
https://blog.somecreativity.com/wp-content/uploads/2025/12/screenshot-2025-12-27-at-1.43.22-am.png?w=1440 1440w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></figure>



<h4 class="wp-block-heading"><strong>How to customize it</strong> </h4>



<p class="wp-block-paragraph">(<em>You will want to do this to get the real value out of GoodDocs.</em>)</p>



<p class="wp-block-paragraph">You can tune GoodDocs to match your org. In the repo you created from the GoodDocs template:</p>



<p class="wp-block-paragraph">Update <code>templates/doc-template.md</code> to change the doc format, and edit <code>schema/doc_rules.json</code> to adjust validation rules, required sections, or quality heuristics. If you need multiple doc types, add new templates and doc type entries so each format has its own rules and folder.</p>
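<p class="wp-block-paragraph">As a rough illustration of what rule-driven validation can look like, here is a small sketch that checks a Markdown doc for required sections and a minimum amount of content per section. The rule keys (<code>required_sections</code>, <code>min_words_per_section</code>) and the <code>validate_doc</code> function are hypothetical; the actual format of <code>schema/doc_rules.json</code> and the validator in the GoodDocs repo may look quite different.</p>

```python
import json
import re

# Hypothetical rules; the real schema/doc_rules.json may use different keys.
RULES_JSON = """
{
  "required_sections": ["Motivation", "Proposed Solution"],
  "min_words_per_section": 5
}
"""


def validate_doc(markdown_text, rules):
    """Return a list of human-readable problems found in a Markdown doc."""
    sections = {}
    current = None
    for line in markdown_text.splitlines():
        heading = re.match(r"^#{1,6}\s+(.*\S)\s*$", line)
        if heading:
            # Start collecting words under this heading.
            current = heading.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].extend(line.split())

    problems = []
    for name in rules["required_sections"]:
        if name not in sections:
            problems.append(f"missing section: {name}")
        elif rules["min_words_per_section"] > len(sections[name]):
            problems.append(f"section too short: {name}")
    return problems


if __name__ == "__main__":
    rules = json.loads(RULES_JSON)
    draft = (
        "# Title\n\n"
        "## Motivation\nReviews take too long because docs differ.\n\n"
        "## Proposed Solution\nTBD\n"
    )
    for problem in validate_doc(draft, rules):
        print(problem)
```

<p class="wp-block-paragraph">A check like this catches the "looks done too quickly" failure mode mechanically, and leaves judgment about the quality of the reasoning to human or LLM review.</p>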



<p class="wp-block-paragraph">Common customization examples and why they help:</p>



<ul class="wp-block-list">
<li><strong>PRDs</strong> to capture customer context, success metrics, and rollout plans in a consistent way.</li>



<li><strong>Dev design docs / RFCs</strong> to force clarity on trade-offs, API contracts, and migration plans before code is written.</li>



<li><strong>Decisions (<a href="https://github.com/joelparkerhenderson/architecture-decision-record">ADR</a>-style)</strong> to keep a durable record of why a choice was made and what alternatives were considered.</li>



<li><strong>Operations / incident playbooks</strong> to standardize escalation, post-mortem learnings, runbooks, and recovery steps.</li>



<li><strong>Compliance or security reviews</strong> to ensure required checks are documented and auditable.</li>
</ul>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.somecreativity.com/2025/12/27/gooddocs/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">2020</post-id>
		<media:thumbnail url="https://blog.somecreativity.com/wp-content/uploads/2025/12/image-to-represent-an-abstract-document-minimal.png" />
		<media:content url="https://blog.somecreativity.com/wp-content/uploads/2025/12/image-to-represent-an-abstract-document-minimal.png" medium="image">
			<media:title type="html">image-to-represent-an-abstract-document-minimal</media:title>
		</media:content>

		<media:content url="https://0.gravatar.com/avatar/0eda0091e089b109dc142c655f5833609a7b82daf5ef673a798b5d1b014e3c27?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">Sid</media:title>
		</media:content>

		<media:content url="https://blog.somecreativity.com/wp-content/uploads/2025/12/screenshot-2025-12-27-at-1.43.22-am.png?w=1024" medium="image" />
	</item>
		<item>
		<title>Controlling AI Agent Participation in Group Conversations (Koala)</title>
		<link>https://blog.somecreativity.com/2025/11/22/controlling-ai-agent-participation-in-group-conversations-koala/</link>
					<comments>https://blog.somecreativity.com/2025/11/22/controlling-ai-agent-participation-in-group-conversations-koala/#respond</comments>
		
		<dc:creator><![CDATA[Sid]]></dc:creator>
		<pubDate>Sat, 22 Nov 2025 16:01:24 +0000</pubDate>
				<category><![CDATA[General]]></category>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[artificial-intelligence]]></category>
		<category><![CDATA[chatgpt]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[technology]]></category>
		<guid isPermaLink="false">http://blog.somecreativity.com/?p=1999</guid>

					<description><![CDATA[Last Friday, we had the opportunity to hear from Justin Weisz, Stephanie Houde, Steven Ross, and the IBM team about their research on controlling AI agent participation in group conversations. They ran a set of studies with a Slack bot called Koala to understand how an agent should behave in live multiparty brainstorming sessions. Read [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">Last Friday, we had the opportunity to hear from <a href="https://www.linkedin.com/in/jweisz3/">Justin Weisz</a>, <a href="https://www.linkedin.com/in/stephanie-houde-82b73a3/">Stephanie Houde</a>, <a href="https://www.linkedin.com/in/steven-ross-08aab81/">Steven Ross</a>, and the IBM team about <a href="https://arxiv.org/abs/2501.17258">their research on controlling AI agent participation in group conversations</a>. They ran a set of studies with a Slack bot called Koala to understand how an agent should behave in live multiparty brainstorming sessions. Read on for what they found. Their results are important for how we think about designing agents in collaborative spaces like Teams.</p>



<h4 class="wp-block-heading"><strong>Koala</strong></h4>



<p class="wp-block-paragraph">They built an LLM-based conversational agent prototype called <em><strong>Koala</strong></em>, deployed as a Slack bot.</p>



<figure class="wp-block-image size-large"><a href="https://blog.somecreativity.com/wp-content/uploads/2025/11/direct-request.png"><img loading="lazy" width="1024" height="560" data-attachment-id="2006" data-permalink="https://blog.somecreativity.com/2025/11/22/controlling-ai-agent-participation-in-group-conversations-koala/direct-request/" data-orig-file="https://blog.somecreativity.com/wp-content/uploads/2025/11/direct-request.png" data-orig-size="1475,808" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="direct request." data-image-description="" data-image-caption="" data-medium-file="https://blog.somecreativity.com/wp-content/uploads/2025/11/direct-request.png?w=300" data-large-file="https://blog.somecreativity.com/wp-content/uploads/2025/11/direct-request.png?w=1024" src="https://blog.somecreativity.com/wp-content/uploads/2025/11/direct-request.png?w=1024" alt="" class="wp-image-2006" srcset="https://blog.somecreativity.com/wp-content/uploads/2025/11/direct-request.png?w=1024 1024w, https://blog.somecreativity.com/wp-content/uploads/2025/11/direct-request.png?w=128 128w, https://blog.somecreativity.com/wp-content/uploads/2025/11/direct-request.png?w=300 300w, https://blog.somecreativity.com/wp-content/uploads/2025/11/direct-request.png?w=768 768w, https://blog.somecreativity.com/wp-content/uploads/2025/11/direct-request.png?w=1440 1440w, https://blog.somecreativity.com/wp-content/uploads/2025/11/direct-request.png 1475w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></figure>



<p class="wp-block-paragraph">They ran two studies with Koala to measure its impact on brainstorming, using the findings from Study 1 to refine and evolve the agent for Study 2.</p>



<h4 class="wp-block-heading"><strong>Study Setup</strong></h4>



<ul class="wp-block-list">
<li>Same groups tested across:
<ol class="wp-block-list">
<li>No AI</li>



<li>Koala Reactive (responds only when addressed via mention)</li>



<li>Koala Proactive (decides when to speak)</li>
</ol>
</li>



<li>Tasks: 3-min brainstorming -&gt; pick top 3 ideas.</li>
</ul>



<figure class="wp-block-image size-large"><a href="https://blog.somecreativity.com/wp-content/uploads/2025/11/fig-3.-study-1-overview.-during-each-session-a-group-of-participants-sequentially-completed-three-rounds-of-trainstorming-and.png"><img loading="lazy" width="1024" height="381" data-attachment-id="2008" data-permalink="https://blog.somecreativity.com/2025/11/22/controlling-ai-agent-participation-in-group-conversations-koala/fig-3-study-1-overview-during-each-session-a-group-of-participants-sequentially-completed-three-rounds-of-trainstorming-and/" data-orig-file="https://blog.somecreativity.com/wp-content/uploads/2025/11/fig-3.-study-1-overview.-during-each-session-a-group-of-participants-sequentially-completed-three-rounds-of-trainstorming-and.png" data-orig-size="1466,546" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="Fig 3. Study 1 overview. 
During each session, a group of participants sequentially completed three rounds of trainstorming and" data-image-description="" data-image-caption="" data-medium-file="https://blog.somecreativity.com/wp-content/uploads/2025/11/fig-3.-study-1-overview.-during-each-session-a-group-of-participants-sequentially-completed-three-rounds-of-trainstorming-and.png?w=300" data-large-file="https://blog.somecreativity.com/wp-content/uploads/2025/11/fig-3.-study-1-overview.-during-each-session-a-group-of-participants-sequentially-completed-three-rounds-of-trainstorming-and.png?w=1024" src="https://blog.somecreativity.com/wp-content/uploads/2025/11/fig-3.-study-1-overview.-during-each-session-a-group-of-participants-sequentially-completed-three-rounds-of-trainstorming-and.png?w=1024" alt="" class="wp-image-2008" srcset="https://blog.somecreativity.com/wp-content/uploads/2025/11/fig-3.-study-1-overview.-during-each-session-a-group-of-participants-sequentially-completed-three-rounds-of-trainstorming-and.png?w=1024 1024w, https://blog.somecreativity.com/wp-content/uploads/2025/11/fig-3.-study-1-overview.-during-each-session-a-group-of-participants-sequentially-completed-three-rounds-of-trainstorming-and.png?w=128 128w, https://blog.somecreativity.com/wp-content/uploads/2025/11/fig-3.-study-1-overview.-during-each-session-a-group-of-participants-sequentially-completed-three-rounds-of-trainstorming-and.png?w=300 300w, https://blog.somecreativity.com/wp-content/uploads/2025/11/fig-3.-study-1-overview.-during-each-session-a-group-of-participants-sequentially-completed-three-rounds-of-trainstorming-and.png?w=768 768w, https://blog.somecreativity.com/wp-content/uploads/2025/11/fig-3.-study-1-overview.-during-each-session-a-group-of-participants-sequentially-completed-three-rounds-of-trainstorming-and.png?w=1440 1440w, 
https://blog.somecreativity.com/wp-content/uploads/2025/11/fig-3.-study-1-overview.-during-each-session-a-group-of-participants-sequentially-completed-three-rounds-of-trainstorming-and.png 1466w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></figure>



<h5 class="wp-block-heading"><strong>High-level Findings</strong></h5>



<ul class="wp-block-list">
<li>Everyone preferred having Koala over no AI
<ul class="wp-block-list">
<li><span style="text-decoration: underline">Shows everyone appreciated having an agent while brainstorming</span></li>
</ul>
</li>



<li>Strong preference for Reactive over Proactive in v1.</li>



<li>Koala contributed 73% of all ideas and 33% of top ideas.<br />(Takeaway: AI boosted volume <em>and</em> quality.)</li>
</ul>



<figure class="wp-block-image size-large"><a href="https://blog.somecreativity.com/wp-content/uploads/2025/11/survey-question-preferred-condition.png"><img loading="lazy" width="662" height="361" data-attachment-id="2004" data-permalink="https://blog.somecreativity.com/2025/11/22/controlling-ai-agent-participation-in-group-conversations-koala/survey-question-preferred-condition/" data-orig-file="https://blog.somecreativity.com/wp-content/uploads/2025/11/survey-question-preferred-condition.png" data-orig-size="662,361" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="Survey Question Preferred Condition" data-image-description="" data-image-caption="" data-medium-file="https://blog.somecreativity.com/wp-content/uploads/2025/11/survey-question-preferred-condition.png?w=300" data-large-file="https://blog.somecreativity.com/wp-content/uploads/2025/11/survey-question-preferred-condition.png?w=662" src="https://blog.somecreativity.com/wp-content/uploads/2025/11/survey-question-preferred-condition.png?w=662" alt="" class="wp-image-2004" srcset="https://blog.somecreativity.com/wp-content/uploads/2025/11/survey-question-preferred-condition.png 662w, https://blog.somecreativity.com/wp-content/uploads/2025/11/survey-question-preferred-condition.png?w=128 128w, https://blog.somecreativity.com/wp-content/uploads/2025/11/survey-question-preferred-condition.png?w=300 300w" sizes="(max-width: 662px) 100vw, 662px" /></a></figure>



<h5 class="wp-block-heading"><strong>Advantages (from Study 1)</strong></h5>



<ul class="wp-block-list">
<li>Removes “white page” problem; helps groups start.</li>



<li>Speeds up brainstorming.</li>



<li>Adds structure; pseudo-moderator.</li>



<li>Summaries keep the group on track.</li>



<li>Validates user ideas.</li>



<li>Fills knowledge gaps.</li>



<li>Visible human-AI collaboration sparks more ideas.</li>
</ul>



<h5 class="wp-block-heading"><strong>Disadvantages (from Study 1)</strong></h5>



<ul class="wp-block-list">
<li>Proactive mode = distracting, intrusive, overwhelming.
<ul class="wp-block-list">
<li>Too long, too frequent, wrong timing.</li>



<li>“Dominated the conversation.”</li>
</ul>
</li>



<li>Stifling effect (“boxed myself in,” production blocking).</li>



<li>Inaccurate / hallucinated summaries.</li>
</ul>



<h5 class="wp-block-heading"><strong>What Participants Wanted</strong></h5>



<ul class="wp-block-list">
<li>Control over when, how often, and how much Koala contributes.</li>



<li>Ability to steer behavior mid-conversation.</li>



<li>Combine reactive + selective proactive behaviors.</li>



<li>Agent should <strong>wait </strong>when humans are actively typing.</li>



<li>Option to ask permission before interjecting (“Want me to share top 3?”).</li>
</ul>



<h4 class="wp-block-heading"><strong>Koala II Improvements</strong></h4>



<ul class="wp-block-list">
<li>Model upgrade to Llama 3 led to fewer hallucinations, longer context.</li>



<li>Prompt updates: more targeted suggestions, less domination.</li>



<li>Tunable “value threshold” for proactivity.</li>



<li>UI control panel:
<ul class="wp-block-list">
<li>Reactive vs proactive toggle.</li>



<li>Proactive contribution threshold (High / Medium / Low).</li>



<li>Where messages appear: in-channel vs thread.</li>



<li>Long-message truncation.</li>
</ul>
</li>
</ul>
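


<p class="wp-block-paragraph">To make the control panel concrete, here is a minimal sketch of how those four settings could gate a single draft message. Everything in it is an assumption for illustration: the class and function names, the numeric cut-offs behind High / Medium / Low, and the 280-character limit are hypothetical and are not taken from the Koala II implementation.</p>

```python
from dataclasses import dataclass
from enum import Enum


class Threshold(Enum):
    # Hypothetical numeric cut-offs; the notes above only name High / Medium / Low.
    LOW = 0.3
    MEDIUM = 0.6
    HIGH = 0.8


@dataclass
class ControlPanel:
    proactive: bool = True                    # reactive vs proactive toggle
    threshold: Threshold = Threshold.MEDIUM   # proactive contribution threshold
    reply_in_thread: bool = False             # in-channel vs thread
    max_chars: int = 280                      # long-message truncation (assumed limit)


def decide(panel, draft, value_score, addressed):
    """Return (destination, text) if the agent should post, else None."""
    if not addressed:
        # Reactive mode only answers when directly addressed.
        if not panel.proactive:
            return None
        # Proactive mode posts only when the estimated value clears the bar.
        if panel.threshold.value > value_score:
            return None
    text = draft
    if len(draft) > panel.max_chars:
        # Truncate long messages and mark the cut with an ellipsis.
        text = draft[: panel.max_chars - 3] + "..."
    destination = "thread" if panel.reply_in_thread else "channel"
    return destination, text
```

<p class="wp-block-paragraph">With <code>ControlPanel(threshold=Threshold.HIGH)</code>, a draft scored 0.5 is dropped while one scored 0.9 is posted in the channel. How the value score itself is produced is left open here.</p>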



<figure class="wp-block-image size-large"><a href="https://blog.somecreativity.com/wp-content/uploads/2025/11/pasted-graphic-5.png"><img loading="lazy" width="1024" height="707" data-attachment-id="2009" data-permalink="https://blog.somecreativity.com/2025/11/22/controlling-ai-agent-participation-in-group-conversations-koala/pasted-graphic-5/" data-orig-file="https://blog.somecreativity.com/wp-content/uploads/2025/11/pasted-graphic-5.png" data-orig-size="1511,1044" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="Pasted Graphic 5" data-image-description="" data-image-caption="" data-medium-file="https://blog.somecreativity.com/wp-content/uploads/2025/11/pasted-graphic-5.png?w=300" data-large-file="https://blog.somecreativity.com/wp-content/uploads/2025/11/pasted-graphic-5.png?w=1024" src="https://blog.somecreativity.com/wp-content/uploads/2025/11/pasted-graphic-5.png?w=1024" alt="" class="wp-image-2009" srcset="https://blog.somecreativity.com/wp-content/uploads/2025/11/pasted-graphic-5.png?w=1024 1024w, https://blog.somecreativity.com/wp-content/uploads/2025/11/pasted-graphic-5.png?w=128 128w, https://blog.somecreativity.com/wp-content/uploads/2025/11/pasted-graphic-5.png?w=300 300w, https://blog.somecreativity.com/wp-content/uploads/2025/11/pasted-graphic-5.png?w=768 768w, https://blog.somecreativity.com/wp-content/uploads/2025/11/pasted-graphic-5.png?w=1440 1440w, https://blog.somecreativity.com/wp-content/uploads/2025/11/pasted-graphic-5.png 1511w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></figure>



<p class="wp-block-paragraph">Basically, give users the option of choosing how Koala should interact, allow it to be steered mid-conversation via a message, and offer a selection of pre-built personas.</p>



<figure class="wp-block-image size-large"><a href="https://blog.somecreativity.com/wp-content/uploads/2025/11/session-8-comersational-control-enables-users-to-adjust-koala-ils-settings-through-natural-language-in-the-chat-and-c-pers.png"><img loading="lazy" width="1024" height="529" data-attachment-id="2011" data-permalink="https://blog.somecreativity.com/2025/11/22/controlling-ai-agent-participation-in-group-conversations-koala/session-8-comersational-control-enables-users-to-adjust-koala-ils-settings-through-natural-language-in-the-chat-and-c-pers/" data-orig-file="https://blog.somecreativity.com/wp-content/uploads/2025/11/session-8-comersational-control-enables-users-to-adjust-koala-ils-settings-through-natural-language-in-the-chat-and-c-pers.png" data-orig-size="1732,896" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="session (8) Comersational control enables users to adjust Koala Il&amp;#8217;s settings through natural language in the chat; and (C) Pers" data-image-description="" data-image-caption="" data-medium-file="https://blog.somecreativity.com/wp-content/uploads/2025/11/session-8-comersational-control-enables-users-to-adjust-koala-ils-settings-through-natural-language-in-the-chat-and-c-pers.png?w=300" data-large-file="https://blog.somecreativity.com/wp-content/uploads/2025/11/session-8-comersational-control-enables-users-to-adjust-koala-ils-settings-through-natural-language-in-the-chat-and-c-pers.png?w=1024" 
src="https://blog.somecreativity.com/wp-content/uploads/2025/11/session-8-comersational-control-enables-users-to-adjust-koala-ils-settings-through-natural-language-in-the-chat-and-c-pers.png?w=1024" alt="" class="wp-image-2011" srcset="https://blog.somecreativity.com/wp-content/uploads/2025/11/session-8-comersational-control-enables-users-to-adjust-koala-ils-settings-through-natural-language-in-the-chat-and-c-pers.png?w=1024 1024w, https://blog.somecreativity.com/wp-content/uploads/2025/11/session-8-comersational-control-enables-users-to-adjust-koala-ils-settings-through-natural-language-in-the-chat-and-c-pers.png?w=128 128w, https://blog.somecreativity.com/wp-content/uploads/2025/11/session-8-comersational-control-enables-users-to-adjust-koala-ils-settings-through-natural-language-in-the-chat-and-c-pers.png?w=300 300w, https://blog.somecreativity.com/wp-content/uploads/2025/11/session-8-comersational-control-enables-users-to-adjust-koala-ils-settings-through-natural-language-in-the-chat-and-c-pers.png?w=768 768w, https://blog.somecreativity.com/wp-content/uploads/2025/11/session-8-comersational-control-enables-users-to-adjust-koala-ils-settings-through-natural-language-in-the-chat-and-c-pers.png?w=1440 1440w, https://blog.somecreativity.com/wp-content/uploads/2025/11/session-8-comersational-control-enables-users-to-adjust-koala-ils-settings-through-natural-language-in-the-chat-and-c-pers.png 1732w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></figure>



<h4 class="wp-block-heading"><strong>Study 2: Results</strong></h4>



<ul class="wp-block-list">
<li>Koala II perceived as quieter, better paced, more on-topic.</li>



<li>Felt more natural and less interruptive.</li>



<li>Big reversal:
<ul class="wp-block-list">
<li><em>No group switched from Proactive to Reactive.</em></li>



<li>When tuned, <span style="text-decoration: underline">people preferred the improved Proactive version.</span></li>
</ul>
</li>



<li>Threaded replies were a failed expectation (this surprised me initially, but makes sense):
<ul class="wp-block-list">
<li>People thought it would reduce noise, but it worsened collaboration.</li>
</ul>
</li>



<li>Tone complaints: Koala II occasionally too human (“That’s a great idea!”).</li>
</ul>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">Three groups tried the option of having Koala II respond in thread rather than in channel, thinking it would reduce their distraction from Koala II. Surprisingly, it had the opposite effect. P1.1 explained how it took time to “look through everyone’s threads… taking away from our collaboration.” Many other participants made similar comments, suggesting that threaded replies may not be suited to the real-time nature of a brainstorming task.</p>
</blockquote>



<h4 class="wp-block-heading"><strong>User Control Insights (Study 2)</strong></h4>



<ul class="wp-block-list">
<li>Controls rated highly useful (avg 4.46/5).</li>



<li>People want to <strong>change settings dynamically</strong> during the session.</li>



<li>Different tasks → different proactivity levels.</li>



<li>Natural-language steering is attractive but risky (misinterpretation, pollutes conversation).</li>



<li>Roles and personas were preferred <em>as high-level modes</em>, but users still want low-level knobs.</li>
</ul>



<h4 class="wp-block-heading"><strong>Social + Governance Findings</strong></h4>



<ul class="wp-block-list">
<li>Adjusting AI settings inside a group is socially sensitive:
<ul class="wp-block-list">
<li>Users felt “intrusive” making unilateral changes.</li>



<li>But small teams were more accepting.</li>
</ul>
</li>



<li>Possible needs:
<ul class="wp-block-list">
<li>Admin roles</li>



<li>Voting on behavioral changes</li>



<li>Visibility of changes</li>
</ul>
</li>
</ul>



<h4 class="wp-block-heading"><strong>Taxonomy of Control (Paper’s Main Contribution)</strong></h4>



<ol class="wp-block-list">
<li>When the agent contributes
<ul class="wp-block-list">
<li>Triggers (all messages, direct address, silence, bursts of activity).</li>



<li>Filters (value threshold, relevance).</li>



<li>Rate (delay, pacing, matching human cadence).</li>
</ul>
</li>



<li>What the agent contributes
<ul class="wp-block-list">
<li>Content type (conservative vs wild ideas).</li>



<li>Style (tone, length, enthusiasm, formatting).</li>



<li>Modality (text, emojis, images, etc.).</li>
</ul>
</li>



<li>Where the agent contributes
<ul class="wp-block-list">
<li>In channel vs thread.</li>



<li>Future: other UI surfaces depending on context.</li>
</ul>
</li>



<li>How behaviors are specified
<ul class="wp-block-list">
<li>UI controls.</li>



<li>Natural language steering.</li>



<li>High-level roles.</li>



<li>Personas.</li>



<li>Granularity control (coarse vs fine).</li>
</ul>
</li>



<li>Who can change the settings
<ul class="wp-block-list">
<li>Permissions, visibility rules, group norms.</li>
</ul>
</li>



<li>Implementation
<ul class="wp-block-list">
<li>Prompt engineering.</li>



<li>External logic (needed because LLM self-regulation is unreliable).</li>



<li>Real-time control mechanisms, not static presets.</li>
</ul>
</li>
</ol>
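


<p class="wp-block-paragraph">The "when" dimension and the "external logic" point fit together naturally: triggers, filters, and rate limits can live in ordinary code outside the model. The sketch below is one hedged way to do that. The class name, field names, and default numbers are all hypothetical, and the value score is assumed to come from some separate estimator on a 0 to 1 scale.</p>

```python
from dataclasses import dataclass


@dataclass
class ParticipationGate:
    min_value: float = 0.6          # filter: value threshold (assumed 0..1 scale)
    min_gap_seconds: float = 30.0   # rate: minimum spacing between agent messages
    silence_seconds: float = 20.0   # trigger: how long the channel must be quiet
    last_agent_post: float = float("-inf")

    def should_post(self, now, last_human_post, humans_typing, value, addressed=False):
        if addressed:
            return True    # direct address always triggers a reply
        if humans_typing:
            return False   # wait while people are actively typing
        if self.silence_seconds > now - last_human_post:
            return False   # the channel is still active
        if self.min_gap_seconds > now - self.last_agent_post:
            return False   # pacing: the agent spoke too recently
        return value >= self.min_value   # filter on estimated value

    def record_post(self, now):
        # Call this whenever the agent actually sends a message.
        self.last_agent_post = now
```

<p class="wp-block-paragraph">Because the gate is plain state plus comparisons, its fields can be changed at any point in a session, which matches the preference for real-time controls over static presets.</p>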



<figure class="wp-block-image size-large"><a href="https://blog.somecreativity.com/wp-content/uploads/2025/11/pasted-graphic-7-1.png"><img loading="lazy" width="1024" height="895" data-attachment-id="2015" data-permalink="https://blog.somecreativity.com/2025/11/22/controlling-ai-agent-participation-in-group-conversations-koala/pasted-graphic-7-2/" data-orig-file="https://blog.somecreativity.com/wp-content/uploads/2025/11/pasted-graphic-7-1.png" data-orig-size="1169,1022" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="Pasted Graphic 7" data-image-description="" data-image-caption="" data-medium-file="https://blog.somecreativity.com/wp-content/uploads/2025/11/pasted-graphic-7-1.png?w=300" data-large-file="https://blog.somecreativity.com/wp-content/uploads/2025/11/pasted-graphic-7-1.png?w=1024" src="https://blog.somecreativity.com/wp-content/uploads/2025/11/pasted-graphic-7-1.png?w=1024" alt="" class="wp-image-2015" srcset="https://blog.somecreativity.com/wp-content/uploads/2025/11/pasted-graphic-7-1.png?w=1024 1024w, https://blog.somecreativity.com/wp-content/uploads/2025/11/pasted-graphic-7-1.png?w=110 110w, https://blog.somecreativity.com/wp-content/uploads/2025/11/pasted-graphic-7-1.png?w=300 300w, https://blog.somecreativity.com/wp-content/uploads/2025/11/pasted-graphic-7-1.png?w=768 768w, https://blog.somecreativity.com/wp-content/uploads/2025/11/pasted-graphic-7-1.png 1169w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></figure>



<h4 class="wp-block-heading"><strong>Key Design Insight</strong></h4>



<ul class="wp-block-list">
<li>Proactivity is not binary. It is multi-dimensional and must be <em>dynamically</em> adjustable by the group.</li>



<li>No single “best” setting; ideal behavior depends on:
<ul class="wp-block-list">
<li>group preferences</li>



<li>moment-to-moment context</li>



<li>stage of collaboration</li>
</ul>
</li>
</ul>



<h4 class="wp-block-heading"><strong>Going forward: what to explore next</strong></h4>



<ul class="wp-block-list">
<li>Personalized AI behavior in collaborative settings.</li>



<li>Context-aware proactivity (detect active human exchange, detect pauses).</li>



<li>Allow different groups/situations to choose different behavior patterns.</li>



<li>The right approach: a configurable system, not a fixed algorithm.</li>
</ul>



]]></content:encoded>
					
					<wfw:commentRss>https://blog.somecreativity.com/2025/11/22/controlling-ai-agent-participation-in-group-conversations-koala/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1999</post-id>
		<media:thumbnail url="https://blog.somecreativity.com/wp-content/uploads/2025/11/chatgpt-image-nov-7-2025-05_21_34-pm.png" />
		<media:content url="https://blog.somecreativity.com/wp-content/uploads/2025/11/chatgpt-image-nov-7-2025-05_21_34-pm.png" medium="image">
			<media:title type="html">ChatGPT Image Nov 7, 2025, 05_21_34 PM</media:title>
		</media:content>

		<media:content url="https://0.gravatar.com/avatar/0eda0091e089b109dc142c655f5833609a7b82daf5ef673a798b5d1b014e3c27?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">Sid</media:title>
		</media:content>

		<media:content url="https://blog.somecreativity.com/wp-content/uploads/2025/11/direct-request.png?w=1024" medium="image" />

		<media:content url="https://blog.somecreativity.com/wp-content/uploads/2025/11/fig-3.-study-1-overview.-during-each-session-a-group-of-participants-sequentially-completed-three-rounds-of-trainstorming-and.png?w=1024" medium="image" />

		<media:content url="https://blog.somecreativity.com/wp-content/uploads/2025/11/survey-question-preferred-condition.png?w=662" medium="image" />

		<media:content url="https://blog.somecreativity.com/wp-content/uploads/2025/11/pasted-graphic-5.png?w=1024" medium="image" />

		<media:content url="https://blog.somecreativity.com/wp-content/uploads/2025/11/session-8-comersational-control-enables-users-to-adjust-koala-ils-settings-through-natural-language-in-the-chat-and-c-pers.png?w=1024" medium="image" />

		<media:content url="https://blog.somecreativity.com/wp-content/uploads/2025/11/pasted-graphic-7-1.png?w=1024" medium="image" />
	</item>
		<item>
		<title>Inner Thoughts &#8211; Notes</title>
		<link>https://blog.somecreativity.com/2025/11/15/inner-thoughts-notes/</link>
					<comments>https://blog.somecreativity.com/2025/11/15/inner-thoughts-notes/#respond</comments>
		
		<dc:creator><![CDATA[Sid]]></dc:creator>
		<pubDate>Sat, 15 Nov 2025 11:32:15 +0000</pubDate>
				<category><![CDATA[General]]></category>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[artificial-intelligence]]></category>
		<category><![CDATA[chatgpt]]></category>
		<category><![CDATA[llm]]></category>
		<category><![CDATA[technology]]></category>
		<guid isPermaLink="false">http://blog.somecreativity.com/?p=1965</guid>

					<description><![CDATA[We had the opportunity to host Bruce Liu, one of the authors of the Inner Thoughts paper, in our team’s AI learning session today. Sharing my key takeaways. Key Takeaways]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">We had the opportunity to host <a href="https://www.linkedin.com/in/xingyuliu1997/">Bruce Liu</a>, one of the authors of the <a href="https://arxiv.org/abs/2501.00383">Inner Thoughts paper</a>, in our team’s AI learning session today. Sharing my key takeaways.</p>



<h4 class="wp-block-heading">Key Takeaways</h4>



<ul class="wp-block-list">
<li>Giving an agent a persona and having it run a continuous internal monologue leads to more natural participation in group conversations.</li>



<li>The system generates multiple candidate thoughts, evaluates them on:
<ul class="wp-block-list">
<li>relevance</li>



<li>information gap</li>



<li>impact</li>



<li>appropriateness<br />&#8230; and only expresses a thought if motivation passes a threshold.</li>
</ul>
</li>



<li>This makes the agent selective, not reactive. It avoids over-speaking and feels socially aware.</li>



<li>The authors fine-tuned GPT-3.5 on the MPC (Multiparty Chat Corpus) dataset to predict the next speaker, and prompted the model to generate a response based on its persona when the prediction selected it. They compared the Inner Thoughts approach against this baseline.
<ul class="wp-block-list">
<li>Dataset: <a href="https://github.com/sashank06/MPC-Corpus">https://github.com/sashank06/MPC-Corpus</a></li>



<li>Paper: <a href="http://www.lrec-conf.org/proceedings/lrec2010/pdf/85_Paper.pdf">http://www.lrec-conf.org/proceedings/lrec2010/pdf/85_Paper.pdf</a></li>
</ul>
</li>



<li>The overall loop is: 
<ul class="wp-block-list">
<li><strong>Trigger</strong> &#8211; Initiating the thought process (in this paper, when someone posts a message or a silence threshold is reached)</li>



<li><strong>Retrieval</strong>&nbsp;&#8211; Accessing relevant memories and context</li>



<li><strong>Thought Formation</strong>&nbsp;&#8211; Generating potential thoughts</li>



<li><strong>Evaluation</strong>&nbsp;&#8211; Assessing intrinsic motivation to express thoughts</li>



<li><strong>Participation</strong>&nbsp;&#8211; Deciding when and how to engage in conversation</li>
</ul>
</li>



<li>The important idea: Not every thought should be spoken.</li>



<li>Another interesting idea: they used different prompts to simulate System-1 vs System-2 thinking (as in <em>Thinking, Fast and Slow</em>) when generating thoughts.
<ul class="wp-block-list">
<li>They use a simple developer-set probability to choose between fast System-1 thoughts and slower System-2 reasoning, but this idea opens the door to far more sophisticated, context-aware switching.</li>
</ul>
</li>



<li>The agent behaves <span style="text-decoration: underline">more like a participant in the conversation</span>, not a tool that gets invoked when @ mentioned.</li>



<li>The code is clean and packaged well: <a href="https://github.com/xybruceliu/thoughtful-agents">https://github.com/xybruceliu/thoughtful-agents</a>
<ul class="wp-block-list">
<li>It should be quite tractable to use inside a <a href="https://aka.ms/teamsaiv2">Teams SDK</a> agent for Python.</li>
</ul>
</li>
</ul>
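


<p class="wp-block-paragraph">The evaluation and participation steps of the loop above can be sketched in a few lines. This is not the API of the thoughtful-agents package, and it skips retrieval and thought formation (which need LLM calls) by taking pre-scored candidates as input. The plain average used for motivation, the 0.6 threshold, and the 0.7 System-1 probability are all assumptions made for illustration.</p>

```python
import random
from dataclasses import dataclass


@dataclass
class Thought:
    text: str
    relevance: float
    information_gap: float
    impact: float
    appropriateness: float

    def motivation(self):
        # Assumption: a plain average; the paper's actual scoring is not reproduced here.
        total = self.relevance + self.information_gap + self.impact + self.appropriateness
        return total / 4


def pick_mode(rng, p_system1=0.7):
    """Choose fast (System-1) or slow (System-2) generation by a fixed probability."""
    return "system1" if p_system1 > rng.random() else "system2"


def step(candidates, threshold=0.6):
    """Evaluate candidate thoughts; return the best one to voice, or None to stay silent."""
    if not candidates:
        return None
    best = max(candidates, key=Thought.motivation)
    return best if best.motivation() >= threshold else None
```

<p class="wp-block-paragraph">The key behavior is the <code>None</code> return: most triggers should end with the agent keeping its thought to itself.</p>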



<figure class="wp-block-image size-large"><a href="https://blog.somecreativity.com/wp-content/uploads/2025/11/image-3.png"><img loading="lazy" width="1024" height="420" data-attachment-id="1987" data-permalink="https://blog.somecreativity.com/2025/11/15/inner-thoughts-notes/image-76/" data-orig-file="https://blog.somecreativity.com/wp-content/uploads/2025/11/image-3.png" data-orig-size="2026,832" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image" data-image-description="" data-image-caption="" data-medium-file="https://blog.somecreativity.com/wp-content/uploads/2025/11/image-3.png?w=300" data-large-file="https://blog.somecreativity.com/wp-content/uploads/2025/11/image-3.png?w=1024" src="https://blog.somecreativity.com/wp-content/uploads/2025/11/image-3.png?w=1024" alt="" class="wp-image-1987" srcset="https://blog.somecreativity.com/wp-content/uploads/2025/11/image-3.png?w=1024 1024w, https://blog.somecreativity.com/wp-content/uploads/2025/11/image-3.png?w=128 128w, https://blog.somecreativity.com/wp-content/uploads/2025/11/image-3.png?w=300 300w, https://blog.somecreativity.com/wp-content/uploads/2025/11/image-3.png?w=768 768w, https://blog.somecreativity.com/wp-content/uploads/2025/11/image-3.png?w=1440 1440w, https://blog.somecreativity.com/wp-content/uploads/2025/11/image-3.png 2026w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></figure>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.somecreativity.com/2025/11/15/inner-thoughts-notes/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1965</post-id>
		<media:thumbnail url="https://blog.somecreativity.com/wp-content/uploads/2025/11/chatgpt-image-nov-7-2025-05_21_34-pm.png" />
		<media:content url="https://blog.somecreativity.com/wp-content/uploads/2025/11/chatgpt-image-nov-7-2025-05_21_34-pm.png" medium="image">
			<media:title type="html">ChatGPT Image Nov 7, 2025, 05_21_34 PM</media:title>
		</media:content>

		<media:content url="https://0.gravatar.com/avatar/0eda0091e089b109dc142c655f5833609a7b82daf5ef673a798b5d1b014e3c27?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">Sid</media:title>
		</media:content>

		<media:content url="https://blog.somecreativity.com/wp-content/uploads/2025/11/image-3.png?w=1024" medium="image" />
	</item>
		<item>
		<title>MUCA &#8211; Notes</title>
		<link>https://blog.somecreativity.com/2025/11/07/muca-notes/</link>
					<comments>https://blog.somecreativity.com/2025/11/07/muca-notes/#comments</comments>
		
		<dc:creator><![CDATA[Sid]]></dc:creator>
		<pubDate>Sat, 08 Nov 2025 01:22:52 +0000</pubDate>
				<category><![CDATA[General]]></category>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[artificial-intelligence]]></category>
		<category><![CDATA[chatgpt]]></category>
		<category><![CDATA[llm]]></category>
		<category><![CDATA[technology]]></category>
		<guid isPermaLink="false">http://blog.somecreativity.com/?p=1956</guid>

					<description><![CDATA[In today&#8217;s AI Learning session, we had the opportunity to meet Manqing Mao and Jianzhe Lin who co-authored MUCA. Capturing my notes here. There are several interesting ideas in the paper that are applicable to multi-human &#60;-&#62; agent collaboration. Multi-User Chat Assistant (MUCA): Framework for LLM-Mediated Group Conversations MUCA targets multi-user, single-agent interactions — a [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">In today&#8217;s AI Learning session, we had the opportunity to meet <a href="https://www.linkedin.com/in/manqingmao/">Manqing Mao</a> and <a href="https://www.linkedin.com/in/jianzhe-peter-lin-a4135baa/">Jianzhe Lin</a> who co-authored <a href="https://arxiv.org/pdf/2401.04883v1">MUCA</a>. Capturing my notes here. There are several interesting ideas in the paper that are applicable to multi-human &lt;-&gt; agent collaboration.</p>



<p class="has-large-font-size wp-block-paragraph"><strong>Multi-User Chat Assistant (MUCA): Framework for LLM-Mediated Group Conversations</strong></p>



<p class="wp-block-paragraph">MUCA targets <em>multi-user, single-agent</em> interactions — a challenging setting where a chatbot must reason not only about what to say but also when and to whom. The system operationalizes these through the <strong>3W design dimensions</strong>:</p>



<ul class="wp-block-list">
<li><strong>What</strong> – selecting relevant content that advances the discussion or resolves conflicts.</li>



<li><strong>When</strong> – determining optimal response timing to balance engagement without interruption.</li>



<li><strong>Who</strong> – identifying the intended recipient(s) of the response within a group context.</li>
</ul>



<p class="wp-block-paragraph">Together, these govern a chatbot’s role as a supportive and context-aware participant in group discussions, rather than a turn-taking speaker responding to each message individually.</p>



<h3 class="wp-block-heading"><strong>Core Modules</strong></h3>



<ol class="wp-block-list">
<li><strong>Sub-topic Generator</strong><br />Initializes structured sub-topics from the conversation goal, agenda, or hints, enabling MUCA to guide discussions along coherent and logically connected threads rather than reacting opportunistically to each message.</li>



<li><strong>Dialog Analyzer</strong><br />Continuously interprets conversation state through several sub-modules:
<ul class="wp-block-list">
<li><strong>Sub-topic Status Update</strong> – tracks whether topics are <em>not discussed, being discussed,</em> or <em>well-discussed</em>, providing situational awareness.</li>



<li><strong>Utterance Feature Extractor</strong> – identifies which sub-topics are active within the current window, crucial for managing multi-threaded discussions.</li>



<li><strong>Accumulative Summary Update</strong> – maintains rolling summaries per participant to preserve long-term conversational context efficiently.</li>



<li><strong>Participant Feature Extractor</strong> – quantifies engagement (frequency, length, and focus of contributions) to detect lurkers or dominant speakers and inform adaptive participation strategies.</li>
</ul>
</li>



<li><strong>Utterance Strategies Arbitrator</strong><br />Selects one of seven <strong>dialog acts</strong>, ranked by heuristic confidence and contextual triggers, to determine MUCA’s next move. Each act has trigger conditions, warm-up, and cool-down turns to manage pacing:
<ul class="wp-block-list">
<li><strong>Direct Chatting:</strong> Respond immediately when pinged directly.</li>



<li><strong>Initiative Summarization:</strong> Periodically generate concise summaries to improve shared understanding.</li>



<li><strong>Participation Encouragement:</strong> Invite quieter participants to contribute using gentle, personalized prompts.</li>



<li><strong>Sub-topic Transition:</strong> Detect when a topic is exhausted or stale and guide the group to a new one.</li>



<li><strong>Conflict Resolution:</strong> Summarize opposing views and propose synthesis or consensus paths.</li>



<li><strong>In-context Chime-in:</strong> Contribute timely insights or clarifications when conversation flow stalls or questions remain unanswered.</li>



<li><strong>Keep Silence:</strong> Default behavior to avoid over-participation when no act is warranted, preserving conversational balance.</li>
</ul>
</li>
</ol>
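


<p class="wp-block-paragraph">As a rough illustration of the arbitration step, the sketch below picks one act per turn from whichever acts have been triggered, respecting warm-up and cool-down. It is a simplification under stated assumptions: the names, the integer priority ordering, and the turn counts are hypothetical, and the paper's heuristic confidence ranking is replaced here by a fixed priority.</p>

```python
from dataclasses import dataclass


@dataclass
class DialogAct:
    name: str
    priority: int            # lower number wins; this ordering is illustrative
    warm_up: int = 0         # turns that must pass before the act may first fire
    cool_down: int = 0       # turns to wait after the act last fired
    last_fired: int = -10**9


def arbitrate(acts, triggered, turn):
    """Pick the highest-priority triggered act that is past warm-up and cool-down."""
    eligible = [
        act for act in acts
        if act.name in triggered
        and turn >= act.warm_up
        and turn - act.last_fired > act.cool_down
    ]
    if not eligible:
        return "keep_silence"   # default act when nothing is warranted
    chosen = min(eligible, key=lambda act: act.priority)
    chosen.last_fired = turn    # start this act's cool-down
    return chosen.name
```

<p class="wp-block-paragraph">Cool-down is what keeps an act such as summarization from firing on consecutive turns even when its trigger condition stays true.</p>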



<h3 class="wp-block-heading"><strong>Design Challenges Addressed</strong></h3>



<ul class="wp-block-list">
<li><strong>Stuck Conversation Advancement:</strong> Detects stagnation and injects contextually appropriate insights to re-ignite progress.</li>



<li><strong>Multi-threaded Discussion Management:</strong> Tracks overlapping topics and participant clusters to sustain coherence in complex group exchanges.</li>



<li><strong>Responsiveness Requirement:</strong> Maintains timely yet non-intrusive responses despite asynchronous, high-traffic chat environments.</li>



<li><strong>Participation Evenness:</strong> Uses data-driven engagement metrics to encourage balanced contributions across users.</li>



<li><strong>Conflict Resolution:</strong> Applies summarization and consensus-seeking acts to mediate disputes or align diverging viewpoints constructively.</li>
</ul>



<h3 class="wp-block-heading"><strong>Key Contribution</strong></h3>



<p class="wp-block-paragraph">MUCA provides the first structured framework enabling LLMs to function as <em>facilitators</em> in group settings. By uniting the 3W dimensions, a modular analysis pipeline, and dialog-act arbitration, it transforms large language models from reactive responders into proactive conversation participants capable of maintaining context, inclusivity, and flow in multi-participant discussions.</p>



<p class="wp-block-paragraph"><strong>Paper</strong>: <a href="https://arxiv.org/pdf/2401.04883v1">https://arxiv.org/pdf/2401.04883v1</a></p>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.somecreativity.com/2025/11/07/muca-notes/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1956</post-id>
		<media:thumbnail url="https://blog.somecreativity.com/wp-content/uploads/2025/11/chatgpt-image-nov-7-2025-05_21_34-pm.png" />
		<media:content url="https://blog.somecreativity.com/wp-content/uploads/2025/11/chatgpt-image-nov-7-2025-05_21_34-pm.png" medium="image">
			<media:title type="html">ChatGPT Image Nov 7, 2025, 05_21_34 PM</media:title>
		</media:content>

		<media:content url="https://0.gravatar.com/avatar/0eda0091e089b109dc142c655f5833609a7b82daf5ef673a798b5d1b014e3c27?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">Sid</media:title>
		</media:content>
	</item>
		<item>
		<title>Embeddings &#038; Similarity Metrics</title>
		<link>https://blog.somecreativity.com/2025/09/27/embeddings-similarity-metrics/</link>
					<comments>https://blog.somecreativity.com/2025/09/27/embeddings-similarity-metrics/#respond</comments>
		
		<dc:creator><![CDATA[Sid]]></dc:creator>
		<pubDate>Sat, 27 Sep 2025 22:08:26 +0000</pubDate>
				<category><![CDATA[General]]></category>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[artificial-intelligence]]></category>
		<category><![CDATA[llm]]></category>
		<category><![CDATA[rag]]></category>
		<category><![CDATA[technology]]></category>
		<guid isPermaLink="false">http://blog.somecreativity.com/?p=1883</guid>

					<description><![CDATA[When asked what embedding model and similarity metric they’ve used, most people answer something like: “OpenAI embeddings with cosine similarity.” That’s a perfectly valid answer. But it leads to deeper questions: These were some of the questions we dug into in our team learning session last Friday. Let’s walk through the key takeaways. First: the [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">When asked what embedding model and similarity metric they’ve used, most people answer something like: <em>“OpenAI embeddings with cosine similarity.”</em></p>



<p class="wp-block-paragraph">That’s a perfectly valid answer. But it leads to deeper questions:</p>



<ul class="wp-block-list">
<li>What if you’re working with an open-source embedding model like BERT-base or MiniLM-base? Can you still use cosine similarity?</li>



<li>What if you come across code that’s using Euclidean distance with OpenAI embeddings &#8212; is that wrong?</li>



<li>Are there scenarios where Euclidean distance is actually better?</li>



<li>Do recommendation systems have different considerations than RAG systems?</li>
</ul>



<p class="wp-block-paragraph">These were some of the questions we dug into in our team learning session last Friday. Let’s walk through the key takeaways.</p>



<h4 class="wp-block-heading">First: the difference between Euclidean distance and cosine similarity</h4>



<p class="wp-block-paragraph">At a glance both compare vectors, but they focus on different things:</p>



<ul class="wp-block-list">
<li><strong>Euclidean distance</strong>: compares the <span style="text-decoration: underline">endpoints</span> of the vectors. It’s the straight-line distance between two points.</li>



<li><strong>Cosine similarity</strong>: compares the <span style="text-decoration: underline">directions</span>. It measures the angle between vectors, ignoring how long they are.</li>
</ul>



<h4 class="wp-block-heading">Euclidean distance</h4>



<p class="wp-block-paragraph">For simplicity’s sake, let’s take two vectors <img src="https://s0.wp.com/latex.php?latex=%7Ca%7C&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002" srcset="https://s0.wp.com/latex.php?latex=%7Ca%7C&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=%7Ca%7C&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="|a|" class="latex" /> and <img src="https://s0.wp.com/latex.php?latex=%7Cb%7C&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002" srcset="https://s0.wp.com/latex.php?latex=%7Cb%7C&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=%7Cb%7C&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="|b|" class="latex" /> drawn from the origin. The Euclidean distance between them is just the straight-line distance between their endpoints (the tips of the arrows). If you put a ruler between the tips, that’s the number you’d get.</p>



<p class="wp-block-paragraph">Mathematically:</p>



<p class="wp-block-paragraph"><img src="https://s0.wp.com/latex.php?latex=%7Ca+-+b%7C+%3D+%5Csqrt%7B%28a_1+-+b_1%29%5E2+%2B+%28a_2+-+b_2%29%5E2+%2B+%5Ccdots+%2B+%28a_n+-+b_n%29%5E2%7D&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002" srcset="https://s0.wp.com/latex.php?latex=%7Ca+-+b%7C+%3D+%5Csqrt%7B%28a_1+-+b_1%29%5E2+%2B+%28a_2+-+b_2%29%5E2+%2B+%5Ccdots+%2B+%28a_n+-+b_n%29%5E2%7D&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=%7Ca+-+b%7C+%3D+%5Csqrt%7B%28a_1+-+b_1%29%5E2+%2B+%28a_2+-+b_2%29%5E2+%2B+%5Ccdots+%2B+%28a_n+-+b_n%29%5E2%7D&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="|a - b| = &#92;sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2 + &#92;cdots + (a_n - b_n)^2}" class="latex" /></p>



<p class="wp-block-paragraph">This makes it clear why length matters here: even if two vectors point in almost the same direction, if one is much longer, the distance between their endpoints will still be large.</p>



<figure class="wp-block-image size-large"><a href="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-11.png"><img loading="lazy" width="1024" height="429" data-attachment-id="1926" data-permalink="https://blog.somecreativity.com/2025/09/27/embeddings-similarity-metrics/image-68/" data-orig-file="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-11.png" data-orig-size="1848,776" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image" data-image-description="" data-image-caption="" data-medium-file="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-11.png?w=300" data-large-file="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-11.png?w=1024" src="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-11.png?w=1024" alt="" class="wp-image-1926" srcset="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-11.png?w=1024 1024w, https://blog.somecreativity.com/wp-content/uploads/2025/09/image-11.png?w=128 128w, https://blog.somecreativity.com/wp-content/uploads/2025/09/image-11.png?w=300 300w, https://blog.somecreativity.com/wp-content/uploads/2025/09/image-11.png?w=768 768w, https://blog.somecreativity.com/wp-content/uploads/2025/09/image-11.png?w=1440 1440w, https://blog.somecreativity.com/wp-content/uploads/2025/09/image-11.png 1848w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></figure>
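
<p class="wp-block-paragraph">As a minimal sketch of the formula above, here is the same calculation in plain Python. The function name <code>euclidean_distance</code> and the two sample vectors are illustrative assumptions, not part of the original session.</p>

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between the endpoints of vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical 2-D vectors: same direction, very different lengths.
short = [1.0, 1.0]
long_ = [10.0, 10.0]

# The directions are identical, yet the endpoints are far apart.
print(euclidean_distance(short, long_))  # roughly 12.73
```

<p class="wp-block-paragraph">On Python 3.8 and later, the standard library's <code>math.dist</code> computes the same value.</p>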



<h4 class="wp-block-heading">Cosine similarity</h4>



<p class="wp-block-paragraph">While Euclidean distance looks at the <em>endpoints</em> of vectors, cosine similarity only looks at their direction. Imagine projecting every vector onto the unit circle: cosine measures how close those directions are, regardless of how long the arrows are.</p>



<p class="wp-block-paragraph">Mathematically:</p>



<p class="wp-block-paragraph"><img src="https://s0.wp.com/latex.php?latex=%5Ctext%7Bcosine%7D%28a%2Cb%29+%3D+%5Cfrac%7Ba+%5Ccdot+b%7D%7B%7Ca%7C%7Cb%7C%7D+%3D+%5Ccos%28%5Ctheta%29&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002" srcset="https://s0.wp.com/latex.php?latex=%5Ctext%7Bcosine%7D%28a%2Cb%29+%3D+%5Cfrac%7Ba+%5Ccdot+b%7D%7B%7Ca%7C%7Cb%7C%7D+%3D+%5Ccos%28%5Ctheta%29&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=%5Ctext%7Bcosine%7D%28a%2Cb%29+%3D+%5Cfrac%7Ba+%5Ccdot+b%7D%7B%7Ca%7C%7Cb%7C%7D+%3D+%5Ccos%28%5Ctheta%29&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="&#92;text{cosine}(a,b) = &#92;frac{a &#92;cdot b}{|a||b|} = &#92;cos(&#92;theta)" class="latex" /></p>



<p class="wp-block-paragraph">Here <img src="https://s0.wp.com/latex.php?latex=a+%5Ccdot+b&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002" srcset="https://s0.wp.com/latex.php?latex=a+%5Ccdot+b&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=a+%5Ccdot+b&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="a &#92;cdot b" class="latex" /> is the dot product and <img src="https://s0.wp.com/latex.php?latex=%5Ctheta&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002" srcset="https://s0.wp.com/latex.php?latex=%5Ctheta&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=%5Ctheta&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="&#92;theta" class="latex" /> is the angle between the two vectors. Dividing by the lengths <img src="https://s0.wp.com/latex.php?latex=%5CVert+a+%5CVert&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002" srcset="https://s0.wp.com/latex.php?latex=%5CVert+a+%5CVert&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=%5CVert+a+%5CVert&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="&#92;Vert a &#92;Vert" class="latex" /> and <img src="https://s0.wp.com/latex.php?latex=%5CVert+b+%5CVert&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002" srcset="https://s0.wp.com/latex.php?latex=%5CVert+b+%5CVert&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=%5CVert+b+%5CVert&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="&#92;Vert b &#92;Vert" class="latex" /> cancels out the magnitudes, which is why cosine similarity is independent of vector magnitude.</p>



<ul class="wp-block-list">
<li>If the angle is 0° (vectors point the same way), cosine = 1 → perfectly similar.</li>



<li>If the angle is 90° (orthogonal), cosine = 0 → no similarity.</li>



<li>If the angle is 180° (opposite directions), cosine = –1 → maximally dissimilar.</li>
</ul>



<p class="wp-block-paragraph">Visually: even if one arrow is much longer, if they point in the same direction their cosine similarity is still 1.</p>



<figure class="wp-block-image size-large is-resized"><a href="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-5.png"><img loading="lazy" width="556" height="598" data-attachment-id="1899" data-permalink="https://blog.somecreativity.com/2025/09/27/embeddings-similarity-metrics/image-62/" data-orig-file="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-5.png" data-orig-size="556,598" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image" data-image-description="" data-image-caption="" data-medium-file="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-5.png?w=279" data-large-file="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-5.png?w=556" src="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-5.png?w=556" alt="" class="wp-image-1899" style="aspect-ratio:0.9298052920619071;width:254px;height:auto" srcset="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-5.png 556w, https://blog.somecreativity.com/wp-content/uploads/2025/09/image-5.png?w=89 89w, https://blog.somecreativity.com/wp-content/uploads/2025/09/image-5.png?w=279 279w" sizes="(max-width: 556px) 100vw, 556px" /></a></figure>
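
<p class="wp-block-paragraph">The three cases above (0°, 90°, 180°) can be checked with a short sketch. The function name <code>cosine_similarity</code> and the sample vectors are assumptions chosen for illustration.</p>

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: dot product over the product of lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0], [5, 0]))   # same direction, different length: 1.0
print(cosine_similarity([1, 0], [0, 3]))   # orthogonal: 0.0
print(cosine_similarity([1, 0], [-2, 0]))  # opposite direction: -1.0
```

<p class="wp-block-paragraph">Note that the first pair differs in length by a factor of five and still scores 1.0, since only the direction is compared.</p>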



<h4 class="wp-block-heading">The intuition with three vectors</h4>



<p class="wp-block-paragraph">Imagine three vectors: <strong>A</strong>, <strong>B</strong>, and <strong>C</strong>:</p>



<figure class="wp-block-image size-large"><a href="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-7.png"><img loading="lazy" width="1024" height="894" data-attachment-id="1903" data-permalink="https://blog.somecreativity.com/2025/09/27/embeddings-similarity-metrics/image-64/" data-orig-file="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-7.png" data-orig-size="1386,1211" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image" data-image-description="" data-image-caption="" data-medium-file="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-7.png?w=300" data-large-file="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-7.png?w=1024" src="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-7.png?w=1024" alt="" class="wp-image-1903" srcset="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-7.png?w=1024 1024w, https://blog.somecreativity.com/wp-content/uploads/2025/09/image-7.png?w=110 110w, https://blog.somecreativity.com/wp-content/uploads/2025/09/image-7.png?w=300 300w, https://blog.somecreativity.com/wp-content/uploads/2025/09/image-7.png?w=768 768w, https://blog.somecreativity.com/wp-content/uploads/2025/09/image-7.png 1386w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></figure>



<ul class="wp-block-list">
<li>As you can see, B is more aligned in direction with A than C is.</li>



<li>With Euclidean distance, A–C (~ 9.2) looks closer than A–B (~ 13.5) because C’s tip is nearer to A’s tip, even though the angles are different.</li>



<li>With cosine similarity, A–B wins, because alignment (angle) matters more than raw length.</li>
</ul>



<p class="wp-block-paragraph">This is exactly the situation where Euclidean distance and cosine similarity disagree on ordering, which is why you need to be mindful of which comparison metric you choose.</p>
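
<p class="wp-block-paragraph">Here is a small sketch of the same kind of disagreement. The coordinates are hypothetical (they are not the ones plotted in the figure), picked so that B is a longer copy of A while C sits close to A's tip at a different angle.</p>

```python
import math

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Hypothetical coordinates, not the ones plotted in the figure.
A = (4.0, 3.0)
candidates = {"B": (12.0, 9.0), "C": (4.0, 0.0)}

# Euclidean: smaller is closer. Cosine: larger is more similar.
nearest_by_euclidean = min(candidates, key=lambda name: euclidean_distance(A, candidates[name]))
nearest_by_cosine = max(candidates, key=lambda name: cosine_similarity(A, candidates[name]))

print(nearest_by_euclidean)  # C: its tip is only 3 units from A's tip
print(nearest_by_cosine)     # B: it points in exactly the same direction as A
```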



<h4 class="wp-block-heading">Why normalization matters</h4>



<p class="wp-block-paragraph">A common trick is to normalize vectors so their length is 1 (i.e., put them on the unit circle or unit sphere). The math looks like this:</p>



<p class="wp-block-paragraph"><img src="https://s0.wp.com/latex.php?latex=v_%7B%5Ctext%7Bnorm%7D%7D+%3D+%5Cfrac%7Bv%7D%7B%5CVert+v+%5CVert_2%7D&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002" srcset="https://s0.wp.com/latex.php?latex=v_%7B%5Ctext%7Bnorm%7D%7D+%3D+%5Cfrac%7Bv%7D%7B%5CVert+v+%5CVert_2%7D&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=v_%7B%5Ctext%7Bnorm%7D%7D+%3D+%5Cfrac%7Bv%7D%7B%5CVert+v+%5CVert_2%7D&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="v_{&#92;text{norm}} = &#92;frac{v}{&#92;Vert v &#92;Vert_2}" class="latex" /></p>



<p class="wp-block-paragraph">Basically, take the vector <img src="https://s0.wp.com/latex.php?latex=v&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002" srcset="https://s0.wp.com/latex.php?latex=v&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=v&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="v" class="latex" /> and divide each component by the vector’s length.</p>



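<p class="wp-block-paragraph">When both vectors are normalized, the Euclidean distance between them depends only on the angle: for unit vectors, ‖a − b‖² = 2 − 2·cos(θ). A smaller distance therefore always means a larger cosine similarity, so the two metrics rank neighbors identically.</p>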



<figure class="wp-block-image size-large"><a href="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-8.png"><img loading="lazy" width="1024" height="503" data-attachment-id="1907" data-permalink="https://blog.somecreativity.com/2025/09/27/embeddings-similarity-metrics/image-65/" data-orig-file="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-8.png" data-orig-size="1570,772" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image" data-image-description="" data-image-caption="" data-medium-file="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-8.png?w=300" data-large-file="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-8.png?w=1024" src="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-8.png?w=1024" alt="" class="wp-image-1907" srcset="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-8.png?w=1024 1024w, https://blog.somecreativity.com/wp-content/uploads/2025/09/image-8.png?w=128 128w, https://blog.somecreativity.com/wp-content/uploads/2025/09/image-8.png?w=300 300w, https://blog.somecreativity.com/wp-content/uploads/2025/09/image-8.png?w=768 768w, https://blog.somecreativity.com/wp-content/uploads/2025/09/image-8.png?w=1440 1440w, https://blog.somecreativity.com/wp-content/uploads/2025/09/image-8.png 1570w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></figure>



<p class="wp-block-paragraph">So, in our example, after normalization B is closest to A, followed by C &#8211; with both cosine similarity and Euclidean distance.</p>
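
<p class="wp-block-paragraph">To make the effect of normalization concrete, the sketch below normalizes three hypothetical vectors (again, not the figure's coordinates) and prints the squared Euclidean distance next to 2 − 2·cos(θ), which should agree up to floating-point rounding for unit vectors. The helper names are assumptions.</p>

```python
import math

def normalize(v):
    """Scale v to unit length (L2 norm of 1)."""
    length = math.sqrt(sum(x * x for x in v))
    if length == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / length for x in v]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical vectors: B is a longer copy of A, C points elsewhere.
A, B, C = normalize([4.0, 3.0]), normalize([12.0, 9.0]), normalize([4.0, 0.0])

for name, v in (("B", B), ("C", C)):
    cos_theta = dot(A, v)  # for unit vectors, the dot product is the cosine
    print(name, round(squared_distance(A, v), 6), round(2 - 2 * cos_theta, 6))
```

<p class="wp-block-paragraph">After normalization B lands on top of A, so it is the nearest neighbor under both metrics, with C second.</p>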



<p class="wp-block-paragraph">OpenAI embeddings already come normalized. Most people use cosine similarity with them without a second thought, but if you use Euclidean distance instead, you’ll get the same neighbors: the rankings are identical.</p>



<h4 class="wp-block-heading">When magnitude matters: why not always normalize?</h4>



<p class="wp-block-paragraph">It’s tempting to think you should always normalize embeddings and stick to cosine similarity. After all, that’s what most semantic search and RAG systems do. But normalization isn’t always the right move, because sometimes the magnitude of the embedding carries meaning.</p>



<p class="wp-block-paragraph">Remember, the dot product between two vectors is:</p>



<p class="wp-block-paragraph"><img src="https://s0.wp.com/latex.php?latex=a+%5Ccdot+b+%3D+%5CVert+a+%5CVert+%5C%2C+%5CVert+b+%5CVert+%5Ccos%28%5Ctheta%29&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002" srcset="https://s0.wp.com/latex.php?latex=a+%5Ccdot+b+%3D+%5CVert+a+%5CVert+%5C%2C+%5CVert+b+%5CVert+%5Ccos%28%5Ctheta%29&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=a+%5Ccdot+b+%3D+%5CVert+a+%5CVert+%5C%2C+%5CVert+b+%5CVert+%5Ccos%28%5Ctheta%29&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="a &#92;cdot b = &#92;Vert a &#92;Vert &#92;, &#92;Vert b &#92;Vert &#92;cos(&#92;theta)" class="latex" /></p>



<p class="wp-block-paragraph">That means it encodes both alignment (the angle) <em>and</em> magnitude (the length of each vector). If length itself encodes a signal you care about, dot product or Euclidean distance can be the right tool, while cosine would wash that information away.</p>



<p class="wp-block-paragraph"><strong>Examples:</strong></p>



<ul class="wp-block-list">
<li><strong>Number of views on a video</strong> – a 10,000-view video might need to be treated differently from a 100-view video, even if the content is otherwise identical.</li>



<li><strong>Price of an item</strong> – if embeddings include “price” as one axis, Euclidean distance will reflect a real dollar gap ($499 vs. $1,999), not just semantic similarity.</li>



<li><strong>Quantity sold / demand</strong> – embeddings that include sales volume should allow high-demand items to naturally stand apart from slow movers.</li>



<li><strong>User activity level in recommendations</strong> – in collaborative filtering systems, highly active users often have embeddings with larger norms. Dot product/Euclidean distance naturally lets that <strong>popularity signal</strong> influence similarity scores.</li>
</ul>
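
<p class="wp-block-paragraph">A toy sketch of how magnitude can change a ranking: below, a larger norm stands in for a "popularity" signal. The user and item embeddings are made up for illustration, and real recommendation embeddings would come from a trained model.</p>

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Hypothetical 2-D embeddings; a larger norm stands in for "more popular".
user = (1.0, 1.0)
items = {"niche": (1.0, 1.0), "popular": (3.0, 4.0)}

ranked_by_dot = sorted(items, key=lambda name: dot(user, items[name]), reverse=True)
ranked_by_cosine = sorted(items, key=lambda name: cosine_similarity(user, items[name]), reverse=True)

print(ranked_by_dot)     # ['popular', 'niche']
print(ranked_by_cosine)  # ['niche', 'popular']
```

<p class="wp-block-paragraph">Dot product rewards the larger norm of the "popular" item, while cosine discards it and prefers the perfectly aligned "niche" item.</p>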



<p class="wp-block-paragraph">In practice, large-scale recommendation systems have successfully leveraged this property. For example, <a href="https://arxiv.org/abs/1606.07154">Yahoo’s <strong>Prod2Vec</strong> approach</a> (Grbovic et al., 2015) applied Word2Vec-style training to user interaction sequences. They found that the resulting embeddings captured not only “semantic” relations between products but also popularity and frequency effects in the vector norms, signals that were directly useful for recommendations.</p>



<p class="wp-block-paragraph">So, you might think: <em>does this mean I don’t have to worry about Euclidean or dot product in RAG systems?</em> Usually, you don’t. But here’s the fun part: most vector databases (FAISS, Pinecone, Weaviate, Milvus, etc.) implement cosine similarity by normalizing embeddings once and then <span style="text-decoration: underline">using dot product internally</span>. Why dot product? Because once embeddings are normalized, dot product produces the same ranking as Euclidean distance but is faster to compute.</p>



<p class="wp-block-paragraph">My own small experiment, described below, confirmed this: dot product was slightly faster than Euclidean on normalized embeddings (~1.1× speedup in my run), since it’s just multiply-and-sum with no subtractions or squares.</p>



<p class="wp-block-paragraph">After normalization, cosine and Euclidean gave identical nearest-neighbor rankings.</p>



<h5 class="wp-block-heading">The experiment</h5>



<ul class="wp-block-list">
<li>Generated ~20,000 database vectors and 200 query vectors with an embedding size of 384 (roughly what you’d get from MiniLM).</li>



<li>For each query, retrieved the top-K neighbors using:
<ol class="wp-block-list">
<li>Dot product (cosine if vectors are normalized)</li>



<li>Squared Euclidean distance</li>
</ol>
</li>
</ul>



<p class="wp-block-paragraph">Tested both on raw vectors and on normalized vectors (so that <img src="https://s0.wp.com/latex.php?latex=%5CVert+v+%5CVert+%3D+1&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002" srcset="https://s0.wp.com/latex.php?latex=%5CVert+v+%5CVert+%3D+1&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=%5CVert+v+%5CVert+%3D+1&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="&#92;Vert v &#92;Vert = 1" class="latex" />).</p>



<p class="wp-block-paragraph"><strong>Results:</strong></p>



<ul class="wp-block-list">
<li>On normalized vectors, cosine and Euclidean produced identical neighbor rankings. </li>



<li>In terms of performance, <strong>dot product was about 1.1× faster than Euclidean</strong> on normalized embeddings. That’s because dot product is just multiply-and-sum: <img src="https://s0.wp.com/latex.php?latex=a+%5Ccdot+b+%3D+%5Csum_i+a_i+b_i&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002" srcset="https://s0.wp.com/latex.php?latex=a+%5Ccdot+b+%3D+%5Csum_i+a_i+b_i&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=a+%5Ccdot+b+%3D+%5Csum_i+a_i+b_i&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="a &#92;cdot b = &#92;sum_i a_i b_i" class="latex" /><br /><br />Squared Euclidean, by contrast, requires subtracting, squaring, and adding: <img src="https://s0.wp.com/latex.php?latex=%5CVert+a+-+b+%5CVert%5E2+%3D+%5Csum_i+%28a_i+-+b_i%29%5E2&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002" srcset="https://s0.wp.com/latex.php?latex=%5CVert+a+-+b+%5CVert%5E2+%3D+%5Csum_i+%28a_i+-+b_i%29%5E2&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=%5CVert+a+-+b+%5CVert%5E2+%3D+%5Csum_i+%28a_i+-+b_i%29%5E2&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="&#92;Vert a - b &#92;Vert^2 = &#92;sum_i (a_i - b_i)^2" class="latex" /><br /><br />So Euclidean does more work per dimension, even if the final square root is skipped.</li>



<li>On raw (unnormalized) vectors, Euclidean and cosine gave different rankings, because vector length influences Euclidean distance but is canceled out in cosine.</li>
</ul>



<p class="wp-block-paragraph"><strong>Takeaways from the experiment:</strong></p>



<ul class="wp-block-list">
<li><strong>After normalization</strong>, dot product, cosine, and Euclidean distance are effectively the same in terms of ranking.</li>



<li><strong>Dot product is slightly faster</strong> in practice, which explains why most vector databases implement cosine as “normalize once, then use dot product.”</li>



<li><strong>Before normalization</strong>, you can get very different results. Euclidean reflects both angle and magnitude, while cosine reflects only angle.</li>
</ul>
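
<p class="wp-block-paragraph">For readers who want to try the ranking comparison themselves, here is a scaled-down sketch using only the Python standard library. It is not the script behind the numbers above: the sizes are much smaller, the vectors are random rather than real embeddings, no timing is measured, and every name in it is an assumption.</p>

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def normalize(v):
    length = dot(v, v) ** 0.5
    return [x / length for x in v]

def top_k(query, database, score, k, largest_first):
    """Indices of the k best database vectors under the given score."""
    order = sorted(range(len(database)),
                   key=lambda i: score(query, database[i]),
                   reverse=largest_first)
    return order[:k]

def random_vector(rng, dim):
    # Random direction with a random length, so magnitudes vary a lot.
    scale = rng.uniform(0.5, 10.0)
    return [scale * rng.gauss(0.0, 1.0) for _ in range(dim)]

rng = random.Random(0)
DIM, N_DB, N_QUERIES, K = 16, 200, 20, 5
database = [random_vector(rng, DIM) for _ in range(N_DB)]
queries = [random_vector(rng, DIM) for _ in range(N_QUERIES)]
db_unit = [normalize(v) for v in database]
q_unit = [normalize(q) for q in queries]

same_normalized = 0
same_raw = 0
for q, qu in zip(queries, q_unit):
    # Dot product on unit vectors is cosine similarity.
    by_cosine = top_k(qu, db_unit, dot, K, largest_first=True)
    by_euclid_unit = top_k(qu, db_unit, squared_euclidean, K, largest_first=False)
    by_euclid_raw = top_k(q, database, squared_euclidean, K, largest_first=False)
    same_normalized += by_cosine == by_euclid_unit
    same_raw += by_cosine == by_euclid_raw

print("identical top-K on normalized vectors:", same_normalized, "of", N_QUERIES)
print("identical top-K on raw vectors:", same_raw, "of", N_QUERIES)
```

<p class="wp-block-paragraph">The first count should equal the number of queries, while the second is typically much lower because raw Euclidean distance is pulled around by vector length.</p>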



<h4 class="wp-block-heading">Recommendation vs. RAG systems</h4>



<ul class="wp-block-list">
<li>In <strong>RAG systems</strong>, you care primarily about <em>semantic similarity</em>. Normalization is almost always what you want, so cosine (or normalized Euclidean) is the default.</li>



<li>In <strong>recommendation systems</strong>, embeddings often mix semantic and behavioral signals. Magnitude might encode popularity, confidence, or frequency. In this world, dot product or Euclidean without normalization can be useful.</li>
</ul>



<h4 class="wp-block-heading">Decision Tree: When to Use Which</h4>



<figure class="wp-block-image size-large"><a href="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-12.png"><img loading="lazy" width="1024" height="315" data-attachment-id="1940" data-permalink="https://blog.somecreativity.com/2025/09/27/embeddings-similarity-metrics/image-69/" data-orig-file="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-12.png" data-orig-size="2193,675" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image" data-image-description="" data-image-caption="" data-medium-file="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-12.png?w=300" data-large-file="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-12.png?w=1024" src="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-12.png?w=1024" alt="" class="wp-image-1940" srcset="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-12.png?w=1024 1024w, https://blog.somecreativity.com/wp-content/uploads/2025/09/image-12.png?w=2048 2048w, https://blog.somecreativity.com/wp-content/uploads/2025/09/image-12.png?w=128 128w, https://blog.somecreativity.com/wp-content/uploads/2025/09/image-12.png?w=300 300w, https://blog.somecreativity.com/wp-content/uploads/2025/09/image-12.png?w=768 768w, https://blog.somecreativity.com/wp-content/uploads/2025/09/image-12.png?w=1440 1440w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></figure>



<h4 class="wp-block-heading">Key Takeaways</h4>



<ul class="wp-block-list">
<li><strong>Cosine similarity</strong>: great when direction = meaning; normalization removes scale.</li>



<li><strong>Euclidean distance</strong>: great when raw magnitudes carry interpretable meaning.</li>



<li><strong>Normalization</strong>: turns Euclidean into cosine for ranking purposes.</li>



<li><strong>OpenAI embeddings</strong>: already normalized, so Euclidean and cosine rank the same.</li>



<li><strong>Good rule of thumb in selecting the best similarity metric:</strong> match it to the one used to train your embedding model.</li>



<li><strong>Recommendations vs RAG</strong>: recommendations often want magnitude, RAG almost never does.</li>
</ul>



<h4 class="wp-block-heading">References</h4>



<ul class="wp-block-list">
<li><a href="https://www.pinecone.io/learn/vector-similarity">Vector Similarity Explained</a></li>



<li><a href="https://cs.stackexchange.com/questions/147713/why-word-embeddings-are-compared-with-cosine-distance-and-not-euclidean">Why are word embeddings compared with cosine and not euclidean?</a></li>
</ul>



]]></content:encoded>
					
					<wfw:commentRss>https://blog.somecreativity.com/2025/09/27/embeddings-similarity-metrics/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1883</post-id>
		<media:thumbnail url="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-13.png" />
		<media:content url="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-13.png" medium="image">
			<media:title type="html">image</media:title>
		</media:content>

		<media:content url="https://0.gravatar.com/avatar/0eda0091e089b109dc142c655f5833609a7b82daf5ef673a798b5d1b014e3c27?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">Sid</media:title>
		</media:content>

		<media:content url="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-11.png?w=1024" medium="image" />

		<media:content url="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-5.png?w=556" medium="image" />

		<media:content url="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-7.png?w=1024" medium="image" />

		<media:content url="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-8.png?w=1024" medium="image" />

		<media:content url="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-12.png?w=1024" medium="image" />
	</item>
		<item>
		<title>Context Rot</title>
		<link>https://blog.somecreativity.com/2025/09/21/context-rot/</link>
					<comments>https://blog.somecreativity.com/2025/09/21/context-rot/#respond</comments>
		
		<dc:creator><![CDATA[Sid]]></dc:creator>
		<pubDate>Sun, 21 Sep 2025 11:17:49 +0000</pubDate>
				<category><![CDATA[General]]></category>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[artificial-intelligence]]></category>
		<category><![CDATA[chatgpt]]></category>
		<category><![CDATA[llm]]></category>
		<category><![CDATA[rag]]></category>
		<category><![CDATA[retrieval-augmented-generation]]></category>
		<category><![CDATA[technology]]></category>
		<guid isPermaLink="false">http://blog.somecreativity.com/?p=1869</guid>

					<description><![CDATA[Last Friday, our learning session covered Context Rot, a paper from the Chroma vector database team on how longer inputs affect LLM performance. They ran experiments with 18 leading LLMs, like o3, GPT-4.1, Claude, Gemini, and Qwen, on needle-in-a-haystack style questions, then measured how often the models gave the right answer. The best way to [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">Last Friday, our learning session covered <em><a href="https://research.trychroma.com/context-rot">Context Rot</a></em>, a paper from the Chroma vector database team on how longer inputs affect LLM performance.</p>



<p class="wp-block-paragraph">They ran experiments with 18 leading LLMs, like o3, GPT-4.1, Claude, Gemini, and Qwen, on needle-in-a-haystack style questions, then measured how often the models gave the right answer.</p>



<figure class="wp-block-image size-large"><a href="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-2.png"><img loading="lazy" width="1024" height="639" data-attachment-id="1876" data-permalink="https://blog.somecreativity.com/2025/09/21/context-rot/image-59/" data-orig-file="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-2.png" data-orig-size="1168,729" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image" data-image-description="" data-image-caption="" data-medium-file="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-2.png?w=300" data-large-file="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-2.png?w=1024" src="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-2.png?w=1024" alt="" class="wp-image-1876" srcset="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-2.png?w=1024 1024w, https://blog.somecreativity.com/wp-content/uploads/2025/09/image-2.png?w=128 128w, https://blog.somecreativity.com/wp-content/uploads/2025/09/image-2.png?w=300 300w, https://blog.somecreativity.com/wp-content/uploads/2025/09/image-2.png?w=768 768w, https://blog.somecreativity.com/wp-content/uploads/2025/09/image-2.png 1168w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></figure>



<p class="wp-block-paragraph">For a quick TL;DR, watch this ~7-minute video on YouTube:</p>



<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe class="youtube-player" width="640" height="360" src="https://www.youtube.com/embed/TUjQuC4ugak?version=3&#038;rel=1&#038;showsearch=0&#038;showinfo=1&#038;iv_load_policy=1&#038;fs=1&#038;hl=en&#038;autohide=2&#038;wmode=transparent" allowfullscreen="true" style="border:0;" sandbox="allow-scripts allow-same-origin allow-popups allow-presentation allow-popups-to-escape-sandbox"></iframe>
</div></figure>



<p class="wp-block-paragraph">Here are the key takeaways:</p>



<ul class="wp-block-list">
<li><strong>Longer context hurts</strong>: Don’t overload models with full reports or long histories. As irrelevant text piles up, even strong models miss answers. Keep inputs lean for reliable results.</li>
</ul>



<ul class="wp-block-list">
<li><strong>Clarity of the query matters</strong>: Vague questions get worse answers in long contexts. Since you can’t rely on users to always be precise, systems must rewrite queries, for example by rephrasing them into clearer forms, mapping them to structured intent, or combining them with retrieval to anchor the request.</li>
</ul>



<ul class="wp-block-list">
<li><strong>Distractors amplify errors</strong>: Models can be tricked by irrelevant but similar text. In compliance or legal reviews, this means they might confuse one clause for another. Systems must filter out look-alike noise.
<ul class="wp-block-list">
<li>Use embeddings + keyword anchors together (semantic + lexical match).</li>



<li>Enforce entity checks (IDs, dates, names must align).</li>



<li>Apply re-ranking models to filter passages that look close but don’t directly answer.</li>



<li>Train retrievers with negative samples (examples of near-duplicate but irrelevant text).</li>
</ul>
</li>
</ul>
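
<p class="wp-block-paragraph">A rough sketch of how the entity check and the "semantic + lexical" combination might fit together. Everything here is an assumption for illustration: the ID format (<code>CT-2041</code>), the 50/50 score weighting, the <code>min_overlap</code> threshold, and the idea that semantic scores arrive from an upstream embedding search.</p>

```python
import re

def extract_ids(text):
    """Pull out ID-like tokens such as 'CT-2041' (hypothetical format)."""
    return set(re.findall(r"\b[A-Z]{2,}-\d+\b", text))

def keyword_overlap(query, passage):
    """Fraction of query words that also appear in the passage."""
    q_words = set(re.findall(r"[a-z0-9]+", query.lower()))
    p_words = set(re.findall(r"[a-z0-9]+", passage.lower()))
    return len(q_words.intersection(p_words)) / len(q_words) if q_words else 0.0

def filter_candidates(query, candidates, min_overlap=0.3):
    """Keep passages whose IDs match the query and that share enough keywords.

    `candidates` is a list of (passage, semantic_score) pairs; the semantic
    score is assumed to come from an embedding search upstream.
    """
    wanted_ids = extract_ids(query)
    kept = []
    for passage, semantic_score in candidates:
        if wanted_ids and not wanted_ids.issubset(extract_ids(passage)):
            continue  # entity check failed: a look-alike about a different ID
        overlap = keyword_overlap(query, passage)
        if overlap >= min_overlap:
            kept.append((passage, 0.5 * semantic_score + 0.5 * overlap))
    return sorted(kept, key=lambda item: item[1], reverse=True)

query = "What is the payment term in contract CT-2041?"
candidates = [
    ("Contract CT-2041 sets the payment term at 45 days.", 0.82),
    ("Contract CT-2014 sets the payment term at 30 days.", 0.85),  # distractor
]
result = filter_candidates(query, candidates)
print([passage for passage, _ in result])
```

<p class="wp-block-paragraph">The distractor has the higher semantic score but refers to a different contract ID, so the entity check drops it before it can reach the model.</p>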



<ul class="wp-block-list">
<li><strong>Structure of irrelevant content matters</strong>: Clean, coherent irrelevant text is more distracting than random noise. That means polished background material can actually reduce accuracy if not filtered.</li>
</ul>



<ul class="wp-block-list">
<li><strong>Focused input beats full input</strong>: Retrieval layers or context filters that feed only what’s relevant improve both accuracy and cost. Businesses should invest in these instead of relying on raw long-context alone.</li>
</ul>
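
<p class="wp-block-paragraph">One possible shape for such a context filter, as a minimal sketch: keep only passages above a relevance threshold, best first, until a word budget is used up. The function name, the thresholds, and the relevance scores are hypothetical; in a real system the scores would come from your retriever or re-ranker.</p>

```python
def select_context(passages, max_words=200, min_score=0.5):
    """Pick the most relevant passages that fit within a word budget.

    `passages` is a list of (text, relevance_score) pairs.
    """
    chosen, used = [], 0
    for text, score in sorted(passages, key=lambda p: p[1], reverse=True):
        words = len(text.split())
        # Skip weak matches and anything that would overflow the budget.
        if score >= min_score and max_words - used >= words:
            chosen.append(text)
            used += words
    return chosen

passages = [
    ("Clause 7 sets the payment term at 45 days.", 0.91),
    ("The company was founded in 1998 in Oslo.", 0.12),
    ("Late payments accrue 2% interest per month.", 0.74),
]
context = select_context(passages, max_words=20)
print(context)
```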



<ul class="wp-block-list">
<li><strong>Exact repetition breaks down</strong>: Researchers asked models to simply copy long blocks of text word-for-word, and accuracy dropped as the blocks grew longer. If a model can’t copy long sequences reliably, it can’t be trusted to surface exact details (IDs, contract terms, medical dosages). Retrieval workflows must include verification.</li>
</ul>
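
<p class="wp-block-paragraph">Verification can start very simply: before showing an exact detail to a user, check that it appears verbatim in the source it supposedly came from. A minimal sketch, assuming your pipeline already asks the model to return the exact spans it relied on (the <code>answer_quotes</code> list is a hypothetical output of that step):</p>

```python
import re


def normalize(text):
    # Collapse whitespace so line wraps do not cause false mismatches.
    return re.sub(r"\s+", " ", text).strip()


def verify_quotes(answer_quotes, source_text):
    """Return the quotes that do NOT appear verbatim in the source."""
    haystack = normalize(source_text)
    return [q for q in answer_quotes if normalize(q) not in haystack]
```

<p class="wp-block-paragraph">Anything this returns should be blocked or flagged for review rather than silently passed along. The check is deliberately strict: for IDs and dosages, a near match is a miss.</p>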



<p class="wp-block-paragraph"><strong>So the bigger question is: should you solve context rot as an app developer, or wait for the big labs to solve it?</strong></p>



<p class="wp-block-paragraph">Big labs will keep improving the physics of long context: better positional encodings, more efficient attention, and training strategies that reduce decay. But those fixes won’t handle your domain specifics: which clauses in a contract matter, which patient record fields are critical, or how to enforce compliance rules. That’s squarely on app and agent developers, and those investments should be durable.</p>



<p class="wp-block-paragraph">Retrieval layers, query normalization, and verification pipelines will remain useful even if models get better, because they enforce governance, add trust, and cut costs.</p>



<p class="wp-block-paragraph">What may become obsolete are low-level hacks like custom chunk sizes. So the right strategy seems to be not to wait. Build domain-aware context engineering now, knowing labs will lift the floor while your systems enforce the ceiling.</p>



<p class="wp-block-paragraph">One strategy is to win customers who care about context-rot being solved well today, even if some of that work gets thrown away as labs improve. Those early wins give you a base to move into higher-value scenarios while competitors catch up later on the basics with less effort.</p>



]]></content:encoded>
					
					<wfw:commentRss>https://blog.somecreativity.com/2025/09/21/context-rot/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1869</post-id>
		<media:thumbnail url="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-3.png" />
		<media:content url="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-3.png" medium="image">
			<media:title type="html">image</media:title>
		</media:content>

		<media:content url="https://0.gravatar.com/avatar/0eda0091e089b109dc142c655f5833609a7b82daf5ef673a798b5d1b014e3c27?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">Sid</media:title>
		</media:content>

		<media:content url="https://blog.somecreativity.com/wp-content/uploads/2025/09/image-2.png?w=1024" medium="image" />
	</item>
		<item>
		<title>Defining &#8220;AGI&#8221;</title>
		<link>https://blog.somecreativity.com/2025/09/07/defining-agi/</link>
					<comments>https://blog.somecreativity.com/2025/09/07/defining-agi/#respond</comments>
		
		<dc:creator><![CDATA[Sid]]></dc:creator>
		<pubDate>Mon, 08 Sep 2025 05:19:58 +0000</pubDate>
				<category><![CDATA[General]]></category>
		<category><![CDATA[agi]]></category>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[artificial-intelligence]]></category>
		<category><![CDATA[openai]]></category>
		<category><![CDATA[technology]]></category>
		<guid isPermaLink="false">http://blog.somecreativity.com/?p=1852</guid>

					<description><![CDATA[This week, one of the papers my team discussed was the spicily titled “What The F*ck Is Artificial General Intelligence?” by Michael Timothy Bennett, which I found after hearing him on MLST (still one of my favorite podcasts, with a super high signal-to-noise ratio). Several interesting points came up:]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">This week, one of the papers my team discussed was the spicily titled “<a href="https://arxiv.org/pdf/2503.23923">What The F*ck Is Artificial General Intelligence?</a>” by Michael Timothy Bennett, which I found after hearing him on <a href="https://www.youtube.com/watch?v=K18Gmp2oXIM">MLST</a> (still one of my favorite podcasts, with a super high signal-to-noise ratio). Several interesting points came up:</p>



<ul class="wp-block-list">
<li><strong>It&#8217;s a Western thing: </strong>Someone mentioned that the whole concept of “AGI” feels very Western. In Eastern thought, intelligence exists everywhere, on a spectrum. Even very simple life forms like cells demonstrate intelligence by communicating with each other. For example, cells exchange chemical signals when they meet, adapt their behavior, and coordinate responses. This broader framing aligns with Bennett’s critique of anthropocentric definitions of intelligence.</li>



<li><strong>Kids: </strong>Someone pointed out how their 2-year-old can pick up concepts after just a few repetitions. That speed of skill acquisition, and doing so with very little data, is central to generalist intelligence. Bennett frames this as adaptation with limited resources, which also brings energy efficiency into the picture.</li>



<li><strong>Energy</strong>: We debated whether energy cost should be part of the definition. If something burns the energy of a star to reach human-level capability, is that really AGI? Bennett argues adaptability includes both sample efficiency and energy efficiency, so by his framing it matters.</li>



<li><strong>New Science: </strong>We agreed that being able to discover new science, as Bennett calls out with the “artificial scientist” framing, is a key marker of AGI. It’s more than just doing tasks; it’s also about prioritizing, experimenting, and finding new knowledge.</li>



<li><strong>It&#8217;s a spectrum: </strong>There was consensus that intelligence isn’t binary but a spectrum: at the high end are systems that not only learn new skills but do so efficiently, making them “more intelligent” than others that reach the same outcome at much higher cost.</li>



<li><strong>Methods</strong>: On methods, we noted that search is necessary but not sufficient—you can’t just brute-force your way through the unknown. Approximation (fitting the messy world) is also critical. Bennett calls these the two foundational tools, and points out both are inefficient in different ways.</li>



<li><strong>Hybrid</strong>: The group leaned toward hybrid architectures (like AlphaGo, or more recent blends like o3 and AlphaGeometry) as the likely path forward. Bennett also highlights cognitive architectures that try to integrate perception, reasoning, and memory, exactly the kind of fusion we thought made sense.</li>



<li><strong>Finally, we asked the “is GPT-5 AGI?” question.</strong> We realized how quickly the goalposts move. If someone had shown us GPT-5 in ChatGPT just a few years ago, we’d probably have called it AGI on the spot. Bennett makes the same observation: public hype keeps redefining AGI as whatever we don’t yet have.</li>
</ul>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.somecreativity.com/2025/09/07/defining-agi/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1852</post-id>
		<media:thumbnail url="https://blog.somecreativity.com/wp-content/uploads/2025/09/star-in-space-that-looks-like-a-brain.png" />
		<media:content url="https://blog.somecreativity.com/wp-content/uploads/2025/09/star-in-space-that-looks-like-a-brain.png" medium="image">
			<media:title type="html">star-in-space-that-looks-like-a-brain</media:title>
		</media:content>

		<media:content url="https://0.gravatar.com/avatar/0eda0091e089b109dc142c655f5833609a7b82daf5ef673a798b5d1b014e3c27?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">Sid</media:title>
		</media:content>
	</item>
		<item>
		<title>MCP Universe</title>
		<link>https://blog.somecreativity.com/2025/08/31/mcp-universe/</link>
					<comments>https://blog.somecreativity.com/2025/08/31/mcp-universe/#respond</comments>
		
		<dc:creator><![CDATA[Sid]]></dc:creator>
		<pubDate>Sun, 31 Aug 2025 21:09:16 +0000</pubDate>
				<category><![CDATA[General]]></category>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[artificial-intelligence]]></category>
		<category><![CDATA[chatgpt]]></category>
		<category><![CDATA[llm]]></category>
		<category><![CDATA[technology]]></category>
		<guid isPermaLink="false">http://blog.somecreativity.com/?p=1833</guid>

					<description><![CDATA[Salesforce AI’s new MCP-Universe benchmark puts frontier models through 200+ real-world tool-use tasks. The results: GPT-5 lands at 43.7%, Grok-4 at 33.3%, and Claude-Sonnet at 29.4%. The rest of this post breaks down why these numbers are so much lower than BFCL, what domains drag models down most, and what the findings mean for teams [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">Salesforce AI’s new <a href="https://mcp-universe.github.io/">MCP-Universe</a> benchmark puts frontier models through 200+ real-world tool-use tasks. The results: GPT-5 lands at 43.7%, Grok-4 at 33.3%, and Claude-Sonnet at 29.4%.</p>



<p class="wp-block-paragraph">The rest of this post breaks down <em>why</em> these numbers are so much lower than BFCL, what domains drag models down most, and what the findings mean for teams wiring MCP into their platforms.</p>



<h4 class="wp-block-heading">TLDR:</h4>



<ul class="wp-block-list">
<li><strong>Frontier models underperform: </strong>GPT‑5 tops out at 43.72% success, Grok‑4 at 33.33%, and Claude‑4.0‑Sonnet at 29.44%, while the best open‑source model reaches 24.68% (details in the paper). </li>



<li><strong>Failures are driven by three core challenges: </strong>
<ul class="wp-block-list">
<li>long contexts that balloon across multi‑step tool use,</li>



<li>unfamiliar/underspecified tool interfaces that trigger API misuse (the “unknown‑tools” problem), and </li>



<li>distraction from large sets of unrelated tools. </li>
</ul>
</li>



<li><strong>Simple mitigations help inconsistently:</strong> 
<ul class="wp-block-list">
<li>per‑step summarization and a pre‑task “exploration” phase yield domain‑ and model‑specific gains but no universal lift.</li>
</ul>
</li>



<li>Models generally follow formats well but falter on content correctness, especially on dynamic, time‑sensitive tasks. </li>



<li>Domain difficulty varies sharply (location navigation is uniformly hard; GPT‑5 fares best in finance and 3D). </li>



<li>Agent architecture matters: o3 with the OpenAI Agent SDK outperforms o3 with ReAct.</li>



<li><strong>Links</strong>:
<ul class="wp-block-list">
<li>Paper:&nbsp;<a href="https://alphaxiv.org/abs/2508.14704" target="_blank" rel="noreferrer noopener">https://arxiv.org/abs/2508.14704</a></li>



<li>Project:&nbsp;<a href="https://mcp-universe.github.io/" target="_blank" rel="noreferrer noopener">https://mcp-universe.github.io</a></li>



<li>Code:&nbsp;<a href="https://github.com/SalesforceAIResearch/MCP-Universe" target="_blank" rel="noreferrer noopener">https://github.com/SalesforceAIResearch/MCP-Universe</a></li>
</ul>
</li>
</ul>



<hr class="wp-block-separator has-alpha-channel-opacity" />



<h3 class="wp-block-heading">Takeaways for teams integrating MCP:</h3>



<ol class="wp-block-list">
<li><strong>Limit Tool Exposure</strong>: Avoid exposing LLMs to overly large or noisy tool environments. Curate and scope tool sets to minimize &#8220;cognitive&#8221; load and improve selection accuracy.</li>



<li><strong>Orchestration Design Matters</strong>: Design orchestration layers that guide LLMs toward relevant tools. Consider SDK-level constraints or routing logic to reduce ambiguity.</li>



<li><strong>Platform Implications</strong>: Integration strategies should account for tool density and relevance filtering. Explore tooling levers that help LLMs navigate complex tool ecosystems more effectively (constrain and route the toolset, tighten tool interfaces, shape returned data, limit long-context growth, standardize errors, etc.).</li>
</ol>
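
<p class="wp-block-paragraph">To make the first two takeaways concrete, here is a rough sketch of a routing step that scopes the tool list before the model ever sees it. The tool schema (<code>name</code>, <code>server</code>, <code>description</code>), the allow-list, and the keyword-overlap ranking are all assumptions for illustration; this is not how MCP-Universe or any particular SDK does it, and a production router would likely use embeddings or a classifier instead of token overlap.</p>

```python
import re


def tokenize(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_tools(task, tools, allowed_servers, top_k=3):
    """Return the names of at most top_k tools worth exposing for this task.

    tools: list of dicts with 'name', 'server', 'description' (hypothetical schema).
    allowed_servers: servers this agent is permitted to use at all.
    """
    task_tokens = tokenize(task)
    scored = []
    for tool in tools:
        if tool["server"] not in allowed_servers:
            continue  # hard scope: never expose tools outside the allow-list
        overlap = len(task_tokens.intersection(tokenize(tool["description"])))
        if overlap > 0:
            scored.append((overlap, tool["name"]))
    # Most overlap first; break ties by name so the output is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]
```

<p class="wp-block-paragraph">The allow-list is a hard constraint and the ranking is a soft one. Keeping them separate means a bad ranking can cost you accuracy, but it can never expose a tool the agent should not have.</p>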



<h4 class="wp-block-heading">But BFCL shows the frontier models at more than 70% accuracy?!</h4>



<p class="wp-block-paragraph">The Berkeley Function Calling Leaderboard (<a href="https://gorilla.cs.berkeley.edu/leaderboard.html">BFCL</a>) has frontier LLMs like GLM-4.5, Claude-Opus-4, and Claude-Sonnet clearing around 70% overall accuracy, so a natural question is why the MCP-Universe numbers are so much lower (e.g., GLM-4.5, the best open-source model above, scores 70.85 on BFCL but 24.68 on MCP-Universe).</p>



<p class="wp-block-paragraph">The crux is that MCP-Universe is wired into real MCP servers and long contexts, while BFCL scores models against a curated, static dataset.</p>



<p class="wp-block-paragraph">MCP-Universe leans on multi-step reasoning where small errors can snowball.</p>



<p class="wp-block-paragraph">The large number of unrelated tools in MCP-Universe (to mimic real-world messiness) is another factor.</p>



<h3 class="wp-block-heading">What domains did they test on?</h3>



<p class="wp-block-paragraph"><strong>See chart below:</strong></p>



<figure class="wp-block-image size-large"><a href="https://blog.somecreativity.com/wp-content/uploads/2025/08/image-3.png"><img loading="lazy" width="1024" height="554" data-attachment-id="1845" data-permalink="https://blog.somecreativity.com/2025/08/31/mcp-universe/image-55/" data-orig-file="https://blog.somecreativity.com/wp-content/uploads/2025/08/image-3.png" data-orig-size="2144,1162" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image" data-image-description="" data-image-caption="" data-medium-file="https://blog.somecreativity.com/wp-content/uploads/2025/08/image-3.png?w=300" data-large-file="https://blog.somecreativity.com/wp-content/uploads/2025/08/image-3.png?w=1024" src="https://blog.somecreativity.com/wp-content/uploads/2025/08/image-3.png?w=1024" alt="" class="wp-image-1845" srcset="https://blog.somecreativity.com/wp-content/uploads/2025/08/image-3.png?w=1024 1024w, https://blog.somecreativity.com/wp-content/uploads/2025/08/image-3.png?w=2048 2048w, https://blog.somecreativity.com/wp-content/uploads/2025/08/image-3.png?w=128 128w, https://blog.somecreativity.com/wp-content/uploads/2025/08/image-3.png?w=300 300w, https://blog.somecreativity.com/wp-content/uploads/2025/08/image-3.png?w=768 768w, https://blog.somecreativity.com/wp-content/uploads/2025/08/image-3.png?w=1440 1440w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></figure>



<p class="wp-block-paragraph"><strong>Example of a task:</strong></p>



<figure class="wp-block-image size-large"><a href="https://blog.somecreativity.com/wp-content/uploads/2025/08/image-4.png"><img loading="lazy" width="1024" height="364" data-attachment-id="1847" data-permalink="https://blog.somecreativity.com/2025/08/31/mcp-universe/image-56/" data-orig-file="https://blog.somecreativity.com/wp-content/uploads/2025/08/image-4.png" data-orig-size="2296,817" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image" data-image-description="" data-image-caption="" data-medium-file="https://blog.somecreativity.com/wp-content/uploads/2025/08/image-4.png?w=300" data-large-file="https://blog.somecreativity.com/wp-content/uploads/2025/08/image-4.png?w=1024" src="https://blog.somecreativity.com/wp-content/uploads/2025/08/image-4.png?w=1024" alt="" class="wp-image-1847" srcset="https://blog.somecreativity.com/wp-content/uploads/2025/08/image-4.png?w=1024 1024w, https://blog.somecreativity.com/wp-content/uploads/2025/08/image-4.png?w=2048 2048w, https://blog.somecreativity.com/wp-content/uploads/2025/08/image-4.png?w=128 128w, https://blog.somecreativity.com/wp-content/uploads/2025/08/image-4.png?w=300 300w, https://blog.somecreativity.com/wp-content/uploads/2025/08/image-4.png?w=768 768w, https://blog.somecreativity.com/wp-content/uploads/2025/08/image-4.png?w=1440 1440w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></figure>



<h3 class="wp-block-heading">So, what does this mean?</h3>



<p class="wp-block-paragraph">You get what you measure. Now that MCP-Universe is showing frontier LLMs struggling, the developers behind those models have a clear target to chase. Expect the accuracy of real-world MCP tool calls to climb fast in the coming months.</p>



<h4 class="wp-block-heading">Links</h4>



<ul class="wp-block-list">
<li><a href="https://arxiv.org/pdf/2508.14704">Paper</a></li>



<li><a href="https://mcp-universe.github.io/">Website</a></li>



<li><a href="https://github.com/SalesforceAIResearch/MCP-Universe">Code</a></li>
</ul>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.somecreativity.com/2025/08/31/mcp-universe/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1833</post-id>
		<media:thumbnail url="https://blog.somecreativity.com/wp-content/uploads/2025/08/image-2.png" />
		<media:content url="https://blog.somecreativity.com/wp-content/uploads/2025/08/image-2.png" medium="image">
			<media:title type="html">image</media:title>
		</media:content>

		<media:content url="https://0.gravatar.com/avatar/0eda0091e089b109dc142c655f5833609a7b82daf5ef673a798b5d1b014e3c27?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">Sid</media:title>
		</media:content>

		<media:content url="https://blog.somecreativity.com/wp-content/uploads/2025/08/image-3.png?w=1024" medium="image" />

		<media:content url="https://blog.somecreativity.com/wp-content/uploads/2025/08/image-4.png?w=1024" medium="image" />
	</item>
	</channel>
</rss>
