<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:media="http://search.yahoo.com/mrss/"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Towards AI</title>
	<atom:link href="https://towardsai.net/feed" rel="self" type="application/rss+xml" />
	<link>https://towardsai.net</link>
	<description>Making AI accessible to all</description>
	<lastBuildDate>Mon, 08 Jun 2026 09:28:45 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://towardsai.net/wp-content/uploads/2019/05/cropped-towards-ai-square-circle-png-32x32.png</url>
	<title>Towards AI</title>
	<link>https://towardsai.net</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Principal Component Analysis (PCA): Theory, Mathematics, and Applications</title>
		<link>https://towardsai.net/p/machine-learning/principal-component-analysis-pca-theory-mathematics-and-applications</link>
		
		<dc:creator><![CDATA[Praveen Bhavani]]></dc:creator>
		<pubDate>Mon, 08 Jun 2026 09:10:06 +0000</pubDate>
				<category><![CDATA[Latest]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Towards AI - Medium]]></category>
		<guid isPermaLink="false">https://towardsai.net/?p=51558</guid>

					<description><![CDATA[Author(s): Praveen Bhavani Originally published on Towards AI. Principal Component Analysis (PCA) is one of the most widely used techniques for dimensionality reduction and feature extraction. PCA transforms correlated variables into a smaller set of uncorrelated variables called principal components, while preserving as much information (variance) as possible. PCA is fundamentally a linear algebra and statistical method rooted in covariance structure analysis, orthogonal transformations, eigenvalue decomposition, and variance maximization. Modern datasets often contain hundreds or thousands of correlated variables. In finance, quantitative trading, computer vision, genomics, and machine learning, high-dimensional data creates several challenges: computational inefficiency, noise accumulation, multicollinearity, overfitting, and difficulty in visualization and interpretation. Conceptual Foundation of PCA Consider a dataset of n observations measured across p features, where those features are often correlated with one another — as height and weight tend to be in a population study, or as pixel intensities tend to be in neighboring regions of an image. Working directly in this high-dimensional, correlated space is statistically inefficient: many of the p dimensions carry redundant information, and the sheer number of variables can obscure the underlying structure we actually care about. PCA addresses this by asking a deceptively simple question: is there a new coordinate system in which the data’s variability becomes easier to see? Rather than describing each observation through the original features x₁, x₂, …, xₚ, PCA constructs a new set of axes, the principal components PC₁, PC₂, …, PCₚ, that are rotations of the original coordinate space. These axes are chosen sequentially and according to a strict rule: each one must point in the direction of greatest remaining variance in the data, subject to being orthogonal to every component that came before it. 
This gives rise to a natural ordering. PC₁ is the single direction through the data that explains more variance than any other possible axis — it is, in a precise geometric sense, the line along which the data is most spread out. PC₂ then captures as much of the residual variance as possible, after the contribution of PC₁ has been removed, and it does so at a right angle to PC₁. PC₃ repeats this logic relative to the first two components, and so on down the chain. The orthogonality constraint is not merely a geometric nicety — it is what guarantees that the principal components are uncorrelated with one another. Information captured by PC₁ is completely absent from PC₂, which means each component contributes something genuinely new to the description of the data. This stands in direct contrast to the original features, which may be entangled by correlations that make it hard to attribute variance to any single variable cleanly. The practical payoff is that the bulk of the dataset’s variance is typically concentrated in the first few principal components. A dataset with fifty correlated features might have 90% of its variance sitting in just three or four PCs. Those components can then stand in for the full feature set — dramatically reducing dimensionality while preserving the structure that matters most for analysis, visualization, or downstream modeling. Geometric Interpretation The algebra of PCA — eigendecompositions, covariance matrices, orthogonal projections — can feel abstract until you see what it is actually doing to the data. Geometrically, PCA is a rotation. Nothing is stretched, nothing is discarded, and no information is destroyed. The coordinate axes simply pivot to a new orientation, one chosen to align with the natural shape of the data rather than with the arbitrary axes of the original measurement space. To make this concrete, consider two financial variables measured daily over a year: a stock’s return and its trading volume. 
Plotted as a scatter of points, this data rarely forms a tidy horizontal or vertical band. Returns and volume tend to move together — high-volume days often coincide with large price swings — so the cloud of points stretches diagonally across the plane, oriented somewhere between the two original axes. The original coordinate system, with its horizontal “return” axis and vertical “volume” axis, cuts across the data at an oblique angle. It describes the cloud accurately, but inefficiently: both variables are needed to characterize even the dominant pattern. PCA identifies that diagonal direction and makes it the first axis. PC₁ runs along the longest dimension of the cloud — the direction in which the data is most spread out. A single number along this axis tells you more about where a given day sits in the distribution than either the return or the volume measurement alone. The second axis, PC₂, is then placed at a right angle to PC₁, pointing across the narrow dimension of the cloud and capturing whatever residual variation remains after the dominant trend is accounted for. The result of this rotation is threefold. First, it compresses information: the dominant structure of the data, which previously required two coordinates to describe, is now legible in one. Second, it removes redundancy: because the original variables were correlated, they were partly saying the same thing twice; the rotated axes separate that shared signal from the independent variation each variable carries. Third, it decorrelates the features: by construction, the projections of the data onto PC₁ and PC₂ have zero correlation — the axes are orthogonal, so knowing a point’s position along one component tells you nothing about its position along the other. What PCA does not do is change the data itself. Every point occupies the same position in space before and after the transformation; only the rulers used to measure that position have changed. 
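The rotation described above can be sketched for the two-variable case. This is a minimal illustration and not the author's implementation: the function names `pca_2d` and `sample_cov` are hypothetical, the data points are made up, the closed-form angle only applies to exactly two variables, and no rescaling of the variables is applied before rotating.

```python
import math


def pca_2d(points):
    """Return (theta, scores): the angle of PC1 and each point's (PC1, PC2) coordinates."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 sample covariance matrix.
    var_x = sum(x * x for x, _ in centered) / (n - 1)
    var_y = sum(y * y for _, y in centered) / (n - 1)
    cov_xy = sum(x * y for x, y in centered) / (n - 1)

    # Closed-form angle of the direction of greatest variance (two variables only).
    theta = 0.5 * math.atan2(2.0 * cov_xy, var_x - var_y)
    c, s = math.cos(theta), math.sin(theta)

    # Rotate: PC1 is the projection onto (c, s), PC2 onto the orthogonal direction (-s, c).
    scores = [(c * x + s * y, -s * x + c * y) for x, y in centered]
    return theta, scores


def sample_cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)


if __name__ == "__main__":
    # Made-up (return %, volume in millions) pairs, purely illustrative.
    days = [(-1.2, 3.1), (0.4, 2.0), (2.5, 5.2), (-0.3, 1.8), (1.1, 3.9), (-2.0, 4.4)]
    theta, scores = pca_2d(days)
    pc1 = [p for p, _ in scores]
    pc2 = [q for _, q in scores]
    print("rotation angle (degrees):", math.degrees(theta))
    print("variance along PC1, PC2:", sample_cov(pc1, pc1), sample_cov(pc2, pc2))
    print("covariance of PC1 and PC2:", sample_cov(pc1, pc2))
```

Because the transform is a pure rotation of the centered points, the total variance is the same before and after, the covariance between the two new coordinates is zero up to rounding, and applying the inverse rotation recovers the centered points exactly. 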
This is why PCA is lossless when all components are retained — and why choosing to keep only the first k components is a deliberate, interpretable act of compression rather than an accidental distortion. The Data Matrix The starting point for any rigorous treatment of PCA is a precise description of the data it operates on. We represent our dataset as a matrix X ∈ ℝⁿˣᵖ, where n is the number of observations and p is the number of measured variables. Each of [&#8230;]]]></description>
		
		
		
		<media:content url="https://miro.medium.com/v2/resize:fit:700/1*r8rmnAsC90k3yJEG55gjPw.png" medium="image"></media:content>
            	</item>
		<item>
		<title>Build a Zero-Cost Web Automation Pipeline With OpenRouter, OpenClaw, and MediaUse</title>
		<link>https://towardsai.net/p/machine-learning/build-a-zero-cost-web-automation-pipeline-with-openrouter-openclaw-and-mediause</link>
		
		<dc:creator><![CDATA[yooiken]]></dc:creator>
		<pubDate>Mon, 08 Jun 2026 09:09:12 +0000</pubDate>
				<category><![CDATA[Latest]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Towards AI - Medium]]></category>
		<guid isPermaLink="false">https://towardsai.net/?p=51560</guid>

					<description><![CDATA[Author(s): yooiken Originally published on Towards AI. Build a Zero-Cost Web Automation Pipeline With OpenRouter, OpenClaw, and MediaUse I have become less interested in whether a cheap model can “browse the web” and more interested in whether it can run a boring workflow correctly every morning. That is a different problem. Most low-cost or free LLMs fail at web automation because the model has to do too much at once. It has to understand the goal, inspect the page, decide where to click, recover from layout changes, parse the result, and then write something useful. One weak link ruins the whole run. The workaround is simple: do not ask the free model to operate the browser. Use the free model as the dispatcher. Let MediaUse handle the browser work through site plugins. The model calls semantic commands like “get Hacker News top stories” or “read this Reddit thread.” MediaUse turns those commands into stable browser actions and returns structured JSON. In this article, I will build a daily pipeline that: Uses OpenClaw with OpenRouter’s free openrouter/owl-alpha model as the orchestrator. Uses MediaUse Hacker News skill to find today’s technical stories. Uses MediaUse Reddit skill to collect user reactions. Uses MediaUse ChatGPT skill to turn the research into a Medium draft. Saves the article draft locally. Runs every day around 10:00 AM. The result is a low-cost agent that does not depend on a frontier model for every step. The free model plans and routes. MediaUse performs the web operations. ChatGPT is optional and only used at the end because I want the final writing to be good. If you want the strictest “zero API spend” version, skip the ChatGPT step and ask owl-alpha to write the draft from the collected JSON. If you already have access to ChatGPT through the web UI, the MediaUse ChatGPT skill can use that browser workflow instead of sending paid API calls. 
Current caveat: OpenRouter lists openrouter/owl-alpha as free during its current availability window, with tool support and a large context window. Free model availability can change, so check the model page before relying on it in production.
Why this works better than “LLM, please browse the web”
A general browser agent has to reason over pixels and HTML. A site plugin does not. MediaUse skills package website actions into predictable commands. The Hacker News skill has commands like:
mediause hackernews get top --limit 20 --json
mediause hackernews read item --id &#60;item_id&#62; --depth 2 --replies 20 --max-length 2000 --json
The Reddit skill has commands like:
mediause reddit search posts --query &#34;open source AI agent&#34; --subreddit &#34;LocalLLaMA&#34; --sort relevance --time day --limit 10 --json
mediause reddit read item --post-id &#60;post_id&#62; --sort top --limit 30 --depth 3 --max-length 3000 --json
The LLM does not need to know where the Reddit search box is. It does not need to scroll through nested comments. It just asks for the operation. That is the whole trick. Low-quality models often struggle when the task is open-ended. They do much better when the action space is small, named, and structured. MediaUse gives them that smaller action space. 
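The "small, named, structured action space" idea can be sketched as an allow-list that turns a model-emitted JSON request into a command line. This is an illustrative sketch, not how OpenClaw or MediaUse dispatch internally: the action names (`hn_top`, `hn_read`, `reddit_read`), the request shape, and `build_command` are all hypothetical, and only the CLI tokens are copied from the commands quoted above. The sketch builds the argument list and does not execute anything.

```python
import json

# Allow-list of named actions. The CLI tokens are copied from the commands quoted
# in the article; the action names on the left are hypothetical.
ACTIONS = {
    "hn_top": ["mediause", "hackernews", "get", "top",
               "--limit", "{limit}", "--json"],
    "hn_read": ["mediause", "hackernews", "read", "item", "--id", "{item_id}",
                "--depth", "2", "--replies", "20", "--max-length", "2000", "--json"],
    "reddit_read": ["mediause", "reddit", "read", "item", "--post-id", "{post_id}",
                    "--sort", "top", "--limit", "30", "--depth", "3",
                    "--max-length", "3000", "--json"],
}


def build_command(request_json):
    """Turn a model-emitted JSON request into an argv list, or raise ValueError."""
    request = json.loads(request_json)
    action = request.get("action")
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    template = ACTIONS[action]
    # Only the fixed template tokens are formatted; model-supplied values are
    # inserted as whole arguments and never re-interpreted.
    args = {key: str(value) for key, value in request.get("args", {}).items()}
    try:
        return [token.format(**args) for token in template]
    except KeyError as missing:
        raise ValueError(f"missing argument {missing} for {action}") from None


if __name__ == "__main__":
    print(build_command('{"action": "hn_top", "args": {"limit": 20}}'))
```

Anything outside the allow-list is rejected before a process is ever started, which is the property that lets a weaker model act as a dispatcher: it can only pick among a handful of named operations.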
The pipeline
Here is the workflow I use:
10:00 AM
  &#x007C;
  v
OpenClaw wakes up the workflow
  &#x007C;
  v
OpenRouter owl-alpha chooses the plan
  &#x007C;
  v
MediaUse Hacker News skill fetches today&#39;s top tech stories
  &#x007C;
  v
MediaUse Reddit skill searches for matching user reactions
  &#x007C;
  v
Research JSON is normalized into one brief
  &#x007C;
  v
MediaUse ChatGPT skill writes a Medium draft
  &#x007C;
  v
Draft is saved to ./drafts/YYYY-MM-DD-medium-draft.md
Step 1: install and configure MediaUse
On Windows, install or update the MediaUse CLI:
powershell -C &#34;iwr https://release.mediause.dev/install.ps1 -UseBasicParsing &#x007C; iex&#34;
mediause --version
Configure your MediaUse key:
mediause manage key &#60;your_mediause_key&#62; --json
Install the site plugins:
mediause plugin add hackernews --json
mediause plugin add reddit --json
mediause plugin add chatgpt --json
Bind the accounts:
mediause auth list --json
# Hacker News supports guest read workflows.
mediause use account hackernews:guest --policy balanced --json
# Reddit usually works best in visible mode.
mediause use account reddit:&#60;account_id&#62; --policy balanced --show --json
# ChatGPT needs your account context if you use it for writing.
mediause use account chatgpt:&#60;account_id&#62; --policy balanced --json
mediause auth health --json
Step 2: configure OpenClaw to use OpenRouter Owl Alpha
Create an OpenRouter API key, then set it in your shell:
$env:OPENROUTER_API_KEY = &#34;&#60;your_openrouter_key&#62;&#34;
Use openrouter/owl-alpha as the model for the orchestration agent. 
The exact OpenClaw config shape may differ depending on your version, but the important part is the provider, base URL, and model:
provider: openrouter
base_url: https://openrouter.ai/api/v1
model: openrouter/owl-alpha
api_key_env: OPENROUTER_API_KEY
temperature: 0.2
You can sanity-check the model directly with OpenRouter’s OpenAI-compatible API:
$body = @{
  model = &#34;openrouter/owl-alpha&#34;
  messages = @(
    @{
      role = &#34;user&#34;
      content = &#34;Return a JSON plan for collecting today&#39;s developer news.&#34;
    }
  )
  response_format = @{ type = &#34;json_object&#34; }
} &#x007C; ConvertTo-Json -Depth 10
Invoke-RestMethod `
  -Uri &#34;https://openrouter.ai/api/v1/chat/completions&#34; `
  -Method Post `
  -Headers @{
    Authorization = &#34;Bearer $env:OPENROUTER_API_KEY&#34;
    &#34;Content-Type&#34; = &#34;application/json&#34;
  } `
  -Body $body
Step 3: give the agent a narrow prompt
The prompt matters. Do not ask the model to be creative with the workflow. Ask it to call a small set of commands and produce a strict output. Use this as the system prompt for the OpenClaw workflow:
You are a daily technical research dispatcher. Your job is to collect material for one Medium article draft.
Rules:
- Use MediaUse commands only for website data collection.
- Prefer structured JSON outputs.
- Do not browse manually.
- Do not invent article facts.
- Keep the final research bundle under 12,000 words.
- Stop if a site returns a risk prompt, captcha, or account challenge.
Workflow:
1. Get today&#39;s top Hacker News stories.
2. Select 3 to 5 stories about projects, developer tools, AI infrastructure, open source, or software engineering.
3. Read each selected HN item with comments.
4. For each selected story, search Reddit for matching discussion.
5. Read the most relevant Reddit thread when available.
6. Build a research bundle with:
   - title
   - source URL
   - why it matters
   - HN discussion summary
   - Reddit user feedback summary
   - notable disagreement
   - possible article angle
7. Send the research bundle to ChatGPT through the MediaUse ChatGPT skill to [&#8230;]]]></description>
		
		
		
		<media:content url="https://miro.medium.com/v2/resize:fit:700/1*7CYcvYvhcbfqwa_eODFR9g.png" medium="image"></media:content>
            	</item>
		<item>
		<title>I Gave Qwen3.7-Plus a Screenshot and It Found the Exact Pixel to Click for $0.40</title>
		<link>https://towardsai.net/p/machine-learning/i-gave-qwen3-7-plus-a-screenshot-and-it-found-the-exact-pixel-to-click-for-0-40</link>
		
		<dc:creator><![CDATA[Chew Loong Nian - AI ENGINEER]]></dc:creator>
		<pubDate>Mon, 08 Jun 2026 09:08:19 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Latest]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Towards AI - Medium]]></category>
		<guid isPermaLink="false">https://towardsai.net/p/artificial-intelligence/i-gave-qwen3-7-plus-a-screenshot-and-it-found-the-exact-pixel-to-click-for-0-40</guid>

					<description><![CDATA[Last Updated on June 8, 2026 by Editorial Team Author(s): Chew Loong Nian &#8211; AI ENGINEER Originally published on Towards AI. I Gave Qwen3.7-Plus a Screenshot and It Found the Exact Pixel to Click for $0.40 I uploaded a messy AWS console screenshot and asked one question: which pixel do I click to launch an instance? The model came back with click at (x=1147, y=283). I overlaid that coordinate on the image. It landed dead center on the orange &#34;Launch instance&#34; button. Then I checked the price: $0.40 per million input tokens — one-sixth what Alibaba charges for the text-only Qwen3.7-Max, and the model scores 79.0 on ScreenSpot Pro, the benchmark that decides whether a &#34;computer use&#34; agent actually works. The author argues that successful “computer use” hinges on GUI grounding: given a screenshot and an instruction, the model must output exact pixel coordinates for the right UI element. They explain how Qwen3.7-Plus (a vision-capable variant that only outputs text) achieves a strong ScreenSpot Pro score (79.0), compare it to Qwen3.7-Max and other benchmarks, and show how to implement it quickly using Alibaba Cloud Model Studio via the OpenAI-compatible SDK. The article walks through four practical “glue” calls—(1) screenshot-to-JSON coordinates, (2) converting coordinates into real clicks with a confidence gate, (3) running an observe-act loop in Playwright for browser tasks, and (4) “screenshot to code” to recreate UI components. Finally, it discusses when to use Plus versus alternatives, highlights the key limitation that Plus is proprietary/API-only (no open weights or self-hosting), and concludes that it’s a cost-effective way to prototype frontier-grade screen grounding before moving to more polished managed or self-hostable solutions. Read the full blog for free on Medium. Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. 
From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI]]></description>
		
		
		
		<media:content url="https://miro.medium.com/v2/resize:fit:700/1*qK2iPpPF-r92K0J8zDYhxA.png" medium="image"></media:content>
            	</item>
		<item>
		<title>Beyond the Prompt: Why Autonomous AI Agents Are Replacing the Chatbot</title>
		<link>https://towardsai.net/p/machine-learning/beyond-the-prompt-why-autonomous-ai-agents-are-replacing-the-chatbot</link>
		
		<dc:creator><![CDATA[Suchit Majumdar]]></dc:creator>
		<pubDate>Mon, 08 Jun 2026 09:08:07 +0000</pubDate>
				<category><![CDATA[Latest]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Towards AI - Medium]]></category>
		<guid isPermaLink="false">https://towardsai.net/p/artificial-intelligence/beyond-the-prompt-why-autonomous-ai-agents-are-replacing-the-chatbot</guid>

					<description><![CDATA[Last Updated on June 8, 2026 by Editorial Team Author(s): Suchit Majumdar Originally published on Towards AI. Beyond the Prompt: Why Autonomous AI Agents Are Replacing the Chatbot In May 2025, Sebastian Siemiatkowski — the same Klarna CEO who fifteen months earlier had told the world that one OpenAI-powered assistant was doing the work of 700 customer service agents — quietly started hiring humans back. Bloomberg got the quote: “Cost unfortunately seems to have been a too predominant evaluation factor, what you end up having is lower quality.” Headcount over the same window went from 5,527 at the end of 2022 to 3,422 at the end of 2024, per the S-1 Klarna filed in November. The chatbot stayed. The “all-AI customer service” story did not. So the title of this piece is half a lie, and I want to correct it before you read another paragraph. Chatbots are not, in any general sense, being replaced by autonomous agents in 2026. The replacement is happening in one specific place: queue-shaped back-office work where no human is waiting on the other end, and almost nowhere else. That narrow claim is the thesis. The broad version is what every vendor deck says, and it is wrong. If you walked out of your last AI strategy review thinking the agent wave is about to subsume your support org, your sales org, and your engineering org all at once, you are about to spend the next four quarters defending a budget against numbers that will not arrive. That is the claim. The rest is me showing my work. Klarna is evidence for the thesis, in reverse The 2024 Klarna press release is worth re-reading with an engineer’s eye. 2.3 million conversations in month one across 35 languages. Resolution time from 11 minutes down to 2. A CSAT of 4.4 against a human baseline of 4.2, Klarna’s own number, never independently audited. OpenAI mirrored the case study on its own site. It was the most widely cited “AI replaced humans” deployment of the LLM era. 
It was also a chatbot. Not an agent. A user-initiated, real-time, conversational interface with safety rails and a handoff-to-human button. Gergely Orosz pointed this out at the time in his Pragmatic Engineer breakdown: what Klarna had actually built was L1 tier-one support automation, the kind of containment work IVR systems were doing twenty years ago, except now in natural language. The bot was a filter that escalated anything sharp. Then it broke on the seams chatbots always break on. The May 2025 reporting from CX Dive and CNBC converges on a single picture: hallucinations clustered on edge cases. CSAT cratered on emotional tickets where the bot was technically correct but tonally wrong, because being right and being heard are different jobs. Compliance teams refused to let an LLM autonomously close accounts. So Klarna kept the bot for volume and rebuilt the human layer underneath it, “Uber-style,” remote and flexible, hiring students and rural workers as on-demand specialists. Read that as a bull case for chatbots if you want. I read it as a warning about the entire customer-facing slice. The most aggressive chatbot deployment in the world, with founder-level air cover and a workforce reduction of nearly 2,000 people, still bounced off the part of the work where a customer was on the line and cared about being there. That isn’t a story about agents replacing chatbots. It’s a story about customer-facing conversation being a category that resists full automation by either shape of system. The spine of the argument: the meaningful axis isn’t conversational versus autonomous, it’s who triggers the work. Source: builder spec compiled from Klarna S-1, Intercom Fin published metrics, Lemonade 10-K (Q4 2024). Where the chatbot still wins, and it isn’t close Intercom Fin is the cleanest counter to the “agents will eat customer support” narrative. 
Self-reported resolution rate of 67% globally as of late 2025, on 40 million cumulative conversations, across more than 10,000 business accounts. Priced at $0.99 per resolved conversation. Intercom claims the human-agent comparison is $5 to $10 per query and I’ll flag that as a vendor-published number, not an audit — but Teneo’s 2025 cost analysis lands in roughly the same range ($8–$15 per fully-loaded human resolution), so the order of magnitude is real even if Intercom is choosing the friendly end. The caveats matter. “Resolution” is defined by Intercom: the customer exits, or affirms satisfaction, after Fin’s last answer. No public study correlates that signal with actual customer satisfaction. And the variance across accounts is enormous. One Intercom community thread in late 2025 had a customer reporting 27.6% resolution rate next to another at 80.1% over the same 12-week window, with the high performers being the ones who spent two to four weeks cleaning their knowledge base before launch. The published 67% is a marketing mean sitting on a long, ugly tail. But the unit economics survive every caveat. This is a working chatbot business, at scale, on user-initiated conversational work, with no agent loop in sight. If your Q3 roadmap involves wrapping Fin in a LangGraph orchestrator and rebranding it an “agentic support platform,” the question I would ask in your planning meeting is whether the additional dollars per resolution clear the additional tokens per resolution, because the LeanOps numbers I’ll get to below say they usually don’t. There’s also the Air Canada precedent from February 2024, when the BC Civil Resolution Tribunal made the airline liable for its chatbot’s incorrect bereavement-fare advice. The damages were small, roughly $650 CAD. The precedent is not. 
Any system, conversational or autonomous, that makes binding statements to a customer creates legal exposure, which is one more structural reason the production migration is happening where no customer sits on the other end of the conversation at all. What actually has to be true for an agent to pay for itself Strip away the framework news cycle. OpenAI Agents SDK in March 2025. Google ADK in April. LangGraph 1.0 in October. Anthropic computer use [&#8230;]]]></description>
		
		
		
		<media:content url="https://miro.medium.com/v2/resize:fit:700/0*rPvRhFF3tSxRhhP8.png" medium="image"></media:content>
            	</item>
		<item>
		<title>Moonshot Cracked Claude Code&#8217;s Playbook with an MIT Terminal Agent and a $0.60 Model</title>
		<link>https://towardsai.net/p/machine-learning/moonshot-cracked-claude-codes-playbook-with-an-mit-terminal-agent-and-a-0-60-model</link>
		
		<dc:creator><![CDATA[Chew Loong Nian - AI ENGINEER]]></dc:creator>
		<pubDate>Mon, 08 Jun 2026 09:07:31 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Latest]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Towards AI - Medium]]></category>
		<guid isPermaLink="false">https://towardsai.net/p/artificial-intelligence/moonshot-cracked-claude-codes-playbook-with-an-mit-terminal-agent-and-a-0-60-model</guid>

					<description><![CDATA[Last Updated on June 8, 2026 by Editorial Team Author(s): Chew Loong Nian &#8211; AI ENGINEER Originally published on Towards AI. Why this matters right now A Chinese lab just shipped a terminal coding agent that does almost everything Claude Code does, released the entire thing under the MIT license, and pointed it at a model that costs $0.60 per million output tokens. Claude Code’s default model, Opus 4.8, costs $25 for the same million tokens. That is roughly 42 times more expensive on the part of the bill that actually hurts. I spent the morning reading Moonshot’s repo line by line, pulling the install script apart, and trying to find the catch. The catch is smaller than you would expect. The article explains that Moonshot’s terminal agent is called Kimi Code CLI and is truly open to build and distribute under MIT, with an emphasis on how it differs from other “terminal coding agents” that are moving toward closed-source distribution. It details what the CLI does (single-binary install, a TUI, OAuth/API key login, and a workflow that reads/edits code and runs shell/web tasks), then highlights standout capabilities such as video input, conversational MCP configuration, built-in subagents, lifecycle hooks, and editor integration via an Agent Client Protocol. It argues the core value is not just the agent wrapper but the pricing and model loop: Kimi Code CLI runs on Kimi K2.6 open-weight models with dramatically cheaper output-token costs, making agent usage far more affordable than Claude Code’s and others’ more expensive proprietary setups. The piece compares how each incumbent approaches openness and licensing, concludes that Kimi Code CLI is an open alternative aligned with developers’ need to avoid lock-in and pricing shocks, and ends with a “verdict” that while the repo is still young and not for everyone (e.g., enterprise constraints), the direction toward open, MIT-licensed tooling is the most durable edge for 2026. 
Read the full blog for free on Medium. Published via Towards AI]]></description>
		
		
		
		<media:content url="https://miro.medium.com/v2/resize:fit:700/1*3iPjmM0rYj5Kw44cmkqPPQ.png" medium="image"></media:content>
            	</item>
		<item>
		<title>Connections, Roles, and Warehouses: Getting CoCo Desktop Production-Ready from Day One</title>
		<link>https://towardsai.net/p/machine-learning/connections-roles-and-warehouses-getting-coco-desktop-production-ready-from-day-one</link>
		
		<dc:creator><![CDATA[Satish Kumar]]></dc:creator>
		<pubDate>Mon, 08 Jun 2026 08:52:45 +0000</pubDate>
				<category><![CDATA[Data Engineering]]></category>
		<category><![CDATA[Latest]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Towards AI - Medium]]></category>
		<guid isPermaLink="false">https://towardsai.net/p/artificial-intelligence/connections-roles-and-warehouses-getting-coco-desktop-production-ready-from-day-one</guid>

					<description><![CDATA[Last Updated on June 8, 2026 by Editorial Team Author(s): Satish Kumar Originally published on Towards AI. Connections, Roles, and Warehouses: Getting CoCo Desktop Production-Ready from Day One Snowflake CoCo Desktop &#x007C; Part 1 of 8 There’s a moment every data engineer hits when first opening Snowflake’s CoCo Desktop: the welcome screen looks clean, the interface is polished, and then the connect step appears. If your organization uses SSO, has multiple accounts, or runs a non-default role setup, that step is where things quietly fall apart. Most getting-started content for AI coding tools assumes the connection is the easy part. With CoCo Desktop, authentication is where you make architectural decisions that affect every subsequent session: which credentials get cached, which warehouse runs agent queries, which role the agent operates under. Getting it right upfront saves a lot of friction later. Getting it wrong means your agents either fail silently or run with more privileges than you intended. This is the first article in an 8-part series on Snowflake CoCo Desktop for data engineering teams. This one covers everything before the first prompt: installation, prerequisites, the onboarding flow, authentication options, connection management, and the decisions you’ll want to make consciously rather than by default. TL;DR CoCo Desktop requires a paid Snowflake account with Cortex Code enabled and the SNOWFLAKE.CORTEX_USER database role — trial accounts won&#39;t work. Available for macOS and Windows only (no Linux desktop client). The 4-step onboarding flow (welcome → connect → mode → theme) is mostly intuitive, but the Connect step catches teams who rely on SSO without a configured default browser. OAuth is the right default for most users. Password auth is available but not recommended; key pair is best for service accounts. PAT and Workload Identity Federation are also supported for specialized use cases. 
Default Warehouse set via the UI persists both server-side on your Snowflake account and locally in connections.toml. That dual-write behavior matters when multiple team members share the same Snowflake user. The connections.toml file permissions (chmod 600) are a requirement on macOS/Linux, not just a best practice — Snowflake tools will refuse to read the file otherwise. What this doesn’t cover: how to configure roles for least-privilege agent use. That is a permission-modes topic covered in a subsequent article.
Prerequisites: What You Need Before Installing
Before downloading CoCo Desktop, confirm these requirements are met. Skipping this step is the most common source of “it connects but nothing works” issues.
Account requirements:
A paid Snowflake account (trial accounts are explicitly blocked — see the troubleshooting section below)
Cortex Code must be enabled on the account
Your user must have the SNOWFLAKE.CORTEX_USER database role (granted through PUBLIC by default, but your org may have revoked it)
At least one supported model must be available to your account (check CORTEX_MODELS_ALLOWLIST)
Platform requirements:
macOS (Apple Silicon or Intel) or Windows
Linux is not supported for the desktop client (use Cortex Code CLI instead)
Network requirements:
Network access to your Snowflake server
If a model you need isn’t available in your region, an ACCOUNTADMIN must configure cross-region inference:
ALTER ACCOUNT SET CORTEX_ENABLED_CROSS_REGION = &#39;AWS_US&#39;;
Replace AWS_US with the appropriate region identifier (AWS_EU, AWS_APJ, AZURE_US, or ANY_REGION). This is a common first-run blocker that looks like a connection failure but is actually a model availability issue. 
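The chmod 600 requirement on connections.toml can be checked before the first launch. Below is a minimal sketch for macOS/Linux, assuming the file sits at ~/.snowflake/connections.toml; the helper name `is_owner_only` is hypothetical, and the check is an approximation of what the Snowflake tools enforce rather than their actual logic.

```python
import os
import stat
from pathlib import Path


def is_owner_only(path):
    """Return True when group and others have no permission bits set (for example mode 600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


if __name__ == "__main__":
    # Assumed location of the connections file on macOS/Linux.
    config = Path.home() / ".snowflake" / "connections.toml"
    if not config.exists():
        print(f"{config} not found")
    elif is_owner_only(config):
        print(f"{config} is restricted to its owner")
    else:
        print(f"{config} is accessible to group or others; run: chmod 600 {config}")
```

Running this once per machine during team onboarding surfaces the permissions problem as an explicit message instead of a failed connection.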
Quick prerequisite check — run this in any Snowflake worksheet to confirm readiness: SELECT CURRENT_USER() AS user, CURRENT_ROLE() AS role, CURRENT_WAREHOUSE() AS warehouse, CURRENT_ORGANIZATION_NAME() &#x007C;&#x007C; &#39;-&#39; &#x007C;&#x007C; CURRENT_ACCOUNT_NAME() AS account_identifier; If warehouse comes back NULL, you&#39;ll need to set a default before CoCo Desktop will execute agent queries. The Onboarding Flow Is Deceptively Simple Opening CoCo Desktop for the first time sends you through four screens: welcome, connect, mode, then theme. The first and last are cosmetic. The middle two are where the real work happens. The Connect step is where you either authenticate against an existing connection or create a new one. If you’ve already set up the Cortex Code CLI or Snowflake CLI, CoCo Desktop detects your ~/.snowflake/connections.toml automatically and shows your existing connections with a status dot. This is genuinely convenient — you don&#39;t have to re-enter anything. If you&#39;re starting fresh, you&#39;ll fill in an account identifier, a connection name, a username, and pick an authentication method. The account identifier format trips people up consistently. It follows the pattern orgname-accountname — not the Snowflake URL format, not the legacy account.region format that older tools use. You can find it at app.snowflake.com under your avatar → &#34;Connect a tool to Snowflake.&#34; You can also read it directly from your Snowsight URL: https://app.snowflake.com/orgname/accountname/. Worth bookmarking that path if you&#39;re setting up multiple team members. The Mode step asks whether to start in Agent mode or Editor mode. This is not a permanent decision — you can switch at any time — but the choice sets the default layout for your first session. Agent mode is optimized for parallel agent sessions across multiple workspaces; Editor mode is optimized for working with files while keeping agent sessions on the side. 
More on the practical difference between these in Article 2. One thing the onboarding flow doesn’t surface clearly: if your browser doesn’t open automatically for OAuth or SSO, there’s a “Browser didn’t open?” fallback link in the app. It’s easy to miss on first run and results in people assuming the connection failed when it just needs a manual URL copy. Authentication Methods: A Practical Decision Tree CoCo Desktop supports six authentication methods. The four primary ones cover most use cases; two additional methods serve specialized automation scenarios. Which one you choose should depend on your account’s security posture, not just what’s easiest to configure. &#x007C; Authentication Method &#x007C; Best For &#x007C; Credential Storage &#x007C; Notes &#x007C;&#x007C; ---------------------------------- &#x007C; ----------------------------------------- &#x007C; ----------------------------------- &#x007C; ------------------------------------------------------------------------------------------------------------ &#x007C;&#x007C; OAuth &#x2705; Recommended &#x007C; Most human users &#x007C; OS Keychain / DPAPI &#x007C; Add `client_store_temporary_credential = true`; otherwise re-authentication may be required on every launch. &#x007C;&#x007C; External Browser / SSO &#x007C; Organizations using Okta or Azure AD [&#8230;]]]></description>
		
		
		
		<media:content url="https://miro.medium.com/v2/resize:fit:700/1*gRDVLBp_qb7tZ7ZDxMqAxQ.png" medium="image"></media:content>
            	</item>
		<item>
		<title>My First $5,000 Month Writing About AI Engineering on Medium</title>
		<link>https://towardsai.net/p/machine-learning/my-first-5000-month-writing-about-ai-engineering-on-medium</link>
		
		<dc:creator><![CDATA[Anubhav]]></dc:creator>
		<pubDate>Mon, 08 Jun 2026 08:52:27 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Latest]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Towards AI - Medium]]></category>
		<guid isPermaLink="false">https://towardsai.net/p/artificial-intelligence/my-first-5000-month-writing-about-ai-engineering-on-medium</guid>

					<description><![CDATA[Last Updated on June 8, 2026 by Editorial Team Author(s): Anubhav Originally published on Towards AI. My First $5,000 Month Writing About AI Engineering on Medium In May, my Medium earnings crossed $5,000 from writing about AI engineering. A month earlier, the same account had done 10.9K views. Two months earlier, it was at 3.9K. The jump looks like an overnight success if you only look at the final revenue screenshot. The tempting explanation is that two posts exploded and carried the entire month. That’s true, but it misses the point. If I had published the exact same viral posts on a brand new account, they would have spiked and died. Instead, they spiked and woke up fifteen other articles. Readers clicked on a post about local LLMs, finished it, looked at my profile, and clicked through three more posts on retrieval-augmented generation and agent architectures. The Month That Changed The Account March was quiet. I was publishing consistently and seeing small signals of life — a stray comment here, a highlighted code snippet there. The traffic was random. By mid-April, the compounding started to become visible. I wasn’t getting viral hits, but my baseline daily views were rising. I would publish a new article and older pieces would get a secondary wave of traffic. The catalog was waking up as readers who found one piece of my work started clicking through to the others. Entering the second week of May, the breakout happened. The recommendations stopped being random. The same kind of builder-focused reader kept showing up. When a post took off, the traffic spilled over into the rest of the catalog. I remember refreshing the dashboard on a Tuesday morning and seeing an old RAG post from March suddenly back on the daily charts, sitting right next to an article I had published twelve hours prior. The Wrong Lesson Would Be “Write Viral Posts” When people see a big month, they immediately try to reverse-engineer the biggest hits. 
Looking at my May dashboard, a few specific articles did most of the numbers. My guide on Claude Code setup pulled in 39,000 views and nearly 20,000 reads, earning about $1,360. A breakdown of the best local LLMs for coding did 39,000 views and 18,900 reads, bringing in just over $2,000 — my single highest-earning post. A curated list of 12 AI books brought in another 28,000 views, 12,800 reads, and $714. The obvious conclusion is that I should just write more listicles and setup guides. Outliers gave me the spike. But the spike only mattered because there were 20 other posts for those readers to land on next. When a developer clicked on the Claude Code setup guide, they got a useful tutorial. At the bottom of that page, however, they saw links to my other work. They found deep technical dives like “RAG Chunking That Works,” “Multi-Agent Systems” and a detailed comparison of LangGraph vs Temporal. If my profile had only contained generic AI news, that developer would have closed the tab. Because it contained dense engineering articles, they realized I was a builder. They hit the follow button, joined the email list, and clicked through to older posts. The Niche Was Narrower Than “AI” I drew a hard boundary around my topics. The niche was not artificial intelligence broadly, but AI engineering specifically. If I wrote about AI in general, I would be competing with news sites and generic content farms. I’d end up writing about how AI is going to change the future of work, which no engineer wants to read. Instead, I broke my niche down into very specific lanes. This specific mapping worked. It had enough search volume to matter, but the code snippets naturally filtered out people looking for ChatGPT prompt hacks. It was also deeply connected. If someone reads an article about RAG chunking, they are a natural fit for a piece about reranking or hybrid search. I had to stay disciplined here. 
Writing a great article about Python agents today and a generic productivity piece tomorrow would just dilute the exact reader base I was trying to build. Compounding Looked Boring Until It Was Obvious By late April, I realized that compounding in content creation looks like absolutely nothing for weeks. It looks like small, boring movements. I would publish an article and get 50 views. I’d publish another and get 70. Then, older posts started firing together. The pattern repeated every time. One new post would catch attention. Readers clicked on it, finished reading, and the recommendation engine pulled up an article from three weeks ago at the bottom of the page. The Claude Code spike directly helped my older AI coding workflow posts. The AI Books article drove traffic back to my roadmap pieces. The RAG Chunking article kept my technical RAG authority alive and fed traffic to my reranking guide. You cannot judge a technical article by its first week of traffic, because an article about Python agents might sit dead for a month until a related LangGraph piece suddenly revives it. Technical Deep Dives Worked, But Not Alone I used to think you had to pick: deep tutorials or broad overviews. May taught me you need both, for different reasons. Deep technical dives proved my competence. My article on RAG Chunking was dense, filled with regex patterns for markdown parsing and detailed explanations of semantic boundaries. It didn’t go viral. It did 2,700 views and 1,100 reads. But the people who clicked it read it, holding a read ratio around 40%. Meanwhile, my deep-dive architectural comparison of LangGraph vs Temporal did only 959 views, but it earned $119.91. That is a high earnings-per-view ratio. It monetized well because it answered a specific architectural question that senior engineers were searching for. 
They were trying to decide between state machines and durable execution for their agent orchestration, and they spent ten minutes reading every word of that post. The biggest discovery came from posts with broader entry points. [&#8230;]]]></description>
		
		
		
		<media:content url="https://miro.medium.com/v2/resize:fit:700/1*YbvplMy15kwvnNK1WrW0GQ.png" medium="image"></media:content>
            	</item>
		<item>
		<title>Google Shrank Gemma 4 by 72% and Unsloth Fixed the 4-Bit Bug Nobody Else Caught on One 4090, and 4-Bit Shouldn&#8217;t Be This Good</title>
		<link>https://towardsai.net/p/machine-learning/google-shrank-gemma-4-by-72-and-unsloth-fixed-the-4-bit-bug-nobody-else-caught-on-one-4090-and-4-bit-shouldnt-be-this-good</link>
		
		<dc:creator><![CDATA[Chew Loong Nian - AI ENGINEER]]></dc:creator>
		<pubDate>Mon, 08 Jun 2026 08:52:08 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Latest]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Towards AI - Medium]]></category>
		<guid isPermaLink="false">https://towardsai.net/p/artificial-intelligence/google-shrank-gemma-4-by-72-and-unsloth-fixed-the-4-bit-bug-nobody-else-caught-on-one-4090-and-4-bit-shouldnt-be-this-good</guid>

					<description><![CDATA[Last Updated on June 8, 2026 by Editorial Team Author(s): Chew Loong Nian &#8211; AI ENGINEER Originally published on Towards AI. Google Shrank Gemma 4 by 72% and Unsloth Fixed the 4-Bit Bug Nobody Else Caught on One 4090, and 4-Bit Shouldn&#39;t Be This Good A 26-billion-parameter model has no business fitting in 15GB of memory and spitting out 193 tokens a second on a single consumer GPU. That is laptop-and-gaming-rig territory, not a datacenter. Yet that is exactly what Google’s new Gemma 4 QAT checkpoints do, and after digging into how they pulled it off, the part that stuck with me is not the speed. It is that the 4-bit version barely loses anything compared to the full-precision original. By every law of quantization I thought I understood, it should be noticeably dumber. It isn’t. After the lead, the article breaks down why Gemma 4 QAT + Unsloth’s GGUF conversion is unusually effective: it quantizes during training so the model learns to be robust to 4-bit rounding, explains the typical PTQ quality loss, and describes how Unsloth fixes a subtle scale-mismatch bug that otherwise wipes out most of the benefit when converting to llama.cpp formats. It then provides concrete performance and memory numbers for different Gemma 4 variants (especially the 26B-A4B mixture-of-experts model), compares naive vs dynamic conversion accuracy, and summarizes the practical steps to run the model with llama.cpp, plus other deployment options (API server, Ollama/LM Studio, Unsloth Studio, vLLM/SGLang, MLX, and browser ONNX). Finally, it offers guidance on which model to choose based on available hardware, notes the remaining caveat that 4-bit is still 4-bit, and concludes that the usual quality-vs-speed tradeoff is collapsing—making the 26B-A4B feel like a near big-model experience on consumer GPUs. Read the full blog for free on Medium. Join thousands of data leaders on the AI newsletter. 
Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI]]></description>
		
		
		
		<media:content url="https://miro.medium.com/v2/resize:fit:700/1*-clSjTvfOU6v5IVtXkVFyw.png" medium="image"></media:content>
            	</item>
		<item>
		<title>LangChain Explained: Understanding Models, Prompts, Chains, Memory, Indexes, and Agents</title>
		<link>https://towardsai.net/p/machine-learning/langchain-explained-understanding-models-prompts-chains-memory-indexes-and-agents</link>
		
		<dc:creator><![CDATA[Atul Kumar]]></dc:creator>
		<pubDate>Mon, 08 Jun 2026 08:43:59 +0000</pubDate>
				<category><![CDATA[Latest]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Towards AI - Medium]]></category>
		<guid isPermaLink="false">https://towardsai.net/p/artificial-intelligence/langchain-explained-understanding-models-prompts-chains-memory-indexes-and-agents</guid>

					<description><![CDATA[Last Updated on June 8, 2026 by Editorial Team Author(s): Atul Kumar Originally published on Towards AI. LangChain Explained: Understanding Models, Prompts, Chains, Memory, Indexes, and Agents Large Language Models (LLMs) such as GPT, Gemini, and Claude have made it easier than ever to build intelligent applications. However, developing production-ready AI systems often requires much more than simply calling an API. This is where LangChain comes in. In this article, we’ll explore the core components of LangChain and understand why they are important. Introduction to LangChain: LangChain is an open-source framework that simplifies the development of applications powered by Large Language Models. It provides a collection of components that help developers connect LLMs with external data, memory, tools, and workflows. Benefits of LangChain: 1. Model Agnostic LangChain provides a unified interface for different LLM providers such as OpenAI, Gemini, and Claude. This makes it easier to switch models without rewriting large portions of code. 2. Reusable Prompt Templates Instead of hardcoding prompts, developers can create dynamic and reusable prompt templates, improving maintainability and scalability. 3. Simplified AI Workflows Chains allow multiple operations to be connected, making complex workflows easier to build and manage. 4. Context-Aware Applications Memory enables applications to remember previous interactions, creating more natural and personalized user experiences. 5. Efficient Knowledge Retrieval Indexes and vector databases allow applications to retrieve relevant information from large datasets, improving response accuracy. 6. Autonomous Decision Making Agents can dynamically decide which tools or actions to use, enabling the development of intelligent AI systems. 7. 
Faster Development LangChain provides ready-to-use components, reducing development time and allowing developers to focus on application logic rather than infrastructure. What Can We Build Using LangChain? 1. AI Chatbots Build conversational assistants capable of answering questions and maintaining context throughout a conversation. 2. Customer Support Systems Create intelligent support agents that can access company knowledge bases and provide accurate responses. 3. Retrieval-Augmented Generation (RAG) Applications Build systems that retrieve information from documents and use LLMs to generate context-aware answers. 4. Document Question-Answering Systems Allow users to upload PDFs, research papers, or reports and ask questions about their contents. 5. AI Agents Develop autonomous agents that can use tools, APIs, databases, and search engines to complete tasks. 6. Personal AI Assistants Create assistants that manage schedules, answer questions, summarize information, and perform actions on behalf of users. 7. Content Generation Tools Generate blogs, social media posts, emails, reports, and marketing content automatically. 8. Recommendation Systems Use embeddings and semantic search to recommend products, articles, courses, or videos. 9. Research Assistants Build AI systems that search, summarize, and analyze information from multiple sources. 10. Multi-Agent Systems Create multiple specialized agents that collaborate to solve complex problems and automate workflows. Components of LangChain: LangChain Components 1. Models Models are the interfaces through which we interact with AI models, and they are one of the core building blocks of a LangChain application. Problem: Different AI providers, such as Anthropic, OpenAI, and Google Gemini, use different SDKs and API formats. If you write code directly for one provider, switching to another often requires changing significant parts of your code. 
Solution: LangChain’s Model abstraction provides a common interface for interacting with different LLMs. Instead of writing provider-specific code, we write against LangChain’s standardized API. Models In LangChain 2. Prompts A Prompt is the fundamental text input or instruction provided to a Large Language Model (LLM). Problem: LLMs perform best when prompts are well structured. But hardcoding prompts every time can lead to: Repeated code Inconsistent outputs Difficult maintenance Solution: The Prompt component in LangChain provides reusable and dynamic templates that insert user inputs into predefined instructions before sending them to the model. Prompting Techniques: 1. Dynamic Prompting: The prompt changes based on user input or variables. Example of Dynamic Prompting Use Case: Personalised responses, Chatbots, AI Applications. 2. Role-Based Prompting: Assign a role or persona to the model. Example of Role-Based Prompting Use Case: Expert Advice, Tutoring, Coding Assistance. 3. Few-Shot Prompting: Provide examples so the model learns the desired pattern and generates the desired output for a new input. Example of Few-Shot Prompting Use Case: Classification, Extraction, Formatting Tasks. 3. Chains A Chain in LangChain connects individual components like prompts, LLMs, and output parsers into a seamless, automated workflow. Take an NLP pipeline, for example: we can represent the flow as a chain; otherwise, we have to manually take the output of one step and pass it as input to the next. Problem: Many GenAI applications require multiple steps, not just a single Large Language Model (LLM) call. Without chains, we would have to manually connect all the steps. Solution: Chains connect multiple LangChain components into a single workflow, in which the output of one component becomes the input to the next. Types of Chains: Sequential Chain: Steps executed one after another. Parallel Chain: Multiple tasks run simultaneously. 
RAG Chain: Most common in industry (chat with PDF, company knowledge base). Types Of Chains 4. Memory Problem: By default, LLM API calls are stateless, meaning they do not remember previous conversations. Solution: Memory stores conversation history or important information and automatically provides it to the LLM when needed. How Memory Works Working of Memory Types of Memory: 1. Conversation Buffer Memory: Stores the entire conversation. Simple, but becomes large over time. Best for short conversations and prototyping. Conversation Buffer Memory 2. Conversation Buffer Window Memory: Limits memory to only the last K messages. Best for maintaining the recent conversational context while keeping token usage predictable and preventing the context window from overflowing. 3. Summary-Based Memory: Periodically summarizes older chat segments to keep a condensed memory footprint. 4. Custom Memory: For advanced use cases, we can store specialized state, e.g., the user&#39;s preferences or key facts about them, in a common memory class. &#x1F4A1; Key Takeaway: Choose the right memory type based on your use case, conversation length, and token limit to build an efficient and context-aware AI application. 5. Indexes Indexes in LangChain connect your application to external knowledge, such as PDFs, [&#8230;]]]></description>
		
		
		
		<media:content url="https://miro.medium.com/v2/resize:fit:700/1*1XqwcxRy04lZnW1_no8Avg.png" medium="image"></media:content>
            	</item>
		<item>
		<title>TOON: Beyond JSON for LLMs</title>
		<link>https://towardsai.net/p/machine-learning/toon-beyond-json-for-llms</link>
		
		<dc:creator><![CDATA[Sourav Ghosh]]></dc:creator>
		<pubDate>Mon, 08 Jun 2026 08:42:51 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Data Science]]></category>
		<category><![CDATA[Latest]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Towards AI - Medium]]></category>
		<guid isPermaLink="false">https://towardsai.net/p/artificial-intelligence/toon-beyond-json-for-llms</guid>

					<description><![CDATA[Last Updated on June 8, 2026 by Editorial Team Author(s): Sourav Ghosh Originally published on Towards AI. Is JSON Finally Getting a Token-Efficient Alternative for LLMs? For years, JSON has been the default language for APIs, integrations, configuration files, event payloads, and all other types of application-to-application communications. It is an easy language to understand, it is very robust and developers can easily exploit it. But when we transition from traditional software systems to Large Language Model applications, we start to see how JSON comes with an invisible price tag. LLMs do not process JSON the way that applications do. They handle it as tokens. The article explains why JSON becomes token-expensive for LLMs—repeated keys, syntax, and nested structure consume context window and increase cost—then introduces TOON (Token-Oriented Object Notation) as a more token-efficient, prompt-friendly way to represent structured data while preserving the same underlying data model (objects, arrays, strings, numbers, booleans, null). It shows a before/after example converting JSON arrays of records into TOON where field names are declared once, values are arranged in rows, and structure remains readable for the model. The piece argues TOON is especially valuable at the LLM boundary when payloads share a uniform schema with repeated records (common in RAG retrieval results, agent tool outputs, and agent memory), and it provides enterprise scenarios plus code/prompt patterns illustrating how to use TOON as LLM input while keeping JSON for validated outputs. Finally, it outlines best practices and cautions: don’t replace JSON everywhere, use TOON only where it fits (and validate outputs), benchmark against JSON, consider tooling/model reliability and escaping edge cases, and treat TOON as an optimization layer for context representation rather than an enterprise contract substitute. Read the full blog for free on Medium. 
Published via Towards AI]]></description>
		
		
		
		<media:content url="https://miro.medium.com/v2/resize:fit:700/1*Q30an2gE-vcZTJiXojXlCQ.png" medium="image"></media:content>
            	</item>
	</channel>
</rss>
