<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:media="http://search.yahoo.com/mrss/"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Towards AI</title>
	<atom:link href="https://towardsai.net/feed" rel="self" type="application/rss+xml" />
	<link>https://towardsai.net</link>
	<description>Making AI accessible to all</description>
	<lastBuildDate>Tue, 19 May 2026 16:55:22 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://towardsai.net/wp-content/uploads/2019/05/cropped-towards-ai-square-circle-png-32x32.png</url>
	<title>Towards AI</title>
	<link>https://towardsai.net</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Your AI Agent Works Perfectly in the Demo. Here Are the 6 Ways It Dies in Production.</title>
		<link>https://towardsai.net/p/machine-learning/your-ai-agent-works-perfectly-in-the-demo-here-are-the-6-ways-it-dies-in-production</link>
		
		<dc:creator><![CDATA[Vinamra Yadav]]></dc:creator>
		<pubDate>Tue, 19 May 2026 09:50:24 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Latest]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Towards AI - Medium]]></category>
		<guid isPermaLink="false">https://towardsai.net/p/artificial-intelligence/your-ai-agent-works-perfectly-in-the-demo-here-are-the-6-ways-it-dies-in-production</guid>

					<description><![CDATA[Last Updated on May 19, 2026 by Editorial Team Author(s): Vinamra Yadav Originally published on Towards AI. The demo worked perfectly. You ran it twenty times. You showed it to your team. You showed it to your CTO. Every prompt returned exactly the right output. Then you deployed it. Three days later, a customer reported that the agent gave them completely wrong information — confidently, without any error. Your logs showed HTTP 200s all the way down. Your monitoring reported zero errors. The agent had been silently hallucinating for 72 hours, and nothing in your infrastructure had noticed. This is not a model quality problem. The model was doing exactly what models do. This is an architecture problem — and it’s the problem nobody writes about, because it only becomes visible after you’ve already deployed. I’ve spent the last year building and reviewing AI agent systems in production. The failure taxonomy is consistent. There are six ways an AI agent dies in production, and almost none of them show up in a demo. The math that should terrify you Before the taxonomy, one number worth sitting with. If your agent achieves 85% accuracy per step — which is a good number, better than many production systems — and your workflow has 10 steps, the probability of completing that workflow successfully is 0.85¹⁰ ≈ 19.7%. In this simplified model — where steps are independent and success is binary — roughly eight out of ten workflows fail despite each individual step being “pretty good.” Real failure modes are messier than this, and steps are rarely fully independent. But the model captures the architecture problem accurately: multi-step workflows compound failure. The only way out is to build failure handling into every step, not just the last one. Now, the six failure modes. Failure 1: Context degradation In a multi-step agent workflow, the model doesn’t remember what happened two steps ago — you send it. 
Every API call includes the entire conversation history. And that history grows with every step. What engineers miss: context doesn’t just grow, it degrades. Datadog’s 2026 State of AI Engineering report documents the pattern precisely: the average token count in production agent workflows more than doubled year-over-year for median-use teams, and quadrupled for heavy users. As context grows, the original instruction becomes diluted — newer tool outputs and summaries crowd out the early reasoning, and the agent continues confidently on an increasingly corrupted signal. When this drift surfaces, there is no owner to page, no baseline to compare against, no runbook to execute. It surfaces as a customer complaint. The agent doesn’t tell you this is happening. The outputs become subtly wrong in ways that are nearly impossible to detect without evaluation tooling. The pattern that makes it worse: engineers build agents that pass outputs between steps as plain text summaries. The model summarises step 3’s output, passes the summary to step 4, which summarises again for step 5. Each summarisation is a lossy compression. By step 8, you’re acting on a summary of a summary of a summary of the original instruction. The fix: preserve structured outputs between steps, not prose summaries. Use typed data contracts between agent steps rather than natural language handoffs.
# Instead of this — lossy text handoff
result = agent.run(&#34;Summarize what you found and pass it to the next step&#34;)
# Do this - structured contract between steps
from dataclasses import dataclass
@dataclass
class StepResult:
    extracted_entities: list[str]
    confidence_scores: dict[str, float]
    raw_source_ids: list[str]  # preserve provenance
    step_number: int
Failure 2: Silent failures This is the one that keeps engineers up at night, because it’s the one you don’t know about. Traditional monitoring is completely blind to agent failures. An agent that hallucinates a confident wrong answer still returns HTTP 200. Latency stays normal. 
Error rate stays at zero. Your dashboards are green. Your Slack alerts are quiet. Latitude’s production observability research documents the pattern clearly: “Tool misuse is the most common agent-specific failure mode in production — and the most insidious: a single malformed argument at step 2 silently corrupts every subsequent step that depends on that output.” The agent calls a tool with incorrect arguments, selects the wrong tool for the task, or fails to handle a tool error and continues as if the call succeeded. The classic scenario: a customer support agent that answers questions about account status. In testing, all queries are clean, structured English. In production, queries are messy, multilingual, emotionally charged. The agent returns plausible wrong answers with normal latency and HTTP 200s. The only signal is a customer escalation — which arrives hours or days after the degradation began. The fix: add a lightweight LLM evaluator layer that scores every agent output before it reaches the user. Not a human in the loop — a small, fast model that checks three things: is this response relevant to the query? Does it contradict the source data? Does the confidence language match what the retrieval actually returned?
async def evaluate_before_returning(query: str, response: str, sources: list) -&#62; dict:
    evaluation_prompt = f&#34;&#34;&#34;
    Query: {query}
    Response: {response}
    Sources consulted: {sources}
    Score on three dimensions (0-1 each):
    - relevance: does the response answer the actual query?
    - grounding: is the response supported by the sources?
    - calibration: does the certainty language match source quality?
    &#34;&#34;&#34;
    score = await fast_evaluator.run(evaluation_prompt)
    if score[&#34;grounding&#34;] &#60; 0.7:
        raise AgentQualityError(&#34;Response not grounded in retrieved sources&#34;)
    return score
Failure 3: Tool execution schema drift Your agent calls tools — APIs, database queries, internal services. Those tools change. 
When they change, your agent doesn’t know. This is the API version problem in disguise. An agent calling a tool doesn’t validate that the tool’s response schema matches what it was trained or prompted to expect. When a third-party API updates their response format, or when your internal service adds a required field, or when an OAuth token expires — the agent receives a malformed or empty response and, depending on how you’ve built it, either hallucinates a plausible-looking answer from the gap, or enters a retry loop. Datadog’s 2026 State [&#8230;]]]></description>
		
		
		
		<media:content url="https://miro.medium.com/v2/resize:fit:700/1*w-Q6MjQk1UGHhIa9gmpHwg.png" medium="image"></media:content>
            	</item>
		<item>
		<title>Unleashing the Power of ONNX for Speedier SBERT Inference</title>
		<link>https://towardsai.net/p/machine-learning/unleashing-the-power-of-onnx-for-speedier-sbert-inference</link>
		
		<dc:creator><![CDATA[Swaraj Patil]]></dc:creator>
		<pubDate>Tue, 19 May 2026 09:05:32 +0000</pubDate>
				<category><![CDATA[Latest]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Towards AI - Medium]]></category>
		<guid isPermaLink="false">https://towardsai.net/p/artificial-intelligence/unleashing-the-power-of-onnx-for-speedier-sbert-inference</guid>

					<description><![CDATA[Last Updated on May 19, 2026 by Editorial Team Author(s): Swaraj Patil Originally published on Towards AI. SBERT, also known as Sentence-BERT, is a widely used approach for obtaining sentence embeddings that aim to retain the contextual information within the sentences. However, generating these embeddings can be slow when dealing with large amounts of data. To address this, one option is to utilize batch-based encoding to accelerate the inference. However, this may not necessarily reduce the inference time. In this Medium blog post, we will explore the application of the ONNX (Open Neural Network Exchange) framework and how it aids in reducing the inference time of the model. P.S. This article does not delve into the internal workings of ONNX. For more in-depth information, please consult the official ONNX documentation. Let’s begin by installing the required libraries. We can use pip for the installation:
pip install onnx
pip install onnxruntime-gpu
pip install transformers
pip install torch
Once ONNX is installed, we can verify it using the snippet below:
import onnx
print(onnx.__version__)
In order to obtain sentence embeddings, we will utilize the IMDB dataset sourced from Kaggle. Specifically, we will focus on the “Overview of Movie” column to generate embeddings using SBERT. The time needed to create embeddings will be measured for the 1000 sentences present in the dataset. We will perform two experiments on each of CPU and GPU:
Inference time for 1000 sentences using Vanilla SBERT (CPU).
Inference time for 1000 sentences using ONNX converted SBERT (CPU).
Inference time for 1000 sentences using Vanilla SBERT (GPU).
Inference time for 1000 sentences using ONNX converted SBERT (GPU).
The Sentence BERT model that we consider here is all-MiniLM-L6-v2. We can invoke the Sentence BERT model from the Hugging Face library and the Sentence Transformers library. The output embeddings from both libraries will be the same. 
For our experiments, we will use the Hugging Face library. Remember that when we use the Hugging Face library, additional post-processing such as pooling or normalization may be needed after obtaining the embeddings. The different steps can be obtained from the model page on Hugging Face. Perform those steps to get the final sentence embeddings. Let&#39;s first convert the model to ONNX format.
# Load pretrained model and tokenizer
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer
model_name = &#34;sentence-transformers/all-MiniLM-L6-v2&#34;
tokenizer = AutoTokenizer.from_pretrained(model_name, do_lower_case=True)
model = AutoModel.from_pretrained(model_name)
# Mean Pooling - Take attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    temp = torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
    return F.normalize(temp, p=2, dim=1)
# Get the first example data to run the model and export it to ONNX
sample = [&#39;Hey, how are you today?&#39;]
inputs = tokenizer(sample, padding=True, truncation=True, return_tensors=&#34;pt&#34;)
## Convert Model to ONNX Format
import os
device = torch.device(&#34;cpu&#34;)
# Set model to inference mode, which is required before exporting
# the model because some operators behave differently in
# inference and training mode.
model.eval()
model.to(device)
output_dir = os.path.join(&#34;.&#34;, &#34;onnx_models&#34;)
if not os.path.exists(output_dir):
    os.makedirs(output_dir)
export_model_path = os.path.join(output_dir, &#39;all_MiniLM_L6-v2.onnx&#39;)
with torch.no_grad():
    symbolic_names = {0: &#39;batch_size&#39;, 1: &#39;max_seq_len&#39;}
    torch.onnx.export(model,  # model being run
        args=tuple(inputs.values()),  # model input (or a tuple for multiple inputs)
        f=export_model_path,  # where to save the model (can be a file or file-like object)
        opset_version=11,  # the ONNX version to export the model to
        do_constant_folding=True,  # whether to execute constant folding for optimization
        input_names=[&#39;input_ids&#39;,  # the model&#39;s input names
                     &#39;attention_mask&#39;,
                     &#39;token_type_ids&#39;],
        output_names=[&#39;start&#39;, &#39;end&#39;],  # the model&#39;s output names
        dynamic_axes={&#39;input_ids&#39;: symbolic_names,  # variable length axes
                      &#39;attention_mask&#39; : symbolic_names,
                      &#39;token_type_ids&#39; : symbolic_names,
                      &#39;start&#39; : symbolic_names,
                      &#39;end&#39; : symbolic_names})
print(&#34;Model exported at &#34;, export_model_path)
Now that we have converted the Sentence BERT model, let’s get the stats for the models. Vanilla SBERT (CPU) The inference time obtained for the Vanilla SBERT model on the CPU can be found using the snippet below.
import time
import pandas as pd
import numpy as np
from tqdm import tqdm
df = pd.read_csv(&#39;./imdb_top_1000.csv&#39;, usecols=[&#39;Overview&#39;])
total_samples = len(df)
latency = []
outputs_cpu = []
with torch.no_grad():
    for i in tqdm(range(total_samples)):
        data = [df.loc[i, &#34;Overview&#34;]]
        inputs = tokenizer(data, padding=True, truncation=True, return_tensors=&#34;pt&#34;)
        start = time.time()
        outputs_cpu.append(mean_pooling(model(**inputs), inputs[&#39;attention_mask&#39;]).cpu().detach().numpy())
        latency.append(time.time() - start)
print(&#34;\n&#34;)
print(&#34;PyTorch {} Inference time = {} ms&#34;.format(device.type, np.round(np.average(latency)*1000, 4)))
100%&#x007C;██████████&#x007C; 1000/1000 [00:36&#60;00:00, 27.62it/s]
PyTorch cpu Inference time = 34.2605 ms
ONNX Converted SBERT (CPU) The inference time obtained for the ONNX SBERT model on the CPU can be found using the snippet below. 
import onnxruntime
import numpy as np
sess_options = onnxruntime.SessionOptions()
session = onnxruntime.InferenceSession(export_model_path, sess_options, providers=[&#39;CPUExecutionProvider&#39;])
latency = []
ort_outputs_cpu = []
for i in tqdm(range(total_samples)):
    data = [df.loc[i, &#34;Overview&#34;]]
    inputs = tokenizer(data, padding=True, truncation=True, return_tensors=&#34;pt&#34;)
    ort_inputs = {k:v.cpu().numpy() for k, v in inputs.items()}
    start = time.time()
    op = session.run(None, ort_inputs)
    op = torch.from_numpy(op[0])
    ort_outputs_cpu.append(mean_pooling([op], inputs[&#39;attention_mask&#39;]).cpu().detach().numpy())
    latency.append(time.time() - start)
print(&#34;\n&#34;)
print(&#34;OnnxRuntime {} Inference time = {} ms&#34;.format(device.type, np.round(np.average(latency)*1000, 4)))
100%&#x007C;██████████&#x007C; 1000/1000 [00:16&#60;00:00, 60.80it/s]
OnnxRuntime cpu Inference time = 15.5696 ms
Outputs
outputs_cpu[0][:,:10] ## Vanilla SBERT CPU Output
array([[-0.06326339, 0.0414625 , -0.04707527, -0.03361899, -0.02562934, 0.03499832, 0.00804075, -0.05042004, 0.00215668, -0.03816812]], dtype=float32)
ort_outputs_cpu[0][:,:10] ## Onnx SBERT CPU Output
array([[-0.06326343, 0.04146247, -0.04707528, -0.033619 , -0.02562926, 0.03499835, 0.0080408 , -0.05042008, 0.00215669, -0.03816817]], dtype=float32)
Vanilla SBERT (GPU) The inference time obtained for the Vanilla SBERT model on the GPU can be found using the snippet below. 
device = torch.device(&#34;cuda&#34;)
# Set model to inference mode, which is required before exporting
# the model because some operators behave differently in
# inference and training mode.
model.eval()
model.to(device)
total_samples = len(df)
latency = []
outputs_gpu = []
with torch.no_grad():
    for i in tqdm(range(total_samples)):
        data = [df.loc[i, &#34;Overview&#34;]]
        inputs = tokenizer(data, padding=True, truncation=True, return_tensors=&#34;pt&#34;).to(device)
        start = time.time()
        outputs_gpu.append(mean_pooling(model(**inputs), inputs[&#39;attention_mask&#39;]).cpu().detach().numpy())
        latency.append(time.time() - start)
print(&#34;\n&#34;)
print(&#34;PyTorch {} Inference time = {} ms&#34;.format(device.type, np.round(np.average(latency)*1000, 4)))
100%&#x007C;██████████&#x007C; 1000/1000 [00:07&#60;00:00, 135.29it/s]
PyTorch cuda Inference time = 6.737 ms
ONNX Converted SBERT (GPU) The inference time obtained for the ONNX SBERT model on the GPU can be found using the snippet below.
import onnxruntime
import numpy as np
sess_options = onnxruntime.SessionOptions()
session = onnxruntime.InferenceSession(export_model_path, sess_options, providers=[&#39;CUDAExecutionProvider&#39;])
latency = []
ort_outputs_gpu = []
for i in tqdm(range(total_samples)):
    data = [df.loc[i, &#34;Overview&#34;]]
    inputs = tokenizer(data, padding=True, truncation=True, return_tensors=&#34;pt&#34;).to(device)
    ort_inputs = {k:v.cpu().numpy() for k, v in inputs.items()}
    start = time.time()
    op = session.run(None, ort_inputs)
    op = torch.from_numpy(op[0])
    ort_outputs_gpu.append(mean_pooling([op], inputs[&#39;attention_mask&#39;].cpu()).cpu().detach().numpy())
    latency.append(time.time() - start)
print(&#34;\n&#34;)
print(&#34;OnnxRuntime {} Inference time = {} ms&#34;.format(device.type, np.round(np.average(latency)*1000, 4)))
100%&#x007C;██████████&#x007C; 1000/1000 [&#8230;]]]></description>
		
		
		
		<media:content url="https://miro.medium.com/v2/resize:fit:700/1*CXfP-CzFyF9rV9OlGT1ScA.png" medium="image"></media:content>
            	</item>
		<item>
		<title>Terraform vs CI/CD for Serverless Deployments</title>
		<link>https://towardsai.net/p/machine-learning/terraform-vs-ci-cd-for-serverless-deployments</link>
		
		<dc:creator><![CDATA[Simon Corde]]></dc:creator>
		<pubDate>Mon, 18 May 2026 17:01:02 +0000</pubDate>
				<category><![CDATA[Latest]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Towards AI - Medium]]></category>
		<guid isPermaLink="false">https://towardsai.net/p/artificial-intelligence/terraform-vs-ci-cd-for-serverless-deployments</guid>

					<description><![CDATA[Last Updated on May 19, 2026 by Editorial Team Author(s): Simon Corde Originally published on Towards AI. I passed the Terraform certification a few months ago. While learning Terraform, I quickly found myself wanting to provision everything I could through it. That turned out to be a bad idea. Here is why. Do you use Terraform for deploying Helm charts? Terraform vs CI/CD for Serverless Deployments 1. The confusion between infra and app lifecycle Infrastructure lifecycle and application lifecycle differ. Infrastructure changes slowly, whereas application releases can happen multiple times a day. For instance, once you are satisfied with your network configuration, you will not change it much over the next year. On the other hand, if this morning’s release of your application has broken some feature despite all the tests you have put in place, you want to roll back to the previous version quickly. Therefore, changes to the application have to be released much more frequently than changes to pure infrastructure. You cannot handle both in the same place. 2. What Terraform is GREAT at Terraform is an open source Infrastructure as Code tool created by HashiCorp. It allows you to define, provision, and manage your infrastructure. It has seen huge adoption in the industry over the years. It allows you to automate deployment for different providers, namely cloud providers like AWS, Azure, and GCP, but also PaaS offerings like Heroku, or even GitHub or GitLab. Today many tools have their own Terraform provider, which lets you declare your configuration in files and communicate with their APIs. The million-dollar question really is: what do I put in Terraform? Terraform shines at defining your desired state in terms of configuration for the following: networking, IAM, buckets, SQL databases, secrets, DNS, load balancers. These infrastructure components define the core of your systems. Once they are set up, you rarely update them. 3. 
Why application deployment is different Over the past few years, the tech industry has become significantly more competitive. And this shift has only accelerated with the arrival of AI-powered coding assistants. Today, development velocity is no longer a nice-to-have — it is a core requirement. Feature teams can no longer afford to wait weeks or even a full month to deliver new functionality. In many organizations, it is now common to ship multiple features per week, and in some cases, even multiple times per day. This level of speed requires a fundamental change in how software is delivered. Promoting application images across environments must be fast, reliable, and fully automated in order to support rapid testing and release cycles. Equally important is the ability to roll back just as quickly when something goes wrong. In this model, deployment is no longer just about pushing changes to production — it is about enabling continuous, safe, and reversible delivery at high velocity. 4. The Cloud Run example Let us take as a running example Cloud Run, the serverless platform in Google Cloud. It is a container-as-a-service platform that can run at scale. We are going to compare the workflows for deploying an application through Terraform and through Continuous Deployment with GitHub Actions. &#x1F4A1;All material from this article can be found in my repository at https://github.com/Redcart/serverless-deployment. &#x1F6E0; Terraform example In the following piece of Terraform configuration, we define our Cloud Run service. Once you are satisfied with this configuration, you just need to run terraform apply. But what if you need to promote a new version of your app? First, you need to build the new Docker image of your code and push it to Artifact Registry. Second, you have to pick the right image name and set it in the container_image variable shown below. Finally, run terraform apply. 
resource &#34;google_cloud_run_v2_service&#34; &#34;cloud-run-service&#34; {
  name = var.cloud_run_service_name
  location = var.region
  ingress = &#34;INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER&#34;
  deletion_protection = false
  template {
    service_account = var.sa_cloud_run
    containers {
      image = var.container_image
      env {
        name = &#34;GCP_PROJECT_ID&#34;
        value = var.project_id
      }
      env {
        name = &#34;INPUT_DATASET&#34;
        value = var.input_dataset
      }
      env {
        name = &#34;OUTPUT_DATASET&#34;
        value = var.output_dataset
      }
      env {
        name = &#34;API_KEY&#34;
        value_source {
          secret_key_ref {
            secret = var.api_key_secret_name
            version = var.api_key_secret_version
          }
        }
      }
    }
  }
}
&#x1F6E0; CI/CD in GitHub Actions example In the following piece of configuration, we deploy a Cloud Run service through the gcloud CLI in GitHub Actions. If we want to deploy a new version of the app, we just have to commit and push the change to the remote GitHub repository. The CI/CD pipeline is then triggered, the Docker image is built and pushed to Artifact Registry, and the Cloud Run service is automatically updated with this new Docker image. All in one step, from one single repository.
IMAGE=${{ env.GCP_REGION }}-docker.pkg.dev/${{ env.GCP_PROJECT_ID }}/cloud-run/app:${{ github.sha }}
gcloud run deploy ${{ env.PROJECT }} \
  --image=$IMAGE \
  --region=${{ env.GCP_REGION }} \
  --platform=managed \
  --memory=1Gi \
  --no-allow-unauthenticated \
  --ingress=internal-and-cloud-load-balancing \
  --service-account=${{ env.SERVICE_ACCOUNT_RUNTIME }} \
  --set-env-vars GCP_PROJECT_ID=${{ env.GCP_PROJECT_ID }} \
  --set-env-vars INPUT_DATASET=${{ env.INPUT_DATASET }} \
  --set-env-vars OUTPUT_DATASET=${{ env.OUTPUT_DATASET }} \
  --set-secrets API_KEY=API_KEY:latest
5. Terraform pain points for deployments While we have seen in the previous section that deploying a Cloud Run service through Terraform can be easy, it has some limitations. First, as discussed earlier, it mixes infra and app concerns. 
In some teams, the people in charge of infra and app are not the same. You can lose velocity, especially if you need to roll back quickly but are blocked by a pending apply (a colleague has run a plan but has not confirmed the apply yet). You can run into state locking, and drift is more likely to happen and interfere with the behavior of your app in production. The plan for such Terraform workspaces can become very noisy and take a while. &#x1F4CC; To overcome some of these limitations, one may be tempted to say: OK, I will have more Terraform workspaces so that I am not bothered by other pending applies or long plans. But here again, do not underestimate the complexity and challenge of maintaining a bunch of workspaces. [&#8230;]]]></description>
		
		
		
		<media:content url="https://miro.medium.com/v2/resize:fit:700/1*VUm2I7qDqhmMhHCcvETraA.png" medium="image"></media:content>
            	</item>
		<item>
		<title>Merve Noyan Stopped Writing Training Scripts — Her Agent Just Fine-Tuned 18 Models Solo for $11.40</title>
		<link>https://towardsai.net/p/machine-learning/merve-noyan-stopped-writing-training-scripts-her-agent-just-fine-tuned-18-models-solo-for-11-40</link>
		
		<dc:creator><![CDATA[Chew Loong Nian - AI ENGINEER]]></dc:creator>
		<pubDate>Mon, 18 May 2026 10:08:35 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Latest]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Towards AI - Medium]]></category>
		<guid isPermaLink="false">https://towardsai.net/?p=51193</guid>

					<description><![CDATA[Author(s): Chew Loong Nian &#8211; AI ENGINEER Originally published on Towards AI. The 17,300-view AI Engineer Singapore talk that quietly killed half my MLOps job I watched Merve Noyan’s “Your Agent Can Now Train Models” talk three times this week. It went up on the AI Engineer channel three days ago, hit 17,300 views in 72 hours, and now sits as the second-most-watched talk on the entire @aiDotEngineer feed — beaten only by the “CI/CD Is Dead” pitch from Hugo Santos two slots above it. Both are screaming the same thing in different keys: the loop where a human writes a training script, picks a GPU, watches loss curves, and pushes a checkpoint is about to look as quaint as configuring Tomcat by hand. Claude reads the dataset card during a live demo at AI Engineer Singapore. The article discusses how Merve Noyan’s AI Engineer talk illustrated the automation of MLOps processes, highlighting the capabilities of the new huggingface-llm-trainer skill. This skill allows users to fine-tune AI models with minimal human intervention, showcasing significant improvements in efficiency and cost-saving in model training. The author recounts personal experiences with the skill, detailing various training tasks and costs involved, and concludes that while automation is transforming the MLOps landscape, the need for human understanding of the training process remains essential. The piece advocates for a shift in MLOps roles towards oversight rather than manual scripting.]]></description>
		
		
		
		<media:content url="https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py" medium="image"></media:content>
            	</item>
		<item>
		<title>660 AI Agents Ran 27,000 Experiments. Their Biggest Discovery Was a 2015 Textbook Result.</title>
		<link>https://towardsai.net/p/machine-learning/660-ai-agents-ran-27000-experiments-their-biggest-discovery-was-a-2015-textbook-result</link>
		
		<dc:creator><![CDATA[Vektor Memory]]></dc:creator>
		<pubDate>Mon, 18 May 2026 09:40:12 +0000</pubDate>
				<category><![CDATA[Latest]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Towards AI - Medium]]></category>
		<guid isPermaLink="false">https://towardsai.net/?p=51195</guid>

					<description><![CDATA[Author(s): Vektor Memory Originally published on Towards AI. On Hyperspace, basic swarms, the math nobody wrote down, and why we built the thing they were missing in a single afternoon. Join us as we traverse multiple whitepapers and agentic memory ideas like a ferret on Adderall. Some rabbit holes start with a GitHub link. Someone drops it in social posts on Facebook/Reddit/Discord. No context, just the GitHub URL and a single line: Someone just built AGI! Wow! The repo was called hyperspaceai/agi. The name alone should have been a warning. I clicked it anyway because I was curious, of course. As I delved deeper into the GitHub vibe-code abyss, I could see the attraction: a new frontier of swarm bot peer-to-peer networks with the ability to earn base 10 points per epoch of confirmation and crypto tokenomics baked in. Something similar was created a while back: Folding@home, which ran on PCs and the PlayStation 3 (https://en.wikipedia.org/wiki/Folding@home). It is a distributed computing project that aims to help scientists develop new therapeutics for a variety of diseases by simulating protein dynamics. This includes the process of protein folding and the movements of proteins, and it relies on simulations run on volunteers’ personal computers. If you would like to read about one of the first actual swarm-bot projects: The term “Swarm-bot” originally refers to the landmark 2000–2005 European Union-funded SWARM-BOTS project, coordinated by Marco Dorigo, which successfully created a physical peer-to-peer network of autonomous mobile robots called s-bots. These s-bots connected physically and coordinated via peer-to-peer local sensing. https://www.sciencedirect.com/science/article/abs/pii/S0921889005001478 The AGI That Wasn’t Hyperspace describes itself as the first distributed AGI system. 660 agents. 27,000 experiments. A peer-reviewed research pipeline running autonomously across a P2P network. 
The marketing is excellent and captivating, guaranteed to attract lemmings like flies to juicy GitHub stars. The actual results are a different story. The swarm’s biggest published discovery — the finding that propagated to 23 agents within hours via gossip protocol, the one they highlight as proof the system works — was Kaiming initialization. Kaiming init has been a textbook result since 2015 and ships as a built-in initializer in PyTorch. It’s covered in week two of every deep learning course. Kaiming He published the paper eleven years ago. A grad student with a coffee and an afternoon would have found it faster. https://arxiv.org/pdf/1502.01852 The infrastructure underneath is genuinely impressive. DiLoCo gradient compression, libp2p gossip, CRDT leaderboards, 32 anonymous nodes completing a collaborative training run in 24 hours. The plumbing is real. I don’t want to dismiss that. But AGI? No. What they built is a parallel random search engine with a shared high score table and excellent branding. To understand why, you need to understand how the gradient compression actually works — because it’s the most technically interesting part, and it’s completely separate from the intelligence problem. The Tech That Actually Works: DiLoCo and Gradient Compression Standard distributed training requires every GPU to synchronise gradients after every forward/backward pass. Every node waits for every other node. This works in a data centre on InfiniBand. It falls apart completely over the internet — latency is too high, bandwidth too variable. DiLoCo (Distributed Low-Communication training, Google DeepMind 2023) solves this differently. Instead of syncing every step, each node trains independently for many steps — called “inner steps” — then syncs once. The “delta” being sent is just the net drift: weights_after - weights_before. 
Node A: train 100 steps locally → share delta
Node B: train 100 steps locally → share delta
Node C: train 100 steps locally → share delta
 ↓ average the deltas (outer step)
 ↓ all nodes update → repeat
But even one sync of a model’s full weight delta is massive. A 500M parameter model is roughly 2GB of float32 deltas. Over the internet, per round, that’s unusable. So Hyperspace stacks two compression techniques on top: SparseLoCo — top-k sparsity. Only send the largest-magnitude weight updates. Most parameter updates are near-zero noise. The high-magnitude updates carry the actual learning signal.
Full delta: [0.001, -0.0003, 0.89, 0.0001, -0.76, ...]
Top-2% only: [ 0, 0, 0.89, 0, -0.76, ...] → send as sparse {index: value} pairs
Parcae — layer pooling. Group adjacent transformer layers into blocks of 6, average their gradients before taking top-k. Adjacent layers learn correlated things. Averaging before sparsification means a more stable top-k mask. The combined result: 195× compression. 5.5MB per round instead of roughly 1GB.
DiLoCo: sync every N steps not every step → ~100× less frequent
SparseLoCo: top-2% of delta values only → 45× smaller payload
Parcae: pool layers before sparsification → 6× additional reduction
Total: 195×
This is real and impressive. The problem is that none of it has anything to do with intelligence. It’s bandwidth optimisation. The agents communicating through this pipe are still completely amnesiac. Why the Swarm Is Basic: The Architecture Problem Here is the agents’ complete intelligence loop. Every agent. All 660 of them. Every one of the 27,000 experiments:
1. read current leaderboard (what&#39;s the best score?)
2. read last 5 experiment results from shared branch
3. prompt LLM: &#34;given these results, generate hypothesis&#34;
4. run experiment
5. record result
6. gossip to peers
7. goto 1
The LLM’s context window is the memory. When the session resets, everything resets. There is no persistence. There is no structure. 
There is no causal understanding of why anything worked.
Hyperspace stores: &#34;run_047: threshold 0.30, score 0.67&#34; ← flat log
Hyperspace does NOT store:
 why threshold 0.30 worked
 what it interacted with
 under what conditions it holds
 what failed before it
So when the Kaiming init “discovery” happened, here is what actually occurred: the LLM generating hypotheses was trained on He et al. 2015. The prompt included “try to improve initialization.” The model recalled Kaiming from pretraining weights. An agent ran the experiment. It worked. The score updated. 23 agents adopted it via gossip. Not emergence. Not intelligence. Retrieval from a pretrained model, dressed up as swarm discovery. The plateau problem is the proof. Every RSI paper — Gödel Agent, Darwin Gödel Machine, Reflexion, STOP — hits the same wall: iterations 1-10: big gains [&#8230;]]]></description>
		
		
		
		<media:content url="https://miro.medium.com/v2/resize:fit:700/1*ZYvHXZwBJSxTscyKSpQGBg.jpeg" medium="image"></media:content>
            	</item>
		<item>
		<title>Why Your Sales Forecast Is Always 20% Wrong (And How To Make It 12% Wrong)</title>
		<link>https://towardsai.net/p/machine-learning/why-your-sales-forecast-is-always-20-wrong-and-how-to-make-it-12-wrong</link>
		
		<dc:creator><![CDATA[Kamrun Nahar]]></dc:creator>
		<pubDate>Mon, 18 May 2026 09:23:58 +0000</pubDate>
				<category><![CDATA[Data Science]]></category>
		<category><![CDATA[Latest]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Towards AI - Medium]]></category>
		<guid isPermaLink="false">https://towardsai.net/?p=51197</guid>

					<description><![CDATA[Author(s): Kamrun Nahar Originally published on Towards AI. Real World Sales Forecasting Playbook The Single Most Useful Picture I Have Ever Seen The thing that changed how I worked was not a model. It was a 2&#215;2 grid. Two professors put it on paper in 2005. The grid has saved me from picking the wrong model approximately 600 times. The Forecastability Map. Every product on your shelf belongs to one of these four boxes. Pick the wrong box, pick the wrong model. This is the Syntetos-Boylan classification. Two axes. Two numbers per SKU. The x-axis is the Average Demand Interval (ADI). Take all the time periods in your history. Count how many had at least one sale. Now divide the total number of periods by that. If you sell something every day, ADI is 1. If you sell it every other day on average, ADI is 2. The bigger ADI gets, the rarer the SKU. The y-axis is the squared coefficient of variation (CV²) of the non-zero demand sizes. Take only the periods where you sold something. Compute the standard deviation of the quantities. Divide by the mean. Square the result. This tells you how variable the size of demand is when it does happen. The thresholds are 1.32 for ADI and 0.49 for CV². Why those exact numbers? They came from empirical analysis of real industrial data. The math is in a 2005 paper. Don’t go chasing the original PDF; the boundaries are good enough as rules of thumb. Four boxes pop out of this.
SMOOTH &#x007C; INTERMITTENT
Frequent. &#x007C; Rare.
Steady. &#x007C; Steady when it happens.
------------&#x007C;------------
ERRATIC &#x007C; LUMPY
Frequent. &#x007C; Rare AND variable.
Variable. &#x007C; The nightmare.
Every product on your shelf needs a different forecasting approach. Treating them all the same is why your one big model fails on the long tail. Let me walk you through each box. Pour yourself something. We’ll be a while. Box One. Smooth. This is the dream. Sells every day. 
Roughly the same quantity. The textbook stuff. A small grocery store sells milk like this. So does a hospital pharmacy with basic painkillers. So does a power company billing residential customers. Frequent. Steady. The data scientist’s friend. For smooth demand, almost any classical method works. AutoARIMA. Exponential Smoothing (ETS). A simple regression with calendar features and a couple of lags. The fancy methods barely beat the simple methods. You will see WAPE around 8 to 15 percent and you will look like a magician. The model will not save you. The data is already easy. The mistake people make on smooth data is over-engineering. They reach for an LSTM. They tune hyperparameters for two weeks. They get a 0.4 percent improvement and call a meeting.
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import AutoETS, AutoARIMA, SeasonalNaive
df = pd.read_csv(&#34;smooth_skus.csv&#34;, parse_dates=[&#34;ds&#34;]) # cols unique_id ds y
sf = StatsForecast( models=[AutoETS(season_length=7), AutoARIMA(season_length=7), SeasonalNaive(season_length=7)], freq=&#34;D&#34;, n_jobs=-1) # daily data, 7-day weekly cycle, all cores
sf.fit(df) # fits one of each model per unique_id
fcst = sf.predict(h=28) # 4 weeks out
print(fcst.head()) # check it. should look boring. boring is good.
Line by line, because the Reddit poster asked for thoroughness. import pandas as pd. Pandas is the spreadsheet library every Python data person uses. The as pd is a community nickname so we never have to type the long name. from statsforecast import StatsForecast. The orchestrator class from Nixtla&#39;s library. It is fast because it parallelizes across SKUs without you having to write a loop. from statsforecast.models import AutoETS, AutoARIMA, SeasonalNaive. Three classical models. AutoETS picks the best exponential smoothing variant for you. AutoARIMA does the (p,d,q) search you used to do by hand at 2 AM. 
SeasonalNaive is the dumb baseline that says &#34;last week&#39;s same weekday.&#34; Always include the dumb baseline. df = pd.read_csv(...). Reads the CSV. The library expects three columns. unique_id (which SKU), ds (which date), y (how many sold). StatsForecast(models=[...], freq=&#34;D&#34;, n_jobs=-1). We instantiate. freq=&#34;D&#34; is daily. n_jobs=-1 means &#34;use every CPU.&#34; The library is genuinely fast. sf.fit(df). Behind the scenes, it groups by unique_id and fits each model to each SKU&#39;s history. No loop. You&#39;re welcome. fcst = sf.predict(h=28). Predict 28 days ahead. Each model produces its own column. print(fcst.head()). Eyeball test. For a smooth SKU, the three forecasts should agree closely. If they wildly disagree, your data isn&#39;t actually smooth, and you&#39;re in the wrong box. Why this matters. If your SKU is genuinely smooth, this snippet is your whole pipeline. You don’t need LightGBM. You don’t need Prophet. You don’t need a paper from NeurIPS. The smooth demand dream. Live it while you can. Box Two. Intermittent. Now it gets interesting. A roofing supply store sells a particular size of slate tile maybe four times a month. When it sells, it sells two or three boxes at a time. Always two or three. Never twenty. Never zero point seven. Frequency is low. Quantity is consistent. For demand that is rare but steady-quantity, the classical models go quiet. ARIMA wants regular data. ETS wants a trend or a season. There isn’t one. There are just zeros, interrupted by a normal number, then more zeros. Enter John Croston. 1972. Yorkshire. He probably had a slide rule. He had a brilliant idea. Forecast two things separately. How big an order is when it happens. How often orders happen. Divide the first by the second. That’s the whole method. From the Watergate era. And it still beats every neural network on sparse data most of the time. Most of the time. We will get to the exceptions. Croston’s method has a known problem. 
It is positively biased. It tends to over-forecast. The smoothing parameter beta makes it worse. In 2005, the same Syntetos and Boylan who made the quadrant figured out a fix. Multiply by (1 - alpha/2). That&#39;s it. That fix is now called the Syntetos-Boylan Approximation (SBA) and it is the default people should be using. There is a further variant called TSB (Teunter-Syntetos-Babai). This one solves a different problem. Croston only updates [&#8230;]]]></description>
		
		
		
		<media:content url="https://miro.medium.com/v2/resize:fit:700/1*GqSwick9_tA7NOx6fhAekg.png" medium="image"></media:content>
            	</item>
		<item>
		<title>Genetic Cubic n{C/A} Ratios For Elementary Robotics Design</title>
		<link>https://towardsai.net/p/machine-learning/genetic-cubic-nc-a-ratios-for-elementary-robotics-design</link>
		
		<dc:creator><![CDATA[Greg Oliver]]></dc:creator>
		<pubDate>Fri, 15 May 2026 04:13:21 +0000</pubDate>
				<category><![CDATA[Latest]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Towards AI - Medium]]></category>
		<guid isPermaLink="false">https://towardsai.net/p/artificial-intelligence/genetic-cubic-nc-a-ratios-for-elementary-robotics-design</guid>

					<description><![CDATA[Last Updated on May 15, 2026 by Editorial Team Author(s): Greg Oliver Originally published on Towards AI. Architectural Cubic n{C/A} Ratios and Easy Shifts to Aid Robotics Design This post provides a toolbox of genetic Cubic coefficient ratios n{C/A} and n{C} ratios in Header Graph 1 applied to a depressed Cubic y=Ax³ — Cx+0 in black with Roots, Tp’s and in green the Sum Of gradients = — 3C at all possible 3 real roots (between Tp(y)’s) as presented in my recent post; Designing Polynomials Using Sum of Gradients at the Roots. Header Graph 1: Coefficient Ratios n{C/A}. This article discusses genetic Cubic coefficient ratios essential for robotics design, providing efficient formulas to manipulate and shift depressed cubic functions within a coordinate system, thus aiding in various robotic applications, including movement and control mechanisms. Read the full blog for free on Medium. Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor. Published via Towards AI]]></description>
		
		
		
		<media:content url="https://miro.medium.com/v2/resize:fit:700/1*ExMYIzXHXOAFGFBXSF1e_g.png" medium="image"></media:content>
            	</item>
		<item>
		<title>Top 20 AdaBoost Interview Questions &#038; Answers (Part 2 of 2)</title>
		<link>https://towardsai.net/p/machine-learning/top-20-adaboost-interview-questions-answers-part-2-of-2</link>
		
		<dc:creator><![CDATA[Shahidullah Kawsar]]></dc:creator>
		<pubDate>Fri, 15 May 2026 02:01:00 +0000</pubDate>
				<category><![CDATA[Latest]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Towards AI - Medium]]></category>
		<guid isPermaLink="false">https://towardsai.net/p/artificial-intelligence/top-20-adaboost-interview-questions-answers-part-2-of-2</guid>

					<description><![CDATA[Last Updated on May 15, 2026 by Editorial Team Author(s): Shahidullah Kawsar Originally published on Towards AI. Data Scientist &#38; Machine Learning Interview Preparation Let’s check your basic knowledge of AdaBoost. Here are 10 Q&#38;A for your next interview. Source: This image is generated by ChatGPT. The article presents a collection of 20 interview questions and answers focused on AdaBoost, a popular machine learning algorithm. It covers various aspects of the algorithm, including its functionality, applications, and the significance of tuning parameters, while also addressing common misconceptions and the implications of model choices. Each question is answered in detail, helping candidates prepare effectively for technical interviews in the data science and machine learning fields. Read the full blog for free on Medium. Published via Towards AI]]></description>
		
		
		
		<media:content url="https://miro.medium.com/v2/resize:fit:700/1*zuT50dyjjnJ1gpID3Hht6g.png" medium="image"></media:content>
            	</item>
		<item>
		<title>Agentic AI Vs AI Agents — What Are the Key Differences?</title>
		<link>https://towardsai.net/p/machine-learning/agentic-ai-vs-ai-agents-what-are-the-key-differences</link>
		
		<dc:creator><![CDATA[Davin Convay]]></dc:creator>
		<pubDate>Thu, 14 May 2026 23:01:00 +0000</pubDate>
				<category><![CDATA[Latest]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Towards AI - Medium]]></category>
		<guid isPermaLink="false">https://towardsai.net/p/artificial-intelligence/agentic-ai-vs-ai-agents-what-are-the-key-differences</guid>

					<description><![CDATA[Last Updated on May 15, 2026 by Editorial Team Author(s): Davin Convay Originally published on Towards AI. There are a lot of new terms dominating the artificial intelligence world lately, “Agentic AI” and “AI agents” being two of them. The two are often used interchangeably, but they have distinct meanings. Organizations that understand when to deploy AI agents versus agentic AI solutions will automate intelligently while others automate blindly. The revolution isn’t just about AI doing tasks; it’s about AI pursuing goals. That difference changes everything. In this blog, we explore agentic AI vs AI agents, what makes them different, and how they will change the way we work. What is an AI Agent? An AI agent is a software program designed to perform specific tasks on behalf of users, responding to inputs with predetermined or learned behaviors. Think of AI agents as sophisticated digital assistants that excel at defined functions within established parameters. They perceive their environment through inputs, process information using programmed logic or trained models, and execute actions to achieve specific outcomes. The term “agent” implies agency, but AI agents possess limited autonomy. They operate within boundaries, following scripts, rules, or patterns learned from training data. A customer service chatbot represents a classic AI agent: it interprets queries, searches knowledge bases, and provides responses, but cannot independently decide to redesign the customer experience or proactively reach out to at-risk customers. AI agents have evolved significantly from simple rule-based systems. Modern AI agents leverage machine learning, natural language processing, and sophisticated decision trees to handle complex interactions. They can learn from experience, improving responses over time. 
Yet they remain fundamentally reactive, task-oriented tools waiting for activation rather than independently pursuing objectives. Examples of AI agents permeate our digital lives: Chatbots and Virtual Assistants: From Siri to enterprise customer service bots, these agents respond to queries and execute simple commands. They parse language, match intents, and deliver programmed responses. Recommendation Engines: Netflix’s content suggestions and Amazon’s product recommendations are AI agents analyzing behavior patterns to predict preferences. They excel at pattern matching but don’t independently decide to revolutionize recommendation strategies. Robotic Process Automation (RPA) Bots: These agents automate repetitive tasks like data entry, form processing, and report generation. They follow defined workflows efficiently but cannot reimagine business processes. Trading Bots: Algorithmic trading agents execute trades based on market signals and predetermined strategies. They react quickly to market conditions but don’t independently develop new trading philosophies. Email Filters: Spam detection agents classify messages using learned patterns. They improve accuracy through feedback but don’t autonomously investigate new spam techniques. What unites these AI agents is their fundamental characteristic: they are tools wielded by humans rather than autonomous collaborators. They augment human capabilities within defined scopes but don’t independently identify problems to solve or goals to pursue. Different Categories of AI Agents Understanding AI agent categories helps clarify why not all agents are agentic. Each category serves specific purposes, with distinct capabilities and limitations that determine their appropriate applications. Reactive Agents Reactive agents represent the simplest form, responding directly to current stimuli without memory or planning. They excel at immediate response scenarios where historical context is irrelevant. 
Characteristics: No internal state, immediate stimulus-response, consistent behavior for identical inputs. Examples: Basic chatbots with scripted responses, simple email autoresponders, rule-based alert systems. Limitations: Cannot learn from experience, no context awareness, fails with complex multi-step tasks. Use Cases: FAQ responses, simple notifications, basic data validation. Proactive Agents Proactive agents anticipate needs and initiate actions without explicit user commands. They monitor conditions and trigger responses when specific criteria are met. Characteristics: Environmental monitoring, threshold-based activation, predictive capabilities. Examples: Predictive maintenance systems, inventory reorder agents, calendar scheduling assistants. Strengths: Reduces human oversight, prevents problems before they occur, improves efficiency. Limitations: Operates within predefined parameters, cannot adapt strategies autonomously. Hybrid Agents Hybrid agents combine reactive and proactive behaviors, switching modes based on context. They respond to requests while also initiating beneficial actions. Characteristics: Dual-mode operation, context-sensitive behavior, balanced autonomy. Examples: Modern virtual assistants like Google Assistant, enterprise monitoring systems, smart home controllers. Advantages: Versatile application, user-friendly interaction, efficient resource utilization. Challenges: Complex design, mode-switching logic, user expectation management. Specialized vs Generalist Agents The specialization spectrum determines an agent’s breadth versus depth of capabilities. Specialized Agents: Excel at specific tasks with deep expertise. Example: Medical diagnosis agents trained on radiology images. Generalist Agents: Handle diverse tasks with moderate proficiency. Example: GPT-based assistants answering various queries. Trade-offs: Specialists offer superior performance in narrow domains. 
Generalists provide flexibility across multiple applications. Multi-Agent Systems Multi-agent systems coordinate multiple specialized agents to achieve complex objectives. Each agent handles specific sub-tasks while communicating with others. Architecture: Distributed intelligence, inter-agent communication protocols, coordinated goal pursuit. Examples: Supply chain optimization systems, smart grid management, autonomous vehicle fleets. Benefits: Scalability, fault tolerance, parallel processing, emergent intelligence. Complexities: Coordination overhead, conflict resolution, communication bottlenecks. Learning Agents Learning agents improve performance through experience, adapting behaviors based on feedback and outcomes. Learning Mechanisms: Supervised learning from labeled data, reinforcement learning from rewards, unsupervised pattern discovery. Examples: Recommendation systems, fraud detection agents, game-playing AI. Evolution: From simple parameter adjustment to complex strategy development. Limitations: Requires quality training data, can learn biases, may overfit to specific scenarios. Autonomous Agents Autonomous agents operate independently within defined parameters, making decisions without human intervention. Autonomy Levels: From simple script execution to complex decision-making within boundaries. Examples: Autonomous testing bots, robotic process automation, industrial control systems. Requirements: Robust error handling, safety constraints, performance monitoring. Distinction: Autonomous operation doesn’t equal agentic AI; autonomy can exist without goal-setting capability. What is Agentic AI? Agentic AI represents a fundamental leap beyond traditional AI agents: artificial intelligence systems capable of independent goal formulation, strategic planning, and autonomous pursuit of objectives without constant human direction. While AI agents execute tasks, agentic AI owns outcomes. 
This distinction transforms AI from a tool into a collaborator, from an assistant into a strategic partner. The “agentic” qualifier signifies genuine agency: the capacity to act independently based on internal goals rather than [&#8230;]]]></description>
		
		
		
		<media:content url="https://miro.medium.com/v2/resize:fit:700/1*tfVoCqUOoXiX11sTl1FNpg.jpeg" medium="image"></media:content>
            	</item>
		<item>
		<title>LAI #127: The Infrastructure Layer of AI Is Becoming the Product</title>
		<link>https://towardsai.net/p/machine-learning/lai-127-the-infrastructure-layer-of-ai-is-becoming-the-product</link>
		
		<dc:creator><![CDATA[Towards AI Editorial Team]]></dc:creator>
		<pubDate>Thu, 14 May 2026 19:01:01 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Latest]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Towards AI - Medium]]></category>
		<guid isPermaLink="false">https://towardsai.net/p/artificial-intelligence/lai-127-the-infrastructure-layer-of-ai-is-becoming-the-product</guid>

					<description><![CDATA[Last Updated on May 15, 2026 by Editorial Team Author(s): Towards AI Editorial Team Originally published on Towards AI. Good morning, AI enthusiasts! This week, we’re looking at the shift from “AI demos” to real systems: agents that need reliable execution, enterprises building durable AI infrastructure, and architectures that survive production constraints. We also cover: A 1-hour practical walkthrough of modern AI engineering, from prompting and RAG to agents, evaluation, and deployment, plus a production lesson on why agent retries quietly break real systems. Why recursive multi-agent systems may depend less on “more agents” and more on how agents communicate internally. How enterprises are turning years of operational complexity into an advantage in the emerging “harness era” of AI. A practical guide to deploying production-ready agents on Google Cloud using Agents CLI. Why modern AI architecture evolved layer by layer, from LLMs to RAG, agents, and MCP, in response to real system failures. Let’s get into it! What’s AI Weekly This week in What’s AI, I’m sharing something we normally only do for enterprise teams: a 1-hour deep dive into the foundations of AI engineering you need to know in 2026. We go through AI theory without the math, cover the real limitations of current LLMs, and walk through the production techniques such as prompting, context engineering, RAG, agents, fine-tuning, evaluation, and deployment. If you’re building with LLMs or planning to, this is the starting point I wish had existed when I began. Watch the full video on YouTube. AI Tip of the Day Agent tool call retries are helpful when a model request times out, a tool fails, or the system loses connection. But retries can cause serious problems if the agent repeats the same action. It might send the same email twice, issue two refunds, create duplicate support tickets, or rerun the same payment step. Checking the tool arguments is not enough. 
The arguments can be valid, but the action may have already happened. Give each tool action a unique ID that connects to the user request and the action being taken. Save the action status before running it. Then, before the tool runs again, check whether that same action has already finished. For external APIs, use an idempotency key when they support one. For your own database writes, add a uniqueness rule so the same action cannot be saved twice. If you’re building agentic LLM applications and want to go deeper into tool use, guardrails, and production architecture, check out our Agentic AI Engineering course. — Louis-François Bouchard, Towards AI Co-founder &#38; Head of Community Learn AI Together Community Section! Featured Community post from the Discord _creepycactus built OpenEar, a Mac dictation app. It hears you when you speak, records your meetings, and remembers every word. It runs on your chip, not the cloud, and doesn’t store any information. It is great for long prompts, meetings, voice journaling, or brain dumps. Check it out here and support a fellow community member. If you have any questions, ask in the thread! Collaboration Opportunities The Learn AI Together Discord community is flooding with collaboration opportunities. If you are excited to dive into applied AI, want a study partner, or even want to find a partner for your passion project, join the collaboration channel! Keep an eye on this section, too — we share cool opportunities every week! 1. Lucazsh is building a social media app and is looking for a frontend designer or app designer to improve the UX/UI. If this sounds like something you would enjoy working on, connect with them in the thread! 2. Muneebbaig. wants to dive deeper into ML, LLMs, and open-source AI research and produce one or two papers based on it. If you want to spend time on research projects or build your own, reach out to them in the thread! 3. 
Beratgurleer is working on n8n growth systems focused on lead conversion solutions and is looking for partners who can help with the technical side. If you want to enter the space and build something together, contact them in the thread! Meme of the week! Meme shared by bin4ry_d3struct0r TAI Curated Section Article of the week Groundbreaking Latent State Recursive Multi-Agent Systems is 2.4x Faster Uses 75.6% Cheaper By Mandar Karhade, MD. PhD. This article walks you through the paper ‘Recursive Multi-Agent Systems’ that bundles two ideas: passing latent hidden states between agents instead of text, and running agents in iterative critique loops. Recursive loops are well-established since Self-Refine and Reflexion in 2023. The latent channel is the actual contribution. Text-based recursion plateaus or regresses by round three because agents commit uncertainty to words; latent recursion keeps improving. The paper’s own data shows the communication channel, not loop depth, is where multi-agent accuracy stops climbing. Our must-read articles 1. Designing LLM Pipelines for Clinical Data: A Pattern for ALCOA++ and 21 CFR Part 11 Compliance By Pranav Nandan Shipping LLM features into regulated clinical workflows reveals a recurring architectural failure: the prototype works, but it can’t answer where the audit trail is, why outputs have changed, or who is accountable. The article outlines a five-layer pipeline treating the LLM as a lossy parser, using constrained decoding to physically prevent hallucinations and deterministic Python for all logic and computation. A conditional judge LLM fires on only 15% of records, and ALCOA++ and 21 CFR Part 11 compliance emerge from the architecture. 2. Harness: The Era Enterprises Were Built For By Fabio Yáñez Romero The era of prompt engineering favored lean, fast-moving teams who could ship on instinct. The harness era inverts that advantage. 
The article traces the arc from model weights through context engineering to the harness, a persistent runtime built on externalized memory, reusable skills, and machine-readable protocols. Enterprises that spent decades documenting procedures, governing data, and stabilizing interfaces now hold exactly the right raw material. The model becomes swappable; the harness becomes the durable intelligence layer the company owns outright. 3. How to Build and Deploy AI Agents on Google [&#8230;]]]></description>
		
		
		
		<media:content url="https://miro.medium.com/v2/resize:fit:700/1*tqRpZDXjkLVd9dE9Y4bTOw.png" medium="image"></media:content>
            	</item>
	</channel>
</rss>
